Ran into what I think is a documentation gap this morning, thought I’d share it with the interwebs to save folks time / stress / sanity.
Scenario:
- You install and configure your Primary. All is well.
- You configure any number of other workers with other services (excluding extra gateways). All is well.
- You activate one or more gateways on the workers from step 2.
Result:
The workers on which you activated gateways now show themselves in a degraded state (using tabadmin status -v), and the gateway processes are stopped except on the primary.
What’s wrong?:
Nothing, actually. Evidently, we don’t actually use those gateways from step 2 until you configure an external load balancer (see the help topic “Add a Load Balancer”). I’m not 100% sure of this, but all the internal notes I’m finding seem to indirectly indicate this.
Since we aren’t spinning those gateways up, they remain stopped. And when a service is stopped on a worker, the worker will return “Degraded”. Nothing is really wrong here, though: the worker just doesn’t seem smart enough to know that the gateway is down for a good reason, and therefore complains about its health.
Nothing to see here, move along. All is well.
EDIT 25-Feb: A couple of other internal folks looked into this with me, and we figured out what was going on (why the non-primary gateways didn’t start up). Actually, they figured it out and then laughed at me and heaped well-deserved abuse upon my person.
I use SSL on my machine, and frankly forgot about that teensy-weensy fact. When you use SSL, you must copy the folder with your certs in it to each and every machine which will be hosting a gateway.
- I didn’t.
- Therefore, gateways don’t start.
- And then the workers hosting the gateways claim “Degraded”.
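If you want to check for this before the gateways refuse to start, here is a minimal Python sketch that compares the cert folder on the primary with the copy on a worker. It assumes you can reach both folders as ordinary paths (for example, through a file share). The function names and the example paths are made up for illustration and are not part of any Tableau tooling.

```python
import hashlib
from pathlib import Path


def folder_digest(folder):
    """Map each file's relative path to the SHA-256 hash of its contents."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def cert_folder_diff(primary_folder, worker_folder):
    """Return (missing, mismatched) file names on the worker, relative to the primary."""
    primary = folder_digest(primary_folder)
    worker_root = Path(worker_folder)
    # A worker with no cert folder at all is treated as having no files.
    worker = folder_digest(worker_root) if worker_root.is_dir() else {}
    missing = sorted(name for name in primary if name not in worker)
    mismatched = sorted(
        name for name in primary
        if name in worker and worker[name] != primary[name]
    )
    return missing, mismatched


if __name__ == "__main__":
    # Hypothetical paths: substitute wherever your certs actually live.
    missing, mismatched = cert_folder_diff(
        r"C:\certs", r"\\worker1\c$\certs"
    )
    print("missing on worker:", missing)
    print("different on worker:", mismatched)
```

Two empty lists mean the worker has everything the primary has. Anything in the first list is a file I would have forgotten to copy.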
Another Scooby Doo mystery solved. Damn those kids!