Enable Temporal to startup even if Elastic Search is down #866
```go
// Re-enable the healthcheck after the client has been created successfully.
client.Stop()
elastic.SetHealthcheck(true)(client)
```
What a syntax!
FYI, creating yet another env (
Turns out the Docker "check ES being online" step is not only used by the all-in-one setup for dev.
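A customer discovered that Temporal Server fails to start up if it is configured to use ElasticSearch and ElasticSearch is down. There are two separate issues:

1. Creating the ElasticSearch client fails while ES is unreachable.
2. The Docker startup script blocks until it can reach ES to create the ES schema.
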
This PR provides a proper fix for the first issue and a workaround for the second issue.
For the first issue, we disable health checks when creating the ElasticSearch client and re-enable them once the client has been created. This ensures the client can be created even if ES is down.
For the second issue, we added an environment variable, `$BEST_EFFORT_CREATE_ES_SCHEMA`. It defaults to false, meaning startup blocks while ES is down. If users find their startup blocked for this reason, they can set the variable to true; the Docker script will then continue after 30 seconds if it cannot establish a connection to ElasticSearch. (The 30-second timeout could also be parameterized.)
A task at Temporal tracks a better long-term solution for the second issue.
Tested both fixes using the local development environment and the auto-setup containers. Also verified that ES catches up when it is started after Temporal is already running.