Enable Temporal to startup even if Elastic Search is down #866

Merged

@mastermanu mastermanu commented Oct 15, 2020

A customer discovered that Temporal Server fails to start up if it is configured to use Elasticsearch and Elasticsearch is down. There are two separate causes:

  1. Within the Server itself, we block Temporal from starting up if we cannot create an Elasticsearch client.
  2. Within the startup script of the Docker image, we wait indefinitely until we are able to reach Elasticsearch.

This PR provides a proper fix for the first issue and a workaround for the second issue.

For the first issue, we disable health checks when we create the Elasticsearch client and re-enable them once the client has been created. This ensures we can create the client even if ES is down.
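The pattern can be sketched in Go as follows. This is a minimal, self-contained illustration and not the code from this PR: `Client`, `ClientOptionFunc`, `NewClient`, `NewResilientClient`, and the `reachable` callback are hypothetical stand-ins that only mimic the functional-options style of an Elasticsearch client library. The sketch has no background goroutines, so it has no counterpart to stopping the client before the option is re-applied.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical stand-in for an Elasticsearch client.
// It is not the real library type.
type Client struct {
	healthcheckEnabled bool
	reachable          func() bool // simulates whether the cluster answers
}

// ClientOptionFunc configures a Client (functional-options pattern).
type ClientOptionFunc func(*Client) error

// SetHealthcheck returns an option that turns the health check on or off.
func SetHealthcheck(enabled bool) ClientOptionFunc {
	return func(c *Client) error {
		c.healthcheckEnabled = enabled
		return nil
	}
}

// NewClient applies the options and, only if health checks are enabled,
// fails when the cluster cannot be reached.
func NewClient(reachable func() bool, opts ...ClientOptionFunc) (*Client, error) {
	c := &Client{healthcheckEnabled: true, reachable: reachable}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	if c.healthcheckEnabled && !c.reachable() {
		return nil, errors.New("no Elasticsearch node available")
	}
	return c, nil
}

// NewResilientClient creates the client with health checks off, so creation
// succeeds while the cluster is down, then switches them back on.
func NewResilientClient(reachable func() bool) (*Client, error) {
	c, err := NewClient(reachable, SetHealthcheck(false))
	if err != nil {
		return nil, err
	}
	// An option is just a function, so it can be applied to an existing client.
	if err := SetHealthcheck(true)(c); err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	down := func() bool { return false } // cluster is unreachable

	_, err := NewClient(down)
	fmt.Println("default creation failed:", err != nil)

	c, err := NewResilientClient(down)
	fmt.Println("resilient creation failed:", err != nil)
	fmt.Println("health check enabled afterwards:", c.healthcheckEnabled)
}
```

Because an option is an ordinary function value, calling `SetHealthcheck(true)(c)` after construction needs no extra setter on the client, which is why the re-enable step reads as a double call.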

For the second issue, we added another environment variable called "$BEST_EFFORT_CREATE_ES_SCHEMA". By default it is false, meaning that startup blocks if ES is down. If users find their startup blocked because of this, they can set the variable to true. With a value of true, the Docker script continues after 30 seconds if it cannot establish a connection to Elasticsearch (we could also parameterize the 30 seconds).
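The control flow of that wait can be modelled as below. This is a Go sketch of the logic only: the actual change lives in the Docker startup shell script, and `waitForES`, `probe`, and the polling interval are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// waitForES polls probe until it reports success.
// With bestEffort false it blocks until the probe succeeds.
// With bestEffort true it gives up once timeout has elapsed.
// It returns whether Elasticsearch became reachable.
func waitForES(probe func() bool, bestEffort bool, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for !probe() {
		if bestEffort && !time.Now().Before(deadline) {
			return false // give up and let startup continue without Elasticsearch
		}
		time.Sleep(interval)
	}
	return true
}

func main() {
	alwaysDown := func() bool { return false }

	// A short timeout keeps the example fast; the PR description uses 30 seconds.
	ok := waitForES(alwaysDown, true, 50*time.Millisecond, 10*time.Millisecond)
	fmt.Println("Elasticsearch reachable:", ok)
}
```

With `bestEffort` set to false the loop never consults the deadline, which corresponds to the default blocking behavior described above.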

We have a task at Temporal tracking the creation of a better story for the second issue.

Tested both fixes using a local development environment and auto-setup containers. Also validated that ES catches up when it is started after Temporal is already running.

@mastermanu changed the title from "initial checkin" to "Enable Temporal to startup even if Elastic Search is down" on Oct 15, 2020
docker/start.sh (outdated review thread, resolved)

// Re-enable the healthcheck after client has successfully been created.
client.Stop()
elastic.SetHealthcheck(true)(client)

What a syntax!

@wxing1292

FYI, creating yet another env (BEST_EFFORT_CREATE_ES_SCHEMA) to support may not be a good idea, plus if ES cannot start during docker all in one setup, we should stop IMO.

@wxing1292

> FYI, creating yet another env (BEST_EFFORT_CREATE_ES_SCHEMA) to support may not be a good idea, plus if ES cannot start during docker all in one setup, we should stop IMO.

Turns out that the docker "check ES being online" step is not only used by the all-in-one setup for dev.
Please consider renaming the BEST_EFFORT_CREATE_ES_SCHEMA env var.

@mastermanu mastermanu merged commit 27965f4 into temporalio:master Oct 15, 2020

Successfully merging this pull request may close these issues.

Failure to connect to ElasticSearch on startup should not fail startup
4 participants