Enable Temporal to startup even if Elastic Search is down (#866)
A customer discovered that Temporal Server fails to start up if it is configured to use ElasticSearch and ElasticSearch is down. There are two separate issues:

1. Within the Server itself, we block Temporal from starting up if we cannot create an ElasticSearch client.
2. Within the startup script of the Docker image, we wait indefinitely until we are able to reach ElasticSearch.

This PR provides a proper fix for the first issue and a workaround for the second issue.

For the first issue, we disable health checks when we create the ElasticSearch client and re-enable them once the client has been created. This ensures we can create the client even if ES is down.

For the second issue, we added an environment variable called $ES_SCHEMA_SETUP_TIMEOUT_IN_SECONDS. It defaults to 0, meaning that if ES is down, the Docker start.sh script waits forever. Users who find their startup blocked by this can set the variable to any positive integer. The script then stops waiting once that many seconds have passed without a successful connection to ElasticSearch, skips index creation, and continues with startup.
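
The timeout behavior described above can be sketched in isolation as follows. This is an illustration only, not the code in docker/start.sh: `wait_with_timeout` is a hypothetical helper name, and the probe command is passed in as arguments so the loop can be tried without a running ElasticSearch.

```shell
#!/usr/bin/env bash
# Sketch only: wait_with_timeout is a hypothetical helper, not part of docker/start.sh.
# Usage: wait_with_timeout <timeout_seconds> <probe command...>
# A timeout of 0 means "wait forever", mirroring the default described above.
wait_with_timeout() {
    local timeout="$1"
    shift

    # SECONDS is a bash built-in; after assignment it counts elapsed seconds.
    SECONDS=0

    # Re-run the probe until it succeeds, discarding its output.
    until "$@" > /dev/null 2>&1; do
        if [ "$timeout" -gt 0 ] && [ "$SECONDS" -ge "$timeout" ]; then
            echo 'WARNING: timed out waiting for probe to succeed' >&2
            return 1
        fi
        sleep 1
    done
    return 0
}
```

Under these assumptions, a caller could write something like `wait_with_timeout "$ES_SCHEMA_SETUP_TIMEOUT_IN_SECONDS" curl -s "$URL"` and run the index setup only when the function returns 0.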

We have a task at Temporal tracking the creation of a better story for the second issue.
mastermanu committed Oct 15, 2020
1 parent 548af47 commit 27965f4
Showing 2 changed files with 26 additions and 3 deletions.
11 changes: 11 additions & 0 deletions common/elasticsearch/client.go
@@ -90,12 +90,23 @@ func NewClient(config *Config) (Client, error) {
client, err := elastic.NewClient(
elastic.SetURL(config.URL.String()),
elastic.SetSniff(false),

// Disable health check so we don't block client creation if ES happens to be down.
elastic.SetHealthcheck(false),

elastic.SetRetrier(elastic.NewBackoffRetrier(elastic.NewExponentialBackoff(128*time.Millisecond, 513*time.Millisecond))),
elastic.SetDecoder(&elastic.NumberDecoder{}), // critical to ensure decode of int64 won't lose precise
)

if err != nil {
return nil, err
}

// Re-enable the healthcheck after client has successfully been created.
client.Stop()
elastic.SetHealthcheck(true)(client)
client.Start()

return NewWrapperClient(client), nil
}

18 changes: 15 additions & 3 deletions docker/start.sh
@@ -4,6 +4,7 @@ set -x

DB="${DB:-cassandra}"
ENABLE_ES="${ENABLE_ES:-false}"
ES_SCHEMA_SETUP_TIMEOUT_IN_SECONDS="${ES_SCHEMA_SETUP_TIMEOUT_IN_SECONDS:-0}"
ES_PORT="${ES_PORT:-9200}"
ES_SCHEME="${ES_SCHEME:-http}"
RF=${RF:-1}
@@ -119,16 +120,28 @@ wait_for_postgres() {
}


wait_for_es() {
setup_es() {
SECONDS=0

server=`echo $ES_SEEDS | awk -F ',' '{print $1}'`
URL="${ES_SCHEME}://$server:$ES_PORT"
curl -s $URL 2>&1 > /dev/null

until [ $? -eq 0 ]; do
duration=$SECONDS

if [ $ES_SCHEMA_SETUP_TIMEOUT_IN_SECONDS -gt 0 ] && [ $duration -ge $ES_SCHEMA_SETUP_TIMEOUT_IN_SECONDS ]; then
echo 'WARNING: timed out waiting for elasticsearch to start up. skipping index creation'
return;
fi

echo 'waiting for elasticsearch to start up'
sleep 1
curl -s $URL 2>&1 > /dev/null
done

echo 'elasticsearch started'
setup_es_template
}

wait_for_db() {
@@ -169,8 +182,7 @@ if [ "$1" = "autosetup" ]; then
fi

if [ "$ENABLE_ES" == "true" ]; then
wait_for_es
setup_es_template
setup_es
fi

exec bash /start-temporal.sh
