When creating a new Index, ZDB needs to wait for ES health status of "yellow" before returning control #79

Closed
eeeebbbbrrrr opened this issue Jan 29, 2016 · 5 comments

@eeeebbbbrrrr
Collaborator

The travis-ci tests occasionally fail the Postgres regression test for issue #58. After quite a bit of debugging, it turns out the failure isn't related to the changes introduced in issue #58; instead, the timing of the test is such that we sometimes call ZDB's _pgcount endpoint before the newly created ES index has finished moving all of its shards to the STARTED state.

After some head scratching, chatting with @nz (thanks!), and reading the documentation, it looks like ZDB needs to call the /_cluster/health/<index_name> endpoint with ?wait_for_status=yellow before returning control back to Postgres. This should ensure that (at least) all the primary shards of a newly created index are actually available for use.
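
For anyone following along, here's a rough sketch of what I mean, using the ES Java Client API. This is not the actual ZDB code; the class/method names and the 30-second timeout are just placeholders:

```java
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;

public class IndexReadiness {

    /**
     * Block until the named index reaches (at least) "yellow" health, i.e. all of
     * its primary shards are STARTED.  The REST equivalent is:
     *   GET /_cluster/health/<index_name>?wait_for_status=yellow
     */
    public static void waitForYellow(Client client, String indexName) {
        ClusterHealthResponse health = client.admin().cluster()
                .prepareHealth(indexName)
                .setWaitForYellowStatus()
                .setTimeout(TimeValue.timeValueSeconds(30)) // arbitrary timeout for the sketch
                .execute().actionGet();

        if (health.isTimedOut())
            throw new RuntimeException("index [" + indexName + "] did not reach yellow status in time");
    }
}
```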

@eeeebbbbrrrr
Collaborator Author

For reference, this gist (https://gist.github.com/eeeebbbbrrrr/968edf5941c654f240ca) is a little shell script that can re-create the problem from the command line.

@eeeebbbbrrrr
Collaborator Author

I wonder if doing this only when creating a new index is enough. I have a feeling it could be necessary in any situation where ZDB runs a SearchRequest. Basically, if the SearchResponse indicates that total shards != successful shards while failed shards is zero, then wait on ?wait_for_status=yellow and try the search again.

This might be necessary in cases where a node (or the entire cluster) has been restarted and ZDB tries to query before all the indexes are at least yellow.

I'm not going to do anything about this case right now, but I wanted to note that I've at least considered it as a potential problem; there are no reports of it yet.
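
If it ever does become a problem, the shape of the fix would be something like the sketch below (again, not actual ZDB code; it assumes the ES Java client, and searchWithYellowRetry() is a made-up helper that retries at most once):

```java
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;

public class ShardAwareSearch {

    /**
     * Run a search; if some shards simply didn't respond (successful < total)
     * but nothing actually failed, wait for "yellow" on the index and retry once.
     */
    public static SearchResponse searchWithYellowRetry(Client client, String indexName, SearchRequestBuilder search) {
        SearchResponse response = search.execute().actionGet();

        boolean incomplete = response.getSuccessfulShards() != response.getTotalShards();
        boolean noFailures = response.getFailedShards() == 0;

        if (incomplete && noFailures) {
            // the missing shards are most likely still INITIALIZING, so wait for
            // the primaries to start and run the search one more time
            client.admin().cluster()
                    .prepareHealth(indexName)
                    .setWaitForYellowStatus()
                    .execute().actionGet();
            response = search.execute().actionGet();
        }

        return response;
    }
}
```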

@nz commented Jan 29, 2016

There are other cases where some shards report failure. Shard corruption is one; another is syntax/query errors when searching across multiple indexes with mapping mismatches. Individual shard timeouts are also plausible. You might end up stuck if you check the health too often :-)

@eeeebbbbrrrr
Collaborator Author

Those are good points, and all the more reason to hold off on doing something like this everywhere. ZDB is very good at detecting (and re-throwing) actual failures, which I suspect corruption/timeout issues would cause.

In this case, where "successful shards" is not the same as "total shards", there's no actual indication of failure (i.e., .getFailedShards() is zero). ES just doesn't seem to consider it a failure when not all shards respond, as long as the missing ones are (at least) in an INITIALIZING state.

eeeebbbbrrrr added a commit that referenced this issue Jan 30, 2016
eeeebbbbrrrr self-assigned this Feb 3, 2016
eeeebbbbrrrr mentioned this issue Feb 3, 2016
eeeebbbbrrrr added a commit that referenced this issue Feb 3, 2016
@eeeebbbbrrrr
Collaborator Author

to be released in v2.6.4
