
Large windowSize can trip ES circuit breaker, fail slices, crash ES nodes. #948

Closed
godber opened this issue Aug 3, 2022 · 9 comments
Labels: bug Something isn't working

godber commented Aug 3, 2022

windowSize is defined here in the elasticsearch-reader-api, but it does not appear to be externally configurable via job parameters (adding windowSize to the job does not override the value):

/**
 * we should expose this because in some cases
 * it might be an optimization to set this externally
 */
windowSize: number|undefined = undefined;

windowSize seems to be automatically set to match the per-index max_result_window setting from elasticsearch, and is ultimately used as the size parameter for the query:

/**
 * This is used to verify that the index.max_result_window
 * setting will be big enough to fit the requested slice size.
 */
async getWindowSize(): Promise<number> {
    const window = 'index.max_result_window';
    const { index } = this.config;
    const settings = await this.getSettings(index);
    const matcher = indexMatcher(index);

    for (const [key, configs] of Object.entries(settings)) {
        if (matcher(key)) {
            const defaultPath = configs.defaults[window];
            const configPath = configs.settings[window];
            // config goes first as it overrides any defaults
            if (configPath) return toIntegerOrThrow(configPath);
            if (defaultPath) return toIntegerOrThrow(defaultPath);
        }
    }

    return this.config.size;
}

Here we can see that varying the size (windowSize) dramatically affects performance, despite the record count remaining the same (less than 50 records returned for this query):

curl -Ss "cluster/index*/_search?size=2000&track_total_hits=true" -XPOST -d '{"query":{"bool":{"must":[{"wildcard":{"_key":"aaad*"}}]}}}' -H 'content-type: application/json' | jq .took
38
curl -Ss "cluster/index*/_search?size=20000&track_total_hits=true" -XPOST -d '{"query":{"bool":{"must":[{"wildcard":{"_key":"aaad*"}}]}}}' -H 'content-type: application/json' | jq .took
45
curl -Ss "cluster/index*/_search?size=200000&track_total_hits=true" -XPOST -d '{"query":{"bool":{"must":[{"wildcard":{"_key":"aaad*"}}]}}}' -H 'content-type: application/json' | jq .took
484
curl -Ss "cluster/index*/_search?size=2000000&track_total_hits=true" -XPOST -d '{"query":{"bool":{"must":[{"wildcard":{"_key":"aaad*"}}]}}}' -H 'content-type: application/json' | jq .took
5134

Presumably we could manually set max_result_window on the indices and the Teraslice job would no longer trip the circuit breaker or crash ES nodes, but ideally this would just be set on the job, or would adapt more intelligently to slice sizes somehow; if a slice only had 1000 records, perhaps a default windowSize of 1000*2 would be appropriate, roughly as in the sketch below.
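
For illustration, that adaptive default might look something like this (a rough sketch only, not the asset's actual logic; adaptiveWindowSize, sliceCount, and maxResultWindow are hypothetical names):

// Rough sketch: derive the query size from the slice count instead of
// defaulting to index.max_result_window. Names are illustrative only.
function adaptiveWindowSize(sliceCount: number, maxResultWindow: number): number {
    const padded = sliceCount * 2;            // leave headroom for growth between count and fetch
    return Math.min(padded, maxResultWindow); // never request more than the index allows
}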

Maybe this issue was alluding to this problem: #13

(Thanks to @briend for this description)

godber added the bug label Aug 3, 2022

godber commented Aug 3, 2022

I've discussed this with Kimbro and Brien today at some length. Our proposed solution would be, for workers, to set the URL parameter size to some multiple greater than 1 of the record count in the slice, say, count * 1.5. Then, if the number of records returned by the query equals the size parameter, it should throw those results away, log a warning message (TBD), increase size again by the same multiple. to size * 1.5, and so on. Up to some fixed number of "expansions", after which the job should error with an error on the job (message TBD).

So for example, for a slice with a count of 1000, size would be 1500. If the worker gets back 1500 records, that indicates the result set has grown by at least 500, so we should increase size to 2250 and try again.
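
A minimal sketch of that loop, assuming a stand-in search callback (fetchSlice, expansionFactor, and maxExpansions are illustrative names, not the asset's actual API):

// Illustrative sketch only; `search` stands in for the worker's fetch call.
const expansionFactor = 1.5;
const maxExpansions = 5; // some fixed limit, TBD

async function fetchSlice(
    search: (size: number) => Promise<unknown[]>,
    sliceCount: number
): Promise<unknown[]> {
    let size = Math.ceil(sliceCount * expansionFactor);

    for (let attempt = 0; attempt < maxExpansions; attempt++) {
        const records = await search(size);

        // Fewer records than `size` means the whole result set was returned.
        if (records.length < size) return records;

        // The results filled the window, so the data may have grown since the
        // count; discard them, warn, expand size by the same multiple, and retry.
        console.warn(`got ${records.length} records with size ${size}; expanding and retrying`);
        size = Math.ceil(size * expansionFactor);
    }

    throw new Error(`result set still filled the query window after ${maxExpansions} expansions`);
}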

Does that seem reasonable @jsnoble ?


godber commented Aug 3, 2022

The other thing we noticed is that the workers appear to be querying with track_total_hits = true, which might incur extra overhead that is not put to use. We have also avoided making the workers use this track_total_hits result, because it may not actually be accurate; it may be an estimate.


kstaken commented Aug 3, 2022

Using the value from track_total_hits: true would be the easiest way for this to work, as it would be the guide for starting the second request if the first request's size is too small. If it is set to true, it will count accurately.

Setting track_total_hits to a lower number just short-circuits query evaluation once that limit is hit, but here we're trying to get everything, so there is no benefit to short-circuiting. I would expect track_total_hits to have minimal or no impact since the query is explicitly trying to pull all the data.

We should test this to confirm of course.
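
For reference, the two behaviors described above correspond to request bodies roughly like these (illustrative only; the match_all query is a placeholder):

// track_total_hits: true counts every matching document exactly.
const exactTotal = {
    track_total_hits: true,
    query: { match_all: {} },
};

// A numeric value stops counting once that many hits have been seen, so the
// reported total becomes a lower bound rather than an exact count.
const cappedTotal = {
    track_total_hits: 10000,
    query: { match_all: {} },
};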

godber added a commit that referenced this issue Aug 23, 2022
fetch now uses an expanded slice count as the query size

refs: #948

godber commented Aug 29, 2022

@briend can you update this issue and let me know whether the changes in the Teraslice elasticsearch asset v3.1.0 resolve the original issue you saw?


briend commented Aug 29, 2022

Yes, elasticsearch asset v3.1.0 resolves the issues; no circuit breaker trips in ES when using the id_reader, and no noticeable change in performance or resource usage of the Teraslice workers.


godber commented Aug 29, 2022

Thanks Brien.

Unfortunately we have a problem in the internal Spaces QPL query use case. Given how QPL uses the fetcher, it often expands the slice past max_result_window, which in turn triggers this error:

https://github.com/terascope/elasticsearch-assets/blob/master/packages/elasticsearch-asset-apis/src/elasticsearch-reader-api/ElasticsearchReaderAPI.ts#L139

Our first thought on addressing this will be to make it configurable whether that error is a hard failure or not. That way, Teraslice can still fail in this scenario, but QPL can configure itself to not fail. Not that we're happy about the situation.


godber commented Aug 29, 2022

So in talking through the situation with spaces queries, @kstaken asked:

"Does fetch do the expansion on the first request?"

The answer is "yes", and I think it's important that it do so; otherwise we increase the number of retried requests, which would be wasteful. We want the retry path to be a rare exception.

That being said, I also think my choice of expansion coefficient, 1.5, is pretty arbitrary and up for debate. It could even be variable.


godber commented Oct 13, 2022

Rather than changing the error handling as I tried here (which we can close without merging):

#957

I think it's more straightforward if the Space Reader API just always uses the original fetch, which sets the query size to max_result_window. That also skips all of the retry logic, since size will be maxed out by default in that scenario.

#965


godber commented Nov 8, 2022

This was resolved in #965

godber closed this as completed Nov 8, 2022