-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Fix inner hits + aggregations concurrency bug #128036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inner hits + aggregations concurrency bug #128036
Conversation
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
Hi @benchaplin, I've created a changelog YAML for you. |
...er/src/internalClusterTest/java/org/elasticsearch/search/aggregations/metrics/TopHitsIT.java
Show resolved
Hide resolved
3647f68
to
ff7d042
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM great work
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Hey @benchaplin I think it makes sense to backport this fix to 9.0 as well. Thoughts? |
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Ah, agreed. Thanks for catching this, I got a little mixed up with versions. |
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Fork InnerHitSubContext instances before source is fetched in aggregations to prevent inter-segment race conditions. Relates to elastic#122419
Resolves #122419.
There's a concurrency bug that occurs when doing aggregations on inner hits. It can result in one of three exceptions:
java.lang.IllegalStateException: Error retrieving path
java.lang.NullPointerException: Cannot invoke "java.util.Map.get(Object)" because "this.preloadedStoredFieldValues" is null
java.lang.AssertionError: invalid decRef call: already closed
The underlying issue is that
InnerHitSubContext
is not thread safe, yet instances are shared across leaf slice search threads during an aggregation. Specifically, the race condition occurs whenInnerHitSubContext.rootId
&InnerHitSubContext.rootSource
fields are set and accessed concurrently by multiple threads.The tests I've added to
TopHitsIT
reproduce the issue. If you paste those tests into main and run them a few times you should see one of the exceptions.I've solved this by forking the
InnerHitSubContext
instances, similar to what was done here #106990.SearchExecutionContext
is at times accessed fromInnerHitSubContext
, so I also had to make sure the forkedSearchExecutionContext
was used in those cases.