Threadpool merge executor does not block aborted merges #129613

Draft: wants to merge 10 commits into base: main

@albertzaharovits (Contributor) commented Jun 18, 2025

This PR addresses a bug where aborted merges are blocked if there's insufficient disk space.

Previously, the merge disk space estimation did not consider whether a merge had been aborted while it was enqueued for execution. Consequently, aborted merges (e.g. when closing a shard) were blocked if their disk space estimate exceeded the available disk space threshold. In that case, the shard close operation would itself block.
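To make the failure mode concrete, here is a minimal Java sketch of the pre-fix admission check. It assumes a merge may start only when its estimated disk space budget fits within the available disk space, and that the budget is the raw estimate regardless of abort status. `PreFixAdmission`, `MergeTask`, `canRun` and `BUDGET_IGNORING_ABORT` are hypothetical names, not Elasticsearch classes.

```java
import java.util.function.ToLongFunction;

// Hypothetical model of the pre-fix admission check; names are illustrative,
// not the actual Elasticsearch classes.
final class PreFixAdmission {

    // A merge task with a disk space estimate and an abort flag.
    static final class MergeTask {
        final String name;
        final long estimatedDiskBytes;
        volatile boolean aborted;

        MergeTask(String name, long estimatedDiskBytes) {
            this.name = name;
            this.estimatedDiskBytes = estimatedDiskBytes;
        }
    }

    // Pre-fix budget (assumed): the raw estimate, ignoring whether the merge was aborted.
    static final ToLongFunction<MergeTask> BUDGET_IGNORING_ABORT = t -> t.estimatedDiskBytes;

    // A task may run only if its budget fits within the available disk space.
    static boolean canRun(MergeTask task, long availableBytes, ToLongFunction<MergeTask> budget) {
        return budget.applyAsLong(task) <= availableBytes;
    }

    public static void main(String[] args) {
        MergeTask big = new MergeTask("big-merge", 10_000L);
        big.aborted = true; // e.g. the shard is being closed
        // Still blocked: the 10_000-byte estimate exceeds the 1_000 bytes available,
        // so the aborted merge never gets to run and the shard close waits on it.
        System.out.println(canRun(big, 1_000L, BUDGET_IGNORING_ABORT)); // false
    }
}
```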

This fix estimates a disk space budget of 0 for aborted merges, and it periodically checks whether any enqueued merge tasks have been aborted (more generally, it checks whether the budget estimate of any enqueued merge task has changed, and reorders the queue if so). This way, aborted merges are prioritized and never blocked.
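A minimal Java sketch of the fix as described: the budget function returns 0 for aborted merges, and a periodic update re-evaluates the budget of every enqueued task, rebuilding the queue if any budget changed. It assumes the queue is a min-heap ordered by each task's cached budget and that the head is dequeued only when its budget fits the available disk space. All names (`AbortAwareQueueDemo`, `BudgetedQueue`, `updateBudgetsAndReorder`, `pollIfFits`) are hypothetical; `updateBudgetsAndReorder` only loosely echoes `updateBudgetOfEnqueuedElementsAndReorderQueue` from the PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToLongFunction;

// Illustrative sketch; class and method names are hypothetical, not Elasticsearch's.
final class AbortAwareQueueDemo {

    static final class MergeTask {
        final String name;
        final long estimatedDiskBytes;
        volatile boolean aborted;

        MergeTask(String name, long estimatedDiskBytes) {
            this.name = name;
            this.estimatedDiskBytes = estimatedDiskBytes;
        }
    }

    // Post-fix budget: an aborted merge is estimated to need no disk space.
    static final ToLongFunction<MergeTask> BUDGET = t -> t.aborted ? 0L : t.estimatedDiskBytes;

    // Min-heap of elements keyed by the budget cached when each was last evaluated (assumed ordering).
    static final class BudgetedQueue<E> {
        private static final class Entry<T> {
            final T element;
            long budget;

            Entry(T element, long budget) {
                this.element = element;
                this.budget = budget;
            }
        }

        private final ToLongFunction<E> budgetFunction;
        private final PriorityQueue<Entry<E>> queue =
                new PriorityQueue<>(Comparator.comparingLong((Entry<E> en) -> en.budget));

        BudgetedQueue(ToLongFunction<E> budgetFunction) {
            this.budgetFunction = budgetFunction;
        }

        void enqueue(E element) {
            queue.add(new Entry<>(element, budgetFunction.applyAsLong(element)));
        }

        // Re-evaluates each enqueued element's budget; rebuilds the heap only if some budget changed.
        void updateBudgetsAndReorder() {
            List<Entry<E>> entries = new ArrayList<>(queue);
            boolean changed = false;
            for (Entry<E> en : entries) {
                long newBudget = budgetFunction.applyAsLong(en.element);
                if (newBudget != en.budget) {
                    en.budget = newBudget;
                    changed = true;
                }
            }
            if (changed) {
                queue.clear();
                queue.addAll(entries);
            }
        }

        // Dequeues the smallest-budget element if it fits in the available budget; otherwise returns null.
        E pollIfFits(long availableBytes) {
            Entry<E> head = queue.peek();
            if (head == null || head.budget > availableBytes) {
                return null;
            }
            return queue.poll().element;
        }
    }

    public static void main(String[] args) {
        BudgetedQueue<MergeTask> queue = new BudgetedQueue<>(BUDGET);
        MergeTask small = new MergeTask("small", 500L);
        MergeTask medium = new MergeTask("medium", 5_000L);
        MergeTask big = new MergeTask("big", 10_000L);
        queue.enqueue(big);
        queue.enqueue(medium);
        queue.enqueue(small);
        long available = 1_000L;
        System.out.println(queue.pollIfFits(available).name); // small
        System.out.println(queue.pollIfFits(available));      // null: medium (5_000) is at the head and blocked
        big.aborted = true;                                   // e.g. the shard is closing
        queue.updateBudgetsAndReorder();                      // big's budget drops to 0, moving it to the head
        System.out.println(queue.pollIfFits(available).name); // big
    }
}
```

Caching the budget in each entry is the reason an explicit reorder step is needed: a `PriorityQueue` does not notice when an element's key changes, so after a budget update the heap has to be rebuilt (or the changed entries removed and re-added).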

Fixes #129335

@albertzaharovits self-assigned this Jun 18, 2025
@albertzaharovits added the :Distributed Indexing/Engine label (Anything around managing Lucene and the Translog in an open shard.) Jun 18, 2025
@elasticsearchmachine (Collaborator) commented:

Hi @albertzaharovits, I've created a changelog YAML for you.

@albertzaharovits changed the title from "Update budget estimates for enqueued merge tasks" to "Threadpool merge executor does not block aborted merges" Jun 18, 2025

```java
// update the per-element budget (these are all the elements that are using any budget)
// updates the budget of enqueued elements (and possibly reorders the priority queue)
updateBudgetOfEnqueuedElementsAndReorderQueue();
// update the budget of dequeued, but still in-use elements (these are the elements that are consuming budget)
unreleasedBudgetPerElement.replaceAll((e, v) -> budgetFunction.applyAsLong(e.element()));
```
@albertzaharovits (Contributor Author):

This change also sets the budget of running merges that have been aborted to 0. That's a bit optimistic, but I find the alternative implementation convoluted, and it would probably be counter-intuitive to estimate 0 for to-be-run merges but not for already-running ones.
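The `unreleasedBudgetPerElement.replaceAll(...)` line in the snippet re-evaluates the budget of merges that are already running. The sketch below is a hypothetical model that assumes the available budget is the total minus the sum of the unreleased per-task budgets, and that the map keys wrap a task exposed via `element()`. It shows the effect discussed in the comment: once a running merge is aborted and budgets are refreshed, its whole share is released. This is presumably the "optimistic" part, since the freed budget becomes available to other merges before the aborted one has actually stopped. `InUseBudgetDemo`, `Holder`, `start`, `refresh` and `available` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch of in-use budget accounting; only the replaceAll call
// mirrors the PR snippet, everything else is illustrative.
final class InUseBudgetDemo {

    static final class MergeTask {
        final long estimatedDiskBytes;
        volatile boolean aborted;

        MergeTask(long estimatedDiskBytes) {
            this.estimatedDiskBytes = estimatedDiskBytes;
        }
    }

    // Wrapper exposing element(), as the map keys in the PR snippet appear to.
    record Holder(MergeTask element) {}

    // Aborted merges are estimated at 0, running or not.
    static final ToLongFunction<MergeTask> BUDGET = t -> t.aborted ? 0L : t.estimatedDiskBytes;

    final long totalBudget;
    final Map<Holder, Long> unreleasedBudgetPerElement = new HashMap<>();

    InUseBudgetDemo(long totalBudget) {
        this.totalBudget = totalBudget;
    }

    // Records a dequeued (running) task together with the budget it consumes.
    void start(Holder h) {
        unreleasedBudgetPerElement.put(h, BUDGET.applyAsLong(h.element()));
    }

    // Re-evaluates the budget of running tasks, so an aborted running merge drops to 0.
    void refresh() {
        unreleasedBudgetPerElement.replaceAll((e, v) -> BUDGET.applyAsLong(e.element()));
    }

    // Budget still available to new merges: total minus everything in use.
    long available() {
        long used = 0;
        for (long v : unreleasedBudgetPerElement.values()) {
            used += v;
        }
        return totalBudget - used;
    }

    public static void main(String[] args) {
        InUseBudgetDemo acct = new InUseBudgetDemo(20_000L);
        Holder running = new Holder(new MergeTask(15_000L));
        acct.start(running);
        System.out.println(acct.available()); // 5000
        running.element().aborted = true;      // merge aborted while still running
        acct.refresh();
        System.out.println(acct.available()); // 20000
    }
}
```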

Labels: >bug, :Distributed Indexing/Engine, v8.19.0, v9.0.3, v9.1.0
Development: successfully merging this pull request may close these issues:

[CI] ReactiveStorageIT testScaleDuringSplitOrClone failing
3 participants