Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Schema Deadlock: Split IncomingCommit into locked/unlocked parts #4313

Merged
merged 1 commit into from
Feb 28, 2024

Conversation

etiennedi
Copy link
Member

@etiennedi etiennedi commented Feb 26, 2024

What's being changed:

  • Split TxManager.IncomingCommitTransaction into three parts
  • Previously, it would hold the TxManager.Lock() for the entire duration
  • However, that lock is only designed to protect against accessing .currentTransaction
  • It is not needed for mutual exclusion of running Commit operations. They are already mutually exclusive because there can only be one Tx at a time
  • This lead to a circular kind of deadlock as described in Concurrent schema operations can lead to deadlock #4312 because the TxManager might be waiting for the SchemaManager which is in turn waiting for the TxManager under concurrent load
  • The fix now only holds the lock while accessing variables that need to be protected, but no longer holds the lock while running the commitFn which in turn locks the SchemaManager
  • fixes Concurrent schema operations can lead to deadlock #4312

Notes

The entire tx logic (and therefore both the bug and this fix) will go away with #4280 (to be released in v1.25), however, we cannot predict what update journeys are going to be like, so a fix for old versions will still come in very handy.

Review checklist

  • Documentation has been updated, if necessary. Link to changed documentation:
  • Chaos pipeline run or not necessary. Link to pipeline: not yet - will do
  • All new code is covered by tests where it is reasonable: Tested through the multi-tenancy-loadtest where it was first discovered. Another manual way to validate this is outlined in Concurrent schema operations can lead to deadlock #4312.
  • Performance tests have been run or not necessary.

Copy link

sonarcloud bot commented Feb 26, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@etiennedi
Copy link
Member Author

This PR is aimed at stable/v1.23. If accepted, we should at least release this as 1.23.x and 1.24.x, but we might also consider back-porting this to 1.22.x.

etiennedi added a commit to weaviate/weaviate-chaos-engineering that referenced this pull request Feb 26, 2024
@etiennedi etiennedi marked this pull request as ready for review February 27, 2024 07:00
@parkerduckworth parkerduckworth merged commit fd59773 into stable/v1.23 Feb 28, 2024
22 of 23 checks passed
@parkerduckworth parkerduckworth deleted the schema-deadlock-concurrency-gh-4312 branch February 28, 2024 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants