fix(launch): Fix race condition in agent thread clean up #6352
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #6352 +/- ##
==========================================
- Coverage 77.24% 77.21% -0.04%
==========================================
Files 387 387
Lines 44441 44437 -4
==========================================
- Hits 34329 34310 -19
- Misses 10059 10074 +15
Partials 53 53
….com/wandb/wandb into launch/fix-race-condition-on-threads
Looks good, one nit:
I think we might want to acquire the lock before we read the job here, just to be safe: https://github.com/wandb/wandb/blob/4a71959e0c9d440b8fdf5567c8db9ee6ca3101ad/wandb/sdk/launch/agent/agent.py#L285C51-L285C51
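To illustrate the suggestion in this comment, here is a minimal sketch of taking the lock before reading a job. It is not the code in `agent.py`: the names `jobs`, `jobs_lock`, and `get_job_status` are hypothetical, and the job record is assumed to be a plain dict.

```python
import threading

# Hypothetical shared state: thread id -> job record.
jobs = {}
jobs_lock = threading.Lock()


def get_job_status(thread_id):
    """Return the job's status, or None if the entry is already gone."""
    # Hold the lock for the whole read so a worker thread cannot delete
    # the entry between the lookup and the attribute access.
    with jobs_lock:
        job = jobs.get(thread_id)
        return None if job is None else job["status"]
```

Using `dict.get` inside the lock means a thread that has already cleaned up yields `None` rather than a `KeyError`.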
Fixes
Description
There was a race condition between the main agent loop's `update_finished` function and the cleanup that runs inside each thread. The agent could try to access a thread that had already been deleted in `update_finished`, which raised a KeyError and crashed the whole agent.
The solution is to move all thread cleanup out of the main agent loop, so each thread now cleans up after itself.
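The pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual change in `agent.py`: `SketchAgent`, `start_job`, `_run`, and `running_ids` are hypothetical names, and `itertools.count` stands in for whatever produces thread ids.

```python
import itertools
import threading


class SketchAgent:
    """Hypothetical stand-in for the launch agent; all names are illustrative."""

    def __init__(self):
        self._jobs = {}                    # thread id -> Thread
        self._jobs_lock = threading.Lock()
        self._ids = itertools.count()      # assumed source of thread ids

    def start_job(self, work):
        thread_id = next(self._ids)
        thread = threading.Thread(target=self._run, args=(thread_id, work))
        # Register before starting so the worker's cleanup always finds its entry.
        with self._jobs_lock:
            self._jobs[thread_id] = thread
        thread.start()
        return thread

    def _run(self, thread_id, work):
        try:
            work()
        finally:
            # The worker removes its own entry; the main loop never deletes.
            with self._jobs_lock:
                self._jobs.pop(thread_id, None)

    def running_ids(self):
        # Main-loop view: a snapshot taken under the lock.
        with self._jobs_lock:
            return list(self._jobs)
```

Because only the worker deletes its entry, and every read and delete happens under the same lock, the main loop cannot look up a key that another code path removed mid-iteration.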
🤖 Generated by Copilot at 2e337a1
Improved the launch agent logic and fixed a `TypeError` bug. The file `wandb/sdk/launch/agent/agent.py` was refactored to use a generator for thread ids and to simplify the job status checking and thread finishing.
Testing
How was this PR tested?
Checklist