Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix service performance issue during run shutdown #3262

Merged
merged 2 commits into from
Feb 19, 2022
Merged

Conversation

raubitsj
Copy link
Member

@raubitsj raubitsj commented Feb 19, 2022

Description

When we shutdown a run we force close the queues and sockets so that reader threads do not hang.
The code as written was basically in a tight loop of error handling.. This should behave more as intended.

This shutdown of queues and sockets could still be cleaner in the future.

There is another performance issue still being tracked down, but this is the primary culprit.

Regarding coverage... normally the ValueError case should not be hit, typically you get a handle error (OSError) first but i think there are some cases where you could get the ValueError first, would have to do some careful error injection to hit all these cases.

Testing

Manually confirmed fix in regression

Checklist

  • Name PR "[WB-NNNN][WB-MMMM] Add support for..." similar to entries in CHANGELOG.md
  • Include reference to internal ticket "Fixes WB-NNNN" (and github issue "Fixes #NNNN" if applicable)

@codecov
Copy link

codecov bot commented Feb 19, 2022

Codecov Report

Merging #3262 (6842d10) into master (95e6ced) will increase coverage by 0.03%.
The diff coverage is 50.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3262      +/-   ##
==========================================
+ Coverage   79.91%   79.95%   +0.03%     
==========================================
  Files         213      213              
  Lines       27882    27882              
==========================================
+ Hits        22282    22292      +10     
+ Misses       5600     5590      -10     
Flag Coverage Δ
functest 56.61% <50.00%> (-0.01%) ⬇️
unittest 69.56% <0.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
wandb/sdk/service/server_sock.py 91.66% <50.00%> (-1.05%) ⬇️
wandb/sdk/lib/git.py 76.35% <0.00%> (ø)
wandb/sdk/lib/redirect.py 86.71% <0.00%> (+0.17%) ⬆️
wandb/sdk/wandb_run.py 88.17% <0.00%> (+0.28%) ⬆️
wandb/sdk/internal/stats.py 67.77% <0.00%> (+0.55%) ⬆️
wandb/sdk/launch/agent/agent.py 93.93% <0.00%> (+0.75%) ⬆️
wandb/sdk/internal/meta.py 90.74% <0.00%> (+3.08%) ⬆️

@raubitsj raubitsj marked this pull request as ready for review February 19, 2022 13:58
@raubitsj raubitsj requested a review from kptkin February 19, 2022 16:20
@raubitsj
Copy link
Member Author

(merging now to fix broken tests, any review comments will be done in a separate PR)

@raubitsj raubitsj merged commit 7f24ac7 into master Feb 19, 2022
@raubitsj raubitsj deleted the service-fix-perf branch February 19, 2022 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant