Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(launch): fail rqis if no run is created #6324

Merged
merged 14 commits into from
Sep 20, 2023
Merged

Conversation

bcsherma
Copy link
Contributor

@bcsherma bcsherma commented Sep 20, 2023

Fixes

Description

This PR makes the agent fail RQIs when a wandb run is not created even if the job executes successfully on the target system.

We will add a regression test for this: https://wandb.atlassian.net/browse/WB-15604

Copilot

🤖 Generated by Copilot at f230a14

Summary

🚀🧹🔧

Improved the usability and reliability of wandb.launch by removing unnecessary code and enhancing error and log messages in agent.py.

Sing, O Muse, of the valiant coder who refined the launch agent
And removed the false and needless lines that marred the splendid feature
He made the errors and the logs more clear and helpful to the user
Like Hermes, swift and cunning, he enhanced the wandb launcher

Walkthrough

  • Remove redundant and misleading code that checked for job_and_run_status.completed_status == "success" (link)
  • Improve error message when submitted job fails to call wandb.init by distinguishing between exit statuses and clarifying terminology (link)
  • Fix typo and clarify log message when thread has no run by specifying that no job or run was associated with the thread (link)

@codecov
Copy link

codecov bot commented Sep 20, 2023

Codecov Report

Merging #6324 (ef9c987) into main (75fa619) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6324      +/-   ##
==========================================
- Coverage   77.18%   77.14%   -0.05%     
==========================================
  Files         387      387              
  Lines       44421    44430       +9     
==========================================
- Hits        34288    34276      -12     
- Misses      10080    10101      +21     
  Partials       53       53              
Flag Coverage Δ
unittest 80.90% <100.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
wandb/sdk/launch/agent/agent.py 78.53% <100.00%> (+4.53%) ⬆️

... and 14 files with indirect coverage changes

@github-actions github-actions bot added cc-fix and removed cc-fix labels Sep 20, 2023
@bcsherma bcsherma requested a review from a team as a code owner September 20, 2023 18:48
wandb/sdk/launch/agent/agent.py Outdated Show resolved Hide resolved
wandb/sdk/launch/agent/agent.py Outdated Show resolved Hide resolved
Copy link
Contributor

@KyleGoyette KyleGoyette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with 2 comments, please handle them before merging

wandb/sdk/launch/agent/agent.py Show resolved Hide resolved
wandb/sdk/launch/agent/agent.py Outdated Show resolved Hide resolved
add more comments about the retry loop and expected states
@bcsherma bcsherma merged commit a79ddae into main Sep 20, 2023
64 of 69 checks passed
@bcsherma bcsherma deleted the launch/handle-no-wandb-init branch September 20, 2023 22:40
@kptkin kptkin added this to the sdk-2023-09.2 milestone Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants