[WB-7527] Launch AWS Sagemaker integration #3007
Codecov Report
@@ Coverage Diff @@
## master #3007 +/- ##
==========================================
+ Coverage 80.09% 80.16% +0.06%
==========================================
Files 209 210 +1
Lines 27615 27867 +252
==========================================
+ Hits 22119 22340 +221
- Misses 5496 5527 +31
wandb/sdk/launch/runner/aws.py
Outdated
sagemaker_args["VpcConfig"] = resource_args.get(
    "VpcConfig", resource_args.get("vpc_config")
)
sagemaker_args["Tags"] = resource_args.get("Tags", resource_args.get("tags"))
lol this whole section is brutal i wonder if we should even be enabling passing these in through cli args (ie force all of this through a json config instead)
The resource_args can come through the launch spec and might differ from run to run, but I have an idea to solve this: I'll just check for the required ones and convert all other snake_case keys to CamelCase.
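A minimal sketch of that idea, not the code that landed in this PR. The helper names (snake_to_camel, normalize_resource_args) and the two keys in the default required tuple are assumptions chosen for illustration; the real runner may validate a different set.

```python
from typing import Any, Dict, Iterable


def snake_to_camel(key: str) -> str:
    """Convert 'vpc_config' to 'VpcConfig'.

    Keys with no underscore that already start with an uppercase
    letter (e.g. 'VpcConfig') are returned unchanged.
    """
    if "_" not in key and key[:1].isupper():
        return key
    # Capitalize the first letter of each underscore-separated part.
    return "".join(part[:1].upper() + part[1:] for part in key.split("_") if part)


def normalize_resource_args(
    resource_args: Dict[str, Any],
    # Hypothetical required set, for illustration only.
    required: Iterable[str] = ("RoleArn", "OutputDataConfig"),
) -> Dict[str, Any]:
    """Normalize key casing, then fail early if a required key is missing."""
    normalized = {snake_to_camel(k): v for k, v in resource_args.items()}
    missing = [k for k in required if k not in normalized]
    if missing:
        raise ValueError(f"Missing required resource args: {', '.join(missing)}")
    return normalized
```

With this shape, the per-key fallbacks such as resource_args.get("VpcConfig", resource_args.get("vpc_config")) collapse into one pass over the dictionary, at the cost of silently accepting keys the SageMaker API does not know about.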
We can move all resource args for SageMaker under a subkey of resource_args called sagemaker. In that case we may as well drop the CLI arg for resource_args; I can drop that in this PR. I agree that this still isn't optimal.
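For illustration, reading the nested block might look like the sketch below. The function name get_sagemaker_args and the error wording are hypothetical, and the launch spec layout is assumed from the comment above rather than taken from the merged code.

```python
from typing import Any, Dict


def get_sagemaker_args(resource_args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the SageMaker-specific block of resource_args.

    Assumes the launch spec nests those settings under a 'sagemaker' key.
    """
    sagemaker_args = resource_args.get("sagemaker")
    if not isinstance(sagemaker_args, dict):
        raise ValueError("resource_args must contain a 'sagemaker' mapping")
    # Shallow copy so later edits do not mutate the launch spec.
    return dict(sagemaker_args)
```

Keeping the settings under one subkey leaves room for other resource types to add their own blocks without key collisions.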
looks good except one error message correction
one extra print statement that might need to be deleted but looks good!
Fixes WB-7527
Description
Adds support for an agent that runs and builds Docker images locally, then sends them to AWS SageMaker to run as training jobs.
Testing
Locally, and on an EC2 instance
Checklist