Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WB-7527] Launch AWS Sagemaker integration #3007

Merged
merged 78 commits into from
Feb 11, 2022
Merged

Conversation

KyleGoyette
Copy link
Contributor

@KyleGoyette KyleGoyette commented Dec 9, 2021

Fixes WB-7527

Description

Adds support for an agent that runs and builds docker images locally. But then sends them to AWS sagemaker to run as training jobs.

Testing

Locally, and on an EC2 instance

Checklist

  • Name PR "[WB-NNNN][WB-MMMM] Add support for..." similar to entries in CHANGELOG.md
  • Include reference to internal ticket "Fixes WB-NNNN" (and github issue "Fixes #NNNN" if applicable)

@codecov
Copy link

codecov bot commented Dec 9, 2021

Codecov Report

Merging #3007 (0147152) into master (06ff196) will increase coverage by 0.06%.
The diff coverage is 85.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3007      +/-   ##
==========================================
+ Coverage   80.09%   80.16%   +0.06%     
==========================================
  Files         209      210       +1     
  Lines       27615    27867     +252     
==========================================
+ Hits        22119    22340     +221     
- Misses       5496     5527      +31     
Flag Coverage Δ
functest 57.21% <1.43%> (-0.49%) ⬇️
unittest 69.76% <85.61%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
wandb/sdk/launch/launch.py 80.39% <ø> (ø)
wandb/cli/cli.py 66.04% <50.00%> (-0.14%) ⬇️
wandb/docker/__init__.py 86.66% <50.00%> (-0.95%) ⬇️
wandb/util.py 85.71% <75.00%> (-0.11%) ⬇️
wandb/sdk/launch/runner/aws.py 86.51% <86.51%> (ø)
wandb/sdk/launch/utils.py 93.84% <87.50%> (+0.22%) ⬆️
wandb/sdk/launch/docker.py 86.97% <91.66%> (+0.64%) ⬆️
wandb/sdk/launch/_project_spec.py 90.40% <100.00%> (+0.24%) ⬆️
wandb/sdk/launch/agent/agent.py 92.42% <100.00%> (ø)
wandb/sdk/launch/runner/abstract.py 69.86% <100.00%> (ø)
... and 9 more

@KyleGoyette KyleGoyette marked this pull request as ready for review December 9, 2021 06:02
wandb/sdk/launch/_project_spec.py Show resolved Hide resolved
wandb/sdk/launch/docker.py Show resolved Hide resolved
wandb/sdk/launch/runner/aws.py Outdated Show resolved Hide resolved
wandb/sdk/launch/runner/aws.py Outdated Show resolved Hide resolved
wandb/sdk/launch/runner/aws.py Outdated Show resolved Hide resolved
wandb/sdk/launch/runner/aws.py Show resolved Hide resolved
sagemaker_args["VpcConfig"] = resource_args.get(
"VpcConfig", resource_args.get("vpc_config")
)
sagemaker_args["Tags"] = resource_args.get("Tags", resource_args.get("tags"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol this whole section is brutal i wonder if we should even be enabling passing these in through cli args (ie force all of this through a json config instead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the resource_args can come through the launch spec and might differ from run to run. But I have an idea to solve this. I'll just check for the required ones, and convert all other snake_case examples to camelcase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can subkey all resource args for sagemaker to a subkey of resource_args called sagemaker in that case we may as well drop the CLI arg for resource_args. I can drop that in this PR. I agree that this still isn't optimal

wandb/sdk/launch/_project_spec.py Outdated Show resolved Hide resolved
Copy link
Contributor

@stephchen stephchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good except one error message correction

wandb/sdk/launch/runner/aws.py Outdated Show resolved Hide resolved
Copy link
Contributor

@stephchen stephchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one extra print statement that might need to be deleted but looks good!

wandb/sdk/launch/launch.py Outdated Show resolved Hide resolved
@KyleGoyette KyleGoyette merged commit ba9106a into master Feb 11, 2022
@KyleGoyette KyleGoyette deleted the kyle/aws-integrations branch February 11, 2022 20:17
@raubitsj raubitsj added this to the sdk-2022-03.1 milestone Feb 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants