Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(artifacts): Add partial file downloads, via directory prefix #6911

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

biaslucas
Copy link
Contributor

Description

What does the PR do?
Read: https://wandb.atlassian.net/browse/WB-13672

  • We allow users to optionally only download files that match the prefix of an input path prefix.
  • If nothing is specified, the behavior is as before, we download all paths in the artifact. This prefix match only applies to OUR W&B path structure, NOT references or download file path structures.
  • See the below for examples:

Testing

Download prefix case:
Screenshot 2024-01-27 at 10 24 10 AM
Screenshot 2024-01-27 at 10 24 25 AM
Screenshot 2024-01-27 at 10 24 32 AM

Result:
Screenshot 2024-01-27 at 10 24 51 AM

Basecase (no arg given and we download everything):
Uploading Screenshot 2024-01-27 at 10.25.22 AM.png…

Screenshot 2024-01-27 at 10 25 09 AM

Final comments: this method is easier to code but a little off from the 'optimal' way of doing things. Ideally within the graphQL query itself we specific a prefix and in the backend we do the filtering. however for now we will just filter after querying filenames (but importantly BEFORE we actually download the files).

There should be ample communication to clients that this argument is prefix / directory filtering for W&B FILEPATHS, not REFERENCE PATHS! they are not the same

Copy link

codecov bot commented Jan 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (494df00) 77.75% compared to head (51b5b4f) 79.63%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6911      +/-   ##
==========================================
+ Coverage   77.75%   79.63%   +1.87%     
==========================================
  Files         452      452              
  Lines       50734    50747      +13     
==========================================
+ Hits        39450    40412     +962     
+ Misses      10989    10040     -949     
  Partials      295      295              
Flag Coverage Δ
func 50.24% <57.14%> (+0.61%) ⬆️
system 64.33% <57.14%> (+0.02%) ⬆️
unit 58.92% <85.71%> (+4.13%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
wandb/sdk/artifacts/artifact.py 93.07% <100.00%> (+0.10%) ⬆️

... and 38 files with indirect coverage changes

Copy link
Contributor

@ibindlish ibindlish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@biaslucas I'm wondering if its worth checking whether the path passed in is a file and not a directory, then should we revert to directly downloading a single file instead of iterating over the entire manifest?

wandb/sdk/artifacts/artifact.py Outdated Show resolved Hide resolved
wandb/sdk/artifacts/artifact.py Outdated Show resolved Hide resolved
wandb/sdk/artifacts/artifact.py Outdated Show resolved Hide resolved
wandb/sdk/artifacts/artifact.py Outdated Show resolved Hide resolved
@biaslucas biaslucas merged commit fda0ce2 into main Jan 29, 2024
78 checks passed
@biaslucas biaslucas deleted the partial-download-artifacts branch January 29, 2024 22:38
@daviddai-evenup
Copy link

I think the argument name is path_prefix and not dir_path_prefix

@ibindlish
Copy link
Contributor

I think the argument name is path_prefix and not dir_path_prefix

You're right! Thank you - and apologies since this was misleading!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants