feat(sdk): add exist_ok=False to file.download() #4564

Merged
merged 6 commits into from
Dec 27, 2022

Conversation

Contributor

@janosh janosh commented Dec 2, 2022

Adds an `exist_ok` argument to `file.download()` to avoid raising `ValueError` when the file already exists and you don't want to re-download it.

Also adds type annotations.

@janosh janosh changed the title Add exist_ok=False to file.download() feat add exist_ok=False to file.download() Dec 2, 2022
@kptkin kptkin added this to the sdk-2023-01.1 milestone Dec 3, 2022
@dmitryduev dmitryduev requested a review from a team December 6, 2022 17:23
@@ -2782,7 +2783,7 @@ def download(self, root=".", replace=False):
`ValueError` if file already exists and replace=False
Contributor

@kptkin kptkin Dec 8, 2022

Could you update the docstring here?

@@ -2782,7 +2783,7 @@ def download(self, root=".", replace=False):
             `ValueError` if file already exists and replace=False
         """
         path = os.path.join(root, self.name)
-        if os.path.exists(path) and not replace:
+        if os.path.exists(path) and not replace and not exist_ok:
Contributor

I don't think this handles the case where the file exists and exist_ok==True. With the current change it will still be re-downloaded, whereas your description says you want to avoid downloading when the file already exists.

Contributor Author

@kptkin Good catch. Sorry, was a bit hasty with this PR.

Contributor

No worries :) That's why we have reviews
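
For readers following the thread, here is a minimal sketch of the issue raised above. It is not the SDK code: `download_v1` and `download_v2` are hypothetical stand-ins, and writing "fresh copy" to disk is an assumed placeholder for the real network download. The first revision only uses `exist_ok` to decide whether to raise, so an existing file is still overwritten; the revised logic returns early instead.

```python
import os
import tempfile


def download_v1(path: str, replace: bool = False, exist_ok: bool = False) -> str:
    """First revision: exist_ok only suppresses the error."""
    if os.path.exists(path) and not replace and not exist_ok:
        raise ValueError("File already exists, pass replace=True to overwrite")
    # Hypothetical stand-in for the real download step.
    with open(path, "w") as f:
        f.write("fresh copy")
    return "downloaded"


def download_v2(path: str, replace: bool = False, exist_ok: bool = False) -> str:
    """Revised logic: an existing file is reused when exist_ok=True."""
    if os.path.exists(path) and not replace:
        if not exist_ok:
            raise ValueError("File already exists, pass replace=True to overwrite")
        return "reused"
    # Hypothetical stand-in for the real download step.
    with open(path, "w") as f:
        f.write("fresh copy")
    return "downloaded"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.txt")

    # Case 1: first revision with an existing local file.
    with open(target, "w") as f:
        f.write("local copy")
    result_v1 = download_v1(target, exist_ok=True)
    with open(target) as f:
        content_after_v1 = f.read()

    # Case 2: revised logic with the same starting state.
    with open(target, "w") as f:
        f.write("local copy")
    result_v2 = download_v2(target, exist_ok=True)
    with open(target) as f:
        content_after_v2 = f.read()

print(result_v1, repr(content_after_v1))
print(result_v2, repr(content_after_v2))
```

With `exist_ok=True`, the first version silently replaces the local file, while the second leaves it untouched.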

@kptkin kptkin changed the title feat add exist_ok=False to file.download() feat(sdk): add exist_ok=False to file.download() Dec 8, 2022
Comment on lines 2788 to 2792
        if os.path.exists(path):
            if not replace and not exist_ok:
                raise ValueError("File already exists, pass replace=True to overwrite")
            elif not replace and exist_ok:
                return open(path)
Contributor

total nit: (up to you)

Suggested change
-        if os.path.exists(path):
-            if not replace and not exist_ok:
-                raise ValueError("File already exists, pass replace=True to overwrite")
-            elif not replace and exist_ok:
-                return open(path)
+        if os.path.exists(path) and not replace:
+            if not exist_ok:
+                raise ValueError("File already exists, pass replace=True to overwrite")
+            else:
+                return open(path)
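
As a quick sanity check that the suggested form is purely a restructuring, the sketch below enumerates every combination of the three booleans and compares the two branch layouts. The function names and the string outcomes ("raise", "reuse", "download") are illustrative assumptions, not part of the SDK.

```python
from itertools import product


def original(exists: bool, replace: bool, exist_ok: bool) -> str:
    # Branch layout from the commented lines.
    if exists:
        if not replace and not exist_ok:
            return "raise"
        elif not replace and exist_ok:
            return "reuse"
    return "download"


def suggested(exists: bool, replace: bool, exist_ok: bool) -> str:
    # Branch layout from the suggested change.
    if exists and not replace:
        if not exist_ok:
            return "raise"
        else:
            return "reuse"
    return "download"


outcomes = {}
for exists, replace, exist_ok in product([False, True], repeat=3):
    a = original(exists, replace, exist_ok)
    b = suggested(exists, replace, exist_ok)
    assert a == b, (exists, replace, exist_ok)
    outcomes[(exists, replace, exist_ok)] = a

for combo, outcome in sorted(outcomes.items()):
    print(combo, outcome)
```

Both layouts agree on all eight inputs; `replace=True` always wins over `exist_ok=True` and triggers a download.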

-            raise ValueError("File already exists, pass replace=True to overwrite")
+        if os.path.exists(path):
+            if not replace and not exist_ok:
+                raise ValueError("File already exists, pass replace=True to overwrite")
Contributor

Maybe we can update the error message to mention that the user can also pass exist_ok?

Contributor Author

Again, good idea!
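
One possible shape for such a message is sketched below. The helper name `check_existing` and the exact wording are assumptions made for illustration; the message that was actually merged is not shown in this conversation and may differ.

```python
import os
import tempfile


def check_existing(path: str, replace: bool = False, exist_ok: bool = False) -> bool:
    """Return True if an existing local file should be reused.

    Raises ValueError when the file exists and neither flag is set.
    """
    if os.path.exists(path) and not replace:
        if exist_ok:
            return True
        # Hypothetical wording that names both options.
        raise ValueError(
            f"File {path!r} already exists. Pass replace=True to overwrite it "
            "or exist_ok=True to keep the existing copy and skip the download."
        )
    return False


with tempfile.NamedTemporaryFile() as tmp:
    try:
        check_existing(tmp.name)
    except ValueError as err:
        message = str(err)
    reuse = check_existing(tmp.name, exist_ok=True)

print(message)
print(reuse)
```

Naming both flags in the error lets the caller pick between overwriting and reusing without having to look up the signature.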

@kptkin kptkin self-assigned this Dec 10, 2022
@kptkin kptkin merged commit 3dc8584 into wandb:main Dec 27, 2022
Contributor

@kptkin kptkin left a comment

Approved!

@janosh janosh deleted the file-download-exist-ok branch December 27, 2022 18:55
bcsherma added a commit that referenced this pull request Dec 29, 2022
commit d244d07
Author: Katia Patkin <87335417+kptkin@users.noreply.github.com>
Date:   Tue Dec 27 13:31:20 2022 -0800

    style(public-api): format public file with proper formating (#4697)

    fix format

commit 3dc8584
Author: Janosh Riebesell <janosh.riebesell@gmail.com>
Date:   Tue Dec 27 10:21:41 2022 -0800

    feat(sdk): add `exist_ok=False` to `file.download()` (#4564)

    * add exist_ok=False to file.download() to avoid raising ValueError if file exists but you don't want to redownload

    also add type annotations

    * handle case file exists and exist_ok==True in File.download()

    * point out exist_ok=True option in file already exists error

    * tweak if/else paths

    Co-authored-by: Katia Patkin <87335417+kptkin@users.noreply.github.com>

commit 4a4651e
Author: Vish Rajiv <8609620+vwrj@users.noreply.github.com>
Date:   Fri Dec 23 14:27:16 2022 -0800

    fix(artifacts): artifact.version should be the version index from the associated collection (#4486)

    fix(artifacts): artifact.version should be the version index from the associated collection

    Co-authored-by: Hugh Wimberly <hugh.wimberly@wandb.com>
    Co-authored-by: Dmitry Duev <dmitryduev@users.noreply.github.com>

commit a7de372
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:19:14 2022 -0800

    docs(sdk): Removed less than, greater than characters from dosctrings… (#4687)

    docs(sdk): Removed less than, greater than characters from dosctrings. Re: It breaks the new doc engine Docusuarus.

commit a45629d
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:18:00 2022 -0800

    docs(sdk): Fixed typo in docstring for data_types.Objects3D (#4543)

    Fixed typo in docstring for data_types.Objects3D

commit 6df4b88
Author: Dmitry Duev <dmitryduev@users.noreply.github.com>
Date:   Fri Dec 23 13:16:16 2022 -0800

    test(integrations): fix import tests (#4690)

    test(integrations): fix import tests

commit 8da62bb
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Fri Dec 23 11:46:00 2022 -0800

    refactor(artifacts): consolidate hash utilities into lib.hashutil (#4525)

    * Move all MD5 and base64 encode utilities to lib.hashutil

    * Remove duplicate function definitions

    * Refactor common code out into small functions

    * Use hypothesis to for testing

    * Remove unused code

    * Update type annotations for hash types

commit eb70114
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Thu Dec 22 09:44:23 2022 -0800

    test(artifacts): improve storage handler test coverage (#4674)

    * Add test case for adding a file to a finalized artifact

    * Add test case for caching local file references

    * Add GCS and WBArtifact storage handler tests

commit 7106d17
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Wed Dec 21 19:36:31 2022 -0800

    fix(artifacts): get digest directly instead of from the manifests' manifest (#4681)

    Reference digest directly instead of though the manifests' manifest

commit 0a21fd1
Author: KyleGoyette <kdgoyette@gmail.com>
Date:   Wed Dec 21 14:37:03 2022 -0800

    feat(launch): Default to using model-registry project for agent and launch_add (#4613)

commit 84be24c
Author: Griffin Tarpenning <griffin.tarpenning@wandb.com>
Date:   Wed Dec 21 13:20:02 2022 -0800

    chore(launch): remove fallback resource when not specified for a queue (#4637)

commit cf7df1d
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 17:10:13 2022 -0800

    test(sdk): add tests for Api.upload_file_retry (#4639)

commit c1ad0c0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 16:08:37 2022 -0800

    refactor(artifacts): use ArtifactEntry directly instead of subclassing (#4649)

    * Subsume ArtifactEntry into ArtifactManifestEntry

commit 491fc59
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 15:26:54 2022 -0800

    test(sdk): add unit tests for filesync.StepUpload (#4652)

    * add unit tests for filesync.StepUpload

commit 4e667d0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 15:02:09 2022 -0800

    fix(artifacts): correctly handle url-encoded local file references. (#4665)

    * Add test case for path names needing correct urldecoding

    * Use local_fil_uri_to_path instead of urlparse

    * Fix existing tests that used relative path uris instead of absolute paths
bcsherma added a commit that referenced this pull request Jan 21, 2023
3 participants