Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(artifacts): consolidate hash utilities into lib.hashutil #4525

Merged
merged 63 commits into from
Dec 23, 2022

Conversation

moredatarequired
Copy link
Contributor

@moredatarequired moredatarequired commented Nov 23, 2022

Fixes WB-11889

Description

Consolidate the various helper functions for getting the MD5 hash of strings or files as hex or base64 encoded values; move all of them to util and add unit testing specifically for those functions. De-duplicate re-definitions.

Testing

New unit tests added where coverage was only incidental by way of testing something else. Other than that, a pure refactor, rely on existing tests.

Checklist

  • Include reference to internal ticket "Fixes WB-NNNN" and/or GitHub issue "Fixes #NNNN" (if applicable)
  • Ensure PR title compliance with the conventional commits standards

wandb/util.py Outdated Show resolved Hide resolved
@kptkin
Copy link
Contributor

kptkin commented Nov 23, 2022

Nice fix! Thanks!

@moredatarequired moredatarequired merged commit 8da62bb into main Dec 23, 2022
@moredatarequired moredatarequired deleted the artifacts/refactor-consolidate-util-funcs branch December 23, 2022 19:46
bcsherma added a commit that referenced this pull request Dec 29, 2022
commit d244d07
Author: Katia Patkin <87335417+kptkin@users.noreply.github.com>
Date:   Tue Dec 27 13:31:20 2022 -0800

    style(public-api): format public file with proper formating (#4697)

    fix format

commit 3dc8584
Author: Janosh Riebesell <janosh.riebesell@gmail.com>
Date:   Tue Dec 27 10:21:41 2022 -0800

    feat(sdk): add `exist_ok=False` to `file.download()` (#4564)

    * add exist_ok=False to file.download() to avoid raising ValueError if file exists but you don't want to redownload

    also add type annotations

    * handle case file exists and exist_ok==True in File.download()

    * point out exist_ok=True option in file already exists error

    * tweak if/else paths

    Co-authored-by: Katia Patkin <87335417+kptkin@users.noreply.github.com>

commit 4a4651e
Author: Vish Rajiv <8609620+vwrj@users.noreply.github.com>
Date:   Fri Dec 23 14:27:16 2022 -0800

    fix(artifacts): artifact.version should be the version index from the associated collection (#4486)

    fix(artifacts): artifact.version should be the version index from the associated collection

    Co-authored-by: Hugh Wimberly <hugh.wimberly@wandb.com>
    Co-authored-by: Dmitry Duev <dmitryduev@users.noreply.github.com>

commit a7de372
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:19:14 2022 -0800

    docs(sdk): Removed less than, greater than characters from dosctrings… (#4687)

    docs(sdk): Removed less than, greater than characters from dosctrings. Re: It breaks the new doc engine Docusuarus.

commit a45629d
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:18:00 2022 -0800

    docs(sdk): Fixed typo in docstring for data_types.Objects3D (#4543)

    Fixed typo in docstring for data_types.Objects3D

commit 6df4b88
Author: Dmitry Duev <dmitryduev@users.noreply.github.com>
Date:   Fri Dec 23 13:16:16 2022 -0800

    test(integrations): fix import tests (#4690)

    test(integrations): fix import tests

commit 8da62bb
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Fri Dec 23 11:46:00 2022 -0800

    refactor(artifacts): consolidate hash utilities into lib.hashutil (#4525)

    * Move all MD5 and base64 encode utilities to lib.hashutil

    * Remove duplicate function definitions

    * Refactor common code out into small functions

    * Use hypothesis to for testing

    * Remove unused code

    * Update type annotations for hash types

commit eb70114
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Thu Dec 22 09:44:23 2022 -0800

    test(artifacts): improve storage handler test coverage (#4674)

    * Add test case for adding a file to a finalized artifact

    * Add test case for caching local file references

    * Add GCS and WBArtifact storage handler tests

commit 7106d17
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Wed Dec 21 19:36:31 2022 -0800

    fix(artifacts): get digest directly instead of from the manifests' manifest (#4681)

    Reference digest directly instead of though the manifests' manifest

commit 0a21fd1
Author: KyleGoyette <kdgoyette@gmail.com>
Date:   Wed Dec 21 14:37:03 2022 -0800

    feat(launch): Default to using model-registry project for agent and launch_add (#4613)

commit 84be24c
Author: Griffin Tarpenning <griffin.tarpenning@wandb.com>
Date:   Wed Dec 21 13:20:02 2022 -0800

    chore(launch): remove fallback resource when not specified for a queue (#4637)

commit cf7df1d
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 17:10:13 2022 -0800

    test(sdk): add tests for Api.upload_file_retry (#4639)

commit c1ad0c0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 16:08:37 2022 -0800

    refactor(artifacts): use ArtifactEntry directly instead of subclassing (#4649)

    * Subsume ArtifactEntry into ArtifactManifestEntry

commit 491fc59
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 15:26:54 2022 -0800

    test(sdk): add unit tests for filesync.StepUpload (#4652)

    * add unit tests for filesync.StepUpload

commit 4e667d0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 15:02:09 2022 -0800

    fix(artifacts): correctly handle url-encoded local file references. (#4665)

    * Add test case for path names needing correct urldecoding

    * Use local_fil_uri_to_path instead of urlparse

    * Fix existing tests that used relative path uris instead of absolute paths
bcsherma pushed a commit that referenced this pull request Dec 29, 2022
)

* Move all MD5 and base64 encode utilities to lib.hashutil

* Remove duplicate function definitions

* Refactor common code out into small functions

* Use hypothesis to for testing

* Remove unused code

* Update type annotations for hash types
bcsherma added a commit that referenced this pull request Dec 29, 2022
commit d244d07
Author: Katia Patkin <87335417+kptkin@users.noreply.github.com>
Date:   Tue Dec 27 13:31:20 2022 -0800

    style(public-api): format public file with proper formating (#4697)

    fix format

commit 3dc8584
Author: Janosh Riebesell <janosh.riebesell@gmail.com>
Date:   Tue Dec 27 10:21:41 2022 -0800

    feat(sdk): add `exist_ok=False` to `file.download()` (#4564)

    * add exist_ok=False to file.download() to avoid raising ValueError if file exists but you don't want to redownload

    also add type annotations

    * handle case file exists and exist_ok==True in File.download()

    * point out exist_ok=True option in file already exists error

    * tweak if/else paths

    Co-authored-by: Katia Patkin <87335417+kptkin@users.noreply.github.com>

commit 4a4651e
Author: Vish Rajiv <8609620+vwrj@users.noreply.github.com>
Date:   Fri Dec 23 14:27:16 2022 -0800

    fix(artifacts): artifact.version should be the version index from the associated collection (#4486)

    fix(artifacts): artifact.version should be the version index from the associated collection

    Co-authored-by: Hugh Wimberly <hugh.wimberly@wandb.com>
    Co-authored-by: Dmitry Duev <dmitryduev@users.noreply.github.com>

commit a7de372
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:19:14 2022 -0800

    docs(sdk): Removed less than, greater than characters from dosctrings… (#4687)

    docs(sdk): Removed less than, greater than characters from dosctrings. Re: It breaks the new doc engine Docusuarus.

commit a45629d
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:18:00 2022 -0800

    docs(sdk): Fixed typo in docstring for data_types.Objects3D (#4543)

    Fixed typo in docstring for data_types.Objects3D

commit 6df4b88
Author: Dmitry Duev <dmitryduev@users.noreply.github.com>
Date:   Fri Dec 23 13:16:16 2022 -0800

    test(integrations): fix import tests (#4690)

    test(integrations): fix import tests

commit 8da62bb
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Fri Dec 23 11:46:00 2022 -0800

    refactor(artifacts): consolidate hash utilities into lib.hashutil (#4525)

    * Move all MD5 and base64 encode utilities to lib.hashutil

    * Remove duplicate function definitions

    * Refactor common code out into small functions

    * Use hypothesis to for testing

    * Remove unused code

    * Update type annotations for hash types

commit eb70114
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Thu Dec 22 09:44:23 2022 -0800

    test(artifacts): improve storage handler test coverage (#4674)

    * Add test case for adding a file to a finalized artifact

    * Add test case for caching local file references

    * Add GCS and WBArtifact storage handler tests

commit 7106d17
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Wed Dec 21 19:36:31 2022 -0800

    fix(artifacts): get digest directly instead of from the manifests' manifest (#4681)

    Reference digest directly instead of though the manifests' manifest

commit 0a21fd1
Author: KyleGoyette <kdgoyette@gmail.com>
Date:   Wed Dec 21 14:37:03 2022 -0800

    feat(launch): Default to using model-registry project for agent and launch_add (#4613)

commit 84be24c
Author: Griffin Tarpenning <griffin.tarpenning@wandb.com>
Date:   Wed Dec 21 13:20:02 2022 -0800

    chore(launch): remove fallback resource when not specified for a queue (#4637)

commit cf7df1d
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 17:10:13 2022 -0800

    test(sdk): add tests for Api.upload_file_retry (#4639)

commit c1ad0c0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 16:08:37 2022 -0800

    refactor(artifacts): use ArtifactEntry directly instead of subclassing (#4649)

    * Subsume ArtifactEntry into ArtifactManifestEntry

commit 491fc59
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 15:26:54 2022 -0800

    test(sdk): add unit tests for filesync.StepUpload (#4652)

    * add unit tests for filesync.StepUpload

commit 4e667d0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 15:02:09 2022 -0800

    fix(artifacts): correctly handle url-encoded local file references. (#4665)

    * Add test case for path names needing correct urldecoding

    * Use local_fil_uri_to_path instead of urlparse

    * Fix existing tests that used relative path uris instead of absolute paths
bcsherma added a commit that referenced this pull request Jan 21, 2023
commit d244d07
Author: Katia Patkin <87335417+kptkin@users.noreply.github.com>
Date:   Tue Dec 27 13:31:20 2022 -0800

    style(public-api): format public file with proper formating (#4697)

    fix format

commit 3dc8584
Author: Janosh Riebesell <janosh.riebesell@gmail.com>
Date:   Tue Dec 27 10:21:41 2022 -0800

    feat(sdk): add `exist_ok=False` to `file.download()` (#4564)

    * add exist_ok=False to file.download() to avoid raising ValueError if file exists but you don't want to redownload

    also add type annotations

    * handle case file exists and exist_ok==True in File.download()

    * point out exist_ok=True option in file already exists error

    * tweak if/else paths

    Co-authored-by: Katia Patkin <87335417+kptkin@users.noreply.github.com>

commit 4a4651e
Author: Vish Rajiv <8609620+vwrj@users.noreply.github.com>
Date:   Fri Dec 23 14:27:16 2022 -0800

    fix(artifacts): artifact.version should be the version index from the associated collection (#4486)

    fix(artifacts): artifact.version should be the version index from the associated collection

    Co-authored-by: Hugh Wimberly <hugh.wimberly@wandb.com>
    Co-authored-by: Dmitry Duev <dmitryduev@users.noreply.github.com>

commit a7de372
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:19:14 2022 -0800

    docs(sdk): Removed less than, greater than characters from dosctrings… (#4687)

    docs(sdk): Removed less than, greater than characters from dosctrings. Re: It breaks the new doc engine Docusuarus.

commit a45629d
Author: Noah Luna <15202580+ngrayluna@users.noreply.github.com>
Date:   Fri Dec 23 14:18:00 2022 -0800

    docs(sdk): Fixed typo in docstring for data_types.Objects3D (#4543)

    Fixed typo in docstring for data_types.Objects3D

commit 6df4b88
Author: Dmitry Duev <dmitryduev@users.noreply.github.com>
Date:   Fri Dec 23 13:16:16 2022 -0800

    test(integrations): fix import tests (#4690)

    test(integrations): fix import tests

commit 8da62bb
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Fri Dec 23 11:46:00 2022 -0800

    refactor(artifacts): consolidate hash utilities into lib.hashutil (#4525)

    * Move all MD5 and base64 encode utilities to lib.hashutil

    * Remove duplicate function definitions

    * Refactor common code out into small functions

    * Use hypothesis to for testing

    * Remove unused code

    * Update type annotations for hash types

commit eb70114
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Thu Dec 22 09:44:23 2022 -0800

    test(artifacts): improve storage handler test coverage (#4674)

    * Add test case for adding a file to a finalized artifact

    * Add test case for caching local file references

    * Add GCS and WBArtifact storage handler tests

commit 7106d17
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Wed Dec 21 19:36:31 2022 -0800

    fix(artifacts): get digest directly instead of from the manifests' manifest (#4681)

    Reference digest directly instead of though the manifests' manifest

commit 0a21fd1
Author: KyleGoyette <kdgoyette@gmail.com>
Date:   Wed Dec 21 14:37:03 2022 -0800

    feat(launch): Default to using model-registry project for agent and launch_add (#4613)

commit 84be24c
Author: Griffin Tarpenning <griffin.tarpenning@wandb.com>
Date:   Wed Dec 21 13:20:02 2022 -0800

    chore(launch): remove fallback resource when not specified for a queue (#4637)

commit cf7df1d
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 17:10:13 2022 -0800

    test(sdk): add tests for Api.upload_file_retry (#4639)

commit c1ad0c0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 16:08:37 2022 -0800

    refactor(artifacts): use ArtifactEntry directly instead of subclassing (#4649)

    * Subsume ArtifactEntry into ArtifactManifestEntry

commit 491fc59
Author: speezepearson <speezepearson@users.noreply.github.com>
Date:   Tue Dec 20 15:26:54 2022 -0800

    test(sdk): add unit tests for filesync.StepUpload (#4652)

    * add unit tests for filesync.StepUpload

commit 4e667d0
Author: Hugh Wimberly <hugh.wimberly@wandb.com>
Date:   Tue Dec 20 15:02:09 2022 -0800

    fix(artifacts): correctly handle url-encoded local file references. (#4665)

    * Add test case for path names needing correct urldecoding

    * Use local_fil_uri_to_path instead of urlparse

    * Fix existing tests that used relative path uris instead of absolute paths
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants