Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: cleanup empty cache directory after a successful call to load_sample #4501

Merged

Conversation

neutrinoceros
Copy link
Member

@neutrinoceros neutrinoceros commented Jun 12, 2023

PR Summary

Context: internally, yt.load_sample uses pooch to retrieve data from the hub server. Because data is obtained most often as compressed archives, we first download it to a temporary directory (yt_download_cache). This is an implementation detail that should not be directly visible to users.
This PR does two things:

  • clean up the directory after it's served its purpose (this doesn't prevent pooch from re-creating it during the same session if needed)
  • prepend a . to its name so its "hidden" in the conventional POSIX sense

@neutrinoceros neutrinoceros added the enhancement Making something better label Jun 12, 2023
@neutrinoceros neutrinoceros added this to the 4.3.0 milestone Jun 12, 2023
@matthewturk
Copy link
Member

I don't think we should hide it -- dumping a couple gigs into a hidden directory is not something I really want to push. That being said, does this mean that if I use yt.load_sample multiple times in multiple sessions, each time it will download afresh?

@neutrinoceros neutrinoceros force-pushed the cleanup_after_load_sample branch 2 times, most recently from 9e64e91 to 294320c Compare June 12, 2023 14:26
@neutrinoceros
Copy link
Member Author

dumping a couple gigs into a hidden directory is not something I really want to push.

You have a point, I'll revert that part.

does this mean that if I use yt.load_sample multiple times in multiple sessions, each time it will download afresh?

it doesn't ! when the untared data is already on disk, load_sample doesn't download anything. This dir is only for tarballs, which we already auto-remove when untaring is successful.

@matthewturk
Copy link
Member

it doesn't ! when the untared data is already on disk, load_sample doesn't download anything. This dir is only for tarballs, which we already auto-remove when untaring is successful.

Great -- for some reason I thought it ran a hash on the tarball to see if it needed to re-download, rather than looking at contents of the directory.

@neutrinoceros
Copy link
Member Author

pooch internally verifies that the file's hash matches the data base we provide for security, that might be where you got that idea :)

@neutrinoceros neutrinoceros changed the title ENH: cleanup empty cache directory after a succesful call to load_sample ENH: cleanup empty cache directory after a successful call to load_sample Jun 13, 2023
Copy link
Contributor

@chrishavlin chrishavlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

@neutrinoceros neutrinoceros merged commit b754a80 into yt-project:main Jun 18, 2023
12 checks passed
@neutrinoceros neutrinoceros deleted the cleanup_after_load_sample branch June 18, 2023 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Making something better
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants