Cache the result of _identify_fields to improve dataset initialization speed #2271

saethlin · 2019-06-11T03:22:02Z

This is a follow-on to #2146; I'm trying to cut down yt's overhead to make it more usable for similar workloads (datasets larger than 10 GB, but only a few operations on each dataset).

This PR halves the runtime of scripts like the one in #2146 for 4-file MassiveFIRE snapshots when run on our system, HiPerGator's Lustre filesystem. On such snapshots, yt used to redundantly scan the backing HDF5 files 243 times to find out which Datasets they contain.
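
For readers who want the general idea without opening the diff, here is a minimal sketch of the caching pattern described above. It is not the code from this PR: `FieldCachingHandler`, `_scan_file`, `_field_cache`, and `scan_count` are hypothetical names made up for illustration, and the real `_identify_fields` in yt may take different arguments and return different data. The sketch only assumes that the scan result depends solely on the file name and does not change while the dataset is open.

```python
class FieldCachingHandler:
    """Hypothetical stand-in for an IO handler; not yt's actual class."""

    def __init__(self):
        self.scan_count = 0      # how many times a file was really scanned
        self._field_cache = {}   # maps filename -> cached scan result

    def _scan_file(self, filename):
        # Placeholder for the expensive step: opening the HDF5 file and
        # listing the Datasets it contains. The field names are made up.
        self.scan_count += 1
        fields = [("PartType0", "Coordinates"), ("PartType0", "Density")]
        units = {}
        return fields, units

    def _identify_fields(self, filename):
        # Scan each file at most once; later calls reuse the stored result.
        if filename not in self._field_cache:
            self._field_cache[filename] = self._scan_file(filename)
        return self._field_cache[filename]


handler = FieldCachingHandler()
for _ in range(243):
    handler._identify_fields("snapshot.0.hdf5")

# Despite 243 calls, the file was only scanned once.
print(handler.scan_count)
```

The trade-off in a cache like this is staleness: if a file could change on disk while the dataset is open, the cached field list would no longer match it.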

welcome · 2019-06-11T03:22:07Z

Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!

matthewturk · 2019-06-11T14:20:18Z

Clever idea! Thanks for submitting this. It looks good to me!

brittonsmith

Very nice!

welcome · 2019-06-11T14:33:39Z

Hooray! Congratulations on your first merged pull request! We hope we keep seeing you around! 🎆

dnarayanan · 2019-06-11T14:53:18Z

Nice work, @saethlin!

Cache the result of _identify_fields

4806ebe

This halves the runtime of scripts like in yt-project#2146 for MassiveFIRE snapshots when run on HiPerGator's lustre filesystem. On such snapshots yt used to redundantly scan the backing HDF5 files 243 times for what Datasets they contain.

matthewturk approved these changes Jun 11, 2019

brittonsmith approved these changes Jun 11, 2019

jzuhone self-requested a review June 11, 2019 14:32

jzuhone approved these changes Jun 11, 2019

jzuhone merged commit fff13f1 into yt-project:yt-4.0 Jun 11, 2019
