Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix ParamFileStore for on-demand frontend imports #4926

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chrishavlin
Copy link
Contributor

@chrishavlin chrishavlin commented Jun 13, 2024

Currently, the Dataset pickling is broken between python sessions when using the yt config option store_parameter_files = true. For example, given two scripts, create_a_pickle.py and load_a_pickle.py:

# create_a_pickle.py
import yt
import pickle

if __name__ == '__main__':

    fname = 'IsolatedGalaxy/galaxy0030/galaxy0030'
    ds = yt.load(fname)

    with open('yt_ds.pickle', 'wb') as handle:
        pickle.dump(ds, handle, protocol=pickle.HIGHEST_PROTOCOL)
# load_a_pickle.py
import pickle

if __name__ == '__main__':

    with open('yt_ds.pickle', 'rb') as handle:
        ds = pickle.load(handle)
   _ =  ds.index

The following fails on the attempt to load:

$ python create_a_pickle.py
$ python load_a_pickle.py
Traceback (most recent call last):
  File "/home/chavlin/src/yt_general/performance/yt_perf_testing/miscelleaneous_tests/dask_pickles_etc/pickle_from_file.py", line 5, in <module>
    ds = pickle.load(handle)
  File "/home/chavlin/src/yt_general/yt/yt/data_objects/static_output.py", line 2053, in _reconstruct_ds
    ds = datasets.get_ds_hash(*args)
  File "/home/chavlin/src/yt_general/yt/yt/utilities/parameter_file_storage.py", line 91, in get_ds_hash
    return self._convert_ds(self._records[hash])
  File "/home/chavlin/src/yt_general/yt/yt/utilities/parameter_file_storage.py", line 117, in _convert_ds
    raise UnknownDatasetType(class_name)
yt.utilities.parameter_file_storage.UnknownDatasetType: EnzoDataset

(I also ran into this problem when experimenting with some dask functionality...).

The issue comes how yt now imports frontends on demand (output_type_registry is reset between sessions).

@chrishavlin chrishavlin marked this pull request as draft June 13, 2024 20:34
@chrishavlin
Copy link
Contributor Author

Setting to draft for now because I want to see if I can write a test with some pytest monkeypatching.

Also note that the fix here is not backwards compatible -- IMO it's fine because I don't think many folks are using this functionality and it's not meant for long-term dataset storage anyway.

@chrishavlin chrishavlin added the backwards incompatible This change will change behavior label Jun 26, 2024
"tt",
"ctid",
"class_name",
"module_name",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the potentially backwards incompatible part: when using the on-disk store, this new field will be a new column in the csv. so if an on-disk store already exists, there will be an error due to the different number of columns and the user would need to remove the old on-disk store.

@chrishavlin chrishavlin marked this pull request as ready for review June 26, 2024 21:39
@chrishavlin
Copy link
Contributor Author

assuming tests pass this should be ready for review

@chrishavlin
Copy link
Contributor Author

oh, those are real failures. will take another go at those pytest fixtures....

@chrishavlin chrishavlin marked this pull request as draft June 26, 2024 21:41
# first check that the relevant frontend is imported:
# the output_type_registry is refreshed between sessions as
# frontends are imported on-demand now.
import importlib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to put this import at the top level ? importlib should be "free" to import anyway.

import importlib

module = ds_dict["module_name"]
fe_api = ".".join(module.split(".")[0:3]) + ".api"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sure this is equivalent (I don't know how many "." are expected), but I'm willing to bet there's a clearer way to express this change in file extension.

Suggested change
fe_api = ".".join(module.split(".")[0:3]) + ".api"
fe_api = module.rpartition(".")[0] + ".api"



def test_on_disk_parameter_file_store(
monkeypatch, store_parameter_files, set_parameter_file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixtures that are used as setup/teardown but don't need an accessible name should be declared with pytest.mark.usefixture

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backwards incompatible This change will change behavior bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants