RAMSES loader does not recognize "groups" file structure #3785

Closed
mtrebitsch opened this issue Feb 7, 2022 · 6 comments · Fixed by #3811
Labels
bug code frontends Things related to specific frontends

@mtrebitsch

Bug report

Bug summary

The RAMSES loader does not recognize the "groups" output structure (output_XXXXX/group_YYYYY/info_XXXXX.txt) in yt 4.0.1. This is caused by the way the RAMSESFileSanitizer class recognizes the output structure (see below). The problem also cannot be bypassed by (sym)linking all the output files into a new "clean" structure (output_XXXXX/info_XXXXX.txt), because the file sanitizer resolves the symlinks.

Code for reproduction

import yt
ds = yt.load("/path/to/ramses/data/output_00075/group_00001/info_00075.txt")

or

ds = yt.load("/path/to/symlinked/ramses/data/output_00075/info_00075.txt")

Actual outcome

Screenshot of the issue

In [5]: ds = yt.load("/data84/obelisk/RHD/OUTPUT_DIR/output_00075/group_00001/info_00075.txt")
---------------------------------------------------------------------------
YTUnidentifiedDataType                    Traceback (most recent call last)
Input In [5], in <module>
----> 1 ds = yt.load("/data84/obelisk/RHD/OUTPUT_DIR/output_00075/group_00001/info_00075.txt")

File ~/.conda/envs/astro/lib/python3.10/site-packages/yt/loaders.py:98, in load(fn, *args, **kwargs)
     95 if len(candidates) > 1:
     96     raise YTAmbiguousDataType(fn, candidates)
---> 98 raise YTUnidentifiedDataType(fn, *args, **kwargs)

YTUnidentifiedDataType: Could not determine input format from `'/data84/obelisk/RHD/OUTPUT_DIR/output_00075/group_00001/info_00075.txt'`.

Expected outcome
ds should be the loaded dataset. Loading a smaller simulation without the "groups" structure works as expected.

Version Information

  • Operating System: GNU/Linux CentOS 8 stream
  • Python Version: Python 3.10.2
  • yt version: 4.0.1, through conda install --channel conda-forge yt

More details
I think the issue is in the RAMSESFileSanitizer class, defined in frontends/ramses/data_structures.py. Its __init__ method uses the class method test_with_standard_file to check that the file structure is "right". That method captures iout from the name of the parent folder: it should be 00075 in my case, but because of the group structure it is parsed from group_00001 and becomes 00001. It then checks (with the static method check_standard_files) that the folder contains info_{iout}.txt and amr_{iout}.out00001, which fails in my case because iout is wrong.
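The parsing step described above can be reproduced in isolation. The sketch below reuses the OUTPUT_DIR_RE pattern quoted in the proposed solution and mimics the OUTPUT_DIR_RE.match(filename.parent.name) call. The helper name iout_from_parent is hypothetical; this is an illustration of the failure mode, not yt's actual code.

```python
import re
from pathlib import PurePosixPath

# Pattern as quoted from yt's RAMSES frontend in the proposed solution
OUTPUT_DIR_RE = re.compile(r"(output|group)_(\d{5})")


def iout_from_parent(filename):
    """Hypothetical helper: take the output number from the parent folder
    name, the way the sanitizer does with filename.parent.name."""
    match = OUTPUT_DIR_RE.match(PurePosixPath(filename).parent.name)
    return match.group(2) if match else None


flat = "output_00075/info_00075.txt"
grouped = "output_00075/group_00001/info_00075.txt"

print(iout_from_parent(flat))     # 00075: info_00075.txt is then found
print(iout_from_parent(grouped))  # 00001: the sanitizer then looks for info_00001.txt
```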

I think the symlink trick I used doesn't work because the file sanitizer uses Path.resolve(), which follows the symlinks back to the original group folder and leads straight back to the same issue.
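For anyone wanting to check the symlink behaviour locally, here is a rough sketch (POSIX only, since it creates a symlink). It builds a throwaway "groups" layout, links the info file into a "clean" layout, and compares the parent folder name before and after Path.resolve(). The directory names are illustrative and the function name is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def parent_names_with_symlink():
    """Hypothetical demo: parent folder name of a symlinked info file,
    before and after Path.resolve()."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Real data in the "groups" layout: output_XXXXX/group_YYYYY/info_XXXXX.txt
        real = root / "run" / "output_00075" / "group_00001" / "info_00075.txt"
        real.parent.mkdir(parents=True)
        real.write_text("")

        # "Clean" layout: output_XXXXX/info_XXXXX.txt as a symlink into the group folder
        link = root / "clean" / "output_00075" / "info_00075.txt"
        link.parent.mkdir(parents=True)
        os.symlink(real, link)

        # resolve() follows the symlink, so the group folder name comes back
        return link.parent.name, link.resolve().parent.name


print(parent_names_with_symlink())  # ('output_00075', 'group_00001')
```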

Proposed solution
I can think of two main ways to address this: either properly account for the possibility of groups, or add an option to not resolve the symlinks.

The first solution seems cleaner, and would require slightly changing test_with_standard_file and maybe test_with_folder_name. Currently the code matches the folder name against OUTPUT_DIR_RE = re.compile(r"(output|group)_(\d{5})"), but only uses the second capture group to get iout. Catching the "group" case is presumably straightforward, but I'm not 100% sure what to do after that or how it would propagate to the rest of the code.
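A minimal sketch of the first option, assuming a group_YYYYY folder always sits directly inside its output_XXXXX folder: when the folder name matches "group", take iout from the parent instead. iout_for_folder is a hypothetical helper, not part of yt, and it only covers the iout parsing, not how the rest of the frontend would handle groups.

```python
import re
from pathlib import PurePosixPath

OUTPUT_DIR_RE = re.compile(r"(output|group)_(\d{5})")


def iout_for_folder(folder):
    """Hypothetical helper: output number for an output_XXXXX folder or an
    output_XXXXX/group_YYYYY folder, or None if the names do not match."""
    folder = PurePosixPath(folder)
    match = OUTPUT_DIR_RE.match(folder.name)
    if match is None:
        return None
    if match.group(1) == "group":
        # Inside a group folder, the output number lives one level up
        match = OUTPUT_DIR_RE.match(folder.parent.name)
        if match is None or match.group(1) != "output":
            return None
    return match.group(2)


print(iout_for_folder("output_00075"))              # 00075
print(iout_for_folder("output_00075/group_00001"))  # 00075
```

Because test_with_standard_file applies OUTPUT_DIR_RE to filename.parent.name and test_with_folder_name applies it to output_dir.name, a single helper like this could in principle serve both.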

The second option is much quicker to implement: not resolving the symlinks actually solves my issue. Following a suggestion from @neutrinoceros on Slack, adding a boolean flag to RAMSESDataset that is then passed to RAMSESFileSanitizer should do the trick.
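The second option boils down to choosing whether the path is resolved before it is inspected. The sketch below shows the idea with a hypothetical resolve_symlinks keyword; the actual name and placement of the flag added to RAMSESDataset are not spelled out here, so treat it as illustrative only.

```python
from pathlib import Path


def path_to_inspect(filename, resolve_symlinks=True):
    """Hypothetical sketch: return the path a sanitizer would inspect.

    resolve_symlinks=True mirrors the current behaviour (Path.resolve()),
    which follows symlinks back to the real group folder.
    resolve_symlinks=False keeps the path as given, so a symlinked
    "clean" layout keeps its output_XXXXX parent folder name.
    """
    path = Path(filename)
    return path.resolve() if resolve_symlinks else path


p = path_to_inspect("output_00075/info_00075.txt", resolve_symlinks=False)
print(p.parent.name)  # output_00075
```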

I will try to come up with a PR with that second option ASAP!

@welcome

welcome bot commented Feb 7, 2022

Hi, and welcome to yt! Thanks for opening your first issue. We have an issue template that helps us to gather relevant information to help diagnosing and fixing the issue.

mtrebitsch pushed a commit to mtrebitsch/yt that referenced this issue Feb 7, 2022
…ixes Issue yt-project#3785

The extra keyword in the loader for the RAMSES frontend makes sure that the symlinks are not resolved by the `RAMSESFileSanitizer`. This is particularly helpful if the file structure is actually a set of links to the files, for instance to bypass the fact that `RAMSESFileSanitizer` doesn't deal well with outputs including "groups".
@neutrinoceros neutrinoceros added bug code frontends Things related to specific frontends labels Feb 7, 2022
cphyc pushed a commit that referenced this issue Feb 14, 2022
…ixes Issue #3785

The extra keyword in the loader for the RAMSES frontend makes sure that the symlinks are not resolved by the `RAMSESFileSanitizer`. This is particularly helpful if the file structure is actually a set of links to the files, for instance to bypass the fact that `RAMSESFileSanitizer` doesn't deal well with outputs including "groups".
@neutrinoceros
Member

Fixed with #3786

@mtrebitsch
Author

Sorry to bring this up again, but I think #3786 doesn't fully fix the issue? In #3786 (comment) you suggested merging it only if it doesn't close the issue.

In particular, something like

ds = yt.load("/path/to/ramses/data/output_00075/group_00001/info_00075.txt")

still fails, even with the fix. If needed I can open a new issue, but I'm not sure I have a proper fix for that. I can try to think of one, though!

@neutrinoceros
Member

In #3786 (comment) you suggested merging it only if it doesn't close the issue.

That was before we found a better fix, though it seems I missed some details here and should indeed not have closed this issue. Let's reopen it.

@neutrinoceros neutrinoceros reopened this Feb 14, 2022
@cphyc
Member

cphyc commented Feb 15, 2022

Could you try loading with

ds = yt.load("/path/to/ramses/data/output_00075/group_00001")

instead?

@mtrebitsch
Author

This also fails, presumably because of the way test_with_folder_name checks for the output number, basically here:

iout_match = OUTPUT_DIR_RE.match(output_dir.name)

For a run with "groups", this yields the group number (00001 here) instead of the output number (00075), which doesn't work. It's essentially the same issue as in test_with_standard_file:
iout_match = OUTPUT_DIR_RE.match(filename.parent.name)
