Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: also validate unresolved symlinks (RAMSES) #3786

Merged
merged 4 commits into from
Feb 14, 2022

Conversation

mtrebitsch
Copy link

The extra keyword in the loader for the RAMSES frontend makes sure that the symlinks are not resolved by the RAMSESFileSanitizer. This is particularly helpful if the file structure is actually a set of links to the files, for instance to bypass the fact that RAMSESFileSanitizer doesn't deal well with outputs including "groups".

PR Summary

With yt 4, the RAMSES loader does not recognize the "groups" output structure (/path/to/output/files/output_XXXXX/group_YYYYY/info_XXXXX.txt). One relatively easy way to bypass that, in principle, is to do link all the output files in a separate folder without the nested "group" structure (/path/to/linked/files/output_XXXXX/info_XXXXX.txt). Beyond yt, this can be needed/useful for other analysis tools that are less flexible.

However, doing so does not work if the symlinks are resolved to their original location. This PR fixes this by adding an extra keyword preserve_symlinks to ensure that said links are preserved by the RAMSESFileSainitizer.
This is False by default, so it should not break any code.

This partially fixes issue #3785.

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

…ixes Issue yt-project#3785

The extra keyword in the loader for the RAMSES frontend makes sure that the symlinks are not resolved by the `RAMSESFileSanitizer`. This is particularly helpful if the file structure is actually a set of links to the files, for instance to bypass the fact that `RAMSESFileSanitizer` doesn't deal well with outputs including "groups".
@welcome
Copy link

welcome bot commented Feb 7, 2022

Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!

@Xarthisius Xarthisius added bug code frontends Things related to specific frontends labels Feb 7, 2022
@neutrinoceros
Copy link
Member

That was quick !
So actually after reading the report I now understand that there should be another, cleaner way to address #3785, and I would prefer to go this route if we can avoid adding flags, which should be a last resort (sorry for suggesting it right away !).
I think you can use this patch in production but I'd rather not merge it if we can implement the correct fix. Do you want to have a go at it ?

@neutrinoceros
Copy link
Member

That being said I stand by my statement that not having the option to skip link resolving is a bug since it breaks your workflow, which seems absolutely sound to me (sorry for going back and forth on this). I think we could merge this on the condition that it doesn't close #3785, which should only be closed by the correct patch. What do you think ?

Note that if you go forward with this patch, you need to take care of linting (pre-commit.ci), see our docs on the matter https://yt-project.org/doc/developing/developing.html#automatically-checking-and-fixing-code-style

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding a couple suggestions on function signatures

yt/frontends/ramses/data_structures.py Outdated Show resolved Hide resolved
yt/frontends/ramses/data_structures.py Outdated Show resolved Hide resolved
yt/frontends/ramses/data_structures.py Outdated Show resolved Hide resolved
@neutrinoceros
Copy link
Member

One last thought for today: how about naming the flag resolve_symlinks instead so that there’s a direct correspondence between the verb and the conditional action ?

@mtrebitsch
Copy link
Author

I think you can use this patch in production but I'd rather not merge it if we can implement the correct fix. Do you want to have a go at it ?

Happy to think about it, but I'm not sure I understand enough of the code to not break anything. In any case, it would take a bit more time, but I can try to have a look (I'll maybe bug @cphyc for help with the Ramses frontend too…)

That being said I stand by my statement that not having the option to skip link resolving is a bug since it breaks your workflow, which seems absolutely sound to me (sorry for going back and forth on this). I think we could merge this on the condition that it doesn't close #3785, which should only be closed by the correct patch. What do you think ?

That would work for me :-)

Note that if you go forward with this patch, you need to take care of linting (pre-commit.ci), see our docs on the matter https://yt-project.org/doc/developing/developing.html#automatically-checking-and-fixing-code-style

Thanks for this, and for all the suggestions. Implementing and updating ASAP.

@neutrinoceros
Copy link
Member

I only have one comment remaining, after that I'd be inclined to merge this as a bug fix but I'd require a second reviewer because expanding the api could be regarded as a feature and not a bugfix. In my opinion this is a necessary addition to a half-baked feature (auto-resolving everything) which it's too late to take back, which is why I consider it a bugfix.

If a second reviewer agrees with this view, we could even ship this with the next bugfix release

neutrinoceros
neutrinoceros previously approved these changes Feb 8, 2022
@cphyc
Copy link
Member

cphyc commented Feb 8, 2022 via email

@neutrinoceros neutrinoceros changed the title New "preserve_symlinks" keyword for the RAMSES frontend. Add a "resolve_symlink" flag for the RAMSES frontend. Feb 9, 2022
@neutrinoceros neutrinoceros added this to the 4.0.3 milestone Feb 9, 2022
@neutrinoceros neutrinoceros changed the title Add a "resolve_symlink" flag for the RAMSES frontend. Add a "resolve_symlinks" flag for the RAMSES frontend. Feb 9, 2022
@neutrinoceros
Copy link
Member

Found myself working on a related problem here #3801, and now I'm thinking we could maybe avoid adding a flag to the API if we instead try to validate the user-input path with both the resolved and the unresolved version. This could work for all known workflows for RAMSES data, and would improve the user experience. @mtrebitsch, what do you think about this ?

@neutrinoceros
Copy link
Member

@mtrebitsch Here's a proof of concept patch to implement my previous suggestion

diff --git a/yt/frontends/ramses/data_structures.py b/yt/frontends/ramses/data_structures.py
index fc95e5727..85f80cfd5 100644
--- a/yt/frontends/ramses/data_structures.py
+++ b/yt/frontends/ramses/data_structures.py
@@ -1,6 +1,7 @@
 import os
 import weakref
 from collections import defaultdict
+from itertools import product
 from pathlib import Path

 import numpy as np
@@ -42,15 +43,16 @@ class RAMSESFileSanitizer:

     def __init__(self, filename):
         # Resolve so that it works with symlinks
-        filename = Path(filename).resolve()
-
         self.original_filename = filename

+        paths_to_try = (Path(filename), Path(filename).resolve())
         self.output_dir = None
         self.info_fname = None

-        for check_fun in (self.test_with_standard_file, self.test_with_folder_name):
-            ok, output_dir, info_fname = check_fun(filename)
+        check_functions = (self.test_with_standard_file, self.test_with_folder_name)
+
+        for path, check_fun in product(paths_to_try, check_functions):
+            ok, output_dir, info_fname = check_fun(path)
             if ok:
                 break

@mtrebitsch
Copy link
Author

Thanks @neutrinoceros, I'll try that over the week-end and I'll get back to you. This seems much cleaner than adding new keywords all the time indeed :-)

@mtrebitsch
Copy link
Author

One question though (sorry, it hit me when I had just sent the previous comment): is original_filename used anywhere? And is it expected to be the one passed as an argument or actually used by the rest of the function?

@neutrinoceros
Copy link
Member

As far as I can tell, original_filename should reflect user-provided data (right now it's not, because symlinks are resolved). Because it can be later used as part of an error message (see #3801), it could be really confusing that it doesn't always match what's been typed on the user side. I don't think it's used at the moment but I very much intend to change that with #3801 :)

I think it's okay rework this class so that this attribute better matches its name, and I'm confident that RAMSESFileSanitizer isn't meant to be public api so we're free to change it as much as needed.

@mtrebitsch
Copy link
Author

Thanks @neutrinoceros, the suggestion is working flawlessly (on my side at least)! I believe it solves the symlink issue, even though the PR's title isn't quite right now. :-)

@neutrinoceros neutrinoceros changed the title Add a "resolve_symlinks" flag for the RAMSES frontend. BUG: also validate unresolved symlinks (RAMSES) Feb 14, 2022
@neutrinoceros
Copy link
Member

Excellent ! I edited the title again. Given my involvement I don't want to merge this myself, but hopefully @cphyc will be happy to review it now :)

Copy link
Member

@cphyc cphyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not particularly happy about how clunky this turns out to be but it'd require a refactor for it to be nicer.

@cphyc cphyc enabled auto-merge (rebase) February 14, 2022 14:58
@cphyc cphyc merged commit 03c41c0 into yt-project:main Feb 14, 2022
@welcome
Copy link

welcome bot commented Feb 14, 2022

Hooray! Congratulations on your first merged pull request! We hope we keep seeing you around! 🎆

@lumberbot-app
Copy link

lumberbot-app bot commented Feb 14, 2022

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout yt-4.0.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -m1 03c41c0d9f47ee79f881ddc8b2a20ee3dcd433ae
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #3786: BUG: also validate unresolved symlinks (RAMSES)'
  1. Push to a named branch:
git push YOURFORK yt-4.0.x:auto-backport-of-pr-3786-on-yt-4.0.x
  1. Create a PR against branch yt-4.0.x, I would have named this PR:

"Backport PR #3786 on branch yt-4.0.x (BUG: also validate unresolved symlinks (RAMSES))"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

@neutrinoceros
Copy link
Member

Manually back ported in #3789

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug code frontends Things related to specific frontends
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants