Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

amrex dataset detection appears broken #3005

Closed
zingale opened this issue Dec 21, 2020 · 18 comments · Fixed by #3666
Closed

amrex dataset detection appears broken #3005

zingale opened this issue Dec 21, 2020 · 18 comments · Fixed by #3666
Labels
bug code frontends Things related to specific frontends

Comments

@zingale
Copy link
Member

zingale commented Dec 21, 2020

Bug report

Bug summary

If I try to load a Castro dataset, I get:

  File "/raid/zingale/development/yt-mz/yt/yt/loaders.py", line 98, in load
    raise YTAmbiguousDataType(fn, candidates)
yt.utilities.exceptions.YTAmbiguousDataType: Multiple data type candidates for det_strang_cfl0.5_dtnuce1e+200_nzones256/det_x_plt00210
The following independent classes were detected as valid :
<class 'yt.frontends.boxlib.data_structures.CastroDataset'>
<class 'yt.frontends.boxlib.data_structures.AMReXDataset'>
A possible workaround is to directly instantiate one of the above.
Please report this to https://github.com/yt-project/yt/issues/new

Code for reproduction

Using this dataset:
http://bender.astro.sunysb.edu/random/det_x_plt00000.tgz

Do

yt.load("det_x_plt00000")

A workaround is to use the hash 9071f0dc3 for yt.

# Paste your code here
#
#

Actual outcome

# If applicable, paste the console output here
#
#

Expected outcome

should be able to load the dataset

Version Information

  • Operating System: Fedora 33
  • Python Version: 3.9.0
  • yt version: 4.0.dev0
  • Other Libraries (if applicable):

installed yt from source

@neutrinoceros
Copy link
Member

Hi @zingale, sorry about this, this must be an undesirable side effect of #2938
I worked a fair bit on simplifying the validators for Boxlib child classes, but it's still not very robust. In fact this is a predictable situation where both strings "amrex" and "castro" are present in the paramfile (job_info).
Could you explain why both are showing up and how should yt guess that it's indeed a Castro-typed dataset ?

@neutrinoceros neutrinoceros added bug code frontends Things related to specific frontends labels Dec 21, 2020
@zingale
Copy link
Member Author

zingale commented Dec 21, 2020

yes, they will both always show up in all MAESTROeX, NyX, and Castro datasets. The reason is that they are all different git repos, so we need to keep track of the individual hashes of each.

So Castro will always report the Castro, AMReX, and Microphysics hashes (the last doesn't matter to yt really)
MAESTROeX will always report the MAESTROeX, AMReX, and Microphysics
Nyx will always report the Nyx, AMReX hashes

probably true for WarpX too

@neutrinoceros
Copy link
Member

I see. I don't know what we should do about it then because I don't know of a reliable way yt could guess the appropriate type.
In the mean time you can replace the call to yt.load with

from yt.frontends.boxlib.api import CastroDataset
ds = CastroDataset(<data_file>)

@zingale
Copy link
Member Author

zingale commented Dec 21, 2020

A Castro job_info file always has:

 Castro Job Information

right at the top, so that unambiguously identifies it as a Castro dataset.

Likewise, MAESTROeX job_info files always have:

 MAESTROeX Job Information

right at the top.

@neutrinoceros
Copy link
Member

Nice ! Is this rule general to all boxlib types ?

@zingale
Copy link
Member Author

zingale commented Dec 21, 2020

sadly no. Each application is allowed to do whatever they want in the job_info file (and maybe not have one at all). There are probably dozens of AMReX codes, so I certainly don't know them all.

@neutrinoceros
Copy link
Member

That's already valuable info. The problem here is that it's more a matter of making AMRExDataset invalid. Do you have any experience with this one ?

@Xarthisius
Copy link
Member

Xarthisius commented Dec 21, 2020

Back in the day, when load() encountered a datasets that matched A.is_valid() and B.is_valid(), where B was a subclass of A, it automatically went with B. If that changed that sounds like a regression.
I now see that in this case these classes are siblings, so disregard my comment.

@neutrinoceros
Copy link
Member

actually @Xarthisius , I now think you had the right idea there. Maybe those classes shouldn't be siblings after all: if one code is a direct descendent of another, maybe this hierarchy should be reflected by inheritance here ?

@matthewturk
Copy link
Member

Reading this over, I am left to wonder if possibly a solution would be to provide the base-level amrex frontend, with optional "if detected" customizations. Do we have a handle on how distinct each are, and how those distinctions manifest themselves?

@zingale
Copy link
Member Author

zingale commented Oct 12, 2021

I would even be okay with something like ds = yt.load(file, hint="castro")

@neutrinoceros
Copy link
Member

@zingale I believe there's is already some support for this.

ds = yt.load(file, cparam_filename="my_job_info_file")

is supposed to help a lot, because we attempt to determine the most relevant Dataset class using cparam_filename.
Each Dataset class has its own default value for this parameter, and as far as I understand none of them are really justified because users are free to name theses parameter files whatever they want, but it's a little delicate to remove these default value, because it'd break users who rely on them. All things considered I think the technique I showed here should be advertised in the docs, and as a last resort there's always from yt.frontends.boxlib.api import ..., which should probably be documented too.

@munkm
Copy link
Member

munkm commented Oct 12, 2021

What if instead of hint= we added a kwarg for frontend= to override default detection? This would probably be the most user-intuitive way to support loading and wouldn't require an accessory file to load.

@neutrinoceros
Copy link
Member

this is starting to look a lot like #3510

@matthewturk
Copy link
Member

matthewturk commented Oct 12, 2021 via email

@zingale
Copy link
Member Author

zingale commented Oct 12, 2021

me too. I think frontend might not be the right term here, since there are multiple codes under a single front end, and that's what causes this problem, so maybe code="castro"?

@munkm
Copy link
Member

munkm commented Oct 12, 2021

code sounds like a good choice! I think we can be more explicit in our docs that frontend does not always mean the same thing as code too. It's easy to forget when some codes do have frontends named for them.

@neutrinoceros
Copy link
Member

I'm worried code could be somewhat confusing too as unit_system="code" is also a valid (and completely unrelated) argument for yt.load

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug code frontends Things related to specific frontends
Projects
None yet
5 participants