amrex dataset detection appears broken #3005

zingale · 2020-12-21T13:15:30Z

Bug report

Bug summary

If I try to load a Castro dataset, I get:

  File "/raid/zingale/development/yt-mz/yt/yt/loaders.py", line 98, in load
    raise YTAmbiguousDataType(fn, candidates)
yt.utilities.exceptions.YTAmbiguousDataType: Multiple data type candidates for det_strang_cfl0.5_dtnuce1e+200_nzones256/det_x_plt00210
The following independent classes were detected as valid :
<class 'yt.frontends.boxlib.data_structures.CastroDataset'>
<class 'yt.frontends.boxlib.data_structures.AMReXDataset'>
A possible workaround is to directly instantiate one of the above.
Please report this to https://github.com/yt-project/yt/issues/new

Code for reproduction

Using this dataset:
http://bender.astro.sunysb.edu/random/det_x_plt00000.tgz

Do

yt.load("det_x_plt00000")

A workaround is to use the hash 9071f0dc3 for yt.

Expected outcome

yt.load should be able to load the dataset.

Version Information

Operating System: Fedora 33
Python Version: 3.9.0
yt version: 4.0.dev0
Other Libraries (if applicable):

installed yt from source


neutrinoceros · 2020-12-21T14:38:08Z

Hi @zingale, sorry about this. It must be an undesirable side effect of #2938.
I worked a fair bit on simplifying the validators for Boxlib child classes, but they're still not very robust. In fact, this is a predictable situation in which both strings "amrex" and "castro" are present in the paramfile (job_info).
Could you explain why both show up, and how yt should guess that it is indeed a Castro-typed dataset?

zingale · 2020-12-21T14:46:18Z

Yes, they will both always show up in all MAESTROeX, Nyx, and Castro datasets. The reason is that they are all different git repos, so we need to keep track of the individual hashes of each.

So Castro will always report the Castro, AMReX, and Microphysics hashes (the last doesn't really matter to yt)
MAESTROeX will always report the MAESTROeX, AMReX, and Microphysics hashes
Nyx will always report the Nyx and AMReX hashes

This is probably true for WarpX too.

neutrinoceros · 2020-12-21T15:01:43Z

I see. I don't know what we should do about it then, because I don't know of a reliable way for yt to guess the appropriate type.
In the meantime you can replace the call to yt.load with

from yt.frontends.boxlib.api import CastroDataset
ds = CastroDataset(<data_file>)

zingale · 2020-12-21T15:14:23Z

A Castro job_info file always has:

 Castro Job Information

right at the top, so that unambiguously identifies it as a Castro dataset.

Likewise, MAESTROeX job_info files always have:

 MAESTROeX Job Information

right at the top.
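The header rule described above could be checked along these lines. This is only a sketch, not yt's validator: the helper name code_from_job_info, the "<Name> Job Information" pattern, and the choice to scan the first few lines (in case the header is surrounded by decorative separator lines) are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed header shape: "<CodeName> Job Information" alone on a line.
_HEADER = re.compile(r"^\s*(\S+)\s+Job Information\s*$")


def code_from_job_info(plotfile_dir, max_lines: int = 5) -> Optional[str]:
    """Return the code name announced near the top of job_info, if any.

    Hypothetical helper: returns None when there is no job_info file or
    when none of the first `max_lines` lines looks like a header.
    """
    job_info = Path(plotfile_dir) / "job_info"
    if not job_info.is_file():
        return None
    with job_info.open() as fh:
        for _, line in zip(range(max_lines), fh):
            match = _HEADER.match(line)
            if match:
                return match.group(1)
    return None
```

Because applications are free to write anything in job_info (or nothing at all), a None result here says nothing about whether the plotfile is a valid AMReX dataset.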

neutrinoceros · 2020-12-21T15:34:11Z

Nice ! Is this rule general to all boxlib types ?

zingale · 2020-12-21T15:37:32Z

sadly no. Each application is allowed to do whatever they want in the job_info file (and maybe not have one at all). There are probably dozens of AMReX codes, so I certainly don't know them all.

neutrinoceros · 2020-12-21T15:44:57Z

That's already valuable info. The problem here is more a matter of making AMReXDataset invalid. Do you have any experience with that one?

Xarthisius · 2020-12-21T15:54:19Z

Back in the day, when load() encountered a dataset that matched both A.is_valid() and B.is_valid(), where B was a subclass of A, it automatically went with B. If that changed, that sounds like a regression.
I now see that in this case these classes are siblings, so disregard my comment.
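The "prefer the subclass" rule can be sketched as below. This is an illustration of the idea only, not the code in yt/loaders.py; the function most_specific and the classes Base, Parent, Child, and Sibling are hypothetical stand-ins.

```python
def most_specific(candidates):
    """Drop every candidate that is a strict superclass of another candidate."""
    return [
        cls
        for cls in candidates
        if not any(other is not cls and issubclass(other, cls) for other in candidates)
    ]


# Hypothetical stand-in hierarchy, not yt's actual classes.
class Base: ...
class Parent(Base): ...
class Child(Parent): ...
class Sibling(Base): ...


# Parent/Child: the subclass wins, so the result is unambiguous.
assert most_specific([Parent, Child]) == [Child]
# Siblings: neither derives from the other, so both survive and
# the loader would still have to report an ambiguity.
assert most_specific([Child, Sibling]) == [Child, Sibling]
```

The second assertion is the situation in this issue: two classes with no inheritance relation between them both validate, so this rule alone cannot pick one.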

neutrinoceros · 2021-03-29T08:29:46Z

Actually, @Xarthisius, I now think you had the right idea there. Maybe those classes shouldn't be siblings after all: if one code is a direct descendant of another, maybe this hierarchy should be reflected by inheritance here?

matthewturk · 2021-10-12T14:40:59Z

Reading this over, I am left to wonder if possibly a solution would be to provide the base-level amrex frontend, with optional "if detected" customizations. Do we have a handle on how distinct each are, and how those distinctions manifest themselves?

zingale · 2021-10-12T15:25:22Z

I would even be okay with something like ds = yt.load(file, hint="castro")
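One way such a hint could narrow the candidates is a case-insensitive substring match against the class name. The sketch below assumes that matching rule and uses stand-in classes; filter_by_hint is a hypothetical helper, not a yt function.

```python
def filter_by_hint(candidates, hint=None):
    """Keep only candidate classes whose name contains `hint` (case-insensitive).

    Assumed matching rule, for illustration only. With no hint, all
    candidates are returned unchanged.
    """
    if hint is None:
        return list(candidates)
    needle = hint.lower()
    return [cls for cls in candidates if needle in cls.__name__.lower()]


# Stand-in classes that only mimic the names from the error message.
class CastroDataset: ...
class AMReXDataset: ...


candidates = [CastroDataset, AMReXDataset]
assert filter_by_hint(candidates, hint="castro") == [CastroDataset]
assert filter_by_hint(candidates) == candidates
```

A hint that still matches several classes, or none, would have to be reported back to the user rather than resolved silently.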

neutrinoceros · 2021-10-12T15:46:25Z

@zingale I believe there is already some support for this.

ds = yt.load(file, cparam_filename="my_job_info_file")

is supposed to help a lot, because we attempt to determine the most relevant Dataset class using cparam_filename.
Each Dataset class has its own default value for this parameter. As far as I understand, none of these defaults is really justified, because users are free to name these parameter files whatever they want, but removing the defaults is a little delicate because it would break users who rely on them. All things considered, I think the technique I showed here should be advertised in the docs. As a last resort there's always from yt.frontends.boxlib.api import ..., which should probably be documented too.

munkm · 2021-10-12T15:57:35Z

What if instead of hint= we added a kwarg for frontend= to override default detection? This would probably be the most user-intuitive way to support loading and wouldn't require an accessory file to load.

neutrinoceros · 2021-10-12T16:06:43Z

this is starting to look a lot like #3510

matthewturk · 2021-10-12T17:47:25Z

I like some form of frontend/hint/etc.

zingale · 2021-10-12T18:00:55Z

me too. I think frontend might not be the right term here, since there are multiple codes under a single front end, and that's what causes this problem, so maybe code="castro"?

munkm · 2021-10-12T18:55:42Z

code sounds like a good choice! I think we can be more explicit in our docs that frontend does not always mean the same thing as code too. It's easy to forget when some codes do have frontends named for them.

neutrinoceros · 2021-10-12T20:09:10Z

I'm worried code could be somewhat confusing too, as unit_system="code" is also a valid (and completely unrelated) argument for yt.load.

neutrinoceros added the "bug" and "code frontends" (Things related to specific frontends) labels Dec 21, 2020

zingale mentioned this issue Jun 7, 2021

some fixes for recent versions of yt AMReX-Astro/Castro#1878

Merged

5 tasks

neutrinoceros mentioned this issue Sep 14, 2021

Multiple data type candidates #3510

Closed

neutrinoceros mentioned this issue Nov 9, 2021

ENH: Allow user to set data set of the time for ambiguous data #3512

Closed

3 tasks

neutrinoceros linked a pull request Nov 9, 2021 that will close this issue

ENH: Allow user to set data set of the time for ambiguous data #3512

Closed

3 tasks

neutrinoceros mentioned this issue Nov 12, 2021

ENH: implement hint keyword argument for yt.load to help lifting ambiguities in dataformat #3666

Merged

2 tasks

matthewturk closed this as completed in #3666 Nov 17, 2021

yut23 mentioned this issue Apr 5, 2023

ENH: allow hint keyword for yt.load to select superclasses #4397

Merged

2 tasks
