ENH: add support for molecular fields (AMRex) #4908

Merged
merged 1 commit into from
Jun 5, 2024

Conversation

simonguichandut
Contributor

PR Summary

The boxlib dataset parsing (for CASTRO/MAESTROeX) is currently set up for the usual atomic species like "He4" or "C12", or descriptive names like "ash" with no numbers. The parsing breaks for molecules such as "H2O" or "C6H6". This fix escapes those cases.

Ideally, we could have some fancy regex to correctly parse the molecules (and create a proper TeX string), but this is a very niche use of those codes.

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

welcome bot commented May 17, 2024

Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!

lab = r"X\left(%s%s\right)"
tex_label = lab % spec_match.groups()[::-1]

except:
Member

Bare except statements should be avoided, please only catch specific exceptions. The code in the try branch should also be limited to the part that's expected to sometimes raise exceptions.
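
A minimal sketch of what this comment asks for, assuming a hypothetical helper `parse_weight` (it is not part of yt): only the call that is expected to fail sits in the `try` body, and only the specific exception is caught.

```python
def parse_weight(text):
    """Return the integer mass number in `text`, or None if it is not a number.

    Hypothetical helper for illustration; it is not yt's implementation.
    """
    try:
        # Only the call that may legitimately fail lives in the try body.
        weight = int(text)
    except ValueError:
        # int("2O") raises ValueError; other exception types still propagate.
        return None
    return weight


print(parse_weight("12"))  # mass number part of C12
print(parse_weight("2O"))  # trailing part of H2O is not an integer
```

With a bare `except:`, an unrelated problem such as passing `None` (a `TypeError`) would be silently swallowed as well; here it still surfaces.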

@matthewturk
Member

Thanks for doing this, @simonguichandut ! It's tough for us to anticipate all the different representations, so making sure we're up to date on how codes represent molecules and aligning that with your expectations is really important. Thank you!

@zingale added the "code frontends" label May 17, 2024
@simonguichandut changed the title from "escape molecules" to "escape molecules for boxlib datasets" May 17, 2024
@zingale
Member

zingale commented May 17, 2024

I tested this with a Castro and MAESTROeX plotfile and it works.

Member

@neutrinoceros left a comment

Thank you for working on this! Here are a couple of questions and suggestions.


except AttributeError as e:
    # Catch exception cases for e.g. molecules (H2O, C6H6)
    print("Could not parse species ", field)
Member

We do not use print to log errors. You could use ytLogger.error here instead.

Contributor Author

It's not really an "error", and ytLogger prints a whole traceback, so I'm not sure that is the ideal solution.

Member

neutrinoceros commented May 19, 2024

It's not really an "error"

Can you clarify? To me, seeing return inside an except block means we've encountered a recoverable exception (error).

ytLogger prints a whole traceback

I'm not sure what you mean here. ytLogger is meant to print messages to the console and we generally don't include (even partial) tracebacks.

Contributor Author

Sorry, I'm using the wrong terminology. I'm just trying to say that the "error" in this case is not a "big deal". The species is not recognized, so we won't try to produce a nice tex label, that's it. ytLogger prints out a very long error message and the whole call stack, which feels like overkill.

Here's what I get:

python $MAESTROEX_DIR/Util/yt/plotsinglevar.py plt_0000100 "X(H2O)"
--- Logging error ---
Traceback (most recent call last):
  File "/Users/simon/anaconda3/lib/python3.11/logging/__init__.py", line 1110, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/anaconda3/lib/python3.11/logging/__init__.py", line 953, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/simon/anaconda3/lib/python3.11/logging/__init__.py", line 687, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/anaconda3/lib/python3.11/logging/__init__.py", line 377, in getMessage
    msg = msg % self.args
          ~~~~^~~~~~~~~~~
TypeError: not all arguments converted during string formatting
Call stack:
  File "/Users/simon/Desktop/codes/AMREX-Astro/MAESTROeX/Util/yt/plotsinglevar.py", line 94, in <module>
    plot_single_var(args.plotfile, args.outfile, args.variables, use_log, args.norm, args.minimum, args.maximum)
  File "/Users/simon/Desktop/codes/AMREX-Astro/MAESTROeX/Util/yt/plotsinglevar.py", line 52, in plot_single_var
    plots = yt.SlicePlot(ds, norm_axis, var_names)
  File "/Users/simon/Desktop/codes/yt-git/yt/visualization/plot_window.py", line 1821, in __init__
    (bounds, center, display_center) = get_window_parameters(
  File "/Users/simon/Desktop/codes/yt-git/yt/visualization/plot_window.py", line 70, in get_window_parameters
    width = ds.coordinates.sanitize_width(axis, width, None)
  File "/Users/simon/Desktop/codes/yt-git/yt/geometry/coordinates/coordinate_handler.py", line 300, in sanitize_width
    self.ds.index
  File "/Users/simon/Desktop/codes/yt-git/yt/data_objects/static_output.py", line 613, in index
    self.create_field_info()
  File "/Users/simon/Desktop/codes/yt-git/yt/data_objects/static_output.py", line 665, in create_field_info
    self.field_info.setup_fluid_fields()
  File "/Users/simon/Desktop/codes/yt-git/yt/frontends/boxlib/fields.py", line 475, in setup_fluid_fields
    nice_name, tex_label = _nice_species_name(field)
  File "/Users/simon/Desktop/codes/yt-git/yt/frontends/boxlib/fields.py", line 534, in _nice_species_name
    ytLogger.error("Could not parse species ", field)
Message: 'Could not parse species '
Arguments: ('X(H2O)',)

Member

The traceback isn't a feature of the logger, but shows a mistake in your call to ytLogger.error:

Instead of

ytLogger.error("Could not parse species ", field)

write

ytLogger.error("Could not parse species %s", field)
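
The difference can be reproduced with the standard library alone. The sketch below uses a plain `logging` logger as a stand-in for ytLogger (the logger name and format string are assumptions for this example): extra positional arguments are merged into the message with `%` formatting, so the message needs a matching placeholder.

```python
import io
import logging

# A throwaway logger writing to an in-memory stream (stand-in for ytLogger).
stream = io.StringIO()
log = logging.getLogger("demo_species_logger")
log.handlers.clear()
log.propagate = False
log.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
log.addHandler(handler)

field = "X(H2O)"

# Correct: one %s placeholder for the one extra argument.
log.error("Could not parse species %s", field)
logged = stream.getvalue()
print(logged, end="")

# Incorrect: no placeholder, so merging the argument into the message fails.
# Building the record by hand makes the underlying TypeError visible.
record = logging.LogRecord(
    "demo", logging.ERROR, "demo.py", 0, "Could not parse species ", (field,), None
)
try:
    record.getMessage()
    broken_call_failed = False
except TypeError:
    broken_call_failed = True
print(broken_call_failed)
```

Inside a real handler that `TypeError` is caught and reported as the "--- Logging error ---" block shown above, which is why it looked like the logger was printing a traceback on purpose.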

Contributor Author

Oh, thank you!

# if the species field is a descriptive name, then the match
# on the integer will be blank
# modify the tex string in this case to remove spurious tex spacing
lab = r"X\left(^{%s}%s\right)"
if spec_match.groups()[-1] == "":
    lab = r"X\left(%s%s\right)"
tex_label = lab % spec_match.groups()[::-1]

Member

Given the existence of #4845, where this file is moved, we're going to get merge conflicts anyway, so let's make their resolution as easy as possible by keeping the diff minimal.

Suggested change

Contributor Author

Just to make sure I understand, you are only suggesting to remove the blank line here?

Member

I think #4845 is ready to merge, right?
should we wait on anything else in that PR?

Member

Just to make sure I understand, you are only suggesting to remove the blank line here?

Yes

Member

should we wait on anything else in that PR?

I just want to honor the "one week or so" delay I promised in my last call for objections. Reviews came in faster than I anticipated, but there's no harm in waiting a couple days more, I think.

# sometimes we make up descriptive names (e.g. ash).
# In niche cases, we might have a molecule with a number
# in the middle (e.g. H2O). We ignore those.
if any(char.isdigit() for char in field) and field[-2].isdigit():
Member

I don't think I understand what this branch is supposed to capture. Both examples in the comment (C12, H2O) satisfy field[-2].isdigit(), so they both go through, but what I understand from the comment is that C12 should go through and H2O shouldn't. What am I missing?

Contributor Author

simonguichandut commented May 18, 2024

Actually, the strings in the plotfiles are X(C12), X(H2O). Previously, the last character before the closing ")" was always a digit. That is not so in the H2O case, which raises an error in weight = int(weight) a few lines down. Note that this block of code is superfluous for the moment: element and weight are not used.
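
To make the point about the character before the closing parenthesis concrete, here is a small sketch. The helper name `ends_with_digit` is made up for this example; the field strings are the ones quoted in this thread.

```python
def ends_with_digit(field):
    """True if the character just before the closing ')' is a digit."""
    # field looks like "X(C12)": index -1 is ")", index -2 is the last
    # character of the species name.
    return field[-2].isdigit()


for field in ("X(C12)", "X(H2O)", "X(ash)", "X(C6H6)"):
    print(field, ends_with_digit(field))
```

Note that X(C6H6) also ends in a digit, so this check alone does not separate every molecule from an isotope.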

Member

The comment should reflect that, then. Ideally the code would be expressive enough that a comment wouldn't even be needed.
Are you saying the whole block is superfluous? If so, I think it should be removed, or at least not touched in this PR.

Contributor Author

I think it was put in there so eventually we could use the element and weight variables (@zingale ?)

# Here we can, later, add number density using 'element' and
# 'weight' inferred above

Contributor Author

I only made this change so that the code wouldn't break when there is a species like H2O.

Member

Wouldn't the ideal fix be to extend the regexp so that it doesn't break for these cases?

Contributor Author

Yes, I mentioned this when I submitted the PR. I tried for a while but it was too difficult for me. Ideally you would also make the numbers subscripts instead of superscripts in the tex label for molecules. Again, having molecules in the first place is a very niche use of castro/maestro...

Member

Ok, thanks for clarifying. I suggest we slightly rephrase the logging error message to make it clear that it's a missing feature, not an actual bug, so that readers might choose to give it a try too :)

Contributor Author

I wrote a brief something, but please feel free to edit

yt/frontends/boxlib/fields.py (outdated, resolved)
@neutrinoceros added the "enhancement" label May 21, 2024
@neutrinoceros
Member

pre-commit.ci autofix

# In niche cases, we might have a molecule with a number
# in the middle (e.g. X(H2O)). Check if the last character before
# the ) is a digit.
if field[-2].isdigit():
Member

I still find this too fragile, but while digging for a way to make it more robust, I think I ended up solving the problem of parsing molecules. Do you mind if I push to your branch? I could also open a separate PR.

Member

(disclaimer: I'm attending a conference this week so I might not be able to follow up on this immediately)

Contributor Author

Sure, whatever you think is best

Member

OK, I'll set myself a reminder to do this by next week. Thank you for your patience!

@neutrinoceros
Member

I'm going to follow through on my comment from last week now. First, I'll rebase the branch to resolve merge conflicts.

@neutrinoceros
Member

Here goes nothing

@neutrinoceros marked this pull request as draft May 29, 2024 17:31
@neutrinoceros changed the title from "escape molecules for boxlib datasets" to "ENH: add support for molecular fields (AMRex)" May 29, 2024
Comment on lines 505 to 506
# Here we can, later, add number density using 'element' and
# 'weight' inferred above
Member

As I was refactoring this branch, I realized it had really been left in a noop "TODO" state for 10 years and I feel like it's not worth keeping it around given that it creates discussion around code that's actually not used.

Member

We should also remove the same code in CastroFieldInfo.setup_fluid_fields().

Member

Good point. Actually a separable problem so I'll open another PR. Thank you !

@neutrinoceros
Member

I ended up radically re-orienting the PR's implementation. Hopefully the behavior I implemented is suitable.

@neutrinoceros
Member

OK, CI is green now, so let's reopen this for review. @simonguichandut, what do you think?

@yut23
Member

yut23 commented May 31, 2024

The TeX formatting is a bit wonky now: molecules and descriptive names are missing the "X" that the isotopes have, and omega dot for isotopes is rendered like \dot{\omega}[X(He4)]. I think it would be better to have Substance.to_tex() return just the rendered name (e.g. ^{4}He and C_{12}H_{24}) and have setup_fluid_fields() add the parentheses and X/\dot{\omega}.
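
A rough sketch of the split suggested here, assuming hypothetical helper names (`mass_fraction_label`, `omega_dot_label`) and label templates that may differ from what yt actually uses: the species renderer returns only the name, and the caller wraps it.

```python
def mass_fraction_label(species_tex):
    r"""Wrap an already-rendered species name as X(...)."""
    return rf"X\left({species_tex}\right)"


def omega_dot_label(species_tex):
    r"""Wrap an already-rendered species name as \dot{\omega}(...)."""
    return rf"\dot{{\omega}}\left({species_tex}\right)"


helium = "^{4}He"  # what a to_tex()-style renderer would return for He4
print(mass_fraction_label(helium))
print(omega_dot_label(helium))
```

Keeping the wrapping in one place means isotopes, molecules and descriptive names all get the same "X" and omega-dot decoration.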

@neutrinoceros
Member

Good points! Separation of concerns FTW! Better now?

@yut23
Member

yut23 commented May 31, 2024

Yes, much better. Can we also add the tex formatting to CastroFieldInfo?

@neutrinoceros
Member

Done!
I'll rebase the branch again if you're happy with it.

yt/frontends/amrex/fields.py (outdated, resolved)
@yut23 previously approved these changes Jun 3, 2024
@neutrinoceros
Member

Thanks @yut23 for your input! Now I think we should give @simonguichandut a couple of weeks to comment before we merge.

return rf"^{{{count}}}{element}"

def _to_tex_molecule(self) -> str:
    return "".join(rf"{element}_{{{count}}}" for element, count in self._spec)
Contributor Author

The tex labels of molecules have superfluous "0" subscripts, e.g. H2O:
[image: plt_InitData_X(H2O)]

Changing to
return "".join(rf"{element}_{{{count if count!=0 else ''}}}" for element, count in self._spec)
produces the correct tex label.

Member

Nice catch. I adjusted the tests and fixed this bug as you suggested!
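
For readers who want to see the idea end to end, here is a self-contained sketch of rendering a molecule name with subscripts while skipping absent counts. It does not reproduce yt's `Substance` class; the regular expressions and the function name `molecule_to_tex` are assumptions made for this example, and unlike the one-line fix above it drops the subscript entirely when no count is given.

```python
import re

# One element symbol (capital letter, optional lowercase letter) followed by
# an optional count. Assumed pattern, for illustration only.
_ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")
_MOLECULE_LIKE = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def molecule_to_tex(name):
    """Render e.g. 'H2O' as 'H_{2}O', omitting subscripts for missing counts."""
    if not _MOLECULE_LIKE.fullmatch(name):
        raise ValueError(f"not a molecule-like name: {name!r}")
    parts = []
    for element, count in _ELEMENT_COUNT.findall(name):
        # An empty count string means the element appears without a number.
        parts.append(f"{element}_{{{count}}}" if count else element)
    return "".join(parts)


for name in ("H2O", "C6H6", "C12H24"):
    print(name, molecule_to_tex(name))
```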

@simonguichandut
Contributor Author

Looks great to me! I've tested on MAESTROeX plotfiles, with and without molecules, and it works as expected.

@neutrinoceros
Member

neutrinoceros commented Jun 3, 2024

Awesome, I squashed all my commits together, let me know if I should do the same with yours !

@simonguichandut
Contributor Author

Awesome, I squashed all my commits together, let me know if I should do the same with yours !

I'm not sure what this means/does, but either way I will delete my branch and sync with the main repo! :)

@neutrinoceros
Member

I'm proposing to rewrite the branch history so it contains just one commit. This makes debugging with some tools (mainly git bisect) more comfortable if we need to revisit this change in the future. I asked because I don't want to make your life harder in any way, but it sounds like it wouldn't, so I'll just go ahead.

Co-authored-by: Clément Robert <cr52@protonmail.com>
Member

@yut23 left a comment

Looks good to me!

@zingale
Member

zingale commented Jun 4, 2024

How does this handle something like "H2", which is either an isotope (deuterium) or a molecule (molecular H)?
We probably want to default to isotopes.

@simonguichandut
Contributor Author

How does this handle something like "H2", which is either an isotope (deuterium) or a molecule (molecular H)? We probably want to default to isotopes.

H2 does get recognized as the isotope

@neutrinoceros
Member

neutrinoceros commented Jun 4, 2024

Yes, that's intentional, because from what I understand molecules are rarely seen in this context, but it's a shame that there's no way to tell for sure from just looking at the string "H2".
@zingale I could solidify the current behavior by adding a test case for it if you'd like.
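
The ambiguity discussed here can be made concrete with a small classifier. This is only a sketch under assumed naming rules (the patterns and the function name `classify_species` are hypothetical, not yt's code); it checks the isotope pattern first so that "H2" resolves to an isotope.

```python
import re

# Assumed patterns, for illustration only.
_ISOTOPE = re.compile(r"[A-Z][a-z]?\d+")           # e.g. He4, C12, H2
_MOLECULE = re.compile(r"(?:[A-Z][a-z]?\d*){2,}")  # e.g. H2O, C6H6


def classify_species(name):
    """Classify a species name, preferring 'isotope' when it is ambiguous."""
    if _ISOTOPE.fullmatch(name):
        return "isotope"
    if _MOLECULE.fullmatch(name):
        return "molecule"
    return "descriptive"


for name in ("H2", "He4", "H2O", "C6H6", "ash"):
    print(name, classify_species(name))
```

Because the isotope test runs first, a string like "H2" never reaches the molecule branch, which matches the default described in this thread.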

@zingale
Member

zingale commented Jun 5, 2024

no, I think it's fine. Just wanted to double check.

@neutrinoceros
Member

Alright, let's just merge then. Thanks all!

@neutrinoceros merged commit ed776d7 into yt-project:main Jun 5, 2024
13 checks passed
welcome bot commented Jun 5, 2024

Hooray! Congratulations on your first merged pull request! We hope we keep seeing you around! 🎆

@neutrinoceros added this to the 4.4.0 milestone Jun 5, 2024
Labels: code frontends, enhancement
5 participants