
Ray object dl for SPH data: interpolate kernel table for (b/hsml)**2 instead of b/hsml #4783

Merged: 3 commits merged into yt-project:main on Nov 6, 2024

Conversation

nastasha-w
Contributor

@nastasha-w commented Jan 20, 2024

Fixes a (possible) bug in the calculation of path lengths dl for SPH/non-grid data in yt Ray objects. This affects, e.g., column density and absorption spectrum calculations, like those done in the Trident package.

PR Summary

Replace
dl = itab.interpolate_array(b / hsml) * mass / dens / hsml**2
by
dl = itab.interpolate_array((b / hsml)**2) * mass / dens / hsml**2
in the _generate_container_field_sph method of the yt Ray object (https://github.com/yt-project/yt/blob/main/yt/data_objects/selection_objects/ray.py).

This is the proposed fix for the possible bug mentioned in issue #4781.
In short, _generate_container_field_sph retrieves path lengths dl for normalized SPH kernels by linearly interpolating values calculated for a smallish set of (impact parameter / smoothing length) values. However, if I'm not mistaken, the table actually stores those dl values as a function of (impact parameter / smoothing length)^2. That means we need to pass (b / hsml)**2 to the interpolation function, not (b / hsml).
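
To illustrate why the lookup index matters, here is a standalone sketch (not yt's implementation; the kernel normalization, table resolution, and regular spacing in q2 are simplifying assumptions). Because b/hsml < 1 for particles the ray actually intersects, interpolating the table at b/hsml instead of (b/hsml)**2 reads the entry for a larger effective impact parameter and underestimates dl:

import numpy as np

def cubic_spline_w(q):
    # standard 3D cubic-spline kernel with support q = r/hsml < 1
    w = np.where(q < 0.5, 1.0 - 6.0 * q**2 + 6.0 * q**3, 2.0 * (1.0 - q) ** 3)
    return (8.0 / np.pi) * np.where(q < 1.0, w, 0.0)

def line_integral(q2, npts=4001):
    # dimensionless integral of the kernel along a line with squared
    # impact parameter q2 (all lengths in units of hsml)
    smax = np.sqrt(max(1.0 - q2, 0.0))
    s = np.linspace(-smax, smax, npts)
    w = cubic_spline_w(np.sqrt(q2 + s**2))
    return np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(s))  # trapezoid rule

q2_grid = np.linspace(0.0, 1.0, 64)        # table sampled regularly in (b/hsml)**2
table = np.array([line_integral(q2) for q2 in q2_grid])

x = 0.3                                    # b / hsml
right = np.interp(x**2, q2_grid, table)    # correct: index by (b/hsml)**2
wrong = np.interp(x, q2_grid, table)       # bug: index by b/hsml
print(right, wrong)                        # the buggy lookup returns a smaller dl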

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

The only documentation issue I can see here is that if this was indeed a bug, we should probably send out some message warning people who might have used, or be using, Ray objects for papers about the issue. As for tests, @chummels mentioned a possible test on the Slack channel involving SPH-ifying a grid dataset and comparing the surface/column densities retrieved from both.


welcome bot commented Jan 20, 2024

Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!

@matthewturk
Member

I believe that the analysis in #4781 is correct! Thank you for your detailed report, @nastasha-w -- this is a great find, and very appreciated. I am going to mark this approved, but I think for a change of this magnitude (even if it is just one line!) we do need another set of eyes.

matthewturk previously approved these changes Jan 23, 2024
@neutrinoceros
Member

This is completely out of my expertise, so I'll leave it to others to review; I just wanted to ask anybody who'd want to push the button to please triage this to 4.3.1 or 4.4.0 depending on the expected impact! Thanks!

@nastasha-w
Contributor Author

@matthewturk Thanks! I agree it's best to be sure about this. I did add a test for Ray objects to this pull request. I don't think it's entirely complete; for one thing, the mass / density factor in the dl calculation is always 1 in the mock test dataset I used. However, it does pass on this branch and fail on the main branch (as of 2 main commits ago, anyway).

The new test is in test_rays.py: test_ray_particle2(). It just uses a bunch of asserts, and returns None if all the checks pass. Is there anything else I need to do to incorporate this into automatic testing?

@nastasha-w
Contributor Author

pre-commit.ci autofix

@nastasha-w
Contributor Author

@matthewturk @jzuhone @langmm @chummels could you take a look at this pull request and approve it or give me some feedback? This slipped my mind for quite a while, but I do think there is an error in the code that this fixes. I merged in the current (well, very recent at least) main branch, and the tests are passing now.

@neutrinoceros added this to the 4.4.0 milestone Jul 20, 2024
@nastasha-w
Contributor Author

@neutrinoceros can I ask what the 4.4.0 milestone means? The GitHub docs didn't seem very helpful on this.

@neutrinoceros
Member

It means I think it's reasonable to expect this will be merged before we release yt 4.4.0, but it's more of a wishlist and less of a hard requirement.

jzuhone previously approved these changes Aug 16, 2024
@jzuhone
Contributor

jzuhone commented Aug 16, 2024

@nastasha-w I am satisfied with this. @chummels, let us know what you think; @nastasha-w added some tests which I think are helpful.

@nastasha-w
Contributor Author

@jzuhone thanks!

@jzuhone
Contributor

jzuhone commented Sep 6, 2024

@chummels we really need you to have a look at this. We should absolutely merge it early next week.

@neutrinoceros
Member

Marking this as a blocker, as per John's message on the mailing list.

@nastasha-w
Contributor Author

pre-commit.ci autofix

@nastasha-w
Contributor Author

nastasha-w commented Sep 19, 2024

The latest PR commit should close #4991, assuming the checks/tests pass.
Edit: based on @neutrinoceros's comment below, I meant commit, not PR.

@chrishavlin linked an issue Sep 19, 2024 that may be closed by this pull request
@neutrinoceros
Member

The latest PR

Genuine question: do you mean the current state of this PR? (#4783)

@neutrinoceros
Member

neutrinoceros commented Sep 19, 2024

Never mind, the message you wrote on the issue is unambiguous... and this PR is already linked.

@neutrinoceros
Member

@nastasha-w could you rebase this on main to refresh CI, please?

@nastasha-w
Contributor Author

nastasha-w commented Oct 8, 2024

I seem to have completely messed something up here... GitHub now seems to be attributing all the changes on main to this branch, and some of the pre-commit.ci comments suggest some files got messed up too.

Update: OK, the messed-up file seems to have been one function that just got completely duplicated in the same file somewhere. That's fixed now.

@nastasha-w
Contributor Author

pre-commit.ci autofix

@nastasha-w
Contributor Author

Welp, I ended up just locally saving copies of the files I actually changed in a different directory and pretty much rebase-nuking the whole branch back to the first commit. It worked well enough given the small number of changes involved. This does still incorporate the issue #4991 solution, since in the new history, I jumped straight to importing the functions I needed for the Ray tests from the testing.py file on the current (5f44989) main branch.

I'm sorry to anyone whose reviews or tests got messed up.

@chummels
Member

OK, apologies for all of the delays on this. The main holdup with this PR on my side has been getting an old PR I implemented a few years back to work properly. I realize that, code-wise, this PR is very minor, but it has significant implications for the behavior of the ray object, and thus the entire Trident codebase. When I’ve done tests locally on FIRE datasets, I’m finding that it increases the values of the ray sampling by almost an order of magnitude (median value 8). Thus I want to ensure this is correct before merging.

The primary way I can envision testing this change to ensure consistent behavior between grid-based and particle-based snapshots is simply to replicate a grid-based simulation snapshot into a particle-based simulation snapshot, apply the same code on both, and see if the PR gives the same results between the two. I wrote some code a few years ago that does just this, converting a grid-based sim output into a particle-based sim output, but it never ended up getting merged: PR #2187. It works by simply Monte Carlo sampling a grid-based code into discrete particles, then loading those particles into a dataset using the load_particles() functionality. Amazingly, the PR basically works despite its age (one just has to change the import for the load_particles() call to yt.loaders). One gets good behavior with this PR when one just increases the number of particles beyond the 1e5 value to 1e6 or greater.

The main challenge I’m facing is that when I send a ray object through a dataset that has been created using load_particles(), it does not populate the dts field, which is the field that stores the “path length” of the ray through a fluid element (either a geometric path length in the case of grid codes, or a pseudo-path length in the case of particle-based codes, which tries to capture something about the path through the smoothing kernel and proximity to the particle it is sampling). So I’m having a difficult time making this direct comparison between grid- and particle-based codes to ensure that Nastasha’s PR is behaving correctly.
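
For context, this is roughly how the dts field feeds into a column density (a standalone sketch using a small fake grid dataset from yt.testing, and assuming dts stores each element's path length as a fraction of the total ray length):

import numpy as np
from yt.testing import fake_random_ds

ds = fake_random_ds(16)
start = np.array([0.0, 0.0, 0.0])
end = np.array([1.0, 1.0, 1.0])
ray = ds.ray(start, end)

ray_length = ds.quan(np.linalg.norm(end - start), "code_length")
dl = ray["dts"] * ray_length              # physical path length per cell
N = (ray["gas", "density"] * dl).sum()    # column density along the ray
print(N.to("g/cm**2"))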

If anyone has any ideas on how to get datasets produced using load_particles() to behave properly with the ray object, I think this will be an easy test on whether the current PR works OK.

For reference, this is the code I'm trying to use:

import numpy as np
import yt
# monte_carlo_sample comes from the unmerged PR #2187, not from yt's main branch
from yt.data_objects.data_containers import monte_carlo_sample

fn = '~/src/yt-data/IsolatedGalaxy/galaxy0030/galaxy0030'
ds = yt.load(fn)
# resample the grid dataset into 1e5 discrete particles
ds_part = monte_carlo_sample(ds, n_samples=int(1e5))

# send the same diagonal ray through the grid and particle datasets
ray = ds.r[ds.domain_right_edge:ds.domain_left_edge]
ray_part = ds_part.r[ds_part.domain_right_edge:ds_part.domain_left_edge]

print(np.median(ray['dts']))
print(np.median(ray_part['dts']))

And the error that it invokes is:

Traceback (most recent call last):
  File "/Users/chummels/scratch/Nastasha/grid2part/grid2part.py", line 13, in <module>
    print(np.median(ray_part['dts']))
                    ~~~~~~~~^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/data_containers.py", line 234, in __getitem__
    self.field_data[f] = self.ds.arr(self._generate_container_field(f))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/selection_objects/ray.py", line 193, in _generate_container_field
    return self._generate_container_field_grid(field)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/selection_objects/ray.py", line 199, in _generate_container_field_grid
    return self._current_chunk.dtcoords
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/geometry/geometry_handler.py", line 271, in cacheable_func
    tr = self._accumulate_values(n[1:])
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/geometry/geometry_handler.py", line 306, in _accumulate_values
    f = getattr(obj, mname)
        ^^^^^^^^^^^^^^^^^^^
AttributeError: 'ParticleContainer' object has no attribute 'select_dtcoords'. Did you mean: 'select_fcoords'?

@nastasha-w
Contributor Author

@chummels yeah, I ran into that same problem in my own tests... I think the check on which dt/dts generation method to use doesn't include StreamParticlesDataset among the dataset types that would make something an SPH simulation, so it tries to use the grid methods, which then fail.
I just added those fields by calling the SPH methods 'by hand':

import numpy as np

from yt.testing import fake_sph_grid_ds

ds = fake_sph_grid_ds(hsml_factor=1.0)
ds.kernel_name = "cubic"

## Ray through the one particle at (0.5, 0.5, 0.5):
## test basic kernel integration
eps = 0.0  # 1e-7
start0 = np.array((1.0 + eps, 0.0, 0.5))
end0 = np.array((0.0, 1.0 + eps, 0.5))
ray0 = ds.ray(start0, end0)
b0 = np.array([np.sqrt(2.0) * eps])  # impact parameter w.r.t. the particle
hsml0 = np.array([0.05])             # smoothing length of that particle
len0 = np.sqrt(np.sum((end0 - start0) ** 2))  # total ray length
# for a ParticleDataset like this one, the Ray object attempts
# to generate the 't' and 'dts' fields using the grid method,
# so call the SPH field generator directly instead
ray0.field_data["t"] = ray0.ds.arr(ray0._generate_container_field_sph("t"))
ray0.field_data["dts"] = ray0.ds.arr(ray0._generate_container_field_sph("dts"))

@nastasha-w
Contributor Author

Specifically, in the Ray object, the field generator chooses whether to use the SPH version of the field generator based on whether the dataset is an SPHDataset object. In other situations, like for projection maps, the CartesianCoordinateHandler chooses whether to use the SPH methods based on (excerpt, with the preceding branches elided):

field = data_source._determine_fields(field)[0]
finfo = data_source.ds.field_info[field]
particle_datasets = (ParticleDataset, StreamParticlesDataset)
is_sph_field = finfo.is_sph_field
...
elif isinstance(data_source.ds, particle_datasets) and is_sph_field:
    # do SPH stuff

(Note that SPHDataset is a subclass of ParticleDataset.)

Would it be worth changing that bit of the code in Ray objects to make testing easier in the future?
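
For concreteness, a sketch of what that broadened check could look like (illustrative only, not a proposed patch; the helper function is hypothetical, and the import locations are assumed from yt's current layout):

from yt.data_objects.static_output import ParticleDataset
from yt.frontends.stream.data_structures import StreamParticlesDataset

def _should_use_sph_path(ds, finfo):
    # hypothetical helper: mirror the CartesianCoordinateHandler test so that
    # any particle dataset with SPH fields takes the SPH path in Ray
    particle_datasets = (ParticleDataset, StreamParticlesDataset)
    return isinstance(ds, particle_datasets) and finfo.is_sph_field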

@chummels
Member

chummels commented Nov 1, 2024

@nastasha-w: Thanks for the suggestions! I tried to add in the "by-hand" call to generate the dts field, but I still failed, albeit with a different error. I guess when you use load_particles() it creates a stream particle dataset, which lacks a kernel? Although I thought that by specifying a smoothing length, it effectively has one already, so I'm confused. Error below. Maybe I am just doing this wrong, or maybe it's not fully implemented?

Traceback (most recent call last):
  File "/Users/chummels/scratch/Nastasha/grid2part/grid2part.py", line 12, in <module>
    ray_part.field_data["dts"] = ray_part.ds.arr(ray_part._generate_container_field_sph("dts"))
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/selection_objects/ray.py", line 225, in _generate_container_field_sph
    itab = SPHKernelInterpolationTable(self.ds.kernel_name)
                                       ^^^^^^^^^^^^^^^^^^^
AttributeError: 'StreamParticlesDataset' object has no attribute 'kernel_name'

@nastasha-w
Contributor Author

nastasha-w commented Nov 1, 2024

@chummels That's what this line is for:

 ds.kernel_name = "cubic"

the StreamParticlesDataset doesn't automatically do a bunch of SPH setup.

@neutrinoceros
Member

Guys, as much as it pains me to say this: if this is indeed a bugfix, it could easily be released in yt 4.4.1. Should this really block 4.4.0? We are more than a month beyond "schedule" (there's no actual schedule, but finalizing yt 4.4.0 has been on my todo list for 2 months now and I'd really like to get it out the door).

@chrishavlin
Contributor

Guys, as much as it pains me to say this: if this is indeed a bugfix, it could easily be release in yt 4.4.1. Should this really block 4.4.0 ?

I think the motivation for getting this out in a feature release is that it significantly changes the output for these methods, so it feels a bit bigger than a standard bugfix -- even though the API isn't breaking, it is in some sense breaking expected behavior relative to prior behavior. That said, IMO it'd be OK to release in 4.4.1, but it'd be good to send out some broader messages to the email list and Slack to note the change when/if it goes in (which I'm happy to help send out). And if an email goes out for 4.4.0, it might be worth adding an extra note to bring attention to this PR just to put it on people's radar for 4.4.1.

@jzuhone
Contributor

jzuhone commented Nov 5, 2024

@neutrinoceros this is a big enough bug that it warrants inclusion in a feature release. We need to hold 4.4.0 for this. I will chat with @chummels about this today.

@neutrinoceros
Member

I mean nothing forbids doing a 4.5.0 soon after 4.4.0.

@jzuhone
Contributor

jzuhone commented Nov 6, 2024

@neutrinoceros I talked to @chummels today, and I can predict with confidence that we can get this into 4.4.0.

@chummels
Member

chummels commented Nov 6, 2024

OK, I was speaking at a dark sky festival all weekend, which precluded me from working on this until today. But thanks to @nastasha-w's suggestion, I was able to get my testing code to work. As discussed above, I take a grid-based dataset and create a mirrored particle-based dataset, populating it by use of a Monte Carlo method. Then I simply place a number of random ray objects in those two datasets, using the same ray start and end coordinates for each dataset. Next, I calculate the column density for each ray as the product of the gas density in each cell/particle intersected by the ray and the path length through it (i.e., the dts field), summed along the ray. Lastly, I plot the probability distribution function of these rays' column densities and compare between the grid and particle datasets. I do this both before and after this PR.
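
Here is a hedged sketch of that procedure (not the actual script: ds is the grid dataset and ds_part its Monte Carlo particle counterpart, both assumed to be loaded already; dts is assumed to hold path lengths as a fraction of the total ray length, and the particle dataset uses the by-hand SPH generator discussed earlier in the thread):

import numpy as np

def column_densities(ds, n_rays=1000, sph=False, seed=0):
    rng = np.random.default_rng(seed)
    le, re = ds.domain_left_edge.v, ds.domain_right_edge.v
    cols = []
    for _ in range(n_rays):
        start = le + rng.random(3) * (re - le)
        end = le + rng.random(3) * (re - le)
        ray = ds.ray(start, end)
        if sph:
            # work around the grid-based dispatch for stream particle datasets
            dts = ray.ds.arr(ray._generate_container_field_sph("dts"))
        else:
            dts = ray["dts"]
        length = ds.quan(np.linalg.norm(end - start), "code_length")
        # column density: density * path length, summed over intersected elements
        cols.append((ray["gas", "density"] * dts * length).sum().to("g/cm**2"))
    return np.array([c.v for c in cols])

# ds_part.kernel_name = "cubic"
# N_grid = column_densities(ds)
# N_part = column_densities(ds_part, sph=True)
# compare e.g. histograms of np.log10(N_grid) and np.log10(N_part)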

The column density distribution should be the same in the grid and the particle datasets, so we can use this test to identify whether there is a problem and whether the PR addresses it. I tested this out on two datasets, the IsolatedGalaxy dataset and the enzo_cosmology_plus dataset, using 1000 random sightlines to define the ray objects. Here are the results:

[Figure: column density PDFs for IsolatedGalaxy, grid vs. particle dataset, before and after this PR]

[Figure: column density PDFs for enzo_cosmology_plus, grid vs. particle dataset, before and after this PR]

In both datasets we see the same behavior: prior to this PR, the bug revealed itself in particle-based datasets reporting column densities too low by a factor of 3 or more. After the PR, these differences between particle- and grid-based datasets are, statistically speaking, gone. Thus, I am convinced that this PR addresses the bug, and it should absolutely be included in the codebase. Many apologies for all of the holdups on my end, and thank you for your patience. I just wanted to ensure that this was correct, as this will change previously published results for a number of papers that used Trident to generate spectra with particle-based datasets like FIRE, TNG, EAGLE, etc. I think we need to have a conversation about how to report this bugfix to the community, but I don't want it to hold up this PR any longer.

Many many thanks go out to @nastasha-w for catching this subtle and deep bug, and for coming up with a way to resolve the problem. Great work!

@nastasha-w
Contributor Author

@chummels Thanks for doing this detailed testing!

@jzuhone merged commit 7703ed5 into yt-project:main Nov 6, 2024
13 checks passed
@neutrinoceros removed the blocker (highest priority level) label Nov 6, 2024
@neutrinoceros
Member

Thanks, all!

I think we need to have a conversation about how to report this bugfix to the community, but I don't want it to hold up this PR any longer.

Yes please. I usually just include the PR title in the release notes, except for highlights, and this definitely feels like one. I'd like to include a 2 or 3 line summary of the change, though I trust it would be best handled by you guys.

Successfully merging this pull request may close these issues.

CLN: remove duplicate functions in two SPH backend PRs