
Ray object dl for SPH data: interpolate kernel table for (b/hsml)**2 instead of b/hsml #4783

Merged: 3 commits merged into yt-project:main on Nov 6, 2024

Conversation

nastasha-w
Contributor

@nastasha-w commented Jan 20, 2024

Fixes a (possible) bug in the calculation of path lengths dl for SPH/non-grid data in yt Ray objects. This affects, e.g., column density and absorption spectrum calculations, like those done in the Trident package.

PR Summary

Replace
dl = itab.interpolate_array(b / hsml) * mass / dens / hsml**2
by
dl = itab.interpolate_array((b / hsml)**2) * mass / dens / hsml**2
in the _generate_container_field_sph method of the yt Ray object (https://github.com/yt-project/yt/blob/main/yt/data_objects/selection_objects/ray.py).

This is the proposed fix for the possible bug mentioned in issue #4781.
In short, _generate_container_field_sph retrieves path lengths dl for normalized SPH kernels by linearly interpolating values calculated for a smallish set of (impact parameter / smoothing length) values. However, if I'm not mistaken, the table actually stores those dl values as a function of (impact parameter / smoothing length)^2. That means we need to pass (b / hsml)**2 to the interpolation function, not (b / hsml).
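
To illustrate why the lookup index matters, here is a standalone sketch (not yt's implementation; the kernel normalization, table resolution, and regular spacing in q2 are simplifying assumptions). Because b/hsml < 1 for particles the ray actually intersects, interpolating the table at b/hsml instead of (b/hsml)**2 reads the entry for a larger effective impact parameter and underestimates dl:

import numpy as np

def cubic_spline_w(q):
    # standard 3D cubic-spline kernel with support q = r/hsml < 1
    w = np.where(q < 0.5, 1.0 - 6.0 * q**2 + 6.0 * q**3, 2.0 * (1.0 - q) ** 3)
    return (8.0 / np.pi) * np.where(q < 1.0, w, 0.0)

def line_integral(q2, npts=4001):
    # dimensionless integral of the kernel along a line with squared
    # impact parameter q2 (all lengths in units of hsml)
    smax = np.sqrt(max(1.0 - q2, 0.0))
    s = np.linspace(-smax, smax, npts)
    w = cubic_spline_w(np.sqrt(q2 + s**2))
    return np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(s))  # trapezoid rule

q2_grid = np.linspace(0.0, 1.0, 64)        # table sampled regularly in (b/hsml)**2
table = np.array([line_integral(q2) for q2 in q2_grid])

x = 0.3                                    # b / hsml
right = np.interp(x**2, q2_grid, table)    # correct: index by (b/hsml)**2
wrong = np.interp(x, q2_grid, table)       # bug: index by b/hsml
print(right, wrong)                        # the buggy lookup returns a smaller dl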

PR Checklist

  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

The only documentation issue I can see here is that if this was indeed a bug, we should probably send out some message warning people who might have used, or be using, Ray objects for papers about the issue. As for tests, @chummels mentioned a possible test on the Slack channel involving SPH-ifying a grid dataset and comparing the surface/column densities retrieved from both.


welcome bot commented Jan 20, 2024

Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!

@matthewturk
Member

I believe that the analysis in #4781 is correct! Thank you for your detailed report, @nastasha-w -- this is a great find, and very appreciated. I am going to mark this approved, but I think for a change of this magnitude (even if it is just one line!) we do need another set of eyes.

matthewturk previously approved these changes Jan 23, 2024
@neutrinoceros
Member

This is completely out of my expertise, so I'll leave it to others to review; I just wanted to ask anybody who'd want to push the button to please triage this to 4.3.1 or 4.4.0 depending on the expected impact! Thanks!

@nastasha-w
Contributor Author

@matthewturk Thanks! I agree it's best to be sure about this. I did add a test for Ray objects to this pull request. I don't think it's entirely complete; for one thing, the mass / density factor in the dl calculation is always 1 in the mock test dataset I used. However, it does pass on this branch and fail on the main branch (as of 2 main commits ago, anyway).

The new test is in test_rays.py: test_ray_particle2(). It just uses a bunch of asserts, and returns None if all the checks pass. Is there anything else I need to do to incorporate this into automatic testing?

@nastasha-w
Contributor Author

pre-commit.ci autofix

@nastasha-w
Contributor Author

@matthewturk @jzuhone @langmm @chummels could you take a look at this pull request and approve it or give me some feedback? This slipped my mind for quite a while, but I do think there is an error in the code that this fixes. I merged in the current (well, very recent at least) main branch, and the tests are passing now.

@neutrinoceros added this to the 4.4.0 milestone Jul 20, 2024
@nastasha-w
Contributor Author

@neutrinoceros can I ask what the 4.4.0 milestone means? The GitHub docs didn't seem very helpful on this.

@neutrinoceros
Member

It means I think it's reasonable to expect this will be merged before we release yt 4.4.0, but it's more of a wishlist and less of a hard requirement.

jzuhone previously approved these changes Aug 16, 2024
@jzuhone
Contributor

jzuhone commented Aug 16, 2024

@nastasha-w I am satisfied with this. @chummels, let us know what you think; @nastasha-w added some tests which I think are helpful.

@nastasha-w
Contributor Author

@jzuhone thanks!

@jzuhone
Contributor

jzuhone commented Sep 6, 2024

@chummels we really need you to have a look at this. We should absolutely merge it early next week.

@neutrinoceros
Member

Marking this as a blocker, as per John's message on the mailing list.

@nastasha-w
Contributor Author

pre-commit.ci autofix

@nastasha-w
Contributor Author

nastasha-w commented Sep 19, 2024

The latest PR commit should close #4991, assuming the checks/tests pass.
Edit: based on @neutrinoceros's comment below, I meant commit, not PR.

@chrishavlin linked an issue Sep 19, 2024 that may be closed by this pull request
@neutrinoceros
Member

The latest PR

Genuine question: do you mean the current state of this PR? (#4783)

@neutrinoceros
Member

neutrinoceros commented Sep 19, 2024

Never mind, the message you wrote on the issue is unambiguous... and this PR is already linked.

@neutrinoceros
Member

@nastasha-w could you rebase this on main to refresh CI, please?

@nastasha-w
Contributor Author

nastasha-w commented Oct 8, 2024

I seem to have completely messed something up here... GitHub now seems to be attributing all the changes on main to this branch, and some of the pre-commit.ci comments suggest some files got messed up too.

Update: OK, the messed-up file seems to have been one function that just got completely duplicated in the same file somewhere. That's fixed now.

@nastasha-w
Contributor Author

pre-commit.ci autofix

@nastasha-w
Contributor Author

Welp, I ended up just locally saving copies of the files I actually changed in a different directory and pretty much rebase-nuking the whole branch back to the first commit. It worked well enough given the small number of changes involved. This does still incorporate the issue #4991 solution, since in the new history, I jumped straight to importing the functions I needed for the Ray tests from the testing.py file on the current (5f44989) main branch.

I'm sorry to anyone whose reviews or tests got messed up.

@chummels
Member

OK, apologies for all of the delays on this. The main holdup with this PR on my side has been getting an old PR I implemented a few years back to work properly. I realize that, code-wise, this PR is very minor, but it has significant implications for the behavior of the ray object, and thus the entire Trident codebase. When I’ve done tests locally on FIRE datasets, I’m finding that it increases the values of the ray sampling by almost an order of magnitude (median value 8). Thus I want to ensure this is correct before merging.

The primary way I can envision testing this change to ensure consistent behavior between grid-based and particle-based snapshots is simply to replicate a grid-based simulation snapshot into a particle-based simulation snapshot, apply the same code on both, and see if the PR gives the same results between the two. I wrote some code a few years ago that does just this, converting a grid-based sim output into a particle-based sim output, but it never ended up getting merged: PR #2187. It works by simply Monte Carlo sampling a grid-based code into discrete particles, then loading those particles into a dataset using the load_particles() functionality. Amazingly, the PR basically works despite its age (one just has to change the import for the load_particles() call to yt.loaders). One gets good behavior with this PR when one just increases the number of particles beyond the 1e5 value to 1e6 or greater.

The main challenge I’m facing is that when I send a ray object through a dataset that has been created using load_particles(), it does not populate the dts field, which is the field that stores the “path length” of the ray through a fluid element (either a geometric path length in the case of grid codes, or a pseudo-path length in the case of particle-based codes, which tries to capture something about the path through the smoothing kernel and proximity to the particle it is sampling). So I’m having a difficult time making this direct comparison between grid- and particle-based codes to ensure that Nastasha’s PR is behaving correctly.
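
For context, this is roughly how the dts field feeds into a column density (a standalone sketch using a small fake grid dataset from yt.testing, and assuming dts stores each element's path length as a fraction of the total ray length):

import numpy as np
from yt.testing import fake_random_ds

ds = fake_random_ds(16)
start = np.array([0.0, 0.0, 0.0])
end = np.array([1.0, 1.0, 1.0])
ray = ds.ray(start, end)

ray_length = ds.quan(np.linalg.norm(end - start), "code_length")
dl = ray["dts"] * ray_length              # physical path length per cell
N = (ray["gas", "density"] * dl).sum()    # column density along the ray
print(N.to("g/cm**2"))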

If anyone has any ideas on how to get datasets produced using load_particles() to behave properly with the ray object, I think this will be an easy test on whether the current PR works OK.

For reference, this is the code I'm trying to use:

import numpy as np
import yt
# monte_carlo_sample comes from the unmerged PR #2187, not from yt's main branch
from yt.data_objects.data_containers import monte_carlo_sample

fn = '~/src/yt-data/IsolatedGalaxy/galaxy0030/galaxy0030'
ds = yt.load(fn)
# resample the grid dataset into 1e5 discrete particles
ds_part = monte_carlo_sample(ds, n_samples=int(1e5))

# send the same diagonal ray through the grid and particle datasets
ray = ds.r[ds.domain_right_edge:ds.domain_left_edge]
ray_part = ds_part.r[ds_part.domain_right_edge:ds_part.domain_left_edge]

print(np.median(ray['dts']))
print(np.median(ray_part['dts']))

And the error that it invokes is:

Traceback (most recent call last):
  File "/Users/chummels/scratch/Nastasha/grid2part/grid2part.py", line 13, in <module>
    print(np.median(ray_part['dts']))
                    ~~~~~~~~^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/data_containers.py", line 234, in __getitem__
    self.field_data[f] = self.ds.arr(self._generate_container_field(f))
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/selection_objects/ray.py", line 193, in _generate_container_field
    return self._generate_container_field_grid(field)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/selection_objects/ray.py", line 199, in _generate_container_field_grid
    return self._current_chunk.dtcoords
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/geometry/geometry_handler.py", line 271, in cacheable_func
    tr = self._accumulate_values(n[1:])
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/geometry/geometry_handler.py", line 306, in _accumulate_values
    f = getattr(obj, mname)
        ^^^^^^^^^^^^^^^^^^^
AttributeError: 'ParticleContainer' object has no attribute 'select_dtcoords'. Did you mean: 'select_fcoords'?

@nastasha-w
Contributor Author

@chummels yeah, I ran into that same problem in my own tests... I think the check on which dt/dts generation method to use doesn't include StreamParticlesDataset among the dataset types that would make something an SPH simulation, so it tries to use the grid methods, which then fail.
I just added those fields by calling the SPH methods 'by hand':

import numpy as np

from yt.testing import fake_sph_grid_ds

ds = fake_sph_grid_ds(hsml_factor=1.0)
ds.kernel_name = "cubic"

## Ray through the one particle at (0.5, 0.5, 0.5):
## test basic kernel integration
eps = 0.0  # 1e-7
start0 = np.array((1.0 + eps, 0.0, 0.5))
end0 = np.array((0.0, 1.0 + eps, 0.5))
ray0 = ds.ray(start0, end0)
b0 = np.array([np.sqrt(2.0) * eps])  # impact parameter w.r.t. the particle
hsml0 = np.array([0.05])             # smoothing length of that particle
len0 = np.sqrt(np.sum((end0 - start0) ** 2))  # total ray length
# for a ParticleDataset like this one, the Ray object attempts
# to generate the 't' and 'dts' fields using the grid method,
# so call the SPH field generator directly instead
ray0.field_data["t"] = ray0.ds.arr(ray0._generate_container_field_sph("t"))
ray0.field_data["dts"] = ray0.ds.arr(ray0._generate_container_field_sph("dts"))

@nastasha-w
Contributor Author

Specifically, in the Ray object, the field generator chooses whether to use the SPH version of the field generator based on whether the dataset is an SPHDataset object. In other situations, like for projection maps, the CartesianCoordinateHandler chooses whether to use the SPH methods based on (excerpt, with the preceding branches elided):

field = data_source._determine_fields(field)[0]
finfo = data_source.ds.field_info[field]
particle_datasets = (ParticleDataset, StreamParticlesDataset)
is_sph_field = finfo.is_sph_field
...
elif isinstance(data_source.ds, particle_datasets) and is_sph_field:
    # do SPH stuff

(Note that SPHDataset is a subclass of ParticleDataset.)

Would it be worth changing that bit of the code in Ray objects to make testing easier in the future?
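
For concreteness, a sketch of what that broadened check could look like (illustrative only, not a proposed patch; the helper function is hypothetical, and the import locations are assumed from yt's current layout):

from yt.data_objects.static_output import ParticleDataset
from yt.frontends.stream.data_structures import StreamParticlesDataset

def _should_use_sph_path(ds, finfo):
    # hypothetical helper: mirror the CartesianCoordinateHandler test so that
    # any particle dataset with SPH fields takes the SPH path in Ray
    particle_datasets = (ParticleDataset, StreamParticlesDataset)
    return isinstance(ds, particle_datasets) and finfo.is_sph_field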

@chummels
Member

chummels commented Nov 1, 2024

@nastasha-w: Thanks for the suggestions! I tried to add in the "by-hand" call to generate the dts field, but I still failed, albeit with a different error. I guess when you use load_particles() it creates a stream particle dataset, which lacks a kernel? Although I thought that by specifying a smoothing length, it effectively has one already, so I'm confused. Error below. Maybe I am just doing this wrong, or maybe it's not fully implemented?

Traceback (most recent call last):
  File "/Users/chummels/scratch/Nastasha/grid2part/grid2part.py", line 12, in <module>
    ray_part.field_data["dts"] = ray_part.ds.arr(ray_part._generate_container_field_sph("dts"))
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chummels/src/yt/yt/data_objects/selection_objects/ray.py", line 225, in _generate_container_field_sph
    itab = SPHKernelInterpolationTable(self.ds.kernel_name)
                                       ^^^^^^^^^^^^^^^^^^^
AttributeError: 'StreamParticlesDataset' object has no attribute 'kernel_name'

@nastasha-w
Contributor Author

nastasha-w commented Nov 1, 2024

@chummels That's what this line is for:

 ds.kernel_name = "cubic"

the StreamParticlesDataset doesn't automatically do a bunch of SPH setup.

@neutrinoceros
Member

Guys, as much as it pains me to say this: if this is indeed a bugfix, it could easily be released in yt 4.4.1. Should this really block 4.4.0? We are more than a month beyond "schedule" (there's no actual schedule, but finalizing yt 4.4.0 has been on my todo list for 2 months now and I'd really like to get it out the door).

@chrishavlin
Contributor

Guys, as much as it pains me to say this: if this is indeed a bugfix, it could easily be release in yt 4.4.1. Should this really block 4.4.0 ?

I think the motivation for getting this out in a feature release is that it significantly changes the output for these methods, so it feels a bit bigger than a standard bugfix -- even though the API isn't breaking, it is in some sense breaking expected behavior relative to prior behavior. That said, IMO it'd be OK to release in 4.4.1, but it'd be good to send out some broader messages to the email list and Slack to note the change when/if it goes in (which I'm happy to help send out). And if an email goes out for 4.4.0, it might be worth adding an extra note to bring attention to this PR just to put it on people's radar for 4.4.1.

@jzuhone
Contributor

jzuhone commented Nov 5, 2024

@neutrinoceros this is a big enough bug that it warrants inclusion in a feature release. We need to hold 4.4.0 for this. I will chat with @chummels about this today.

@neutrinoceros
Member

I mean nothing forbids doing a 4.5.0 soon after 4.4.0.

@jzuhone
Contributor

jzuhone commented Nov 6, 2024

@neutrinoceros I talked to @chummels today, and I can predict with confidence that we can get this into 4.4.0.

@chummels
Member

chummels commented Nov 6, 2024

OK, I was speaking at a dark sky festival all weekend, which precluded me from working on this until today. But thanks to @nastasha-w's suggestion, I was able to get my testing code to work. As discussed above, I take a grid-based dataset and create a mirrored particle-based dataset, populating it by use of a Monte Carlo method. Then I simply place a number of random ray objects in those two datasets, using the same ray start and end coordinates for each dataset. Next, I calculate the column density for each ray as the product of the gas density in each cell/particle intersected by the ray and the path length through it (i.e., the dts field), summed along the ray. Lastly, I plot the probability distribution function of these rays' column densities and compare between the grid and particle datasets. I do this both before and after this PR.
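
Here is a hedged sketch of that procedure (not the actual script: ds is the grid dataset and ds_part its Monte Carlo particle counterpart, both assumed to be loaded already; dts is assumed to hold path lengths as a fraction of the total ray length, and the particle dataset uses the by-hand SPH generator discussed earlier in the thread):

import numpy as np

def column_densities(ds, n_rays=1000, sph=False, seed=0):
    rng = np.random.default_rng(seed)
    le, re = ds.domain_left_edge.v, ds.domain_right_edge.v
    cols = []
    for _ in range(n_rays):
        start = le + rng.random(3) * (re - le)
        end = le + rng.random(3) * (re - le)
        ray = ds.ray(start, end)
        if sph:
            # work around the grid-based dispatch for stream particle datasets
            dts = ray.ds.arr(ray._generate_container_field_sph("dts"))
        else:
            dts = ray["dts"]
        length = ds.quan(np.linalg.norm(end - start), "code_length")
        # column density: density * path length, summed over intersected elements
        cols.append((ray["gas", "density"] * dts * length).sum().to("g/cm**2"))
    return np.array([c.v for c in cols])

# ds_part.kernel_name = "cubic"
# N_grid = column_densities(ds)
# N_part = column_densities(ds_part, sph=True)
# compare e.g. histograms of np.log10(N_grid) and np.log10(N_part)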

The column density distribution should be the same in the grid and the particle datasets, so we can use this test to identify whether there is a problem and whether the PR addresses it. I tested this out on two datasets, the IsolatedGalaxy dataset and the enzo_cosmology_plus dataset, using 1000 random sightlines to define the ray objects. Here are the results:

[Figure: column density PDFs for IsolatedGalaxy, grid vs. particle dataset, before and after this PR]

[Figure: column density PDFs for enzo_cosmology_plus, grid vs. particle dataset, before and after this PR]

In both datasets we see the same behavior: prior to this PR, the bug revealed itself in particle-based datasets reporting column densities too low by a factor of 3 or more. After the PR, these differences between particle- and grid-based datasets are, statistically speaking, gone. Thus, I am convinced that this PR addresses the bug, and it should absolutely be included in the codebase. Many apologies for all of the holdups on my end, and thank you for your patience. I just wanted to ensure that this was correct, as this will change previously published results for a number of papers that used Trident to generate spectra with particle-based datasets like FIRE, TNG, EAGLE, etc. I think we need to have a conversation about how to report this bugfix to the community, but I don't want it to hold up this PR any longer.

Many many thanks go out to @nastasha-w for catching this subtle and deep bug, and for coming up with a way to resolve the problem. Great work!

@nastasha-w
Contributor Author

@chummels Thanks for doing this detailed testing!

@jzuhone merged commit 7703ed5 into yt-project:main Nov 6, 2024
13 checks passed
@neutrinoceros removed the blocker (highest priority level) label Nov 6, 2024
@neutrinoceros
Member

Thanks, all!

I think we need to have a conversation about how to report this bugfix to the community, but I don't want it to hold up this PR any longer.

Yes please. I usually just include the PR title in the release notes, except for highlights, and this definitely feels like one. I'd like to include a 2 or 3 line summary of the change, though I trust it would be best handled by you guys.

Successfully merging this pull request may close these issues.

CLN: remove duplicate functions in two SPH backend PRs