
yt-4.0 adding "gather" approach to arbitrary_grid and slice SPH pixelization routines #1828

Merged
merged 72 commits into yt-project:yt-4.0 from AshKelly:yt-4.0-nn-list on Jul 24, 2018

Conversation

@AshKelly
Member

commented Jun 10, 2018

PR Summary

This PR adds the ability to use SPH gather smoothing on an arbitrary_grid or an on-axis slice plot:

import yt
from yt.units import kpc

ds = yt.load('GadgetDiskGalaxy/snapshot_200.hdf5')
_, c = ds.find_max(('gas', 'density'))
width = 50 * kpc
field = ('gas', 'density')

ds.sph_smoothing_style = "gather"
ds.num_neighbors = 40

# slice
yt.SlicePlot(ds, 'x', field, center=c, width=width)

# arb grid
ag = ds.arbitrary_grid(c - width / 2,
                       c + width / 2,
                       [800]*2 + [1])
dens2 = ag[field][:, :, 0].d

PR Checklist

  • Code passes flake8 checker
  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

@AshKelly AshKelly changed the title [WIP] yt-4.0 adding "gather" approach to pixelization routines [WIP] yt-4.0 adding "gather" approach to SPH pixelization routines Jun 10, 2018

pixel_pids = np.zeros((xsize, ysize, zsize, n_neighbors),
                      dtype=int)
pixel_distances = np.zeros((xsize, ysize, zsize, n_neighbors),
                           dtype=float)

ngoldbaum Jun 18, 2018

Member

These might be big arrays for large dimensions. Do you think there's a way to either avoid allocating these or only allocate them if they're needed? For example, scatter smoothing doesn't need to allocate these arrays.

I haven't thought terribly carefully about this, but I think one way to avoid allocating this would be to write a completely separate function for the gather smoothing that loops over voxels rather than particles. Then you only need to allocate an n_neighbors-length 1D array that you can reuse for each pixel.
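
Roughly, a minimal sketch of that voxel-major loop might look like the following (plain Python rather than the real Cython; find_knn and accumulate_contribution are hypothetical placeholders for the tree query and the kernel accumulation):

import numpy as np

def gather_over_voxels(voxel_centers, n_neighbors, find_knn,
                       accumulate_contribution):
    # One scratch buffer of length n_neighbors is reused for every voxel,
    # instead of allocating a (xsize, ysize, zsize, n_neighbors) array.
    buff = np.zeros(len(voxel_centers))
    dist2 = np.empty(n_neighbors)                 # squared distances, overwritten per voxel
    pids = np.empty(n_neighbors, dtype=np.int64)  # neighbor particle ids, overwritten per voxel
    for i, center in enumerate(voxel_centers):
        find_knn(center, dist2, pids)             # fill the scratch buffers for this voxel
        buff[i] = accumulate_contribution(dist2, pids)
    return buff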

AshKelly Jun 18, 2018

Author Member

That was pretty much what I had in mind. I was thinking about writing two separate functions and having a wrapper that calls gather or scatter depending on the dataset properties.

I'm not very knowledgeable about Cython, but is there a better way to allocate memory of this size? Are these arrays contiguous in memory?
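
For illustration, the kind of wrapper being described might look like this sketch (the pixelize_sph_* names are placeholders; ds.sph_smoothing_style and ds.num_neighbors are the dataset attributes used in the PR summary above):

def pixelize_sph_scatter(*args, **kwargs):
    # placeholder for the existing scatter pixelizer
    raise NotImplementedError

def pixelize_sph_gather(*args, num_neighbors=32, **kwargs):
    # placeholder for the new gather pixelizer
    raise NotImplementedError

def pixelize_sph(ds, *args, **kwargs):
    # dispatch on the dataset's smoothing style
    if getattr(ds, "sph_smoothing_style", "scatter") == "gather":
        return pixelize_sph_gather(*args,
                                   num_neighbors=getattr(ds, "num_neighbors", 32),
                                   **kwargs)
    return pixelize_sph_scatter(*args, **kwargs)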

ngoldbaum Jun 18, 2018

Member

This is how I'd allocate the memory if I needed to do it. You can explicitly mark them as C-contiguous in memory if you declare them as cdef np.float64_t[:,:,:,::1] and cdef np.float64_t[::1,:,:,:] for Fortran-contiguous. If you want to get fancier than that, see https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#specifying-more-general-memory-layouts.
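
For concreteness, the declarations described above might look like this inside a Cython function (a sketch only, assuming the usual cimport numpy as np; xsize, ysize, zsize and n_neighbors come from the surrounding pixelizer):

# C-contiguous: the last (n_neighbors) axis varies fastest
cdef np.float64_t[:, :, :, ::1] pixel_distances = np.zeros(
    (xsize, ysize, zsize, n_neighbors), dtype="float64")
# Fortran-contiguous: the first axis varies fastest
cdef np.float64_t[::1, :, :, :] pixel_distances_f = np.zeros(
    (xsize, ysize, zsize, n_neighbors), dtype="float64", order="F")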

But of course the simplest approach is not to allocate the memory at all :)

@AshKelly

Member Author

commented Jun 19, 2018

Ah, that's interesting to know! I'll make that change in my next commit - I'm just working on checking that this function returns sensible results.

AshKelly added some commits Jun 9, 2018

added some name mangled functions to the bounded priority queue to deal with a second array of particle ids so it is possible to know which neighbours are the closest
beginning to hook up the modified bounded priority queue to the cykdtree to find the nearest neighbours and their pids
further additions to the code which should be able to loop through pixels finding the nearest neighbours and their pids
fixed some bugs so the arb_grid scatter approach can be called directly without error. also disabled normalization for comparison with the scatter approach. there seems to be a bug with the scatter and gather approach returning vastly different results

@AshKelly AshKelly force-pushed the AshKelly:yt-4.0-nn-list branch from 8ca50f8 to a45de72 Jun 21, 2018

fix bug with gather approach for the arbitrary grid. There seems to be a very slow copy process which occurs
@@ -1309,17 +1310,28 @@ def pixelize_sph_kernel_gather_arbitrary_grid(np.float64_t[:, :, :] buff,
    kernel_func = get_kernel_func(kernel_name)

    # generate the kdtree and find neighbours
    # I think there is a better way of passing this than making a new array

AshKelly Jun 22, 2018

Author Member

@ngoldbaum can you recommend a better way of passing these memory views as a multi-dimensional numpy array? I think this process is very slow!

left_edge=np.array([bounds[0], bounds[2], bounds[4]]),
right_edge=np.array([bounds[1], bounds[3], bounds[5]]),
periodic=np.array([False, False, False]),
leafsize=n_neighbors)

ngoldbaum Jun 22, 2018

Member

Why not use the global kdtree attached to the dataset instead of allocating a new one here?

In [1]: ds.parameter_filename
Out[1]: '/Users/goldbaum/Documents/test/GadgetDiskGalaxy/snapshot_200.hdf5'

In [2]: ds.index.kdtree
yt : [INFO     ] 2018-06-22 09:32:50,242 Allocating for 1.191e+07 particles
Initializing coarse index : 100%|████████████████| 19/19 [00:02<00:00, 10.81it/s]
Initializing refined index: 100%|████████████████| 19/19 [00:03<00:00,  6.02it/s]
yt : [INFO     ] 2018-06-22 09:32:57,243 Loading KDTree from snapshot_200.hdf5.kdtree
yt : [INFO     ] 2018-06-22 09:32:59,776 Detected hash mismatch, regenerating KDTree
yt : [INFO     ] 2018-06-22 09:32:59,873 Allocating KDTree for 4334546 particles
Out[2]: <cykdtree.kdtree.PyKDTree at 0x11f9af2c8>

See: https://github.com/yt-project/yt/blob/yt-4.0/yt/frontends/sph/data_structures.py#L125-L143
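
In other words, the pixelizer could reuse the cached, dataset-level tree instead of building its own, along these lines (a sketch, not the final wiring):

import yt

ds = yt.load("GadgetDiskGalaxy/snapshot_200.hdf5")
kdtree = ds.index.kdtree  # built lazily and cached to snapshot_200.hdf5.kdtree, as in the log above
# pass `kdtree` into pixelize_sph_kernel_gather_arbitrary_grid instead of
# constructing a new cykdtree.PyKDTree from the pixelization bounds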

AshKelly Jun 22, 2018

Author Member

Ah, thank you very much! I didn't realise this existed.

@AshKelly AshKelly force-pushed the AshKelly:yt-4.0-nn-list branch from ca8a46e to 7cfff26 Jun 25, 2018

AshKelly added some commits Jun 26, 2018

restructure of the voxel pixelizers using the gather approach to take in a tree and reduce the name mangling of the bounded priority queue functions
add the gather approach back now finally debugged - gather approach working, requires wiring into the global kdtree
implemented the gather approach to the arbitrary_grid - this appears to be working - however is NOT memory conservative
@AshKelly

Member Author

commented Jun 27, 2018

Hey @ngoldbaum, I've added an example plot. Currently I just load every particle, which is obviously not memory conservative, but I thought we could discuss a better method tomorrow. The mapping approach doesn't seem too great to me.

AshKelly added some commits Jul 2, 2018

@jzuhone

Contributor

commented Jul 20, 2018

Here is a comparison test using an Arepo dataset of mine (since smoothing lengths for Arepo are weird, one should not take this test too seriously).

Script is:

import yt
ds = yt.load("snapshot_300.hdf5")

#ds.sph_smoothing_style = "gather"
#ds.num_neighbors = 16

slc = yt.SlicePlot(ds, "z", [("gas","density")], width=(500.0,"kpc"), center="max")
slc.save()

Note that I comment and uncomment the two lines above to produce the different plots.

With defaults:

[image: snapshot_300_slice_z_density]

With gather and ds.num_neighbors = 32 (the default):

[image: snapshot_300_slice_z_density]

Note that gather smooths it out a bit. Decreasing ds.num_neighbors = 16 helps out a bit:

[image: snapshot_300_slice_z_density]

The differences between these are all somewhat subtle, however. I think it's pretty good.

I should also note that when I ran this on the whole box (~40 Mpc) it was very slow, despite the fact that most of the particles are in the central region. There are ~20 million particles in this dataset.

This is just for information--I don't think that we should judge the merits of this PR (which is awesome!) based on this Arepo dataset, considering that support is still experimental and limited to my fork at the moment.

@AshKelly

Member Author

commented Jul 20, 2018

Thanks for checking it out!

I'm actually out shopping at the moment, but I'll look at the plots when I'm home.

Can you tell which parts were slow? Was it the initial KDTree build or the neighbor generation? There should be progress bars that say what's happening and the expected times, etc.

EDIT: In my experience you get optimal neighbor-search efficiency when leaves contain around 1.5-2x the number of desired neighbors, so for 16 neighbors the default leafsize of 64 is a bit suboptimal.
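
As a concrete (hypothetical) example of that rule of thumb, mirroring the PyKDTree construction quoted earlier in this PR (the import path and the positional positions argument are assumptions):

import numpy as np
from cykdtree.kdtree import PyKDTree

positions = np.random.random((100000, 3))  # stand-in (N, 3) particle positions
bounds = (0.0, 1.0, 0.0, 1.0, 0.0, 1.0)

n_neighbors = 16
leafsize = 2 * n_neighbors  # ~1.5-2x the neighbor count, rather than the default 64
tree = PyKDTree(positions,
                left_edge=np.array([bounds[0], bounds[2], bounds[4]]),
                right_edge=np.array([bounds[1], bounds[3], bounds[5]]),
                periodic=np.array([False, False, False]),
                leafsize=leafsize)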

The other thing is, I can't tell from your reply whether or not it had to build the KDTree; this can take around 10 seconds (I think) for 20 million particles. We do have some loose plans to optimize a lot of this using "spatial" chunking rather than "io" chunking in the future, if the performance hit really is on the neighbor finding / interpolation.

I just tested on a different dataset with 6 million particles, and the interpolation step is slower than I'd like due to a memory-conservative approach which requires a random lookup. Another issue was that the KDTree kept being rebuilt, which took about 30 seconds - this is a bug which I need to look into separately. If you reproduce this, I'll make an issue and look into it.

KDTree ~ 30 seconds to build (it should be loaded from memory almost instantly)
KNN ~ 10 seconds
Interpolating ~ 20 seconds ("spatial" chunking will slash this, but I can also look into other things)

yt : [INFO     ] 2018-07-20 20:17:10,916 Loading KDTree from snap_028_z000p000.0.hdf5.kdtree
yt : [INFO     ] 2018-07-20 20:17:26,542 Detected hash mismatch, regenerating KDTree
yt : [INFO     ] 2018-07-20 20:17:27,096 Allocating KDTree for 6381559 particles
Generating neighbor lists: 643939it [00:10, 64264.44it/s] 

Are you also getting the detected hash mismatch line?

@ngoldbaum

Member

commented Jul 23, 2018

There are certainly things we can improve here related to performance but that should come in a second pass so we can include the functionality on the yt-4.0 branch.

I'd like to get at least one more review from e.g. @JBorrow, @qobilidop, @chummels, or maybe @brittonsmith before merging.

@JBorrow

Contributor

commented Jul 23, 2018

@nathangoldbaum, after re-reading Price's paper and thinking about this a bit, I'm convinced that yes you should be using the 3D kernel for a slice plot.

Just to check: you do this by setting the 'pixels' at the slice plane, and then doing a neighbour search around them to get your Nngbs, then construct the density/weighted quantity you want for that pixel (as a 3D quantity)?

@ngoldbaum

Member

commented Jul 23, 2018

Just to check: you do this by setting the 'pixels' at the slice plane, and then doing a neighbour search around them to get your Nngbs, then construct the density/weighted quantity you want for that pixel (as a 3D quantity)?

Yup, exactly.
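
Spelled out as a sketch (plain Python pseudo-implementation; query_knn and kernel_3d stand in for the actual tree query and the 3D SPH kernel, and the normalization step is omitted):

import numpy as np

def sph_gather_slice(pixel_xs, pixel_ys, slice_coord, n_neighbors,
                     query_knn, kernel_3d, pmass, pdens, hsml, quantity):
    # Pixels sit on the slice plane, but the neighbor search and kernel are 3D.
    buff = np.zeros((len(pixel_xs), len(pixel_ys)))
    for i, px in enumerate(pixel_xs):
        for j, py in enumerate(pixel_ys):
            pos = np.array([px, py, slice_coord])
            dist2, pids = query_knn(pos, n_neighbors)  # squared distances and particle ids
            h2 = dist2.max()  # pixel "smoothing length" squared: furthest of the N neighbors
            for d2, p in zip(dist2, pids):
                w = pmass[p] / pdens[p] / hsml[p]**3
                buff[i, j] += w * quantity[p] * kernel_3d(np.sqrt(d2 / h2))
    return buff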

@JBorrow

Contributor

commented Jul 23, 2018

If this ends up in the yt-4.0 paper, it would be fantastic to have a good visualisation of this (the ones in Price's paper are not that great).

@AshKelly

Member Author

commented Jul 23, 2018

I agree - the paper is a fantastic reference but definitely hard to follow at times. I find a lot of the supporting figures vague.

# Then we just transpose if the buffer x and y are
# different than the plot x and y
if y < x:
    buff = buff.transpose()
else:
    raise NotImplementedError(
        "A pixelization routine has not been implemented for %s "

JBorrow Jul 23, 2018

Contributor

Could be converted to .format() syntax whilst you're here.

AshKelly Jul 23, 2018

Author Member

I guess it could be - I try to keep these pull requests quite on task and only change things I need to for the feature/bug I'm working on.

ngoldbaum Jul 23, 2018

Member

And in general both ways of formatting strings are OK in Python code and will be supported indefinitely. I think the initial intention in Python 3.0 was to remove %-formatted strings in favor of .format(), but then they realized that would break way too much code. I actually prefer %-formatting for simple stuff.
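
For example, these two spellings produce the same message (axis_name here is just a stand-in):

axis_name = "theta"
msg_percent = "A pixelization routine has not been implemented for %s" % axis_name
msg_format = "A pixelization routine has not been implemented for {}".format(axis_name)
assert msg_percent == msg_format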

JBorrow Jul 23, 2018

Contributor

Fair enough!

set_axes_range(&axes, -1)

dists = np.zeros((input_positions.shape[0], num_neigh), dtype="float64")
pids = np.zeros((input_positions.shape[0], num_neigh), dtype="int64")

JBorrow Jul 23, 2018

Contributor

Do these need to be signed?

AshKelly Jul 23, 2018

Author Member

They don't need to be, but by convention they can hold negative values. We start with the queue and heap invalidated, set to -1. If for some reason we didn't find 60 nearest neighbors, e.g. if filling chunkwise, we could store intermediate negative values in both of these arrays. That convention could be changed, though.

JBorrow Jul 23, 2018

Contributor

Okay, just checking -- if you are using negative numbers then go ahead.
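
For reference, a minimal pure-Python sketch of the invalidation convention described above (the real implementation is the Cython bounded priority queue in this PR, which uses a max-heap; this linear-scan version only illustrates the -1 bookkeeping):

import numpy as np

class BoundedPriorityQueue:
    """Keep the `size` smallest squared distances seen so far, plus their pids."""
    def __init__(self, size):
        self.heap = np.full(size, -1.0)                # squared distances; -1 marks an empty slot
        self.pids = np.full(size, -1, dtype=np.int64)  # particle ids; -1 marks an empty slot

    def add(self, dist2, pid):
        empty = np.where(self.heap < 0.0)[0]
        if empty.size:                    # fill empty (invalidated) slots first
            self.heap[empty[0]] = dist2
            self.pids[empty[0]] = pid
            return
        imax = int(np.argmax(self.heap))  # otherwise replace the current furthest neighbor
        if dist2 < self.heap[imax]:
            self.heap[imax] = dist2
            self.pids[imax] = pid

In the heap-based Cython version the root (index 0) holds the largest squared distance, which is what lets the code below use dists[xi, yi, zi, 0] as the voxel's smoothing length squared.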

for xi in range(xsize):
    for yi in range(ysize):
        for zi in range(zsize):
            h_j2 = dists[xi, yi, zi, 0]

JBorrow Jul 23, 2018

Contributor

Is this the smoothing length of j squared?

AshKelly Jul 23, 2018

Author Member

Yeah, despite my ambiguous naming, the bounded priority queue actually stores the distances squared.

JBorrow Jul 23, 2018

Contributor

Okay, it might be worth making that a little clearer here. This is the distance squared? h is usually reserved for smoothing length -- I would use the notation r_ij_2 (normally we talk about interacting particle "i", yourself, with particle "j", another particle).

AshKelly Jul 23, 2018

Author Member

It is a smoothing length squared. We are saying that the smoothing length squared of the voxel is equal to the distance squared to the furthest neighbor in its nearest neighbor list. Maybe this needs more commenting; after working with the code for a while, it's hard to tell which bits need explicit comments.

JBorrow Jul 23, 2018

Contributor

Great, I would like to have a few more comments here (or more descriptive variable names) as I mentioned in my other comment -- but other than that this looks good.

AshKelly Jul 23, 2018

Author Member

Now that you've highlighted this region, I can see how that particular line could be confusing; it's a bit arbitrary. I'll add some more comments to this section.

particle = pids[xi, yi, zi, pi]

w_j = pmass[particle] / pdens[particle] / hsml[particle]**3
coeff = w_j * quantity_to_smooth[particle]

JBorrow Jul 23, 2018

Contributor

Sorry to be pedantic again -- but for readability we would typically reserve w for the kernel (i.e. the result of kernel_func * 1/h^3). I know the terminology in Price's paper mixes these, but I would always stick with eqn. 25-style notation.

I would replace this with

prefactor_ij = pmass[particle] / pdens[particle] / hsml[particle]**3
q_ij = math.sqrt(dists[xi, yi, zi, pi]*ih_j2)
smoothed_quantity_j = prefactor_ij * quantity_to_smooth[particle] * kernel_func(q_ij)

buff[xi, yi, zi] += smoothed_quantity_j 

Are these all just particle j's (i.e. the neighbour) quantities?

AshKelly Jul 23, 2018

Author Member

Yes, they are all quantities from the same neighbor.

That is a good argument; there is a deviation from typical SPH notation. I think if I make this change, then it would be good to make the same change to every pixelization function in this file to preserve consistency (currently, the notation in this function is exactly the same as in every other function in this file). Thoughts on that, @ngoldbaum?

@@ -1326,6 +1326,8 @@ def pixelize_sph_kernel_gather_arbitrary_grid(np.float64_t[:, :, :] buff,
    for xi in range(xsize):
        for yi in range(ysize):
            for zi in range(zsize):
                # we set the smoothing length squared of the voxel to the

AshKelly Jul 24, 2018

Author Member

Added the explanation of this step, @JBorrow.

else:
    coeff = w_j * hsml[j] * quantity_to_smooth[j] * _weight_field[j]
    prefactor_j *= quantity_to_smooth[j] * _weight_field[j]

AshKelly Jul 24, 2018

Author Member

I removed the misleading use of w here; since this is a scatter smoothing method, the prefactor only depends on j, hence the name prefactor_j. @JBorrow

JBorrow Jul 24, 2018

Contributor

I'm now much happier with this notation, good work @AshKelly!

AshKelly Jul 24, 2018

Author Member

Great, thank you very much!

@JBorrow

Contributor

commented Jul 24, 2018

Now that the (minor) readability concerns have been addressed, I'm happy with (and really impressed by) this PR!

@ngoldbaum

Member

commented Jul 24, 2018

Merging. The only test failure is an unrelated timeout.

Huge congrats Ash, this is a real achievement.

@ngoldbaum ngoldbaum merged commit 4e2c97f into yt-project:yt-4.0 Jul 24, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
continuous-integration/appveyor/pr AppVeyor build succeeded
@AshKelly

Member Author

commented Jul 24, 2018

Thank you both so much for comments and feedback! Also, thanks to @jzuhone for taking the time to give this a test.

It's great to get this merged! 😄
