Updating tests to use fake dataset instead of real data #1809
Conversation
In some cases, we might want to use
Overall this looks great and I like where this is going. I've left a few comments below, mostly to point out places where I'm worried using fake data might subtly change what functionality is getting tested and suggesting ways to make the fake data behave more like the real dataset you're replacing.
Some of the suggestions I made are a bit more involved than just tweaks of the current code, if you feel like I'm asking too much and you'd like to punt for later just let me know.
ds = load_octree(octree_mask=octree_mask, data=quantities, bbox=bbox,
                 over_refine_factor=0, partial_coverage=0)
cgrid = ds.covering_grid(0, left_edge=ds.domain_left_edge,
                         dims=ds.domain_dimensions)
I'd move this code into a new function in yt/testing.py named fake_octree_ds.
It might also be worthwhile to add a few more AMR levels to the fake octree dataset you're creating. Right now there are only 3 levels (the root oct is fully refined and two of the root oct's child octs are refined), maybe add two or three more AMR levels, just to make sure we're not testing with too simple of an AMR structure.
The format of octree_mask can be a little confusing; there's a very nice explanation of how to encode an octree in this way in the docs for the hyperion code: http://docs.hyperion-rt.org/en/stable/advanced/indepth_oct.html (we could probably improve our own documentation by adapting that explanation into our docs).
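As a rough illustration of that style of encoding, here is a plain-Python sketch. The nested-list representation, helper name, and breadth-first traversal order are assumptions for illustration only, not yt's or hyperion's exact format: the walk emits one flag per oct, 1 if it is refined into 8 children, 0 if it is a leaf.

```python
from collections import deque

def flatten_octree(root):
    """Breadth-first encode an octree as 0/1 refinement flags.

    A refined oct is represented as a list of its 8 children;
    a leaf is None. Emits 1 for refined octs (whose children are
    then queued for visiting) and 0 for leaves.
    """
    flags = []
    queue = deque([root])
    while queue:
        oct_ = queue.popleft()
        if oct_ is None:       # leaf: no children follow
            flags.append(0)
        else:                  # refined: queue its 8 children
            flags.append(1)
            queue.extend(oct_)
    return flags

# Root oct is refined; only its first child is refined further.
leaf = None
root = [[leaf] * 8] + [leaf] * 7
print(flatten_octree(root))
# -> [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Adding more AMR levels, as suggested above, just means nesting the structure more deeply before flattening it.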
done
scg = ds.smoothed_covering_grid(1, [0, 0, 0], [128, 128, 1])
assert_equal(scg['density'].shape, [128, 128, 1])
scg = ds.smoothed_covering_grid(1, [0.0, 0.0, 0.0], ds.domain_dimensions)
assert_equal(scg['density'].shape, ds.domain_dimensions)
I believe this test is making sure that smoothed_covering_grid works properly with 2D datasets.
One thing you could do is make fake_random_ds take a dimensionality keyword argument that lets it create 1D and 2D test datasets. That way you could test 1D, 2D, and 3D in this test relatively straightforwardly.
Does the field ndims represent dimensionality in fake_random_ds?
ndims is a shorthand there for the number of grid points in the uniform grid fake_random_ds creates. For example:
In [5]: ds = fake_random_ds(64)
In [6]: ds.dimensionality
Out[6]: 3
In [7]: ds.domain_dimensions
Out[7]: array([64, 64, 64])
So for ndims=64, we get a 64^3 uniform resolution grid.
All that said, it looks like this actually works out of the box right now:
In [1]: import yt
In [2]: from yt.testing import fake_random_ds
In [3]: ds = fake_random_ds([32, 32, 1])
In [4]: ds.dimensionality
Out[4]: 2
And similarly:
In [8]: ds = fake_random_ds([64, 1, 1])
In [9]: ds.dimensionality
Out[9]: 1
So it looks like you don't need to modify fake_random_ds after all.
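The sessions above suggest that the dataset's dimensionality is inferred from the grid shape. A plain-Python sketch of that inference (a hypothetical helper matching the behaviour shown above, not yt's actual code):

```python
def infer_dimensionality(domain_dimensions):
    """Guess a dataset's dimensionality as the number of axes
    with more than one cell, matching the fake_random_ds
    examples above (illustrative sketch only)."""
    return sum(1 for n in domain_dimensions if n > 1)

assert infer_dimensionality([64, 64, 64]) == 3  # 3D grid
assert infer_dimensionality([32, 32, 1]) == 2   # flat 2D grid
assert infer_dimensionality([64, 1, 1]) == 1    # 1D grid
```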
yt/data_objects/tests/test_fluxes.py
Outdated
@@ -115,10 +109,18 @@ def test_correct_output_unit():
sur = ds.surface(sp1,"HI_Density", .5*Nmax)
sur['x'][0]
You can delete this original test now, I think.
done
yt/data_objects/tests/test_fluxes.py
Outdated
@@ -115,10 +109,18 @@ def test_correct_output_unit():
sur = ds.surface(sp1,"HI_Density", .5*Nmax)
sur['x'][0]

@requires_file(ISOGAL)
def test_correct_output_unit_fake_ds():
    # implementing test_correct_output_unit() with fake dataset
Can you delete this comment but retain the comment in the original test referencing #1368?
done
yt/data_objects/tests/test_fluxes.py
Outdated
def test_radius_surface():
    # see #1407
    ds = load(ISOGAL)
    ds = fake_random_ds(64, nprocs=4, particles=16**3)
Can you set the length_unit to a non-default value, something like length_unit=10? That will make sure the units of code_length are different from the default unit of the radius field ('cm'), which is what this test is trying to check.
done
_negative = (False, False, False, False, True, True, True, False, False,
             False)
ds = fake_particle_ds(fields=_fields, units=_units, negative=_negative,
                      npart=16 ** 2, length_unit=1.0)
I think you need to use fake_random_ds here, but tell it to include particle fields as well. By using a purely particle dataset here you're changing what this test is actually doing under the hood. It needs to run on a dataset with both mesh and particle fields, because the profile that gets created below is binned using two mesh fields, ('gas', 'temperature') and ('gas', 'density'), but the actual field being profiled is a deposited particle field. So you need both a mesh to deposit onto and particle fields to deposit onto the mesh.
Let me know if you run into trouble getting fake_random_ds working.
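To illustrate the deposition step that needs mesh and particles together, here is a minimal 1D cloud-in-cell deposit in plain Python (a hypothetical sketch, not yt's implementation): each particle's mass is split linearly between the two nearest cell centers, producing a mesh field from particle data.

```python
import math

def cic_deposit_1d(positions, masses, ncells, domain_width=1.0):
    """Deposit particle masses onto a 1D grid with cloud-in-cell
    weighting: each mass is shared between the two nearest cell
    centers, linearly by distance."""
    dx = domain_width / ncells
    grid = [0.0] * ncells
    for x, m in zip(positions, masses):
        s = x / dx - 0.5      # position in units of cell centers
        i = math.floor(s)     # left neighbouring cell
        frac = s - i          # linear weight for the right cell
        if 0 <= i < ncells:
            grid[i] += m * (1.0 - frac)
        if 0 <= i + 1 < ncells:
            grid[i + 1] += m * frac
    return grid

# A particle exactly on a cell center puts all its mass there;
# one midway between two centers splits it evenly.
grid = cic_deposit_1d([0.125, 0.5], [1.0, 1.0], ncells=4)
print(grid)  # -> [1.0, 0.5, 0.5, 0.0]
```

Note that the total deposited mass equals the total particle mass (for particles inside the domain), which is exactly what the io_mass sums in the session below demonstrate.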
I need help implementing this test case with fake_random_ds.
Take a look at this example:
In [3]: ds = fake_random_ds(
   ...:     32,
   ...:     particle_fields=("particle_position_x", "particle_position_y",
   ...:                      "particle_position_z", "particle_mass",
   ...:                      "particle_velocity_x", "particle_velocity_y",
   ...:                      "particle_velocity_z"),
   ...:     particle_field_units=('cm', 'cm', 'cm', 'g',
   ...:                           'cm/s', 'cm/s', 'cm/s'),
   ...:     particles=16)
In [4]: ds.particle_type_counts
Out[4]: {'io': 16}
In [5]: ad = ds.all_data()
In [6]: ad['gas', 'density']
Out[6]:
YTArray([0.76017901, 0.96855994, 0.49205428, ..., 0.78097504, 0.61756868,
0.27913556]) g/cm**3
In [7]: ad['gas', 'density'].shape
Out[7]: (32768,)
In [8]: ad['deposit', 'io_cic']
Out[8]: YTArray([0., 0., 0., ..., 0., 0., 0.]) g/cm**3
In [9]: ad['deposit', 'io_cic'].shape
Out[9]: (32768,)
In [11]: ad['deposit', 'io_mass'].sum()
Out[11]: 8.105905659452372 g
In [12]: ad['io', 'particle_mass'].sum()
Out[12]: 8.105905659452372 g
Thanks, pushed the changes.
@ngoldbaum Thanks for the detailed comments, this will definitely help me in updating the PR.
@@ -444,6 +444,51 @@ def fake_vr_orientation_test_ds(N = 96, scale=1):
    ds = load_uniform_grid(data, arr.shape, bbox=bbox)
    return ds

def construct_octree_mask(prng=RandomState(0x1d3d3d3), refined=[True]):
just in case you didn't know where this comes from:
😆 good to know...
I would have kept it 0x4d3d3d3, if it had not generated a trivial mask.
fix small issue missed in #1809
Removed real data usage from test cases
PR Summary
Updated test cases to use fake_particle_ds or fake_random_ds instead of real datasets, along with other related changes. This should increase coverage and reduce runtime (both by a small amount).
PR Checklist