
Updating tests to use fake dataset instead of real data #1809

Merged · 7 commits merged into yt-project:master on Jun 1, 2018

Conversation

@git-abhishek (Contributor) commented May 30, 2018

Removed real data usage from test cases

PR Summary

Updated test cases to use fake_particle_ds or fake_random_ds instead of real datasets, along with other related changes.
This should increase coverage and reduce runtime (both by a small amount).

PR Checklist

  • Code passes flake8 checker
  • New features are documented, with docstrings and narrative docs
  • Adds a test for any bugs fixed. Adds tests for new features.

@matthewturk (Member)

In some cases, we might want to use fake_amr_ds instead, as it will exercise the machinery slightly differently. But I think all of these usages are just great. Thank you!

@ngoldbaum (Member) left a comment

Overall this looks great and I like where this is going. I've left a few comments below, mostly pointing out places where I'm worried that using fake data might subtly change what functionality is being tested, and suggesting ways to make the fake data behave more like the real dataset you're replacing.

Some of the suggestions I made are a bit more involved than just tweaks of the current code; if you feel like I'm asking too much and you'd like to punt some of this for later, just let me know.

ds = load_octree(octree_mask=octree_mask, data=quantities, bbox=bbox,
                 over_refine_factor=0, partial_coverage=0)
cgrid = ds.covering_grid(0, left_edge=ds.domain_left_edge,
                         dims=ds.domain_dimensions)
Member

I'd move this code into a new function in yt/testing.py named fake_octree_ds.

It might also be worthwhile to add a few more AMR levels to the fake octree dataset you're creating. Right now there are only 3 levels (the root oct is fully refined and two of the root oct's child octs are refined); maybe add two or three more AMR levels, just to make sure we're not testing with too simple an AMR structure.

The format of octree_mask can be a little confusing; there's a very nice explanation of how to encode an octree this way in the docs for the hyperion code: http://docs.hyperion-rt.org/en/stable/advanced/indepth_oct.html

(we could probably improve on our own documentation by adapting that explanation into our docs)
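For reference, the depth-first encoding the hyperion docs describe can be sketched like this (the refinement probability is illustrative; the helper name matches the one added to yt/testing.py in this PR, but this is a sketch, not yt's exact implementation):

```python
import numpy as np
from numpy.random import RandomState

def construct_octree_mask(prng=RandomState(0x1d3d3d3), refined=None):
    # Depth-first octree encoding, following the hyperion docs:
    # append True when a cell is refined (its 8 children then
    # follow recursively) and False when it is a leaf.
    if refined is None:
        refined = [True]  # the root oct is always refined
    for _ in range(8):
        # Illustrative refinement criterion: randomly refine ~12% of cells.
        divide = prng.random_sample() < 0.12
        refined.append(divide)
        if divide:
            construct_octree_mask(prng, refined)
    return refined

# load_octree expects a uint8 array, so convert the boolean list:
octree_mask = np.array(construct_octree_mask(), dtype=np.uint8)
```

A handy invariant of this encoding: every refined cell contributes exactly 8 child entries, so the mask length is always 1 + 8 × (number of refined cells).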

Contributor Author

done

scg = ds.smoothed_covering_grid(1, [0, 0, 0], [128, 128, 1])
assert_equal(scg['density'].shape, [128, 128, 1])
scg = ds.smoothed_covering_grid(1, [0.0, 0.0, 0.0], ds.domain_dimensions)
assert_equal(scg['density'].shape, ds.domain_dimensions)
Member

I believe this test is making sure that smoothed_covering_grid properly works with 2D datasets.

One thing you could do is make fake_random_ds take a dimensionality keyword argument that lets it create 1D and 2D test datasets. That way you could test 1D, 2D, and 3D in this test relatively straightforwardly.

Contributor Author

Does the field ndims represent dimensionality in fake_random_ds?

Member

ndims is a shorthand there for the number of grid points in the uniform grid fake_random_ds creates. For example:

In [5]: ds = fake_random_ds(64)

In [6]: ds.dimensionality
Out[6]: 3

In [7]: ds.domain_dimensions
Out[7]: array([64, 64, 64])

So for ndims=64, we get a 64^3 uniform resolution grid.

All that said, it looks like this actually works out of the box right now:

In [1]: import yt

In [2]: from yt.testing import fake_random_ds

In [3]: ds = fake_random_ds([32, 32, 1])

In [4]: ds.dimensionality
Out[4]: 2

And similarly:

In [8]: ds = fake_random_ds([64, 1, 1])

In [9]: ds.dimensionality
Out[9]: 1

So it looks like you don't need to modify fake_random_ds after all.

@@ -115,10 +109,18 @@ def test_correct_output_unit():
sur = ds.surface(sp1,"HI_Density", .5*Nmax)
sur['x'][0]
Member

You can delete this original test now, I think.

Contributor Author

done

@@ -115,10 +109,18 @@ def test_correct_output_unit():
sur = ds.surface(sp1,"HI_Density", .5*Nmax)
sur['x'][0]

@requires_file(ISOGAL)
def test_correct_output_unit_fake_ds():
# implementing test_correct_output_unit() with fake dataset
Member

Can you delete this comment, but retain the comment in the original test referencing #1368?

Contributor Author

done

def test_radius_surface():
# see #1407
ds = load(ISOGAL)
ds = fake_random_ds(64, nprocs=4, particles=16**3)
Member

Can you set the length_unit to a non-default value, something like length_unit=10? That will make sure the units of code_length are different from the default unit of the radius field ('cm'), which is exactly the distinction this test is checking for.
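A tiny illustration (plain arithmetic, not the yt API) of why a non-default length_unit matters here:

```python
# With length_unit=10, one code_length corresponds to 10 cm, so a value
# expressed in code units no longer coincides numerically with the same
# value in cm. With the default length_unit=1 the two values are equal,
# and a missing unit conversion in the radius field would go unnoticed.
length_unit_cm = 10.0       # the suggested non-default length_unit
radius_code = 0.25          # a radius in code_length units
radius_cm = radius_code * length_unit_cm
print(radius_cm)            # 2.5, distinguishable from 0.25
```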

Contributor Author

done

_negative = (False, False, False, False, True, True, True, False, False,
             False)
ds = fake_particle_ds(fields=_fields, units=_units, negative=_negative,
                      npart=16 ** 2, length_unit=1.0)
Member

I think you need to use fake_random_ds here, but tell it to include particle fields as well. By using a purely particle dataset here you're changing what this test is actually doing under the hood. It needs to be run using a dataset with both mesh and particle fields, because the profile that gets created below gets binned using two mesh fields, ('gas', 'temperature') and ('gas', 'density'), but the actual field that's being profiled is a deposited particle field. So you need both a mesh to deposit onto and particle fields to deposit onto the mesh.

Let me know if you run into trouble getting fake_random_ds working.

Contributor Author

I need help implementing this test case with fake_random_ds.

Member

Take a look at this example:

In [3]: ds = fake_random_ds(32, particle_fields=("particle_position_x", "particle_position_y", "particle_position_z", "particle_mass", "particle_velocity_x", "particle_velocity_y", "particle_velocity_z"), particle_field_units=('cm', 'cm', 'cm', 'g', 'cm/s', 'cm/s', 'cm/s'), particles=16)

In [4]: ds.particle_type_counts
Out[4]: {'io': 16}

In [5]: ad = ds.all_data()

In [6]: ad['gas', 'density']
Out[6]:
YTArray([0.76017901, 0.96855994, 0.49205428, ..., 0.78097504, 0.61756868,
         0.27913556]) g/cm**3

In [7]: ad['gas', 'density'].shape
Out[7]: (32768,)

In [8]: ad['deposit', 'io_cic']
Out[8]: YTArray([0., 0., 0., ..., 0., 0., 0.]) g/cm**3

In [9]: ad['deposit', 'io_cic'].shape
Out[9]: (32768,)

In [11]: ad['deposit', 'io_mass'].sum()
Out[11]: 8.105905659452372 g

In [12]: ad['io', 'particle_mass'].sum()
Out[12]: 8.105905659452372 g

Contributor Author

Thanks, pushed the changes.

@git-abhishek (Contributor Author)

@ngoldbaum Thanks for the detailed comments, this will definitely help me in updating the PR.

@@ -444,6 +444,51 @@ def fake_vr_orientation_test_ds(N = 96, scale=1):
ds = load_uniform_grid(data, arr.shape, bbox=bbox)
return ds


def construct_octree_mask(prng=RandomState(0x1d3d3d3), refined=[True]):
Member

just in case you didn't know where this comes from:

https://www.youtube.com/watch?v=XWX4GUYGQXQ

Contributor Author

😆 good to know...
I would have kept it 0x4d3d3d3, if it had not generated a trivial mask.

@ngoldbaum ngoldbaum merged commit d3894d9 into yt-project:master Jun 1, 2018
ngoldbaum pushed a commit to ngoldbaum/yt that referenced this pull request Jun 1, 2018
@git-abhishek git-abhishek deleted the cc_data_objects branch June 1, 2018 15:57
ngoldbaum pushed a commit that referenced this pull request Jun 4, 2018