
Unyt #2219

Merged: 72 commits merged into yt-project:yt-4.0 on Jul 4, 2019

Conversation

ngoldbaum (Member):

Make yt use unyt.

WIP for now until I get the tests to pass.

Nathan Goldbaum added 30 commits June 22, 2018 13:02
@matthewturk (Member) left a comment:
Wow! This is a ton of work, and lots of awesome changes. Thank you for doing this, @ngoldbaum. I've left some comments (mostly questions) inline, but I'm quite happy to see this effort.

object has a large number of attributes corresponding to the names of units and
physical constants. All units known to the dataset will be available, including
custom units. In situations where you might have used ``ds.arr`` or ``ds.quan``
before, you can now safely use ``ds.units``:
Member:
👍

@@ -643,7 +653,7 @@ def save_as_dataset(self, filename=None, fields=None):

extra_attrs = dict([(arg, getattr(self, arg, None))
for arg in self._con_args + self._tds_attrs])
extra_attrs["con_args"] = self._con_args
extra_attrs["con_args"] = repr(self._con_args)
Member:

I'm not sure I understand why we need to do this -- _con_args is a tuple of strings, so it should serialize correctly in h5py without this. Now we're turning the tuple into a string that represents the tuple, yes?

Member Author:

h5py serializes a tuple of strings as a string array, which was leading to a case where I couldn't round-trip con_args anymore. Using repr and eval seemed like the most straightforward way to take care of that.
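The round-trip described here can be sketched in plain Python. The field names below are made up, and `ast.literal_eval` is shown as a drop-in that is safer than `eval` for tuples of string literals:

```python
import ast

# Illustrative sketch of the con_args round-trip (names are made up).
con_args = ("center", "radius")

# Write side: h5py would flatten a tuple of strings into a string
# array, so store the textual representation instead.
stored = repr(con_args)  # "('center', 'radius')"

# Read side: recover the original tuple. ast.literal_eval only
# accepts Python literals, so it is a safer stand-in for eval here.
restored = ast.literal_eval(stored)
assert restored == con_args
```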

@@ -186,8 +186,8 @@ def deposit(self, positions, fields = None, method = None,
op.initialize()
mylog.debug("Depositing %s (%s^3) particles into %s Octs",
positions.shape[0], positions.shape[0]**0.3333333, nvals[-1])
pos = np.asarray(positions.convert_to_units("code_length"),
dtype="float64")
positions.convert_to_units("code_length")
Member:

I think as long as positions is an internally-generated field, it will be safe to drop the float64. But I did have to stop and think about that.

Member Author:

OK, I can add an astype('float64') call.

Member:

No no, I think astype will make a copy, and I think we're safe without it.
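The copy semantics at issue can be checked with plain NumPy, used here as a stand-in for yt's field arrays; this is just an illustration of why the extra astype was deemed unnecessary:

```python
import numpy as np

# Stand-in for an internally generated positions field.
positions = np.zeros((4, 3), dtype="float64")

# np.asarray on an array that already has the right dtype returns
# the same object -- no copy is made.
view = np.asarray(positions, dtype="float64")
assert view is positions

# astype, by contrast, copies by default even when the dtype matches.
copied = positions.astype("float64")
assert copied is not positions
```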

@@ -216,6 +217,70 @@ def __get__(self, instance, owner):
def __set__(self, instance, value):
self.data[instance] = value

class ConstantContainer(object):
Member:

Can these go into a separate file? And I think for my own understanding, just a comment about what each is supposed to do (doesn't have to be docstrings or anything extensive) would be helpful.

yt/data_objects/tests/test_dataset_access.py (comment resolved)
@@ -62,7 +64,7 @@ def _magnetic_field_strength(field, data):

def _magnetic_energy(field, data):
B = data[ftype,"magnetic_field_strength"]
return 0.5*B*B/mag_factors[B.units.dimensions]
return 0.5*B*B/mag_factors(B.units.dimensions)
Member:

I sometimes get confused by closures -- since mag_factors only has ds from the outer namespace, I think it might be safer to make it accept ds as an argument and put it at the outermost function level.
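A minimal sketch of the suggested refactor, with hypothetical names rather than yt's actual field definitions: the closure version captures ds implicitly from the enclosing scope, while the explicit version surfaces the dependency in the signature:

```python
class FakeDataset:
    # Hypothetical stand-in for a dataset carrying a factor table.
    factors = {"magnetic": 0.125}

def make_mag_factors(ds):
    # Closure version: ds is captured from the enclosing scope,
    # which is easy to miss when reading mag_factors on its own.
    def mag_factors(dimensions):
        return ds.factors[dimensions]
    return mag_factors

def mag_factors_explicit(ds, dimensions):
    # Explicit version: the ds dependency is visible at the call site.
    return ds.factors[dimensions]

ds = FakeDataset()
assert make_mag_factors(ds)("magnetic") == mag_factors_explicit(ds, "magnetic")
```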

@@ -86,7 +86,7 @@ def _parse_parameter_file(self):
for key in f.attrs.keys():
v = parse_h5_attr(f, key)
if key == "con_args":
v = v.astype("str")
v = eval(v)
Member:

This is the pair to the repr() call above, I think. I'm not totally sure this is safe. What's the use case for this particular set of changes?

Member Author:

I explained this above in the question about repr. I'm not sure what you mean by "safe" -- are you talking about a security concern or something? This is all data that's internal to yt...

@@ -1068,7 +1147,7 @@ def set_units(self):
omega_lambda=self.omega_lambda,
omega_radiation=self.omega_radiation,
use_dark_factor = use_dark_factor,
w_0 = w_0, w_a = w_a)
w_0 = w_0, w_a = w_a, unit_registry=self.unit_registry)
Member:

This will likely have unintended consequences. Most of the cosmology calculations return values in the comoving frame but over multiple redshifts. The only way to make this possible was to equate the comoving and proper reference frames in the cosmology calculator's unit registry. I think the dataset and its cosmology calculator need to have distinct unit registries.

Member Author:

OK - can you give me a test case where this would matter?

Member:

It looks like you've already changed it, but PR #2230 adds a test for this.

@@ -86,7 +86,7 @@ def _parse_parameter_file(self):
for key in f.attrs.keys():
v = parse_h5_attr(f, key)
if key == "con_args":
v = v.astype("str")
v = eval(v)
Member:

This is a backward incompatible change. Old datasets won't be loadable. Is there a reason for this?

Member Author:

I did this to fix a test. The ytdata frontend is using the json representation of the unit registry, which I had to change in unyt so I don't think there's a clear way to load old registries written out by yt.units and load them with unyt again.

Are we really supporting loading ytdata datasets across yt versions? There are no tests for that, for example...

@matthewturk (Member):

@ngoldbaum -- looks like the only remaining conflict is in appveyor.yml, and all existing tests pass. Does the current state work with older YTData outputs? If so, what else remains before we can merge this in?

@ngoldbaum (Member Author):

There would need to either be support in yt or unyt to load old unit registries. I was hoping to finish that off before I left NCSA but did not get it done unfortunately.

@matthewturk (Member):

@ngoldbaum Do you have a suggestion for how we can move forward? Is this something that could reasonably be implemented by someone else?

@ngoldbaum (Member Author):

I think so, yes. Help would be greatly appreciated. Support could either live in yt or unyt, whichever is easier.

@brittonsmith (Member):

We now have tests in master for loading old ytdata datasets, so we have a blocker either here or in merging yt-4.0 into master. I'd be happy to look into this, but I don't really know where to look. @ngoldbaum, would it be possible to lay out what needs to be done to make sure old ytdata is still loadable?

@ngoldbaum (Member Author):

The json unit registry loader in unyt would either need to be patched to be able to load old unit registries or there would need to be a shim in yt that can parse the old json unit registries and emit a new unyt UnitRegistry object.

@brittonsmith (Member):

Can you say a bit about how the registries are different?

@ngoldbaum (Member Author):

The relevant change is this one: yt-project/unyt@6283d20. Also take a look at the explanation in the pull request: yt-project/unyt#81

The change in the json output format is because we no longer use srepr to represent the symbol objects and instead just save the name of the symbol object. That means we go from saving e.g. Symbol('mass', positive=True) in the output to saving just mass.

With the old way of loading unit registries, sympy creates a distinct symbol object that satisfies == equality but not is equality. This is problematic because we rely on is equality in a bunch of places inside unyt for speed. My fix was to make the dimensions in unyt.dimensions singletons and use them as the namespace for restoring the unit registry JSON.

I think you'd need to do a similar trick with the old json objects, parsing e.g. Symbol('energy', positive=True) to just 'energy', and then either creating the unit registry manually or using the unyt loader.
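That shim might look something like the following sketch; the regex and function name are assumptions for illustration, not unyt's actual loader:

```python
import re

# Old-format registry entries stored sympy sreprs like
# "Symbol('mass', positive=True)"; the new format stores just "mass".
_OLD_SYMBOL_RE = re.compile(r"^Symbol\('([^']+)'(?:,\s*positive=True)?\)$")

def modernize_symbol(entry):
    """Convert an old srepr-style entry to a bare symbol name."""
    match = _OLD_SYMBOL_RE.match(entry)
    return match.group(1) if match else entry

assert modernize_symbol("Symbol('mass', positive=True)") == "mass"
assert modernize_symbol("mass") == "mass"  # new entries pass through
```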

I'm very sorry that I didn't do this myself and finish this before leaving. I'm pretty much completely burnt out on yt development and every time I tried to start on this I'd distract myself and feel guilty about it. I should have asked for help earlier and I'm sorry that I didn't.

@ngoldbaum (Member Author):

yt-project/unyt#93 should hopefully unblock this.

@ngoldbaum changed the title from "[WIP] Unyt" to "Unyt" on Jul 2, 2019
@matthewturk (Member):

Looks like appveyor dies with:

======================================================================
FAIL: yt.frontends.stream.tests.test_stream_particles.test_load_particles_with_data_source
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Miniconda36-x64\envs\test\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\projects\yt\yt\frontends\stream\tests\test_stream_particles.py", line 338, in test_load_particles_with_data_source
    assert in_cgs(get_cu(ds1, dim)) == in_cgs(get_cu(ds2, dim))
AssertionError

I'm not entirely sure but maybe this is just a test that doesn't need to be run anymore?

@ngoldbaum (Member Author):

No, the test probably needs to be fixed; it's been a while since this PR was updated. I haven't actually tried running the test locally yet.

@ngoldbaum (Member Author):

I've opened yt-project/unyt#96 to fix the failing test. Unfortunately we're going to need another unyt version bump to get the fix here in yt.

@matthewturk (Member):

AWESOME!

@brittonsmith (Member) left a comment:

Let's do this.

@matthewturk matthewturk merged commit e8f5030 into yt-project:yt-4.0 Jul 4, 2019