Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling quantify() on already-quantified data arrays #47

Closed
Tracked by #80
lukelbd opened this issue Dec 24, 2020 · 5 comments · Fixed by #175
Closed
Tracked by #80

Calling quantify() on already-quantified data arrays #47

lukelbd opened this issue Dec 24, 2020 · 5 comments · Fixed by #175

Comments

@lukelbd
Copy link

lukelbd commented Dec 24, 2020

Currently, if data_array.pint.quantify() is called on an object that is already quantified, a ValueError is always raised:

if isinstance(self.da.data, Quantity):
raise ValueError(
f"Cannot attach unit {units} to quantity: data "
f"already has units {self.da.data.units}"
)

IMHO it would be preferable to have the following behavior changes when calling quantify on an already-quantified data arrays, to make things more robust:

  1. If units is None and data_array has no 'units' attribute, do nothing; just return a shallow copy. It makes sense to me if quantify behaves like an identify operator on already-quantified arrays. Sort of like np.array(np.array([1])).
  2. If units is not None, but they match the existing pint.Quantity units, behave like step 1.
  3. If units is not None, but they do not match the existing units, replace the units and emit a warning (or raise an error...?).
  4. If data_array has a 'units' attribute, but they match the existing pint.Quantity units, behave like step 1.
  5. If data_array has a 'units' attribute, but they do not match the existing units, replace the units and emit a warning (or raise an error...?).

Also, the following behavior might be useful whether or not the data is already quantified:

  1. If both units is not None and data_array has a 'units' attribute, ignore the attribute after emitting a warning.

Any objections to these changes? If I (...eventually) submit a PR implementing them, would it be accepted?

@lukelbd
Copy link
Author

lukelbd commented Dec 24, 2020

Note I feel pretty unsure about items 3 and 5. Open to discussion on all points.

@keewis
Copy link
Collaborator

keewis commented Jan 6, 2021

I'm sorry for the late reply, @lukelbd, and thanks for the suggestions.

I'm not sure about 1: I can sympathize with the desire to have a bare quantify return immediately if there are no units attributes. Unfortunately, it is still pretty easy to somehow drop all attributes (see pydata/xarray#3891), so having a bare quantify fail is somewhat of a safeguard against that. Of course, stops being an issue once xarray keeps attributes everywhere by default.

2 should be pretty uncontroversial, but I think 3 should still raise an error (if you really want to replace a unit, you can always dequantify first)

4 might make sense, although in that case I would drop the attribute: having a unit in both the attribute and the quantity means that the attribute can quickly get out-of-date. In order to allow roundtripping we should probably allow specifying the format of the string representation used in dequantify.

5 has the issues of 3 and 4 combined (it replaces the units using possibly out-of-date attributes), so I think this should stay an error.

My mental model of quantify is that it uses the attributes by default, but a user can also override them using the arguments. This is why I'm also not sure about 6: do we really need a warning if passing arguments to quantify means "I explicitly want to use that unit for this variable instead of a possibly existing attribute"?

Thoughts, @jthielen, @TomNicholas?

@lukelbd
Copy link
Author

lukelbd commented Jan 7, 2021

No worries! Just to clarify points 1-5 were all in reference to DataArrays that are quantified. So point 1 means if the underlying data is already pint.Quantity, and you call a bare quantify(), nothing should happen since the data is already quantified. Just like calling a bare dequantify() on non-pint.Quantity data should do nothing. Unless I misunderstood your objection? I agree that bare quantify() with no units attribute or units argument should fail if the underlying data is not pint.Quantity. The use case for this behavior would be ensuring the underlying data is pint.Quantity without prior knowledge (analogous to np.asarray).

Points 2 and 4 are thus no-ops in the same vein as point 1: A non-bare quantify() call should have the same result as a bare quantify() call if the units (either passed with an argument, or discovered in the 'units' attribute) are the same as the pint.Quantity units. Agree with you on point 4 that the 'units' attribute should be dropped.

Also agree with you that errors for points 3 and 5 probably make more sense than warnings.

Hadn't considered your point about 6 -- definitely makes sense!

@keewis
Copy link
Collaborator

keewis commented Jan 8, 2021

thanks for clarifying, I did misunderstand 1. Yes, if (some of) the data variables / coordinates are already quantified and there are no attributes we could have quantify return successfully.

@TomNicholas
Copy link
Collaborator

I like the idea of .quantify() acting like the identity operator unless you explicitly ask for incompatible units: I can easily imagine someone working with data from multiple sources and wanting to make sure they are all quantified by just applying .quantify() to all the xarray objects they have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants