
convert all arrays in a dataset into the same unit #45

Closed

mikapfl opened this issue Nov 9, 2020 · 8 comments · Fixed by #63

Comments

@mikapfl (Contributor) commented Nov 9, 2020

Hi,

If I have a quantified Dataset ds, trying to convert all arrays in the Dataset into the same unit yields an error:

>>> import xarray as xr
>>> import pint_xarray
>>> ds = xr.Dataset({'test': xr.DataArray([1,2,3], attrs={'units': 'kg'})}).pint.quantify()
>>> ds.pint.to('g')
~/.local/lib/python3.8/site-packages/pint_xarray/accessors.py in to(self, units, **unit_kwargs)
    639             b        (x) float64 <Quantity([-1000.  -750.  -500.  -250.     0.], 'gra...
    640         """
--> 641         units = either_dict_or_kwargs(units, unit_kwargs, "to")
    642 
    643         return conversion.convert_units(self.ds, units)

~/.local/lib/python3.8/site-packages/pint_xarray/accessors.py in either_dict_or_kwargs(positional, keywords, method_name)
     90     if positional is not None:
     91         if not is_dict_like(positional):
---> 92             raise ValueError(
     93                 f"the first argument to .{method_name} must be a dictionary"
     94             )

ValueError: the first argument to .to must be a dictionary

The workaround is to generate a temporary dictionary like so:

>>> ds.pint.to({x: 'g' for x in ds.keys()})
<xarray.Dataset>
Dimensions:  (dim_0: 3)
Dimensions without coordinates: dim_0
Data variables:
    test     (dim_0) float64 <Quantity([1000. 2000. 3000.], 'gram')>

However, I think that is kind of ugly. Given that converting a whole dataset to the same unit is rather intuitive and (I guess) common, I think it would be great if Dataset.pint.to supported this directly.
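For now I've wrapped the dict workaround in a small helper, roughly like this (just a sketch; the function name is something I made up):

import xarray as xr
import pint_xarray  # noqa: F401  (registers the .pint accessor)

def to_single_unit(ds: xr.Dataset, unit: str) -> xr.Dataset:
    """Convert every data variable of a quantified Dataset to ``unit``."""
    return ds.pint.to({name: unit for name in ds.keys()})

converted = to_single_unit(ds, "g")  # same result as the dict workaround above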

I could also have a look at writing a patch, but I'll be at pydata later this week, so I won't be able to look at this until next week.

Cheers,

Mika

@keewis (Collaborator) commented Nov 9, 2020

when designing this method I assumed that Dataset objects rarely have a single shared dimensionality (the most obvious exception being a single-variable Dataset). I guess that assumption doesn't hold, especially outside the geosciences?

My only concern when implementing this is that the error reporting could be a pain: pint raises only a single exception per incompatible dimensionality, so we'd have to check the compatibility of each variable separately to report them all.
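Roughly, I'd imagine an up-front check along these lines (only a sketch, none of these names exist in the code base):

import pint
import xarray as xr

def check_convertible(ds: xr.Dataset, unit, registry=None):
    """Raise a single error listing all data variables whose dimensionality is incompatible with ``unit``."""
    registry = registry if registry is not None else pint.get_application_registry()
    target = registry.Unit(unit).dimensionality
    incompatible = {
        name: str(da.pint.units)
        for name, da in ds.items()
        if da.pint.units is not None and da.pint.units.dimensionality != target
    }
    if incompatible:
        raise ValueError(
            f"cannot convert to {unit!r}, incompatible variables: {incompatible}"
        )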

> I could also have a look at writing a patch

yes, please. And don't worry about being slow, we're not in a hurry to release the next version.

@TomNicholas (Collaborator) commented

> Given that converting a whole dataset to the same unit is rather intuitive and (I guess) common

I'm not sure how common this is - single-variable datasets are normally just represented as DataArrays, and multi-variable datasets (in my personal experience) normally contain variables with different units but shared coordinates.

There might be another way to do this currently which is possibly slightly nicer too? Something like

for name in ds:  # iterating a Dataset yields variable names, not DataArrays
    ds[name] = ds[name].pint.to('unit')

I would be interested to hear how you think the API could be improved though :)

@dcherian commented Nov 11, 2020

This is a job for map! ds.map(lambda x: x.pint.to("unit"))

@keewis (Collaborator) commented Nov 11, 2020

thanks for the hint, @dcherian, that does indeed work and would be a bit easier to remember than creating a temporary dict. If we go for this, it might be better to declare this a documentation bug and add the map trick to the examples section of the Dataset.pint.to docstring.
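Something like this in the docstring's Examples section (the exact wording is just a sketch):

    Examples
    --------
    Convert all data variables of a Dataset to the same unit using ``Dataset.map``:

    >>> ds.map(lambda da: da.pint.to("g"))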

@mikapfl (Contributor, Author) commented Nov 12, 2020

> I'm not sure how common this is - single-variable datasets are normally just represented as DataArrays, and multi-variable datasets (in my personal experience) normally contain variables with different units but shared coordinates.

Maybe my use case (one Dataset of DataArrays, each describing emissions of one gas) is also special, but I find I need to convert a whole multi-variable Dataset quite often: sometimes just to convert a Dataset into a DataArray, sometimes to normalize to a standard unit before dequantifying and storing the Dataset as a CSV.
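To make that concrete, the workflow looks roughly like this (a sketch using the example ds from the top of this issue; in the real data the variables are per-gas emission time series and the target unit differs):

# convert everything to a common unit first (the dict workaround from above)
common = ds.pint.to({name: "g" for name in ds.keys()})

# Dataset -> DataArray, which only makes sense once all variables share a unit
# (assuming concatenation of the quantified arrays is supported)
da = common.to_array(dim="variable")

# ... or normalize, dequantify, and store as CSV
common.pint.dequantify().to_dataframe().to_csv("data.csv")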

@keewis (Collaborator) commented Nov 24, 2020

thanks for sharing your use case, @mikapfl, this makes it so much easier to decide whether or not to include this shortcut.

What do you think about the Dataset.map trick? It's definitely a bit more to write than the syntax you propose but doesn't require changing anything.

Just for reference, the reason we have been so reluctant to add this is that if it turns out to be a bad decision later, we would have to go through a deprecation cycle to remove it. Then again, the library is intentionally marked with the alpha trove classifier, so we would be able to keep the deprecation cycle fairly short. I guess we can try introducing the new syntax as an experimental feature and roll back if we notice anything wrong with it. Thoughts, @xarray-contrib/pint-xarray-devs?

@mikapfl (Contributor, Author) commented Nov 25, 2020

I think ds.map(lambda x: x.pint.to('unit')) is more readable than ds.pint.to({x: 'g' for x in ds.keys()}), so just documenting that would also be fine. However, it was surprising to me that ds.pint.to('unit') didn't work; I instinctively try that first before remembering that I have to treat Datasets differently from DataArrays.

Either way you decide, I can prepare a pull request. (-:

@keewis (Collaborator) commented Feb 8, 2021

after consideration, I agree that making the API for Dataset.pint.to and DataArray.pint.to symmetric seems good for usability.

If you still need this, feel free to send in a PR (I'm sorry for taking so long to decide).
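With that, the call from the top of this issue should then work directly, i.e. something like:

ds.pint.to("g")  # expected to behave like the dict-based workaround above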
