Choose string representation when dequantifying #121

Closed
TomNicholas opened this issue Jul 9, 2021 · 11 comments · Fixed by #127 or #132
Labels
documentation Improvements or additions to documentation

Comments

@TomNicholas
Collaborator

Pint necessarily makes decisions about the "default" string representation of a unit, but at the moment that inhibits roundtripping:

In [15]: da = xr.DataArray([1,2,3], dims='x', attrs={'units': 'm'})

In [16]: q = da.pint.quantify()

In [17]: q.pint.dequantify()
Out[17]: 
<xarray.DataArray (x: 3)>
array([1, 2, 3])
Dimensions without coordinates: x
Attributes:
    units:    meter

The attr has gone from 'm' to 'meter'.

We should provide an optional argument to dequantify that lets users specify what they want the units attribute to end up as, so that they can restore their data to its original state more easily.

This came up with @jbusecke and his CMIP6 data

@TomNicholas added the enhancement label Jul 9, 2021
@keewis
Collaborator

keewis commented Jul 9, 2021

Take a look at the format parameter of dequantify for this. Also, it should be possible to globally set the default_format attribute of a registry if you want to avoid setting it on every call.

Edit: if that doesn't work I'd be happy to help with debugging

This won't work for cfunits-style unit strings, which would require changes to pint.

@TomNicholas
Collaborator Author

I did try using format, and this happened:

In [6]: da = xr.DataArray([1,2,3], dims='x', attrs={'units': 'm'})

In [7]: q = da.pint.quantify()

In [8]: q.pint.dequantify(format='m')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-3cacb126d528> in <module>
----> 1 q.pint.dequantify(format='m')

~/Documents/Work/Code/pint-xarray/pint_xarray/accessors.py in dequantify(self, format)
    368         unit_format = f"{{:{format}}}" if isinstance(format, str) else format
    369 
--> 370         units = units_to_str_or_none(units, unit_format)
    371         return (
    372             self.da.pipe(conversion.strip_units)

~/Documents/Work/Code/pint-xarray/pint_xarray/accessors.py in units_to_str_or_none(mapping, unit_format)
     83     formatter = str if not unit_format else lambda v: unit_format.format(v)
     84 
---> 85     return {
     86         key: formatter(value) if isinstance(value, Unit) else value
     87         for key, value in mapping.items()

~/Documents/Work/Code/pint-xarray/pint_xarray/accessors.py in <dictcomp>(.0)
     84 
     85     return {
---> 86         key: formatter(value) if isinstance(value, Unit) else value
     87         for key, value in mapping.items()
     88     }

~/Documents/Work/Code/pint-xarray/pint_xarray/accessors.py in <lambda>(v)
     81 
     82 def units_to_str_or_none(mapping, unit_format):
---> 83     formatter = str if not unit_format else lambda v: unit_format.format(v)
     84 
     85     return {

~/Documents/Work/Code/pint/pint/unit.py in __format__(self, spec)
     92             units = self._units
     93 
---> 94         return format(units, spec)
     95 
     96     def format_babel(self, spec="", locale=None, **kwspec):

~/Documents/Work/Code/pint/pint/util.py in __format__(self, spec)
    451 
    452     def __format__(self, spec):
--> 453         return format_unit(self, spec)
    454 
    455     def format_babel(self, spec, **kwspec):

~/Documents/Work/Code/pint/pint/formatting.py in format_unit(unit, spec, **kwspec)
    266             return "dimensionless"
    267 
--> 268     spec = _parse_spec(spec)
    269     fmt = dict(_FORMATS[spec])
    270     fmt.update(kwspec)

~/Documents/Work/Code/pint/pint/formatting.py in _parse_spec(spec)
    253                 result = ch
    254         elif ch.isalpha():
--> 255             raise ValueError("Unknown conversion specified " + ch)
    256         else:
    257             break

ValueError: Unknown conversion specified m

Maybe that's not the right way to use the argument? The docs are quite vague though...
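
The traceback hints at why this fails: dequantify wraps the format argument into a template such as "{:m}" and formats each unit with it, so the string is interpreted as a format spec rather than as the desired output. A minimal sketch of that mechanism, using a hypothetical ToyUnit class in place of pint's Unit (the specs it accepts are assumptions chosen for illustration):

```python
class ToyUnit:
    """Hypothetical stand-in for a unit object; not pint's implementation."""

    # Assumed mapping from format spec to rendering, for illustration only.
    _RENDERINGS = {"": "meter", "P": "meter", "~P": "m"}

    def __format__(self, spec):
        if spec not in self._RENDERINGS:
            raise ValueError("Unknown conversion specified " + spec)
        return self._RENDERINGS[spec]


def render(unit, format):
    # Same wrapping as in the traceback: "m" becomes the template "{:m}".
    template = f"{{:{format}}}"
    return template.format(unit)


try:
    render(ToyUnit(), "m")
    error = None
except ValueError as exc:
    error = str(exc)

abbreviated = render(ToyUnit(), "~P")
```

In this sketch "m" is rejected because it is not a known spec, while a spec such as "~P" is accepted and produces the short form.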

@keewis
Collaborator

keewis commented Jul 9, 2021

right, that's a documentation bug. The valid values are pint format specifiers, the ones you could use in an f-string:

import pint
ureg = pint.UnitRegistry()
a = ureg.Quantity([3, 4], "m")
f"{a:P}"

I think there are the P (pretty), L (LaTeX) and H (HTML) format specifiers, plus the # and ~ modifiers, which select which name to use (for the unit, f"{a:~P}" would give "m" while f"{a:P}" should give "meter").

This is pretty well hidden in the pint documentation, too (at least, I can't find it as quickly as I would like to), so I tend to use the source code as a reference. I guess I did know about this before, but I just realized that we (or rather, I) should try to make that more visible, and then link to that section in the dequantify docstrings.

You should get what you want using q.pint.dequantify(format="~P") (or by setting ureg.default_format = "~P").

Both options need to be documented and there should also be some examples.

@keewis added the documentation label and removed the enhancement label Jul 9, 2021
@TomNicholas
Collaborator Author

Ah, okay, thanks! I had no idea pint even had an option to do that.

There is no way I would have worked that out from the current documentation alone though, so it does need to at least have a link to the relevant section of pint's docs.

Might we still want the option to override the string being written out though? These 3 cases (P, L, H) can't possibly cover all the possible variations we might want to restore for round-tripping?

@keewis
Collaborator

keewis commented Jul 9, 2021

see String formatting (hidden within the tutorial) for the docs. This does not mention the "#" modifier, though.

Allowing users to override the unit string might make the code pretty complicated, because we'd have to check that the units are identical, so I'd prefer adding more formats to pint. Or maybe we can allow passing a function to format?
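
One way the "pass a function to format" idea could look is sketched below. This is not pint-xarray's implementation: units_to_str is a simplified, hypothetical variant of the helper visible in the traceback, and ToyUnit is an invented stand-in for pint's Unit.

```python
class ToyUnit:
    """Hypothetical unit with a long name and a symbol; not pint's Unit."""

    def __init__(self, name, symbol):
        self.name = name
        self.symbol = symbol

    def __str__(self):
        return self.name

    def __format__(self, spec):
        # Assumption: "~" abbreviates, an empty spec gives the long name.
        if spec == "~":
            return self.symbol
        if spec == "":
            return self.name
        raise ValueError("Unknown conversion specified " + spec)


def units_to_str(mapping, unit_format=None):
    """Format units with the default, a spec string, or a callable."""
    if unit_format is None:
        formatter = str
    elif callable(unit_format):
        formatter = unit_format
    else:
        formatter = f"{{:{unit_format}}}".format
    return {
        key: formatter(value) if isinstance(value, ToyUnit) else value
        for key, value in mapping.items()
    }


units = {"speed": ToyUnit("meter / second", "m / s"), "label": None}

by_spec = units_to_str(units, "~")
by_function = units_to_str(units, lambda u: u.symbol.replace(" / ", "/"))
```

With a callable, the caller controls the output string completely, and no check that two unit strings are equivalent is needed because the string is always derived from the unit itself.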

@dcherian

> These 3 cases (P, L, H) can't possibly cover all the possible variations we might want to restore for round-tripping?

Hmmm... how do you restore the formatting exactly without storing it someplace (like encoding)? But then, you have another place to keep track of units and you lose the benefits of pint!

> cfunits-style unit strings which would require changes to pint.

This "sounds" like what might be needed for full CF style units support with pint-xarray but I'm not sure. cc @jthielen

And if it can be set as part of the unit registry then cf_xarray could provide it.

@TomNicholas
Collaborator Author

> Hmmm... how do you restore the formatting exactly without storing it someplace (like encoding)? But then, you have another place to keep track of units and you lose the benefits of pint!

Yeah I'm not completely sure this makes sense either... What do you think @jbusecke?

@jthielen
Collaborator

I agree that to restore the formatting exactly for round-tripping, the original string has to be stored somewhere. With that in mind, my inclination would be to build it into Pint. If the Quantity/Unit was defined with a string, that original string can be stored as a property and accessed with a new format specifier (which presumably falls back to the default), and if the Quantity is mutated, the cached original unit string is invalidated. With pint-xarray, all that is needed is to use that new format specifier.
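
A rough sketch of this proposal, under stated assumptions: ToyUnit is a hypothetical class and "O" is an invented specifier, neither of which exists in pint. Invalidation is modelled here by having operations return new objects that carry no cached string.

```python
class ToyUnit:
    """Hypothetical unit that remembers the string it was created from."""

    def __init__(self, canonical, original=None):
        self.canonical = canonical  # normalized name, e.g. "meter"
        self.original = original    # user-supplied string, e.g. "m"

    def __format__(self, spec):
        # "O" is an invented specifier meaning "original string".
        if spec == "O" and self.original is not None:
            return self.original
        # Anything else falls back to the default representation.
        return self.canonical

    def __mul__(self, other):
        # The result is a new unit without a cached original string.
        return ToyUnit(f"{self.canonical} * {other.canonical}")


meter = ToyUnit("meter", original="m")
area = meter * meter

roundtripped = f"{meter:O}"
fallback = f"{area:O}"
```

Here the untouched unit formats back to the string it was created from, while the derived unit falls back to its default representation.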

Does this seem like a reasonable thing to propose upstream?

@keewis
Collaborator

keewis commented Jul 24, 2021

reopening for the other issues.

@keewis reopened this Jul 24, 2021
@keewis
Collaborator

keewis commented Jul 24, 2021

@jthielen, I'm not quite sure this is the cleanest solution, but I'd bring it up on the pint issue tracker anyways to see if someone has a better idea.

Regarding the formats, I'd like to be able to register custom format specs, such that packages like cf-xarray can register a cfunits format spec (and set that as the registry's default format), which would convert e.g. Unit("m / s^2") to "m s-2".

Edit: that could also use the formatter mentioned in Ouranosinc/xclim#780
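
The conversion such a cfunits-style format spec would perform can be sketched independently of pint. The function below is hypothetical and takes a plain mapping of unit symbols to integer exponents as input; how it would be registered with a unit registry is left open.

```python
def cf_format(exponents):
    """Render a mapping of unit symbol -> exponent as a CF-style string."""
    parts = []
    for symbol, power in exponents.items():
        if power == 1:
            # An exponent of one is left implicit.
            parts.append(symbol)
        else:
            # Other exponents are appended directly, e.g. "s-2".
            parts.append(f"{symbol}{power}")
    return " ".join(parts)


acceleration = cf_format({"m": 1, "s": -2})
```

For the m / s^2 example from the comment above, this yields the string "m s-2".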

@keewis
Collaborator

keewis commented Jul 27, 2022

The original issue has been addressed; if exact roundtripping is still a desired feature, we should discuss that in a new issue.
