Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitround Codec #299

Merged
merged 26 commits into from
May 18, 2022
Merged

Bitround Codec #299

merged 26 commits into from
May 18, 2022

Conversation

rabernat
Copy link
Contributor

@rabernat rabernat commented Dec 17, 2021

Eventually closes #298

TODO:

  • Unit tests and/or doctests in docstrings
  • tox -e py39 passes locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • tox -e docs passes locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Coveralls passes)

Comment on lines 21 to 23
maskbits = 23 - self.keepbits
mask = (0xFFFFFFFF >> maskbits) << maskbits
half_quantum1 = (1 << (maskbits - 1)) - 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section will error with ValueError: negative shift count if keepbits=23. That causes the test_no_rounding test to error.

Copy link

@rkouznetsov rkouznetsov Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a feature. half_quantum1 is indeed inexpressible if your quantum is the least significant bit... I would just add return from the beginning or not call the sub at all if keepbits == 23. If keepbits > 23 (or < 0) you might wish to raise an exception...
What is the kind of failure the test is supposed to capture?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find a condition like keepbits >= 0 reasonable, but note that keepbits < 0 within reasonable limits (see below) shouldn't cause problems as this will just round the exponent bits. E.g.

julia> using BitInformation
julia> round(4f0,-1)
2.0f0

So in this format only ...,0.125,0.5,2,8,... are representable. However, as the exponent bits describe logarithmically distributed floats, round to nearest is now round to nearest in log-space. Meaning that while 4 is round down to 2 in the example above,

julia> round(4.1f0,-1)
8.0f0

although in lin-space 4.1 is closer to 2 than to 8. Sure, this has to be treated with caution, as NaN/Inf are defined through their exponents, meaning you end up with situations like

julia> round(NaN32,-1)
-0.0f0
julia> round(Inf32,-1)
-0.0f0

And the carry bit can propagate into the sign bit (for |x|>2), which is also weird

julia> round(4f0,-8)
-0.0f0
julia> round(-4f0,-8)
0.0f0

Comment on lines +62 to +70
def test_approx_equal(dtype):
a = np.random.random_sample((300, 200)).astype(dtype)
ar = round(a, APPROX_KEEPBITS[dtype])
# Mimic julia behavior - https://docs.julialang.org/en/v1/base/math/#Base.isapprox
rtol = np.sqrt(np.finfo(np.float32).eps)
# This gets us much closer but still failing for ~6% of the array
# It does pass if we add 1 to keepbits (11 instead of 10)
# Is there an off-by-one issue here?
np.testing.assert_allclose(a, ar, rtol=rtol)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this test passes if we use keepbits=11 instead of 10 makes me think we are dealing with a off-by-one issue, perhaps related to julia vs. python indexing

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I just guessed the tolerances here. The motivation for this test was less to test exactness, but just to flag immediately if rounding is completely off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then I will just bump keepbits to 11 for this test.

def encode(self, buf):
# TODO: figure out if we need to make a copy
# Currently this appears to be overwriting the input buffer
# Is that the right behavior?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In BitInformation.jl the rounding is implemented as scalar version which does not overwrite the input (as float32 is immutable so a copy is created anyway), however, I define a rounding function for arrays, that can either act in-place (i.e. overwriting the bits in an existing array) or acts on a copy of the array, such that the input array in unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. This is more a question about numcodecs (e.g. for @jakirkham), rather than about BitInformation.jl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we shouldn't be overwriting the input buffer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing

b = a.view(dtype=np.int32)

to

b = a.view(dtype=np.int32).copy()

to avoid the overwriting of the input buffer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.astype(np.int32, copy=True) is more canonical.

def test_round_zero_to_zero(dtype):
a = np.zeros((3, 2), dtype=dtype)
# Don't understand Milan's original test:
# How is it possible to have negative keepbits?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just end up rounding the exponent bits, see other comment

@pep8speaks
Copy link

pep8speaks commented Jan 4, 2022

Hello @rabernat! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-05-18 17:50:38 UTC

@rabernat
Copy link
Contributor Author

rabernat commented Apr 7, 2022

I want to apologize for letting this PR stall. I wanted to get it started as a proof of concept. If anyone else is excited about this feature and wants to work on it, I absolutely invite you to take over my PR and continue pushing it forward. cc @aaronspring

@martindurant
Copy link
Member

It looks pretty complete - what more needs doing?

@rabernat
Copy link
Contributor Author

rabernat commented Apr 7, 2022

It looks pretty complete - what more needs doing?

  • Implement other dtypes
  • More thorough testing, to align with the way the other codecs are tests

martindurant and others added 6 commits April 7, 2022 14:34
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
@aaronspring
Copy link

I can add https://github.com/observingClouds/bitinformation_pipeline/blob/749623784286a04a54b6107fc8dd99d23d44f1b0/tests/test_bitround.py#L61 to testing, where I test this PR against Milan's bitinformation.round implementation in Julia.

@martindurant
Copy link
Member

I can add

Does that require adding julia into the tests?
It would be nice to test for float64 as well as float32. Maybe float16 too, although I can't really imagine that being relevant.

@aaronspring
Copy link

Sorry I meant I can add this to the discussion. Yes these tests run Julia code and shouldn’t be included here.

@martindurant
Copy link
Member

Yes these tests run Julia code and shouldn’t be included here.

For a small test data array, we can inline the expected values.

numcodecs/bitround.py Outdated Show resolved Hide resolved
Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing to push this forward Ryan and Martin! 😄

Sorry for being somewhat absent here. Though am happy to see others providing you feedback to improve on this work.

Had a couple minor comments below. Not attached to any particular code suggestions, but thinking we can do a bit of simplification in a few spots and provide a bit more context for future readers. Happy to clarify anything if needed 🙂

numcodecs/bitround.py Show resolved Hide resolved
numcodecs/bitround.py Outdated Show resolved Hide resolved
numcodecs/bitround.py Outdated Show resolved Hide resolved
numcodecs/bitround.py Outdated Show resolved Hide resolved
numcodecs/bitround.py Show resolved Hide resolved
numcodecs/bitround.py Outdated Show resolved Hide resolved
@martindurant
Copy link
Member

@jakirkham , I agree and adopted your changes.

numcodecs/bitround.py Outdated Show resolved Hide resolved
Co-authored-by: jakirkham <jakirkham@gmail.com>
@martindurant
Copy link
Member

Do we need to drop py36? Something's up with the environment.
cf #308 (for adding 3.10)

@martindurant
Copy link
Member

@jakirkham , any thoughts?

@aaronspring
Copy link

I think we can now rerun CI after merging from main

@joshmoore
Copy link
Member

Fixed the release conflicts.

@rsignell-usgs
Copy link

rsignell-usgs commented May 18, 2022

@rabernat, if I install this numcodecs PR, should I be able to just specify this in the encoding argument dict (somehow) in xarray's ds.to_zarr()?

@aaronspring
Copy link

See example in observingClouds/xbitinfo#75

@rsignell-usgs
Copy link

@aaronspring , awesome! That's exactly what I was looking for!

@martindurant
Copy link
Member

Will merge at the end of my day if there are no more comments.

Copy link
Contributor Author

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor docstring suggestion.

Thanks for seeing this through Martin!

numcodecs/bitround.py Outdated Show resolved Hide resolved
numcodecs/bitround.py Outdated Show resolved Hide resolved
martindurant and others added 2 commits May 18, 2022 13:50
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
@jakirkham jakirkham merged commit aab5392 into zarr-developers:master May 18, 2022
@jakirkham
Copy link
Member

Thanks Martin & Ryan and everyone who helped review! 🙏

rsignell-usgs added a commit to rsignell-usgs/xbitinfo that referenced this pull request May 19, 2022
no longer need to pull from rabernat branch, since zarr-developers/numcodecs#299
@joshmoore joshmoore mentioned this pull request Aug 1, 2022
7 tasks
@milankl
Copy link

milankl commented Oct 11, 2022 via email

@martindurant
Copy link
Member

Here is line 67: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/bitround.py#L67

The only occurrence of 23 I see in the code is the mapping of mantissa bits by dtype. @milankl , can you please link to the specific line you see or propose a PR to correct any problem?

@milankl
Copy link

milankl commented Oct 11, 2022

Sorry, I don't know why this post was sent now. I may have sent an email at the beginning of the year to contribute when the PR was still open, maybe this email got lost and only delivered now? Weird. The PR clearly contains the maskbits = bits - self.keepbits lines instead of a hardcoded 23, so consider my previous comment as irrelevant.

@martindurant
Copy link
Member

OK then! I'll blame the gremlins :)

@joshmoore
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support new "bitinformation" codec in numcodecs
10 participants