Bitround Codec #299

rabernat · 2021-12-17T16:31:10Z

Eventually closes #298

TODO:

Unit tests and/or doctests in docstrings
tox -e py39 passes locally
Docstrings and API docs for any new/modified user-facing classes and functions
Changes documented in docs/release.rst
tox -e docs passes locally
GitHub Actions CI passes
Test coverage to 100% (Coveralls passes)

rabernat · 2021-12-17T16:34:32Z

numcodecs/bitround.py

+        maskbits = 23 - self.keepbits
+        mask = (0xFFFFFFFF >> maskbits) << maskbits
+        half_quantum1 = (1 << (maskbits - 1)) - 1


This section will error with ValueError: negative shift count if keepbits=23. That causes the test_no_rounding test to error.

This is a feature. half_quantum1 is indeed inexpressible if your quantum is the least significant bit... I would just add return from the beginning or not call the sub at all if keepbits == 23. If keepbits > 23 (or < 0) you might wish to raise an exception...
What is the kind of failure the test is supposed to capture?
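
A minimal sketch of the early-return idea suggested above, assuming float32 input and NumPy only. The function name `bitround_float32` and the choice to reject keepbits outside [0, 23] are illustrative assumptions, not the code that was merged.

```python
import numpy as np


def bitround_float32(values, keepbits):
    """Round float32 values to `keepbits` mantissa bits (nearest, ties to even).

    Hypothetical helper for illustration; not the numcodecs implementation.
    """
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23 for float32")
    # np.array copies by default, so the caller's buffer is left untouched.
    a = np.array(values, dtype=np.float32)
    if keepbits == 23:
        # Nothing to round: the quantum is already the least significant bit,
        # and (maskbits - 1) would be a negative shift count.
        return a
    maskbits = 23 - keepbits
    bits = a.view(np.uint32)  # reinterpret the copy's bits in place
    mask = np.uint32((0xFFFFFFFF >> maskbits) << maskbits)
    half_quantum1 = np.uint32((1 << (maskbits - 1)) - 1)
    # Add half a quantum (plus the lowest kept bit, for ties to even), then truncate.
    bits += ((bits >> np.uint32(maskbits)) & np.uint32(1)) + half_quantum1
    bits &= mask
    return a


print(bitround_float32([1.25, 1.75], keepbits=1))
```

With keepbits=23 the input comes back unchanged, which is the behaviour test_no_rounding appears to expect.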

I find a condition like keepbits >= 0 reasonable, but note that keepbits < 0 within reasonable limits (see below) shouldn't cause problems as this will just round the exponent bits. E.g.

julia> using BitInformation
julia> round(4f0,-1)
2.0f0

So in this format only ..., 0.125, 0.5, 2, 8, ... are representable. However, because the exponent bits describe logarithmically distributed floats, round-to-nearest becomes round-to-nearest in log space. So while 4 is rounded down to 2 in the example above,

julia> round(4.1f0,-1)
8.0f0

although in lin-space 4.1 is closer to 2 than to 8. Sure, this has to be treated with caution, as NaN/Inf are defined through their exponents, meaning you end up with situations like

julia> round(NaN32,-1)
-0.0f0
julia> round(Inf32,-1)
-0.0f0

And the carry bit can propagate into the sign bit (for |x|>2), which is also weird

julia> round(4f0,-8)
-0.0f0
julia> round(-4f0,-8)
0.0f0

numcodecs/tests/test_bitround.py

rabernat · 2021-12-17T17:23:37Z

numcodecs/tests/test_bitround.py

+def test_approx_equal(dtype):
+    a = np.random.random_sample((300, 200)).astype(dtype)
+    ar = round(a, APPROX_KEEPBITS[dtype])
+    # Mimic julia behavior - https://docs.julialang.org/en/v1/base/math/#Base.isapprox
+    rtol = np.sqrt(np.finfo(np.float32).eps)
+    # This gets us much closer but still failing for ~6% of the array
+    # It does pass if we add 1 to keepbits (11 instead of 10)
+    # Is there an off-by-one issue here?
+    np.testing.assert_allclose(a, ar, rtol=rtol)


The fact that this test passes if we use keepbits=11 instead of 10 makes me think we are dealing with an off-by-one issue, perhaps related to Julia vs. Python indexing.

To be honest, I just guessed the tolerances here. The motivation for this test was less to test exactness, but just to flag immediately if rounding is completely off.

Ok then I will just bump keepbits to 11 for this test.
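
One possible reading of the keepbits=10 vs. 11 behaviour, offered as a back-of-the-envelope check rather than a confirmed diagnosis: round-to-nearest with k kept mantissa bits can be off by up to half a quantum, i.e. a relative error of up to 2**-(k + 1), and the rtol chosen in the test happens to fall between the bounds for k=10 and k=11. The helper name `max_rel_error` is hypothetical.

```python
import numpy as np

# Tolerance used in the test above.
rtol = np.sqrt(np.finfo(np.float32).eps)


def max_rel_error(keepbits):
    # Worst-case relative error of round-to-nearest when `keepbits`
    # mantissa bits are kept: half of one quantum, 2**-keepbits.
    return 2.0 ** -(keepbits + 1)


for k in (10, 11):
    bound = max_rel_error(k)
    print(f"keepbits={k}: bound={bound:.3e}, rtol={rtol:.3e}, within rtol: {bound <= rtol}")
```

Under this assumption no indexing mismatch is needed to explain the failures: some elements can legitimately exceed rtol at keepbits=10, while none can at keepbits=11.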

milankl · 2021-12-20T15:53:11Z

numcodecs/bitround.py

+    def encode(self, buf):
+        # TODO: figure out if we need to make a copy
+        # Currently this appears to be overwriting the input buffer
+        # Is that the right behavior?


In BitInformation.jl the rounding is implemented as a scalar version, which does not overwrite the input (a float32 is immutable, so a copy is created anyway). However, I also define a rounding function for arrays that can either act in place (i.e. overwrite the bits in an existing array) or act on a copy of the array, so that the input array is unchanged.

Understood. This is more a question about numcodecs (e.g. for @jakirkham), rather than about BitInformation.jl.

No, we shouldn't be overwriting the input buffer.

How about changing

b = a.view(dtype=np.int32)

to

b = a.view(dtype=np.int32).copy()

to avoid the overwriting of the input buffer?

a.astype(np.int32, copy=True) is more canonical.
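
A small NumPy illustration of the two spellings, assuming the goal is to work on the raw IEEE 754 bit patterns without touching the caller's array. It suggests the two are not interchangeable here: `view` reinterprets the same bytes, whereas `astype` converts the numeric values. The variable names are made up for this example.

```python
import numpy as np

a = np.array([1.0, 2.5], dtype=np.float32)

# Reinterpret the float32 bytes as int32, then copy so `a` is not aliased.
bits = a.view(np.int32).copy()
first_pattern = int(bits[0])

# Numeric conversion: the values themselves are cast to integers.
converted = a.astype(np.int32, copy=True)

print(hex(first_pattern))  # 0x3f800000, the bit pattern of 1.0f
print(converted)           # [1 2]

# Writing to the copy leaves the original array untouched.
bits[:] = 0
print(a)                   # [1.  2.5]
```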

milankl · 2021-12-20T15:53:56Z

numcodecs/tests/test_bitround.py

+def test_round_zero_to_zero(dtype):
+    a = np.zeros((3, 2), dtype=dtype)
+    # Don't understand Milan's original test:
+    # How is it possible to have negative keepbits?


You just end up rounding the exponent bits, see other comment

pep8speaks · 2022-01-04T09:32:27Z

Hello @rabernat! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-05-18 17:50:38 UTC

rabernat · 2022-04-07T15:55:53Z

I want to apologize for letting this PR stall. I wanted to get it started as a proof of concept. If anyone else is excited about this feature and wants to work on it, I absolutely invite you to take over my PR and continue pushing it forward. cc @aaronspring

martindurant · 2022-04-07T16:10:33Z

It looks pretty complete - what more needs doing?

rabernat · 2022-04-07T16:30:27Z

It looks pretty complete - what more needs doing?

Implement other dtypes
More thorough testing, to align with the way the other codecs are tested

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

aaronspring · 2022-04-07T19:51:11Z

I can add https://github.com/observingClouds/bitinformation_pipeline/blob/749623784286a04a54b6107fc8dd99d23d44f1b0/tests/test_bitround.py#L61 to testing, where I test this PR against Milan's bitinformation.round implementation in Julia.

martindurant · 2022-04-07T20:30:10Z

I can add

Does that require adding julia into the tests?
It would be nice to test for float64 as well as float32. Maybe float16 too, although I can't really imagine that being relevant.

aaronspring · 2022-04-07T20:47:42Z

Sorry I meant I can add this to the discussion. Yes these tests run Julia code and shouldn’t be included here.

martindurant · 2022-04-07T20:48:35Z

Yes these tests run Julia code and shouldn’t be included here.

For a small test data array, we can inline the expected values.

numcodecs/bitround.py

jakirkham

Thanks for continuing to push this forward Ryan and Martin! 😄

Sorry for being somewhat absent here. Though am happy to see others providing you feedback to improve on this work.

Had a couple minor comments below. Not attached to any particular code suggestions, but thinking we can do a bit of simplification in a few spots and provide a bit more context for future readers. Happy to clarify anything if needed 🙂

numcodecs/bitround.py

martindurant · 2022-04-15T16:45:48Z

@jakirkham , I agree and adopted your changes.

numcodecs/bitround.py

Co-authored-by: jakirkham <jakirkham@gmail.com>

martindurant · 2022-04-22T19:52:18Z

Do we need to drop py36? Something's up with the environment.
cf #308 (for adding 3.10)

martindurant · 2022-04-28T00:57:20Z

@jakirkham , any thoughts?

aaronspring · 2022-05-11T21:03:06Z

I think we can now rerun CI after merging from main

joshmoore · 2022-05-12T06:51:11Z

Fixed the release conflicts.

rsignell-usgs · 2022-05-18T14:12:39Z

@rabernat, if I install this numcodecs PR, should I be able to just specify this in the encoding argument dict (somehow) in xarray's ds.to_zarr()?

aaronspring · 2022-05-18T14:17:38Z

See example in observingClouds/xbitinfo#75

rsignell-usgs · 2022-05-18T14:45:47Z

@aaronspring , awesome! That's exactly what I was looking for!

martindurant · 2022-05-18T17:38:17Z

Will merge at the end of my day if there are no more comments.

rabernat

Just one minor docstring suggestion.

Thanks for seeing this through Martin!

numcodecs/bitround.py

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

jakirkham · 2022-05-18T19:16:22Z

Thanks Martin & Ryan and everyone who helped review! 🙏

milankl · 2022-10-11T09:22:59Z

There’s a hardcoded 23 (representing the number of mantissa bits) in line 67, that should be 10,23,52 for float16/32/64. So that line should be ‘maskbits = bits - self.keepbits’ I guess

…

On 9 Apr 2022, at 14:05, Aaron Spring wrote: When I compare this PR's round against bitinformation.jl.round in https://github.com/observingClouds/bitinformation_pipeline/blob/2f02df358521d28604b5be69ce9fb0388169e797/tests/test_bitround.py#L64, I get equal results only for float32 but not for float16 or float64, see https://github.com/observingClouds/bitinformation_pipeline/runs/5953867041?check_suite_focus=true

martindurant · 2022-10-11T13:17:53Z

Here is line 67: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/bitround.py#L67

The only occurrence of 23 I see in the code is the mapping of mantissa bits by dtype. @milankl , can you please link to the specific line you see or propose a PR to correct any problem?

milankl · 2022-10-11T13:24:07Z

Sorry, I don't know why this post was sent now. I may have sent an email at the beginning of the year to contribute when the PR was still open, maybe this email got lost and only delivered now? Weird. The PR clearly contains the maskbits = bits - self.keepbits lines instead of a hardcoded 23, so consider my previous comment as irrelevant.
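
For reference, the per-dtype mantissa widths mentioned above (10, 23, 52) can be read from NumPy instead of being hardcoded. This is a small sketch; the helper name `mask_bits` is hypothetical and not taken from the merged code.

```python
import numpy as np


def mask_bits(dtype, keepbits):
    """Number of trailing mantissa bits to clear for a given float dtype."""
    bits = np.finfo(dtype).nmant  # explicit mantissa bits of the format
    if not 0 <= keepbits <= bits:
        raise ValueError(f"keepbits must be between 0 and {bits} for {np.dtype(dtype)}")
    return bits - keepbits


for dt in (np.float16, np.float32, np.float64):
    print(np.dtype(dt).name, np.finfo(dt).nmant, mask_bits(dt, keepbits=5))
```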

martindurant · 2022-10-11T13:25:05Z

OK then! I'll blame the gremlins :)

joshmoore · 2022-10-11T13:32:57Z

rabernat added 2 commits December 17, 2021 09:58

added new codec

cdb77b2

wrote some tests

c0b6347

rabernat mentioned this pull request Dec 17, 2021

Support new "bitinformation" codec in numcodecs #298

Closed

use julia-style approximation

9e0c943

milankl reviewed Dec 20, 2021

add range limits to bitround

06a27d7

fix PEP error

7c7dc7c

milankl mentioned this pull request Apr 7, 2022

Bitround in python or julia observingClouds/xbitinfo#25

Closed

observingClouds mentioned this pull request Apr 7, 2022

bp.jl_bitround observingClouds/xbitinfo#29

Merged

milankl mentioned this pull request Apr 7, 2022

Incorrect round away from zero for keepbits=significand_bits milankl/BitInformation.jl#36

Closed

martindurant and others added 6 commits April 7, 2022 14:34

Add docs and allow for multple precisions

69263a5

fix

6df3b69

Update numcodecs/bitround.py

b5abbbb

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

Update numcodecs/bitround.py

7d9846a

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

Update numcodecs/bitround.py

76e9f6f

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

Update numcodecs/bitround.py

2f6207e

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

document keepbits

594202a

jakirkham reviewed Apr 14, 2022

suggested changes

775d368

jakirkham reviewed Apr 15, 2022

Update numcodecs/bitround.py

7deff68

Co-authored-by: jakirkham <jakirkham@gmail.com>

aaronspring mentioned this pull request May 4, 2022

register on pypi observingClouds/xbitinfo#14

Closed

observingClouds mentioned this pull request May 4, 2022

Failing CI for python3.6 MacOS #317

Closed

Merge branch 'master' into bitround

cb93cb6

martindurant and others added 2 commits May 18, 2022 13:50

Update numcodecs/bitround.py

66b7b1a

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

Update numcodecs/bitround.py

56d9511

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>

jakirkham approved these changes May 18, 2022

jakirkham merged commit aab5392 into zarr-developers:master May 18, 2022

rsignell-usgs added a commit to rsignell-usgs/xbitinfo that referenced this pull request May 19, 2022

install numcodecs from origin

3e8f56d

no longer need to pull from rabernat branch, since zarr-developers/numcodecs#299

rsignell-usgs mentioned this pull request May 19, 2022

Remove dependency on rabernat branch observingClouds/xbitinfo#103

Merged

joshmoore mentioned this pull request Aug 1, 2022

Register BitRound (fix #346) #347

Merged
