-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bitround Codec #299
Bitround Codec #299
Conversation
numcodecs/bitround.py
Outdated
maskbits = 23 - self.keepbits | ||
mask = (0xFFFFFFFF >> maskbits) << maskbits | ||
half_quantum1 = (1 << (maskbits - 1)) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section will error with ValueError: negative shift count
if keepbits=23
. That causes the test_no_rounding
test to error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a feature. half_quantum1 is indeed inexpressible if your quantum is the least significant bit... I would just add return from the beginning or not call the sub at all if keepbits == 23. If keepbits > 23 (or < 0) you might wish to raise an exception...
What is the kind of failure the test is supposed to capture?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find a condition like keepbits >= 0
reasonable, but note that keepbits < 0
within reasonable limits (see below) shouldn't cause problems as this will just round the exponent bits. E.g.
julia> using BitInformation
julia> round(4f0,-1)
2.0f0
So in this format only ...,0.125,0.5,2,8,... are representable. However, as the exponent bits describe logarithmically distributed floats, round to nearest is now round to nearest in log-space. Meaning that while 4 is round down to 2 in the example above,
julia> round(4.1f0,-1)
8.0f0
although in lin-space 4.1 is closer to 2 than to 8. Sure, this has to be treated with caution, as NaN/Inf are defined through their exponents, meaning you end up with situations like
julia> round(NaN32,-1)
-0.0f0
julia> round(Inf32,-1)
-0.0f0
And the carry bit can propagate into the sign bit (for |x|>2), which is also weird
julia> round(4f0,-8)
-0.0f0
julia> round(-4f0,-8)
0.0f0
def test_approx_equal(dtype): | ||
a = np.random.random_sample((300, 200)).astype(dtype) | ||
ar = round(a, APPROX_KEEPBITS[dtype]) | ||
# Mimic julia behavior - https://docs.julialang.org/en/v1/base/math/#Base.isapprox | ||
rtol = np.sqrt(np.finfo(np.float32).eps) | ||
# This gets us much closer but still failing for ~6% of the array | ||
# It does pass if we add 1 to keepbits (11 instead of 10) | ||
# Is there an off-by-one issue here? | ||
np.testing.assert_allclose(a, ar, rtol=rtol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that this test passes if we use keepbits=11 instead of 10 makes me think we are dealing with a off-by-one issue, perhaps related to julia vs. python indexing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I just guessed the tolerances here. The motivation for this test was less to test exactness, but just to flag immediately if rounding is completely off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok then I will just bump keepbits to 11 for this test.
numcodecs/bitround.py
Outdated
def encode(self, buf): | ||
# TODO: figure out if we need to make a copy | ||
# Currently this appears to be overwriting the input buffer | ||
# Is that the right behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In BitInformation.jl the rounding is implemented as scalar version which does not overwrite the input (as float32 is immutable so a copy is created anyway), however, I define a rounding function for arrays, that can either act in-place (i.e. overwriting the bits in an existing array) or acts on a copy of the array, such that the input array in unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. This is more a question about numcodecs (e.g. for @jakirkham), rather than about BitInformation.jl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we shouldn't be overwriting the input buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing
b = a.view(dtype=np.int32)
to
b = a.view(dtype=np.int32).copy()
to avoid the overwriting of the input buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a.astype(np.int32, copy=True)
is more canonical.
def test_round_zero_to_zero(dtype): | ||
a = np.zeros((3, 2), dtype=dtype) | ||
# Don't understand Milan's original test: | ||
# How is it possible to have negative keepbits? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just end up rounding the exponent bits, see other comment
I want to apologize for letting this PR stall. I wanted to get it started as a proof of concept. If anyone else is excited about this feature and wants to work on it, I absolutely invite you to take over my PR and continue pushing it forward. cc @aaronspring |
It looks pretty complete - what more needs doing? |
|
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
I can add https://github.com/observingClouds/bitinformation_pipeline/blob/749623784286a04a54b6107fc8dd99d23d44f1b0/tests/test_bitround.py#L61 to testing, where I test this PR against Milan's bitinformation.round implementation in Julia. |
Does that require adding julia into the tests? |
Sorry I meant I can add this to the discussion. Yes these tests run Julia code and shouldn’t be included here. |
For a small test data array, we can inline the expected values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for continuing to push this forward Ryan and Martin! 😄
Sorry for being somewhat absent here. Though am happy to see others providing you feedback to improve on this work.
Had a couple minor comments below. Not attached to any particular code suggestions, but thinking we can do a bit of simplification in a few spots and provide a bit more context for future readers. Happy to clarify anything if needed 🙂
@jakirkham , I agree and adopted your changes. |
Co-authored-by: jakirkham <jakirkham@gmail.com>
Do we need to drop py36? Something's up with the environment. |
@jakirkham , any thoughts? |
I think we can now rerun CI after merging from main |
Fixed the release conflicts. |
@rabernat, if I install this numcodecs PR, should I be able to just specify this in the |
See example in observingClouds/xbitinfo#75 |
@aaronspring , awesome! That's exactly what I was looking for! |
Will merge at the end of my day if there are no more comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor docstring suggestion.
Thanks for seeing this through Martin!
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Thanks Martin & Ryan and everyone who helped review! 🙏 |
no longer need to pull from rabernat branch, since zarr-developers/numcodecs#299
There’s a hardcoded 23 (representing the number of mantissa bits) in line 67, that should be 10,23,52 for float16/32/64. So that line should be ‘maskbits = bits - self.keepbits’ I guess
… On 9 Apr 2022, at 14:05, Aaron Spring ***@***.***> wrote:
When I compare this PRs round against bitinformation.jl.round in https://github.com/observingClouds/bitinformation_pipeline/blob/2f02df358521d28604b5be69ce9fb0388169e797/tests/test_bitround.py#L64, I get equals results only for float32 but not for float16 or float64, see https://github.com/observingClouds/bitinformation_pipeline/runs/5953867041?check_suite_focus=true
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.
|
Here is line 67: https://github.com/zarr-developers/numcodecs/blob/main/numcodecs/bitround.py#L67 The only occurrence of 23 I see in the code is the mapping of mantissa bits by dtype. @milankl , can you please link to the specific line you see or propose a PR to correct any problem? |
Sorry, I don't know why this post was sent now. I may have sent an email at the beginning of the year to contribute when the PR was still open, maybe this email got lost and only delivered now? Weird. The PR clearly contains the |
OK then! I'll blame the gremlins :) |
Eventually closes #298
TODO:
tox -e py39
passes locallytox -e docs
passes locally