[WIP] Fix addition of zero values during pushes and logic error. #2
Conversation
This fixes a bug where reading large values, with constituent bits spanning multiple bytes, could go wrong: a partial read that yielded a zero value on its own -- which should still shift the accumulated value so the final result comes out correct -- was being silently dropped on the floor. As well, the push method for LittleEndian was improperly shifting the value, instead of the accumulator, by the accumulated bit count rather than by the bit count of the given value.
For all the little-endian files I've seen, bytes are packed with the most-significant bits last. For instance, take the first 4 bytes of an Intel ordered TIFF:
WavPack, TrueAudio and other little-endian formats I've seen work similarly. Do you have an example format I could poke at?
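The Intel-ordered TIFF case mentioned above can be checked with nothing but the standard library. A TIFF file in little-endian ("II") byte order starts with the bytes `0x49 0x49 0x2A 0x00` -- the "II" magic followed by the version number 42 stored least-significant byte first. A minimal sketch:

```rust
fn main() {
    // First four bytes of an Intel-ordered ("II") TIFF: the "II" magic,
    // then the version number 42 packed least-significant byte first.
    let bytes = [0x49u8, 0x49, 0x2A, 0x00];

    // Read little-endian: the bytes that come first in the stream are
    // the least significant ones, so 0x2A ends up in the high-order half.
    assert_eq!(u32::from_le_bytes(bytes), 0x002A_4949);

    // The same bytes interpreted big-endian give a different value.
    assert_eq!(u32::from_be_bytes(bytes), 0x4949_2A00);

    println!("ok");
}
```

This is what "most-significant bits last" means here: in a little-endian stream, later bytes contribute the higher-order bits of the value.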
Yeah, I'll see if I can whip up a small reproduction case to illustrate my point.
This should do it:
It immediately blows up on master when trying to compare the "random value". This data is from a non-open standard for Starcraft 2 / Heroes of the Storm replay files -- these are popular online games. Now, here's the really wild thing: the way that both my reference code (which I'm using as a basis for writing a Rust version) and your code work is that a little-endian 32-bit value... actually gets read in big endian! If you look at the byte order above, for the first four bytes to equal the value I assert them to be -- which is also what my reference code says those bytes represent -- you have to interpret them in big-endian order. And the way BitQueue is being used to hold the output value of a single read, it's only ever going to read the bits in big-endian order.
So, I'm a little inebriated so my thoughts might not be fully cogent here, but: my reference code refers to itself as reading little-endian values from a given stream... yet this is clearly not the case. It reads them, as your code does, in big-endian format. However, when I use big-endianness with BitReader, things do not work for me the way they do when I'm using little-endianness... so now I'm just really confused. :) Am I somehow wildly misinterpreting endianness? Have I been staring at this so long that I have it backwards? Or does the code for both endiannesses in bitstream_io have it backwards?
There's now a new method -- position() -- that returns the number of bytes read by the BitReader instance specifically. While this does not map to the precise position of the underlying reader passed into BitReader, it is relative to where that reader was when it was handed to the BitReader.
Simply switching LittleEndian to BigEndian in your code makes all the assertions pass, so your data must be big-endian. Those initial "Random Value" bytes are a dead giveaway. In a big-endian stream with the most-significant bits first, the value is:
If it were a little-endian stream with the least-significant bits first, that value would be:
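The two accumulation orders being contrasted above can be sketched in a few lines. This is an illustrative model, not the crate's literal push code: `push_be` and `push_le` are names made up here.

```rust
/// Big-endian / MSB-first: each new chunk shifts the accumulator left,
/// so earlier chunks end up in the high-order bits.
fn push_be(acc: u32, bits: u32, value: u32) -> u32 {
    (acc << bits) | value
}

/// Little-endian / LSB-first: each new chunk lands above the bits read
/// so far, so earlier chunks end up in the low-order bits.
fn push_le(acc: u32, bits_so_far: u32, value: u32) -> u32 {
    acc | (value << bits_so_far)
}

fn main() {
    // Read a 16-bit value as two 8-bit chunks from the stream: 0x12, then 0x34.

    // MSB-first assembly: the first chunk is the high byte.
    assert_eq!(push_be(push_be(0, 8, 0x12), 8, 0x34), 0x1234);

    // LSB-first assembly: the first chunk is the low byte.
    assert_eq!(push_le(push_le(0, 0, 0x12), 8, 0x34), 0x3412);
}
```

Same two bytes, two different values -- which is why switching the endianness parameter flips whether a given set of assertions passes.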
Though Rust's byteorder crate only works on byte-aligned values, it should give the same results for a 32-bit value in each endianness.
Ugh, the annoying thing is that you're right that this is clearly big-endian behavior but my code only works against this PR while using little endianness. Now that I've spent more time looking, though, I understand what Endianness is doing in the push method for LE and that it's not a bug. Here's where it's still weird, though:
So, it's quirky in that things are assembled in big-endian order but they're intended to be read LSB first. Seems like at this point I just need to write my own bitstream reader, because clearly this format is not following the unwritten big-endian/MSB-first rule. I appreciate you doing the back-and-forth, since it helped me grok this weirdness better. :D
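The hybrid just described -- bits pulled LSB-first out of each byte, but assembled MSB-first into the result -- can be sketched as follows. `QuirkyReader` is a made-up name for illustration; the real replay format may differ in details.

```rust
/// Hypothetical reader for the hybrid scheme: bits are consumed starting
/// from the LSB of each byte, but the accumulator is shifted left
/// (big-endian assembly) for every bit read.
struct QuirkyReader<'a> {
    data: &'a [u8],
    bit_pos: usize, // absolute bit index; bit 0 is the LSB of byte 0
}

impl<'a> QuirkyReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        QuirkyReader { data, bit_pos: 0 }
    }

    fn read_bits(&mut self, count: usize) -> u32 {
        let mut acc = 0u32;
        for _ in 0..count {
            let byte = self.data[self.bit_pos / 8];
            let bit = (byte >> (self.bit_pos % 8)) & 1; // LSB-first within the byte
            acc = (acc << 1) | bit as u32;              // MSB-first assembly
            self.bit_pos += 1;
        }
        acc
    }
}

fn main() {
    // 0b0000_0110 read LSB-first yields the bit sequence 0, 1, 1, 0, ...
    let mut r = QuirkyReader::new(&[0b0000_0110]);
    assert_eq!(r.read_bits(4), 0b0110); // assembled MSB-first: 6
}
```

Neither a pure big-endian nor a pure little-endian reader reproduces this, which is consistent with needing a custom bitstream reader for the format.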
This fixes a bug where reading large values, with constituent bits spanning multiple bytes, could go wrong: a partial read that yielded a zero value on its own -- which should still shift the accumulated value so the final result comes out correct -- was being silently dropped on the floor.
For example:
Bitstream, little-endian ordering: [(00000001)] [000000(00)]
Each bracketed group is a single byte, and the target value we want to extract is in the parentheses, spanning 10 bits across two bytes. With the previous code, reading the remaining two bits would yield a zero value and leave the accumulated value at one, when the real value is actually four.
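The failure mode above can be sketched with a toy accumulator. This is a model of the bug as described, with invented names (`accumulate`, `accumulate_buggy`), not the crate's literal code.

```rust
/// Correct: every partial read shifts the accumulator by the number of
/// bits read, even when the newly read bits are all zero.
fn accumulate(parts: &[(u32, u32)]) -> u32 {
    // Each part is (bit_count, value) for one partial read.
    parts.iter().fold(0, |acc, &(bits, value)| (acc << bits) | value)
}

/// Buggy: zero-valued partial reads are silently dropped, so the
/// accumulator is never shifted to make room for them.
fn accumulate_buggy(parts: &[(u32, u32)]) -> u32 {
    parts.iter().fold(0, |acc, &(bits, value)| {
        if value == 0 { acc } else { (acc << bits) | value }
    })
}

fn main() {
    // The 10-bit value [(00000001)] [000000(00)] arrives as an 8-bit
    // partial read of 1 followed by a 2-bit partial read of 0.
    let parts = [(8, 1), (2, 0)];
    assert_eq!(accumulate(&parts), 4);       // (1 << 2) | 0 = 4
    assert_eq!(accumulate_buggy(&parts), 1); // the zero read is lost
}
```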
As well, the push method for LittleEndian was improperly shifting the value, instead of the accumulator, by the accumulated bit count rather than by the bit count of the given value.
The biggest issue here is that the unit tests are all out of whack after this change. I haven't tackled them yet because I wanted to run this by you first. For me, this was a legitimate bug, but maybe for your use case it's not?
Any deep thoughts here? I've gone back and forth on whether or not I should just copypasta the bits I need into my project and be done with it, but figured I'd see if you thought it made sense to forge ahead getting these things upstream.