Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Fix addition of zero values during pushes and logic error. #2

Closed
wants to merge 2 commits into from

Conversation

tobz
Copy link
Contributor

@tobz tobz commented Oct 3, 2017

This was otherwise a bug where reading large values, with constituent bits spanning multiple bytes, could lead to a situation where partial reads resulted in a zero value for that read alone, which otherwise would properly shift the accumulated value to generate the correct result, but were getting silently dumped on the floor.

For example:

Bitstream, little-endian ordering: [(00000001)] [000000(00)]

Each set of values in brackets is a single byte, and the target value we want to extract is in the parentheses, spanning 10 bits across two bytes. With the previous code, if we read the remaining two bits, we'd get a zero value, giving us an accumulated value of one, when the real value is actually four.

As well, the push method for LittleEndian was improperly shifting the value, instead of the accumulator, with the accumulated bits instead of the bits for the given value.

The biggest issue here is that the unit tests are all out of whack after this change. I've yet to tackle them because I wanted to run this by you first. For me, this was a legitimate bug, but maybe for your use case, it's not?

Any deep thoughts here? I've gone back and forth on whether or not I should just copypasta the bits I need into my project and be done with it, but figured I'd see if you thought it made sense to forge ahead getting these things upstream.

This was otherwise a bug where reading large values, with constituent
bits spanning multiple bytes, could lead to a situation where partial
reads resulted in a zero value for that read alone, which otherwise
would properly shift the accumulated value to generate the correct
result, but were getting silently dumped on the floor.

As well, the push method for LittleEndian was improperly shifting the
value, instead of the accumulator, with the accumulated bits instead of
the bits for the given value.
@tobz tobz changed the title Fix addition of zero values during pushes and logic error. [WIP] Fix addition of zero values during pushes and logic error. Oct 3, 2017
@tuffy
Copy link
Owner

tuffy commented Oct 3, 2017

For all the little-endian files I've seen, bytes are packed with the most-significant bits last. For instance, take the first 4 bytes of an Intel ordered TIFF:

use std::io::Cursor;
use bitstream_io::{LE, BitReader};

let tiff: [u8;4] = [0x49, 0x49, 0x2A, 0x00];
let mut c = Cursor::new(&tiff);
let mut r = BitReader::<LE>::new(&mut c);
let mut buf = [0,0];
assert!(r.read_bytes(&mut buf).is_ok());
assert_eq!(&buf, b"II");
assert_eq!(r.read::<u16>(16).unwrap(), 42);

WavPack, TrueAudio and other little-endian formats I've seen work similarly. Do you have an example format I could poke at?

@tobz
Copy link
Contributor Author

tobz commented Oct 3, 2017

Yeah, I'll see if I can whip up a small reproduction case to illustrate my point.

@tobz
Copy link
Contributor Author

tobz commented Oct 4, 2017

This should do it:

extern crate bitstream_io;

use std::io::Cursor;
use bitstream_io::{BitReader, LittleEndian};

fn main() {
    let rv_dflt: [u8;10] = [0x3D, 0x59, 0xF4, 0xC6, 0x01, 0x00, 0x44, 0x66, 0x6C, 0x74];

    let mut c = Cursor::new(&rv_dflt);
    let mut r = BitReader::<LittleEndian>::new(&mut c);

    // Read the random value.
    let rv = r.read::<u32>(32).unwrap();
    assert_eq!(rv, 1029305542);

    // Read our length value for the game cache string.
    let buf_len = r.read::<u32>(10).unwrap();
    assert_eq!(buf_len, 4);

    // Line things up.
    r.byte_align();

    // Read the game cache value.
    let mut buf = vec![0u8; buf_len as usize];
    r.read_bytes(buf.as_mut_slice()).unwrap();
    assert_eq!(buf.as_slice(), b"Dflt");
}

It immediately will blow up on master trying to compare the "random value", but that's because of the value <<= *bits_acc line which is busted. If you fix that line, it blows up on getting buf_len as it reads 8 bits (0b00000001) and then goes to read the next 2 bits (0b00). If those two bits don't get pushed onto the accumulator (value_acc at 0b00000001 shifted two bytes to the left == 0b0000000100) then the value stays at 1.

This data is from a non-open standard for Starcraft 2/Heroes of the Storm replay files -- these are popular online games.

Now, here's the really wild thing: the way that both my reference code (that I'm using as a reference for writing a Rust-based version) and your code work is that for a little-endian 32-bit value... it will actually read it in big endian!

If you look at the byte order above, for the first four bytes, for those to equal the value I assert it to be -- which is also what my reference code says the integer value of those those bytes represents -- you have to interpret them in big-endian order.

The way that BitQueue is being used to hold the output value of a single read, it's only ever going to read the bits in big-endian order.

@tobz
Copy link
Contributor Author

tobz commented Oct 4, 2017

So, I'm a little inebriated so my thoughts might not be fully cogent here, but:

my reference code refers to itself as reading little-endian values from a given stream... but this is clearly not the case. It is reading it, as your code does, in big-endian format. However, when I use big-endianness with BitReader, things do not work for me as they do when I'm using little-endianness... so now I'm just really confused. :)

Am I somehow just wildly misinterpreting endianness? Have I been staring at this so long that I have it backwards? Or does the code for both endiannesses in bitstream_io have it backwards?

There's now a new method -- position() -- that returns the number of
bytes specifically read by the BitReader instance.  While this does not
map to the precise position of the reader passed into BitReader, it is
relative to where the reader was at when passed to the BitReader.
@tuffy
Copy link
Owner

tuffy commented Oct 4, 2017

Simply switching LittleEndian to BigEndian in your code makes all the assertions pass, so your data must be big-endian. Those initial "Random Value" bytes are a dead giveaway. In a big-endian stream with the most-significant bits first, the value is:

0x3D, 0x59, 0xF4, 0xC6 == 0x3D59F4C6 == 1029305542

If it were a little-endian stream with the least-significant bits first, that value would be:

0x3D, 0x59, 0xF4, 0xC6 == 0xC6F4593D == 3337902397

Though Rust's byteorder crate only works on byte-aligned values, it should give the same results for a 32-bit value in each endianness.

@tobz
Copy link
Contributor Author

tobz commented Oct 4, 2017

Ugh, the annoying thing is that you're right that this is clearly big-endian behavior but my code only works against this PR while using little endianness. Now that I've spent more time looking, though, I understand what Endianness is doing in the push method for LE and that it's not a bug.

Here's where it's still weird, though:

extern crate bitstream_io;

use std::io::Cursor;
use bitstream_io::{BitReader, BigEndian};

fn main() {
    let init_start: [u8;20] = [0x10, 0x0a, 0x65, 0x72, 0x65, 0x76, 0x6c, 0x79, 0x64, 0x65, 0x75, 0x78, 0x01, 0x00, 0x02, 0xfc, 0xff, 0xff, 0xff, 0x07];

    let mut c = Cursor::new(&init_start);
    let mut r = BitReader::<BigEndian>::new(&mut c);

    let pl = r.read::<u32>(5).unwrap();
    assert_eq!(pl, 16);

    // Read our length value for the game cache string.
    let name_len = r.read::<u32>(8).unwrap();
    assert_eq!(name_len, 10);

    // Line things up.
    r.byte_align();

    // read the player name
    let mut buf = vec![0u8; name_len as usize];
    r.read_bytes(buf.as_mut_slice());
    assert_eq!(buf.as_slice(), b"erevlydeux");
}

So, it's quirky in that things are assembled in big-endian order but they're intended to be read LSB first.

Seems like at this point I just need to write my own bitstream reader, because clearly this format is not following the big endian/MSB unwritten rule.

Appreciate you doing the back-and-forth since it helped me grok this weirdness better. :D

@tobz tobz closed this Oct 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants