Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.PackedInt(Array/Slice/Io) are unsound #17997

Open
InspectorBoat opened this issue Nov 14, 2023 · 2 comments
Open

std.PackedInt(Array/Slice/Io) are unsound #17997

InspectorBoat opened this issue Nov 14, 2023 · 2 comments
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@InspectorBoat
Copy link

Zig Version

0.12.0-dev.1486+234693bcb

Steps to Reproduce and Observed Behavior

PackedIntIo currently works like this:

The general technique employed here is to cast bytes in the array to a container integer (having bits % 8 == 0) large enough to contain the number of bits we want, then we can retrieve or store the new value with a relative minimum of masking and shifting. In this worst case, this means that we'll need an integer that's actually 1 byte larger than the minimum required to store the bits, because it is possible that the bits start at the end of the first byte, continue through zero or more, then end in the beginning of the last. But, if we try to access a value in the very last byte of memory with that integer size, that extra byte will be out of bounds. Depending on the circumstances of the memory, that might mean the OS fatally kills the program. Thus, we use a larger container (MaxIo) most of the time, but a smaller container (MinIo) when touching the last byte of the memory.

(from packed_int_array.zig)

The MinIo and MaxIo container types are defined as follows:

const int_bits = @bitSizeOf(Int);

// In the best case, this is the number of bytes we need to touch
// to read or write a value, as bits.
const min_io_bits = ((int_bits + 7) / 8) * 8;

// In the worst case, this is the number of bytes we need to touch
// to read or write a value, as bits. To calculate for int_bits > 1,
// set aside 2 bits to touch the first and last bytes, then divide
// by 8 to see how many bytes can be filled up in between.
const max_io_bits = switch (int_bits) {
    0 => 0,
    1 => 8,
    else => ((int_bits - 2) / 8 + 2) * 8,
};

// The maximum container int type
const MinIo = std.meta.Int(.unsigned, min_io_bits);

// The minimum container int type
const MaxIo = std.meta.Int(.unsigned, max_io_bits);

This code makes the false assumption that reading *align(1) MinIo or *align(1) MaxIo only reads @bitSizeOf(MinIo/MaxIo) / 8 bytes:

pub fn get(bytes: []const u8, index: usize, bit_offset: u7) Int {
    if (int_bits == 0) return 0;

    const bit_index = (index * int_bits) + bit_offset;
    const max_end_byte = (bit_index + max_io_bits) / 8;

    //using the larger container size will potentially read out of bounds
    if (max_end_byte > bytes.len) return getBits(bytes, MinIo, bit_index);
    return getBits(bytes, MaxIo, bit_index);
}

However, (at least on x86-64) accessing an exotically sized integer actually accesses the backing naturally-sized integer. For example, reading *u24 will read a full dword. (relevant issue?). For a PackedIntIo(u24), the above code assumes that if reading a *align(1) MaxIo would cause an out of bounds access, reading *align(1) MinIo must be safe. However, both code paths will read a full 4 bytes, which can cause a segfault when crossing page boundaries. This can be observed with:

test "slice segfault" {
    var pages = try std.testing.allocator.alloc(u8, std.mem.page_size * 3);
    defer std.testing.allocator.free(pages);

    const slice = std.PackedIntSlice(u24).init(pages, 4096);

    _ = slice.get(4095); // segfault
}

Expected Behavior

A possible fix:

pub fn get(bytes: []const u8, index: usize, bit_offset: u7) Int {
    if (int_bits == 0) return 0;

    const bit_index = (index * int_bits) + bit_offset;
    // use the backing size of MaxIo instead
    const max_end_byte = (bit_index + @sizeOf(MaxIo) * 8) / 8;

    //using the larger container size will potentially read out of bounds
    if (max_end_byte > bytes.len) {
        if (@sizeOf(MinIo) < @sizeOf(MaxIo)) {
            return getBits(bytes, MinIo, bit_index);
        } else {
            // alternative function that retrieves bytes individually
            return getBitsContainerless(bytes, bit_index);
        }
    }
    return getBits(bytes, MaxIo, bit_index);
}

However, I don't think this code would be correct. It requires that MinIo

  1. must be sufficient to hold the bytes required (that is, we must have actually hit the best case of requiring the smaller container)
  2. must not read/write out of bounds

These conditions are true under the incorrect assumptions that reading a *uX actually only reads X bits, and that MaxIo is at most only 1 byte larger than MinIo. I suspect they no longer hold when MaxIo can be 1, 2, 4, or 8 bytes larger. It would be better to just use the alternate function, getBitsContainerless regardless of the size of MinIo.
Sorry if I got anything wrong about the semantics, I'm still quite new to all this.

@InspectorBoat InspectorBoat added the bug Observed behavior contradicts documented or intended behavior label Nov 14, 2023
@squeek502 squeek502 added the standard library This issue involves writing Zig code for the standard library. label Nov 14, 2023
@slonik-az
Copy link
Sponsor

Pages are word-aligned. If reads are word-aligned too, there will be no segfaults. Of course, flexible int size access would require more masks and shifts but it is a price worth paying to prevent segfaults.

@notcancername
Copy link
Contributor

@slonik-az That may be implemented as an optimization on specific platforms then, like #17161.

@Vexu Vexu added this to the 0.14.0 milestone Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

5 participants