segfault when bitcasting u8x64 vector to [2]u8x32 within function #17996

Closed
travisstaloch opened this issue Nov 14, 2023 · 10 comments · Fixed by #18729
Labels
backend-llvm The LLVM backend outputs an LLVM IR Module. bug Observed behavior contradicts documented or intended behavior regression It worked in a previous version of Zig, but stopped working.

@travisstaloch

Zig Version

0.12.0-dev.1595+70d8baaec

Steps to Reproduce and Observed Behavior

This first happened two days ago in simdjzon's CI here. That CI run used zig-linux-x86_64-0.12.0-dev.1591+3fc6a2f11. There have been no recent changes to the project, and this never happened before then.

The following is a minimal reproduction of the segfault.

// /tmp/tmp.zig
pub const STEP_SIZE = 64;
pub const u8x32 = @Vector(32, u8);
pub const u8x64 = @Vector(64, u8);

pub fn main() !void {
    const input: []const u8 =
        \\{
        \\    "Width": 800,
        \\    "Height": 600,
        \\    "Title": "View from my room",
        \\    "Url": "http://ex.com/img.png",
        \\    "Private": false,
        \\    "Owner": null
        \\}
    ;

    // this works fine.
    const input_vec: u8x64 = input[0..STEP_SIZE].*;
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    _ = chunks;

    // the segfault only happens when passing the 64 byte vector to a function
    try next(input[0..STEP_SIZE].*);
}

fn next(input_vec: u8x64) !void {
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    _ = chunks;
}

$ zig run /tmp/tmp.zig
Segmentation fault at address 0x0
/tmp/tmp.zig:27:5: 0x21cefa in next (tmp)
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    ^
/tmp/tmp.zig:23:13: 0x21cf71 in main (tmp)
    try next(input[0..STEP_SIZE].*);
            ^
/home/travis/dev/zig/zig/download/0.12.0-dev.1595+70d8baaec/files/lib/std/start.zig:585:37: 0x21ceb7 in posixCallMainAndExit (tmp)
            const result = root.main() catch |err| {
                                    ^
/home/travis/dev/zig/zig/download/0.12.0-dev.1595+70d8baaec/files/lib/std/start.zig:253:5: 0x21c9e1 in _start (tmp)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted (core dumped)

Expected Behavior

no segfault.

@travisstaloch travisstaloch added the bug Observed behavior contradicts documented or intended behavior label Nov 14, 2023
@Vexu Vexu added backend-llvm The LLVM backend outputs an LLVM IR Module. regression It worked in a previous version of Zig, but stopped working. labels Nov 14, 2023
@Vexu Vexu added this to the 0.12.0 milestone Nov 14, 2023
@travisstaloch

Not sure if it helps, but I've included a --verbose-air dump below. I've verified that this version still reproduces the segfault when called from the previous main(), which is removed here.

pub const STEP_SIZE = 64;
pub const u8x32 = @Vector(32, u8);
pub const u8x64 = @Vector(64, u8);

export fn x(input: [*]const u8) void {
    next(input[0..STEP_SIZE].*);
}

export fn next(input_vec: u8x64) void {
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    _ = chunks;
}
$ zig version
0.12.0-dev.1606+569182dbb2
$ zig build-lib --verbose-air /tmp/tmp.zig 
# Begin Function AIR: tmp.x:
# Total AIR+Liveness bytes: 375B
# AIR Instructions:         19 (171B)
# AIR Extra Data:           23 (92B)
# Liveness tomb_bits:       16B
# Liveness Extra Data:      0 (0B)
# Liveness special table:   0 (0B)
  %0 = arg([*]const u8, 0)
  %1!= save_err_return_trace_index()
  %3!= dbg_block_begin()
  %4!= dbg_stmt(2:9)
  %5 = alloc(*[*]const u8)
  %6!= store_safe(%5, %0!)
  %7 = bitcast(*const [*]const u8, %5!)
  %8!= dbg_stmt(2:15)
  %9 = load([*]const u8, %7!)
  %10 = ptr_add([*]const u8, %9!, @Air.Inst.Ref.zero_usize)
  %11 = bitcast(*const [64]u8, %10!)
  %12 = load([64]u8, %11!)
  %13 = bitcast(@Vector(64, u8), %12!)
  %14!= dbg_stmt(2:9)
  %15!= call(<fn (@Vector(64, u8)) callconv(.C) void, (function 'next')>, [%13!])
  %16!= dbg_block_end()
  %18!= ret(@Air.Inst.Ref.void_value)
# End Function AIR: tmp.x

# Begin Function AIR: tmp.next:
# Total AIR+Liveness bytes: 264B
# AIR Instructions:         12 (108B)
# AIR Extra Data:           13 (52B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      0 (0B)
# Liveness special table:   0 (0B)
  %0 = arg(@Vector(64, u8), 0)
  %1!= save_err_return_trace_index()
  %3!= dbg_block_begin()
  %4!= dbg_stmt(2:5)
  %6 = bitcast([2]@Vector(32, u8), %0!)
  %7!= dbg_var_val(%6!, "chunks")
  %8!= dbg_stmt(3:5)
  %9!= dbg_block_end()
  %11!= ret(@Air.Inst.Ref.void_value)
# End Function AIR: tmp.next

@travisstaloch

travisstaloch commented Nov 24, 2023

No idea if this is a helpful clue, but just wanted to note that the error message shown when running the original file changed from Segmentation fault at address 0x0 to General protection exception (no address available) now with 0.12.0-dev.1717+54f4abae2.

$ zig version
0.12.0-dev.1717+54f4abae2
$ zig run /tmp/tmp2.zig 
General protection exception (no address available)
/tmp/tmp2.zig:29:5: 0x21cf5a in next (tmp2)
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    ^
/tmp/tmp2.zig:25:13: 0x21cfd1 in main (tmp2)
    try next(input[0..STEP_SIZE].*);
            ^
/home/travis/dev/zig/zig/download/0.12.0-dev.1717+54f4abae2/files/lib/std/start.zig:585:37: 0x21cf17 in posixCallMainAndExit (tmp2)
            const result = root.main() catch |err| {
                                    ^
/home/travis/dev/zig/zig/download/0.12.0-dev.1717+54f4abae2/files/lib/std/start.zig:253:5: 0x21ca41 in _start (tmp2)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted (core dumped)

@matu3ba

matu3ba commented Nov 29, 2023

No idea if this is a helpful clue, General protection exception (no address available)

https://en.wikipedia.org/wiki/General_protection_fault

  • faulting program accesses memory that it should not access
  • In terms of the x86 architecture, general protection faults are specific to segmentation-based protection when it comes to memory accesses. However, general protection faults are still used to report other protection violations (aside from memory access violations) when paging is used, such as the use of instructions not accessible from the current privilege level (CPL).

Does this only reproduce on x86 or is the error different on other archs? Does it reproduce with baseline?

@travisstaloch

> Does this only reproduce on x86 or is the error different on other archs? Does it reproduce with baseline?

i just checked with a slightly newer zig, 0.12.0-dev.1753+a98d4a66e, and this still segfaults on my system ("target": "x86_64-linux.6.5.6...6.5.6-gnu.2.35").

however it does not segfault with -mcpu=baseline, the program runs to completion:

/tmp $ zig run /tmp/tmp.zig -mcpu=baseline
/tmp $ echo $?
0

i'm not sure about other archs. i have qemu with its standard arches but i'm not sure whether an emulated report is valuable here. let me know if there are any specific commands that i could run and report.
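
Since the fault goes away with -mcpu=baseline, it may be useful to record which vector extensions the native target enables and compare that with the baseline build. The following is a minimal sketch, not something taken from this thread: the helper name x86VectorFeatures and the choice of features to print are illustrative assumptions, and whether any of these features is actually involved in the miscompilation is unconfirmed.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical helper type, introduced only for this sketch.
const VectorFeatures = struct { avx: bool, avx2: bool, avx512f: bool };

/// Reports which x86 vector extensions the compilation target enables.
/// Only meaningful when compiling for an x86 or x86_64 target.
fn x86VectorFeatures() VectorFeatures {
    const has = std.Target.x86.featureSetHas;
    const set = builtin.cpu.features;
    return .{
        .avx = has(set, .avx),
        .avx2 = has(set, .avx2),
        .avx512f = has(set, .avx512f),
    };
}

pub fn main() void {
    const f = x86VectorFeatures();
    std.debug.print("cpu model: {s}\n", .{builtin.cpu.model.name});
    std.debug.print("avx={} avx2={} avx512f={}\n", .{ f.avx, f.avx2, f.avx512f });
}
```

Running this once with plain zig run and once with -mcpu=baseline should show which extensions differ between the faulting and non-faulting configurations.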

@travisstaloch

I did a little more investigating on this. It seems to only affect Debug and ReleaseSmall.

I slightly modified the original, only adding a couple of std.mem.doNotOptimizeAway() calls. ReleaseFast and ReleaseSafe seem to be OK, but Debug still faults as before and ReleaseSmall segfaults:

const std = @import("std");

pub const STEP_SIZE = 64;
pub const u8x32 = @Vector(32, u8);
pub const u8x64 = @Vector(64, u8);

pub fn main() !void {
    const input: []const u8 =
        \\{
        \\    "Width": 800,
        \\    "Height": 600,
        \\    "Title": "View from my room",
        \\    "Url": "http://ex.com/img.png",
        \\    "Private": false,
        \\    "Owner": null
        \\}
    ;

    // this works fine.
    const input_vec: u8x64 = input[0..STEP_SIZE].*;
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    std.mem.doNotOptimizeAway(chunks);

    // the segfault only happens when passing the 64 byte vector to a function
    next(input[0..STEP_SIZE].*);
}

fn next(input_vec: u8x64) void {
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    std.mem.doNotOptimizeAway(chunks);
}
/tmp $ zig run /tmp/tmp.zig 
General protection exception (no address available)
/tmp/tmp.zig:29:5: 0x21d0fa in next (tmp)
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    ^
/tmp/tmp.zig:25:9: 0x21d063 in main (tmp)
    next(input[0..STEP_SIZE].*);
        ^
/home/travis/dev/zig/zig/download/0.12.0-dev.1753+a98d4a66e/files/lib/std/start.zig:585:37: 0x21cfd7 in posixCallMainAndExit (tmp)
            const result = root.main() catch |err| {
                                    ^
/home/travis/dev/zig/zig/download/0.12.0-dev.1753+a98d4a66e/files/lib/std/start.zig:253:5: 0x21cb01 in _start (tmp)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x0 in ??? (???)
Aborted (core dumped)
/tmp $ zig run /tmp/tmp.zig -OReleaseFast
/tmp $ zig run /tmp/tmp.zig -OReleaseSafe
/tmp $ zig run /tmp/tmp.zig -OReleaseSmall
Segmentation fault (core dumped)

@travisstaloch

oh and here it is on godbolt too: https://godbolt.org/z/h6zjMaEaE

@N00byEdge

N00byEdge commented Nov 30, 2023

I'd just like to add that a general protection fault in long mode either comes from:

  • Accessing a descriptor you don't have access to (I assume you're not playing with these registers)
  • Dereferencing a noncanonical pointer (top bits aren't the same, bad values for example could be 0xAAAA..., 0xF532..., etc, only 0x0000... or 0xFFFF... is okay)

The only one of these I've seen happen in non-OS zig code is the latter, where the top bits (and possibly bottom too, but that's irrelevant) of a pointer were undefined.

They are kind of like page faults in that you get them from dereferencing a bad pointer, but a different kind of bad.
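
The canonical-address rule described above can be sketched as a small check. This assumes 48-bit virtual addressing (4-level paging), where bits 63 through 47 must all be equal; the function name isCanonical48 is made up for illustration and is not part of the Zig standard library.

```zig
const std = @import("std");

/// Returns true if `addr` is canonical under 48-bit virtual addressing:
/// bits 63..47 must be all zeros or all ones.
/// Assumption: 4-level paging; with 5-level paging the boundary is bit 56.
fn isCanonical48(addr: u64) bool {
    const top = addr >> 47; // the upper 17 bits of the address
    return top == 0 or top == 0x1FFFF;
}

pub fn main() void {
    const samples = [_]u64{
        0x0000_7FFF_FFFF_FFFF, // highest canonical lower-half address
        0xFFFF_8000_0000_0000, // lowest canonical upper-half address
        0xAAAA_0000_0000_0000, // mixed top bits
        0x0000_8000_0000_0000, // bit 47 set but upper bits clear
    };
    for (samples) |addr| {
        std.debug.print("0x{x:0>16}: canonical={}\n", .{ addr, isCanonical48(addr) });
    }
}
```

A pointer holding undefined bytes will usually fail this check, which would produce a general protection fault on dereference rather than a page fault.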

@travisstaloch

travisstaloch commented Nov 30, 2023

just a guess, but my hunch is that this bug has something to do with the stack frame size. i noticed that adding a print statement like this prevents the miscompilation:

fn next(input_vec: u8x64) void {
    const chunks = @as([2]u8x32, @bitCast(input_vec));
    std.mem.doNotOptimizeAway(chunks);
    std.debug.print("{any}\n", .{chunks});
}
/tmp $ zig run /tmp/tmp.zig
{ { 123, 10, 32, 32, 32, 32, 34, 87, 105, 100, 116, 104, 34, 58, 32, 56, 48, 48, 44, 10, 32, 32, 32, 32, 34, 72, 101, 105, 103, 104, 116, 34 }, { 58, 32, 54, 48, 48, 44, 10, 32, 32, 32, 32, 34, 84, 105, 116, 108, 101, 34, 58, 32, 34, 86, 105, 101, 119, 32, 102, 114, 111, 109, 32, 109 } }
/tmp $ echo $?
0
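
One hypothesis that would fit a fault depending on stack layout, and which nobody in this thread has confirmed, is a misaligned stack slot: on x86, aligned SIMD loads and stores such as movaps also raise a general protection fault when the memory operand is not suitably aligned. The sketch below prints the alignment Zig reports for u8x64 and checks a local's address against it. The helper isAlignedFor is a hypothetical name, and the address of a local in main is not necessarily laid out the same way as the parameter copy inside next.

```zig
const std = @import("std");

pub const u8x64 = @Vector(64, u8);

/// True if `addr` satisfies the ABI alignment Zig reports for `T`.
fn isAlignedFor(comptime T: type, addr: usize) bool {
    return std.mem.isAligned(addr, @alignOf(T));
}

pub fn main() void {
    var v: u8x64 = @splat(0);
    v[0] = 1; // mutate so the local is a real runtime variable
    const addr = @intFromPtr(&v);
    std.debug.print("@alignOf(u8x64) = {d}\n", .{@alignOf(u8x64)});
    std.debug.print("local at 0x{x}, aligned: {}\n", .{ addr, isAlignedFor(u8x64, addr) });
}
```

If the compiler reserves a stack slot with less alignment than the instruction it emits requires, a check like this would pass for some frames and fail for others, which could explain why adding a print changes the outcome.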

@travisstaloch

here is debug (reproduces fault) side by side with release fast (no fault)

https://godbolt.org/z/15G64WMMh

@travisstaloch

i notice there are some differences in the debug mode assembly generated by 0.10 https://godbolt.org/z/q36hveP7c and trunk https://godbolt.org/z/MGWjjz595. not sure if this explains anything. just wanted to mention because this did work in 0.10.
