
M1 Max jit_write_protect issue #1470

Closed
geohot opened this issue Oct 27, 2021 · 48 comments

@geohot
Contributor

geohot commented Oct 27, 2021

I moved to Unicorn 2 for M1 support, and I'm getting this crash about 50% of the time.

MIPS CPU, with Go bindings if it matters. Perhaps someone has an idea.

Process 34232 stopped
* thread #10, stop reason = EXC_BAD_ACCESS (code=2, address=0x280000098)
    frame #0: 0x0000000101040a14 libunicorn.2.dylib`tb_gen_code_mips + 304
libunicorn.2.dylib`tb_gen_code_mips:
->  0x101040a14 <+304>: str    x8, [x9, #0x18]
    0x101040a18 <+308>: ldur   w8, [x29, #-0x14]
    0x101040a1c <+312>: ldur   x9, [x29, #-0x38]
    0x101040a20 <+316>: str    w8, [x9]
Target 0: (mipsevm.test) stopped.
(lldb) bt
* thread #10, stop reason = EXC_BAD_ACCESS (code=2, address=0x280000098)
  * frame #0: 0x0000000101040a14 libunicorn.2.dylib`tb_gen_code_mips + 304
    frame #1: 0x000000010102b74c libunicorn.2.dylib`tb_find + 92
    frame #2: 0x000000010102b1b0 libunicorn.2.dylib`cpu_exec_mips + 244
    frame #3: 0x0000000100fd8ce0 libunicorn.2.dylib`tcg_cpu_exec + 76
    frame #4: 0x0000000100fd8c0c libunicorn.2.dylib`resume_all_vcpus_mips + 96
    frame #5: 0x0000000100fd8dfc libunicorn.2.dylib`vm_start_mips + 24
    frame #6: 0x0000000100fc676c libunicorn.2.dylib`uc_emu_start + 352
    frame #7: 0x00000001002f7f04 mipsevm.test`_cgo_81152a5834e5_Cfunc_uc_emu_start + 44

Speed isn't that important to me; is there any way to disable threading?

@wtdcode
Member

wtdcode commented Oct 27, 2021

Hello, Unicorn is designed to be single-threaded internally. It's probably a null pointer dereference. Unfortunately, I don't currently have an M1 machine for testing.

@geohot
Contributor Author

geohot commented Oct 27, 2021

It has to be something with threads; it only happens 50% of the time.

@wtdcode
Member

wtdcode commented Oct 27, 2021

I'm 100% sure Unicorn 2 is single-threaded internally unless you use the timeout option in uc_emu_start. When I added M1 support, the same thing happened, and it turned out to be undefined behavior. You could build a debug version and paste the source line, and I may be able to help.

@geohot
Contributor Author

geohot commented Oct 27, 2021

I built a debug version:

Process 51353 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x280000098)
    frame #0: 0x00000001012109b4 libunicorn.2.dylib`tb_gen_code_mips(cpu=0x00000001300c8000, pc=0, cs_base=0, flags=268435632, cflags=-16777216) at translate-all.c:1512:16
   1509     }
   1510
   1511     gen_code_buf = tcg_ctx->code_gen_ptr;
-> 1512     tb->tc.ptr = gen_code_buf;
   1513     tb->pc = pc;
   1514     tb->cs_base = cs_base;
   1515     tb->flags = flags;
Target 0: (mipsevm.test) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x280000098)
  * frame #0: 0x00000001012109b4 libunicorn.2.dylib`tb_gen_code_mips(cpu=0x00000001300c8000, pc=0, cs_base=0, flags=268435632, cflags=-16777216) at translate-all.c:1512:16
    frame #1: 0x00000001011fb6ec libunicorn.2.dylib`tb_find(cpu=0x00000001300c8000, last_tb=0x0000000000000000, tb_exit=0, cf_mask=0) at cpu-exec.c:252:14
    frame #2: 0x00000001011fb150 libunicorn.2.dylib`cpu_exec_mips(uc=0x0000000129008200, cpu=0x00000001300c8000) at cpu-exec.c:566:18
    frame #3: 0x00000001011a8c80 libunicorn.2.dylib`tcg_cpu_exec(uc=0x0000000129008200) at cpus.c:95:17
    frame #4: 0x00000001011a8bac libunicorn.2.dylib`resume_all_vcpus_mips(uc=0x0000000129008200) at cpus.c:183:13
    frame #5: 0x00000001011a8d9c libunicorn.2.dylib`vm_start_mips(uc=0x0000000129008200) at cpus.c:203:5
    frame #6: 0x000000010119670c libunicorn.2.dylib`uc_emu_start(uc=0x0000000129008200, begin=0, until=1588396036, timeout=0, count=0) at uc.c:734:5
    frame #7: 0x00000001002f7f04 mipsevm.test`_cgo_81152a5834e5_Cfunc_uc_emu_start + 44

tb does have a value; it's not null:

(lldb) print tb;
(TranslationBlock *) $0 = 0x0000000280000080

@wtdcode
Member

wtdcode commented Oct 27, 2021

I see. You are getting an unaligned access.
See this line:
> stop reason = EXC_BAD_ACCESS (code=2, address=0x280000098)

Sorry, I got that wrong.

@wtdcode
Member

wtdcode commented Oct 27, 2021

Looks like an out-of-bounds (OOB) access. Pretty strange.

@geohot
Contributor Author

geohot commented Oct 27, 2021

It's super weird: the issue seems to be in alloc_code_gen_buffer. I can't memset the buffer to 0.

@wtdcode
Member

wtdcode commented Oct 27, 2021

It's W^X protection on Apple Silicon: a JIT buffer can't be writable and executable at the same time. I guess it's some allocation problem, but again, unfortunately I'm unable to test. Maybe you could trace how that buffer is mmap-ed.

@geohot
Contributor Author

geohot commented Oct 27, 2021

Yes, that seems to be it! The buffer isn't writable... but the weirdest thing is that it works sometimes.

I added an mprotect afterwards, and it fails.

This looks relevant: https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon

@wtdcode
Member

wtdcode commented Oct 27, 2021

You can't use mprotect. Apple has a private API.

@geohot
Contributor Author

geohot commented Oct 27, 2021

Ahh, pthread_jit_write_protect_np, I see. Okay, I think I can track this down. It's somewhat supported already; I think it's just being called at the wrong time, and that's the race.

@wtdcode
Member

wtdcode commented Oct 27, 2021

That patch was ported from upstream QEMU (originally from UTM, in fact). Maybe in Unicorn we have to add those calls somewhere else. Anyway, you have this function:

#include <pthread.h>

static inline void jit_write_protect(int enabled)
{
    /* pthread_jit_write_protect_np() returns void, so call it without `return`. */
    pthread_jit_write_protect_np(enabled);
}

@geohot
Contributor Author

geohot commented Oct 27, 2021

This fixes it, though I don't really understand why. It seemed like it was called with false earlier.

geohot@bd90068

@geohot geohot changed the title Unicorn 2 - M1 Max Race Condition Unicorn 2 - M1 Max jit_write_protect issue Oct 27, 2021
@geohot geohot changed the title Unicorn 2 - M1 Max jit_write_protect issue M1 Max jit_write_protect issue Oct 27, 2021
@wtdcode
Member

wtdcode commented Oct 27, 2021

Okay, I know the root cause and will post a fix once I have an M1 environment to test it.

@wtdcode
Member

wtdcode commented Oct 27, 2021

BTW, could you post a simple piece of reproduction code?

@wtdcode wtdcode added the bug label Oct 27, 2021
@geohot
Contributor Author

geohot commented Oct 27, 2021

The built-in examples reproduce it on my machine.

@marysaka

Same issue here, though geohot@bd90068 didn't fix it for me on a MacBook Air M1 with macOS 12.0.1.

I'm also testing with an AArch64 context.

@wtdcode
Member

wtdcode commented Oct 29, 2021

Any reproduction code?

@marysaka

So after a bit more research, it seems geohot's patch does fix the issue for Unicorn itself.
However, it should be noted that it breaks C# usage: from what I can see, it seems to break MAP_JIT regions allocated by other JITs.

My wild guess is that the patch leaves all MAP_JIT pages of the current thread RW-, resulting in a crash when execution returns to JITed code.

@wtdcode
Copy link
Member

wtdcode commented Oct 30, 2021

This of course is a quick and dirty fix. I need some production code to publish a real fix.

@marysaka

marysaka commented Oct 30, 2021

I don't have much production code to share publicly at the moment.
So here's a quick, simple reproducer running on .NET 6 RC 2:

using System;
using System.Runtime.InteropServices;

namespace testing
{
    public class Test
    {
        [DllImport("unicorn")]
        public static extern uint uc_version(out uint major, out uint minor);

        private const uint UC_ARCH_ARM64 = 2;
        private const uint UC_MODE_LITTLE_ENDIAN = 0;

        [DllImport("unicorn")]
        public static extern int uc_open(uint arch, uint mode, out IntPtr uc);

        [DllImport("unicorn")]
        public static extern int uc_close(IntPtr uc);

        public static void Main(string[] args)
        {
            Console.WriteLine("uc_version");

            uc_version(out uint major, out uint minor);

            Console.WriteLine($"Unicorn v{major}.{minor}");

            Console.WriteLine("uc_open");

            int err = uc_open(UC_ARCH_ARM64, UC_MODE_LITTLE_ENDIAN, out IntPtr uc);

            Console.WriteLine("Crashed?");

            if (err == 0)
            {
                uc_close(uc);
            }

            Console.WriteLine("Done.");
        }
    }
}

Console output:

uc_version
Unicorn v2.0
uc_open
[1]    11258 bus error  ./test_debug_unicorn

@wtdcode
Member

wtdcode commented Oct 30, 2021

Sorry, that was a typo: I meant 'reproduction' code. Could you double-check whether the equivalent C code also produces the same crash?

@marysaka

No crash with the following C variant:

#include <unicorn/unicorn.h>
#include <stdio.h>

int main(void) {
    unsigned int major;
    unsigned int minor;

    uc_version(&major, &minor);

    /* major/minor are unsigned, so print them with %u */
    printf("Unicorn v%u.%u\n", major, minor);

    printf("uc_open\n");

    uc_engine *engine = NULL;

    uc_err res = uc_open(UC_ARCH_ARM64, UC_MODE_LITTLE_ENDIAN, &engine);

    printf("Crashed?\n");

    if (res == UC_ERR_OK)
    {
        uc_close(engine);
    }

    printf("Done.\n");

    return 0;
}

Console output:

Unicorn v2.0
uc_open
Crashed?
Done.

@wtdcode
Member

wtdcode commented Oct 30, 2021

Okay, then it does seem so. I'll have a look once I have an M1 machine to debug on.

@marysaka

So I removed @geohot's original patch, did a full clean build, and in the end it seems to have no effect on my side.

I may have forgotten at first to check that I had switched to the dev branch without the patch applied, sorry 😅

Maybe it would be worth opening another issue for the C# binding problems, as this is starting to get a bit off-topic?

@wtdcode
Member

wtdcode commented Oct 30, 2021

No worries, go ahead and open one. I'll check whether it fails on UC2.

@wtdcode
Member

wtdcode commented Nov 5, 2021

I tested sample.go with the Go bindings and couldn't reproduce the crash.

@wtdcode
Member

wtdcode commented Nov 5, 2021

I can't reproduce your crash on an M1 machine. Is the bug specific to the M1 Max?

I also tried switching write protection before function calls but didn't get a crash.

@geohot
Contributor Author

geohot commented Nov 6, 2021

The code I've been working on is open source. https://github.com/geohot/cannon

Try "go test" in mipsevm using upstream Unicorn (by changing https://github.com/geohot/cannon/blob/master/build_unicorn.sh).

I can try to post a minimal repro tomorrow, though I'm 90% sure the MIPS sample crashed too.

The crash didn't happen 100% of the time; in my app it was maybe 80%.

@wtdcode
Member

wtdcode commented Nov 6, 2021

I tried your cannon just yesterday, on both your latest commit and the one from when you opened this issue. Both worked fine.

A minimal reproduction script would be a great help.

@geohot
Contributor Author

geohot commented Nov 7, 2021

So I tried a simple repro and couldn't trigger it. I can reproduce it in cannon, though:

# clone and build cannon
./build_unicorn.sh
cd unicorn2
git revert bd90068fa014bb082c0b7ef6d20f7bccd3f581e0 # my fix
make -j8
cd ../mipsevm   # mipsevm sits next to unicorn2 in the cannon checkout
go test -run TestCompareUnicornEvm

Still working on a minimal repro with upstream.

My bad, you can't reproduce it with the examples. I was confusing this with Unicorn 1, where the examples do trigger it. This seems much more subtle.

@wtdcode
Member

wtdcode commented Nov 7, 2021

AFAIK, unicorn2 doesn't have a Makefile, so make -j8 won't work.

@geohot
Contributor Author

geohot commented Nov 7, 2021

The build_unicorn.sh script runs cmake in that directory first. That's not a full repro, just the change to apply. I've updated it and am still working on a minimal repro.

@wtdcode
Member

wtdcode commented Nov 7, 2021

I get:

2021/11/08 01:45:23 open ../artifacts/contracts/MIPS.sol/MIPS.json: no such file or directory
exit status 1

@geohot
Contributor Author

geohot commented Nov 7, 2021

Yeah, you have to build it. I'm working on a simpler repro. It's subtle and seems to depend on the heap state: if I remove a completely unrelated map write, it doesn't crash.

@geohot
Contributor Author

geohot commented Nov 7, 2021

Okay, I pushed a repro that doesn't depend on that:

cd mipsevm
go test -run TestUnicornCrash

@wtdcode
Member

wtdcode commented Nov 7, 2021

cd mipsevm
go test -run TestUnicornCrash
--- FAIL: TestUnicornCrash (0.00s)
panic: runtime error: slice bounds out of range [:192] with capacity 0 [recovered]
	panic: runtime error: slice bounds out of range [:192] with capacity 0

Looks like it crashes inside the Go code?

@geohot
Contributor Author

geohot commented Nov 7, 2021

You have to run it from the mipsevm dir. Sorry, cannon really isn't built for this.

kafka@tubby:~/fun/cannon/mipsevm$ go test -run TestUnicornCrash
fatal error: unexpected signal during runtime execution
[signal SIGBUS: bus error code=0x1 addr=0x280000098 pc=0x105322a60]

runtime stack:
runtime.throw({0x10453096f, 0x2a})
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/runtime/panic.go:1198 +0x54
runtime.sigpanic()
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/runtime/signal_unix.go:719 +0x230

goroutine 4 [syscall]:
runtime.cgocall(0x10451efe8, 0x1400005ad78)
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/runtime/cgocall.go:156 +0x50 fp=0x1400005ad30 sp=0x1400005acf0 pc=0x104227ac0
github.com/unicorn-engine/unicorn/bindings/go/unicorn._Cfunc_uc_emu_start(0x115008200, 0x0, 0x5ead0004, 0x0, 0x0)
        _cgo_gotypes.go:249 +0x44 fp=0x1400005ad70 sp=0x1400005ad30 pc=0x10433ab34
github.com/unicorn-engine/unicorn/bindings/go/unicorn.(*uc).StartWithOptions.func1(0x1400006db00, 0x0, 0x5ead0004, 0x1400005ae48)
        /Users/kafka/fun/cannon/unicorn2/bindings/go/unicorn/unicorn.go:112 +0x8c fp=0x1400005add0 sp=0x1400005ad70 pc=0x10433c89c
github.com/unicorn-engine/unicorn/bindings/go/unicorn.(*uc).StartWithOptions(0x1400006db00, 0x0, 0x5ead0004, 0x1400005ae48)
        /Users/kafka/fun/cannon/unicorn2/bindings/go/unicorn/unicorn.go:112 +0x40 fp=0x1400005ae10 sp=0x1400005add0 pc=0x10433c7c0
github.com/unicorn-engine/unicorn/bindings/go/unicorn.(*uc).Start(0x1400006db00, 0x0, 0x5ead0004)
        /Users/kafka/fun/cannon/unicorn2/bindings/go/unicorn/unicorn.go:117 +0x48 fp=0x1400005ae60 sp=0x1400005ae10 pc=0x10433c908
mipsevm.TestUnicornCrash(0x14000125380)
        /Users/kafka/fun/cannon/mipsevm/unicorn_crash_test.go:36 +0x260 fp=0x1400005af70 sp=0x1400005ae60 pc=0x10451e160
testing.tRunner(0x14000125380, 0x104637bf8)
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/testing/testing.go:1259 +0x104 fp=0x1400005afc0 sp=0x1400005af70 pc=0x1042ff9f4
runtime.goexit()
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/runtime/asm_arm64.s:1133 +0x4 fp=0x1400005afc0 sp=0x1400005afc0 pc=0x104290794
created by testing.(*T).Run
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/testing/testing.go:1306 +0x328

goroutine 1 [chan receive]:
testing.(*T).Run(0x140001251e0, {0x10452440e, 0x10}, 0x104637bf8)
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/testing/testing.go:1307 +0x344
testing.runTests.func1(0x140001251e0)
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/testing/testing.go:1598 +0x80
testing.tRunner(0x140001251e0, 0x14000169d18)
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/testing/testing.go:1259 +0x104
testing.runTests(0x1400000e768, {0x10495b4c0, 0x9, 0x9}, {0xc05a231226e4c980, 0x8bb2e27012, 0x104962100})
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/testing/testing.go:1596 +0x3ec
testing.(*M).Run(0x14000192000)
        /opt/homebrew/Cellar/go/1.17.2/libexec/src/testing/testing.go:1504 +0x4fc
main.main()
        _testmain.go:59 +0x17c
exit status 2
FAIL    mipsevm 0.116s

And 1 in 10 times it passes.

@wtdcode
Member

wtdcode commented Nov 7, 2021

Yes, I did a full clone and ran it in the mipsevm dir.

@geohot
Contributor Author

geohot commented Nov 7, 2021

Pushed a change, try it now. Not sure why, but it seems like it can't find the test file. I just pushed another change to make it simpler.

@wtdcode
Member

wtdcode commented Nov 7, 2021

Great, I got your crash locally. I'll have a look into it.

@geohot
Contributor Author

geohot commented Nov 7, 2021

Nice!

Yeah, sorry, I got confused about the examples: it was the 1.0 examples that failed (for the same reason, but 1.0 had no fix at all, rather than the almost-right fix in 2.0).

It's weird: if you remove the map stuff in the test, it passes. But that's completely unrelated; it's just heap grooming.

Either way, glad you reproduced it.

Confirming that it always passes if you leave "bd90068fa014bb082c0b7ef6d20f7bccd3f581e0" in, too.

@geohot
Contributor Author

geohot commented Nov 7, 2021

This should be a standalone repro with no file required. Now even simpler:

package main

import (
	"testing"

	uc "github.com/unicorn-engine/unicorn/bindings/go/unicorn"
)

func TestUnicornCrash(t *testing.T) {
	mu, err := uc.NewUnicorn(uc.ARCH_MIPS, uc.MODE_32|uc.MODE_BIG_ENDIAN)
	if err != nil {
		t.Fatal(err)
	}

	// weird heap grooming (doesn't crash without this)
	junk := make(map[uint32]uint32)
	for i := 0; i < 1000000; i += 4 {
		junk[uint32(i)] = 0xaaaaaaaa
	}

	// The returned error is ignored on purpose: the bug shows up as a
	// SIGBUS inside tb_gen_code_mips, not as an error value.
	_ = mu.Start(0, 4)
}

@wtdcode
Member

wtdcode commented Nov 7, 2021

This looks related to dotnet/runtime#41991.

My wild guess is that your map allocation triggers the Go runtime's internal thread scheduler, which moves the FFI calls to a new thread.

@geohot
Contributor Author

geohot commented Nov 7, 2021

Ahh, I'm not that familiar with Go internals, but this sounds very possible. If you do the heap grooming before NewUnicorn, it doesn't crash.

Can confirm: calling runtime.LockOSThread() before the map fixes it!

What a crazy runtime. I guess Go assumes it saves and restores all the relevant context for the OS thread, so it can schedule goroutines anywhere. But it doesn't save and restore the pthread JIT state. I don't know what Go promises the programmer in this case, or whether this is supposed to be handled by the runtime.

Either way, a fun bug. I tried looking for exactly this at the beginning, checking whether my tid changed, but I must have missed it (gettid is a weird syscall). The solution is perhaps something like my fix but less hacky: explicitly set the JIT state right before you expect to write or execute, rather than assuming it persists between calls.
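The runtime.LockOSThread() workaround can be generalized on the caller side. A minimal sketch, assuming every call into the C library can be funneled through one goroutine: lockedWorker (a hypothetical helper, not part of the Unicorn Go bindings) pins a goroutine to one OS thread and runs submitted closures on it, one at a time and in order.

```go
package main

import (
	"fmt"
	"runtime"
)

// lockedWorker runs submitted functions one at a time on a single goroutine
// that is pinned to one OS thread with runtime.LockOSThread, so per-thread
// state kept by a C library sees the same thread on every call.
type lockedWorker struct {
	tasks chan func()
	done  chan struct{}
}

func newLockedWorker() *lockedWorker {
	w := &lockedWorker{tasks: make(chan func()), done: make(chan struct{})}
	go func() {
		runtime.LockOSThread() // pin before running any task
		defer runtime.UnlockOSThread()
		defer close(w.done)
		for f := range w.tasks {
			f()
		}
	}()
	return w
}

// Do runs f on the pinned thread and blocks until it returns.
func (w *lockedWorker) Do(f func()) {
	finished := make(chan struct{})
	w.tasks <- func() {
		defer close(finished)
		f()
	}
	<-finished
}

// Close stops the worker; Do must not be called afterwards.
func (w *lockedWorker) Close() {
	close(w.tasks)
	<-w.done
}

func main() {
	w := newLockedWorker()
	defer w.Close()

	var order []int
	for i := 0; i < 3; i++ {
		// In real code these closures would wrap NewUnicorn, Start, etc.
		w.Do(func() { order = append(order, i) })
	}
	fmt.Println(order) // [0 1 2]
}
```

In a test like TestUnicornCrash, NewUnicorn and Start would both be wrapped in w.Do. This only helps code that routes every call through the worker, and it does nothing about other code that changes the thread's JIT state, so it is a caller-side workaround rather than a substitute for a fix inside the library.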

@wtdcode
Member

wtdcode commented Nov 7, 2021

I did some debugging. The thread ID indeed doesn't change, but the JIT protection state does, as dotnet/runtime#41991 suggests. I guess that's due to some thread-reuse strategy. Anyway, even with LockOSThread we can't guarantee that every UC API is called on the same thread, or that the JIT protection state stays the same across FFI calls, so the only solution is to explicitly set the state, like your hack does. I'll push a fix.

Thank you for the reproduction script.
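To make the "set the state explicitly before each write" idea concrete, here is a toy model in Go. All names are hypothetical: osThread stands in for an OS thread and its per-thread MAP_JIT write-protect flag, and setWriteProtect stands in for pthread_jit_write_protect_np (the toy never calls the real API). cachingEngine toggles the flag once and trusts it afterwards; explicitEngine sets it before every write and puts the thread back in execute mode afterwards. This is a sketch of the idea, not Unicorn's code.

```go
package main

import "fmt"

// osThread is a toy stand-in for an OS thread. On Apple Silicon each thread
// carries its own MAP_JIT write-protect flag; here it is just a bool.
type osThread struct {
	name           string
	writeProtected bool // true: JIT pages are R-X for this thread; false: RW-
}

// setWriteProtect plays the role of pthread_jit_write_protect_np in this toy.
func (t *osThread) setWriteProtect(on bool) { t.writeProtected = on }

// writeCode "faults" if JIT pages are still write-protected on this thread.
func (t *osThread) writeCode() error {
	if t.writeProtected {
		return fmt.Errorf("bus error: JIT page not writable on %s", t.name)
	}
	return nil
}

// cachingEngine unlocks once and assumes the flag never changes afterwards.
type cachingEngine struct{ unlocked bool }

func (e *cachingEngine) genCode(t *osThread) error {
	if !e.unlocked {
		t.setWriteProtect(false)
		e.unlocked = true
	}
	return t.writeCode()
}

// explicitEngine sets the flag right before every write and switches the
// thread back to execute mode afterwards, so it never relies on earlier calls.
type explicitEngine struct{}

func (explicitEngine) genCode(t *osThread) error {
	t.setWriteProtect(false)
	defer t.setWriteProtect(true)
	return t.writeCode()
}

func main() {
	th := &osThread{name: "thread 1", writeProtected: true}

	c := &cachingEngine{}
	fmt.Println(c.genCode(th)) // <nil>
	th.setWriteProtect(true)   // something else on the same thread flips the flag
	fmt.Println(c.genCode(th)) // bus error: JIT page not writable on thread 1

	var x explicitEngine
	fmt.Println(x.genCode(th)) // <nil>
}
```

Switching back to execute mode after each write may also matter for marysaka's C# report, given the guess that leaving a thread's MAP_JIT pages RW- breaks other JITs running on that thread.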

@geohot
Contributor Author

geohot commented Nov 7, 2021

Cool, I'm not crazy! I did think I had checked the tid correctly. I guess something else must be changing the pthread_jit_write_protect_np state.

I'm so over security like this that makes programming harder and likely doesn't affect exploiters much at all. When I did browser exploits, I loved "mitigations" because they only bothered noobs.

@wtdcode
Member

wtdcode commented Nov 7, 2021

Fixed in 94a82ed


3 participants