-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
M1 Max jit_write_protect issue #1470
Comments
Hello, unicorn is designed to be single-thread internally. It's probably some null pointer dereference. Unfortunately, I don't have an M1 machine for testing purpose currently. |
It has to be something with threads, it only happens 50% of the time |
I'm 100% sure Unicorn2 internally is single-threaded unless you are using |
Built with debug
tb does have a value, it's not null
|
Sorry I get it wrong. |
Looks like getting an OOB access, pretty strange. |
It's super weird, the issue seems to be in |
It's W^X protection on Apple Silicon. A JIT buffer can't be granted write and execute permission at the same time. I guess it's some allocation problem but again unfortunately I'm unable to test. Maybe you could trace how that buffer is mmap-ed. |
Yes, it seems like this! The buffer isn't writable...but the weirdest thing is that it works sometimes... I added an mprotect that fails after. |
You couldn't use protect. Apple has a private API. |
Ahh, pthread_jit_write_protect_np I see. Okay, I think I can trace this down. It's somewhat supported, I think it's just being called at the wrong time and that's the race. |
That patch is ported from upstream qemu (and from UTM in fact). Maybe from Unicorn we have to add those calls somewhere else. Anyway, you have this function: static inline void jit_write_protect(int enabled)
{
return pthread_jit_write_protect_np(enabled);
} |
This fixes it, though I don't really understand why. It seemed like it was called with false earlier. |
Okay I know the root cause and would post a fix once I have M1 environment to test the fix. |
btw, could you post a piece of simple reproduction code? |
The built in examples reproduce it on my machine. |
Same issue here tho this doesn't fixed it for me on MBA M1 and macOS 12.0.1. Also testing with an aarch64 context |
Any reproduction code? |
So after a bit more of researches, it seems to fix the issue for Unicorn itself. My wild guess is that the patch sent here result in all MAP_JIT pages from current thread to be RW-, resulting in a crash when coming back to JITed code. |
This of course is a quick and dirty fix. I need some production code to publish a real fix. |
I don't have much production code to share in public atm. using System;
using System.Runtime.InteropServices;
namespace testing
{
public class Test
{
[DllImport("unicorn")]
public static extern uint uc_version(out uint major, out uint minor);
private const uint UC_ARCH_ARM64 = 2;
private const uint UC_MODE_LITTLE_ENDIAN = 0;
[DllImport("unicorn")]
public static extern int uc_open(uint arch, uint mode, out IntPtr uc);
[DllImport("unicorn")]
public static extern int uc_close(IntPtr uc);
public static void Main(string[] args)
{
Console.WriteLine("uc_version");
uc_version(out uint major, out uint minor);
Console.WriteLine($"Unicorn v{major}.{minor}");
Console.WriteLine("uc_open");
int err = uc_open(UC_ARCH_ARM64, UC_MODE_LITTLE_ENDIAN, out IntPtr uc);
Console.WriteLine("Crashed?");
if (err == 0)
{
uc_close(uc);
}
Console.WriteLine("Done.");
}
}
} Console output:
|
Sry it's a typo. I mean 'reproduction' code. Could you make a double check if the equivalent C code also produces the same crash? |
No crash with the following C variant: #include <unicorn/unicorn.h>
#include <stdio.h>
int main (int ac, char **av) {
unsigned int major;
unsigned int minor;
uc_version(&major, &minor);
printf("Unicorn v%d.%d\n", major, minor);
printf("uc_open\n");
uc_engine *engine = NULL;
int res = uc_open(UC_ARCH_ARM64, UC_MODE_LITTLE_ENDIAN, &engine);
printf("Crashed?\n");
if (res == 0)
{
uc_close(engine);
}
printf("Done.\n");
return 0;
} Console output:
|
Okay, then it seems so indeed. I would have a look once I have some M1 machine to debug. |
So I removed the original patch by @geohot, did a full clean build and it seems to not have effect on my side in the end. I might have missed to check switching to the dev branch and not applying the patch at first sorry 😅 Maybe it might be worth making another issue for the C# binding issues as it is starting to be a bit out of topic? |
Nevermind, just go head. I would check if it doesn't work on UC2. |
I tested sample.go with go bindings and I couldn't reproduce the crash |
I can't reproduce your crash on an M1 machine. Is it a bug caused by M1 MAX?
I also tried switch write protection before function calls but didn't get crash. |
The code I've been working on is open source. https://github.com/geohot/cannon Try "go test" in mipsevm using upstream unicorn (change https://github.com/geohot/cannon/blob/master/build_unicorn.sh) I can try to post a minimum repro tomorrow. Though I'm 90% sure the mips sample crashed too. The crash was not 100% of the time, in my app maybe 80%. |
I tried your cannon exactly yesterday, both on your latest commit and the one when you fired this issue. Both worked fine. A minimum reproduction script would be of great help. |
So I tried a simple repro and couldn't get it. I can repro in cannon though
Still working on a min repro with upstream My bad, you can't do it with the examples. I was getting confused with unicorn (not 2), which the examples do trigger it. This seems way more subtle. |
AFAIK, unicorn2 doesn't have a Makefile so |
The build unicorn script does cmake in the directory first. That's not a full repro, that's a change. Updated and still working on minimal repro |
I get: 2021/11/08 01:45:23 open ../artifacts/contracts/MIPS.sol/MIPS.json: no such file or directory |
Yea you have to build it. Working on a simpler repro. It's subtle, seems to depend on the heap state. If I remove a completely unrelated map write it doesn't crash. |
Okay pushed a repro that doesn't depend on that:
|
Looks like it crashes inside go code? |
You have to run it from the mipsevm dir. Sorry, cannon really isn't built for this.
And 1 in 10 times it'll pass |
Yes, I did a full clone and ran in mipsevm dir. |
pushed a change, try it now. idk that seems like it can't find the test file. just pushed another change to make it simpler |
Great I got your crash locally. I would have a look into it. |
Nice! Yea sorry I got confused with the examples, it was the 1.0 examples that failed (for the same reason, but there was no fix in 1.0 at all, not the almost right fix in 2.0) It's weird, if you remove that stuff with the map in the test, it passes. But that's completely unrelated, it's just heap grooming. Either way, glad you reproed it. Confirming it always passes if you leave "bd90068fa014bb082c0b7ef6d20f7bccd3f581e0" in as well |
This should be a standalone repro, no file required. Now even simpler package main
import (
"log"
"testing"
uc "github.com/unicorn-engine/unicorn/bindings/go/unicorn"
)
func TestUnicornCrash(t *testing.T) {
mu, err := uc.NewUnicorn(uc.ARCH_MIPS, uc.MODE_32|uc.MODE_BIG_ENDIAN)
if err != nil {
log.Fatal(err)
}
// weird heap grooming (doesn't crash without this)
junk := make(map[uint32](uint32))
for i := 0; i < 1000000; i += 4 {
junk[uint32(i)] = 0xaaaaaaaa
}
mu.Start(0, 4)
} |
Looks like being related to dotnet/runtime#41991. My wild guess is that your map allocation triggers golang internal thread scheduler, which brings ffi calls to a new thread. |
Ahh, I'm not that familiar with golang internals, but this sounds very possible. If you do the heap grooming before the NewUnicorn, it doesn't crash. Can confirm
before the map fixes it! What a crazy runtime. I guess Go assumes they are saving and restoring all the relevant context for the OS thread, so they can schedule the goroutines anywhere. But they are not setting and restoring the pthread JIT state. I don't know what Go promises the programmer in this case and if this is supposed to be handled by the runtime or not. Either way, a fun bug. I tried looking for exactly this at the beginning to see if my tid changed, but I must have missed it (gettid is a weird syscall). The solution is perhaps something like my fix but less hacky, to explicitly set the JIT state right before you expect to write/exec it and not assume it stays between calls. |
I did some debugging. The thread id doesn't change indeed but the state of JIT protection changes as dotnet/runtime#41991 suggests. I guess that is due to some thread reuse strategy. Anyway, even with Thank you for the reproduction script. |
Cool, I'm not crazy! Because I did think I checked the tid correctly. I guess something must change the pthread_jit_write_protect_np state. I'm so over security like this that makes programming harder and likely doesn't affect exploiters much at all. When I did browser exploits I loved "mitigations" because they only bothered noobs. |
Fixed in 94a82ed |
Moved to Unicorn 2 for M1 support. Getting this ~50% of the time
MIPS CPU, with Go bindings if it matters. Perhaps someone has an idea.
Speed isn't that important to me, is there any way to disable threading?
The text was updated successfully, but these errors were encountered: