Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misaligned pointer dereference on Mac M1 Pro & Mac M2 Pro #4072

Closed
Eitu33 opened this issue Jul 11, 2023 · 24 comments · Fixed by anoma/namada#1778
Closed

Misaligned pointer dereference on Mac M1 Pro & Mac M2 Pro #4072

Eitu33 opened this issue Jul 11, 2023 · 24 comments · Fixed by anoma/namada#1778
Assignees
Labels
priority-medium Medium priority issue
Milestone

Comments

@Eitu33
Copy link

Eitu33 commented Jul 11, 2023

We have been encountering the following issue when running wasmer 3.3 and wasmer 4.0 on M1 & M2:

thread 'execution' panicked at 'misaligned pointer dereference: address must be a multiple of 0x10 but is 0x109a5fc18', /Users/thomas/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.0.0/src/trap/traphandlers.rs:219:28
rustc 1.72.0-nightly (f4b80cacf 2023-06-30)
binary: rustc
commit-hash: f4b80cacf93ca216c75f6ae12f4b9dec19eba42f
commit-date: 2023-06-30
host: aarch64-apple-darwin
release: 1.72.0-nightly
LLVM version: 16.0.5

This issue is not reproduced on:

  • linux_x86_64
  • win_x86_64

After looking into the traphandlers.rs code, we noticed that there was a cast made towards the libc::ucontext_t type:

let ucontext = &mut *(context as *mut libc::ucontext_t);

But the used libc::ucontext_t is different depending on the platform, on linux we have:

libc::unix::linux_like::linux::gnu::b64::x86_64
pub struct ucontext_t // size = 936 (0x3A8), align = 0x8

Located at libc-0.2.147/src/unix/linux_like/linux/gnu/b64/x86_64/mod.rs

And for mac we have:

libc::unix::bsd::apple::b64::aarch64::align
pub struct ucontext_t // size = 880 (0x370), align = 0x10

Located at libc-0.2.147/src/unix/bsd/apple/b64/aarch64/align.rs

The context that is being casted into ucontext_t at L219 has its alignment set at 0x08:

context: *mut c_void // size = 8, align = 0x8

This produces a misalignment between the expected 0x10 and the provided 0x08 on these types of mac architectures.

This is currently a big issue for us, if help is required we would be glad to contribute.

@theduke
Copy link
Contributor

theduke commented Jul 11, 2023

@ptitSeb probably a duplicate of #4059 ?

@bilboquet
Copy link

@ptitSeb probably a duplicate of #4059 ?

I tested issue #4072 with @Eitu33 who reported it.
So I'm just giving my 2 cents.
I don't think those issues are duplicates as #4072 does not happen on Linux.

@tzemanovic
Copy link

we're also hitting this in 2.3.0 in anoma/namada#1721

@syrusakbary
Copy link
Member

Might be related: #4059

@mamcx
Copy link

mamcx commented Jul 26, 2023

Get same error on M1 16GB with wasmer-vm-3.1.1/src/trap/traphandlers.rs:220:28

@kulakowski
Copy link

Note that this is likely becoming visible due to rust-lang/rust#98112 landing in Rust 1.70, which adds debug assertions to misaligned pointer dereferences.

It's possible that macOS does not guarantee this pointer is going to be 16 byte aligned even though a structure within it (the NEON floating point state) is nominally supposed to be. I can't easily find anything about it, you might end up having to file a radar about it. I am sure you can work around it writing out your own less-aligned structure to cast to, though, but that's obviously not fun.

@bilboquet
Copy link

bilboquet commented Jul 26, 2023

Hi,

In the original issue it's not an assert that fails.
It's that cast: context as *mut libc::ucontext_t that causes the panic.
context is *mut c_void

AFAIK, for the cast to work one need align(*mut c_void) >= align(*mut libc::ucontext_t)
but align(*mut c_void) == 0x8 and align(*mut libc::ucontext_t) == 0x10 so obviously the property is not meet.

looking at https://github.com/rust-lang/libc/blob/main/src/unix/bsd/apple/b64/aarch64/align.rs
if ucontext_t is 0x10 align it's because it inherits it's alignment from __darwin_mcontext64 which itself inherits it from __darwin_arm_neon_state64 which is explicitly aligned to 16 here if libc_int128 is not defined.
I guess that if libc_int128 is defined this line pub __v: [::__uint128_t; 32], would also causes the struct to be 16 aligned.

If this is correct (don't' have M1 or M2 so I can't try) this seems to be a dead end, but this would lead me to think that on such platform * c_void should be aligned to 16, so maybe there is something to look on the rust std side.
But pointers are aligned to usize isn't it?

Looking at https://doc.rust-lang.org/beta/core/ffi/enum.c_void.html# I found this

To model pointers to opaque types in FFI, until extern type is stabilized, it is recommended to use a newtype wrapper around an empty byte array. See the Nomicon for details.
I'm not smart enough to find a solution using opaque type (more precisely I think it won't help here), but maybe some else will.

Regards

@sug0
Copy link

sug0 commented Jul 30, 2023

I was able to hack a fix to this issue, based on the research presented in this thread.

sug0@df6aa26#diff-5d272e62ff3e9e2d5249a6300df4eb51eeb019e143582bfe8f93d7cd3674d176R150-R168

If anyone has some free time, please test the patch. The branch (applel-fix-heliax) in the linked remote contains other alignment patches, originally authored by some folks from the NEAR project. All patches are based on v2.3.0, but I reckon they should be easily ported over to new versions.

Cheers,
Tiago

@ptitSeb
Copy link
Contributor

ptitSeb commented Jul 31, 2023

The fix looks fine, and I can implement it. The issue is, how can I reproduce the issue, there is no simple way to have it seems?

@ptitSeb
Copy link
Contributor

ptitSeb commented Jul 31, 2023

@sug0 do you plan to create a PR based on current master fr this fix?

@bilboquet
Copy link

@ptitSeb
You don't have the required hardware or is it something else?

@ptitSeb
Copy link
Contributor

ptitSeb commented Jul 31, 2023

I have the hardware, but not a simple way to reproduce the issue.

@bilboquet
Copy link

bilboquet commented Jul 31, 2023

We have code that fails here.
It's not simple but at least it's public code.
I'm gonna see with my teammates if we can simplifies things.

Edit:
At least we can do tests for you if you want.

@ptitSeb
Copy link
Contributor

ptitSeb commented Aug 2, 2023

@bilboquet can you test with that branch #4120

@bilboquet
Copy link

@ptitSeb yes sure, will do this with my teammate who has the hardware

@ptitSeb
Copy link
Contributor

ptitSeb commented Aug 14, 2023

@bilboquet any news on the fix efficiency?

@Eitu33
Copy link
Author

Eitu33 commented Aug 16, 2023

@ptitSeb not tested yet, will do this week

@Eitu33
Copy link
Author

Eitu33 commented Aug 18, 2023

@ptitSeb the exact same scenarios that caused the issue initially are now working as expected with the changes made in #4120. If we find any related issue when doing more extensive testing of our system I will keep you updated here. I believe this issue can be closed for now.

@ptitSeb
Copy link
Contributor

ptitSeb commented Aug 21, 2023

Ok, closing the issue!

@ptitSeb ptitSeb closed this as completed Aug 21, 2023
@junxzm1990
Copy link

@Eitu33, do you have a PoC to reproduce the issue in wasmer 3.3 or wasmer 4.0? thanks.

@Eitu33
Copy link
Author

Eitu33 commented Aug 23, 2023

@junxzm1990 No, the issue was reproduced in a relatively complex scenario. I could make a PoC but I'm not sure I can find the time / get the clearance to do so. What I work on is open source so If you really want to reproduce the issue I could make a branch on our repo with everything setup to reproduce, but that would be very far from a minimal example.

@junxzm1990
Copy link

junxzm1990 commented Aug 23, 2023

@Eitu33, A repo for reproduction would be greatly appreciated. I can follow up on the repo attempting to create a simpler PoC.

@dflse
Copy link

dflse commented Aug 29, 2023

Hello.

I've tried to experiment with some thinks, but I think I've got MRE of exact same problem.
In some cases changing *mut oneshot::Sender<()> to another type worked without panics

Wasm module code:

use std::ffi::c_void;
use tokio::sync::oneshot;

#[no_mangle]
extern "C" fn callback(ptr: *mut c_void, code: i32, data: *const u8, len: u32) {
    let tx = unsafe { Box::from_raw(ptr as *mut oneshot::Sender<()>) };
}

Dependencies:

[dependencies]
tokio = { version = "1.24.2", default-features = false, features = [ "rt", "sync" ] }

[crates-io.patch]
tokio = { git = "https://github.com/wasix-org/tokio.git", branch = "1.24.2-fixed" }

Build with:
cargo wasix build

And host part is:

    let wasm_bytes = include_bytes!("../../target/wasm32-wasmer-wasi/debug/module.wasm");
    let mut store = Store::default();
    let module = Module::new(&store, wasm_bytes)?;

    let mut wasi_fenv = WasiEnv::builder("test").finalize(&mut store)?;
    let mut imports = wasi_fenv
        .import_object_for_all_wasi_versions(&mut store, &module)
        .unwrap();

    let shared_memory = module.imports().memories().next().map(|a| *a.ty());

    let spawn_type = match shared_memory {
        Some(ty) => SpawnMemoryType::CreateMemoryOfType(ty),
        None => SpawnMemoryType::CreateMemory,
    };

    let memory = {
        let mut store_mut = store.as_store_mut();
        let data_mut = wasi_fenv.data_mut(&mut store_mut).clone();
        data_mut.tasks().build_memory(&mut store_mut, spawn_type)?
    };

    imports.define("env", "memory", memory.clone().unwrap());

    let instance = Instance::new(&mut store, &module, &imports).unwrap();
    wasi_fenv.initialize_with_memory(&mut store, instance.clone(), memory, true)?;

    Ok(())

Panics:

thread 'main' panicked at 'misaligned pointer dereference: address must be a multiple of 0x8 but is 0x13307170c', /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:167:18
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_nounwind_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:96:14
   2: core::panicking::panic_misaligned_pointer_dereference
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:175:5
   3: wasmer_vm::instance::Instance::imported_memory
             at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:167:18
   4: wasmer_vm::instance::Instance::get_vmmemory
             at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:256:26
   5: wasmer_vm::instance::Instance::memory_init
             at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/instance/mod.rs:748:22
   6: wasmer_vm_memory32_init
             at /Users/flassie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wasmer-vm-4.1.2/src/libcalls.rs:628:9
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread caused non-unwinding panic. aborting.

I've also tried to find what exactly causes panic on module side, so:

use std::ffi::c_void;
use tokio::sync::oneshot;

#[no_mangle]
extern "C" fn callback(ptr: *mut c_void, code: i32, data: *const u8, len: u32) {
    let ptr: *mut oneshot::Sender<()> = ptr as _; 
    let tx = unsafe { Box::from_raw(ptr) };        // if commented - everything is ok
}

@ptitSeb
Copy link
Contributor

ptitSeb commented Aug 29, 2023

I don't think this is this issue you are encontering, but this one #4059 @dflse

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority-medium Medium priority issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.