Sharing memory between the host and a WebAssembly module #2307

Closed
kiancross opened this issue Sep 27, 2023 · 7 comments

@kiancross

kiancross commented Sep 27, 2023

I am trying to understand how memory is shared between the host and a WebAssembly module (generated using wasm2c from an object file generated with emcc).

Consider the following very simple example:

int foo(char *buff) {
  return buff[0];
}

This is compiled using emcc and then wasm2c and clang to get an object file. However, the function signature eventually becomes:

u32 w2c_unsafe_foo(w2c_unsafe*, u32);

I assume that, given the available WebAssembly types, no WebAssembly signature is semantically equivalent to the original C one?

In this case, naturally, passing a pointer to this function casts it to a u32, and then the buff[0] access causes a trap (I suppose because the pointer is to a region of memory outside of the WebAssembly module's allocated memory?).
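To illustrate why such a call would trap: in a wasm32 module a `char *` is a 32-bit offset into the instance's linear memory, not a host address. The sketch below models this with a plain byte array. The names `linear_memory`, `load8`, and `LOAD_TRAP` are hypothetical and chosen for illustration; this is not the code wasm2c generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an instance's linear memory (not the wasm2c type). */
typedef struct {
  uint8_t *data;  /* host address of byte 0 of the linear memory */
  uint64_t size;  /* current size of the linear memory in bytes */
} linear_memory;

/* Illustrative sentinel only; a real runtime would trap instead of returning. */
#define LOAD_TRAP (-1)

/* Roughly what `buff[0]` amounts to inside the module: a bounds-checked
   one-byte load at an offset. Returns the byte (0..255) or LOAD_TRAP. */
int load8(const linear_memory *mem, uint32_t offset) {
  assert(mem != NULL && mem->data != NULL);
  if ((uint64_t)offset >= mem->size) {
    return LOAD_TRAP; /* offset lies outside the linear memory */
  }
  return mem->data[offset];
}
```

A host pointer truncated to 32 bits will usually be far larger than the size of a small linear memory, which would be consistent with the trap described above.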

What is the correct way to share memory in and out of a WebAssembly module? I have had a look at the rot13 example, but this uses a .wat file, where it is possible to import an external library (e.g., host). As was suggested to me elsewhere (#2289 (comment)), this seems to be what is required to access 'things' (functions/memory) from the host.

Is it possible to do this in C code, which is then compiled into a WebAssembly module?


Based on the rot13 example, I think I need to do something like the following:

int foo() {
  char buff[1024];
  host_get_buff(buff, 1024);
  return buff[0];
}

host_get_buff would be implemented in the host code and copy the relevant data into buff? The questions I have are as follows.

  1. Copying into buff requires access to this region of memory in host_get_buff. In the rot13 example, this appears to be done by allocating a page into w2c_host, which is passed as the first argument to host_get_buff when it is called. I think the WebAssembly module is then set to use this page for all of its memory uses?

(import "host" "mem" (memory $mem 1))

  2. Given that the module appears to have its memory set 'globally' to that of w2c_host, does this mean that it is not possible to have multiple host libraries, each sharing data into the module? If it is, how could it be achieved?

  3. I think that in my example, what I actually need is for buff to be a pointer to a valid memory region from that allocated in the host by wasm_rt_allocate_memory? Then both the host and module can access this shared memory? But how do I guarantee this? Doesn't the C to WASM compiler decide what memory is used to back buff?


Suppose I wanted to pass a deep data structure from the host into a WebAssembly module. This structure has numerous pointers to sub-structures, pointers to buffers etc.

Am I correct in thinking that I need a function which is capable of deep-copying the data structure from one region of memory to another? Then, once copied to memory accessible by the WebAssembly module, any changes made by the module need to be copied back to the host?

If this is the case, then it is not too dissimilar to implementing IPC between two processes?
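As a rough illustration of the kind of deep copy being asked about, the sketch below copies a host-side linked list into a byte buffer that stands in for the module's linear memory, rewriting each native pointer as a 32-bit offset. All names (`host_node`, `guest_node`, `copy_list_in`) are hypothetical. It also assumes a little-endian host, so that a plain `memcpy` produces the byte order a Wasm module expects, and a `base` that is non-zero and suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side structure holding a native pointer. */
typedef struct host_node {
  int32_t value;
  struct host_node *next;
} host_node;

/* The same structure as a wasm32 module would lay it out: the pointer becomes
   a 32-bit offset into linear memory, with 0 playing the role of NULL. */
typedef struct {
  int32_t value;
  uint32_t next;
} guest_node;

/* Copy a host list into `mem`, placing nodes back to back from offset `base`.
   Returns the offset of the first guest node, or 0 if the list is empty or
   does not fit. Assumes a little-endian host (see the lead-in). */
uint32_t copy_list_in(uint8_t *mem, uint32_t mem_size, uint32_t base,
                      const host_node *head) {
  assert(mem != NULL && base != 0);
  uint32_t first = 0, prev = 0, cursor = base;
  for (const host_node *n = head; n != NULL; n = n->next) {
    if (cursor > mem_size || mem_size - cursor < sizeof(guest_node)) {
      return 0; /* not enough room left for another node */
    }
    guest_node g = { n->value, 0 };
    memcpy(mem + cursor, &g, sizeof g);
    if (prev != 0) {
      /* Patch the previous node's `next` field to point at this node. */
      memcpy(mem + prev + offsetof(guest_node, next), &cursor, sizeof cursor);
    } else {
      first = cursor;
    }
    prev = cursor;
    cursor += (uint32_t)sizeof(guest_node);
  }
  return first;
}
```

Copying results back out would need the reverse translation, turning offsets into host pointers again, which is part of why the comparison with IPC serialization seems apt.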

@sbc100
Member

sbc100 commented Sep 27, 2023

> Am I correct in thinking that I need a function which is capable of deep-copying the data structure from one region of memory to another? Then, once copied to memory accessible by the WebAssembly module, any changes made by the module need to be copied back to the host?

Any memory that you want to be visible to the WebAssembly instance needs to be allocated inside the instance's memory region. Performing a deep copy in and out at the API boundary is one way to achieve this.

Another alternative is to have the host directly use memory allocated inside the module's memory. For example, you could export a malloc function from your module and then call w2c_unsafe_malloc to get a memory pointer into the instance's wasm_rt_memory_t. Then you can read and write to that location, which is shared with the instance, without needing to copy (remember that the return value from malloc is an offset relative to the start of the memory's buffer).

What you can't do is allocate memory on the outside and somehow "transfer" it to the instance without a copy.
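A minimal sketch of the "allocate inside the instance" approach follows. It is self-contained, so it does not use the real wasm2c types: `fake_instance` and `fake_module_malloc` are hypothetical stand-ins for the generated instance struct and the module's exported malloc, and `host_put_string` is an illustrative host helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a module instance and its linear memory. */
typedef struct {
  uint8_t data[65536]; /* one 64 KiB Wasm page of linear memory */
  uint32_t next_free;  /* state of the toy bump allocator below */
} fake_instance;

/* Toy allocator standing in for the module's exported malloc. Like the real
   thing, it returns an offset into linear memory, or 0 on failure. */
uint32_t fake_module_malloc(fake_instance *inst, uint32_t size) {
  if (inst->next_free == 0) {
    inst->next_free = 16; /* never hand out offset 0, which acts as NULL */
  }
  if (size > sizeof inst->data - inst->next_free) {
    return 0;
  }
  uint32_t offset = inst->next_free;
  inst->next_free += size;
  return offset;
}

/* Host side: allocate inside the instance, then write through data + offset.
   The returned offset is the value a host would pass as the u32 "pointer". */
uint32_t host_put_string(fake_instance *inst, const char *s) {
  size_t len = strlen(s) + 1; /* include the terminating NUL */
  assert(len <= sizeof inst->data);
  uint32_t offset = fake_module_malloc(inst, (uint32_t)len);
  if (offset == 0) {
    return 0;
  }
  memcpy(inst->data + offset, s, len);
  return offset;
}
```

With a real instance, it is probably safer for the host to hold on to offsets rather than host addresses, since the base address of the memory may change if the memory grows.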

@sbc100
Member

sbc100 commented Sep 27, 2023

(BTW, this is exactly how WebAssembly works on the web: you either need to make a copy of the data, or have it initially allocated from within the instance memory.)

@kiancross
Author

Do you know of any examples where the WebAssembly module was originally written in C (C -> WASM -> C)? The examples in this repo use .wat, and I'm not entirely sure how to 'import' functions (like below) from the host into the WebAssembly module.

(import "host" "mem" (memory $mem 1))
(import "host" "fill_buf" (func $fill_buf (param i32 i32) (result i32)))
(import "host" "buf_done" (func $buf_done (param i32 i32)))

@keithw
Member

keithw commented Sep 28, 2023

Here's an example of how you could write the rot13.wat module in C and compile it with a wasm32-targeting clang. The generated code isn't quite as concise, but it works as a drop-in replacement for the hand-written one (importing functions and its memory from the host):

$ cat rot13_in_c.c
#include <stdint.h>

uint32_t fill_buf(char* ptr, uint32_t size) __attribute((import_module("host"), import_name("fill_buf")));
void buf_done(char* ptr, uint32_t size) __attribute((import_module("host"), import_name("buf_done")));

char rot13c(char c)
{
  char ch = c & 0xdf;
  if (ch < 'A') return c;
  if (ch <= 'M') return c + 13;
  if (ch <= 'Z') return c - 13;
  return c;
}

void rot13(void) __attribute((export_name("rot13")))
{
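  /* Ask the host to write up to 1024 bytes starting at address 0 of the imported memory. */
  uint32_t size = fill_buf(0, 1024);

  /* The buffer starts at address 0, so `size` doubles as its end address. */
  for (char* ptr = 0; ptr < (char*)size; ptr++) {
    char rot13_char = rot13c(*ptr);
    *ptr = rot13_char;
  }

  /* Hand the transformed bytes back to the host. */
  buf_done(0, size);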
}
$ clang -nostdlib -Wl,--no-entry -Xlinker --import-memory=host,mem -Os rot13_in_c.c -o rot13_in_c.wasm
$ wasm2wat -f --generate-names rot13_in_c.wasm | tee ./wabt/wasm2c/examples/rot13/rot13.wat
(module $rot13_in_c.wasm
  (type $t0 (func (param i32 i32) (result i32)))
  (type $t1 (func (param i32 i32)))
  (type $t2 (func))
  (import "host" "mem" (memory $host.mem 2))
  (import "host" "fill_buf" (func $fill_buf (type $t0)))
  (import "host" "buf_done" (func $buf_done (type $t1)))
  (func $rot13 (type $t2)
    (local $l0 i32) (local $l1 i32) (local $l2 i32) (local $l3 i32) (local $l4 i32)
    (local.set $l0
      (i32.const 0))
    (block $B0
      (br_if $B0
        (i32.eqz
          (local.tee $l1
            (call $fill_buf
              (i32.const 0)
              (i32.const 1024)))))
      (local.set $l2
        (local.get $l1))
      (loop $L1
        (block $B2
          (br_if $B2
            (i32.lt_s
              (i32.extend8_s
                (local.tee $l4
                  (i32.and
                    (local.tee $l3
                      (i32.load8_u
                        (local.get $l0)))
                    (i32.const -33))))
              (i32.const 65)))
          (block $B3
            (br_if $B3
              (i32.gt_u
                (local.tee $l4
                  (i32.and
                    (local.get $l4)
                    (i32.const 255)))
                (i32.const 77)))
            (local.set $l3
              (i32.add
                (local.get $l3)
                (i32.const 13)))
            (br $B2))
          (local.set $l3
            (select
              (i32.add
                (local.get $l3)
                (i32.const -13))
              (local.get $l3)
              (i32.lt_u
                (local.get $l4)
                (i32.const 91)))))
        (i32.store8
          (local.get $l0)
          (local.get $l3))
        (local.set $l0
          (i32.add
            (local.get $l0)
            (i32.const 1)))
        (br_if $L1
          (local.tee $l2
            (i32.add
              (local.get $l2)
              (i32.const -1))))))
    (call $buf_done
      (i32.const 0)
      (local.get $l1)))
  (table $T0 1 1 funcref)
  (global $__stack_pointer (mut i32) (i32.const 66560))
  (export "rot13" (func $rot13)))
$ cd wabt/wasm2c/examples/rot13/
$ make
../../../bin/wat2wasm rot13.wat -o rot13.wasm
../../../bin/wasm2c rot13.wasm -o rot13.c --disable-simd
cc -I../..   -c -o rot13.o rot13.c
cc -I../..   -c -o main.o main.c
cc   rot13.o main.o ../../wasm-rt-impl.o /usr/lib/x86_64-linux-gnu/libm.so   -o rot13
$ ./rot13 This is a test.
This -> Guvf
is -> vf
a -> n
test. -> grfg.

@kiancross
Author

Thanks both of you for your help. The import_module attribute is especially useful to know about.

Is there an alternative to the below, which does not rely on specifying the start and end address of the memory region?

uint32_t size = fill_buf(0, 1024);

e.g., is something like the following safe?

char b[1024];
uint32_t size = fill_buf(b, sizeof(b));

Otherwise, won't memory management in complex programs get quite complicated? Could char b[1024]; also be replaced by a standard call to malloc?

@keithw
Member

keithw commented Oct 4, 2023

> e.g., is something like the following safe?
>
> char b[1024];
> uint32_t size = fill_buf(b, sizeof(b));

Sure, this is safe and will work fine, with a few provisos:

  • You'll have to change some of the rest of the code to match (e.g. the for loop and the call to buf_done).

  • The host's fill_buf and buf_done routines should be bounds-checking the pointer and size arguments; they shouldn't just trust the values given by an arbitrary Wasm module. WebAssembly's safety guarantees only apply to the Wasm module itself; the host is responsible for its own safety.

  • Right now the main.c program has hard-coded the initial size of the Wasm memory that the module imports, e.g.:

    /* Create a structure to store the memory and current string, allocating 1
    page of Wasm memory (64 KiB) that the rot13 module instance will import. */
    wasm_rt_allocate_memory(&host.memory, 1, 1, false);

    which matches the (import "host" "mem" (memory $mem 1)) line in the hand-written rot13.wat module.

    But clang/LLVM generates a module that imports a memory that must have at least 2 initial pages:

    (import "host" "mem" (memory $host.mem 2))

    It would violate the import subtyping rules to provide a "wrong-typed" import (in this case, a too-small memory) like this, because the Wasm module might try to read or write beyond the limits of the memory. The "safe" thing to do is either:

    • have the module define and export its memory (instead of importing it from the host), or
    • have the host create a memory that meets the minimum size requirements before providing it as an import. wasm2c has helpers for this, e.g. the main.c code could look like this:
    wasm_rt_allocate_memory(&host.memory, wasm2c_rot13_min_host_mem, wasm2c_rot13_max_host_mem, false);

    We've talked about having wasm2c create code to auto-enforce the import subtyping rules that apply across a module boundary, but we don't have this yet -- it's the embedder's responsibility to link modules together in a way that each receives well-typed imports.
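The two host-side checks described in the provisos above can be sketched as follows. The first validates a pointer/size pair received from the module against the memory's size in bytes. The second mirrors the limits-matching rule for memory imports: the provided memory needs at least the required initial pages and, if the importer declares a maximum, a maximum no larger than it. The names `guest_range_ok`, `mem_limits`, and `import_limits_ok` are hypothetical and not part of wasm2c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the guest range [ptr, ptr + size) lies inside a linear memory of
   mem_size bytes. Phrased so that ptr + size is never computed and so cannot
   wrap around. */
bool guest_range_ok(uint64_t mem_size, uint32_t ptr, uint32_t size) {
  return (uint64_t)ptr <= mem_size &&
         (uint64_t)size <= mem_size - (uint64_t)ptr;
}

/* Page limits of a memory; has_max is false when no maximum is declared. */
typedef struct {
  uint32_t min_pages;
  bool has_max;
  uint32_t max_pages;
} mem_limits;

/* True if a memory with limits `provided` may satisfy an import declared with
   limits `required`. */
bool import_limits_ok(mem_limits provided, mem_limits required) {
  assert(!provided.has_max || provided.min_pages <= provided.max_pages);
  if (provided.min_pages < required.min_pages) {
    return false; /* too small: the module could access beyond the end */
  }
  if (!required.has_max) {
    return true;
  }
  return provided.has_max && provided.max_pages <= required.max_pages;
}
```

In this thread's scenario, a host memory created with 1 initial page would fail `import_limits_ok` against the clang-built module's requirement of 2 pages, and a host `fill_buf` would call something like `guest_range_ok` before touching the buffer.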

> Could char b[1024]; also be replaced by a standard call to malloc?

Sure, this also works fine but you'll need to link the module with libc to get malloc. This works:

clang -mexec-model=reactor -Xlinker --import-memory=host,mem -Os rot13_in_c.c -o rot13_in_c.wasm

@keithw
Member

keithw commented Oct 31, 2023

Closing as answered.

@keithw keithw closed this as completed Oct 31, 2023