-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix malloc() to work with -sWASM_WORKERS #16703
base: main
Are you sure you want to change the base?
Changes from 7 commits
c11edf1
c52006b
d170d1a
857cf08
837c731
2149dd6
778842c
93af776
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
#include <emscripten.h> | ||
#include <emscripten/wasm_worker.h> | ||
#include <emscripten/threading.h> | ||
#include <emscripten/emmalloc.h> | ||
#include <stdio.h> | ||
#include <stdlib.h> | ||
#include <assert.h> | ||
|
||
// Tests that operating malloc from Wasm Workers is thread-safe. | ||
|
||
emscripten_semaphore_t workDone = EMSCRIPTEN_SEMAPHORE_T_STATIC_INITIALIZER(0); | ||
|
||
// Make sure we have tiny byte of TLS data that causes LLVM TLS size to be not a multiple of four from the get-go. | ||
__thread uint8_t dummyTls; | ||
|
||
void work() | ||
{ | ||
assert(dummyTls == 0); | ||
for(int i = 0; i < 100000; ++i) | ||
{ | ||
dummyTls += 1; | ||
void *ptr = malloc(emscripten_random() * 10000); | ||
assert(ptr); | ||
free(ptr); | ||
} | ||
printf("dummyTls: %d\n", (int)dummyTls); | ||
emscripten_semaphore_release(&workDone, 1); | ||
} | ||
|
||
void allThreadsDone(volatile void *address, uint32_t idx, ATOMICS_WAIT_RESULT_T result, void *userData) | ||
{ | ||
#ifdef EMMALLOC | ||
assert(emmalloc_validate_memory_regions() == 0); | ||
#endif | ||
printf("Test passed!\n"); | ||
#ifdef REPORT_RESULT | ||
REPORT_RESULT(0); | ||
#endif | ||
} | ||
|
||
int main() | ||
{ | ||
#define NUM_THREADS 10 | ||
for(int i = 0; i < NUM_THREADS; ++i) | ||
{ | ||
emscripten_wasm_worker_t worker = emscripten_malloc_wasm_worker(1024); | ||
emscripten_wasm_worker_post_function_v(worker, work); | ||
} | ||
|
||
emscripten_semaphore_async_acquire(&workDone, NUM_THREADS, allThreadsDone, 0, EMSCRIPTEN_WAIT_ASYNC_INFINITY); | ||
} |
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -1271,6 +1271,8 @@ def __init__(self, **kwargs): | |||||||||||||
self.memvalidate = kwargs.pop('memvalidate') | ||||||||||||||
self.verbose = kwargs.pop('verbose') | ||||||||||||||
self.is_debug = kwargs.pop('is_debug') or self.memvalidate or self.verbose | ||||||||||||||
self.thread_safe = kwargs.pop('thread_safe') | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This library already inherits from |
||||||||||||||
self.use_spinlocks = kwargs.pop('use_spinlocks') | ||||||||||||||
|
||||||||||||||
super().__init__(**kwargs) | ||||||||||||||
|
||||||||||||||
|
@@ -1296,15 +1298,25 @@ def get_cflags(self): | |||||||||||||
cflags += ['-DMALLOC_FAILURE_ACTION=', '-DEMSCRIPTEN_NO_ERRNO'] | ||||||||||||||
if self.is_tracing: | ||||||||||||||
cflags += ['--tracing'] | ||||||||||||||
# Target dlmalloc multithreading aware implementation if using pthreads or Wasm Workers | ||||||||||||||
cflags += [f'-DUSE_LOCKS={self.thread_safe}'] | ||||||||||||||
# When using Wasm Workers, target spinlocks. Otherwise target pthread mutexes when using pthreads. | ||||||||||||||
cflags += [f'-DUSE_SPIN_LOCKS={self.use_spinlocks}'] | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not just always defined USE_SPIN_LOCKS for consistency ... and it looks like USE_LOCKS gets defined to 1 when USE_SPIN_LOCKS is set. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I used USE_SPIN_LOCKS always, the obvious review would be "can you benchmark that it does not regress performance for any user?", which I really don't. I wanted to keep the existing behavior for existing pthreads users to be sure that there is no danger of regressing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I did not want to disrupt the structure of the flags that were set from the way they originally were. The documentation in dlmalloc reads emscripten/system/lib/dlmalloc.c Lines 157 to 162 in 32831ed
suggesting that the intent is that one first sets Originally in dlmalloc.c side Emscripten configured both of these flags, so I opted to preserve the same. Passing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think this is worth a worthwhile thing to do if only to avoid the extra complexity of adding an additional library variant. It seems highly beneficial to use to use the wasm code in both wasm workers mode and without, no? And then we can avoid this new There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if spinlocks are good enough for wasm workers why wouldn't they be good enough for the normal pthread use case? |
||||||||||||||
|
||||||||||||||
return cflags | ||||||||||||||
|
||||||||||||||
def get_base_name_prefix(self): | ||||||||||||||
return 'lib%s' % self.malloc | ||||||||||||||
|
||||||||||||||
def get_base_name(self): | ||||||||||||||
name = super().get_base_name() | ||||||||||||||
name = self.get_base_name_prefix() | ||||||||||||||
if self.is_debug and not self.memvalidate and not self.verbose: | ||||||||||||||
name += '-debug' | ||||||||||||||
|
||||||||||||||
# Add the prefixes for malloc ('-mt' / '-ww') after the '-debug' part, since the -debug string | ||||||||||||||
# is part of the user-submitted -sMALLOC prefix (e.g. -sMALLOC=dlmalloc-debug) | ||||||||||||||
name += super().get_base_name().replace(self.get_base_name_prefix(), '') | ||||||||||||||
|
||||||||||||||
if not self.use_errno: | ||||||||||||||
# emmalloc doesn't actually use errno, but it's easier to build it again | ||||||||||||||
name += '-noerrno' | ||||||||||||||
|
@@ -1333,6 +1345,8 @@ def get_default_variation(cls, **kwargs): | |||||||||||||
is_tracing=settings.EMSCRIPTEN_TRACING, | ||||||||||||||
memvalidate='memvalidate' in settings.MALLOC, | ||||||||||||||
verbose='verbose' in settings.MALLOC, | ||||||||||||||
use_spinlocks=settings.WASM_WORKERS, | ||||||||||||||
thread_safe=settings.SHARED_MEMORY, | ||||||||||||||
**kwargs | ||||||||||||||
) | ||||||||||||||
|
||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather see this land as a seperate PR since it doesn't just effect WASM_WORKERS.
Its also rather involved to figure out what this change does. As far as I can tell it means that USE_SPIN_LOCKS gets defined to 1 by default which ends up also defining USE_LOCKS, and then USE_SPIN_LOCKS ends up using
__sync_lock_test_and_set
and__sync_lock_release
for locking instead of pthead_mutex_t? Is that right?Anyway it seems involved enough for its own PR I think..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is a productive use of time to artificially split to fix Wasm Workers to first work with emmalloc, and then with dlmalloc.
This change just moves the definition of
USE_LOCKS
andUSE_SPIN_LOCKS
from dlmalloc.c into system_libs.py. I had to do that migration for the following reason:Earlier with system_libs.py, I had decided to save on the permutational explosion by not building each of the four combinations of {no threads, pthreads} X {no wasm workers, wasm workers} of system libraries, but instead the combo "pthreads but no wasm workers" is omitted as considered to be redundant, so we only build the three variants "singlethreaded" (no thread safety), "wasm workers" (use atomics and shmem so user can link in wasm workers), and "wasm workers+pthreads" (use atomics, shmem and pthread API, user can link in wasm workers) of each multithreaded system library.
However now with dlmalloc, it looks like we do want to configure it to recognize only when "pthreads but not wasm workers" is being used. This is because that is the scenario when we want to use pthread mutex locks in dlmalloc. When building with pthreads+wasm workers we want to use spinlocks because malloc could be called from wasm workers that won't be able to use pthread mutex api.
Maybe there will be more code in the future that requires such special case, and we do need to build all the four variants - but so far it seems this is the only place, so opted to do this configuration specifically for libmalloc only to keep embuilder build times smaller.