Fix malloc() to work with -sWASM_WORKERS #16703
base: main
Conversation
…l work with Wasm Workers, but dlmalloc does not yet work. Fix emscripten_semaphore_async_acquire(). Add a test.
…lloc-ww-debug-noerrno' but try to use 'libemmalloc-debug-ww-noerrno'
Force-pushed from da7571c to d170d1a.
…ince they need to be configured differently in Wasm Workers vs pthreads build modes.
Force-pushed from 44dcb4d to 778842c.
#define USE_LOCKS 1
#define USE_SPIN_LOCKS 0 // Ensure we use pthread_mutex_t.
#endif
I think I'd rather see this land as a separate PR, since it doesn't just affect WASM_WORKERS.
It's also rather involved to figure out what this change does. As far as I can tell, it means that USE_SPIN_LOCKS gets defined to 1 by default, which ends up also defining USE_LOCKS, and then USE_SPIN_LOCKS ends up using __sync_lock_test_and_set and __sync_lock_release for locking instead of pthread_mutex_t? Is that right?
Anyway, it seems involved enough for its own PR, I think.
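For reference, a minimal sketch (not dlmalloc's actual code) of the locking pattern those two builtins enable; dlmalloc's real spinlock adds retries and ownership handling on top of this:

/* Minimal spinlock sketch using the GCC/Clang builtins named above. */
static volatile int mlock = 0;

static void spin_acquire(volatile int *l) {
  /* __sync_lock_test_and_set atomically stores 1 and returns the previous
     value (an acquire barrier); a return of 0 means we took the lock. */
  while (__sync_lock_test_and_set(l, 1)) {
    /* busy-wait until the current holder releases */
  }
}

static void spin_release(volatile int *l) {
  /* __sync_lock_release atomically stores 0 (a release barrier). */
  __sync_lock_release(l);
}

/* Usage: spin_acquire(&mlock); ...critical section... spin_release(&mlock); */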
I don't think it is a productive use of time to artificially split this, fixing Wasm Workers to first work with emmalloc, and only then with dlmalloc.
This change just moves the definition of USE_LOCKS and USE_SPIN_LOCKS from dlmalloc.c into system_libs.py. I had to do that migration for the following reason:
Earlier in system_libs.py, I had decided to avoid a permutational explosion by not building all four combinations of {no threads, pthreads} x {no Wasm Workers, Wasm Workers} of the system libraries. The combination "pthreads but no Wasm Workers" was omitted as redundant, so we only build three variants of each multithreaded system library:
- "singlethreaded": no thread safety,
- "wasm workers": uses atomics and shared memory, so the user can link in Wasm Workers,
- "wasm workers + pthreads": uses atomics, shared memory and the pthread API; the user can link in Wasm Workers.
However, with dlmalloc it now looks like we do want a configuration that recognizes when "pthreads but not Wasm Workers" is being used, because that is the scenario in which we want dlmalloc to use pthread mutex locks. When building with pthreads + Wasm Workers we want spinlocks instead, because malloc could be called from Wasm Workers, which cannot use the pthread mutex API.
Maybe there will be more code in the future that requires such a special case, and then we will need to build all four variants - but so far this seems to be the only place, so I opted to do this configuration specifically for libmalloc only, to keep embuilder build times smaller.
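To make that mapping concrete, here is an illustrative sketch in C preprocessor form. Note that the PR itself passes these defines via cflags from system_libs.py; the __EMSCRIPTEN_PTHREADS__ and __EMSCRIPTEN_WASM_WORKERS__ predefines are used below only as stand-ins for the three build variants:

/* Illustrative variant -> locking mapping; in the PR the defines come from
   system_libs.py cflags, not from source-level detection like this. */
#if defined(__EMSCRIPTEN_WASM_WORKERS__)
/* Wasm Workers (with or without pthreads): malloc may be called from a
   worker that cannot use the pthread mutex API, so use spinlocks. */
#define USE_LOCKS 1
#define USE_SPIN_LOCKS 1
#elif defined(__EMSCRIPTEN_PTHREADS__)
/* pthreads without Wasm Workers: keep the existing pthread_mutex_t path. */
#define USE_LOCKS 1
#define USE_SPIN_LOCKS 0
#else
/* Single-threaded build: no locking needed. */
#define USE_LOCKS 0
#endif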
# Target dlmalloc multithreading aware implementation if using pthreads or Wasm Workers
cflags += [f'-DUSE_LOCKS={self.thread_safe}']
# When using Wasm Workers, target spinlocks. Otherwise target pthread mutexes when using pthreads.
cflags += [f'-DUSE_SPIN_LOCKS={self.use_spinlocks}']
Why not just always define USE_SPIN_LOCKS, for consistency? ... And it looks like USE_LOCKS gets defined to 1 when USE_SPIN_LOCKS is set.
If I used USE_SPIN_LOCKS always, the obvious review comment would be "can you benchmark that it does not regress performance for any user?", which I really don't want to do. I wanted to keep the existing behavior for existing pthreads users to be sure that there is no danger of regressing.
"it looks like USE_LOCKS gets defined to 1 when USE_SPIN_LOCKS is set."
I did not want to disrupt the structure of the flags from the way they were originally set. The documentation in dlmalloc (system/lib/dlmalloc.c, lines 157 to 162 at 32831ed) reads:
Thread-safety: NOT thread-safe unless USE_LOCKS defined non-zero
When USE_LOCKS is defined, each public call to malloc, free,
etc is surrounded with a lock. By default, this uses a plain
pthread mutex, win32 critical section, or a spin-lock if if
available for the platform and not disabled by setting
USE_SPIN_LOCKS=0. However, if USE_RECURSIVE_LOCKS is defined,
suggesting that the intent is that one first sets USE_LOCKS to 1 to enable thread safety, and then further adjusts, e.g. with USE_SPIN_LOCKS=0, to customize what type of locking to use.
Originally Emscripten configured both of these flags on the dlmalloc.c side, so I opted to preserve the same. Passing -DUSE_SPIN_LOCKS without defining -DUSE_LOCKS would not follow the wording in the docs, and there is no harm in explicitly passing both?
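As a note on the defaulting behavior mentioned above, its rough shape is the following (a simplified sketch; dlmalloc.c's actual preprocessor logic also folds in USE_RECURSIVE_LOCKS):

/* Simplified sketch: explicitly requesting spinlocks implies USE_LOCKS.
   See system/lib/dlmalloc.c for the exact logic. */
#if !defined(USE_LOCKS) && defined(USE_SPIN_LOCKS) && USE_SPIN_LOCKS != 0
#define USE_LOCKS 1
#endif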
"If I used USE_SPIN_LOCKS always, the obvious review comment would be 'can you benchmark that it does not regress performance for any user?', which I really don't want to do. I wanted to keep the existing behavior for existing pthreads users to be sure that there is no danger of regressing."
I think this is a worthwhile thing to do, if only to avoid the extra complexity of adding an additional library variant. It seems highly beneficial to use the same wasm code both in Wasm Workers mode and without, no? And then we can avoid this new -ww variant of the library.
If spinlocks are good enough for Wasm Workers, why wouldn't they be good enough for the normal pthread use case?
…ime rounding". It does look like the optimizer sweeps the runtime rounding away, so there is no need to optimize it manually. This reverts commit 837c731.
@@ -1271,6 +1271,8 @@ def __init__(self, **kwargs):
self.memvalidate = kwargs.pop('memvalidate')
self.verbose = kwargs.pop('verbose')
self.is_debug = kwargs.pop('is_debug') or self.memvalidate or self.verbose
self.thread_safe = kwargs.pop('thread_safe')
This library already inherits from MTLibrary and has the is_mt attribute... is this not the same thing?
Fix malloc() to work with -sWASM_WORKERS. Currently only emmalloc will work with Wasm Workers, but dlmalloc does not yet work. Fix emscripten_semaphore_async_acquire(). Add a test.
EDIT: Actually, it is simpler to add both dlmalloc and emmalloc support in the same PR instead of deferring the dlmalloc part to a later one, so I updated this PR to do both.