Fix malloc() to work with -sWASM_WORKERS #16703
base: main
Conversation
…l work with Wasm Workers, but dlmalloc does not yet work. Fix emscripten_semaphore_async_acquire(). Add a test.
…lloc-ww-debug-noerrno' but try to use 'libemmalloc-debug-ww-noerrno'
Force-pushed from da7571c to d170d1a.
…ince they need to be configured differently in Wasm Workers vs pthreads build modes.
Force-pushed from 44dcb4d to 778842c.
#define USE_LOCKS 1
#define USE_SPIN_LOCKS 0 // Ensure we use pthread_mutex_t.
#endif
I think I'd rather see this land as a separate PR, since it doesn't just affect WASM_WORKERS.
It's also rather involved to figure out what this change does. As far as I can tell, it means that USE_SPIN_LOCKS gets defined to 1 by default, which ends up also defining USE_LOCKS, and then USE_SPIN_LOCKS ends up using __sync_lock_test_and_set and __sync_lock_release for locking instead of pthread_mutex_t? Is that right?
Anyway, it seems involved enough for its own PR, I think.
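For reference, a minimal sketch (not dlmalloc's actual code) of the locking pattern those two builtins enable; dlmalloc's real spinlock adds retries and ownership handling on top of this:

/* Minimal spinlock sketch using the GCC/Clang builtins named above. */
static volatile int mlock = 0;

static void spin_acquire(volatile int *l) {
  /* __sync_lock_test_and_set atomically stores 1 and returns the previous
     value (an acquire barrier); a return of 0 means we took the lock. */
  while (__sync_lock_test_and_set(l, 1)) {
    /* busy-wait until the current holder releases */
  }
}

static void spin_release(volatile int *l) {
  /* __sync_lock_release atomically stores 0 (a release barrier). */
  __sync_lock_release(l);
}

/* Usage: spin_acquire(&mlock); ...critical section... spin_release(&mlock); */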
I don't think it is a productive use of time to artificially split this, fixing Wasm Workers to first work with emmalloc, and only then with dlmalloc.
This change just moves the definition of USE_LOCKS and USE_SPIN_LOCKS from dlmalloc.c into system_libs.py. I had to do that migration for the following reason:
Earlier in system_libs.py, I had decided to avoid a permutational explosion by not building all four combinations of {no threads, pthreads} x {no Wasm Workers, Wasm Workers} of the system libraries. The combination "pthreads but no Wasm Workers" was omitted as redundant, so we only build three variants of each multithreaded system library:
- "singlethreaded": no thread safety,
- "wasm workers": uses atomics and shared memory, so the user can link in Wasm Workers,
- "wasm workers + pthreads": uses atomics, shared memory and the pthread API; the user can link in Wasm Workers.
However, with dlmalloc it now looks like we do want a configuration that recognizes when "pthreads but not Wasm Workers" is being used, because that is the scenario in which we want dlmalloc to use pthread mutex locks. When building with pthreads + Wasm Workers we want spinlocks instead, because malloc could be called from Wasm Workers, which cannot use the pthread mutex API.
Maybe there will be more code in the future that requires such a special case, and then we will need to build all four variants - but so far this seems to be the only place, so I opted to do this configuration specifically for libmalloc only, to keep embuilder build times smaller.
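To make that mapping concrete, here is an illustrative sketch in C preprocessor form. Note that the PR itself passes these defines via cflags from system_libs.py; the __EMSCRIPTEN_PTHREADS__ and __EMSCRIPTEN_WASM_WORKERS__ predefines are used below only as stand-ins for the three build variants:

/* Illustrative variant -> locking mapping; in the PR the defines come from
   system_libs.py cflags, not from source-level detection like this. */
#if defined(__EMSCRIPTEN_WASM_WORKERS__)
/* Wasm Workers (with or without pthreads): malloc may be called from a
   worker that cannot use the pthread mutex API, so use spinlocks. */
#define USE_LOCKS 1
#define USE_SPIN_LOCKS 1
#elif defined(__EMSCRIPTEN_PTHREADS__)
/* pthreads without Wasm Workers: keep the existing pthread_mutex_t path. */
#define USE_LOCKS 1
#define USE_SPIN_LOCKS 0
#else
/* Single-threaded build: no locking needed. */
#define USE_LOCKS 0
#endif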
# Target dlmalloc multithreading aware implementation if using pthreads or Wasm Workers
cflags += [f'-DUSE_LOCKS={self.thread_safe}']
# When using Wasm Workers, target spinlocks. Otherwise target pthread mutexes when using pthreads.
cflags += [f'-DUSE_SPIN_LOCKS={self.use_spinlocks}']
Why not just always define USE_SPIN_LOCKS, for consistency? ... And it looks like USE_LOCKS gets defined to 1 when USE_SPIN_LOCKS is set.
If I used USE_SPIN_LOCKS always, the obvious review comment would be "can you benchmark that it does not regress performance for any user?", which I really don't want to do. I wanted to keep the existing behavior for existing pthreads users to be sure that there is no danger of regressing.
"it looks like USE_LOCKS gets defined to 1 when USE_SPIN_LOCKS is set."
I did not want to disrupt the structure of the flags from the way they were originally set. The documentation in dlmalloc (system/lib/dlmalloc.c, lines 157 to 162 at 32831ed) reads:
Thread-safety: NOT thread-safe unless USE_LOCKS defined non-zero
When USE_LOCKS is defined, each public call to malloc, free,
etc is surrounded with a lock. By default, this uses a plain
pthread mutex, win32 critical section, or a spin-lock if if
available for the platform and not disabled by setting
USE_SPIN_LOCKS=0. However, if USE_RECURSIVE_LOCKS is defined,
suggesting that the intent is that one first sets USE_LOCKS to 1 to enable thread safety, and then further adjusts, e.g. with USE_SPIN_LOCKS=0, to customize what type of locking to use.
Originally Emscripten configured both of these flags on the dlmalloc.c side, so I opted to preserve the same. Passing -DUSE_SPIN_LOCKS without defining -DUSE_LOCKS would not follow the wording in the docs, and there is no harm in explicitly passing both?
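As a note on the defaulting behavior mentioned above, its rough shape is the following (a simplified sketch; dlmalloc.c's actual preprocessor logic also folds in USE_RECURSIVE_LOCKS):

/* Simplified sketch: explicitly requesting spinlocks implies USE_LOCKS.
   See system/lib/dlmalloc.c for the exact logic. */
#if !defined(USE_LOCKS) && defined(USE_SPIN_LOCKS) && USE_SPIN_LOCKS != 0
#define USE_LOCKS 1
#endif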
"If I used USE_SPIN_LOCKS always, the obvious review comment would be 'can you benchmark that it does not regress performance for any user?', which I really don't want to do. I wanted to keep the existing behavior for existing pthreads users to be sure that there is no danger of regressing."
I think this is a worthwhile thing to do, if only to avoid the extra complexity of adding an additional library variant. It seems highly beneficial to use the same wasm code both in Wasm Workers mode and without, no? And then we can avoid this new -ww variant of the library.
If spinlocks are good enough for Wasm Workers, why wouldn't they be good enough for the normal pthread use case?
…ime rounding". It does look like the optimizer sweeps the runtime rounding away, so there is no need to optimize it manually. This reverts commit 837c731.
@@ -1271,6 +1271,8 @@ def __init__(self, **kwargs):
self.memvalidate = kwargs.pop('memvalidate')
self.verbose = kwargs.pop('verbose')
self.is_debug = kwargs.pop('is_debug') or self.memvalidate or self.verbose
self.thread_safe = kwargs.pop('thread_safe')
This library already inherits from MTLibrary and has the is_mt attribute... is this not the same thing?
Fix malloc() to work with -sWASM_WORKERS. Currently only emmalloc will work with Wasm Workers, but dlmalloc does not yet work. Fix emscripten_semaphore_async_acquire(). Add a test.
EDIT: Actually, it is simpler to add both dlmalloc and emmalloc support in the same PR instead of deferring the dlmalloc part to a later one, so I updated this PR to do both.