WasmFS: external wasmMemory backend #20017
base: main
Very interesting, thank you for posting this! The spotlights section has some really great features that I definitely agree we want to support. After thinking on this, I think the key benefit is to use an entirely separate Memory for storage. Another way to achieve that could be by compiling a separate program which would have its own Memory. That is, instead of
Then we could implement various file storage programs, including one with 4GB support, and by simply recompiling to wasm64 we could allow even more than 4GB. That is one benefit of compiling the file storage to wasm. Another is that I think writing such a backend in C++ (or another language) may be more robust than JS: JS is great for small backends that interface with Web APIs, but (as this PR shows) there are many options in the space of file storage, and as code gets larger it is usually more efficient to write it in wasm. I don't have specific ideas for the API yet, but perhaps it could build on the existing JSImpl backend approach somehow. Thoughts?
Implementing a better FS schema in a separate wasm program is definitely a good idea. I reused MemoryBackend for the directory part because writing a full FS requires a lot of effort, and my memory pool implementation is a simple one as well. If we can run a wasm FS backend that directly manipulates the memory pool, we could support much more advanced features. As for the implementation details, I have several concerns based on our requirements:
Any update?
We recently developed this new WasmFS backend for our own needs, and we are happy to contribute it to the WebAssembly community. Any suggestions are welcome.
We have been running this backend in production in our web app for 3 months, and it works well. This PR is a production-ready version, and tests will be completed in a few weeks. For this PR, we would like to hear everyone's opinion on whether this feature could be merged and how to improve it further.
The doc is attached below.
ExtWasmMemFS: A 'WebAssembly.Memory' Based, Fully-Multithreaded Synchronizing WasmFS Backend
This doc describes a file storage system implemented on WebAssembly.Memory. It lets any thread read and write files directly and synchronously.
Based on Emscripten 3.1.28 and later.
Spotlights
Reads and writes do not need to postMessage to other threads, so performance is not affected by how busy the main thread is.
WebAssembly.Memory.grow() has been supported for as long as WebAssembly itself, unlike SharedArrayBuffer.prototype.grow(), which was added only recently. This compatibility makes the backend easy to adopt in production environments.
Implementation
The core idea is to implement a memory pool on top of WebAssembly.Memory.
Data Layout
All ExtWasmMemFS data is stored in the following singleton object, shared with all threads:
Our data is stored in three buffers: dataFiles, control, and index.
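The singleton object does not survive in this text. The sketch below shows one way it could look. Only the three buffer names come from this document. The constructor name, the option names, the page limits, and the uint32 element type for index are all assumptions.

```javascript
const WASM_PAGE = 65536; // bytes per WebAssembly page

// Hypothetical constructor for the shared singleton. Only the three buffer
// names come from the design; sizes and element types are assumptions.
function createExtWasmMemFS({ initialPages = 1, maximumPages = 1024, maxFiles = 1024 } = {}) {
  return {
    // File contents: a shared, growable WebAssembly.Memory.
    // 65536 pages (4 GiB) is the wasm32 ceiling; a smaller maximum is used here.
    dataFiles: new WebAssembly.Memory({ initial: initialPages, maximum: maximumPages, shared: true }),
    // Five int32 control words (e.g. lock state); their meaning is not specified here.
    control: new SharedArrayBuffer(5 * Int32Array.BYTES_PER_ELEMENT),
    // Two uint32 words per file: [file_block header pointer, file size].
    index: new SharedArrayBuffer(maxFiles * 2 * Uint32Array.BYTES_PER_ELEMENT),
  };
}
```

Every field is either a shared WebAssembly.Memory or a SharedArrayBuffer. Posting the object to a worker with postMessage therefore shares the underlying memory rather than copying it. In browsers, this requires a cross-origin-isolated page.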
extWasmMemFS.dataFiles
The dataFiles buffer is the main buffer for file contents, so it uses WebAssembly.Memory, which can be grown.
The dataFiles buffer is laid out as follows:
To keep reads and writes contiguous for copy performance, each file is stored in a single file_block.
All empty_blocks form a doubly linked list. The first 8 bytes of dataFiles are left unused, so ptr == 0 never refers to a valid block and can serve as a null pointer.
extWasmMemFS.index
index stores the file_block header pointer and the file size for each file. The layout is as follows:
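The index layout itself is missing here. Below is a minimal sketch of accessing one entry. It assumes, hypothetically, that each file id maps to two consecutive uint32 words, [ptr, size], and that callers already hold the read-write lock. With uint32 fields, pointers and sizes stay below 4 GiB, which matches the wasm32 limit.

```javascript
const ENTRY_WORDS = 2; // hypothetical: [file_block header pointer, file size]

// Read the index entry for a file; ptr === 0 means "no block" (see the reserved 8 bytes).
function getEntry(index, fileId) {
  const i = fileId * ENTRY_WORDS;
  return { ptr: index[i], size: index[i + 1] };
}

// Update the index entry for a file; the caller is assumed to hold the write lock.
function setEntry(index, fileId, ptr, size) {
  const i = fileId * ENTRY_WORDS;
  index[i] = ptr;
  index[i + 1] = size;
}
```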
extWasmMemFS.control
Stores five int32 values.
Function Details
Read-write lock
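A read-write lock can be built from a single int32 word of control using Atomics. The sketch below is minimal and makes several assumptions. The word index (LOCK = 0) is hypothetical. The encoding is also hypothetical: 0 = free, n > 0 = n readers, -1 = one writer. The design's actual encoding is not shown.

```javascript
const LOCK = 0; // hypothetical: index of the lock word within control

// Lock word encoding (assumed): 0 = free, n > 0 = n readers, -1 = one writer.
function readLock(ctl) {
  for (;;) {
    const s = Atomics.load(ctl, LOCK);
    if (s >= 0) {
      // Try to register one more reader; retry if the word changed meanwhile.
      if (Atomics.compareExchange(ctl, LOCK, s, s + 1) === s) return;
    } else {
      Atomics.wait(ctl, LOCK, s); // a writer holds the lock: sleep until the word changes
    }
  }
}

function readUnlock(ctl) {
  // The last reader out wakes any waiting writer.
  if (Atomics.sub(ctl, LOCK, 1) === 1) Atomics.notify(ctl, LOCK);
}

function writeLock(ctl) {
  for (;;) {
    const s = Atomics.compareExchange(ctl, LOCK, 0, -1);
    if (s === 0) return;
    Atomics.wait(ctl, LOCK, s); // readers or a writer are active: sleep until the word changes
  }
}

function writeUnlock(ctl) {
  Atomics.store(ctl, LOCK, 0);
  Atomics.notify(ctl, LOCK); // wake all waiting readers and writers
}
```

Wakeups cannot be lost. If the word changes between the compare-exchange and Atomics.wait, the wait returns "not-equal" immediately and the loop retries.

This simple version can starve writers under a steady stream of readers. Also, browsers do not allow Atomics.wait on the main thread, so the main thread would have to spin instead.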
Handle provided to the C++ section
Allocating a file block of a given size in dataFiles
Deleting file blocks from dataFiles
Defrag: expanding blocks backwards
Defrag: trying to put blocks into unalloc
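The sketch below covers allocation, deletion, and the "expand blocks backwards" defrag step. It is an illustration, not the PR's code, and rests on these assumptions:

- A hypothetical 16-byte header of four uint32 words: [size, isFile, prev, next].
- First-fit search over the doubly linked empty_block list.
- Splitting a block only when the remainder can hold a header.

"Expand backwards" is read here as an empty_block absorbing the empty blocks directly after it. That reading matches the one-block lookahead used for in-place expansion. The "put into unalloc" step is not covered.

```javascript
// Hypothetical block header, four uint32 words: [size, isFile, prev, next].
// `size` includes the header; prev/next are only meaningful for empty_blocks.
const HDR = 16;
const RESERVED = 8;          // bytes 0..7 unused, so pointer 0 means "null"
const MIN_SPLIT = HDR + 8;   // smallest remainder worth splitting off
const align8 = (n) => Math.ceil(n / 8) * 8;

const w = (p, k) => (p >>> 2) + k;          // word index of header field k
const size = (P, p) => P.u32[w(p, 0)];
const isFile = (P, p) => P.u32[w(p, 1)];
const next = (P, p) => P.u32[w(p, 3)];
function setHdr(P, p, sz, file) { P.u32[w(p, 0)] = sz; P.u32[w(p, 1)] = file; }

function pushFree(P, p) {                   // insert at the head of the empty list
  P.u32[w(p, 2)] = 0;
  P.u32[w(p, 3)] = P.head;
  if (P.head) P.u32[w(P.head, 2)] = p;
  P.head = p;
}
function unlinkFree(P, p) {                 // remove from the empty list
  const prev = P.u32[w(p, 2)], nxt = P.u32[w(p, 3)];
  if (prev) P.u32[w(prev, 3)] = nxt; else P.head = nxt;
  if (nxt) P.u32[w(nxt, 2)] = prev;
}

// A pool of `bytes` bytes (a multiple of 8): one empty block after the reserved bytes.
function makePool(bytes) {
  const P = { u32: new Uint32Array(bytes / 4), head: 0, end: bytes };
  setHdr(P, RESERVED, bytes - RESERVED, 0);
  pushFree(P, RESERVED);
  return P;
}

// First-fit allocation of a file_block with room for n bytes of content.
function alloc(P, n) {
  const need = align8(HDR + n);
  for (let p = P.head; p; p = next(P, p)) {
    const sz = size(P, p);
    if (sz < need) continue;
    unlinkFree(P, p);
    if (sz - need >= MIN_SPLIT) {           // split; the tail stays empty
      setHdr(P, p + need, sz - need, 0);
      pushFree(P, p + need);
      setHdr(P, p, need, 1);
    } else {
      setHdr(P, p, sz, 1);                  // take the whole block
    }
    return p;
  }
  return 0;                                 // no fit: caller would grow dataFiles
}

// Deleting a file turns its file_block back into an empty_block.
function freeBlock(P, p) {
  setHdr(P, p, size(P, p), 0);
  pushFree(P, p);
}

// Defrag: empty block p absorbs every empty block that directly follows it.
function coalesce(P, p) {
  for (let q = p + size(P, p); q < P.end && !isFile(P, q); q = p + size(P, p)) {
    unlinkFree(P, q);
    setHdr(P, p, size(P, p) + size(P, q), 0);
  }
}
```

Once coalescing has run, no two empty blocks are adjacent. A file_block therefore never needs to look past its immediate neighbour.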
Try to expand a file_block in place:
1. Because defragmentation already expands empty_blocks backwards, looking ahead one empty_block is enough.
2. If the empty_block is too large, we split it into two blocks, update the doubly linked list, and then merge the front one into the file_block.
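The two steps above can be sketched as follows. The sketch uses a hypothetical 16-byte header of four uint32 words, [size, isFile, prev, next], where size includes the header. newSize is the requested total block size, rounded up to 8 bytes. When the single lookahead block is missing or too small, the function returns false. The caller would then fall back to allocating a new block and copying.

```javascript
// Hypothetical header: four uint32 words [size, isFile, prev, next].
const HDR = 16;
const MIN_SPLIT = HDR + 8;
const w = (p, k) => (p >>> 2) + k;
const size = (P, p) => P.u32[w(p, 0)];
const isFile = (P, p) => P.u32[w(p, 1)];
function setHdr(P, p, sz, file) { P.u32[w(p, 0)] = sz; P.u32[w(p, 1)] = file; }
function pushFree(P, p) {
  P.u32[w(p, 2)] = 0;
  P.u32[w(p, 3)] = P.head;
  if (P.head) P.u32[w(P.head, 2)] = p;
  P.head = p;
}
function unlinkFree(P, p) {
  const prev = P.u32[w(p, 2)], nxt = P.u32[w(p, 3)];
  if (prev) P.u32[w(prev, 3)] = nxt; else P.head = nxt;
  if (nxt) P.u32[w(nxt, 2)] = prev;
}

// Grow file_block p to newSize bytes (header included) without moving it.
function tryExpandInPlace(P, p, newSize) {
  newSize = Math.ceil(newSize / 8) * 8;
  const cur = size(P, p);
  if (newSize <= cur) return true;
  const q = p + cur;                       // step 1: the only neighbour we inspect
  if (q >= P.end || isFile(P, q)) return false;
  const avail = cur + size(P, q);
  if (avail < newSize) return false;       // caller falls back to alloc + copy
  unlinkFree(P, q);
  if (avail - newSize >= MIN_SPLIT) {      // step 2: split, keep the back part empty
    setHdr(P, p + newSize, avail - newSize, 0);
    pushFree(P, p + newSize);
    setHdr(P, p, newSize, 1);              // merge the front part into the file_block
  } else {
    setHdr(P, p, avail, 1);                // remainder too small: absorb it all
  }
  return true;
}
```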
File writing (write API)
File reading (read API)
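The read and write paths can be sketched as below. The sketch makes these assumptions:

- The caller already holds the appropriate lock.
- index stores [ptr, size] pairs as uint32.
- Word 0 of a hypothetical 16-byte file_block header holds the block size.
- views is a hypothetical pair of Uint8Array and Uint32Array views over dataFiles.

When a write does not fit, this sketch simply returns -1. The real backend would instead expand the block in place or reallocate and copy.

```javascript
const HDR = 16; // hypothetical file_block header size; header word 0 = block size

// Copy up to dst.length bytes of file `id`, starting at `offset`, into dst.
function readFile(views, index, id, offset, dst) {
  const ptr = index[2 * id], fileSize = index[2 * id + 1];
  if (ptr === 0 || offset >= fileSize) return 0;
  const n = Math.min(dst.length, fileSize - offset);
  const start = ptr + HDR + offset;
  dst.set(views.u8.subarray(start, start + n)); // one contiguous copy
  return n;
}

// Copy src into file `id` at `offset`; returns bytes written, or -1 if the block is too small.
function writeFile(views, index, id, offset, src) {
  const ptr = index[2 * id];
  if (ptr === 0) return -1;                     // no block allocated yet
  const capacity = views.u32[ptr >>> 2] - HDR;
  const end = offset + src.length;
  if (end > capacity) return -1;                // real backend: expand in place or reallocate
  views.u8.set(src, ptr + HDR + offset);
  if (end > index[2 * id + 1]) index[2 * id + 1] = end; // extend the file size
  return src.length;
}
```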
In-thread TypedArray object cache and cache invalidation
Reading and writing a SharedArrayBuffer requires TypedArray objects, and creating them frequently hurts performance. Each thread therefore maintains its own set of TypedArray objects for reading and writing. In ExtWasmMemFS, we construct the following extWasmMemFS_local object in each web worker:
However, the cache becomes stale when WebAssembly.Memory.prototype.grow() is called, and the corresponding TypedArrays must then be recreated:
Because the architecture is fully multithreaded, other threads receive no message when the buffer changes. Each thread must check for changes and update its own views at well-defined points.
A thread must hold the write lock of the global read-write lock to grow the buffer. It is therefore sufficient for every other thread to recreate its TypedArrays each time it acquires the write lock.
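The cache and the refresh rule can be sketched as below. The design uses extWasmMemFS_local for the per-thread cache. The function names, field names, lock word index, and lock encoding here are all hypothetical (0 = free, -1 = writer, positive = reader count).

The check relies on one engine behaviour: after grow(), memory.buffer returns a new ArrayBuffer or SharedArrayBuffer object. Comparing object identity is therefore enough to detect that the views are stale.

```javascript
const LOCK = 0; // hypothetical index of the lock word in control

// Per-thread cache of views over dataFiles (the role of extWasmMemFS_local).
function makeLocal(memory) {
  const local = { memory, buffer: null, u8: null, u32: null };
  refreshViews(local);
  return local;
}

// Recreate the TypedArrays only if dataFiles was grown since they were built.
function refreshViews(local) {
  const buf = local.memory.buffer;
  if (buf === local.buffer) return false; // views still current
  local.buffer = buf;
  local.u8 = new Uint8Array(buf);
  local.u32 = new Uint32Array(buf);
  return true;
}

// grow() only happens under the write lock, so re-checking right after acquiring it suffices.
function withWriteLock(control, local, fn) {
  for (let s; (s = Atomics.compareExchange(control, LOCK, 0, -1)) !== 0; ) {
    Atomics.wait(control, LOCK, s); // sleep until the lock word changes
  }
  try {
    refreshViews(local); // pick up any grow() done by another thread
    return fn(local);
  } finally {
    Atomics.store(control, LOCK, 0);
    Atomics.notify(control, LOCK);
  }
}
```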