Description
What is the issue with the Encoding Standard?
I replied with an (apologies, admittedly poor) link in #172, but I have also documented the issues related to SharedArrayBuffer and resizable ArrayBuffer (not even shared) here: https://gist.github.com/WebReflection/3324b5ac79768c85efbf3b725d6a9d73
Background
We are using Dynamic Workers and Atomics to simulate a blocking operation from interpreted PLs (or even JavaScript itself). The current dance works as follows (a sketch is included after this list):
- the worker creates a SAB of 8 bytes (2 * Int32) and posts it, together with the proxied call details, to the main thread
- the worker waits synchronously at index 0 to be sure the notify happened (working around an old Firefox issue that won't be resolved), at which point the length of the binary-serialized result (capped at the max positive Int32) is known
- the worker posts a new SAB of that length + 4 bytes (the extra bytes are due to the Firefox notify issue when index 0 is assigned 0 and then notified)
- the main thread stores the previously binary-serialized outcome into that SAB via a view, then sets index 0 to 1 to notify it's ready
- the worker grabs the binary content via the "same" view, deserializes it, and moves forward
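To make the dance concrete, here is a minimal worker-side sketch of those steps; `details`, `deserialize` and the field names are placeholders for our proxied call details and binary deserializer, not actual APIs:

```js
// worker side: minimal sketch, not the actual implementation
const handshake = new SharedArrayBuffer(8);        // 2 * Int32
const signal = new Int32Array(handshake);

postMessage({ sab: handshake, details });          // proxied call details to the main thread

Atomics.wait(signal, 0, 0);                        // block until the main thread notifies
const length = signal[1];                          // byte length of the serialized result

// second SAB, +4 bytes so index 0 can carry the notify flag
// (Firefox issue when index 0 is set to 0 and then notified)
const result = new SharedArrayBuffer(length + 4);
postMessage({ sab: result });

Atomics.wait(new Int32Array(result, 0, 1), 0, 0);  // block until the result is written

const value = deserialize(new Uint8Array(result, 4, length));
```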
Issues with this approach
Mostly performance, but also memory: a tab with a worker that uses this strategy consumes a lot of RAM (at least twice as much) for every single operation until it completes. That is bad for mobile phones or less powerful devices, or for people with dozens of open tabs that use a similar strategy.
Ideally
I am working to refactor that dance to work in this ideal way (see the sketch after this list):
- the worker creates a growable SharedArrayBuffer (with the max positive Int32, or half of it, as the growability upper bound) that can also be reused, since there is no concurrency while the worker is synchronously waiting via Atomics
- the worker handles the proxied details and sends that SAB right away
- the main thread binary-serializes the result directly into that SAB and notifies when it's ready
- the worker binary-deserializes from the reused SAB and keeps going
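A minimal sketch of that flow, assuming growable SharedArrayBuffer support and the same placeholder names as above:

```js
// worker side: one growable SAB per worker, reused across calls (sketch only)
const MAX = 2 ** 30;                                 // roughly half of the max positive Int32
const shared = new SharedArrayBuffer(1024, { maxByteLength: MAX });
const header = new Int32Array(shared, 0, 2);         // [0] notify flag, [1] payload byte length

function blockingInvoke(details) {
  Atomics.store(header, 0, 0);                       // reset the flag for this round trip
  postMessage({ sab: shared, details });             // main grows the SAB if needed,
  Atomics.wait(header, 0, 0);                        // serializes into it, then notifies
  return deserialize(new Uint8Array(shared, 8, header[1]));
}
```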
This refactoring has the following obvious advantages:
- there is only one SAB per worker (and usually one worker per main thread), thanks to the fact that growable SharedArrayBuffer is widely usable these days
- there is a single `postMessage` dance
- the binary serializer never creates an unnecessary intermediate representation of whatever value needs to be stored into the buffer
- the binary deserializer never creates unnecessary intermediate buffer slices or anything else to retrieve the result of this round trip
- memory consumption is kept minimal, growth is predictable, and everything is faster
The current issue
I've spent far more time than I should have trying to find a performant way to avoid one-off creation of typed array views that bloat RAM and bother the GC, because `new TextEncoder().encodeInto(str, view)` does not work with SharedArrayBuffer and neither does `new TextDecoder().decode(view)`. Most importantly, even if I wanted to fall back to at least a resizable ArrayBuffer, the *The provided Uint8Array value must not be resizable* error comes up.
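For reference, a quick repro of the three failures (the exact error messages vary per engine; the quoted one is what Chrome reports):

```js
const sharedView = new Uint8Array(new SharedArrayBuffer(16));
const resizable = new ArrayBuffer(16, { maxByteLength: 64 });

try { new TextEncoder().encodeInto('hello', sharedView); }
catch (err) { console.error(err.message); }   // rejects SharedArrayBuffer-backed views

try { new TextDecoder().decode(sharedView); }
catch (err) { console.error(err.message); }   // same restriction on decode

try { new TextEncoder().encodeInto('hello', new Uint8Array(resizable)); }
catch (err) { console.error(err.message); }   // "The provided Uint8Array value must not be resizable"
```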
To solve these issues I ended up ignoring these APIs entirely, because they are not suitable for more complex scenarios, and I started wondering what the whole purpose of `encodeInto` and `decode` is when, exactly where they are needed, binary data that travels across realms or WASM exchanges cannot use memory that grows and shrinks on demand.
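As an illustration, here is a deliberately naive, hand-rolled encodeInto-like function (no error handling, no bounds checks) that writes UTF-8 straight into any view, shared or resizable alike:

```js
// minimal sketch of what "ignoring the API" looks like: plain UTF-8 written by hand
function utf8EncodeInto(str, view) {
  let i = 0;
  for (const ch of str) {
    const code = ch.codePointAt(0);
    if (code < 0x80) view[i++] = code;
    else if (code < 0x800) {
      view[i++] = 0xc0 | (code >> 6);
      view[i++] = 0x80 | (code & 0x3f);
    } else if (code < 0x10000) {
      view[i++] = 0xe0 | (code >> 12);
      view[i++] = 0x80 | ((code >> 6) & 0x3f);
      view[i++] = 0x80 | (code & 0x3f);
    } else {
      view[i++] = 0xf0 | (code >> 18);
      view[i++] = 0x80 | ((code >> 12) & 0x3f);
      view[i++] = 0x80 | ((code >> 6) & 0x3f);
      view[i++] = 0x80 | (code & 0x3f);
    }
  }
  return i;  // bytes written
}

// happily writes into a SharedArrayBuffer-backed view, unlike TextEncoder#encodeInto
const view = new Uint8Array(new SharedArrayBuffer(64));
console.log(utf8EncodeInto('héllo 🌍', view));
```

It works on SharedArrayBuffer and resizable ArrayBuffer views alike, just without the native fast path.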
Issue summary
- these APIs have an extremely narrow use case, which easily results in bloated RAM because new buffers must be created all over the place by design
- these APIs can easily be bypassed on purpose with `DataView` or direct view manipulation in JS code (as the sketch above illustrates), entirely defeating the original guards meant to help developers; in reality these limitations just get in the way when any developer who cares about RAM and performance wants to use the platform
- it's not clear why even a resizable, singly owned ArrayBuffer cannot be used for synchronous blocking operations such as `encodeInto`
- it's not clear how developers who are sure that a SAB has no concurrent access, and is therefore safe to use, can use these APIs
- it's clear (to me), and sad, that I should avoid these APIs instead of trusting the platform to do the right thing
I hope at least some of these concerns can be tackled or answered. Regardless, I feel that none of these issues is well documented out there, so I'll write a post with demos and benchmarks about how to work around TextEncoder & TextDecoder. I'm afraid that won't make developers happy, though, just slightly confused by the fact that anyone can work around these limitations by ignoring the native APIs and keep doing what they need to do.
Thanks for your patience in reading this, and thanks in advance for any possible action around these issues that could make these native APIs more appealing to use in the (hopefully) near future.