
allow writing to an existing Uint8Array, and add lastChunkHandling option #33

Merged · 2 commits merged into main on Jan 7, 2024

Conversation

@bakkot (Collaborator) commented Dec 13, 2023

Fixes #21: this adds let { read, written } = Uint8Array.fromBase64Into(string, target, options) and let { read, written } = Uint8Array.fromHexInto(string, target, options) methods, which allow writing to an existing Uint8Array, matching TextEncoder's encodeInto().
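A minimal usage sketch of the shape described above (the base64 input and buffer size are just illustrations):

```js
const target = new Uint8Array(8);
// 'aGVsbG8=' is the base64 encoding of "hello".
const { read, written } = Uint8Array.fromBase64Into('aGVsbG8=', target);
// read: characters consumed from the input (8 here, including padding)
// written: bytes produced (5 here)
console.log(target.subarray(0, written)); // the bytes of "hello"
```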

These methods also add an outputOffset option in the options bag, which lets you specify an offset to start writing at so you can skip the subarray call. TextEncoder doesn't have this, but apparently it's important for V8. (Edit: moved this part into #34.)
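Illustratively, the option compares to the subarray approach like this (treat the option as hypothetical here, since it now lives in #34):

```js
const input = 'aGVsbG8=';
const target = new Uint8Array(32);

// Hypothetical (moved to #34): decode starting at byte 16 of `target`
// without allocating an intermediate view.
Uint8Array.fromBase64Into(input, target, { outputOffset: 16 });

// The equivalent without the option, paying for a subarray allocation:
Uint8Array.fromBase64Into(input, target.subarray(16));
```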

Also fixes #13: this adds a lastChunkHandling: 'stop-before-partial' option to the base64 decoders, which allows an efficient streaming decoder to be implemented in userland (efficient meaning no extra passes over the input string in userland). See the updated stream.mjs. The other values for lastChunkHandling are 'loose' and 'strict'; 'strict' enforces padding and zero-ness of the overflow bits. 'loose' is the default and matches the behavior of atob.
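Roughly, the userland streaming pattern this enables looks like the following simplified sketch of the approach in stream.mjs (the function name and chunk plumbing are illustrative):

```js
let extra = ''; // leftover characters from a partial trailing quantum

// Decode one incoming chunk into `target` (assumed large enough here).
function decodeBase64Chunk(chunk, target) {
  const input = extra + chunk;
  const { read, written } = Uint8Array.fromBase64Into(input, target, {
    lastChunkHandling: 'stop-before-partial',
  });
  // Everything past `read` is an incomplete 4-character unit; carry it
  // into the next call instead of re-scanning the whole input.
  extra = input.slice(read);
  return written;
}
```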

As a consequence of the refactor to use lastChunkHandling, there's no longer an option to forbid whitespace in the base64 decoders. This is easy to do in userland if you really want to, though there is rarely occasion for it. Whitespace is always permitted in the base64 decoder, and never in the hex decoder.
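For instance, the userland whitespace check is a one-liner before decoding (a sketch, using the proposal's one-shot decoder):

```js
const input = 'aGVsbG8=';
// Reject whitespace up front, then decode normally.
if (/\s/.test(input)) {
  throw new SyntaxError('whitespace is not allowed in this input');
}
const bytes = Uint8Array.fromBase64(input);
```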

When this lands, the proposal should be ready for stage 3 (or rather 2.7).

cc @phoddie @syg

@annevk (Member) commented Dec 13, 2023

If subarray is slow, shouldn't that be optimized instead? Letting each API pay the price seems like the wrong tradeoff.

@domenic objected to this before and I kinda agree with him that it's bad.

Especially for a v1 we shouldn't do this. Once v1 has been out for a while and perf metrics are provided, we could reconsider, but note that nobody ever provided such metrics for encodeInto().

@syg commented Dec 13, 2023

> If subarray is slow, shouldn't that be optimized instead? Letting each API pay the price seems like the wrong tradeoff.

Can you elaborate on what "optimizing subarray" means? I don't think this is as widespread as "each API"; it's more like "each API that wishes to be as high-performance as possible", and those are much rarer.

V8's opinion here is that we have numerous internal complaints about TextDecoder's poor performance that point to static APIs taking explicit buffers and offsets as an incremental step, and the encoding version ought to be symmetric.

@phoddie commented Dec 13, 2023

FWIW – I agree with the comments of both @annevk and @syg. Passing the subview bounds as API arguments increases complexity, all the more so here where there are both an input and an output buffer. But performance also matters.

An alternative approach could keep the API minimal and still support high performance. The performance issue arises largely, as I understand it (and observe in Moddable's work on embedded devices), from having to create a new view to access different parts of the buffer. That's necessary because the byteOffset and byteLength properties of a view are fixed at construction. If they were mutable, a single view instance could be used to access different parts of the buffer.

For a very long time, there was no precedent for views changing after construction. @syg's proposal for Resizable ArrayBuffer changed that by allowing TypedArrays to opt into having their byteLength property track the size of the underlying resizable buffer. We could build on that precedent to allow a view to opt into having mutable byteOffset and byteLength properties.

Such an approach would require careful consideration (compatibility, memory safety, etc.). The benefits seem worthwhile: it would keep the Base64 APIs simple while allowing them to be performant, and it would also benefit other situations, such as the TextDecoder performance issue Shu cites.
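To make the cost concrete: today each region of a buffer needs its own view, because byteOffset and byteLength are fixed at construction (an illustrative sketch; process is a placeholder):

```js
const buffer = new ArrayBuffer(256);
const process = (view) => { /* placeholder consumer */ };

for (let offset = 0; offset < buffer.byteLength; offset += 64) {
  // One view allocation per region; a mutable-view design would let a
  // single instance be re-pointed at each region instead.
  process(new Uint8Array(buffer, offset, 64));
}
```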

@annevk (Member) commented Dec 14, 2023

> V8's opinion here is that we have numerous internal complaints about TextDecoder's poor performance that point to static APIs taking explicit buffers and offsets as an incremental step, and the encoding version ought to be symmetric.

Given that these complaints haven't surfaced until now (and performance metrics still haven't surfaced to my knowledge) while encodeInto() has been around for a long time, it's not clear that should drive decision-making here.

I'm more than willing to accept that we have to change certain APIs to also take an offset, but that should be done in a much more deliberate manner with data to back it up.

We agreed on the fundamental design here in 2015/2018.

There's some further background in whatwg/encoding#172.

@syg commented Dec 14, 2023

There are two threads here I'd like to tease apart:

  1. Whether the TextEncoder / TextDecoder APIs, which have performance issues in some Google workloads, should be changed or amended.
  2. Whether the design of this proposal needs to be analogous to TextEncoder / TextDecoder.

We should strive for (2), but it is not a hard requirement; I'm sensing that @annevk (and perhaps @domenic) weigh it more heavily than TC39 currently does?

> Given that these complaints haven't surfaced until now (and performance metrics still haven't surfaced to my knowledge) while encodeInto() has been around for a long time, it's not clear that should drive decision-making here.

Are you saying our performance desires shouldn't drive the design of this TC39 proposal because we've seen performance issues in TextDecoder and TextEncoder, and that keeping a close analogy with the existing API trumps that concern until the performance issues in the analogous API can be validated? That is, does this boil down to a judgment call about the DX footgun of a (small, IMO) divergence here that can be healed with future proposals?

I hear you on the "first I've heard of this" part; I'll try to compile some numbers.

@annevk (Member) commented Dec 14, 2023

You're saying you want a design that's different from TextDecoder because of concerns with that API that haven't been made public since the API was added. I'm saying we need strong justification for divergence of established patterns, especially when past proposals for equivalent API shapes (taking an offset) were rejected as documented in the referenced threads.

And to be clear, performance is also important for encodeInto(). It's important for all bring-your-own buffer APIs.

@syg commented Dec 14, 2023

> You're saying you want a design that's different from TextDecoder because of concerns with that API that haven't been made public since the API was added. I'm saying we need strong justification for divergence of established patterns, especially when past proposals for equivalent API shapes (taking an offset) were rejected as documented in the referenced threads.

Thanks, that's clearer. My read of the past rejection is @domenic's (and yours?) preference that we not design around performance limitations of object allocation, against which I am now raising the contradictory point from V8 that we do care about it. It's fair to ask why we reached this conclusion; I'll work to provide something there.

Still, I'd like to push back against "we need strong justification for divergence of established patterns" when that pattern is outside the purview of the core JS stdlib and the proposed divergence doesn't preclude re-convergence later.

> It's important for all bring-your-own buffer APIs.

Yep, that's the goal here. I think we agree that high-performance BYOB APIs are important. V8 in particular wants new APIs to have BYOB overloads that make the "right" (which is what we're debating) tradeoffs between DX and performance.

@annevk (Member) commented Dec 14, 2023

I don't see how it matters where the pattern has been established. No need to ship the org chart.

@ljharb (Member) commented Dec 14, 2023

It's not about the org, it's about what constraints and use cases the API was designed with and for.

@annevk (Member) commented Dec 15, 2023

As I explained above, those are the same.

@jridgewell (Member)

I think @bakkot is reacting to me saying subarray is slow. My measurements are for a source map VLQ library from a few years ago; let me whip up a new comparison that shows the relative performance of my current single-subarray approach vs the slower recreating-subarray approach.

@bakkot (Collaborator, Author) commented Dec 15, 2023

It's apparently also expensive in JSC, though I haven't benchmarked that either.

@jridgewell (Member) commented Dec 15, 2023

https://jsbench.github.io/#94c46cc814cab852a54aef37c56d59c4 shows a marginal improvement in Chrome and Safari (no change in Firefox) when using a single subarray rather than creating multiple. I remember this being a much larger difference. But even a relatively small number of subarrays (2 vs 36 in this example) is enough to impact the performance even when doing relatively expensive calculations (encoding source mappings into VLQ mappings).
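For readers, the two patterns being compared look roughly like this (a simplified sketch, not the code from the linked benchmark; consume is a placeholder):

```js
const bytes = new Uint8Array(1024);
const consume = (view, start = 0, end = view.length) => { /* placeholder */ };

// Recreating a subarray per segment:
for (let i = 0; i < 36; i++) {
  consume(bytes.subarray(i * 16, (i + 1) * 16));
}

// A single subarray created up front, reused with explicit indexes:
const view = bytes.subarray(0);
for (let i = 0; i < 36; i++) {
  consume(view, i * 16, (i + 1) * 16);
}
```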

@dead-claudia

> https://jsbench.github.io/#94c46cc814cab852a54aef37c56d59c4 shows a marginal improvement in Chrome and Safari (no change in Firefox) when using a single subarray rather than creating multiple. I remember this being a much larger difference. But even a relatively small number of subarrays (2 vs 36 in this example) is enough to impact the performance even when doing relatively expensive calculations (encoding source mappings into VLQ mappings).

@jridgewell I thought Base64 VLQ parsing and generation shouldn't be doing full Base64 <-> binary byte sequence conversion, just alphabet <-> 6-bit segment. It's not only much faster to do it directly letter by letter, but also easier.

Could you explain how full Base64 byte-sequence encoding and decoding come into play here, aside from decoding and encoding the content of inline data:application/json;charset=utf-8;base64,... source map URLs? I'm struggling with the premise here.
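For context, the alphabet-to-6-bit mapping described above is a per-character index lookup, with no byte-sequence conversion involved (a sketch):

```js
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
// One base64 character -> one 6-bit value.
const sextet = (ch) => ALPHABET.indexOf(ch);
```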

@jridgewell (Member)

Huh? Discussion in this PR thread is about whether we should lean into typed array subarrays for the API, or use offset indexes into an existing typed array. The source map VLQ decoding is totally separate from this proposal; I'm using a library that I maintain to show a real performance impact caused by calling typedArray.subarray(…) repeatedly.

IMHO, we should be using offset indexes. And if someone wants to use subarrays because it makes their code cleaner, then it'll transparently work, because the offset indexes can default to 0.

@dead-claudia

@jridgewell Oh, okay. Sorry about that then. Feel free to ignore me.

@bakkot (Collaborator, Author) commented Jan 7, 2024

I am basically neutral on outputOffset. Since it's controversial, I'm going to split it out into its own PR (#34) and land the remaining, less controversial parts as-is.

@bakkot merged commit 82b8b6d into main on Jan 7, 2024 (1 check passed) and deleted the byob branch.
Successfully merging this pull request may close these issues: encoding into an existing buffer (#21); Streaming / multi-shot (#13).