Streaming / multi-shot #13

Closed
bakkot opened this issue Aug 25, 2021 · 51 comments · Fixed by #33
Comments

@bakkot
Collaborator

bakkot commented Aug 25, 2021

@phoddie expressed a desire for this API to support streaming or some other multi-shot API. Let's use this issue to discuss what that might look like.

Note that there is also a WHATWG proposal for base64 encoding. Currently that proposal does not include streaming, with the comment

this could be later implemented in a BinaryEncoderStream / BinaryDecoderStream, or using the same synchronous API as text encoding, using a stream: true option on the encode / decode methods.

For prior art, see the stream option in the TextDecoder API, the encodeInto method in the TextEncoder API, and the TextEncoderStream and TextDecoderStream APIs (which are transform streams).
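As a concrete reminder of the TextDecoder precedent, here is a minimal runnable sketch (works in Node or a browser): the `stream: true` option makes the decoder buffer an incomplete multi-byte sequence between calls, and the final call without it flushes.

```javascript
// TextDecoder's `stream: true` carries an incomplete multi-byte sequence
// across calls; the final decode() call omits it to flush buffered bytes.
const decoder = new TextDecoder();
const bytes = new TextEncoder().encode('héllo'); // 'é' is two bytes in UTF-8
let out = decoder.decode(bytes.subarray(0, 2), { stream: true }); // splits 'é' mid-character
out += decoder.decode(bytes.subarray(2)); // final call, no `stream`: flush
console.log(out); // 'héllo'
```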

@bakkot
Collaborator Author

bakkot commented Sep 2, 2021

Some thoughts on one possible design:

the binary -> string case

JavaScript doesn't have anything along the lines of a StringBuilder. So there are two options here: either

  • you have an API into which users feed chunks of binary data and get chunks of no-padding-needed base64 strings out, with the API holding on to the last one or two bytes which would not fit in such a string (up until they finalize and get the last possibly-padded chunk), as with TextDecoder, or
  • they manage the chunks of strings themselves.

Personally I really dislike stateful APIs if they can possibly be avoided, so I'd like to explore the second possibility. The only thing which is really needed here is the ability to append to an existing base64 string, replacing padding (both the extra 0 bits and the =) with new data. That could easily be provided with an optional prefix argument to toBase64. Given that, streaming would look like

let prefix = '';
for (let bufferChunk of chunks()) {
  // assume bufferChunk is an ArrayBuffer
  let chunk = bufferChunk.toBase64({ padding: false, prefix });
  let paddingChars = chunk.length % 4;
  let head = chunk.substring(0, chunk.length - paddingChars);
  prefix = chunk.substring(chunk.length - paddingChars);
  commit(head);
}
if (prefix !== '') {
  commit(prefix.padEnd(4, '='));
}

That's... not terrible, but not great. Bad enough to warrant a stateful API? I'm not sure.
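For comparison, here is a runnable version of the same carry logic, with Node's Buffer standing in for the proposed toBase64 (which isn't implemented anywhere yet); the only base64 knowledge required is the 3-byte/4-character grouping.

```javascript
// Sketch: carry 0-2 leftover bytes so every emitted chunk is a clean
// multiple of 4 base64 characters; Node's Buffer stands in for the
// proposed toBase64 and its padding handling.
function* encodeChunks(bufferChunks) {
  let leftover = Buffer.alloc(0);
  for (const chunk of bufferChunks) {
    const data = Buffer.concat([leftover, chunk]);
    const usable = data.length - (data.length % 3); // 3 bytes -> 4 chars
    yield data.subarray(0, usable).toString('base64'); // never padded
    leftover = data.subarray(usable);
  }
  if (leftover.length > 0) yield leftover.toString('base64'); // padded tail
}

const out = [...encodeChunks([Buffer.from([1, 2]), Buffer.from([3, 4])])].join('');
console.log(out === Buffer.from([1, 2, 3, 4]).toString('base64')); // true
```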

the string -> binary case

As with the above, either

  • you have a stateful API which consumes input strings, converts as many characters as possible to binary, and holds on to any characters which don't fit cleanly into a multiple of 4 to use as a prefix of the next input string, or
  • users have to manage that on their own.

The "manage it on your own" option looks very similar to the code above, except that you'd split the excess characters off from the input rather than the output. Also if you were building up the whole binary data in memory, fromBase64 would need to write into an existing buffer at an offset. That is, basically,

let buffer = new ArrayBuffer(SIZE); // if you don't know the size up front you'd want to use a resizable buffer and call `.resize` before writing each chunk
let offset = 0;
let prefix = '';
for (let stringChunk of chunks()) {
  stringChunk = prefix + stringChunk;
  let paddingChars = stringChunk.length % 4;
  let initial = stringChunk.substring(0, stringChunk.length - paddingChars);
  ArrayBuffer.fromBase64(initial, { target: buffer, offset });
  offset += initial.length / 4 * 3;
  prefix = stringChunk.substring(stringChunk.length - paddingChars);
}
if (prefix !== '') {
  ArrayBuffer.fromBase64(prefix, { target: buffer, offset });
}

Again, not terrible, but not great. (It would be improved by having a slightly different shape for fromBase64; e.g., in the case of writing to an existing buffer, you might want a prototype method rather than a static.)
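The decode side of the same idea can be sketched runnably too, with Node's Buffer standing in for the proposed fromBase64 and assuming whitespace-free input:

```javascript
// Sketch: carry 0-3 leftover characters so every decode call sees a clean
// multiple of 4 base64 characters; Node's Buffer stands in for fromBase64.
function* decodeChunks(stringChunks) {
  let prefix = '';
  for (let s of stringChunks) {
    s = prefix + s;
    const usable = s.length - (s.length % 4); // 4 chars -> 3 bytes
    yield Buffer.from(s.slice(0, usable), 'base64');
    prefix = s.slice(usable);
  }
  if (prefix !== '') yield Buffer.from(prefix, 'base64');
}

const decoded = Buffer.concat([...decodeChunks(['AQI', 'DBA=='])]);
console.log(decoded.equals(Buffer.from([1, 2, 3, 4]))); // true
```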


Thoughts?

@jridgewell
Member

That's... not terrible, but not great. Bad enough to warrant a stateful API? I'm not sure.

I disagree, those are both pretty terrible. Because it requires knowing the 4/3rds internal details of base64, it's difficult to verify and test. Having a decoder instance that can handle these details seems much nicer.

But I do dislike the { stream: boolean } option provided by the TextDecoder API. I've had devs think "I'm streaming, so I should set stream to true" and not understand that they needed to flip it to false for the final call. Having a separate finalize method to end the stream would probably be easier to teach.

@bakkot
Collaborator Author

bakkot commented Sep 2, 2021

Because it requires knowing the 4/3rds internal details of base64

If that's the only concern, maybe we can come up with a design which mitigates it without holding state - something which returns an offset representing the number of bytes/characters of input consumed, e.g.

I'll try to think more about this, though I'm not totally sure we can come up with something reasonable which is also ergonomic as a one-shot API. And if there is no design which is ergonomic for both one-shot and streaming, I'd want to just not pursue streaming in this proposal. It sounded like @phoddie doesn't agree, though.


Having a second finalize method to end the stream would probably be easier to teach.

I agree in principle, but I'm extremely reluctant to introduce into the web platform yet another way of managing state while doing streaming encoding.

As I see it, the options are

  • don't support streaming
  • some stateless affordances for streaming, as above
  • some new stateful API
  • copy the TextDecoder API
  • streams

I'm fine with either of the first two. I would not want to pursue this proposal if it requires any of the last three - in particular, I don't want to pursue a new stateful API, and if we decide to follow the TextDecoder or streams precedents, I think that should be done through WHATWG.

@phoddie

phoddie commented Oct 12, 2021

Apologies for being slow to post here.

I agree with @jridgewell that the API should not require the caller to understand the details of Base64 encoding.

I also agree with @bakkot that it would be best to avoid stateful API here. A good solution should keep one-shot calls simple, and allow streaming in a straightforward way.

Ideally the toBase64 and fromBase64 cases will follow the same API style. The notes that follow focus on the string-to-binary case, but should be adaptable to the binary-to-string case.

As proposed by @bakkot, the one-shot call, with or without options, is straightforward. The options argument is provided, as I understand it, to support Base64 variants.

result = ArrayBuffer.fromBase64(string);
result = ArrayBuffer.fromBase64({...}, string);

Streaming can be signaled by an option, such as more. When streaming, the return value is a copy of the options object with the unused source string characters, if any, added in a remainder property and the converted binary data in buffer. The result is the state that is then passed directly to fromBase64 as-is to continue.

result = ArrayBuffer.fromBase64({..., more: true}, base64StrPart1);
result = ArrayBuffer.fromBase64(result, base64StrPart2);

The caller may choose to join together the result.buffer to form a single ArrayBuffer. In some cases, that may not be necessary as the partial data buffers could be written directly to a file, flash memory, etc.

There's no need for a flush or finalize as this API should always process as much of the input as possible (this is different from TextDecoder, because TextDecoder allows the option of processing invalid data).

A possibly useful side-effect of more is that, even with a single string, the caller can learn if there are trailing unparsed characters.

@bakkot
Collaborator Author

bakkot commented Jan 19, 2022

@phoddie I would be extremely reluctant to add a method whose return type depends on an option in an options bag. That's not something the language usually does, and it is, I think, pretty confusing for readers (it makes it very hard to follow code like let x = ArrayBuffer.fromBase64(str, opt);, since now you have to know precisely what opt holds in order to know what kind of thing x is).

We could, I suppose, add a separate fromPartialBase64 method which always returned a fresh options bag object. I'm not personally convinced this use case really warrants an entirely new method. (It seems like it would really be a better fit for a WHATWG stream transformer, at that point.) Still, either way, thoughts?


Separately:

In some cases, that may not be necessary as the partial data buffers could be written directly to a file, flash memory, etc.

I can see how you could do this with the string-to-binary case, but not how you could do this with the binary-to-string case: either the API always includes padding (including potentially zero bits in the last character), in which case you need to manually strip those off before committing a chunk (and think about how to handle the last character), or the API does not always include padding, in which case you need a TextDecoder-style "this is the final chunk" bit which indicates that the API should include padding bytes on this occasion.
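The padding asymmetry is easy to demonstrate with Node's Buffer (standing in for toBase64; the proposed method is hypothetical here): naively concatenating per-chunk encodings leaves padding in the middle of the output.

```javascript
// Per-chunk encoding with padding cannot simply be concatenated: the result
// contains mid-stream '=' characters and is not base64 of the combined bytes.
const naive = Buffer.from([1]).toString('base64') + Buffer.from([2]).toString('base64');
const correct = Buffer.from([1, 2]).toString('base64');
console.log(naive);   // 'AQ==Ag=='
console.log(correct); // 'AQI='
```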

@lucacasonato
Member

lucacasonato commented Jan 19, 2022

What if we move this TC39 proposal forward without streaming, and then I'll switch my related WHATWG proposal over to a streaming based API using WHATWG streams (BinaryEncoderStream and BinaryDecoderStream)?

The latter may be unsuitable for embedded use cases that lack an implementation of WHATWG streams (I assume this is what you want, @phoddie). For these use cases, the solution proposed by @bakkot above (manual streaming) seems very reasonable. A well-maintained and tested userland package could encapsulate this code.

For base64 and hex encoding, non-streaming use cases are far more common than streaming ones. Most often you are hex-encoding very small buffers (cryptography digests, for example), so streaming is really not a concern. If we say that the split here is 95% fixed size to 5% streaming, and out of those 5% streaming, maybe another 5% are on embedded, that is a rather small group (0.25% of all users). It seems reasonable to me that we should try to get this proposal shipped without streaming for the 95% group, then deal with the 4.75% group in WHATWG, and then leave the final 0.25% to userland modules for now.

@phoddie

phoddie commented Jan 19, 2022

@bakkot – thanks for following up.

I would be extremely reluctant to add a method whose return type depends an option in an options bag. That's not something the language usually does

I understand.

The notes above are a rough first attempt to define a single function that can be used both for simple one-shot and streaming conversions. Is your primary objection that the return value cannot be predicted by counting the arguments? If so, perhaps that is possible to solve.

I can see how you could do this with the string-to-binary case, but not how you could do this with the binary-to-string case: either the API always includes padding.... in which case you need a TextDecoder-style "this is the final chunk" bit which indicates that the API should include padding bytes on this occasion.

Correct. The binary-to-string implementation would not pad when more is true. The behavior would be similar to TextDecoder in requiring a flush when streaming, as you describe.

@bakkot
Collaborator Author

bakkot commented Jan 19, 2022

Is your primary objection that the return value cannot be predicted by counting the arguments?

I'd strongly prefer functions be entirely consistent in their return type, but it's worst when the return type is not predictable from the use site, yes. I don't know if I want to call that "primary"?

The binary-to-string implementation would not pad when more is true.

So what does, e.g., (new Uint8Array([1])).buffer.toBase64({ more: true }) return?

@phoddie

phoddie commented Jan 20, 2022

I'd strongly prefer functions be entirely consistent in their return type...

Agreed.

...but it's worst when the return type is not predictable from the use site, yes.

That's helpful, thanks.

Stating the obvious: always returning an object gives a consistent return type at the expense of making the simple case a little less simple.

The binary-to-string implementation would not pad when more is true.
So what does, e.g., (new Uint8Array([1])).buffer.toBase64({ more: true }) return?

Since 3 input bytes are required to generate unpadded output, the Base64 string output is empty and there is a remainder of one byte. Following the idea described above for fromBase64 the return value would be something like {value: "", remainder: [[ArrayBuffer of a single byte]]}.

@jridgewell
Member

jridgewell commented Jan 20, 2022

I don't see why a weird return object, where you have to ferry data from the last return value to the next call, is better than a stateful API.

let result = '';
let last;
for (let chunk of chunks()) {
  last = ArrayBuffer.fromBase64(chunk, last?.remainder);
  result += last.value;
}
// How do you even cap the stream?

Compared to a stateful API:

let result = '';
let decoder = new ArrayBufferStream('base64');
for (let chunk of chunks()) {
  result += decoder.decode(chunk);
}
if (!decoder.finish()) {
  throw new Error('incomplete stream detected')
}

With RAII-style programming, you could let that happen automatically.

@bakkot
Collaborator Author

bakkot commented Jan 20, 2022

@phoddie

Stating the obvious: always returning an object gives a consistent return type at the expense of making the simple case a little less simple.

That's true. In my experience the simple (one-shot) use case is by far the most common one, so this would be a quite high cost, to my eyes.


@jridgewell

I find that stateful APIs are usually harder to reason about and work with, all else being equal. (Just to give an example: when you're managing the state yourself it's trivial to serialize the current state to localStorage or hand it off to a Worker. There's no way to do that when the state lives inside an opaque Encoder object.)

But yes, it may be that we can't find a way to support multi-shot en/decoding in a way which is both stateless and simple. If we do end up so concluding, my opinion would be that a WHATWG transform stream is the right abstraction for a stateful API.

(That said, I still think it would make sense to support a prefix argument for toBase64, since that's the only thing you absolutely need to build a streaming API yourself without having to know anything about which characters represent which bits in base64.)

@mhofman
Member

mhofman commented Jan 30, 2023

To build a streaming helper on top of this API, you need the API to accept multiple buffers so the user doesn't have to concatenate them. That way there is no need to pass the previous remainder as an option; you simply pass it as a value:

// (assumes this code runs inside a generator function, hence the `yield`s)
const leftovers = [];
let leftoverLength = 0;
for (let chunk of chunks()) {
  const chunkLength = chunk.length;
  if (leftoverLength + chunkLength < 4) {
    leftovers.push(chunk);
    leftoverLength += chunkLength;
    continue;
  }
  const chunkEndIndex = chunkLength - (leftoverLength + chunkLength) % 4;
  leftovers.push(chunk.slice(0, chunkEndIndex));
  // The trick here is for base64 methods to accept an array of values
  // so that the caller doesn't have to stitch the leftovers together
  const decoded = ArrayBuffer.fromBase64(leftovers.splice(0));
  leftoverLength = chunkLength - chunkEndIndex;
  if (leftoverLength) {
    leftovers.push(chunk.slice(chunkEndIndex));
  }
  yield decoded;
}
if (leftoverLength) {
  yield ArrayBuffer.fromBase64(leftovers);
}

Same for the byte -> string direction, but with chunk.byteLength instead of chunk.length, 3 instead of 4, and of course toBase64 instead of fromBase64.

@bakkot
Collaborator Author

bakkot commented Mar 24, 2023

@mhofman Unfortunately many base64 strings include whitespace, so you can't just count the length of the string or trim off the last few characters.

@bakkot
Collaborator Author

bakkot commented Mar 24, 2023

Talked about this with @phoddie. I think a slight tweak to the design proposed above is something I could live with. Looking for feedback.

The design is, the API always returns { value, extra } pairs. You pass a more parameter indicating whether more data is expected (as is done with TextDecoder's stream option). In the case that the more parameter is omitted, the returned extra is guaranteed to be null; otherwise you can pass it in to subsequent calls and it will be prepended before decoding. This is similar to TextDecoder, but stateless (or rather the user manages state, instead of having a whole class to do it).

In the one-shot case, the main downside of this design is that you need to destructure the value out of the result. This is so the result type is consistent. I could maybe be convinced that the result type should be polymorphic based on whether more: true was passed, but I generally quite dislike APIs where the return type depends on the input.

Here's the buffer->string case:

// one-shot
let { value } = buffer.toBase64();
// value is a string including any necessary padding
// the `extra` property is guaranteed to be null because `more: true` was not passed


// multi-shot
let { value, extra } = chunk.toBase64({ more: true });
emit(value); // value is a string of 4n base64 characters for some n; no padding is included because none is necessary
// extra is a length 0-2 ArrayBuffer holding the bytes which didn't fit cleanly

({ value, extra } = chunk.toBase64({ more: true, extra })); // keep processing by repeating this call for each chunk
emit(value);

// when there are no more chunks, finalize by encoding `extra`; don't include `more: true`
({ value } = extra.toBase64());
emit(value);
// value is a possibly-empty string including any necessary padding
// `extra` is guaranteed to be null because `more: true` was not passed

And here's string->buffer:

// one-shot
let { value } = ArrayBuffer.fromBase64(string);
// value is an ArrayBuffer
// `extra` is guaranteed to be null because `more: true` was not passed


// multi-shot
let { value, extra } = ArrayBuffer.fromBase64(chunk, { more: true });
emit(value); // extra is a length 0-3 string holding the characters which didn't fit cleanly

({ value, extra } = ArrayBuffer.fromBase64(chunk, { more: true, extra })); // keep processing by repeating this call for each chunk
emit(value);

// when there are no more chunks, finalize by encoding `extra`; don't include `more: true`
({ value } =  ArrayBuffer.fromBase64(extra));
emit(value);

In a more real-world use, here's what it looks like to make an HTML TransformStream for encoding a stream of ArrayBuffers into a stream of well-formed base64 strings:

class BufferToStringTransformStream extends TransformStream {
  #extra = null;
  constructor(alphabet) {
    super({
      transform: (chunk, controller) => {
        let { value, extra } = chunk.toBase64({ extra: this.#extra, more: true });
        this.#extra = extra;
        controller.enqueue(value);
      },
      flush: (controller) => {
        if (this.#extra == null) return; // stream was empty
        let { value } = this.#extra.toBase64();
        controller.enqueue(value);
      },
    });
  }
}

I'm not sure about the precise details yet - for example, maybe there's a better name than extra, maybe it should be positional instead of being in an options bag, etc. I'm hoping that the name more is less confusing than TextDecoder's stream, which as @jridgewell points out above is quite confusing.

Like I said, looking for feedback.

@bakkot
Collaborator Author

bakkot commented Apr 6, 2023

Update: I implemented a variant of the thing described in the previous comment and, in doing so, decided it really warranted two separate methods, so the simple case could remain maximally simple. I've documented both on the new playground.

@lucacasonato
Member

I am not at all convinced of the need for the multi-shot API in the language. 90% of people will use only the single shot API. The next 9% will want to do streaming on the web, where their source and sink are WHATWG streams. The current proposed streaming API caters to the last 1%, who need streaming with some custom source and sink.

Using the streaming API with web streams is a pain already:

class BufferToStringTransformStream extends TransformStream {
  #extra = null;
  constructor(alphabet) {
    super({
      transform: (chunk, controller) => {
        let { result, extra } = chunk.toPartialBase64({
          alphabet,
          extra: this.#extra,
          more: true,
        });
        this.#extra = extra;
        controller.enqueue(result);
      },
      flush: (controller) => {
        if (this.#extra == null) return; // stream was empty
        let { result } = this.#extra.toPartialBase64({ alphabet });
        controller.enqueue(result);
      },
    });
  }
}

Implementing this on top of the single-shot API is not significantly more code: https://gist.github.com/lucacasonato/06a74fe2658fbe5a2d9c24cc767006c0

I would prefer we drop the streaming API from the TC39 proposal, and move forward with only the single-shot API, and then move forward with a WHATWG streams version of the streaming API. This will satisfy significantly more users.

@ljharb
Member

ljharb commented Jul 12, 2023

I also prefer we drop the streaming API; I think a follow-on proposal would be more appropriate if it's indeed well-motivated on its own.

@jridgewell
Member

Implementing this on top of the single-shot API is not significantly more code: gist.github.com/lucacasonato/06a74fe2658fbe5a2d9c24cc767006c0

You're focusing on the encoding, but not decoding. It's impossible to implement streamed decoding using the single-shot API.

I would prefer we drop the streaming API from the TC39 proposal, and move forward with only the single-shot API, and then move forward with a WHATWG streams version of the streaming API. This will satisfy significantly more users.

Either way is fine with me.

@bakkot
Collaborator Author

bakkot commented Jul 17, 2023

@phoddie Would you be ok with dropping streaming from this proposal?

@phoddie

phoddie commented Jul 17, 2023

@bakkot – Not really. The motivation for wanting streaming hasn't changed since this discussion began.

@ljharb
Member

ljharb commented Jul 17, 2023

@phoddie is there a reason the streaming part would have to be coupled with the non-streaming part? Meaning, why not as a follow-on?

@bakkot
Collaborator Author

bakkot commented Jul 17, 2023

@phoddie A couple people on the committee said they felt streaming was more complicated and less well-motivated than the one-shot case. Given that we generally agree that the one-shot version is useful, and that we've sketched out what the streaming version could look like in sufficient detail that we know it would be compatible with the one-shot version, is there a reason the two still have to be coupled?

It seems a shame to say that we can't advance a broadly useful thing just because there is a different, more niche thing in a related area. The only reason I see to couple is because of cross-cutting concerns, but it seems to me we've proved out that the design of the former is compatible with a potential future addition of the latter, if at a later date everyone is convinced it belongs in the language.

@phoddie

phoddie commented Jul 17, 2023

@ljharb – It is exceedingly unlikely that there will be support for a streaming option in TC39 once it appears in WHATWG.

@bakkot – Being generally useful is not a "niche thing." TC39 is creating a general-purpose programming language. The motivation for the one-shot API is convenience, not generality. Delegating the general-purpose cases to other bodies focused on specific domains (web, embedded, etc.) is likely to lead to different APIs that do about the same thing.

@bakkot
Collaborator Author

bakkot commented Jul 17, 2023

Working with streaming data doesn't need to look like working with single-shot data; on many platforms, including the web platform, it looks pretty different. I don't see why potentially having different APIs is such a fatal issue that it means we can't have the one-shot version in the language unless we also have the streaming version.

I get that you want to have the streaming version as well. But I don't understand why that means the one-shot version can't be allowed to advance on its own merits, with a streaming version to be added if and when someone does the work of convincing the committee that the streaming version should also advance on its own merits. More general tools are not always strict improvements over more specialized ones.

@ljharb
Member

ljharb commented Jul 17, 2023

@phoddie i'm not sure what you mean - if the streaming option is well-motivated, why would the non-streaming version advancing hold it back?

@phoddie

phoddie commented Jul 17, 2023

@ljharb – You appear to have misread my intent. If TC39 provides only a one-shot, WHATWG will likely provide streaming. Once that happens, there will not be support in TC39 for streaming, no matter how well-motivated.

@kriskowal
Member

Perhaps implicit in @phoddie’s comment, Moddable implements ECMAScript and does not track web standards, so standardizing streaming in 262 is preferable.

That could mean subsuming some web standards into 262, on the precedent of TypedArrays. I’m sure that would have worked out better with TC39 involved earlier.

I was also among the folks who would prefer separating the one-shot and streaming layers. Like @ljharb, I would like to see both layers advance, but am sufficiently eager to want the non-streaming layer to advance faster. But, I see @phoddie’s argument that we can use the political will for non-streaming to drive streaming, before a web API like a Base64Decoder based on TextDecoder emerges and takes the will away.

That said, I also agree with @lucacasonato that the streaming case is a much more vast design space and that the ergonomics of a streaming API are tightly coupled to the notion of a stream. The low-level primitives proposed make the extra bytes a very visible concern which is easy to hide in an API crafted in terms of stream adapters.

ECMAScript does have a stream primitive, async iterators, around which a streaming base64 encoder/decoder could be designed and the extra state hidden. But, again, the design space for streaming is much more vast than for non-streaming.
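One possible shape for that, sketched as an async generator with Node's Buffer standing in for the one-shot encoder (the function name is hypothetical): the carried bytes live in a local variable, invisible to the consumer.

```javascript
// An async-generator adapter hiding the carry state: accepts any (a)sync
// iterable of Buffers and yields well-formed base64 string chunks.
async function* base64EncodeStream(source) {
  let carry = Buffer.alloc(0);
  for await (const chunk of source) {
    const data = Buffer.concat([carry, chunk]);
    const usable = data.length - (data.length % 3); // whole 3-byte groups
    if (usable > 0) yield data.subarray(0, usable).toString('base64');
    carry = data.subarray(usable);
  }
  if (carry.length > 0) yield carry.toString('base64'); // final, padded chunk
}
```

Where `ReadableStream.from` is available, `ReadableStream.from(base64EncodeStream(source))` adapts this directly to a WHATWG stream.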

@lucacasonato
Member

You're focusing on the encoding, but not decoding. It's impossible to implement streamed decoding using the single-shot API.

@jridgewell This is not correct. Encoding and decoding are both implementable using the single shot API. See my examples here:

https://gist.github.com/lucacasonato/06a74fe2658fbe5a2d9c24cc767006c0#file-base64decoderstream-js

@lucacasonato
Member

I echo @kriskowal's point on the design space of streaming. If we want to advance streaming in TC39, we should seriously consider using the streaming primitive we already have in the language (iterators / async iterators). An async iterator API would have the best interop story with WHATWG streams: const sink = ReadableStream.from(base64DecodeStream(source))

I will pursue standardizing WHATWG stream versions of base64 and hex decoders / encoders regardless of the outcome of streaming here. This is because native WHATWG streams have significantly higher potential for internal optimization by implementers like Deno when the source or sink is a WHATWG stream. This optimization potential cannot be unlocked by any API we standardize at TC39.

@bakkot
Collaborator Author

bakkot commented Sep 14, 2023

@phoddie Will you be able to attend any of the September meeting? I'd like to choose a path forward at that meeting so this can be unblocked.

@phoddie

phoddie commented Sep 15, 2023

@bakkot – I can arrange to attend some part of the September meeting. Honestly, I'm not confident the current discussion points to a path forward.

@bakkot
Collaborator Author

bakkot commented Sep 15, 2023

Well, that would be useful information too, but I want to at least talk about it in plenary first. I've just added an item to discuss this to the agenda - if you add your schedule constraints I expect the chairs will be able to accommodate whatever timeslot works best for you.

@phoddie

phoddie commented Sep 20, 2023

Regarding the above comment by @lucacasonato that streaming "decoding [is] implementable using the single shot API", the example only works by assuming the Base64 source text has no whitespace. The draft spec text, however, allows for whitespace in steps 4 and 5. In my experience, it is valuable for a Base64 decoder to support whitespace, so the final spec text should fill in these steps.
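The problem whitespace causes for chunking can be seen with Node's Buffer, whose base64 decoder is also forgiving about whitespace: the input length no longer predicts the output size, so character counting alone cannot find clean chunk boundaries.

```javascript
// Forgiving base64 decoders skip whitespace, so two inputs of different
// lengths can decode to the same bytes; length-based chunk math breaks.
const clean = Buffer.from('AQID', 'base64');
const spaced = Buffer.from('AQ\n ID', 'base64');
console.log(clean.equals(spaced)); // true: both decode to <01 02 03>
console.log('AQID'.length / 4 * 3);     // 3   (matches the 3 output bytes)
console.log('AQ\n ID'.length / 4 * 3);  // 4.5 (length-based math is wrong)
```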

@lucacasonato
Member

@phoddie Adding support for whitespace is trivial in my streaming demo. I've moved my demo from a gist to a repo with tests: https://github.com/lucacasonato/base64_streams. You can look at the source code, and try it out in a REPL. I have also added a demo using sync-iterator as the stream driver, rather than using WHATWG streams.

You can see the implementation supporting whitespace here and here.

@phoddie

phoddie commented Sep 25, 2023

@lucacasonato, thank you for the update.

Adding support for whitespace is trivial in my streaming demo...

Of course.

There are disadvantages to such an approach, both to the runtime and the evolution of the specification.

It requires an additional pass over the source text (with a regular expression!). That's slower. If there is whitespace, it requires an extra copy of the source text, at least temporarily. That uses more memory and increases GC pressure. The additional runtime overhead is unwelcome in the constrained environments that are a priority for my work.

It also increases the knowledge that the streaming implementation (outside the language) needs to have of the algorithm implemented by the one-shot decoder (inside the language). That has the potential to limit future enhancements to the one-shot Base64 decoder.

@bakkot
Collaborator Author

bakkot commented Sep 25, 2023

I'm open to changing the single-shot decoder so that it skips over whitespace, if that would mitigate concerns about the overhead of a userland streaming version.

I think once the single-shot version is landed we're unlikely to be able to change its default behavior anyway, so I'm not as worried about consumers limiting potential future enhancements. Any future enhancements would almost certainly need to be opt-in via an option.

@phoddie

phoddie commented Sep 26, 2023

I'm unsure it is that straightforward. Base64 encoding has been around a long time. There are plenty of variations, and a quick review of the relevant RFCs suggests that even whitespace handling has some subtlety. Making future changes opt-in doesn't guarantee that a streaming decoder shim could continue to use the one-shot API to support them efficiently.

@phoddie

phoddie commented Sep 28, 2023

@bakkot – I understand your position to be that the one-shot APIs are sufficient because they can be used to implement streaming with reasonable efficiency and without significant future compatibility concerns. If that is indeed the case, I will withdraw my objection.

One way to achieve that is with work that likely needs to be done anyway to move this proposal forward. Complete the placeholder sections of the draft specification. Then review those against the relevant RFCs, Base64 libraries, etc. to evaluate whether there are any gaps. I am open to helping with that review.

@bakkot
Collaborator Author

bakkot commented Sep 28, 2023

Great to hear, @phoddie. I'll flesh out the spec text and ensure there's a complete streaming implementation in userland which supports all the same options.

I've reviewed the RFCs I'm aware of already, as well as libraries and implementations in other languages, and I'm reasonably sure the options I've carved out will cover the overwhelming majority of current base64 producers and consumers (possibly with some relatively small additional work on the user side; e.g. this API will not itself produce MIME-formatted output with lines capped at 998 characters, but the user can do so without too much extra work). I'll be sure to document the correspondence with RFCs and libraries more rigorously before coming back as well and will ping you when that's done.

@phoddie

phoddie commented Sep 28, 2023

I'll be sure to document the correspondence with RFCs and libraries more rigorously

That will be helpful, thank you.

@dead-claudia

I'd like to leave a point of feedback for streaming decoding: the extra buffer could result in some minor performance impact. It requires 2-3 GC allocations minimum, and this will almost always leave the nursery (as it survives individual chunks), slowing collection.

I don't currently have hard numbers here, just anticipating from experience.

I could see this being mitigated one of three ways:

  1. Accepting a pre-allocated extra buffer and "buffered" length, and returning the buffered length. Awkward and requires a bit more memory
  2. Making extra a bit-packed (31-bit) integer. The buffer only needs 3 bytes at most and the length can fit into 2 bits, resulting in 26 bits total. Structuring the "extra" buffer as a byte0 | byte1 << 8 | byte2 << 16 | length << 24 bit field could enable it to fit in a small integer, and that bit field is byte-indexable. This results in basically no significant encode/decode overhead beyond just the options bag (which would never leave the nursery).
  3. Creating dedicated decoder/encoder types with consume and end methods that return the resulting chunks. Obviously sounds very TextEncoder/TextDecoder-ish, but it's basically the same solution to the same problem.
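For what it's worth, option 2 can be sketched roughly like this (a hypothetical illustration with made-up function names; the layout is the one described above):

```javascript
// Hypothetical sketch of option 2: pack the decoder's 0-3 leftover bytes
// plus a 2-bit length into a single small integer, so no heap-allocated
// "extra" buffer survives between chunks.
// Layout: byte0 | byte1 << 8 | byte2 << 16 | length << 24.
function packExtra(bytes) {
  // bytes: an array-like of at most 3 leftover byte values
  let packed = bytes.length << 24; // length <= 3, so this stays a small integer
  for (let i = 0; i < bytes.length; i++) {
    packed |= bytes[i] << (8 * i);
  }
  return packed;
}

function unpackExtra(packed) {
  const length = (packed >>> 24) & 0x3;
  const bytes = [];
  for (let i = 0; i < length; i++) {
    bytes.push((packed >>> (8 * i)) & 0xff);
  }
  return bytes;
}
```

Since the packed value never exceeds 2^26, it fits comfortably in an engine's small-integer representation and never leaves the nursery.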

@bakkot
Collaborator Author

bakkot commented Oct 26, 2023

I've removed the streaming API, documented the space of allowed variation in the RFCs and other languages, filled out the specification, and written a reasonably efficient forward-compatible streaming implementation on top of the one-shot API, in #27. (Links go to files within that PR.)

Please take a look.

@phoddie

phoddie commented Nov 1, 2023

@bakkot – Thank you for your work on this. The proposal is considerably more complete. That definitely helps. I've only had a chance to take brief look at the text, but did take a longer look at the implementation of Base64.decode().

I think it is important to the discussion to characterize what you consider "reasonably efficient" here. Would you quantify that or, at least, share an explanation of how you reached that conclusion? There is a comment in the code that seems relevant: "requires 1 additional pass over chunk, plus one additional copy of chunk."

@bakkot
Collaborator Author

bakkot commented Nov 1, 2023

@phoddie Sure:

The encoder requires copying only a few extra bytes plus 2-3 extra typedarray.subarray calls per chunk. (One of those subarray calls could be eliminated by copying a few bytes manually.) It also requires doing one string concat per chunk. Whether that is expensive depends on the underlying implementation.
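The encoder's chunk handling is, roughly, the following (a sketch, not the code in #27; Node's Buffer stands in here for the proposed one-shot Uint8Array.prototype.toBase64, and the class shape is hypothetical):

```javascript
// Sketch of the streaming encoder: buffer at most 2 leftover bytes between
// chunks so every intermediate encode covers a multiple of 3 bytes and
// therefore never emits padding mid-stream.
const encodeOneShot = (u8) => Buffer.from(u8).toString("base64");

class Base64Encoder {
  #extra = new Uint8Array(0); // 0-2 leftover bytes from the previous chunk

  push(chunk) {
    const combined = this.#extra.length + chunk.length;
    const usable = combined - (combined % 3);
    if (usable === 0) {
      // Not enough bytes for a full 3-byte quantum yet; just accumulate.
      const merged = new Uint8Array(combined);
      merged.set(this.#extra);
      merged.set(chunk, this.#extra.length);
      this.#extra = merged;
      return "";
    }
    const buf = new Uint8Array(usable);
    buf.set(this.#extra); // copy the few leftover bytes
    buf.set(chunk.subarray(0, usable - this.#extra.length), this.#extra.length);
    this.#extra = chunk.slice(usable - this.#extra.length); // new few-byte leftover
    return encodeOneShot(buf);
  }

  end() {
    return encodeOneShot(this.#extra); // final chunk, possibly padded
  }
}
```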

The decoder is trickier. It needs to know how many actual base64 (non-whitespace) characters there are so it can cut off the last chunk, which requires a pass over the string. After that, whether or not there is whitespace, it requires two slices (one to get the bulk of the string, one to get the last chunk). It also requires a string concat per chunk (to prepend the last few characters from the previous chunk to the next one). But it does not require any regular expressions or a find-replace or other string transformation, just a scan and concat + slices. Again, whether that is expensive depends on the underlying implementation.
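The decoder's per-chunk slicing and concat can be sketched as follows (ignoring whitespace, i.e. the case where none is present; the global atob stands in for the proposed one-shot decoder and returns a binary string rather than a Uint8Array, and the class shape is hypothetical):

```javascript
// Sketch of the streaming decoder's chunk handling: carry the trailing
// partial 4-character quantum of each chunk forward via one string concat,
// and decode the rest with a bulk slice.
class Base64Decoder {
  #extra = ""; // 0-3 leftover base64 characters from the previous chunk

  push(chunk) {
    const s = this.#extra + chunk;          // one concat per chunk
    const usable = s.length - (s.length % 4);
    this.#extra = s.slice(usable);          // keep the trailing partial quantum
    return atob(s.slice(0, usable));        // bulk slice, then one-shot decode
  }

  end() {
    if (this.#extra.length !== 0) throw new SyntaxError("truncated base64 input");
    return "";
  }
}
```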

I can't quantify it beyond that; it's going to depend too much on the details of an engine's string implementation. But this is not an amount of overhead I personally would be worried about. The most important thing is that it has fixed + small constant factor memory overhead per chunk, so the overhead is unlikely to be the dominant factor in whether this is usable.

It's forward compatible in that it just forwards all of the options to the underlying implementation. It does hardcode the list of whitespace, though, as I've said in a comment there, that's just for simplicity and it could instead hardcode the alphabets, to be robust. But, honestly, I do not expect the set of whitespace to change, or for that matter any additional options to be added.

@phoddie

phoddie commented Nov 2, 2023

@bakkot – I'll restrict my comments for the moment to the decoder, as the encoder seems less of an efficiency concern.

Again, whether that is expensive depends on the underlying implementation.

Agreed. This is a generally true statement about JavaScript code, independent of Base64. Here we are specifically talking about the efficiency of an implementation on a resource-constrained device, where the engine is less able to apply optimizations that are common in engines used in browsers. We can't assume JavaScript on such a device has the same performance characteristics as JavaScript on a computer.

Still, we can make a very rough estimate of the relative overhead by reviewing the operations involved:

  • A loop over every character in the input string
  • Each iteration of the loop extracts one character from the input string
  • Each extracted character is tested for membership in a set
  • Local variable state is updated based on the test
  • When extra characters are present (likely ~75% of the time)
    • A (likely shorter) additional loop, again extracting characters one at a time and testing set membership
    • Making a (nearly) full copy of the input

Let's assume a JIT that optimally translates the JavaScript to native code (which is far from reality for an engine on a resource-constrained device). The initial loop over the entire input is at least as complex as Base64 decoding, so execution time is at least doubled (quite a bit more in reality). Additionally, the peak memory use of the operation is increased by two-thirds and the load on the garbage collector is increased.

This is conjecture informed by experience. With some time, we can build a test to get more precise results. Still, especially for larger inputs -- which is when streaming is needed -- this doesn't obviously feel "reasonably efficient."

@bakkot
Collaborator Author

bakkot commented Nov 2, 2023

The initial loop over the entire input is at least as complex as Base64 decoding

... Really? Is that because of the set lookup? I assumed the set lookup was negligible overhead. I've replaced it with a comparison.

If the "extracts one character from the input string" part is similarly expensive, that can be replaced with a .charCodeAt(), of course. I don't know what the relative overhead of those things is for you.

I've also updated it to skip the loop when strict: true is passed. Since most inputs aren't going to have whitespace (especially for constrained environments, where the overhead of additional whitespace bytes might be relevant), I expect most uses can take advantage of that.
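Concretely, the pre-scan with charCodeAt and plain comparisons looks something like this (an illustrative sketch, not the exact code in #27; the function name is hypothetical):

```javascript
// Count the non-whitespace characters in a base64 string, so the caller
// knows where to cut off the final partial chunk. Uses charCodeAt and
// direct comparisons rather than per-character string extraction and a
// set-membership test.
function countBase64Chars(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    // ASCII whitespace: tab, LF, FF, CR, space
    if (c !== 0x09 && c !== 0x0a && c !== 0x0c && c !== 0x0d && c !== 0x20) {
      count++;
    }
  }
  return count;
}
```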

Additionally, the peak memory use of the operation is increased by two-thirds and the load on the garbage collector is increased.

If that's a relevant concern, peak memory cost can be bounded (down to the size of the short-lived objects) by splitting the input into fixed-sized chunks, at the cost of slightly more GC load. The user can trivially do that themselves if peak memory usage becomes an issue.
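That caller-side chunking is a few lines (a minimal sketch; the helper name is made up, and each yielded chunk would then be fed to the streaming decoder in turn):

```javascript
// Split a large input string into fixed-size chunks so that the peak
// memory of any single slice/decode step stays bounded, at the cost of
// a few more short-lived strings.
function* fixedChunks(str, size) {
  for (let i = 0; i < str.length; i += size) {
    yield str.slice(i, i + size);
  }
}
```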

Still, especially for larger inputs -- which is when streaming is needed -- this doesn't obviously feel "reasonably efficient."

Hm. Well, I acknowledge that it is not as efficient as a native implementation would be, of course. But I expect relatively few inputs would fall into the range where the overhead of this operation is the limiting factor on whether it is usable. Most will be either small enough that the overhead is not binding or large enough that a native implementation wouldn't be usable either.


Stepping back a little - we've spent well over a year exploring this space. I'm willing to keep iterating with you on this for a little while longer, but not indefinitely.

But. The single-shot API is straightforward, self-contained, and well-motivated. It's pretty clear at this point that the committee does not have appetite for a streaming implementation or a significantly more complex API, and I don't think the overhead of my userland streaming implementation is going to change their minds on that.

So at this point, our choices appear to be the single-shot API or no API. I recognize that the efficiency of a streaming implementation is important for your platform. But I do not regard "I would like to have a native streaming implementation" as a valid reason to block this proposal as it currently stands, and I would consider it a serious failure of our process if the proposal dies for that reason.

@bakkot
Collaborator Author

bakkot commented Nov 16, 2023

@phoddie Does my last tweak address your concerns about the efficiency of the userland implementation? I've put this proposal on the agenda to go for Stage 3 at this meeting in case it does, to give other delegates time to review, but if not I'd like to continue this discussion. We can use the plenary time slot for that, or if you'd prefer a call before the meeting, or to continue async in this thread, those work too.

If your concerns aren't addressed, I'd really like to try to address or at least understand your reasons for blocking before plenary - I understand why you would want a streaming API in addition, but not why you consider the lack of a streaming API to be a sufficient reason to block this proposal as it stands.

@phoddie

phoddie commented Nov 16, 2023

The initial loop over the entire input is at least as complex as Base64 decoding

... Really? Is that because of the set lookup? I assumed the set lookup was negligible overhead. I've replaced it with a comparison.

It is the simplicity of Base64 decoding, not the complexity of a specific operation in JavaScript. Making a full pass over the input in JavaScript is more expensive than Base64 decoding the same input in C. And because embedded engines are less optimized, the relative weight is higher.

I still need to review the revised spec text and cross reference that against the standards that use Base64 to understand the forward compatibility implications. I'll do that.


We can use the plenary time slot for that, or if you'd prefer a call before the meeting, or to continue async in this thread, those work too.

I do not think we are at a point where bringing this back to plenary will be helpful, particularly in light of last time, which was more contentious than necessary or appropriate.

I suggested a conversation/call during the last plenary, and remain open to that.

I understand why you would want a streaming API in addition...

Thank you for acknowledging that.

... but not why you consider the lack of a streaming API to be a sufficient reason to block this proposal as it stands

Authors of engines request changes to proposals for reasons related to their priorities – performance and security are common. Our priority is operating in resource-constrained environments. That is well-known. This proposal does not yet appear to address those needs. Moddable supported this going to Stage 2 because it included streaming support. We would not have otherwise.

@bakkot
Collaborator Author

bakkot commented Nov 16, 2023

I do not think we are at a point where bringing this back to plenary will be helpful, particularly in light of last time which was more contentious than necessary or appropriate.

I'll take it off the agenda if we don't make any progress before then (or just make it a 5-minute status update); I've only put it on optimistically so it could be there before the deadline for delegates to review.

Is there a time before the 23rd you'd be able to do a call? I might be able to do one later than that, but before would be more convenient. You can email [my handle]@gmail to coordinate.


This proposal does not yet appear to address those needs.

I recognize that. But it does address other needs. It is always the case that any given proposal addresses only some needs - no individual proposal is ever going to bring the language to a finished state. I had previously assumed that a well-motivated proposal could advance in its own right, even if some delegate also wanted another thing; delegates who wanted additional things would champion their desiderata themselves, rather than blocking a proposal until it met their additional needs.

To put it another way: right now there's no streaming base64 decoding in the language. This proposal doesn't affect that either way. So how does blocking the proposal advance your needs?

@phoddie

phoddie commented Nov 17, 2023

@bakkot – I will follow-up by email to schedule a time. It should not be a problem to talk before the 23rd.

It is always the case that...

I don't believe this framing reflects the reality of TC39 or this proposal. For example, there is no request for "additional things" but simply what was agreed when advancing to Stage 2. I will keep my focus on the technical topics, and not wade into this further at this time.

@ByteEater-pl

This is because native WHATWG streams have significantly higher potential for internal optimization by implementers like Deno, when either source and sink are WHATWG streams. This optimization potential can not be unlocked by any API we standardize at TC39.

Why, @lucacasonato? Can't ECMA-262, for example, define an optional hook which, if an implementation provides it, can be used to offload some work to some more efficient mechanism?

@bakkot bakkot closed this as completed in #33 Jan 7, 2024