What's the correct use of the {stream: true} option for TextDecoder.decode? #184

Closed
lin7sh opened this issue Sep 4, 2019 · 2 comments

Comments

@lin7sh

lin7sh commented Sep 4, 2019

I just came across the second argument that can be passed to the decoder. My first thought was that it works like a buffer, so I could do

 decoder.decode(new Uint8Array([97, 98, 99]), { stream: true })

and append another buffer

 decoder.decode(new Uint8Array([100, 101, 102]), { stream: true })

and finally get a string back that includes both parts

 const finalString = decoder.decode(new Uint8Array([]), { stream: false })

but that isn't the case.
The first two expressions return "abc" and "def", and the third gives me an empty string, just as they would without the stream option. I've also tried TextDecoder in Node.js, and it behaves the same way.

Can anyone tell me how to use it correctly?

@ricea
Collaborator

ricea commented Sep 4, 2019

The stream option changes the handling of the end of the input to allow it to be in the middle of a character. Compare:

decoder.decode(new Uint8Array([226, 153]), { stream: true });
// ""
decoder.decode(new Uint8Array([165]), { stream: true });
// "♥"

to

decoder.decode(new Uint8Array([226, 153]));
// "��"
decoder.decode(new Uint8Array([165]));
// "�"

Even with {stream: true} the TextDecoder emits all complete characters as soon as possible.
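
To get one string out of several chunks, the caller has to concatenate the pieces that each decode call returns, since the decoder only holds back the bytes of an incomplete character. The following is a minimal sketch of that pattern; `decodeChunks` is a hypothetical helper name, not part of the Encoding API, and the input bytes are made up for illustration.

```javascript
// Hypothetical helper (not part of the Encoding API): decodes a sequence of
// byte chunks into a single string.
function decodeChunks(chunks) {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    // With stream: true, a trailing incomplete character stays buffered in
    // the decoder; every complete character is returned right away.
    text += decoder.decode(chunk, { stream: true });
  }
  // A final call without stream: true marks the end of the input and flushes
  // the decoder.
  text += decoder.decode();
  return text;
}

const result = decodeChunks([
  new Uint8Array([97, 98, 99]), // "abc"
  new Uint8Array([226, 153]),   // first two bytes of "♥"
  new Uint8Array([165, 100]),   // last byte of "♥", then "d"
]);
console.log(result); // "abc♥d"
```

The accumulation happens in the `text` variable on the caller's side, which is why the final flush call in the original question returned an empty string.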

You may find TextDecoderStream a more intuitive way to do the same thing.
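
As a rough sketch of the TextDecoderStream approach, assuming an environment where `ReadableStream` and `TextDecoderStream` are available as globals: the hypothetical helper `decodeWithStream` below builds a byte stream from an array of chunks, pipes it through the decoder, and collects the text. In real code the byte stream would more likely come from something like a fetch response body.

```javascript
// Hypothetical helper: pipes byte chunks through a TextDecoderStream and
// collects the decoded text into one string.
async function decodeWithStream(chunks) {
  // Build a byte stream from the in-memory chunks (illustration only).
  const source = new ReadableStream({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });

  // The transform stream handles characters split across chunk boundaries.
  const reader = source.pipeThrough(new TextDecoderStream()).getReader();
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value;
  }
  return text;
}

decodeWithStream([
  new Uint8Array([226, 153]), // first two bytes of "♥"
  new Uint8Array([165]),      // last byte of "♥"
]).then((text) => console.log(text)); // "♥"
```

Here the stream takes care of both the buffering of partial characters and the final flush when the source closes.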

@lin7sh
Author

lin7sh commented Sep 4, 2019

@ricea So it's for multi-byte decoding, that makes sense. TextDecoderStream is what I'm looking for; hopefully it'll be available soon.

@lin7sh lin7sh closed this as completed Sep 4, 2019
blrchen added a commit to scalaone/azure-openai-proxy that referenced this issue Jan 20, 2024
…le of non-English character (#28)

Ensure that the decoder.decode function can properly handle instances
where the input stream ends mid-way through a multi-byte, non-English
character. For more details, refer to the discussion at [Issue #184 on
the encoding GitHub
page](whatwg/encoding#184 (comment)).