Commit

Define encodeInto() API
This enables converting strings into UTF-8 byte sequences reusing a pre-allocated output buffer.

Also cleans up TextEncoder a bit.

Tests: web-platform-tests/wpt#14505.

Fixes #69.
annevk committed Jan 9, 2019
1 parent c2879dd commit 9d75583
Showing 1 changed file with 112 additions and 19 deletions.
131 changes: 112 additions & 19 deletions encoding.bs
@@ -1294,16 +1294,20 @@ attribute's getter, when invoked, must return "<code>utf-8</code>".
<h3 id=interface-textencoder>Interface {{TextEncoder}}</h3>

<pre class=idl>
dictionary TextEncoderEncodeIntoResult {
  unsigned long long read;
  unsigned long long written;
};

[Constructor,
 Exposed=(Window,Worker)]
interface TextEncoder {
  [NewObject] Uint8Array encode(optional USVString input = "");
  TextEncoderEncodeIntoResult encodeInto(USVString source, Uint8Array destination);
};
TextEncoder includes TextEncoderCommon;
</pre>

<p>A {{TextEncoder}} object has an associated <dfn for=TextEncoder>encoder</dfn>.

<p class="note no-backref">A {{TextEncoder}} object offers no <var>label</var> argument as it only
supports <a>UTF-8</a>. It also offers no <code>stream</code> option as no <a for=/>encoder</a>
requires buffering of scalar values.
@@ -1319,18 +1323,17 @@ requires buffering of scalar values.

<dt><code><var>encoder</var> . <a method for=TextEncoder lt=encode()>encode([<var>input</var> = ""])</a></code>
<dd><p>Returns the result of running <a>UTF-8</a>'s <a for=/>encoder</a>.

<dt><code><var>encoder</var> . <a method for=TextEncoder lt="encodeInto(source, destination)">encodeInto(<var>source</var>, <var>destination</var>)</a></code>
<dd><p>Runs the <a>UTF-8 encoder</a> on <var>source</var>, stores the result of that operation into
<var>destination</var>, and returns the progress made as a dictionary whereby
{{TextEncoderEncodeIntoResult/read}} is the number of converted <a>code units</a> of
<var>source</var> and {{TextEncoderEncodeIntoResult/written}} is the number of bytes modified in
<var>destination</var>.
</dl>
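
<div class=example>
<p>As an illustration of the description above, the following sketch shows how
{{TextEncoderEncodeIntoResult/read}} counts code units while
{{TextEncoderEncodeIntoResult/written}} counts bytes. It assumes an environment that exposes a
global {{TextEncoder}} with <code>encodeInto()</code>; the input strings and buffer sizes are
arbitrary choices made for this sketch.

```javascript
// Assumes a runtime with a global TextEncoder that implements encodeInto().
const encoder = new TextEncoder();

// 16 bytes is an arbitrary destination size chosen for this sketch.
const roomy = new Uint8Array(16);
const full = encoder.encodeInto("h\u00E9llo\u20AC", roomy);
// "h", "l", "l", "o" take 1 byte each, U+00E9 takes 2, U+20AC takes 3:
// full.read is 6 (code units) and full.written is 9 (bytes).

// U+1F600 is two code units in the string and four bytes in UTF-8.
const astral = encoder.encodeInto("\u{1F600}", new Uint8Array(4));
// astral.read is 2 and astral.written is 4.

// When the destination is too small, encoding stops before the first
// scalar value that does not fit; no partial byte sequence is written.
const tight = new Uint8Array(2);
const partial = encoder.encodeInto("a\u20ACb", tight);
// "a" fits, U+20AC (3 bytes) does not: partial.read is 1, partial.written is 1.
```

<p>A caller can compare {{TextEncoderEncodeIntoResult/read}} against the length of the string to
tell whether the whole input was consumed.
</div>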

<p>The <dfn constructor for=TextEncoder id=dom-textencoder><code>TextEncoder()</code></dfn>
constructor, when invoked, must run these steps:

<ol>
<li><p>Let <var>enc</var> be a new {{TextEncoder}} object.

<li><p>Set <var>enc</var>'s <a for=TextEncoder>encoder</a> to <a>UTF-8</a>'s <a for=/>encoder</a>.

<li><p>Return <var>enc</var>.
</ol>
constructor, when invoked, must return a new {{TextEncoder}} object.

<p>The <dfn method for=TextEncoder><code>encode(<var>input</var>)</code></dfn> method, when invoked,
must run these steps:
@@ -1347,20 +1350,108 @@ must run these steps:
<li><p>Let <var>token</var> be the result of
<a>reading</a> from <var>input</var>.

<li><p>Let <var>result</var> be the result of
<a>processing</a> <var>token</var> for
<a for=TextEncoder>encoder</a>, <var>input</var>, <var>output</var>.
<li><p>Let <var>result</var> be the result of <a>processing</a> <var>token</var> for the
<a>UTF-8 encoder</a>, <var>input</var>, <var>output</var>.

<li>
<p>Assert: <var>result</var> is not <a>error</a>.

<p class=note>The <a>UTF-8 encoder</a> cannot return <a>error</a>.

<li><p>If <var>result</var> is <a>finished</a>, convert <var>output</var> into a byte sequence,
and then return a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing <var>output</var>.
<!-- XXX https://www.w3.org/Bugs/Public/show_bug.cgi?id=26966 -->
</ol>
</ol>

<p>The
<dfn method for=TextEncoder><code>encodeInto(<var>source</var>, <var>destination</var>)</code></dfn>
method, when invoked, must run these steps:

<ol>
<li><p>Let <var>read</var> be 0.

<li><p>Let <var>written</var> be 0.

<li><p>Let <var>destinationBytes</var> be the result of
<a lt="get a reference to the buffer source">getting a reference to the bytes held by</a>
<var>destination</var>.

<li>
<p>Let <var>unused</var> be a new <a for=/>stream</a>.

<p class=note>The <a>handler</a> algorithm invoked below requires this argument, but it is not
used by the <a>UTF-8 encoder</a>.

<li><p>Convert <var>source</var> to a <a for=/>stream</a>.

<li>
<p>While true:

<ol>
<li><p>Let <var>token</var> be the result of <a>reading</a> from <var>source</var>.

<li><p>Let <var>result</var> be the result of running the <a>UTF-8 encoder</a>'s <a>handler</a>
on <var>unused</var> and <var>token</var>.

<li><p>If <var>result</var> is <a>finished</a>, then <a for=iteration>break</a>.

<li>
<p>If <var>result</var> is <a>finished</a>, convert <var>output</var> into a
byte sequence, and then return a {{Uint8Array}} object wrapping an
{{ArrayBuffer}} containing <var>output</var>.
<!-- XXX https://www.w3.org/Bugs/Public/show_bug.cgi?id=26966 -->
<p>Otherwise:

<p class=note><a>UTF-8</a> cannot return <a>error</a>.
<ol>
<li>
<p>If <var>destinationBytes</var>'s <a for="byte sequence">length</a> &minus;
<var>written</var> is greater than or equal to the number of bytes in <var>result</var>, then:

<ol>
<li><p>If <var>token</var> is greater than U+FFFF, then increment <var>read</var> by 2.

<li><p>Otherwise, increment <var>read</var> by 1.

<li><p>Write the bytes in <var>result</var> into <var>destinationBytes</var>, from byte
offset <var>written</var>.

<li><p>Increment <var>written</var> by the number of bytes in <var>result</var>.
</ol>

<li><p>Otherwise, <a for=iteration>break</a>.
</ol>
</ol>

<li><p>Return a new {{TextEncoderEncodeIntoResult}} dictionary whose
{{TextEncoderEncodeIntoResult/read}} member is <var>read</var> and
{{TextEncoderEncodeIntoResult/written}} member is <var>written</var>.
</ol>
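
<div class=example>
<p>The steps above can be approximated in script as follows. This is a rough sketch for
illustration only, not how an implementation is expected to work: <code>utf8Bytes()</code> and
<code>encodeIntoSketch()</code> are hypothetical names, the byte layout is hand-written rather
than delegated to the <a>UTF-8 encoder</a>, and lone surrogates are mapped to U+FFFD to mirror
the {{USVString}} conversion that happens before the algorithm runs.

```javascript
// Hypothetical helper: the UTF-8 bytes for a single scalar value.
function utf8Bytes(scalar) {
  if (scalar <= 0x7F) return [scalar];
  if (scalar <= 0x7FF) return [0xC0 | (scalar >> 6), 0x80 | (scalar & 0x3F)];
  if (scalar <= 0xFFFF) {
    return [
      0xE0 | (scalar >> 12),
      0x80 | ((scalar >> 6) & 0x3F),
      0x80 | (scalar & 0x3F),
    ];
  }
  return [
    0xF0 | (scalar >> 18),
    0x80 | ((scalar >> 12) & 0x3F),
    0x80 | ((scalar >> 6) & 0x3F),
    0x80 | (scalar & 0x3F),
  ];
}

// Hypothetical stand-in for encodeInto(source, destination).
function encodeIntoSketch(source, destination) {
  let read = 0;
  let written = 0;
  // String iteration yields one code point (or one lone surrogate) at a time.
  for (const ch of source) {
    let token = ch.codePointAt(0);
    // Mirror USVString conversion: lone surrogates become U+FFFD.
    if (token >= 0xD800 && token <= 0xDFFF) token = 0xFFFD;
    const result = utf8Bytes(token);
    // Stop before the first scalar value that does not fit.
    if (destination.length - written < result.length) break;
    // Scalar values above U+FFFF occupy two code units in the source string.
    read += token > 0xFFFF ? 2 : 1;
    destination.set(result, written);
    written += result.length;
  }
  return { read, written };
}
```

<p>Because the fit check happens before anything is written, <var>destination</var> never ends
with a truncated byte sequence, and <var>read</var> never points into the middle of a surrogate
pair.
</div>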

<div class=example id=example-textencoder-encodeinto>
<p>The <a method for=TextEncoder lt="encodeInto(source, destination)">encodeInto()</a> method can
be used to encode a string into an existing {{ArrayBuffer}} object. Various details below are left
as an exercise for the reader, but this demonstrates an approach one could take to use this method:

<pre><code class=lang-javascript>
function convertString(buffer, input, callback) {
  let bufferSize = 256,
      bufferStart = malloc(buffer, bufferSize),
      writeOffset = 0,
      readOffset = 0;
  while (true) {
    const view = new Uint8Array(buffer, bufferStart + writeOffset, bufferSize - writeOffset),
          {read, written} = cachedEncoder.encodeInto(input.substring(readOffset), view);
    readOffset += read;
    writeOffset += written;
    if (readOffset === input.length) {
      callback(bufferStart, writeOffset);
      free(buffer, bufferStart);
      return;
    }
    bufferSize *= 2;
    bufferStart = realloc(buffer, bufferStart, bufferSize);
  }
}
</code></pre>
</div>
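
<div class=example>
<p>For readers who want something runnable, the same loop can be sketched without the
<code>malloc()</code>, <code>realloc()</code>, and <code>free()</code> helpers that the example
above leaves undefined. This variant is only one possible approach: <code>encodeWithGrowth()</code>
is a hypothetical name, growth is done by allocating a larger {{Uint8Array}} and copying, and the
minimum size of 4 bytes is an assumption made here so that every pass after a growth step can fit
at least one scalar value.

```javascript
// Hypothetical helper, not part of the Encoding Standard.
function encodeWithGrowth(input, initialSize = 256) {
  const encoder = new TextEncoder();
  // At least 4 bytes, so doubling always frees room for one scalar value.
  let bytes = new Uint8Array(Math.max(initialSize, 4));
  let readOffset = 0;
  let writeOffset = 0;
  while (true) {
    // Encode the unread tail of the string into the unused tail of the buffer.
    const { read, written } = encoder.encodeInto(
      input.substring(readOffset),
      bytes.subarray(writeOffset)
    );
    readOffset += read;
    writeOffset += written;
    if (readOffset === input.length) {
      // Everything was consumed; return a view over the bytes actually written.
      return bytes.subarray(0, writeOffset);
    }
    // Out of room: double the capacity and carry over what was written so far.
    const bigger = new Uint8Array(bytes.length * 2);
    bigger.set(bytes.subarray(0, writeOffset));
    bytes = bigger;
  }
}
```

<p>For any input, the returned bytes are expected to match what <code>encode()</code> produces
for that input; the difference is that the caller controls when and how the buffer is allocated.
</div>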


<h3 id=interface-mixin-generictransformstream>Interface mixin {{GenericTransformStream}}</h3>

@@ -3205,6 +3296,8 @@ Ken Whistler,
Kenneth Russell,
田村健人 (Kent Tamura),
Leif Halvard Silli,
Luke Wagner,
Maciej Hirsz,
Makoto Kato,
Mark Callow,
Mark Crispin,