Define forgiving-base64 #145

Merged
merged 4 commits into from Aug 15, 2017
77 changes: 77 additions & 0 deletions infra.bs
@@ -872,6 +872,83 @@ For <a>pairs</a>, a slightly shorter literal syntax can be used, separating the
as 200/`<code>OK</code>`.


<h2 id=forgiving-base64>Forgiving base64</h2>

<p>To <dfn export>forgiving-base64 encode</dfn> given a <a>byte sequence</a> <var>data</var>, apply
the base64 algorithm defined in section 4 of RFC 4648 to <var>data</var> and return the result.
[[!RFC4648]]

<p class="note no-backref">This is named <a>forgiving-base64 encode</a> for symmetry with
<a>forgiving-base64 decode</a>, which is different from the RFC as it defines error handling for
certain inputs.
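
A minimal sketch of the encode side, assuming Python's standard `base64` module is an acceptable stand-in for "the base64 algorithm defined in section 4 of RFC 4648" (standard alphabet, with `=` padding). The function name `forgiving_base64_encode` is hypothetical and chosen only to mirror the definition above.

```python
import base64


def forgiving_base64_encode(data: bytes) -> str:
    """Encode a byte sequence with the standard, padded base64 alphabet."""
    # b64encode returns ASCII bytes; decode them to get a string result.
    return base64.b64encode(data).decode("ascii")
```

For example, `forgiving_base64_encode(b"a")` yields `"YQ=="`: one input byte produces two alphabet characters plus two padding characters.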

<p>To <dfn export>forgiving-base64 decode</dfn> given a string <var>data</var>, run these steps:</p>

Member

I think a note explaining that RFC 4648 actually doesn't contain a decode algorithm would be useful. (Certainly not one with error handling.) Cf. HTML's

  <!-- Note: this function is defined explicitly here because RFC4648 does not specify how to handle
  erroneous input, and no preexisting browser implementation simply throws an exception on all
  erroneous input. -->

but I think having it as an actual note would be nice.

Member Author

Isn't that what the note directly above it does?

Member

I guess so, although by my reading of the RFC, it doesn't actually define any decode algorithm at all.

Member Author

It defines an Encoding scheme and some rules around it. The idea is that you infer the decode and encode algorithms from that. Similar to using ABNF and expecting you have a parser that works.

Member

I guess so. I'm OK with it as-is but I think it'd be nicer for our readers if you frame this as providing the missing decode algorithm. (There is a fairly explicit encode algorithm, in contrast.)


<ol>
<li><p>Remove all <a>ASCII whitespace</a> from <var>data</var>.
<!-- https://lists.w3.org/Archives/Public/public-whatwg-archive/2011May/0207.html -->

<li>
<p>If <var>data</var> contains a <a>code point</a> that is not one of

<ul class="brief">
<li>U+002B (+)
<li>U+002F (/)
<li><a>ASCII alphanumeric</a>
</ul>

<p>then return failure.

<li>
<p>If <var>data</var>'s <a for=string>length</a> divides by 4 leaving no remainder, then:

<ol>
<li><p>If <var>data</var> ends with one or two U+003D (=) <a>code points</a>, then remove them
from <var>data</var>.
</ol>

<li><p>If <var>data</var>'s <a for=string>length</a> divides by 4 leaving a remainder of 1, then
return failure.

<li><p>Let <var>output</var> be an empty <a>byte sequence</a>.

<li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it.

Member

Should we just make buffer a list? Then we can use append, and size (instead of "has accumulated"), and empty, and is empty.

Member Author

But then the interpretation business gets a lot trickier. I'd rather leave this alone.

Member

Eh, interpreting 24 bits as three 8-bit big-endian numbers seems to work fine whether those bits are in a buffer or in a list...


<li><p>Let <var>position</var> be a <a>position variable</a> for <var>data</var>, initially
pointing at the start of <var>data</var>.

<li>
<p>While <var>position</var> does not point past the end of <var>data</var>:

<ol>
<li><p>Find the <a>code point</a> pointed to by <var>position</var> in the second column of
Table 1: The Base 64 Alphabet of RFC 4648. Let <var>n</var> be the number given in the first cell
of the same row. [[!RFC4648]]

<li><p>Append the six bits corresponding to <var>n</var>, most significant bit first, to
<var>buffer</var>.

<li><p>If <var>buffer</var> has accumulated 24 bits, interpret them as three 8-bit big-endian
numbers. Append three bytes with values equal to those numbers to <var>output</var>, in the same
order, and then empty <var>buffer</var>.

<li><p>Advance <var>position</var> by 1.
</ol>

<li>
<p>If <var>buffer</var> is not empty, it contains either 12 or 18 bits. If it contains 12 bits,
then discard the last four and interpret the remaining eight as an 8-bit big-endian number. If it
contains 18 bits, then discard the last two and interpret the remaining 16 as two 8-bit big-endian
numbers. Append the one or two bytes with values equal to those one or two numbers to
<var>output</var>, in the same order.</p>

<p class="note">The discarded bits mean that, for instance, "<code>YQ</code>" and
"<code>YR</code>" both return `<code>a</code>`.

<li><p>Return <var>output</var>.
</ol>
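
The decode steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `forgiving_base64_decode` is hypothetical, failure is modelled as `None`, and the bit buffer is modelled as an integer plus a bit count. One deliberate departure from the step order as written: the sketch strips trailing `=` padding before checking for code points outside the alphabet, because otherwise padded input such as "YQ==" would be rejected by the alphabet check before its padding could be removed.

```python
import string

# RFC 4648 Table 1: a character's index in this string is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
ASCII_WHITESPACE = "\t\n\f\r "


def forgiving_base64_decode(data: str):
    """Return the decoded bytes, or None to signal failure."""
    # Remove all ASCII whitespace.
    data = "".join(ch for ch in data if ch not in ASCII_WHITESPACE)

    # Assumption: padding is removed before the alphabet check (see lead-in).
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]

    # A remainder of 1 cannot encode a whole byte.
    if len(data) % 4 == 1:
        return None

    # Only +, / and ASCII alphanumerics are allowed at this point.
    if any(ch not in ALPHABET for ch in data):
        return None

    output = bytearray()
    buffer = 0  # accumulated bits, most significant first
    bits = 0    # number of bits currently held in buffer
    for ch in data:
        n = ALPHABET.index(ch)
        buffer = (buffer << 6) | n
        bits += 6
        if bits == 24:
            # Three 8-bit big-endian numbers, appended in order.
            output += buffer.to_bytes(3, "big")
            buffer = 0
            bits = 0

    if bits == 12:
        # Discard the last four bits, keep one byte.
        output.append(buffer >> 4)
    elif bits == 18:
        # Discard the last two bits, keep two bytes.
        output += (buffer >> 2).to_bytes(2, "big")

    return bytes(output)
```

With this sketch, "YQ" and "YR" both decode to the single byte `a`, matching the note about discarded bits, while "YQ=" (length 3, so its padding is never stripped) fails.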


<h2 id=namespaces>Namespaces</h2>

<p>The <dfn export>HTML namespace</dfn> is "<code>http://www.w3.org/1999/xhtml</code>".