Define forgiving-base64 #145
@@ -872,6 +872,83 @@ For <a>pairs</a>, a slightly shorter literal syntax can be used, separating the
as 200/`<code>OK</code>`.


<h2 id=forgiving-base64>Forgiving base64</h2>

<p>To <dfn export>forgiving-base64 encode</dfn> given a <a>byte sequence</a> <var>data</var>, apply
the base64 algorithm defined in section 4 of RFC 4648 to <var>data</var> and return the result.
[[!RFC4648]]
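The encode step defers entirely to RFC 4648 section 4. As a rough, non-normative illustration, here is a minimal Python sketch of that encoding (standard alphabet, "=" padding). The function name `forgiving_base64_encode` is a hypothetical label, not part of any API.

```python
# Hypothetical helper: RFC 4648 section 4 base64 (standard alphabet, "=" padding).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def forgiving_base64_encode(data: bytes) -> str:
    out = []
    # Process the input in 24-bit groups of up to three bytes.
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Right-pad a short final group with zero bytes so it fills 24 bits.
        group = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Split the 24 bits into four 6-bit indices, most significant first.
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte yields 2 significant characters, 2 bytes yield 3, 3 bytes yield 4;
        # the remaining positions become "=".
        significant = len(chunk) + 1
        out.extend(chars[:significant] + ["="] * (4 - significant))
    return "".join(out)
```

For these inputs the output should match Python's own `base64.b64encode(data).decode("ascii")`, for example `forgiving_base64_encode(b"foob")` gives `"Zm9vYg=="`.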

<p class="note no-backref">This is named <a>forgiving-base64 encode</a> for symmetry with
<a>forgiving-base64 decode</a>, which differs from the RFC in that it defines error handling for
certain inputs.

<p>To <dfn export>forgiving-base64 decode</dfn> given a string <var>data</var>, run these steps:</p>

<ol>
<li><p>Remove all <a>ASCII whitespace</a> from <var>data</var>.
<!-- https://lists.w3.org/Archives/Public/public-whatwg-archive/2011May/0207.html -->

<li>
<p>If <var>data</var>'s <a for=string>length</a> divides by 4 leaving no remainder, then:

<ol>
<li><p>If <var>data</var> ends with one or two U+003D (=) <a>code points</a>, then remove them
from <var>data</var>.
</ol>

<li><p>If <var>data</var>'s <a for=string>length</a> divides by 4 leaving a remainder of 1, then
return failure.

<li>
<p>If <var>data</var> contains a <a>code point</a> that is not one of

<ul class="brief">
<li>U+002B (+)
<li>U+002F (/)
<li><a>ASCII alphanumeric</a>
</ul>

<p>then return failure.

<li><p>Let <var>output</var> be an empty <a>byte sequence</a>.

<li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it. | ||
Should we just make buffer a list? Then we can use append, and size (instead of "has accumulated"), and empty, and is empty.
But then the interpretation business gets a lot trickier. I'd rather leave this alone.
Eh, interpreting 24 bits as three 8-bit big-endian numbers seems to work fine whether those bits are in a buffer or in a list...

<li><p>Let <var>position</var> be a <a>position variable</a> for <var>data</var>, initially
pointing at the start of <var>data</var>.

<li>
<p>While <var>position</var> does not point past the end of <var>data</var>:

<ol>
<li><p>Find the <a>code point</a> pointed to by <var>position</var> in the second column of
Table 1: The Base 64 Alphabet of RFC 4648. Let <var>n</var> be the number given in the first cell
of the same row. [[!RFC4648]]

<li><p>Append the six bits corresponding to <var>n</var>, most significant bit first, to
<var>buffer</var>.

<li><p>If <var>buffer</var> has accumulated 24 bits, interpret them as three 8-bit big-endian
numbers. Append three bytes with values equal to those numbers to <var>output</var>, in the same
order, and then empty <var>buffer</var>.

<li><p>Advance <var>position</var> by 1.
</ol>

<li>
<p>If <var>buffer</var> is not empty, it contains either 12 or 18 bits. If it contains 12 bits,
then discard the last four bits and interpret the remaining eight as an 8-bit big-endian number. If
it contains 18 bits, then discard the last two bits and interpret the remaining 16 as two 8-bit
big-endian numbers. Append the one or two bytes with values equal to those one or two numbers to
<var>output</var>, in the same order.</p>

<p class="note">The discarded bits mean that, for instance, "<code>YQ</code>" and
"<code>YR</code>" both return `<code>a</code>`.

<li><p>Return <var>output</var>.
</ol>
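The decode steps can be sketched in Python as follows. This is an illustrative, non-normative sketch under stated assumptions: the name `forgiving_base64_decode` is hypothetical, failure is modeled as returning `None`, and the bit buffer is an integer plus a bit count rather than a literal sequence of bits. Trailing "=" padding is stripped before the character check, since "=" is not in the allowed set.

```python
from typing import Optional

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
ASCII_WHITESPACE = "\t\n\x0c\r "  # TAB, LF, FF, CR, SPACE


def forgiving_base64_decode(data: str) -> Optional[bytes]:
    """Hypothetical sketch; returns None where the steps say "return failure"."""
    # Remove all ASCII whitespace.
    data = "".join(c for c in data if c not in ASCII_WHITESPACE)
    # If the length is a multiple of 4, strip one or two trailing "=".
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    # A remainder of 1 would leave 6 bits, which cannot form a whole byte.
    if len(data) % 4 == 1:
        return None
    # Only "+", "/", and ASCII alphanumerics may remain.
    if any(c not in ALPHABET for c in data):
        return None
    output = bytearray()
    buffer = 0  # accumulated bits, most significant first
    bits = 0    # number of bits currently held in buffer
    for c in data:
        # n is the row number of c in RFC 4648's Table 1.
        n = ALPHABET.index(c)
        buffer = (buffer << 6) | n
        bits += 6
        if bits == 24:
            # Emit three big-endian bytes and empty the buffer.
            output += buffer.to_bytes(3, "big")
            buffer, bits = 0, 0
    # Leftover 12 bits: drop the last 4 and emit 1 byte.
    # Leftover 18 bits: drop the last 2 and emit 2 bytes.
    if bits == 12:
        output.append(buffer >> 4)
    elif bits == 18:
        output += (buffer >> 2).to_bytes(2, "big")
    return bytes(output)
```

As in the note above, `forgiving_base64_decode("YQ")` and `forgiving_base64_decode("YR")` both return `b"a"`, while `"YQ="` fails because its length is not a multiple of 4, so the "=" is never stripped and then fails the character check.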


<h2 id=namespaces>Namespaces</h2>

<p>The <dfn export>HTML namespace</dfn> is "<code>http://www.w3.org/1999/xhtml</code>".

I think a note explaining that RFC 4648 actually doesn't contain a decode algorithm would be useful. (Certainly not one with error handling.) Cf. HTML's
but I think having it as an actual note would be nice.
Isn't that what the note directly above it does?
I guess so, although by my reading of the RFC, it doesn't actually define any decode algorithm at all.
It defines an Encoding scheme and some rules around it. The idea is that you infer the decode and encode algorithms from that. Similar to using ABNF and expecting you have a parser that works.
I guess so. I'm OK with it as-is but I think it'd be nicer for our readers if you frame this as providing the missing decode algorithm. (There is a fairly explicit encode algorithm, in contrast.)