New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define forgiving-base64 #145
Conversation
For use by window.btoa()/window.atob() and data: URLs. Much of this text originated in the HTML Standard.
@ayg this might be of interest since you researched the algorithm for |
By the way, I'd like to delay landing this until the tests are ready and the corresponding HTML PR is approved. |
See whatwg/infra#145 for the change to the Infra Standard. Closes #2912.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with moving the position variable step.
infra.bs
Outdated
|
||
<li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it. | ||
|
||
<li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the "Let position be..." step to right before here, as otherwise it's a bit confusing what happens to it while the string mutates.
infra.bs
Outdated
font-size: 0.6em; | ||
column-width: 6em; | ||
column-count: 5; | ||
column-gap: 1em; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sad that this doesn't work in Firefox, ugh. I tried Googling and messing with stuff but no luck.
infra.bs
Outdated
<p>Find the character pointed to by <var>position</var> in the first column of the following | ||
table. Let <var>n</var> be the number given in the second cell of the same row.</p> | ||
|
||
<div id="base64-table"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could alternately say second column ... first cell of the same row in the RFC 4648 table 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that might be better as long as we don't inline the encode algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found some more things
infra.bs
Outdated
<li><p>Remove all <a>ASCII whitespace</a> from <var>data</var>. | ||
|
||
<li> | ||
<p>If <var>data</var> contains a code point that is not one of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link code point
infra.bs
Outdated
<p>If <var>data</var>'s <a for=string>length</a> divides by 4 leaving no remainder, then: | ||
|
||
<ol> | ||
<li><p>If <var>data</var> ends with one or two U+003D (=) characters, then remove them from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/characters/code points, with link
infra.bs
Outdated
|
||
<ol> | ||
<li> | ||
<p>Find the character pointed to by <var>position</var> in the first column of the following |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/character/code point, with link
infra.bs
Outdated
<table> | ||
<thead> | ||
<tr> | ||
<th>Character |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Character/Code point, but no link in the header I think
<a>forgiving-base64 decode</a>, which is different from the RFC as it defines error handling for | ||
certain inputs. | ||
|
||
<p>To <dfn export>forgiving-base64 decode</dfn> given a string <var>data</var>, run these steps:</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a note explaining that RFC 4648 actually doesn't contain a decode algorithm would be useful. (Certainly not one with error handling.) Cf. HTML's
<!-- Note: this function is defined explicitly here because RFC4648 does not specify how to handle
erroneous input, and no preexisting browser implementation simply throws an exception on all
erroneous input. -->
but I think having it as an actual note would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what the note directly above it does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so, although by my reading of the RFC, it doesn't actually define any decode algorithm at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It defines an Encoding scheme and some rules around it. The idea is that you infer the decode and encode algorithms from that. Similar to using ABNF and expecting you have a parser that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so. I'm OK with it as-is but I think it'd be nicer for our readers if you frame this as providing the missing decode algorithm. (There is a fairly explicit encode algorithm, in contrast.)
infra.bs
Outdated
<ul class="brief"> | ||
<li>U+002B (+) | ||
<li>U+002F (/) | ||
<li><span>ASCII alphanumeric</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
span should be a
|
||
<li><p>Let <var>output</var> be an empty <a>byte sequence</a>. | ||
|
||
<li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just make buffer a list? Then we can use append, and size (instead of "has accumulated"), and empty, and is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then the interpretation business gets a lot trickier. I'd rather leave this alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, interpreting 24 bits as three 8-bit big-endian numbers seems to work fine whether those bits are in a buffer or in a list...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM although I think list would be a bit nicer than buffer still.
See whatwg/infra#145 for the change to the Infra Standard. Closes #2912.
See whatwg/infra#145 for the change to the Infra Standard. Closes whatwg#2912.
For use by window.btoa()/window.atob() and data: URLs. Much of this text originated in the HTML Standard.
Preview | Diff