Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define forgiving-base64 #145

Merged
merged 4 commits into from Aug 15, 2017
Merged

Define forgiving-base64 #145

merged 4 commits into from Aug 15, 2017

Conversation

annevk
Copy link
Member

@annevk annevk commented Aug 15, 2017

For use by window.btoa()/window.atob() and data: URLs. Much of this text originated in the HTML Standard.


Preview | Diff

For use by window.btoa()/window.atob() and data: URLs. Much of this text originated in the HTML Standard.
@annevk
Copy link
Member Author

annevk commented Aug 15, 2017

@ayg this might be of interest since you researched the algorithm for atob() back in the day.

@annevk
Copy link
Member Author

annevk commented Aug 15, 2017

By the way, I'd like to delay landing this until the tests are ready and the corresponding HTML PR is approved.

annevk added a commit to whatwg/html that referenced this pull request Aug 15, 2017
See whatwg/infra#145 for the change to the Infra Standard.

Closes #2912.
Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with moving the position variable step.

infra.bs Outdated

<li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it.

<li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the "Let position be..." step to right before here, as otherwise it's a bit confusing what happens to it while the string mutates.

infra.bs Outdated
font-size: 0.6em;
column-width: 6em;
column-count: 5;
column-gap: 1em;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sad that this doesn't work in Firefox, ugh. I tried Googling and messing with stuff but no luck.

infra.bs Outdated
<p>Find the character pointed to by <var>position</var> in the first column of the following
table. Let <var>n</var> be the number given in the second cell of the same row.</p>

<div id="base64-table">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could alternately say second column ... first cell of the same row in the RFC 4648 table 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that might be better as long as we don't inline the encode algorithm.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some more things

infra.bs Outdated
<li><p>Remove all <a>ASCII whitespace</a> from <var>data</var>.

<li>
<p>If <var>data</var> contains a code point that is not one of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link code point

infra.bs Outdated
<p>If <var>data</var>'s <a for=string>length</a> divides by 4 leaving no remainder, then:

<ol>
<li><p>If <var>data</var> ends with one or two U+003D (=) characters, then remove them from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/characters/code points, with link

infra.bs Outdated

<ol>
<li>
<p>Find the character pointed to by <var>position</var> in the first column of the following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/character/code point, with link

infra.bs Outdated
<table>
<thead>
<tr>
<th>Character
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Character/Code point, but no link in the header I think

<a>forgiving-base64 decode</a>, which is different from the RFC as it defines error handling for
certain inputs.

<p>To <dfn export>forgiving-base64 decode</dfn> given a string <var>data</var>, run these steps:</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a note explaining that RFC 4648 actually doesn't contain a decode algorithm would be useful. (Certainly not one with error handling.) Cf. HTML's

  <!-- Note: this function is defined explicitly here because RFC4648 does not specify how to handle
  erroneous input, and no preexisting browser implementation simply throws an exception on all
  erroneous input. -->

but I think having it as an actual note would be nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what the note directly above it does?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, although by my reading of the RFC, it doesn't actually define any decode algorithm at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It defines an Encoding scheme and some rules around it. The idea is that you infer the decode and encode algorithms from that. Similar to using ABNF and expecting you have a parser that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so. I'm OK with it as-is but I think it'd be nicer for our readers if you frame this as providing the missing decode algorithm. (There is a fairly explicit encode algorithm, in contrast.)

infra.bs Outdated
<ul class="brief">
<li>U+002B (+)
<li>U+002F (/)
<li><span>ASCII alphanumeric</span>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span should be a


<li><p>Let <var>output</var> be an empty <a>byte sequence</a>.

<li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just make buffer a list? Then we can use append, and size (instead of "has accumulated"), and empty, and is empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then the interpretation business gets a lot trickier. I'd rather leave this alone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, interpreting 24 bits as three 8-bit big-endian numbers seems to work fine whether those bits are in a buffer or in a list...

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM although I think list would be a bit nicer than buffer still.

@annevk annevk merged commit 6c69d45 into master Aug 15, 2017
@annevk annevk deleted the annevk/base64 branch August 15, 2017 16:12
annevk added a commit to whatwg/html that referenced this pull request Aug 15, 2017
See whatwg/infra#145 for the change to the Infra Standard.

Closes #2912.
alice pushed a commit to alice/html that referenced this pull request Jan 8, 2019
See whatwg/infra#145 for the change to the Infra Standard.

Closes whatwg#2912.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants