Define forgiving-base64 #145

annevk · 2017-08-15T08:36:41Z

For use by window.btoa()/window.atob() and data: URLs. Much of this text originated in the HTML Standard.

annevk · 2017-08-15T08:37:26Z

@ayg this might be of interest since you researched the algorithm for atob() back in the day.

annevk · 2017-08-15T08:38:32Z

By the way, I'd like to delay landing this until the tests are ready and the corresponding HTML PR is approved.

See whatwg/infra#145 for the change to the Infra Standard. Closes #2912.

domenic

LGTM with moving the position variable step.

domenic · 2017-08-15T13:19:50Z

infra.bs

+
+ <li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it.
+
+ <li>


Move the "Let position be..." step to right before here, as otherwise it's a bit confusing what happens to it while the string mutates.

domenic · 2017-08-15T13:27:09Z

infra.bs

+  font-size: 0.6em;
+  column-width: 6em;
+  column-count: 5;
+  column-gap: 1em;


Sad that this doesn't work in Firefox, ugh. I tried Googling and messing with stuff but no luck.

domenic · 2017-08-15T13:28:12Z

infra.bs

+    <p>Find the character pointed to by <var>position</var> in the first column of the following
+    table. Let <var>n</var> be the number given in the second cell of the same row.</p>
+
+    <div id="base64-table">


Alternatively, we could say "second column ... first cell of the same row" and point at Table 1 of RFC 4648.

Yeah, that might be better as long as we don't inline the encode algorithm.
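For readers following along, the lookup being discussed can be sketched in a few lines. This is only an illustration, assuming the RFC 4648 Table 1 alphabet (A-Z, a-z, 0-9, "+", "/" mapped to values 0 through 63); the names `BASE64_ALPHABET` and `table_value` are hypothetical and do not appear in the spec text.

```python
import string

# RFC 4648 Table 1 ("The Base 64 Alphabet"): values 0-63 map to
# A-Z, a-z, 0-9, "+", "/". The position of a code point in this string
# is the number n that the step refers to.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)


def table_value(code_point: str) -> int:
    """Return the 6-bit value paired with code_point in Table 1.

    Hypothetical helper: raises ValueError for anything that is not a
    single code point from the base64 alphabet.
    """
    if len(code_point) != 1:
        raise ValueError("expected exactly one code point")
    n = BASE64_ALPHABET.find(code_point)
    if n == -1:
        raise ValueError(f"{code_point!r} is not in the base64 alphabet")
    return n
```

For example, `table_value("A")` gives 0 and `table_value("/")` gives 63, matching the first and last rows of the table.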

domenic

Found some more things

domenic · 2017-08-15T13:30:20Z

infra.bs

+ <li><p>Remove all <a>ASCII whitespace</a> from <var>data</var>.
+
+ <li>
+  <p>If <var>data</var> contains a code point that is not one of


Link code point

domenic · 2017-08-15T13:30:30Z

infra.bs

+  <p>If <var>data</var>'s <a for=string>length</a> divides by 4 leaving no remainder, then:
+
+  <ol>
+   <li><p>If <var>data</var> ends with one or two U+003D (=) characters, then remove them from


s/characters/code points, with link

domenic · 2017-08-15T13:30:39Z

infra.bs

+
+  <ol>
+   <li>
+    <p>Find the character pointed to by <var>position</var> in the first column of the following


s/character/code point, with link

domenic · 2017-08-15T13:30:54Z

infra.bs

+     <table>
+      <thead>
+       <tr>
+        <th>Character


s/Character/Code point, but no link in the header I think

domenic · 2017-08-15T13:32:23Z

infra.bs

+<a>forgiving-base64 decode</a>, which is different from the RFC as it defines error handling for
+certain inputs.
+
+<p>To <dfn export>forgiving-base64 decode</dfn> given a string <var>data</var>, run these steps:</p>


I think a note explaining that RFC 4648 actually doesn't contain a decode algorithm would be useful. (Certainly not one with error handling.) Cf. HTML's



but I think having it as an actual note would be nice.

Isn't that what the note directly above it does?

I guess so, although by my reading of the RFC, it doesn't actually define any decode algorithm at all.

It defines an Encoding scheme and some rules around it. The idea is that you infer the decode and encode algorithms from that. Similar to using ABNF and expecting you have a parser that works.

I guess so. I'm OK with it as-is but I think it'd be nicer for our readers if you frame this as providing the missing decode algorithm. (There is a fairly explicit encode algorithm, in contrast.)
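To make the "missing decode algorithm" framing concrete, here is a rough Python sketch of the whole forgiving-base64 decode procedure. It follows the step order of the published Infra Standard as I understand it, which may differ from the revision under review in this PR. The function name, the use of `None` for failure, and the integer used as the bit buffer are assumptions made for illustration.

```python
import string

ASCII_WHITESPACE = "\t\n\f\r "  # TAB, LF, FF, CR, SPACE
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def forgiving_base64_decode(data):
    """Sketch of forgiving-base64 decode; returns bytes, or None for failure."""
    # Remove all ASCII whitespace from data.
    data = "".join(ch for ch in data if ch not in ASCII_WHITESPACE)

    # If the length is a multiple of 4, strip one or two trailing "=".
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]

    # A remainder of 1 can never be valid: 6 bits cannot fill a byte.
    if len(data) % 4 == 1:
        return None

    # Only "+", "/", and ASCII alphanumerics may remain.
    if any(ch not in ALPHABET for ch in data):
        return None

    output = bytearray()
    buffer = 0  # assumption: an int stands in for the spec's bit buffer
    bits = 0    # number of bits currently held in buffer
    for ch in data:
        buffer = (buffer << 6) | ALPHABET.index(ch)
        bits += 6
        if bits == 24:
            # Interpret 24 bits as three 8-bit big-endian numbers.
            output += buffer.to_bytes(3, "big")
            buffer = 0
            bits = 0

    # Leftover bits: discard the trailing 4 (of 12) or 2 (of 18).
    if bits == 12:
        output += (buffer >> 4).to_bytes(1, "big")
    elif bits == 18:
        output += (buffer >> 2).to_bytes(2, "big")

    return bytes(output)
```

Under this sketch, `forgiving_base64_decode("YQ")` and `forgiving_base64_decode(" Y Q = = ")` both yield `b"a"`, while `"YQ="` and `"Y"` yield `None`. Missing padding and embedded whitespace are tolerated; partial padding and impossible lengths are not.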

domenic · 2017-08-15T14:28:26Z

infra.bs

+  <ul class="brief">
+   <li>U+002B (+)
+   <li>U+002F (/)
+   <li><span>ASCII alphanumeric</span>


The <span> should be an <a>.

domenic · 2017-08-15T14:30:03Z

infra.bs

+
+ <li><p>Let <var>output</var> be an empty <a>byte sequence</a>.
+
+ <li><p>Let <var>buffer</var> be an empty buffer that can have bits appended to it.


Should we just make buffer a list? Then we can use append, and size (instead of "has accumulated"), and empty, and is empty.

But then the interpretation business gets a lot trickier. I'd rather leave this alone.

Eh, interpreting 24 bits as three 8-bit big-endian numbers seems to work fine whether those bits are in a buffer or in a list...
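As a small illustration of the point that the interpretation step is the same either way, here is a sketch that models the buffer as a plain list of bits. The helper names `append_six_bits` and `bits_to_bytes` are hypothetical and are not taken from the spec.

```python
def append_six_bits(buffer, n):
    """Append the six bits of n (0-63) to buffer, most significant bit first."""
    for shift in range(5, -1, -1):
        buffer.append((n >> shift) & 1)


def bits_to_bytes(bits):
    """Interpret a list of bits as consecutive 8-bit big-endian numbers."""
    if len(bits) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Table values for the code points "Y", "W", "J", "j".
buffer = []
for n in (24, 22, 9, 35):
    append_six_bits(buffer, n)

# buffer now holds 24 bits, which decode to the three bytes of "abc".
decoded = bits_to_bytes(buffer)
```

With a list, "has accumulated 24 bits" becomes a size check (`len(buffer) == 24`) and emptying the buffer becomes `buffer.clear()`, which is the simplification suggested above.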

domenic

LGTM although I think list would be a bit nicer than buffer still.

Define forgiving-base64

2c691c0

For use by window.btoa()/window.atob() and data: URLs. Much of this text originated in the HTML Standard.

annevk added a commit to whatwg/html that referenced this pull request Aug 15, 2017

Move base64 algorithms to Infra

38cb7aa

See whatwg/infra#145 for the change to the Infra Standard. Closes #2912.

annevk mentioned this pull request Aug 15, 2017

Move base64 algorithms to Infra whatwg/html#2920

Merged

nits

e331f99

domenic approved these changes Aug 15, 2017

domenic requested changes Aug 15, 2017

domenic reviewed Aug 15, 2017

reuse table from the RFC, nits

11826b3

domenic reviewed Aug 15, 2017

span -> a

8a6d030

domenic reviewed Aug 15, 2017

domenic approved these changes Aug 15, 2017

annevk merged commit 6c69d45 into master Aug 15, 2017

annevk deleted the annevk/base64 branch August 15, 2017 16:12

annevk added a commit to whatwg/html that referenced this pull request Aug 15, 2017

Move base64 algorithms to Infra

9008ac9

See whatwg/infra#145 for the change to the Infra Standard. Closes #2912.

alice pushed a commit to alice/html that referenced this pull request Jan 8, 2019

Move base64 algorithms to Infra

a4c289f

See whatwg/infra#145 for the change to the Infra Standard. Closes whatwg#2912.
