Redact ancestorOrigins using "the document's referrer" #2480

annevk · 2017-03-30T04:58:27Z

Also rewrite the algorithm to avoid loops and use variables correctly.

Fixes #1918.

💥 Error: Wattsi server error 💥

PR Preview failed to build. (Last tried on Jan 15, 2021, 7:57 AM UTC).

More

PR Preview relies on a number of web services to run. There seems to be an issue with the following one:

🚨 Wattsi Server - Wattsi Server is the web service used to build the WHATWG HTML spec.

🔗 Related URL

Command failed: /home/noderunner/wattsi/bin/wattsi /tmp/upload_0532f1c0a966a64b2a781d0bc15983ce (sha not provided) me5htz4o02 default /tmp/upload_71da5e0923fe04ca40b7ed523c540ca0

If you don't have enough information above to solve the error by yourself (or to understand to which web service the error is related to, if any), please file an issue.

annevk · 2017-03-30T05:03:38Z

I meant to write: "to avoid labels". My bad. This can be fixed when landing, when we'll also need to add a link to web-platform-tests.

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

bzbarsky · 2017-03-30T12:44:02Z

@annevk Is there a reason you went with checking the referrer policy instead of the proposal in w3c/webappsec-referrer-policy#77 (comment) or something along those lines?

The reason I ask is that using the document's referrer policy fails because we can have per element referrer policies. See discussion in w3c/webappsec-referrer-policy#77. I know it's long, but you really should read through it to figure out the various constraints here. :(

bzbarsky · 2017-03-30T12:48:50Z

Note that I think we could still have the "stop the first time there is a mismatch" behavior with that proposal too, as long as we are very careful to handle the sandboxed iframe case to NOT be treated as a mismatch.

annevk · 2017-03-30T12:50:38Z

What other than taking the iframe element's referrer policy into account is listed there? Tried skimming through it several times, but not really coming up with something. (Accounting for iframes seems trivial, but wasn't mentioned in #1918 so I didn't think about it.)

bzbarsky · 2017-03-30T12:55:56Z

Here's a scenario not accounted for even if you consider the iframe element's referrer policy.

Say page A at origin 1 loads a same-origin page B in a subframe. A and the relevant iframe element have no referrer policy settings, but B has a referrer policy to not send referrers. The user clicks a link in B to go to a page C, which is at at different origin. C looks at ancestorOrigins.

In this situation, I claim that A (and B) have a reasonable expectation of not leaking referrer or ancestorOrigins information to C. My proposal handles this case. Examining only referrer policies of anything in A does not.

Note, by the way, that #1918 (comment) explicitly said the Blink folks like the idea of tying this to referrer, not referrer policy.

annevk · 2017-03-30T12:59:56Z

Hmm okay, so we'd tie it to https://html.spec.whatwg.org/#the-document's-referrer instead. That should be equally simple.

annevk · 2017-03-30T13:00:38Z

(That also means that my testcase where I dynamically change the referrer policy needs to change.)

annevk · 2017-03-30T15:40:38Z

@bzbarsky time for another look? If this is acceptable than I'll poke at the tests again.

bzbarsky · 2017-03-31T15:28:03Z

source

-   context</span>.</p></li>
+    <ol>
+     <li><p>If <var>current</var>'s <span>active document</span>'s <span data-x="the document's
+     referrer">referrer</span> is the empty string, then <span data-x="list append">append</span>


No, this doesn't work right.

Say I have a parent A, which has a no-referrer referrer policy. It loads an untrusted thing B. B then proceeds to itself load C, and C queries ancestorOrigins. Since the load of C came from B, and B doesn't have a no-referrer thing going on, C has a nonempty referrer. So it gets access to the information that it's loaded inside A.

bzbarsky · 2017-03-31T15:34:13Z

source

-   <li><p>Let <var>current</var> be <var>current</var>'s <span>parent browsing
-   context</span>.</p></li>
+    <ol>
+     <li><p>If <var>current</var>'s <span>active document</span>'s <span data-x="the document's


This part is preexisting, but I expect that use of "active document" here is wrong, in the "security bug, cross-site information leak" sense. Just like most use of "active document" for pretty much everything is wrong in that "information leak" sense.

Consider page A that loads untrusted page B in a subframe. B creates a same-origin (with B) subframe C. C opens a new window and hands that window its Location object.

Now that window polls ancestorOrigins on the object it was handed. If A navigates its subframe to some new site D, then C's browsing context (which may still be around if things were salvageable) still has B-and-D's browsing context as parent, but its active document is D. So the polling window, which is same-origin with B and C but not A or D, would get to find out that A loaded D, which is a cross-site information leak.

We really need to be walking the creator document chain, not the browser context chain, as should pretty much everyone else who walks up to parents. And to the extent that this is more complicated than just walking up the browser context chain, we should fix that. For example, maybe we should have a concept of "parent document" for documents that are loaded in nested browsing contexts or something, which is the creator document of the document's browsing context.

annevk · 2017-04-03T12:02:36Z

I tried to reword your algorithm in something that's closer to standardese:

Let parentOrigins be a new list of pairs consisting of parent's origin and referrer.
While document has a parent document:
1. Append document's parent document's origin/document's referrer to parentOrigins.
2. Set document to document's parent document.
Let output be a new list.
Let appendNull be false.
For each pair in parentOrigins:
1. If appendNull is true, then:
  1. Append "null" to output.
  2. If pair's parent's origin is an opaque origin, then set appendNull to false.
  3. Continue.
2. If pair's referrer is the empty string, then set appendNull to true.
3. Otherwise:
  1. Let referrerURL be the result of parsing pair's referrer.
  2. Assert: referrerURL is not failure.
  3. If pair's parent's origin is not same origin with referrerURL's origin, then set appendNull to true.
  4. Otherwise, append pair's parent's origin, serialized, to output.
4. If appendNull is true, then:
  1. Append "null" to output.

I'll look into the "parent document" abstraction now.

annevk · 2017-04-03T12:09:20Z

On IRC we also discussed about returning "null" for the first failure:

bz:

I'm not sure
If we don't do that, we have to be very careful with the sandboxed iframe case
which in my proposal gets replaced with "null"
but of course it's already "null"
So that's fine.
But if we stopped at the first "mismatch", then we would break that case.
We could treat "mismatch" as "the first place where we would use null and the actual origin is not null"
And then think a bit about the various cases described in that issue and how they end up behaving.

I asked:

Anne van Kesteren I guess data URLs will also be like that?

bz:

yes

annevk · 2017-04-03T12:20:09Z

Okay, as for creator/parent document, we used to have something like that, but reduced it to various properties to avoid leaking too much. See #792 which was inspired by a comment you made in w3c/webappsec-secure-contexts#18 (comment).

That still seems like a good idea so maybe we should instead grab those properties from the browsing context rather than from the active document.

bzbarsky · 2017-04-04T02:59:48Z

If pair's parent's origin is an opaque origin, then set appendNull to false.

This is a much stronger condition on "going back to appending things other than null" than in my proposal. I expect this condition will pretty much never be met. Is this intentional?

Good catch on no longer having creator documents. Snapshotting things on the browsing context will be a bit complicated in this case, I think.

Maybe it would be simpler to just have this API throw, or return some sort of nonsense if called on a document that is not "fully active" or something? I'm not entirely sure how "fully active" is defined (or whether it is) if we no longer have a concept of parent documents...

annevk · 2017-04-04T05:57:26Z

You wrote:

until a different "origin-of-parent" is found with "null"

That basically means it has to be an opaque origin, right? How could it be "null" otherwise?

I don't understand why this case is more complicated with respect to snapshotting on the browsing context. How is this different from secure contexts or some such? (As far as I can tell we already snapshot the appropriate bits.)

Fully active is defined in terms of the document being the active document and parent browsing context's their documents also being fully active.

bzbarsky · 2017-04-04T13:12:05Z

That basically means it has to be an opaque origin, right? How could it be "null" otherwise?

Ah, I see. I was unclear. My proposal had:

replace the "origin-of-parent" part in that pair, and any pairs going up the tree until a different "origin-of-parent" is found with "null".

but it's missing a comma. It should be:

replace the "origin-of-parent" part in that pair, and any pairs going up the tree until a different "origin-of-parent" is found, with "null".

That is, in my proposal when you enter the appendNull state, you remember the current "parent's origin" and you stay in the appendNull state until you reach a "parent's origin" that is not same-origin with the one you remembered. Conceptually, sanitize out all the parts of the parent chain that belong to the site that did not want to be leaked.

A bunch of this should probably go in informative text in the spec or something, because reconstructing all this reasoning from the algorithm is hard. :(

I don't understand why this case is more complicated with respect to snapshotting on the browsing context.

OK. How about you write down what you think should be snapshotted, and we'll see whether I was wrong. I might just be mis-modeling this in my head.

How is this different from secure contexts or some such?

Secure contexts require snapshotting one single bit of information. This thing here would require snapshotting a list of stuff, right?

and parent browsing context's their documents also being fully active

OK, so we still have a concept of a browsing context having a parent document, right? That's the one I was suggesting we use for this...

(Obviously we have to handle cases of missing browsing contexts and whatnot.)

annevk · 2017-04-04T13:16:45Z

No, we have a browsing context having a parent browsing context (which has "snapshotted" creator info). We don't have a parent document.

bzbarsky · 2017-04-04T13:34:57Z

No, we have a browsing context having a parent browsing context (which has "snapshotted" creator info). We don't have a parent document.

They I don't understand how you define "fully active".

annevk · 2017-04-04T13:37:37Z

The definition is at: https://html.spec.whatwg.org/multipage/browsers.html#fully-active.

bzbarsky · 2017-04-04T14:34:49Z

OK, so we have this concept of "the document through which it is nested". That's the thing that clearly needed to exist to make "fully active" definable, sure. But the point is, we have such a concept.

annevk · 2017-04-05T09:23:58Z

Sigh. So we put a bit of effort into putting "creator" slots on browsing contexts, but it's rather pointless as the entire "creator" is leaked anyway through "browsing context container" and "nested through". (And of course, how any of this relates to the lifecycle of all these objects nobody really knows.)

annevk · 2017-04-05T09:53:38Z

So one difference is that "nested through" only gives you parents, not openers, so it's different in that sense. I filed #2508 to see if we can restructure that better.

annevk · 2017-04-05T10:38:24Z

So looking at this again, I'm actually not sure the active document stuff is a problem, since we instantiate this list (and cache it "forever") when a Location object is created. Is there a scenario under which a Location object can be created without a browsing context and an associated Document that is fully active?

And to be clear, the case we are worried about is documents that are earlier or later than current in a browsing context's session history (or documents like that in the chain going up)?

annevk · 2017-04-05T11:21:13Z

Question: if origin A embeds B and then B embeds A end then that A embeds C without referrer, should we leak the first A to C? I believe your algorithm says yes (and then one I just committed), due to B, but I can also see how it would be reasonable to make it say no (we'd have to replace lastRedactedOrigin with redactedOrigins and do the check for each origin in that set I think).

bzbarsky · 2017-04-05T13:34:01Z

Is there a scenario under which a Location object can be created without a browsing context and
an associated Document that is fully active?

Per spec? I'm not sure. What happens if you insert an <iframe> into a not-current but salvageable document?

In UAs, I think this can't happen. Again, not 100% sure. :(

should we leak the first A to C?

I think doing so is fine, because B could have done that anyway. For example, C could do parent.parent.postMessage to ask B for any information it wants, and B could respond with it... So A doesn't have any expectation that this information won't leak.

(Things are more complicated if the same entity controls both A and B, e.g. both subdomains of the same domain. The security model of the web slightly falls down here.)

annevk · 2017-04-05T13:36:21Z

In case the Location object creation can happen without a browsing context or without the document being fully active due to prerendering or some such, I'm wondering how many other assumptions such a feature violates in the standard today. I'm not sure anything is actively designed with such a feature in mind.

annevk · 2017-04-19T07:26:08Z

@bzbarsky you can have a look at the generated output at https://html5.org/temp/ancestororigins.html#concept-location-ancestor-origins-list (I only uploaded the page of multipage that changed, there's broken links all over there).

jeisinger · 2017-04-19T09:46:56Z

/cc @mikewest

zcorpan

LGTM % nit

zcorpan · 2017-04-19T12:00:30Z

source

-   context</span>.</p></li>
+    <ol>
+     <li><p>Append <var>current</var>'s <span>parent browsing context</span>'s <span>active
+     document</span>'s <span>origin</span>/<var>current</var>'s <span>active document</span>'s <span


I think the slash notation here is unclear. It would be clearer if both items are variables, like "Append origin/referrer to parentOrigins".

I see now @bzbarsky commented about this earlier also. I wouldn't mind dropping the slash notation altogether and just use (a, b) for pairs.

Sigh. I thought we already settled this upstream. If we don't want slashes we don't need pairs either. We could just call them all tuples, but we decided against that...

It seems pretty weird to me to have the cognitive overhead of pairs separate from tuples with their own distinct syntax. Pairs are just not that special, imo, and two-element tuples are not exactly hard to reason about... why do we need a separate concept at all?

I think the main rationale has been that @mikewest and I have used them all over. But I can try to get them removed all over too. It's just work.

foolip

I've done a review to check that I could parse all of the statements, but didn't try to understand what the algorithm is doing, trusting that @zcorpan and @bzbarsky have that covered.

foolip · 2017-04-20T19:07:11Z

source

+     <li><p>If <var>lastRedactedOrigin</var> is not null and <var>pair</var>'s <span data-x="pair
+     parent's origin">parent's origin</span> is <span>same origin</span> with
+     <var>lastRedactedOrigin</var>, then <span data-x="list append">append</span>
+     "<code data-x="">null</code>" to <var>output</var> and continue.</p></li>


Link to https://infra.spec.whatwg.org/#iteration-continue? Elsewhere we have both "continue running these steps in parallel", "Continue to the next step" and "continue with these steps" so I'm not confident everyone would read this as intended. Linking "for each", "continue" and "break" everywhere is going to be tedious, of course...

annevk · 2017-04-20T19:25:31Z

So all we're lacking is review of the tests.

bzbarsky · 2017-04-20T21:48:27Z

you can have a look at the generated output at https://html5.org/temp/ancestororigins.html#concept-location-ancestor-origins-list

Thanks, that helps a lot.

I think the Note that explains why use of "active document" is not a security hole here should point out that the active document is in fact the document that things are nested through at the point when this algorithm runs.

The spec says to append "the Unicode serialization of pair's parent's origin". I just tested on http://кто.рф/ , and Chrome and Safari both seem to append punycode. Is that expected?

This matters in terms of what strings people are comparing this stuff to. I expect in practice if anyone is using this stuff they already depend on the Chrome/Safari behavior so we probably need to append the ASCII serialization, not the Unicode one.... That would be consistent with how the spec treats location.href too, I believe, even though it all really sucks for non-English speakers. :(

annevk · 2017-04-21T05:00:20Z

It would be consistent with location.href, but not with location.origin, self.origin, and anything else that exposes origins so it's very much unclear to me why Chrome and Safari would do that.

bzbarsky · 2017-04-21T05:04:43Z

so it's very much unclear to me why Chrome and Safari would do that

Because they hate IDN, more or less.

Fwiw, location.origin on http://кто.рф/ returns "http://xn--j1ail.xn--p1ai" in both Chrome and Safari. So does self.origin in Chrome. I guess we should get some bugs filed on them...

annevk · 2017-04-21T05:11:57Z

Because they hate IDN, more or less.

Let's not ascribe motive. It's unnecessary and not obviously true either, as to end users Chrome and Safari display this domain just fine. Exposing the interchange format to software seems perfectly reasonable. That's also how we deal with UTF-8 percent-encoded paths of URLs across all browsers.

I filed #2568 to sort this out. We can block on that here or deal with it in parallel.

bzbarsky · 2017-04-21T05:13:39Z

More seriously, Chrome's SecurityOrigin class only has one way to get a string out of it, and that one way returns punycode. See SecurityOrigin::BuildRawString.

as to end users Chrome and Safari display this domain just fine.

Unless you make the mistake of copying from your URL bar. Then you get punycode...

bzbarsky · 2017-04-21T05:18:30Z

But yes, the "hate IDN" was mostly tongue-in-cheek... though they have in fact pushed back every single time on exposing anything but punycode anywhere on the web.

annevk · 2017-04-21T05:18:44Z

Does that mean anything significant? The host parser in the URL Standard always returns ASCII too, but you can then apply a separate algorithm on that to get Unicode. Also, you first need to do ToASCII before you can do ToUnicode too to avoid some set of mismatches.

bzbarsky · 2017-04-21T05:19:42Z

Does that mean anything significant?

Which "that"?

annevk · 2017-04-21T05:20:15Z

Chrome's SecurityOrigin class only has one way to get a string out of it

bzbarsky · 2017-04-21T05:23:17Z

Well, it means that any time someone needs a string for an origin they use the one hammer they have to hand, which is its ToString method. There's nothing else they can do with it.

Yes, there could be another method around which then does the ToUnicode. It's not being called any of the places where you might think it would be called. And as a matter of API design, for something like a spec operation on an object you would probably want to actually have an API for doing it, if you planned to ever do it.

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

Also rewrite the algorithm to avoid loops and use variables correctly. Tests: web-platform-tests/wpt#5402. Fixes #1918.

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk added normative change needs tests Moving the issue forward requires someone to write tests labels Mar 30, 2017

annevk added a commit to web-platform-tests/wpt that referenced this pull request Mar 30, 2017

Basic ancestorOrigins test and Location IDL update

e3bf13d

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk changed the title ~~Redact ancestorOrigins using document's referrer policy~~ Redact ancestorOrigins using "the document's referrer" Mar 30, 2017

bzbarsky suggested changes Mar 31, 2017

View reviewed changes

annevk mentioned this pull request Apr 5, 2017

Creator browsing context state #2508

Open

zcorpan approved these changes Apr 19, 2017

View reviewed changes

annevk mentioned this pull request Apr 19, 2017

Remove pairs whatwg/infra#127

Closed

foolip approved these changes Apr 20, 2017

View reviewed changes

annevk added a commit to web-platform-tests/wpt that referenced this pull request Apr 21, 2017

Basic ancestorOrigins test and Location IDL update

bb37533

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk added a commit to web-platform-tests/wpt that referenced this pull request Apr 21, 2017

Basic ancestorOrigins test and Location IDL update

5653bfc

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk added a commit to web-platform-tests/wpt that referenced this pull request May 16, 2017

ancestorOrigins: add tests and update Location IDL

f0d3e4c

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk force-pushed the annevk/redact-ancestororigins branch from 1407609 to ee7a6fb Compare February 4, 2018 09:30

Redact ancestorOrigins using document's referrer

a08939f

Also rewrite the algorithm to avoid loops and use variables correctly. Tests: web-platform-tests/wpt#5402. Fixes #1918.

annevk force-pushed the annevk/redact-ancestororigins branch from ee7a6fb to a08939f Compare February 5, 2018 15:33

annevk added a commit to web-platform-tests/wpt that referenced this pull request Feb 5, 2018

ancestorOrigins: add tests and update Location IDL

fb94c7f

See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.

annevk mentioned this pull request Apr 3, 2018

Track ancestorOrigins privacy issues w3c/ServiceWorker#1075

Open

annevk mentioned this pull request Apr 23, 2018

Cross-Origin-Resource-Policy (was: From-Origin) whatwg/fetch#687

Closed

annevk mentioned this pull request Aug 13, 2018

Define the ACHL cache and ACHL/ACH processing #3774

Open

annevk mentioned this pull request May 14, 2019

redact location.ancestorOrigins according to Referrer Policy #1918

Open

Base automatically changed from master to main January 15, 2021 07:56

domenic force-pushed the main branch from 0947339 to 1b5099f Compare July 27, 2021 17:07

fmarier mentioned this pull request Nov 7, 2023

[hackerone] Remove onion services from window.location.ancestorOrigins brave/brave-browser#32421

Closed

Redact ancestorOrigins using "the document's referrer" #2480

Are you sure you want to change the base?

Redact ancestorOrigins using "the document's referrer" #2480

Conversation

annevk commented Mar 30, 2017 • edited by pr-preview bot Loading

💥 Error: Wattsi server error 💥

annevk commented Mar 30, 2017

bzbarsky commented Mar 30, 2017

bzbarsky commented Mar 30, 2017

annevk commented Mar 30, 2017

bzbarsky commented Mar 30, 2017 • edited Loading

annevk commented Mar 30, 2017

annevk commented Mar 30, 2017

annevk commented Mar 30, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

annevk commented Apr 3, 2017

annevk commented Apr 3, 2017

annevk commented Apr 3, 2017

bzbarsky commented Apr 4, 2017

annevk commented Apr 4, 2017

bzbarsky commented Apr 4, 2017

annevk commented Apr 4, 2017

bzbarsky commented Apr 4, 2017

annevk commented Apr 4, 2017

bzbarsky commented Apr 4, 2017

annevk commented Apr 5, 2017

annevk commented Apr 5, 2017

annevk commented Apr 5, 2017

annevk commented Apr 5, 2017

bzbarsky commented Apr 5, 2017

annevk commented Apr 5, 2017

annevk commented Apr 19, 2017

jeisinger commented Apr 19, 2017

zcorpan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

foolip left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

annevk commented Apr 20, 2017

bzbarsky commented Apr 20, 2017

annevk commented Apr 21, 2017

bzbarsky commented Apr 21, 2017

annevk commented Apr 21, 2017

bzbarsky commented Apr 21, 2017

bzbarsky commented Apr 21, 2017

annevk commented Apr 21, 2017

bzbarsky commented Apr 21, 2017

annevk commented Apr 21, 2017

bzbarsky commented Apr 21, 2017

annevk commented Mar 30, 2017 •

edited by pr-preview bot

Loading

bzbarsky commented Mar 30, 2017 •

edited

Loading