-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redact ancestorOrigins using "the document's referrer" #2480
base: main
Are you sure you want to change the base?
Conversation
I meant to write: "to avoid labels". My bad. This can be fixed when landing, when we'll also need to add a link to web-platform-tests. |
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
@annevk Is there a reason you went with checking the referrer policy instead of the proposal in w3c/webappsec-referrer-policy#77 (comment) or something along those lines? The reason I ask is that using the document's referrer policy fails because we can have per element referrer policies. See discussion in w3c/webappsec-referrer-policy#77. I know it's long, but you really should read through it to figure out the various constraints here. :( |
Note that I think we could still have the "stop the first time there is a mismatch" behavior with that proposal too, as long as we are very careful to handle the sandboxed iframe case to NOT be treated as a mismatch. |
What other than taking the iframe element's referrer policy into account is listed there? Tried skimming through it several times, but not really coming up with something. (Accounting for iframes seems trivial, but wasn't mentioned in #1918 so I didn't think about it.) |
Here's a scenario not accounted for even if you consider the iframe element's referrer policy. Say page A at origin 1 loads a same-origin page B in a subframe. A and the relevant iframe element have no referrer policy settings, but B has a referrer policy to not send referrers. The user clicks a link in B to go to a page C, which is at at different origin. C looks at ancestorOrigins. In this situation, I claim that A (and B) have a reasonable expectation of not leaking referrer or ancestorOrigins information to C. My proposal handles this case. Examining only referrer policies of anything in A does not. Note, by the way, that #1918 (comment) explicitly said the Blink folks like the idea of tying this to referrer, not referrer policy. |
Hmm okay, so we'd tie it to https://html.spec.whatwg.org/#the-document's-referrer instead. That should be equally simple. |
(That also means that my testcase where I dynamically change the referrer policy needs to change.) |
@bzbarsky time for another look? If this is acceptable than I'll poke at the tests again. |
source
Outdated
context</span>.</p></li> | ||
<ol> | ||
<li><p>If <var>current</var>'s <span>active document</span>'s <span data-x="the document's | ||
referrer">referrer</span> is the empty string, then <span data-x="list append">append</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this doesn't work right.
Say I have a parent A, which has a no-referrer referrer policy. It loads an untrusted thing B. B then proceeds to itself load C, and C queries ancestorOrigins. Since the load of C came from B, and B doesn't have a no-referrer thing going on, C has a nonempty referrer. So it gets access to the information that it's loaded inside A.
source
Outdated
<li><p>Let <var>current</var> be <var>current</var>'s <span>parent browsing | ||
context</span>.</p></li> | ||
<ol> | ||
<li><p>If <var>current</var>'s <span>active document</span>'s <span data-x="the document's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is preexisting, but I expect that use of "active document" here is wrong, in the "security bug, cross-site information leak" sense. Just like most use of "active document" for pretty much everything is wrong in that "information leak" sense.
Consider page A that loads untrusted page B in a subframe. B creates a same-origin (with B) subframe C. C opens a new window and hands that window its Location object.
Now that window polls ancestorOrigins on the object it was handed. If A navigates its subframe to some new site D, then C's browsing context (which may still be around if things were salvageable) still has B-and-D's browsing context as parent, but its active document is D. So the polling window, which is same-origin with B and C but not A or D, would get to find out that A loaded D, which is a cross-site information leak.
We really need to be walking the creator document chain, not the browser context chain, as should pretty much everyone else who walks up to parents. And to the extent that this is more complicated than just walking up the browser context chain, we should fix that. For example, maybe we should have a concept of "parent document" for documents that are loaded in nested browsing contexts or something, which is the creator document of the document's browsing context.
I tried to reword your algorithm in something that's closer to standardese:
I'll look into the "parent document" abstraction now. |
On IRC we also discussed about returning "null" for the first failure: bz:
I asked:
bz:
|
Okay, as for creator/parent document, we used to have something like that, but reduced it to various properties to avoid leaking too much. See #792 which was inspired by a comment you made in w3c/webappsec-secure-contexts#18 (comment). That still seems like a good idea so maybe we should instead grab those properties from the browsing context rather than from the active document. |
This is a much stronger condition on "going back to appending things other than null" than in my proposal. I expect this condition will pretty much never be met. Is this intentional? Good catch on no longer having creator documents. Snapshotting things on the browsing context will be a bit complicated in this case, I think. Maybe it would be simpler to just have this API throw, or return some sort of nonsense if called on a document that is not "fully active" or something? I'm not entirely sure how "fully active" is defined (or whether it is) if we no longer have a concept of parent documents... |
You wrote:
That basically means it has to be an opaque origin, right? How could it be "null" otherwise? I don't understand why this case is more complicated with respect to snapshotting on the browsing context. How is this different from secure contexts or some such? (As far as I can tell we already snapshot the appropriate bits.) Fully active is defined in terms of the document being the active document and parent browsing context's their documents also being fully active. |
Ah, I see. I was unclear. My proposal had:
but it's missing a comma. It should be:
That is, in my proposal when you enter the appendNull state, you remember the current "parent's origin" and you stay in the appendNull state until you reach a "parent's origin" that is not same-origin with the one you remembered. Conceptually, sanitize out all the parts of the parent chain that belong to the site that did not want to be leaked. A bunch of this should probably go in informative text in the spec or something, because reconstructing all this reasoning from the algorithm is hard. :(
OK. How about you write down what you think should be snapshotted, and we'll see whether I was wrong. I might just be mis-modeling this in my head.
Secure contexts require snapshotting one single bit of information. This thing here would require snapshotting a list of stuff, right?
OK, so we still have a concept of a browsing context having a parent document, right? That's the one I was suggesting we use for this... (Obviously we have to handle cases of missing browsing contexts and whatnot.) |
No, we have a browsing context having a parent browsing context (which has "snapshotted" creator info). We don't have a parent document. |
They I don't understand how you define "fully active". |
The definition is at: https://html.spec.whatwg.org/multipage/browsers.html#fully-active. |
OK, so we have this concept of "the document through which it is nested". That's the thing that clearly needed to exist to make "fully active" definable, sure. But the point is, we have such a concept. |
Sigh. So we put a bit of effort into putting "creator" slots on browsing contexts, but it's rather pointless as the entire "creator" is leaked anyway through "browsing context container" and "nested through". (And of course, how any of this relates to the lifecycle of all these objects nobody really knows.) |
So one difference is that "nested through" only gives you parents, not openers, so it's different in that sense. I filed #2508 to see if we can restructure that better. |
So looking at this again, I'm actually not sure the active document stuff is a problem, since we instantiate this list (and cache it "forever") when a Location object is created. Is there a scenario under which a Location object can be created without a browsing context and an associated Document that is fully active? And to be clear, the case we are worried about is documents that are earlier or later than current in a browsing context's session history (or documents like that in the chain going up)? |
Question: if origin A embeds B and then B embeds A end then that A embeds C without referrer, should we leak the first A to C? I believe your algorithm says yes (and then one I just committed), due to B, but I can also see how it would be reasonable to make it say no (we'd have to replace lastRedactedOrigin with redactedOrigins and do the check for each origin in that set I think). |
Per spec? I'm not sure. What happens if you insert an In UAs, I think this can't happen. Again, not 100% sure. :(
I think doing so is fine, because B could have done that anyway. For example, C could do (Things are more complicated if the same entity controls both A and B, e.g. both subdomains of the same domain. The security model of the web slightly falls down here.) |
In case the Location object creation can happen without a browsing context or without the document being fully active due to prerendering or some such, I'm wondering how many other assumptions such a feature violates in the standard today. I'm not sure anything is actively designed with such a feature in mind. |
@bzbarsky you can have a look at the generated output at https://html5.org/temp/ancestororigins.html#concept-location-ancestor-origins-list (I only uploaded the page of multipage that changed, there's broken links all over there). |
/cc @mikewest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nit
source
Outdated
context</span>.</p></li> | ||
<ol> | ||
<li><p>Append <var>current</var>'s <span>parent browsing context</span>'s <span>active | ||
document</span>'s <span>origin</span>/<var>current</var>'s <span>active document</span>'s <span |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the slash notation here is unclear. It would be clearer if both items are variables, like "Append origin/referrer to parentOrigins".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now @bzbarsky commented about this earlier also. I wouldn't mind dropping the slash notation altogether and just use (a, b) for pairs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh. I thought we already settled this upstream. If we don't want slashes we don't need pairs either. We could just call them all tuples, but we decided against that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems pretty weird to me to have the cognitive overhead of pairs separate from tuples with their own distinct syntax. Pairs are just not that special, imo, and two-element tuples are not exactly hard to reason about... why do we need a separate concept at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main rationale has been that @mikewest and I have used them all over. But I can try to get them removed all over too. It's just work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source
Outdated
<li><p>If <var>lastRedactedOrigin</var> is not null and <var>pair</var>'s <span data-x="pair | ||
parent's origin">parent's origin</span> is <span>same origin</span> with | ||
<var>lastRedactedOrigin</var>, then <span data-x="list append">append</span> | ||
"<code data-x="">null</code>" to <var>output</var> and continue.</p></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link to https://infra.spec.whatwg.org/#iteration-continue? Elsewhere we have both "continue running these steps in parallel", "Continue to the next step" and "continue with these steps" so I'm not confident everyone would read this as intended. Linking "for each", "continue" and "break" everywhere is going to be tedious, of course...
So all we're lacking is review of the tests. |
Thanks, that helps a lot. I think the Note that explains why use of "active document" is not a security hole here should point out that the active document is in fact the document that things are nested through at the point when this algorithm runs. The spec says to append "the Unicode serialization of pair's parent's origin". I just tested on http://кто.рф/ , and Chrome and Safari both seem to append punycode. Is that expected? This matters in terms of what strings people are comparing this stuff to. I expect in practice if anyone is using this stuff they already depend on the Chrome/Safari behavior so we probably need to append the ASCII serialization, not the Unicode one.... That would be consistent with how the spec treats location.href too, I believe, even though it all really sucks for non-English speakers. :( |
It would be consistent with |
Because they hate IDN, more or less. Fwiw, location.origin on http://кто.рф/ returns "http://xn--j1ail.xn--p1ai" in both Chrome and Safari. So does self.origin in Chrome. I guess we should get some bugs filed on them... |
Let's not ascribe motive. It's unnecessary and not obviously true either, as to end users Chrome and Safari display this domain just fine. Exposing the interchange format to software seems perfectly reasonable. That's also how we deal with UTF-8 percent-encoded paths of URLs across all browsers. I filed #2568 to sort this out. We can block on that here or deal with it in parallel. |
More seriously, Chrome's SecurityOrigin class only has one way to get a string out of it, and that one way returns punycode. See SecurityOrigin::BuildRawString.
Unless you make the mistake of copying from your URL bar. Then you get punycode... |
But yes, the "hate IDN" was mostly tongue-in-cheek... though they have in fact pushed back every single time on exposing anything but punycode anywhere on the web. |
Does that mean anything significant? The host parser in the URL Standard always returns ASCII too, but you can then apply a separate algorithm on that to get Unicode. Also, you first need to do ToASCII before you can do ToUnicode too to avoid some set of mismatches. |
Which "that"? |
|
Well, it means that any time someone needs a string for an origin they use the one hammer they have to hand, which is its ToString method. There's nothing else they can do with it. Yes, there could be another method around which then does the ToUnicode. It's not being called any of the places where you might think it would be called. And as a matter of API design, for something like a spec operation on an object you would probably want to actually have an API for doing it, if you planned to ever do it. |
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
1407609
to
ee7a6fb
Compare
Also rewrite the algorithm to avoid loops and use variables correctly. Tests: web-platform-tests/wpt#5402. Fixes #1918.
ee7a6fb
to
a08939f
Compare
See whatwg/html#1918 for the HTML Standard discussion and whatwg/html#2480 for the HTML Standard change.
Also rewrite the algorithm to avoid loops and use variables correctly.
Fixes #1918.
💥 Error: Wattsi server error 💥
PR Preview failed to build. (Last tried on Jan 15, 2021, 7:57 AM UTC).
More
PR Preview relies on a number of web services to run. There seems to be an issue with the following one:
🚨 Wattsi Server - Wattsi Server is the web service used to build the WHATWG HTML spec.
🔗 Related URL
If you don't have enough information above to solve the error by yourself (or to understand to which web service the error is related to, if any), please file an issue.