redact location.ancestorOrigins according to Referrer Policy #1918

Open
hillbrad opened this issue Oct 17, 2016 · 43 comments

Comments

@hillbrad

commented Oct 17, 2016

@bzbarsky @dakami and I had a hallway discussion at the end of TPAC about the possibility of adding location.ancestorOrigins to Firefox. bz has had longstanding concerns about the information this leaks to child frames. We arrived at a local consensus that any leakage is roughly equivalent to what happens already with referrer, so it would make sense to redact ancestorOrigins according to referrer policy. (and this could resolve that objection to a Mozilla implementation of ancestorOrigins)

/cc @smaug---- @annevk

@domenic

Member

commented Oct 17, 2016

One big question, which I asked in the PR, is what does "redact" mean. Since it's an origin instead of a URL, several of the referrer policies don't really apply (e.g. maybe they're no-ops). If it gets censored completely (e.g. if the referrer policy is "no-referrer"), then does the resulting array contain null? The empty string? Or is that entry just missing, so that the number of entries in the array is less than the number of ancestor browsing contexts? We'll need a comprehensive spec for (origin, referrer policy) -> censored origin.

Otherwise, I think we'd need to get a sense of what other user agents besides Firefox would be interested in this spec change. I guess only Chrome implements both referrer policy and ancestorOrigins, so... @mikewest, perhaps?

As for WebKit and Edge, which don't implement referrer policy but do implement ancestorOrigins: does this sound reasonable to you, as something you would do if/when you eventually implemented referrer policy? Leaving aside any commitments to implementing referrer policy. Tagging the usual suspects... @cdumez @travisleithead. Please route to more appropriate people as necessary.

@bzbarsky

Collaborator

commented Oct 17, 2016

The idea is that if the referrer policy allows the origin to leak out via the referrer (which I believe all policies except "no-referrer" do) then we should just go ahead and return the origin in ancestorOrigins. So this is really about the "no-referrer" case, plus any browser configuration that has equivalent effects.

As for what value should be used in the "no-referrer" case, I don't have a strong opinion. Obvious options are "", null, "null" (this last as if the actual origin were a unique origin). Using "null" feels somewhat nice to me in that it's a situation that could arise even without the referrer policy business, so pages should be ready for it anyway. Using null would worry me in terms of pages getting exceptions when trying to string-manipulate the array entries.
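
A minimal sketch of the mapping being discussed, assuming the "null" option is picked for the "no-referrer" case. The names `PolicyName`, `redactAncestorOrigin`, and `isHttpsEntry` are hypothetical and do not come from any spec or browser; the point is only to show why a string "null" is friendlier to page code than a real null.

```typescript
// Hypothetical subset of referrer policy names; not an exhaustive list.
type PolicyName = "no-referrer" | "origin" | "strict-origin" | "unsafe-url";

// Returns the string a child frame would see for one ancestor.
// Assumption: only "no-referrer" censors the entry, and the censored
// value is the string "null" (how an opaque origin serializes).
function redactAncestorOrigin(ancestorOrigin: string, policy: PolicyName): string {
  return policy === "no-referrer" ? "null" : ancestorOrigin;
}

// Every entry stays a string, so ordinary string handling keeps working
// on censored entries; a real null would throw on a call like this.
function isHttpsEntry(entry: string): boolean {
  return entry.indexOf("https://") === 0;
}
```

With a real null in the array, a call such as `entry.indexOf(...)` would raise a TypeError, which is the exception risk mentioned above.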

@hillbrad

Author

commented Oct 17, 2016

I should write some test cases, but isn't the null case already possible today with GUID URL schemes (data:, file:, etc.)? And implicitly handled, as with CORS, by serializing to the string literal "null" according to RFC 6454?
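
As a rough illustration of the "null" serialization, the standard `URL` API already reports the string "null" for the origin of a data: URL. The helper name `serializedOrigin` is made up for this sketch, and it only exercises URL parsing; the origin of a document actually loaded in a frame is a separate matter.

```typescript
// The URL API serializes the opaque origin of a data: URL as the string
// "null", while an https: URL yields its scheme-and-host origin.
function serializedOrigin(url: string): string {
  return new URL(url).origin;
}
```

For example, `serializedOrigin("data:text/plain,hello")` gives "null", while `serializedOrigin("https://example.com/a/b?c=d")` gives "https://example.com".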

@domenic

Member

commented Oct 17, 2016

"null" sounds pretty good. (And it's according to the Unicode serialization of an origin, not some RFC ;).) But yeah, the PR as written just asks for the origin of the URL no-referrer, so we gotta straighten that out.

@hillbrad

Author

commented Oct 17, 2016

Well, this could be defined as basically a switch on the referrer policy states (which might be the most logical internal implementation choice), but I thought that calling out to the algorithm to produce a referrer and then extracting the origin via URL parsing would be more future compatible with new policy states that might be defined. I can revisit if that seems preferable.

@domenic

Member

commented Oct 17, 2016

IMO a switch makes the most sense, but adding it to the Referrer Policy spec would be best, since that ensures that whenever they add new policies they'll see that they need to update that algorithm as well.

@bzbarsky

Collaborator

commented Oct 17, 2016

The referrer may or may not be related to the origin in general (e.g. for a sandboxed iframe the referrer is based on its URL, but the origin is a unique origin). So going via some sort of "extract the referrer" algorithm to get a value to use in ancestorOrigins, as is done in this PR, isn't right.

@hillbrad

Author

commented Oct 18, 2016

@bzbarsky

Collaborator

commented Oct 19, 2016

One thing that I'd like to check on, actually. What should happen if a page at origin A loads a subframe from origin A which then loads a page from origin B, if the original page is sending full referrers but the subframe is using the no-referrer policy?

@hillbrad

Author

commented Oct 19, 2016

I haven't spec'd it as a barrier or ratchet, but as an individual query from a Location to each ancestor, independent of any intermediate contexts and their policy states.

@bzbarsky

Collaborator

commented Oct 19, 2016

OK, but that will leak the origin of the topmost page in this case, when it should be able to have a reasonable expectation of no such leakage occurring, right?

@hillbrad

Author

commented Oct 19, 2016

Is that a reasonable expectation? Or should it set its own policy if it is concerned? Specifying a ratchet is much more difficult, btw, as the referrer policy options don't have a strict ordering.

@bzbarsky

Collaborator

commented Oct 19, 2016

> Is that a reasonable expectation?

As long as it's only loading things it controls, I think it is, yes. This way the decision as to whether to allow the origin to escape only has to be made in the page that actually loads cross-site things.

> Specifying a ratchet is much more difficult, btw

I'm not sure what you mean by "ratchet" here, but two simple things to specify would be that once you hit no-referrer you either insert a single "null" and terminate or insert "null" for everything else up the frame chain. This isn't as nice as doing more complicated checks about same-originness, I agree.
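
The difference between the per-ancestor model and the two simpler alternatives can be sketched as follows. This is an illustration under stated assumptions, not spec text: `AncestorInfo` and the three function names are hypothetical, ancestors are listed nearest parent first, and only "no-referrer" is modelled (as a boolean).

```typescript
interface AncestorInfo {
  origin: string;      // serialized origin of the ancestor document
  noReferrer: boolean; // true if that document uses "no-referrer"
}

// Per-ancestor model: each ancestor is checked independently.
function independentList(ancestors: AncestorInfo[]): string[] {
  return ancestors.map((a) => (a.noReferrer ? "null" : a.origin));
}

// Alternative 1: the first no-referrer ancestor adds one "null", then stop.
function truncatingList(ancestors: AncestorInfo[]): string[] {
  const result: string[] = [];
  for (const a of ancestors) {
    if (a.noReferrer) {
      result.push("null");
      break;
    }
    result.push(a.origin);
  }
  return result;
}

// Alternative 2: from the first no-referrer ancestor upward, all "null".
function maskingList(ancestors: AncestorInfo[]): string[] {
  let hidden = false;
  return ancestors.map((a) => {
    hidden = hidden || a.noReferrer;
    return hidden ? "null" : a.origin;
  });
}
```

For the scenario raised earlier (a top page at origin A sending full referrers, a same-origin subframe using no-referrer, and a leaf at origin B), the leaf's ancestors are the subframe followed by the top page. The per-ancestor model yields "null" followed by A's origin, so the top page's origin still reaches B, while the two alternatives yield a single "null" or two "null" entries respectively.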

@bzbarsky

Collaborator

commented Jan 10, 2017

Note the more clearly articulated proposal I made for this in w3c/webappsec-referrer-policy#77 (comment). I thought @hillbrad was going to convert that to an HTML spec issue, but that didn't seem to happen...

Anyway, I would love feedback from Blink and WebKit on whether the change I propose is something they would implement, and feedback from Edge on whether they're interested in implementing this at all, and if so under what conditions.

@annevk

Member

commented Feb 20, 2017

Copying @RByers, @cdumez, @travisleithead to get input from Blink, WebKit, and Edge. Would be nice to make some progress here.

@foolip

Member

commented Mar 7, 2017

For Blink, perhaps @dominiccooney or @mikewest could comment?

@mikewest

Member

commented Mar 7, 2017

@jeisinger and @estark37 are Blink's referrer policy folks, and will likely have opinions.

@jeisinger

Member

commented Mar 7, 2017

What I like about @bzbarsky's proposal is that it only indirectly uses referrer policy - referrer policy ideally should only affect the referrer. Of course using the referrer afterwards for whatever is fine.

I think we'd implement this if that means that Firefox will ship ancestorOrigins, and the API is still good enough to achieve the kind of protection @hillbrad et al. need.

annevk added a commit that referenced this issue Mar 29, 2017

domenic added a commit that referenced this issue Mar 29, 2017

annevk added a commit to web-platform-tests/wpt that referenced this issue Mar 30, 2017

Basic ancestorOrigins test and Location IDL update
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.

jgraham added a commit to w3c/testharness.js that referenced this issue Mar 31, 2017

Remove usage of ancestorOrigins
WebKit throws the same exception as other browsers these days and
ancestorOrigins might become less reliable due to
whatwg/html#1918.

@annevk

Member

commented Apr 3, 2017

Update: the model I went with was wrong. I've now adjusted it (only in a comment on the PR thus far) to what @bzbarsky proposed. I was wondering if anyone had any opinions on whether we want to reveal all ancestors or not. I guess you can already tell how many parents you have anyway through parent and top, so we probably shouldn't worry about that at all.
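
The observation that a frame can already count its ancestors can be illustrated with a small sketch. `FrameLike`, `makeTopFrame`, and `makeChildFrame` are hypothetical stand-ins so the example runs outside a browser; in a page, the same walk would follow `window.parent` until it equals the current window.

```typescript
// Minimal stand-in for a window: at the top level, `parent` is the frame
// itself, mirroring window.parent === window in a top-level context.
interface FrameLike {
  parent: FrameLike;
}

// Walks up the parent chain and counts how many ancestors exist.
function countAncestors(frame: FrameLike): number {
  let depth = 0;
  let current = frame;
  while (current.parent !== current) {
    current = current.parent;
    depth += 1;
  }
  return depth;
}

// Test helpers for building a chain without a browser.
function makeTopFrame(): FrameLike {
  const node = {} as FrameLike;
  node.parent = node;
  return node;
}

function makeChildFrame(parentFrame: FrameLike): FrameLike {
  return { parent: parentFrame };
}
```

So redacting entries to "null" rather than dropping them reveals nothing about nesting depth that the frame could not already work out.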

@johnwilander

commented Apr 4, 2017

Sorry for the delay. WebKit will obviously have to implement the Referrer Policy to support this opt-out but I think it takes us in the right direction.

Did we ever consider an off by default model instead? Are we saying too much relies on ancestorOrigins today? If Mozilla hasn't implemented yet the web can't be completely solidified on existing behavior. If we went off by default we wouldn't have to add a side effect to the existing no-referrer policy. We could for instance add an "; ancestorOrigins" attribute to referrer policies.

@annevk

Member

commented Apr 4, 2017

I don't think that's been considered, but by basing it on the referrer actually being transmitted, we only leak as much as the network does, although for more contrived scenarios it might leak a little more by default I suppose.

@bzbarsky

Collaborator

commented Apr 4, 2017

> Did we ever consider an off by default model instead?

Yes. But given Blink and WebKit's refusal to even discuss this topic for years, we had to, without their input, come up with something that we felt they would be most likely to implement, hence a minimal change from what they are doing right now.

I'm happy to consider an opt-in if it still solves the use cases this property is trying to solve. If I understood correctly, doing this as an opt-in would require changes to pretty much every site that embeds Google and Facebook ads to opt in or something.

annevk added a commit to web-platform-tests/wpt that referenced this issue Apr 6, 2017

Basic ancestorOrigins test and Location IDL update
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.

annevk added a commit that referenced this issue Apr 19, 2017

Redact ancestorOrigins using document's referrer policy
Also rewrite the algorithm to avoid loops and use variables correctly.

Tests: web-platform-tests/wpt#5402.

Fixes #1918.

annevk added a commit to web-platform-tests/wpt that referenced this issue Apr 21, 2017

Basic ancestorOrigins test and Location IDL update
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.

annevk added a commit to web-platform-tests/wpt that referenced this issue Apr 21, 2017

Basic ancestorOrigins test and Location IDL update
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.

annevk added a commit to web-platform-tests/wpt that referenced this issue May 16, 2017

ancestorOrigins: add tests and update Location IDL
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.

annevk added a commit that referenced this issue Feb 4, 2018

Redact ancestorOrigins using document's referrer
Also rewrite the algorithm to avoid loops and use variables correctly.

Tests: web-platform-tests/wpt#5402.

Fixes #1918.

annevk added a commit that referenced this issue Feb 5, 2018

Redact ancestorOrigins using document's referrer
Also rewrite the algorithm to avoid loops and use variables correctly.

Tests: web-platform-tests/wpt#5402.

Fixes #1918.

annevk added a commit to web-platform-tests/wpt that referenced this issue Feb 5, 2018

ancestorOrigins: add tests and update Location IDL
See whatwg/html#1918 for the HTML Standard
discussion and whatwg/html#2480 for the HTML
Standard change.

@jeffreytgilbert

commented Aug 29, 2018

For what it's worth, the idea of respecting a referrer policy set by the domains in the ancestry chain is great, but neither ancestorOrigins nor the requested change goes far enough in either direction. A full URL should be available in ancestorOrigins: a domain on its own is no more or less secure, because information about a person can be gleaned from the domain plus some number of other data points, so truncating it doesn't make much sense for user privacy if we're being strict here. Conversely, a domain (cnn.com) may be considered OK, but a page on that domain (cnn.com/vegas-shooting-kills-dozens-etc) may be considered not OK in a specific context.

On the other hand, the user has not indicated, and cannot indicate via a referrer policy set by the middlemen, that they don't want to leak information about the ancestor chain. That raises the question: should there be user-level controls for turning this information flow on or off?

In my opinion, this requires a multi-part solution in which the user can turn off the behavior, as can sites (content providers) that manage relationships with one another, but the location.href chain should be opened up fully where no restrictions are explicitly called for. The primary case for doing this, from a supply-chain perspective, is being assured that the message and markup you're delivering are not being framed in an inappropriate context. Advertisers, for instance, may have strict policies against placing their brand next to content related to pornography or extreme violence. This information, when locked away behind cross-origin chains of iframes, becomes unknowable.

On the other hand, if a user jumps into "in private" mode and stops this information from leaking to chains of iframes, a disabled chain of unknowable origins should be enough of an indicator for an advertiser that maybe the risk isn't worth the buy opportunity, and the end user's experience and privacy are preserved.

alice added a commit to alice/html that referenced this issue Jan 8, 2019

@dliebner

commented May 14, 2019

The current WebKit implementation is helpful to ad tech because it helps determine the validity of the embed. An advertisement can be chained from the original site through multiple intermediary iframes before the bottom-level ad content finally renders; this is normal when an ad request goes through multiple ad networks before arriving at a served ad. What ad tech wants to detect is when an ad is being served on an unwanted domain, or if something else is generally amiss in the chain of ancestors. Failure to make this information available makes it easier for bad actors to commit ad fraud.

@bzbarsky

Collaborator

commented May 14, 2019

Sure, and ad tech could just treat "no available ancestorOrigins" as "bad actor" for its purposes. Then sites can decide whether they want to leak their origin to their subframes (and allow ad tech in there) or not, right?
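
A rough sketch of that kind of check, with assumptions stated up front: `embedLooksTrusted` and its allowlist parameter are hypothetical, entries censored to "null" count as untrusted, and an empty list is also rejected on the assumption that the ad always expects to be framed.

```typescript
// Returns true only when every ancestor entry is a known, allowed origin.
// Assumptions: censored entries appear as the string "null", and an empty
// list (no ancestors at all) is rejected because an ad expects a parent.
function embedLooksTrusted(ancestorOrigins: string[], allowed: string[]): boolean {
  if (ancestorOrigins.length === 0) {
    return false;
  }
  return ancestorOrigins.every(
    (entry) => entry !== "null" && allowed.indexOf(entry) !== -1
  );
}
```

In a browser that exposes the attribute, the first argument could be built with `Array.from(location.ancestorOrigins)`.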

@dliebner

commented May 14, 2019

I'm a little confused by the attitude that a parent frame should remain anonymous to its subframes. If a site is being embedded by another site, don't they deserve to know by who? In what legitimate scenario does a site embed an iframe (or a chain of iframes) and need to be anonymous?

@annevk

Member

commented May 14, 2019

As a reminder, there's a HTML PR for this at #2480 and a WPT PR at web-platform-tests/wpt#5402.

@othermaciej @johnwilander I suspect Safari picking this up would make it more likely for Firefox to ship this too (it currently does not expose this attribute at all).

@bzbarsky

Collaborator

commented May 14, 2019

> If a site is being embedded by another site, don't they deserve to know by who?

Imo, no. If it doesn't want to be framed, it has ways to avoid being framed, yes?

My usual go-to example here is that imo a site should be able to embed a video from a video hosting site without exposing information about itself to a video hosting site. Under the assumption that the video hosting site allows such framing, of course.

@opyh

commented May 21, 2019

> What ad tech wants to detect is when an ad is being served on an unwanted domain, or if something else is generally amiss in the chain of ancestors. Failure to make this information available makes it easier for bad actors to commit ad fraud.

A person who visits a political or health blog doesn't want these URLs to be shared with giphy, facebook, and every adtech company on the planet.

While it's understandable that adtech companies want to know my political views and if I have cancer or not (and as a side effect, can prevent ad fraud more easily), as a user I'd like to have a choice if my browser sends this very personal information. Embed providers are not entitled to it. They should be able to choose who can embed them (possible with frame-ancestors), and users should be able to choose whom they want to share information with.
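
For reference, the frame-ancestors mechanism mentioned above is a Content-Security-Policy directive set by the embedded site. The helper below is a hypothetical sketch of building that header value; the function name and the choice to fall back to 'none' for an empty list are assumptions made for illustration.

```typescript
// Builds a Content-Security-Policy value restricting who may frame a page.
// Assumption: an empty allowlist means the page should not be framed at
// all, which CSP expresses as frame-ancestors 'none'.
function frameAncestorsPolicy(allowedOrigins: string[]): string {
  const sources = allowedOrigins.length > 0 ? allowedOrigins.join(" ") : "'none'";
  return "frame-ancestors " + sources;
}
```

An embed provider would send the result as the value of its `Content-Security-Policy` response header, for example `frame-ancestors https://partner.example`.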

@dliebner

commented May 21, 2019

My counter point is that blocking-by-default will effectively block the majority of ancestor data to ad tech because you can't expect developers to go out of their way to add/enable allow-policies. From the ad tech point of view, if you can't reliably see the ancestors, you can't reliably detect fraud.

With regard to your privacy concerns: 1) not all ad tech companies are interested in invading your privacy (although, sure, most probably are); 2) if that's something you're worried about, ad blockers are fairly effective; and 3) if the sites you're visiting are of a sensitive nature and are embedding advertisements and you're concerned about your privacy, perhaps you should be evaluating those sites and their choice of ad partners.

I am someone who is building an ad tech company who is not interested in tracking individual users, and I need tools to detect, prevent and deter ad fraud.

@opyh

commented May 22, 2019

I have worked in adtech myself, on several sides of the ecosystem – adtech developers are used to much more painful things than adding allow policies to websites ;) So you can expect developers to do this.

You can’t demand that a normal person using a browser know what's going on behind the scenes. If I, as a software developer, have no means to see which health site tracks me and which doesn’t, how is a non-IT person supposed to understand this?

It's the standard’s job to help create browsers that protect me from bad actors, no matter whether I have an ad blocker or not.

If ad fraud can't be detected without complete surveillance, so be it? The ad industry is free to adopt business models that don’t simplify privacy fraud. If a user explicitly wants to be tracked in exchange for freebies, they'd still be free to configure their browser accordingly.

Thanks for your counter arguments – I'm out of this discussion, and I hope that this issue can be solved in a way that doesn't hand my browser history over to random companies as a default.

@michael-oneill

commented May 22, 2019

Browsers can determine if the user is a bot or not, at least as well as any external service. If this is communicated in a privacy-preserving way, then fraud could be detected more effectively without having to rely on surveillance.
https://github.com/w3c/web-advertising/blob/master/admetrics.md

@dliebner

commented May 22, 2019

> Browsers can determine if the user is a bot or not, at least as well as any external service. If this is communicated in a privacy-preserving way, then fraud could be detected more effectively without having to rely on surveillance.
> https://github.com/w3c/web-advertising/blob/master/admetrics.md

That is useful, but the issue I'm talking about is running ads that are supposed to only be served on one site and running them on another site. The people seeing the ads will be legitimate users, but how will the ad tech know if the ads are being served on the intended site without the ancestor list?

@michael-oneill

commented May 22, 2019

In this proposal the browser determines whether the ads are being shown on the intended site; the ad tech only gets metrics from the Metrics Server, e.g. Nielsen or similar. Anything invalid gets ignored.
