Phasing out document.domain #829

Closed
annevk opened this issue Mar 8, 2016 · 34 comments
Labels
needs implementer interest (moving the issue forward requires implementers to express interest), normative change

Comments

@annevk
Member

annevk commented Mar 8, 2016

Based on discussion in w3c/webappsec-secure-contexts#10 with @mikewest, @hillbrad, @bholley, and @jwatt, hereby a concrete proposal for how we could phase out document.domain usage for new features:

FurthestSameOriginBrowsingContext ( document )

  1. Let start be document's browsing context.
  2. Let current be start.
  3. Let furthest be start.
  4. While current has a parent browsing context:
    1. Set current to current's parent browsing context.
    2. If current's Document object's origin is same origin with start's Document object's origin, set furthest to current.
  5. Return furthest.
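
A minimal Python sketch of the ancestor walk above. It assumes a hypothetical `BrowsingContext` class that holds only an origin string and a parent reference; neither the class nor the simplification of comparing origins as plain strings comes from the HTML Standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowsingContext:
    """Hypothetical stand-in for a browsing context (illustration only)."""
    origin: str                                  # origin of the active Document
    parent: Optional["BrowsingContext"] = None   # None for a top-level context


def furthest_same_origin_browsing_context(start: BrowsingContext) -> BrowsingContext:
    """Return the most distant ancestor-or-self whose origin equals start's origin."""
    current = start
    furthest = start
    while current.parent is not None:
        current = current.parent
        # Simplification: origins are compared as plain strings.
        if current.origin == start.origin:
            furthest = current
    return furthest


# A same-origin leaf nested inside a cross-origin frame.
top = BrowsingContext("https://a.example")
middle = BrowsingContext("https://b.example", parent=top)
leaf = BrowsingContext("https://a.example", parent=middle)
```

In this setup the walk skips over the cross-origin `middle` frame, so `leaf` resolves to `top`, while `middle` resolves to itself.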

DocumentDomainDisabler ( document, disable )

  1. Let unexpected be false, if disable is true, and true otherwise.
  2. Let furthest be FurthestSameOriginBrowsingContext(document).
  3. Let state be furthest's Document object's [[DisableDocumentDomain]] internal slot.
  4. If state's value is unexpected, then return false.
  5. Set state's value to disable.
  6. Return true.

Note: [[DisableDocumentDomain]] does not have an initial value.

document.domain will have these steps:

  1. If DocumentDomainDisabler(this Document object, false) is false, then throw a SecurityError exception.

New features will have these steps:

  1. If DocumentDomainDisabler(someDocument, true) is false, then ...
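
The first-to-use behaviour of DocumentDomainDisabler can be sketched in Python as follows. This is only an illustration: `SlotHolder` is a hypothetical stand-in for the Document returned by FurthestSameOriginBrowsingContext (the lookup itself is omitted), and `None` models the slot having no initial value.

```python
from typing import Optional


class SlotHolder:
    """Hypothetical stand-in for the furthest same-origin Document."""

    def __init__(self) -> None:
        # [[DisableDocumentDomain]] has no initial value; None models that.
        self.disable_document_domain: Optional[bool] = None


def document_domain_disabler(holder: SlotHolder, disable: bool) -> bool:
    """Return False if the opposite choice was already recorded, else record this one."""
    unexpected = not disable
    if holder.disable_document_domain is unexpected:
        return False
    holder.disable_document_domain = disable
    return True


holder = SlotHolder()
# The document.domain setter runs first and records "not disabled".
setter_allowed = document_domain_disabler(holder, False)
# A gated new feature is tried afterwards and finds the opposite state.
new_feature_allowed = document_domain_disabler(holder, True)
```

Whichever side touches the slot first wins: here the setter succeeds and the new feature is refused, and the reverse order would refuse the setter instead.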

The Storage Standard would be an example of such a new feature. There may well be others. If this looks good I'm happy to work on a PR for HTML to put this infrastructure in place.

@annevk annevk added normative change needs implementer interest Moving the issue forward requires implementers to express interest labels Mar 8, 2016
@foolip
Member

foolip commented Mar 8, 2016

@mikewest, in w3c/webappsec-secure-contexts#10 (comment) you wrote about existing features requiring secure contexts. Did you add use counters for that as you hoped? A generic DocumentSetDomainSecureOrigin and DocumentSetDomainInsecureOrigin pair might also be interesting.

With usage around 5% I must say I'm a bit skeptical that breaking document.domain for various new features is going to drive down usage enough to disable it on secure origins in any reasonable time frame. Mike, do you think RAPPOR or some other tool could reveal who the top users are, for some targeted outreach?

@annevk
Member Author

annevk commented Mar 8, 2016

@foolip this proposal is meant to be completely orthogonal to secure contexts. Perhaps at some point due to this existing for a while we can consider various optimizations, but for now this would only be about driving usage of document.domain down.

@foolip
Member

foolip commented Mar 8, 2016

If document.domain setter usage decreases, what's the first place you hope to disable it? From the top of w3c/webappsec-secure-contexts#10 I see Service Worker-controlled and TLS-delivered documents, is that still the ambition?

@annevk
Member Author

annevk commented Mar 8, 2016

@foolip I think the long term goal is still to get rid of it completely. @bholley can probably better speak to the intentions, I'm just helping out.

@domenic
Member

domenic commented Mar 8, 2016

@foolip IIRC it's used by Facebook which is probably a large portion of that 5%. Their engineers have previously expressed that it's a low-priority work item for them to get rid of it.

@foolip
Member

foolip commented Mar 8, 2016

@domenic, right, so getting an idea of the usage excluding Facebook would be a top priority. Either by asking very nicely for them to not use it, or by counting them in a separate bucket.

@bholley, if there are any partial goals along the way it'd be interesting to think about how to get there. To spell out my concern, I worry about speculative deprecation of bad APIs (like sync XHR) where the path to removal isn't at all clear.

@bholley

bholley commented Mar 8, 2016

It's a low priority for Facebook (and others) because there's not an incentive for them to get rid of it. We could probably cajole Facebook, but I'm less optimistic about others (like, say, Ebay). The only reason people are using document.domain is because they find it more convenient than postMessage. The goal here is to add inconvenience to tilt the balance. I don't think there are any useful partial milestones.

I think document.domain is probably more like showModalDialog and less like sync XHR - that is to say, it's more likely to be used in systems that are maintained (if potentially crusty).

To be clear, document.domain is less of a security problem in Gecko than it is in other engines, and the biggest victim is probably the spec, which needs to bend over backwards to support it. So while I'm enthusiastic about deprecating it, I'm only interested in pursuing it if we have buy-in across the board.

@zcorpan
Member

zcorpan commented Mar 8, 2016

SELECT page, url FROM (
  SELECT
    page,
    url,
    JSON_EXTRACT(payload, '$._body') AS hasBody,
    JSON_EXTRACT(payload, '$.response.content.text') AS content
  FROM [httparchive:har.android_feb_1_2016_requests]
)
WHERE hasBody = 'true'
AND REGEXP_MATCH(content, r'\bdocument\.domain\s*=[^=]')
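
To make the pattern in the query concrete, here is a small Python sketch. It assumes Python's `re` module is an acceptable stand-in for the regex engine behind `REGEXP_MATCH`, which should hold for a pattern this simple; the helper name `sets_document_domain` is made up for illustration.

```python
import re

# Same pattern as in the query: "document.domain", optional whitespace,
# then "=" followed by any character other than "=".
SETTER_PATTERN = re.compile(r"\bdocument\.domain\s*=[^=]")


def sets_document_domain(source: str) -> bool:
    """Return True if the text appears to assign to document.domain."""
    return SETTER_PATTERN.search(source) is not None


assignment = sets_document_domain("document.domain = 'example.com';")
comparison = sets_document_domain("if (document.domain == 'example.com') {}")
read_only = sets_document_domain("var d = document.domain;")
```

The trailing `[^=]` is what keeps `==` and `===` comparisons out of the results. Note that this is a text match on response bodies, so it finds resources that contain a setter, not pages where the setter actually ran.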

The android dataset is only 4,751 pages. 4,298 resources match. Group by page makes it 2,069 pages that match (so ~43.5% of total pages).

The first page of matches are

Row page url
1 http://www.slickamericans.com/ https://pagead2.googlesyndication.com/pagead/js/r20160126/r20151006/expansion_embed.js
2 http://www.slickamericans.com/ https://pagead2.googlesyndication.com/pagead/osd.js
3 http://www.smosh.com/ http://a.postrelease.com/serve/load.js?async=true
4 http://www.smosh.com/ http://partner.googleadservices.com/gpt/pubads_impl_79.js
5 http://www.smosh.com/ http://platform.twitter.com/widgets.js
6 http://www.smosh.com/ https://apis.google.com/js/plusone.js
7 http://www.smosh.com/ http://cdn.smosh.com/sites/default/files/styles/small_merch_carousel/public/merch/6-vGMHs7gLnSLNouqEU__aKGJUXSi5v23wimaVLO9AM.jpg?itok=J0mmu7fe
8 http://www.solvusoft.com/ http://www.googletagmanager.com/gtm.js?id=GTM-PTV2B8
9 http://www.kodi.tv/ http://www.google-analytics.com/r/collect?v=1&_v=j40&a=1875032504&t=pageview&_s=1&dl=http%3A%2F%2Fkodi.tv%2F&ul=en-us&de=UTF-8&dt=Kodi%20%7C%20Open%20Source%20Home%20Theatre%20Software&sd=24-bit&sr=360x511&vp=980x1391&je=0&_u=AEAAAEABI~&jid=341375554&cid=499845401.1454339081&tid=UA-3066672-1&_r=1&z=146630498
10 http://www.kodi.tv/ https://apis.google.com/_/scs/apps-static/_/js/k=oz.mobile_plusone.en_US.PVZ2Sl6nWVI.O/m=p1b,mp1p/rt=j/d=1/t=zcms/rs=AGLTcCO1EKjpn2xQXOgKHq8hO96_yBI01A
11 http://www.uline.com/ http://www.googletagmanager.com/gtm.js?id=GTM-KLD9QD
12 http://www.114so.cn/ http://static.js.weather.com.cn/gadget/b.js
13 http://www.oceanofgames.com/ http://platform.twitter.com/widgets.js
14 http://www.oceanofgames.com/ http://oceanofgames.com/wp-content/themes/MystiqueR3/images/style-blue/search.png
15 http://www.stltoday.com/ http://partner.googleadservices.com/gpt/pubads_impl_79.js
16 http://www.stltoday.com/ http://www.googletagmanager.com/gtm.js?id=GTM-TDWDC2
17 http://www.sakshi.com/ http://partner.googleadservices.com/gpt/pubads_impl_79.js

I thought this seemed crazy-high, so wanted to also check the much bigger chrome_feb_1_2016_requests dataset (472,009 pages):

158,084 resources match, from 91,319 pages (~19.3% of total pages).
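
The two percentages can be rechecked from the page counts quoted above; this is plain arithmetic, with `share_percent` being a helper name chosen for this sketch.

```python
def share_percent(matching: int, total: int) -> float:
    """Percentage of pages with at least one matching resource."""
    return 100.0 * matching / total


android = share_percent(2_069, 4_751)      # android_feb_1_2016 dataset
chrome = share_percent(91_319, 472_009)    # chrome_feb_1_2016 dataset
print(f"android: {android:.1f}%  chrome: {chrome:.1f}%")
```

Both figures round to the values given in the comment (about 43.5% and 19.3%).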

@zcorpan
Member

zcorpan commented Mar 8, 2016

I note that a lot of these sites have independent scripts that all set document.domain. I suppose it works out OK if they all set it to the same value, but it seems like a footgun. Maybe browser devtools could log a warning to the console whenever document.domain is set a second time (to a different value), and recommend using postMessage() instead?
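
As a rough illustration of the suggested warning, the following Python sketch tracks assignments for one document and records a message only when a later assignment changes the value. `DomainSetterMonitor` and its message text are hypothetical; no browser devtools API is being described, and the sketch does not validate that the value is a permitted domain.

```python
from typing import List, Optional


class DomainSetterMonitor:
    """Hypothetical tracker for document.domain assignments in one document."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.warnings: List[str] = []

    def on_set(self, value: str) -> None:
        # Warn only when a later assignment changes the value.
        if self.current is not None and value != self.current:
            self.warnings.append(
                f"document.domain changed from {self.current!r} to {value!r}; "
                "consider postMessage() instead"
            )
        self.current = value


monitor = DomainSetterMonitor()
# Two scripts agree, then a third one sets a different value.
for value in ["www.example.com", "www.example.com", "example.com"]:
    monitor.on_set(value)
```

Repeated assignments of the same value stay silent, so only the third call produces a warning.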

@zcorpan
Member

zcorpan commented Mar 8, 2016

Given the httparchive data, I think we should not disable new features when document.domain is used. That just means 19% of the Web will not be able to use the new features, and people will have another reason to move away from the Web platform.

@foolip
Member

foolip commented Mar 9, 2016

It sure looks like a lot of those hits are third-party scripts, mostly ads and widgets. The embedding page will have no control over these, and if they can't use some new features without dropping those scripts they'd probably not be too impressed by the incentives.

Removing or somehow neutering the document.domain setter definitely seems like a good goal, but how to get there? Interested to hear what @mikewest is scheming for Blink.

@briansmith

That just means 19% of the Web will not be able to use the new features, and people will have another reason to move away from the Web platform.

No, it would mean that authors would have to switch from document.domain at the same time they add the new feature, which is actually a quite reasonable request of them.

@foolip
Member

foolip commented Mar 9, 2016

I don't think it will seem very reasonable if there isn't a clear connection between the new API and document.domain. And if #829 (comment) is representative, they couldn't switch away from document.domain on their own initiative even if they wanted to.

jwatt added a commit to jwatt/webappsec-secure-contexts that referenced this issue Mar 10, 2016
We'd still like to get rid of document.domain, but rather than
conflating that and secure contexts, that work has been split out
into:

  whatwg/html#829

See also the discussion at:

  w3c#10
@mikewest
Member

@foolip: I'm not scheming anything for Blink. Like @bholley, I see this as a really nice thing to do, but I don't have any urgent issue that's driving me towards drastic action. I think @briansmith is right, though; predicating new features on an old feature's deprecation seems like a reasonable thing to consider. If we do, however, perhaps we should be thinking bigger than document.domain. If we introduce a "cool new kids" mode, we could make an arbitrary number of changes, right? Web 3.0, etc.

@annevk
Member Author

annevk commented Mar 11, 2016

The mechanism described in OP can easily be generalized to also cover document.write(), document.open(), others? I'm happy to do the work turning this into a proper PR and getting new specifications to adopt it, but I need some implementer commitment to start.

@zcorpan
Member

zcorpan commented Mar 11, 2016

Being able to somehow opt-in to disabling bad legacy features seems reasonable, so long as it doesn't prevent using new features in legacy sites that use third party scripts that happen to set document.domain (or use document.write and so on). Somewhat similar to http://wicg.github.io/ContentPerformancePolicy/

@annevk
Member Author

annevk commented Mar 11, 2016

@zcorpan that does not really drive down usage. Your position seems to be that sites won't migrate and instead do something that does not involve the web, but I wonder if that's really true. Those sites might get competitors that will migrate and offer better experiences due to these new APIs. They might themselves realize it's worth the cost. Mixed content blocking is also something we enforced because it's the right thing to do for security, but it also made some things harder to develop for. It's not super clear to me this is that much different.

@zcorpan
Member

zcorpan commented Mar 11, 2016

It would drive down usage if the opt-in becomes the de facto way of doing things, like today we have <!doctype html>, <meta viewport>, and so on.

But I guess it's a balancing act in how aggressive we want to be. If we're too aggressive, people are discouraged from developing for the Web. If we're not aggressive enough, things don't change. I don't have data but I have the impression that there are a number of things that are annoying with the Web platform already that discourage people from developing for it, so I want to be careful with making it worse. It's not black-or-white of course.

@foolip
Member

foolip commented Mar 11, 2016

@mikewest, I see, just checking for plans. How much of a problem is document.domain for Blink and the web platform at large? I can't quite tell yet if disabling it for secure origins is a worthwhile partial goal or not, from w3c/webappsec-secure-contexts#10 it seems like it but @bholley isn't quite on board with that in #829 (comment)

@bholley

bholley commented Mar 11, 2016

How much of a problem is document.domain for Blink and the web platform at large?

It causes at least one serious observable incompatibility between Gecko and Blink (Gecko revokes, Blink doesn't). Lack of revocation pretty much wrecks the same-origin policy when document.domain is involved, but the Blink folks aren't willing to redesign everything around such a dumb feature, and I don't blame them.

It also required enormous amounts of complexity and gymnastics when speccing cross-origin behavior over the last several years. In general, anything security-related on the Web always needs the caveat "but what about the document.domain case?"

I can't quite tell yet if disabling it for secure origins is a worthwhile partial goal or not

You mean secure contexts? The problem with that, as some pointed out, is that usage of document.domain cannot be detected at load-time in the way that secure contexts can. So you have to do a "using X disables Y and using Y disables X" sort of thing. So I agree at this point that we shouldn't tangle this up with secure contexts.

@bholley isn't quite on board with that in #829 (comment)

As noted above, I would totally be in favor of disabling it for secure-contexts if we could detect it ahead of time, but we can't. So we probably need a separate mechanism.

On the issue of document.domain being used by third-party etc outside of the page's control: one thing we could do would be to have an opt-in flag or somesuch that allows the page to proactively forbid deprecated features in its global. That would allow pages to opt-out of the guessing game and say "I want new hotness and will give up deprecated stuff".

@foolip
Member

foolip commented Mar 11, 2016

You mean secure contexts?

Hmm, yes! That's what appears to be used (in Blink) to allow or deny usage of powerful features like getUserMedia, even though https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features and the deprecation messages talk about secure and insecure origins....

This bit I don't understand:

The problem with that, as some pointed out, is that usage of document.domain cannot be detected at load-time in the way that secure contexts can. So you have to do a "using X disables Y and using Y disables X" sort of thing. So I agree at this point that we shouldn't tangle this up with secure contexts.

I mean simply that an attempt to set document.domain in a secure context would throw an exception or perhaps silently do nothing, using the same kinds of checks that getUserMedia has. My assumption is that this will be more feasible (but maybe still intractable) and that insecure contexts are lost anyway.

@bholley

bholley commented Mar 11, 2016

Secure Contexts themselves aren't a new thing, only the labeling of them as such. There are pages from 15 years ago that would fit the newly-invented definition of Secure Context, and disabling document.domain in those situations would break them.

The only way in which limiting document.domain deprecation to Secure Contexts would make the problem more tractable is that it would somewhat-arbitrarily slice the existing body of web content into a smaller subset. Secure Contexts don't identify pages that are less-likely to be broken by disabling document.domain, nor do we derive any unique security or simplicity benefits (that I can think of) from knowing that Secure Contexts never use document.domain.

@foolip
Member

foolip commented Mar 12, 2016

Isn't the problem with document.domain that foo.example.com can pretend to be example.com? It seems to me that for insecure contexts, that's possible even without document.domain, because a network attacker can cause foo.example.com to load example.com and also control everything loaded from example.com.

Of course, it's possible that 100% of the usage already comes from secure contexts in which case it's no more tractable than before, one would have to measure to find out. But if that would be pointless and the project is to get from 5% usage down to 0.01% or so for removal, I think one has to try to push against existing usage as well. Assuming the half-life of deployments is one year, it'd otherwise take 9 years to get down to 0.01%. (Just making this up, I don't know what the decay rate is.)
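
The 9-year figure follows from simple exponential decay. A short Python sketch, where the one-year half-life is the comment's own made-up assumption and `years_to_reach` is a helper name chosen here:

```python
import math


def years_to_reach(start_percent: float, target_percent: float,
                   half_life_years: float = 1.0) -> float:
    """Years of exponential decay needed to fall from start to target usage."""
    return half_life_years * math.log2(start_percent / target_percent)


# From 5% of page loads down to 0.01%, halving once per year.
years = years_to_reach(5.0, 0.01)
```

The result is just under 9 years, because 5% / 0.01% is a factor of 500 and 2 to the 9th power is 512.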

@bholley

bholley commented Mar 24, 2016

Isn't the problem with document.domain that foo.example.com can pretend to be example.com

No, because it only becomes same-origin with other content that has explicitly set document.domain. Collaboration is opt-in, so we don't have the security problem you're implying.

Rather, the problem with document.domain is two-fold:

  • It means that content can transition from being cross-origin to same-origin (and vice-versa), which complicates the implementation and spec for cross-origin objects immensely, and makes it very hard to avoid making implementation decisions web-observable.
  • It can put non-opting-in content at risk of XSS in engines that do not revoke existing references when document.domain changes (Gecko revokes, Blink does not). This is kind of yucky, though probably not enough of a problem to redesign the world.
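
The opt-in nature of the collaboration, and the way a pair of documents can flip from cross-origin to same-origin, can be sketched in Python. This is a simplified reading of the "same origin-domain" comparison in the HTML Standard: opaque origins are left out, and the `Origin` class and its field names are illustrative only.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Origin:
    scheme: str
    host: str
    port: int
    domain: Optional[str] = None   # set only after a document.domain assignment


def same_origin_domain(a: Origin, b: Origin) -> bool:
    """Simplified 'same origin-domain' comparison (opaque origins omitted)."""
    if a.domain is not None and b.domain is not None:
        # Both sides opted in: scheme and domain must match; port is ignored.
        return a.scheme == b.scheme and a.domain == b.domain
    if a.domain is None and b.domain is None:
        # Neither side opted in: ordinary same-origin check.
        return (a.scheme, a.host, a.port) == (b.scheme, b.host, b.port)
    # Only one side set document.domain: no access.
    return False


foo = Origin("https", "foo.example.com", 443)
parent = Origin("https", "example.com", 443)

before = same_origin_domain(foo, parent)
one_sided = same_origin_domain(replace(foo, domain="example.com"), parent)
both = same_origin_domain(replace(foo, domain="example.com"),
                          replace(parent, domain="example.com"))
```

Only the last comparison succeeds: access is granted once both documents have set `document.domain`, which is exactly the mid-lifetime transition that complicates cross-origin object handling.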

@foolip
Member

foolip commented Mar 25, 2016

Thanks @bholley, those are indeed entirely different concerns than what I inferred from w3c/webappsec-secure-contexts#10. The implementation complexity could only be shed by getting rid of the document.domain setter entirely, and in light of that "I don't think there are any useful partial milestones" makes sense.

Going back to the original idea of gating new features on the non-usage of the document.domain setter. Let's assume that all new web platform features use this mechanism and that half of the web is rewritten every year to use some new feature, so that usage halves every year. At 4% currently, it'd still take 7 years for usage to drop to ~0.03%, which is the upper range of page-breaking things that have been successfully removed in Blink. This is all guesswork, but I don't think a halving of usage each year is pessimistic, on the contrary.
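
The projection in the previous paragraph can be reproduced with a few lines of Python. The yearly halving is the comment's stated guess, and `usage_after` is a name introduced for this sketch.

```python
def usage_after(start_percent: float, years: int, yearly_factor: float = 0.5) -> float:
    """Usage remaining after a number of years of constant yearly decay."""
    return start_percent * yearly_factor ** years


# Starting at 4% and halving every year, for years 0 through 7.
trajectory = [usage_after(4.0, year) for year in range(8)]
```

After seven halvings the remaining usage is 4% / 128, roughly 0.03%, which matches the threshold quoted for removals in Blink.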

What I think would be needed to make progress faster is to use data like #829 (comment) to work with the top users of this API.

@annevk
Member Author

annevk commented Mar 31, 2016

Note that based on the discussion so far I'm inclined to WONTFIX the issue in whatwg/storage I just linked. I'm personally still in favor of doing this even if it makes adoption of new features harder, since it does increase the security of the web, at least for those that opt to use these new features, but I can't move this mountain alone.

@bholley

bholley commented Mar 31, 2016

but I can't move this mountain alone.

Well, I haven't heard anyone say they're not interested in helping out - more just that this problem affects everyone uniformly, and in the really-annoying rather than showstopping kind of way. This means that everyone (at least you, me, and Mike) would love to see it go, but no implementor is volunteering to do something high-cost or high-risk in their engine to push it forward. That said, everyone seems willing to do something straightforward and sane if a plan is proposed.

There have been two proposals so far: "Cool Kids Mode" (i.e. explicit opt-in to new stuff at the expense of old stuff) and "First-To-Use XOR" (using new feature X disables document.domain usage for the subtree, and vice-versa).

I think the latter approach has some pretty serious downsides, namely that it allows changes in third-party scripts to break unrelated (and presumably more-important) functionality in the site.

"Cool Kids Mode" is interesting, but is tantamount to versioning the web. It may be time for something like that though - not in the classic "Version X is no longer supported", but as a crutch to our efforts to phase-out support for certain features. If there are enough things we'd like to drop, a general-purpose mechanism here could be useful.

Here's a strawman of what it could look like:

  • We parse an optional "year" attribute out of the HTML tag, which authors can set to the current year.
  • The first engine to implement a new feature can gate usage of the feature on the current year or greater. If the attribute doesn't exist or is too low, a helpful console message appears suggesting that the author add/bump the year on the tag.
  • Browsers shipped in year X can drop support for features in year X+1 onward.

The nice thing about this mechanism is that it scales easily to handle any and all breaking changes that vendors want to make, without requiring authors to update existing sites. It's effectively just an additional carrot to aid the existing process of dropping old features when usage reaches a certain threshold.

Again though, it probably only makes sense if we'd like to do more of this stuff. That may make sense if we expect the web platform to live a Very Long Time.
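
One possible reading of the "year" strawman, sketched in Python. The interpretation is an assumption: new features require a declared year at or after the feature's year, and a dropped legacy feature stops working only for pages that declare the cutoff year or later, so pages without the attribute keep working. Both function names are hypothetical.

```python
from typing import Optional


def new_feature_enabled(declared_year: Optional[int], feature_year: int) -> bool:
    """A new feature is usable only if the page declares feature_year or later."""
    return declared_year is not None and declared_year >= feature_year


def legacy_feature_enabled(declared_year: Optional[int], dropped_from_year: int) -> bool:
    """A legacy feature keeps working unless the page declares dropped_from_year or later."""
    # Assumption: pages with no year attribute are treated as legacy content.
    return declared_year is None or declared_year < dropped_from_year


# A page that declares year 2017, with document.domain dropped from 2017 onward.
can_use_new_api = new_feature_enabled(2017, feature_year=2016)
can_set_document_domain = legacy_feature_enabled(2017, dropped_from_year=2017)
# An old page with no year attribute at all.
old_page_can_set_domain = legacy_feature_enabled(None, dropped_from_year=2017)
```

Under this reading, bumping the year is what trades the old feature for the new one, and a page that auto-generates the current year would lose legacy features on every cutoff without anyone choosing to.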

@zcorpan
Member

zcorpan commented Mar 31, 2016

I would expect people would auto-generate the year so it always shows the current year, which seems like it would defeat the purpose.

@bholley

bholley commented Mar 31, 2016

I would expect people would auto-generate the year so it always shows the current year, which seems like it would defeat the purpose.

Why would they do that? It doesn't seem to be in anybody's interest. Presumably if somebody is going to go to the trouble of generating this attribute, they'd have some concept of what it's for.

@zcorpan
Member

zcorpan commented Mar 31, 2016

This is the Web, people do stupid things without understanding what things are for all the time. Things we add to the Web should be robust against misuse, intentional or accidental...

@bholley

bholley commented Mar 31, 2016

The system doesn't have to be perfect - it's all about moving the needle in the aggregate. Certainly open to counter-proposals though.

@jesperkristensen

I don't know nearly as much about this topic as you, so sorry if this makes no sense.

As far as I understand, the problem is not that a document can specify a different origin to use, but that it can change it in-place. What if you add a mechanism for specifying the domain declaratively before anything is loaded in the document? It could be processed similarly to how <meta charset> is processed. Would that not avoid the same problem? If such a new mechanism is added, maybe it could help drive down usage of the document.domain setter much faster, since it would be a trivial change for existing code using document.domain. If a document uses this new mechanism, it should make the document.domain setter a no-op, to support sites that use both mechanisms for backwards compatibility.

Maybe it would be web compatible to add other restrictions such as only the first call to the setter would have an effect, and only before the load event. If so, it might be possible to change the behavior of the setter to for example reload the page with the new origin instead of changing it in-place.

@foolip
Member

foolip commented Apr 6, 2016

Per #829 (comment) it looks like much of the usage is in third party scripts, so they couldn't use a declarative mechanism.

@annevk
Member Author

annevk commented Jul 6, 2016

I'm going to close this since the proposal in OP went nowhere and later proposals seemed increasingly unlikely to be successful. Fresh ideas welcome in new issues.

@annevk annevk closed this as completed Jul 6, 2016