Consider shared caching #22
metromoxie added this to the v2 milestone on Dec 21, 2015
hillbrad modified the milestone: v2 on Jan 22, 2016
btrask commented on Apr 24, 2016
I hope this is a reasonable place to comment. (If not please tell me where to go.)
I've been working on content addressing systems for several years. I understand that content addresses, which are "locationless," are inherently in conflict with the same-origin policy, which is location-based.
An additional/alternate solution is for a list of acceptable hashes to be published by the server at a well-known location.
For example, the user agent could request https://example.com/.well-known/sri-list, which would return a plain text file with a list of acceptable hashes, one per line. Hashes on this list would be treated as if they were hosted by the server itself, and thus could be fetched from a shared cache while being treated for all intents and purposes like they were fetched from the server in question.
This does add some complexity both for user agents and for site admins. On the other hand, the security implications are well understood, and wouldn't require new permission logic.
Thanks for your work on SRI.
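A minimal sketch of how a user agent might consume the proposed list. The file format here (one `sha256-<base64>` hash per line, `#` comments allowed) is an assumption for illustration; no such format is specified.

```python
# Hypothetical parser for a /.well-known/sri-list file. The format
# (one hash per line, '#' comments) is assumed, not specified anywhere.

def parse_sri_list(body: str) -> set[str]:
    """Parse the hypothetical sri-list file into a set of acceptable hashes."""
    hashes = set()
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hashes.add(line)
    return hashes

def shared_cache_allowed(integrity: str, origin_sri_list: set[str]) -> bool:
    """A UA would serve a cross-origin cached copy only when the embedding
    origin has published the hash at its well-known location."""
    return integrity in origin_sri_list
```

Under this model, a cached hit for `shared_cache_allowed("sha256-abc", hashes)` would be treated as if the resource had been fetched from the origin itself.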
metromoxie (Contributor) commented on Apr 25, 2016
An interesting idea (although I know many folks who are vehemently against well-known-location solutions, and I won't pretend to fully grasp why). If implemented, though, it would still require a round trip to get .well-known/sri-list, right? That seems to lose a lot of the benefit of these resources acting as libraries.
Another suggestion, which I think I heard somewhere, is: if the page includes a CSP, only use an x-origin cache for an integrity attribute resource when the CSP includes the integrity value in the script-hash whitelist. I think this would address @mozfreddyb's concerns listed in Synzvato/decentraleyes#26, but I haven't thought too hard about it. On the other hand, it also starts to look really weird and complicated :-/
Also, these solutions don't address timing attacks with x-origin caches. As a side note, though, someone recently pointed out to me that history timing attacks in this case are probably not too concerning from a security perspective, since it's a "one-shot" timing attack: the resource is definitively loaded after the attack happens, so you can't attempt the timing again. That makes the attack much more difficult to pull off, since timing attacks usually rely on repeated measurement.
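The CSP-gated variant could be sketched roughly as follows. This is an illustrative interpretation of the suggestion, not specified behavior; the parsing is deliberately simplified.

```python
# Sketch of the suggested gating rule (an assumption, not a spec):
# only serve an integrity-tagged resource from a cross-origin cache when
# the page's CSP script-src whitelist already contains that exact hash.

def csp_script_hashes(csp: str) -> set[str]:
    """Extract 'sha…' hash sources from the script-src directive."""
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0] == "script-src":
            return {p.strip("'") for p in parts[1:] if p.startswith("'sha")}
    return set()

def may_use_shared_cache(integrity: str, csp: str) -> bool:
    return integrity in csp_script_hashes(csp)
```

A page sending `script-src 'self' 'sha256-abc123='` would then allow a shared-cache hit only for a resource whose integrity attribute is exactly `sha256-abc123=`.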
btrask commented on Apr 26, 2016
Using a script-hash whitelist in the HTTP headers (as part of CSP or separately) is better for a small number of hashes, since it doesn't require an extra round trip. Using a well-known list is better for a large number of hashes, since it can be cached for a long time.
I agree that well-known locations are ugly. Although it works for /robots.txt and /favicon.ico, there is a high cost for introducing new ones.
The privacy problem is worse than timing attacks: if you control the server, you can tell that no request is ever made. This seems insurmountable for cross-origin caching.
Perhaps the gulf between hashes and locations is too large to span. For true content-addressing systems (like what I'm working on), my preference is to treat all hashes as a single origin (so they can't reference or be referenced by location-based resources).
Thanks for your quick reply!
mozfreddyb (Contributor) commented on Apr 26, 2016
I'd be slightly more interested in blessing the hashes for cross-origin caches by mentioning them in the CSP. .well-known would add another round trip, and I'm not sure whether that would hamper the performance benefit that we wanted in the first place.
The idea to separate hashed resources into their own origin is interesting, but I don't feel comfortable drilling holes that deep into the existing weirdness of origins.
btrask commented on Apr 26, 2016
To be clear, giving hashes their own origin only makes sense if you are loading top-level resources by hash. In that case, you can give access to all other hashes, but prohibit access to ordinary URLs. But that is a long way off for any web browsers and far from the scope of SRI.
mozfreddyb (Contributor) commented on Oct 18, 2016
For the record, @hillbrad wrote a great document outlining the privacy and security risks of shared caching: https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html
kevincox commented on Oct 31, 2016
That document doesn't appear to consider an opt-in approach. While opt-in would reduce the number of sites that use it, it could still be quite useful:
<script src=jquery.js integrity="..." public/>
This attribute should only be put on scripts for which timing is not an issue. Of course, deciding what is public is now the responsibility of the website. However, since the benefit would be negligible for anything that is website-specific, this might be pretty clear. For example, a script specific to my site has a single URL anyway, so I may as well not mark it public; otherwise malicious sites could figure out who has been to my site recently, even though I get no benefit from the content-addressed cache. However, if I am including jQuery there will be a benefit, because there are many different copies on the internet, and at the same time knowing whether a user has jQuery in their cache is much less identifying.
That being said, if FF had a way to turn this on now I would enable it; I don't see the privacy hit as large, and the performance would be nice to have.
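A toy model of this opt-in scheme, to make the moving parts concrete. The `public` attribute is hypothetical (proposed in the comment above, not implemented anywhere); the model assumes a hash only enters the shared cache from an opted-in load, and only opted-in tags probe it.

```python
# Toy model of the proposed opt-in "public" attribute (hypothetical).
# Non-public loads never populate or probe the cross-origin cache, so
# their presence cannot be observed by other origins.

from dataclasses import dataclass, field

@dataclass
class SharedCache:
    entries: dict = field(default_factory=dict)  # integrity hash -> content

    def store(self, integrity, content, public):
        if public:                      # non-public loads stay origin-private
            self.entries[integrity] = content

    def lookup(self, integrity, public):
        if not public:                  # non-public tags never probe the cache
            return None
        return self.entries.get(integrity)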
hillbrad (Contributor) commented on Dec 21, 2016
If I want to use the presence of my script in a shared cache to track you illicitly, I will deliberately set the public flag, even if the content isn't actually public.
kevincox commented on Dec 21, 2016
On 21/12/16 01:07, Brad Hill wrote:
If I want to use the presence of my script in a shared cache to track you illicitly, I will deliberately set the public flag, even if the content isn't actually public.
If you want to track me and you control both origins involved, you can just use the same URL and set a cookie, which is better tracking and works today.
This is about preventing a third-party site from hosting a script with the same hash as, for example, a script on Facebook, and then telling whether you have been to Facebook "recently". Since Facebook hosts the script, they won't mark it "public", so it won't be a problem.
I don't understand what threat you are trying to protect against.
btrask commented on Dec 21, 2016
A "public" flag seems like a good solution to me. It seems to encapsulate both the benefits and the drawbacks of shared caching. It says, "yes, you can share files publicly, but that means anyone can see them."
That said, if it's opt-in, there's the question of how many sites would actually use it, and whether it's worth the trouble. Especially if it has to be set in HTML, rather than say by CDNs automatically. Maybe it would work better as an HTTP header?
ScottHelme commented on Dec 22, 2016
Setting it in the HTML doesn't seem to be a big problem. If large CDN providers include this in their example script/style tags, then sites will copy and paste support for it. A similar approach is currently being used for SRI, and although adoption isn't growing as fast as I'd like, it will slowly grow. Sites looking for those extra performance boosts would be keen to implement it.
kevincox commented on Jan 2, 2017
The idea of a public header (or even another key in Cache-Control) sounds quite interesting and elegant. However, I think it would make this more difficult to use, as one significant use case is to let each site point to its own copy of a script rather than a centrally hosted one. That means each site would have to add headers to some of its scripts, rather than just make a modification in HTML. Neither is a huge barrier, but static site hosting often makes it difficult to set headers, especially for a subset of paths.
At the end of the day, I have no major objections to either option, though.
btrask commented on Jan 3, 2017
@kevincox Yes, I suspected that Cache-Control: public might be appropriate. The HTTP concept of a "shared cache" seems fundamentally equivalent to SRI shared caching. See here for definitions of public and private: https://tools.ietf.org/html/rfc7234#section-5.2.2.5
The Cache-Control security concerns (cache poisoning, accidentally caching sensitive information) are prevented by hashing. The only remaining security consideration is information leaks, which Cache-Control: public seems to address.
I'm not opposed to using an HTML attribute instead, but I think it's good to reuse existing mechanisms when they fit. Caching has traditionally been controlled via HTTP, not HTML.
There are a few other ways to break this down:
- Does an HTML attribute make more sense for non-HTTP (file:, data:, ftp:, etc.) resources? (There's an argument for shared caching across protocols, which an HTTP header wouldn't really help with; on the other hand, caching doesn't make much sense for some protocols.)
- Is publicness a property of the resource itself, or of the use of that resource? (My intuition says the resource, since the point is that it can be shared between different contexts.)
- Which is better for third-party resources (e.g. hotlinking)? (Either approach can be limiting.)
I think framing this as "which method is easier for non-expert webmasters to deploy?" is likely to lead to a suboptimal solution. Yes, some people don't know how to set HTTP headers, and some hosts don't let users set them, but those users are already stuck with limited caching options. Unless we're going to expose all of Cache-Control via HTML.
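The reuse of RFC 7234 semantics suggested above could look roughly like this. This is an interpretation sketched for illustration (and one later comments push back on), not specified behavior: a response would be eligible for the cross-origin SRI cache only if it is marked `Cache-Control: public` and actually matches its integrity hash.

```python
# Sketch: gate shared-cache eligibility on Cache-Control: public plus an
# SRI hash match. Interpretation for illustration only, not spec text.

import base64
import hashlib

def cache_control_public(header: str) -> bool:
    """True when the response opts into shared caching per RFC 7234."""
    directives = {d.strip().split("=")[0].lower() for d in header.split(",")}
    return "public" in directives and "private" not in directives

def sri_sha256(body: bytes) -> str:
    """Compute the SRI 'sha256-<base64>' token for a response body."""
    return "sha256-" + base64.b64encode(hashlib.sha256(body).digest()).decode()

def eligible_for_shared_cache(body: bytes, integrity: str, cache_control: str) -> bool:
    return cache_control_public(cache_control) and sri_sha256(body) == integrity
```

Hashing already rules out cache poisoning, so the only work `Cache-Control: public` does here is signal that the information-leak risk is acceptable.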
brillout commented on Mar 8, 2017
@btrask A website highly concerned about privacy and loading <script src='/uncommon-datepicker.jquery.js' integrity="sha....." /> will want to make sure that uncommon-datepicker.jquery.js is never loaded from the shared cache. Whether the shared cache should be used is for the website using the resource to control, not the server that first delivered the resource.
btrask commented on Mar 8, 2017
@brillout: Yes, good point. Using a mechanism not in the page source defeats the purpose, when the page source is the only trusted information. Thanks for the tip!
brillout commented on Mar 8, 2017
@metromoxie @mozfreddyb @kevincox @ScottHelme
Are we missing any pieces?
The two concerns are:
- CSP
- Privacy / "history attacks"
Solution to privacy: we can make the shared cache an opt-in option via an HTML attribute. I'd say that's enough. (If we want more protection, browsers could add a resource to the shared cache only when many domains use that resource, as described in https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html#solution and w3c/webappsec#504 (comment).)
Solution to CSP: the UA should treat scripts with the shared cache enabled as inline scripts. (As described in w3c/webappsec#504 (comment).)
It would be super exciting to be able to use a bunch of web components using different frontend frameworks behind the web component curtain: a date picker using Angular, an infinite scroll using React, and a video player using Vue. This is currently prohibitive KB-wise, but a shared cache would allow it.
And with WebAssembly, library sizes will get bigger, increasing the need for such a shared cache.
@nomeata Funny to see you on this thread, the world is small
An opt-in privacy leak isn't a great feature to have.
brillout commented on Mar 8, 2017
An opt-in privacy leak isn't a great feature to have.
How about opt-in + a resource is added to the shared cache only after the resource has been loaded by several domains?
kevincox commented on Mar 8, 2017
I don't think that really helps, as the attacker can purchase two domains quite easily.
brillout commented on Mar 8, 2017
I don't think that really helps as the attacker can purchase two domains quite easily.
Yes, it can't be n domains where n is predefined. But making n probabilistic makes it considerably more difficult for an attack to succeed. (E.g. the last comment at w3c/webappsec#504 (comment).)
strugee commented on Mar 10, 2017
CSP has (is getting?) a nonce-based approach. IIUC the concern with CSP is that an attacker would be able to inject a script that loaded an outdated/insecure library through the cache, thus bypassing controls based on origin. However requiring nonces for SRI-based caching seems to solve this issue as the attacker wouldn't know the nonce; it also creates a performance incentive for websites to move to nonces, which are more secure than domain whitelists for the same reason[1].
I think it's possible that we could solve the privacy problem by requiring a certain number of domains to reference the script... it'd be really useful to have some metrics from browser telemetry here. For example if we determined that enough users encountered e.g. a reference to jQuery in >100 domains for that to be the minimum, it might be that we could load things from an SRI cache if they had been encountered in 100+ distinct top-level document domains (i.e. domains the user explicitly browsed to, not that were loaded in a frame or something). The idea being that because of the top-level document requirement, the attacker would have to socially engineer the user into visiting 100 domains, which would be very, very difficult. However if telemetry told us that 100 is too high a number and it's actually more like 20 for a particular jQuery version, that'd be a different story.
[1]: consider e.g. being able to load an insecure Angular version from the Google CDN because the site loaded jQuery from the Google CDN
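The threshold mechanism described above can be modeled in a few lines. Both the threshold value and the top-level-document restriction are assumptions taken from the comment, not implemented browser behavior.

```python
# Toy model of the domain-count threshold: a hash becomes shareable only
# after it has been referenced by N distinct top-level sites the user
# actually visited. N (min_domains) is an illustrative assumption.

class ThresholdCache:
    def __init__(self, min_domains=100):
        self.min_domains = min_domains
        self.seen = {}  # integrity hash -> set of top-level domains

    def record_use(self, integrity, top_level_domain):
        """Count a use only for top-level navigations, not framed loads."""
        self.seen.setdefault(integrity, set()).add(top_level_domain)

    def shareable(self, integrity):
        return len(self.seen.get(integrity, ())) >= self.min_domains
```

The top-level-document requirement is what gives the threshold its teeth: an attacker can buy domains cheaply, but socially engineering a victim into navigating to many of them is far harder.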
zrm commented on Apr 5, 2017
For example, the user agent could request https://example.com/.well-known/sri-list, which would return a plain text file with a list of acceptable hashes, one per line.
For some domains that file could be too large and change too often. Consider Tumblr's image hosting (##.media.tumblr.com) where each of the domain names host billions of files and the list changes every second.
How about something similar to HTTP ETag but with a client-specified hash algorithm. If the hash is correct you only get a response affirming as much instead of the entire file, which the browser can cache. It doesn't save you the round trip but it saves you the data.
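A server-side sketch of that exchange, under the stated assumption: the client sends the hash of the copy it already holds, and the server either affirms it with an empty 304-style response or sends the full body. The mechanism is hypothetical; no standard header carries a client-specified hash this way.

```python
# Sketch of a hash-conditional request (hypothetical mechanism): an
# affirmative answer saves the transfer bytes, though not the round trip.

import hashlib

def handle_conditional(resource, client_hash):
    """Return (status, body) for a request carrying the client's hash."""
    current = hashlib.sha256(resource).hexdigest()
    if client_hash == current:
        return 304, b""          # affirm: client's cached copy is current
    return 200, resource         # hash stale or absent: send the file
```

This sidesteps the giant-list problem: the server only ever validates the one hash it is asked about, so nothing like a billion-entry sri-list needs to be published.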
Synzvato referenced this issue on Apr 25, 2017: Breaks sites that implement integrity hashes #161 (closed)
symbiogenesis referenced this issue on Dec 29, 2017: Add Async and Defer boolean attributes for script resources #464 (closed)
BigBlueHat (Member) commented on Mar 22, 2018
How about something similar to HTTP ETag but with a client-specified hash algorithm. If the hash is correct you only get a response affirming as much instead of the entire file, which the browser can cache. It doesn't save you the round trip but it saves you the data.
RFC 3230: Instance Digests in HTTP defines a Digest header and a Want-Digest header that work exactly this way... or were meant to.
This would get the 304 Not Modified style responses, but it's still limited to a single URL check.
Maybe it (or something like it), coupled with the Immutable header, could be used to populate some amount of caching or "permanence," but the model is still about the "given name" of the object (its URL) and not about its intrinsic identification (its content hash).
Caching is one use case for these things, but the Web could also benefit from some "object permanence" where possible and appropriate.
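For reference, the RFC 3230 exchange mentioned above can be sketched like this: the client requests a digest with `Want-Digest: SHA-256`, and the server responds with a `Digest` header carrying the base64 of the raw SHA-256 digest (the SHA-256 token comes from RFC 5843). The helper names below are our own.

```python
# Sketch of RFC 3230 instance digests: building and checking the value
# carried in a "Digest: SHA-256=<base64>" response header.

import base64
import hashlib

def digest_header(body: bytes) -> str:
    """Build the Digest header value for a response body."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode()

def digest_matches(body: bytes, header: str) -> bool:
    """Verify a received Digest header against a body."""
    algo, _, value = header.partition("=")  # split only at the first '='
    if algo.upper() != "SHA-256":
        return False                        # unsupported digest algorithm
    return "SHA-256=" + value == digest_header(body)
```

As noted above, though, this still validates one URL at a time; it doesn't make the content addressable across locations.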
kevincox commented on Mar 22, 2018
I don't see the benefit of Want-Digest. If the client has a whitelisted digest and the content backing it, why bother the server? There are three possible responses:
- 304 Not Modified: use what you had in the cache.
- 200 + contents that match the digest: a redundant transfer of the file.
- Other: error.
This would wait around for a response that can only make the situation worse.
metromoxie commented on Dec 21, 2015
We've had a lot of discussions about using SRI for shared caching (see https://lists.w3.org/Archives/Public/public-webappsec/2015May/0095.html for example). An explicit issue was filed at w3c/webappsec#504 suggesting a sharedcache attribute to imply that shared caching is OK. We should consider leveraging SRI for more aggressive caching.