`sec-metadata` #280

Closed
mikewest opened this Issue May 15, 2018 · 39 comments

Comments

@mikewest

mikewest commented May 15, 2018

Good morning, friendly TAG!

I'm requesting a (p)review of:

Further details (optional):

  • Relevant time constraints or deadlines: None. Just an early directional review (and invitation to a naming/spelling bikeshed).
  • I am passingly familiar with the Self-Review Questionnaire on Security and Privacy. This proposal does have some privacy implications insofar as it reveals whether a request was made from a cross-origin/site page, even in the face of a referrer policy that would prevent leaking the URL. The granularity is low enough that I'll boldly claim that the value seems to outweigh the marginal risk.
  • I have reviewed the TAG's API Design Principles

You should also know that you're probably my favorite web architecture review body. Top 10, certainly.

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our Github repo for each point of feedback
  • open a single issue in our Github repo for the entire review
  • leave review feedback as a comment in this issue and @-notify [github usernames]

@torgo torgo added this to the 2018-05-29-telcon milestone May 22, 2018

@dbaron

Member

dbaron commented Jun 4, 2018

So some initial reactions, which you shouldn't take too seriously because I haven't put that much thought into them yet:

  • mitigating CSRF is a worthwhile problem to work on

  • it seems like the underlying problem here is use of ambient authority in unexpected ways. This solution seems to be attacking the problem by blocking "unexpected ways". It seems like there could be alternatives that would reduce the scope of the ambient authority; you've even worked on one. Are there reasons to prefer one approach over the other, or the combination?

  • it seems like it might not be cheap, though. In particular:

    • the header isn't tiny, and seems like it will be sent on every request
    • it seems like it might be quite a bit of work to get browsers to produce interoperable statements, work that might (?) be better spent elsewhere

@mikewest

mikewest commented Jun 5, 2018

Hey @dbaron, thanks for the feedback!

I'd actually appreciate y'all spending a little more time on this in the somewhat near future, as it's something that turned out to be trivial to implement, and that doesn't seem controversial from the (few) conversations I've had with other vendors. It's feasible that we could ship it in the somewhat near future, and your feedback would be really helpful in ensuring that we do that in the right way.

I'll also note that I started sketching out a more detailed spec at https://mikewest.github.io/sec-metadata/, which might relieve some of your interop concerns below. :)

To the line-by-line:

mitigating CSRF is a worthwhile problem to work on

I agree! Google's security folks are enthusiastic about this mechanism, and I'd like to let them loose on it in the wild.

it seems like the underlying problem here is use of ambient authority in unexpected ways.

I think that's accurate. The goal here is to give developers on the server enough context to make more informed decisions before doing all the work of putting together a response and delivering it to the client. Rather than relying on client-side rules (e.g. MIME type checks a la CORB), we can perform checks on the server that enable a priori decision-making, therefore avoiding things like timing attacks which are otherwise pervasive.

It seems like there could be alternatives that would reduce the scope of the ambient authority

SameSite cookies are good at reducing ambient authority. They're also somewhat difficult to deploy effectively. I happen to think it's worth the effort for things like CSRF tokens, but developers have yet to practically agree with me by doing the work. :)

This proposal has some deployability advantages insofar as it doesn't require rearchitecture, renaming, etc. All the server needs to do is read a header and make decisions. In many cases, those decisions can even be abstracted out of the application code itself, and into frontend servers which might be maintained and hardened by an entirely different team. I think that's very much worth exploring.

I'd also note that things like SameSite cookies address only the same-site/cross-site portion of the proposal's site member. It doesn't allow same-origin distinction, which is helpful in many cases, and doesn't address destination, which is just as (more?) important (and without existing analog).

the header isn't tiny, and seems like it will be sent on every request

I agree that the size is a possible concern. That said, if it's something we ship, I'm pretty sure we can minimize the impact by tweaking the H/2 compression dictionary. We're just uploading ~4 enums, after all. The values remain static over time, and should be quite compressible.

it seems like it might be quite a bit of work to get browsers to produce interoperable statements

We're actually already exposing the most complex bit of information via Service Workers (see Request.destination for example), so I expect that the interop work there will be minimal.

The user-activated vs forced distinction will be tougher to get interoperably working in the various edge-cases, but since that reflects a deeper set of disagreements between browsers' determinations of what constitutes a "gesture", it's not clear to me that exposing those differences to servers is doing any harm (and might lead to more interop work in the future).

@annevk

Member

annevk commented Jun 5, 2018

  • I see you removed the request's initiator dependency, thanks!
  • target seems wrong for workers. I don't think we should deviate too much from https://fetch.spec.whatwg.org/#subresource-request et al.
  • Instead of destination accepting a string, I think we should reuse fetch as a value there similar to https://fetch.spec.whatwg.org/#concept-potential-destination.
  • The document purports to be in the Public Domain and under copyright owned by Google.
  • I'm a little uncomfortable sharing all this data. It seems like it could easily make #76 much worse.
  • I'm also not entirely persuaded by "H/2 compression will save us". The whole point of adding compression there was to reduce the existing bloat, not to provide a way to add more.

@mikewest

mikewest commented Jun 5, 2018

Hey, @annevk!

I see you removed the request's initiator dependency, thanks!

I'd still like to get the "download" vs non-download distinction back somehow, but the remaining bits of initiator are much less important to me, so dropping it wasn't hard.

target seems wrong for workers. I don't think we should deviate too much from https://fetch.spec.whatwg.org/#subresource-request et al.

The interesting distinction in target is really top-level vs nested. I would be perfectly happy dropping it entirely from non-navigational requests. One way to do that could be to split the "document" destination into "top-level-document" and "nested-document", which would allow us to drop target entirely.

Instead of destination accepting a string, I think we should reuse fetch as a value there similar to https://fetch.spec.whatwg.org/#concept-potential-destination.

I don't quite understand: are you saying that instead of destination="", you'd prefer destination="fetch"? That seems fine to me.

Also: I don't see "potential destination" used anywhere in Fetch. Where is it used?

The document purports to be in the Public Domain and under copyright owned by Google.

Huh. That was dumb. I'll fix it.

I'm a little uncomfortable sharing all this data. It seems like it could easily make #76 much worse.

What bits of data are you uncomfortable sharing? destination? It's also not clear to me how you see this relating to #76. Could you elaborate?

I'm also not entirely persuaded by "H/2 compression will save us". The whole point of adding compression there was to reduce the existing bloat, not to provide a way to add more.

My point isn't that compression means that we can magically add everything ever to HTTP requests. I'm claiming that compression reduces the apparent cost of the header to something that seems pretty manageable, and (IMO) worth the tradeoff.

@annevk

Member

annevk commented Jun 5, 2018

  • Splitting document seems potentially breaking, but I wouldn't mind trying to use nested-document for frames.
  • I'd prefer destination=fetch, with no string values. <link rel=preload as> takes a potential destination, as defined by HTML.
  • The way it relates to #76 is that #76 is about the fetcher having to know how to fetch the resource. This seems to make that worse as this allows the resource to make that much more granular (e.g., only providing a response when destination is font).
  • Given that according to https://www.arturjanc.com/cross-origin-infoleaks.pdf SameSite cookies provide the same level of defense, it's unclear that it's worth the tradeoff.

@mikewest

mikewest commented Jun 5, 2018

Splitting document seems potentially breaking, but I wouldn't mind trying to use nested-document for frames.

Breaking insofar as folks might be looking at the destination value in a Service Worker?

I'd prefer destination=fetch, with no string values. <link rel=preload as> takes a potential destination, as defined by HTML.

Turns out, @mnot removed bare identifiers as valid dictionary values in https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-05#appendix-A.1 (https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-05#section-4.4). It's strings or nothing.

The way it relates to #76 is that #76 is about the fetcher having to know how to fetch the resource. This seems to make that worse as this allows the resource to make that much more granular (e.g., only providing a response when destination is font).

The goal is indeed to allow the server to make more granular decisions about when to service a request. I'm claiming that as a substantial advantage. :)

Given that according to https://www.arturjanc.com/cross-origin-infoleaks.pdf SameSite cookies provide the same level of defense, it's unclear that it's worth the tradeoff.

SameSite does not offer the same defenses (perhaps the table we put in that doc wasn't quite detailed enough :) ). It allows folks to distinguish cross-site vs same-site requests (which is valuable!). It does not allow folks to distinguish between request types for endpoints which expect cross-site usage. SameSite also seems quite a bit more difficult to deploy in a robust manner.
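To make the "distinguish between request types" point concrete, here is a rough Python sketch of a per-endpoint check. It assumes the header has already been parsed into a dict with site and destination keys; the paths, the allowlist, and the function name are made up for illustration and are not part of the proposal.

```python
# Hypothetical per-endpoint allowlist: which destinations a cross-site
# request may use. Paths and policy are invented for illustration.
CROSS_SITE_DESTINATIONS = {
    "/badge.png": {"image"},   # meant to be embedded via <img> anywhere
    "/widget.js": {"script"},  # meant to be loaded via <script> anywhere
}


def allow(path, metadata):
    """Decide whether to serve `path` given already-parsed metadata.

    `metadata` is a dict such as
    {"site": "cross-site", "destination": "image"}.
    """
    if metadata.get("site") != "cross-site":
        return True  # same-origin and same-site requests pass
    allowed = CROSS_SITE_DESTINATIONS.get(path, set())
    return metadata.get("destination") in allowed
```

With this shape, /badge.png can still be embedded as an image by other sites, while a cross-site attempt to load it as a document (or to reach any endpoint not in the table) is refused. A SameSite cookie alone cannot express that difference.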

@mikewest

mikewest commented Jun 5, 2018

/cc @arturjanc, since we're talking about him indirectly. :)

@annevk

Member

annevk commented Jun 5, 2018

It's strings or nothing.

In that case don't bother with potential destination.

@arturjanc

arturjanc commented Jun 5, 2018

Given that according to https://www.arturjanc.com/cross-origin-infoleaks.pdf SameSite cookies provide the same level of defense, it's unclear that it's worth the tradeoff.

SameSite cookies seem like the right model for security (modulo the lack of a same-origin vs. same-site distinction), but they are -- unfortunately -- almost certainly not the right model for adoptability. I made some notes about this in https://lists.w3.org/Archives/Public/public-webappsec/2018May/0015.html; among other things, SameSite cookies make it all but impossible to use CORS (all cross-site requests become unauthenticated) or have resources which are embedded by other sites (when a document is framed cross-site, all its same-origin requests lose cookies because SameSite cookies are only sent if the top document is same-site).

Sec-Metadata addresses these problems, and has the essential feature of letting developers test the impact of enabling any security restrictions before switching to enforcement, via a server-side equivalent to CSP's report-only policies. This isn't possible with a cookie-based model.

Basically, I see Sec-Metadata as a riff on SameSite cookies that is much more amenable to being deployed in non-trivial applications; for an opt-in security feature this seems like a critical distinction.

@mikewest

mikewest commented Jun 5, 2018

(Also, there's no guarantee that a cookie actually is SameSite, because cookies are terrible and can (usually) be set/overwritten by anyone on your registrable domain.)

@arturjanc

arturjanc commented Jun 14, 2018

To answer some of @annevk and @dbaron's questions above, I figured I'd elaborate on the benefits we expect to get out of Sec-Metadata's approach based on an HTTP request header. These benefits boil down to two important areas: the overall security benefit offered by this feature, and its ease of adoption.

When it comes to offering protections against cross-origin information leaks, it seems important to provide the server with context about the request which allows it to make a security decision upfront, before any server-side processing has occurred. For example, mechanisms using a response header such as Cross Origin Resource Policy leave the application susceptible to CSRF (because any side effects of handling a request on the server will remain even if the browser doesn't expose the response to its initiating context) and timing attacks (because the amount of time to generate a response and return it to the client doesn't change, and can be leaked via the usual side channels). A model where a server has a chance to reject an unexpected / unwanted request before taking any action is a powerful primitive which addresses cross-origin infoleaks in a more robust way.

The main advantage of Sec-Metadata, however, lies in its adoptability -- which is a crucial aspect for any opt-in security feature which requires developers to make often non-trivial changes to their applications. From experience with CSP we know that it's difficult for developers to track the locations of resources loaded by their applications; CSP's report-only mode is a critical tool in any non-trivial deployment of CSP to prevent the policy from breaking existing resource loads. Arguably, tracking dependencies in the other direction, i.e. understanding which of the application's resources are being requested by other origins, is even less tractable because developers don't get much useful information about the initiators of requests to their applications. The desire to make mechanisms such as C-O-R-P simple leads to them not shipping with a reporting mode similar to the one in CSP, meaning that developers have no safety net during deployment -- they can only find out about things that broke after they enable the restrictions in their production environments, which increases the risk of adoption.

Sec-Metadata allows developers to know which resources would be affected before enforcing any restrictions on cross-origin resource loads. The developer can first review incoming values of Sec-Metadata headers to understand which, if any, endpoints are being accessed by other origins. If all requests which carry Sec-Metadata set site=same-site or site=same-origin, the developer knows that she can reject cross-origin requests without the risk of breaking existing functionality. This design allows building a server-side equivalent to CSP's report-only mode without adding more complexity in the browser; it also makes it possible to create generic middleware for common server-side frameworks which will both handle such reporting, and provide a set of easy to understand security policies (e.g. "lock down my application to only same-origin non-navigational requests" or "lock down my scripts so they can only be fetched same-site or from example.org").
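A minimal sketch of that server-side report-only flow, in Python. It assumes the key=value serialization discussed in this thread (for example site="same-origin"). The function names, the logger name, and the enforce flag are hypothetical, and the parser is a naive split rather than a real Structured Headers implementation.

```python
import logging

log = logging.getLogger("sec_metadata")  # hypothetical logger name


def parse_sec_metadata(value):
    """Naively parse 'key="value", key=value' pairs into a dict.

    Assumption: values never contain commas or '='. A real deployment
    would use a proper Structured Headers parser instead.
    """
    result = {}
    for item in value.split(","):
        key, sep, raw = item.strip().partition("=")
        if sep:
            result[key.strip()] = raw.strip().strip('"')
    return result


def check_request(headers, enforce=False):
    """Return True if the request may proceed.

    In report-only mode (enforce=False) unexpected requests are logged
    but still allowed, mirroring CSP's report-only idea on the server.
    `headers` is a plain dict keyed by the exact field name.
    """
    raw = headers.get("Sec-Metadata")
    if raw is None:
        return True  # browser did not send the header; nothing to judge
    site = parse_sec_metadata(raw).get("site")
    if site in ("same-origin", "same-site"):
        return True
    log.warning("would reject request: Sec-Metadata=%r", raw)
    return not enforce
```

The idea is to run with enforce=False first, review the log for endpoints that legitimately receive cross-site traffic, and only then flip the flag.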

Based on our experience, the request header model is significantly easier for developers to deploy than either the Cross-Origin-Resource-Policy response header or SameSite cookies. With C-O-R-P, the lack of a report-only mode is a large obstacle to adoption in many applications (as is its current inability to permit cross-site resource loads, though this is fixable); if deploying a security feature leads to any user-visible breakage, it often becomes much more difficult to convince product teams to give it a second chance. SameSite cookie deployments run into the problems I mentioned above (e.g. breaking authenticated cross-site CORS requests): basically, most developers cannot set their authentication cookies as SameSite so they would need to set an auxiliary SameSite cookie cryptographically tied to the authentication token, and verify its presence on a per-endpoint basis, which by itself is fairly cumbersome. Even in this more flexible setup there are fairly large deployment concerns -- for example, a developer couldn't gather data about endpoints which are only accessed in a same-site context because cookies marked as SameSite are sent on all requests by browsers which don't understand this attribute.

Sec-Metadata aims for the "sweet spot" by both providing a more robust security benefit (protection against most known types of cross-origin information leaks) and solving the deployment problems we've seen in other mechanisms by allowing incremental adoption without requiring either extensive application-level changes or adding browser-side reporting infrastructure. As a bonus, it seems like a relatively small feature for a user-agent to implement, since many of the fields it proposes map to request data that the browser already has to keep track of for other reasons, e.g. for Fetch.

The main trade-off on the Sec-Metadata side is that its design requires writing server-side code to inspect the request header and make security decisions. Developers often like to define security restrictions as static response headers in their server config -- they wouldn't be able to do this here. At the same time, the middleware to inspect Sec-Metadata will likely be generic enough that most applications will be able to use one of the preset, easy to understand policies outlined above.

My guess is that this model is the right trade-off for handling the problem of cross-origin information leakage in moderately complex applications: it gives developers enough information to make meaningful security decisions, without requiring adding extensive new machinery to the web platform.

@lknik

lknik commented Jun 26, 2018

Thank you Artur @arturjanc for this nice explanation, and notably including an explanatory reference to whatwg/fetch#687 as well

@slightlyoff

Member

slightlyoff commented Jul 26, 2018

Would this header be restricted to secure contexts?

@mnot

Member

mnot commented Jul 26, 2018

It looks like we're going to add bare identifiers back into Structured Headers, FWIW.

@mikewest

mikewest commented Jul 26, 2018

@slightlyoff

Would this header be restricted to secure contexts?

Yes. I had this in the explainer, but not in the spec: fixed in mikewest/sec-metadata@70f9c34, thanks!

@mnot

It looks like we're going to add bare identifiers back into Structured Headers, FWIW.

Ah! That would drop some quotes from the serialization, which would be nice!

For completeness, and to follow up on a conversation earlier today: if we care more about the header size than usability or readability, we can treat this header as containing a boolean (cause), an 18-value enum (destination), a 3-value enum (site), and a boolean (target), so we can stuff it into an ~7 bit mask, and base64url encode it as a binary structured header (e.g. Sec-Metadata: cause="user-activated", destination="document", site="same-origin", target="top-level" => 1 010 00 1 => Sec-Metadata: *UQ*).
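For illustration only, the packing in that example can be sketched in Python as below. The bit widths (1, 3, 2, 1) and the numeric codes are taken from the worked example, not from any spec; an 18-value destination enum would actually need 5 bits. Right-aligning the bits in the byte is also an assumption, chosen because it reproduces the UQ value shown.

```python
import base64


def pack_fields(fields):
    """Pack (value, bit_width) pairs, most significant first, into bytes."""
    bits = 0
    total = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        bits = (bits << width) | value
        total += width
    n_bytes = (total + 7) // 8
    # Right-aligned: any unused high bits of the first byte stay zero.
    return bits.to_bytes(n_bytes, "big")


def to_binary_header(fields):
    """Serialize as *base64url* with the '=' padding stripped."""
    encoded = base64.urlsafe_b64encode(pack_fields(fields)).decode("ascii")
    return "*" + encoded.rstrip("=") + "*"


# cause=1, destination=0b010, site=0b00, target=1 gives bits 1010001
example = to_binary_header([(1, 1), (0b010, 3), (0b00, 2), (1, 1)])
```

Any consumer would need the same width table and code assignments to decode the value, which is part of the readability cost being weighed here.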

That's a thing we could do. I'm not sure it's a good idea. @travisleithead seemed to be in favor of the more verbose description.

@cynthia cynthia removed the extra time label Jul 26, 2018

@mnot

Member

mnot commented Jul 26, 2018

FWIW, the longer-term pseudo-plan for Structured Headers is that there will eventually be a much more wire-/parse-efficient format in a future version of HTTP (or extension). Perhaps not that efficient, but better.

Even without that, remember that you've got H2/QUIC header compression, and these values are pretty stable (right?), so size isn't the absolute first concern.

@slightlyoff

Member

slightlyoff commented Jul 26, 2018

Regarding naming, @ylafon offhandedly mentioned Sec-Context. Thoughts?

@mikewest

mikewest commented Jul 27, 2018

Regarding naming, @ylafon offhandedly mentioned Sec-Context. Thoughts?

mikewest/sec-metadata#2 has the extent of the thought folks have put into the name. :) Sec-Context came up there, as well as Sec-Fetch and Sec-Request. I'm leaning towards Sec-Fetch as that's where much of the metadata in question comes from, but 🤷‍♂️ .

@mikewest

mikewest commented Oct 16, 2018

Any more feedback from y'all on this mechanism? I think we're aligning on a reasonable design here, and I'd like to get it moving forward. Perhaps we can discuss it at TPAC if y'all have questions?

@dbaron dbaron added the Paris2018f2f label Oct 31, 2018

@torgo

Member

torgo commented Oct 31, 2018

Hello @mikewest. Again, we didn't manage to connect on this at TPAC (sorry). Are you looking for additional specific feedback from TAG to finish your amended design? If so maybe we can arrange for a discussion at a teleconference soon?

@mikewest

mikewest commented Oct 31, 2018

@mikewest

mikewest commented Nov 7, 2018

So. "Meh." stamp of approval and close this review out, @torgo? Or do y'all have thoughts you'd like to share?

@ylafon

Member

ylafon commented Nov 9, 2018

Sliced bread is overrated and I like the way this spec is going.
Note that you will not get an official reply/close before our next teleconference.

@mikewest

mikewest commented Nov 9, 2018

🍞!

Thanks, @ylafon.

@torgo

Member

torgo commented Nov 13, 2018

Hello @mikewest. Sorry for the delays. We discussed briefly on our call today and the two issues that came up were:

  1. header bloat – what do the http people have to say about this?
  2. is this yet another feature which we are adding to the web platform which is only usable by industrial-scale web parties? (In other words, how do small or medium sized providers take advantage of this capability?)

I think if we can address these issues then we're happy to close this one off. 🍞

@mikewest

mikewest commented Nov 13, 2018

header bloat – what do the http people have to say about this?

As @mnot always says, HTTP header compression is a silver bullet panacea that cures all ills.

Also, we discussed this above. See #280 (comment) and #280 (comment).

is this yet another feature which we are adding to the web platform which is only usable by industrial-scale web parties? (In other words, how do small or medium sized providers take advantage of this capability?)

As @slightlyoff noted in the minutes, software providers know their software, and can ship rules themselves at the application layer, which will automagically protect their clients. Imagine Wordpress locking down non-navigational requests to their API endpoints, for instance.

At the network layer, https://bugs.chromium.org/p/chromium/issues/detail?id=861678 is an exciting trip through the world of Web Application Firewalls, showing that they didn't like our initial pass at Sec-Metadata's syntax, but are interested in supporting it in the future. See in particular Ergon's comments at https://bugs.chromium.org/p/chromium/issues/detail?id=861678#c18.

My expectation is that Google-like companies will farm the work of tuning Sec-Metadata rules to @arturjanc-like employees, while https://www.movistar.es/, et al will rely on firewall software providers to do the same.

@mnot

Member

mnot commented Nov 13, 2018

@mikewest you are a bad man.

As always, this is going to be a judgement call. Given the take-up of other WebAppSec mechanisms, I would be concerned if this were included in every request; if it's not going to be used in the vast majority of cases, why send it?

Possible mitigations:

  1. Have the server opt into it. The usual mechanisms, usual problems. If only there were a metadata file that contained the server's preferences for browsers!

  2. Split into multiple headers. If there are many permutations of the metadata sent by a browser over a single connection (and it appears there are), this could make the header compression more efficient.

  3. Don't make the directives so verbose, while still trying to maintain readability. Don't go full P3P; nobody goes full P3P.

@mikewest

mikewest commented Nov 14, 2018

Regarding the header's size, we've made a few tweaks to the format in the last few weeks (dropped target, added a new value to destination (assuming that whatwg/fetch#755 is accepted), dropped cause from non-navigation requests, and @arturjanc wants to add a new 3-value enum for mode (in mikewest/sec-metadata#5)). Given those changes, the longest navigation request header would be 82 characters, counting the field name:

Sec-Metadata: cause=user-activated, destination=nested-document, site=cross-origin

The longest subresource request would be 62 characters, not counting the field name:

Sec-Metadata: destination=serviceworker, site=cross-origin, mode=same-origin

Many requests will be shorter (e.g. Sec-Metadata: destination=empty, site=same-site, mode=cors), but let's take those as our baseline. They don't seem terrible to me. But, let's assume that they are, in fact, terrible. We have some options to make them shorter if we throw legibility out the window.

  • cause can shift from a named identifier model to a boolean: forced=!T or forced=!F. That drops the maximum navigation request value's size down to 57 characters, and it's not too terrible to read.
  • site can shift from a three-value enum to the numbers 0-2: site=0. That drops the maximum navigation request value's size down to 45 characters.
  • destination can shift from a ~20 value enum to the numbers 0-20: destination=10. That drops the maximum navigation request value's size down to 32 characters.

Sec-Metadata: forced=1, destination=10, site=0
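The counts above are easy to check mechanically. A small Python sketch follows; the variable names are illustrative, and the totals depend on whether the "Sec-Metadata: " field name is counted. By this count, 82 corresponds to the full navigation line including the field name, while 62 and 32 correspond to the values alone.

```python
FIELD_NAME = "Sec-Metadata: "

# Example values copied from the comment above.
navigation = "cause=user-activated, destination=nested-document, site=cross-origin"
subresource = "destination=serviceworker, site=cross-origin, mode=same-origin"
compact = "forced=1, destination=10, site=0"


def sizes(value):
    """Return (value length, full header line length) in characters."""
    return len(value), len(FIELD_NAME + value)


for label, value in [("navigation", navigation),
                     ("subresource", subresource),
                     ("compact", compact)]:
    value_len, line_len = sizes(value)
    print(f"{label}: value={value_len}, with field name={line_len}")
```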

This isn't terribly legible, and basically requires a lookup table for destination. At that point, since we're throwing legibility out the window, we may as well also set it on fire before its defenestration by encoding the data as a bitfield.

We've got a boolean, a 20-value enum, a 3-value enum, and a 5-value enum. Let's give ourselves a whole byte for destination, and pad the other values with a bit each, because who knows how much they'll change in the misty future, and we end up with:

| Cause | Destination | Site | Mode |
|---|---|---|---|
| 0 0 | 0 0 0 0 0 1 1 1 | 0 0 1 | 0 0 1 0 |

Which encodes as a binary structured header value in 6 characters as:

Sec-Metadata: *AADq*

That's a direction we could go. Is it better? I'm not sure.

Let's look at @mnot's suggestions:

  • Have the server opt into it. The usual mechanisms, usual problems. If only there were a metadata file that contained the server's preferences for browsers!

    The usual mechanisms (Client Hints, for example) might not be appropriate here, as they're moving to a delegation model that I think makes a lot of sense (see WICG/feature-policy#129). That model would remove basically all value from this proposal, as it would be attacker controlled (and it seems unlikely that we can rely on attackers setting the evil bit for us).

    Chrome's actively developing Origin Policy, and I've heard rumblings that Firefox is interested in doing the same. I'm not particularly enthusiastic about waiting on this until those mechanisms are finished, given that it would allow us to address concrete threats today. I'm equally unenthusiastic about inventing a new opt-in mechanism alongside all the others we've invented recently.

    IMO, the value of the header is high. Its cost seems bearable, while the cost of another site-wide opt-in mechanism is not. But of course I'd think that. :)

  • Split into multiple headers.

    If we binary-encode the data as above, we wouldn't want to split them up (I assume?).

    If we don't binary-encode the data, we'd shift from:

    Sec-Metadata: cause="user-activated", destination="document", site="same-origin"

    To

    Sec-Metadata-Cause: user-activated
    Sec-Metadata-Destination: document
    Sec-Metadata-Site: same-origin

    If that helps compression, great. It's not clear to me that it does? But you know more about the algorithm than I do.

  • Don't make the directives so verbose, while still trying to maintain readability.

    As above, cause seems simple to model as a boolean. I don't know how to shorten the destination values in a way that maintains readability. document could be doc, I guess. And nested-document could be iframe. But serviceworker can't be sw because it needs to be distinguished from sharedworker. style can't be s because it needs to be distinguished from script. Are sc and st legible? I'm not sure. Likewise, site could be so/ss/cs. That seems neither legible nor efficient.

With the exception of shifting cause to a boolean (which seems fine either way?), I think I would prefer either the existing, legible format, or a binary-encoded bitfield. The middle ground doesn't seem like it addresses either human comprehension or efficiency.

What do y'all think?

@mnot

Member

mnot commented Nov 14, 2018

Which encodes as a binary structured header value in 6 characters as:

Please, please, please don't do that. The whole point of using structured headers is to avoid creating new parsers and formats -- along with the bugs and interop problems that come with them.

Furthermore, if you go with a binary encoding, you're locking yourself into that set of values; if you want to add any, you'll need to mint new headers.

IOW, don't over-optimise for header size. Making these less wordy (e.g., dest instead of destination) will get you a long way. One of our aims for SH is to make future serialisations more efficient, so think of what you get with HPACK / QPACK as a floor, not a ceiling.

And, if you don't go binary, you can split into multiple headers, which does give you better compression out of the dynamic table. That's because each complete header value is stored in the dynamic table; if there are many permutations of a single header, it can blow out the total dynamic header table size and cause them to be pushed out.

In your proposed approach, there are ~600 permutations of values, whereas if you split it out, there are 30 possible values.

Even with the binary encoding (which again please don't do!), that's 30000 bytes of space in the dynamic table (assuming all permutations are seen on a connection) (see here for how to calculate the size of a header for purposes of the dynamic table).

If we use this style:

Sec-MD-User: ?T
Sec-MD-User: ?F
Sec-MD-Dest: audio
Sec-MD-Dest: audioworklet
Sec-MD-Dest: document
Sec-MD-Dest: embed
Sec-MD-Dest: empty
Sec-MD-Dest: font
Sec-MD-Dest: image
Sec-MD-Dest: manifest
Sec-MD-Dest: object
Sec-MD-Dest: paintworklet
Sec-MD-Dest: report
Sec-MD-Dest: script
Sec-MD-Dest: serviceworker
Sec-MD-Dest: sharedworker
Sec-MD-Dest: style
Sec-MD-Dest: track
Sec-MD-Dest: video
Sec-MD-Dest: worker
Sec-MD-Dest: xslt
Sec-MD-Dest: nested-document
Sec-MD-Site: same-origin
Sec-MD-Site: same-site
Sec-MD-Site: cross-site
Sec-MD-Mode: same-origin
Sec-MD-Mode: cors
Sec-MD-Mode: no-cors
Sec-MD-Mode: navigate
Sec-MD-Mode: websocket

I count it as a maximum of 1509 bytes in the dynamic table (again, assuming that all of the directives are seen). Much less impact, easy to parse and extensible to boot.
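
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It uses the HPACK rule that a dynamic table entry costs the length of the name plus the length of the value plus 32 bytes (RFC 7541, section 4.1). Two things are assumptions on my part rather than anything settled in this thread: that the combined header is named Sec-Metadata, and that its binary value is always 6 characters long. The function and variable names are purely illustrative.

```python
from math import prod

# Per-entry overhead defined by HPACK (RFC 7541, section 4.1).
ENTRY_OVERHEAD = 32


def entry_size(name: str, value: str) -> int:
    """Size of one dynamic table entry: name octets + value octets + 32."""
    return len(name.encode()) + len(value.encode()) + ENTRY_OVERHEAD


# Header names and values copied from the list above.
SPLIT_HEADERS = {
    "Sec-MD-User": ["?T", "?F"],
    "Sec-MD-Dest": [
        "audio", "audioworklet", "document", "embed", "empty", "font",
        "image", "manifest", "object", "paintworklet", "report", "script",
        "serviceworker", "sharedworker", "style", "track", "video",
        "worker", "xslt", "nested-document",
    ],
    "Sec-MD-Site": ["same-origin", "same-site", "cross-site"],
    "Sec-MD-Mode": ["same-origin", "cors", "no-cors", "navigate", "websocket"],
}

# Split form: one table entry per distinct (name, value) pair.
split_entries = sum(len(values) for values in SPLIT_HEADERS.values())
split_bytes = sum(
    entry_size(name, value)
    for name, values in SPLIT_HEADERS.items()
    for value in values
)

# Combined form: one table entry per permutation of all four fields.
permutations = prod(len(values) for values in SPLIT_HEADERS.values())
# Assumption: header named "Sec-Metadata" with a fixed 6-character value.
combined_bytes = permutations * entry_size("Sec-Metadata", "x" * 6)

print(f"split:    {split_entries} entries, {split_bytes} bytes")
print(f"combined: {permutations} entries, {combined_bytes} bytes")
```

This reports 30 entries and roughly 1.5 KB for the split form, against 600 entries and 30,000 bytes for the combined one.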

The default dynamic table size for the headers in one direction on a connection is 4,096 bytes. While many browsers increase that on the response side, it's not clear whether it's safe to assume that on the request side, as servers generally have more stringent per-connection memory constraints.

In actual use, the number of permutations seen on a connection will often be lower. However, from what I can tell from your use case, there will still be a fair amount of variance, no? Also, think about things like proxies that are mixing traffic from multiple clients onto one upstream connection.

The upshot here is that HTTP header compression is heavily optimised for header values that don't have a lot of variance on a connection. Don't fight it :)
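
To make the eviction effect concrete, here is a toy simulation in Python. It is a sketch under several assumptions: a 4,096-byte table, oldest-first eviction, placeholder tokens instead of the real values, a 6-character value for a combined header assumed to be named Sec-Metadata, and a deliberately worst-case connection that cycles through every permutation in order. DynamicTable and hit_rate are hypothetical names, and a real HPACK encoder is free to make smarter indexing decisions than this.

```python
from collections import OrderedDict
from itertools import product

ENTRY_OVERHEAD = 32  # per-entry overhead from RFC 7541, section 4.1


class DynamicTable:
    """Toy model of a header compression table: evicts oldest entries by size."""

    def __init__(self, max_size=4096):
        self.max_size = max_size
        self.size = 0
        self.entries = OrderedDict()  # (name, value) -> entry size, oldest first

    def send(self, name, value):
        """Return True if the header was already in the table (a cheap hit)."""
        key = (name, value)
        if key in self.entries:
            return True
        entry_size = len(name) + len(value) + ENTRY_OVERHEAD
        # Evict the oldest entries until the new one fits.
        while self.entries and self.size + entry_size > self.max_size:
            _, evicted_size = self.entries.popitem(last=False)
            self.size -= evicted_size
        if entry_size <= self.max_size:
            self.entries[key] = entry_size
            self.size += entry_size
        return False


def hit_rate(requests, rounds=3):
    """Fraction of header fields that were found in the table when sent."""
    table = DynamicTable()
    hits = total = 0
    for _ in range(rounds):
        for headers in requests:
            for name, value in headers:
                hits += table.send(name, value)
                total += 1
    return hits / total


# Number of distinct values per field, as counted in this comment.
FIELDS = {"Sec-MD-User": 2, "Sec-MD-Dest": 20, "Sec-MD-Site": 3, "Sec-MD-Mode": 5}

# Placeholder tokens stand in for the real values; only their count matters here.
choices = [[(name, f"value-{i:02d}") for i in range(n)] for name, n in FIELDS.items()]
split_requests = [list(combo) for combo in product(*choices)]
# Combined form: one distinct 6-character value per permutation (assumption).
combined_requests = [[("Sec-Metadata", f"{i:06d}")] for i in range(len(split_requests))]

print("split hit rate:   ", round(hit_rate(split_requests), 3))
print("combined hit rate:", round(hit_rate(combined_requests), 3))
```

In this worst case the split headers all stay resident after their first use, while the combined header keeps evicting itself. Real traffic is top-heavy rather than round-robin, so the gap in practice would be smaller.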

@mikewest

mikewest commented Nov 15, 2018

Thanks, @mnot!

[many reasons not to encode data in a bitfield]

I generally agree. The proposal was somewhat ad absurdum in nature. I generally agree with @arturjanc's comment that "We should likely be making it easier for developers to use security mechanisms. Requiring application-level decompression is awkward and a barrier to entry." I would prefer human-readable headers when possible.

IOW, don't over-optimise for header size.

This sounds like good advice.

The upshot here is that HTTP header compression is heavily optimized for header values that don't have a lot of variance on a connection. Don't fight it :)

This sounds like a reasonable argument in favor of splitting the header into a million little shards. Thank you for the primer on header compression (whose constraints I think I still don't really understand); I appreciate the review.

@lknik

lknik commented Nov 18, 2018

Hi,

First of all, I like the whole idea of Sec-Metadata (is this finally the final name?). It solves problems while also offering new ways to make exciting security and privacy measurements (passively, possibly actively; never mind that until I write a grant proposal for it).

I vehemently favor the legible version. I do not believe there is any particular bloat involved (also, as @mikewest and @mnot appear to say, compression, being the standard now, makes it a non-issue).

Please do not write any binary-encoding tables of the kind above. They make eyes hurt. Also, thinking of the poor developers, security engineers and pentesters, among others, I do not think there is a valid case for not having legibility. In other words, do not make it an IAB-style* consent framework.

In short, I'd say all is fine and in place. There is also no question, from my personal perspective, that Sec-Metadata is a good thing and that its value is tangible to multiple stakeholders.

[*] what do you consent to here? 00000100 11100001 00000101 00010000 00001100 10001110
00010000 01010001 00000000 11001000 00000001 11000000 00000100 00110001 00001101 00000000 10001110 00000000 00000100 00000000 00011101 10111100 00000000 01000000 00000001 00100000 00011000

@arturjanc

arturjanc commented Nov 18, 2018

Thanks for the comments, @mnot and @lknik!

To the point about splitting the values across multiple headers, I looked into the data we're collecting from clients with Chrome's experimental web platform features flag enabled based on a total of ~155M requests to 200 Google services.

In this data set, we have 81 permutations of the four main values (destination, target, site, cause). The top 10 most common permutations (e.g. site=same-site, destination=image, target=subresource) are seen in 90% of requests, and for the top 20 permutations this rises to 98% (note that this doesn't take into account Fetch's mode value because Chrome's implementation doesn't send it yet). Also, this is global data -- my guess is that for a given connection the number of permutations would be lower because requests would tend to have the same site field. I'm not sure how this distribution affects the HTTP header compression discussion, but I'd expect most servers to see a top-heavy distribution of values even if they were included in the same header.

From a developer ergonomics point of view my guess is that having the information in a single header would be slightly easier to reason about -- servers will generally need to look at several of the values to make a security decision, and developers may need to do the same when debugging any rejected requests. That said, it's not a hill I'd want to die on; if the performance benefits of the split header approach are substantial and request size is our main remaining concern, I think it would still be workable.

@mnot

Member

mnot commented Nov 18, 2018

@arturjanc Likewise, I'm not dead-set on splitting them out; just trying to illustrate that the binary approach isn't actually giving us what we might think.

I agree it's likely to be top-heavy; isn't everything on the Web a Zipf curve, after all?

WRT developer ergonomics - I don't know that accessing several single header values is more onerous than parsing one complex one -- especially if both approaches use structured headers -- but of course YMMV.

From a compression efficiency perspective, the effect I illustrate will become more pronounced when you add more fields; it'd be interesting to see what Chrome's numbers are once you add mode. And of course if there are more fields that might be added later, that changes things too.

@arturjanc

arturjanc commented Nov 19, 2018

For mode specifically, it's quite likely to have a lot of overlap with the other values, e.g. a destination of document or nested-document should always send mode=navigate, whereas subresource loads with a non-empty destination will have a mode of either no-cors or same-origin. Because of this, I'd expect it to result in only a small multiplier on the number of permutations we see; but I recognize that this would not be very future-proof, especially if we decided to expose other properties of the Request object (which AFAIK we don't, but who knows...)

Given that there don't seem to be extremely strong opinions for either approach -- except for the universal dislike of the bitfield -- my guess is that we should let @mikewest make the call and then blame everything on him once things inevitably blow up ;-)

@mnot

Member

mnot commented Nov 19, 2018

Now that's a plan I can get behind!

@annevk

Member

annevk commented Nov 19, 2018

FWIW, I like the separate header approach as each header would contain a simple token which is extremely easy to code against. Given that even browsers don't interoperate on header parsers for anything more complex than a token (and I've found even differences in tokens, due to whitespace differences), it seems better to err on the side of simplicity for server operators.
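
As a rough illustration of how little code the single-token form needs on the server, here is a Python sketch of a request filter. The policy itself (let same-origin and same-site requests through, allow cross-site requests only when they are top-level navigations, and pass requests that lack the header) is a hypothetical example, not something proposed in this thread, and is_allowed is an illustrative name. The header names use the Sec-MD-* spelling from earlier in the thread, which may not be the final one.

```python
def is_allowed(headers: dict) -> bool:
    """Hypothetical policy built on the split headers; not a recommendation."""
    # Header names are case-insensitive; strip stray whitespace around tokens.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    site = normalized.get("sec-md-site")
    if site is None:
        # Header absent (older browser or non-browser client): this sketch
        # chooses to let the request through.
        return True
    if site in ("same-origin", "same-site"):
        return True
    # Anything else is treated as cross-site: only allow top-level navigations.
    return (
        normalized.get("sec-md-mode") == "navigate"
        and normalized.get("sec-md-dest") == "document"
    )


print(is_allowed({"Sec-MD-Site": "cross-site", "Sec-MD-Mode": "no-cors", "Sec-MD-Dest": "image"}))
print(is_allowed({"Sec-MD-Site": "cross-site", "Sec-MD-Mode": "navigate", "Sec-MD-Dest": "document"}))
```

Each check is a plain string comparison, so no structured parsing is involved beyond trimming whitespace.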

@lknik lknik self-assigned this Nov 19, 2018

@devd

devd commented Nov 22, 2018

Hi!

Quick note on @torgo's comment as an additional vote:

Re HTTP header bloat: I defer to all the HTTP experts on this thread :). I do have a strong preference for human-readable formats and simplicity, but if we are going down the route of a single header, let's not make people write a new parser in every language they use. I would rather do JSON.

Re whether this is only for industrial-scale web parties: I believe this header will be useful and important to everyone. Whether or not they adopt it is a question of how much they invest in security and what other priorities they have (there is no point protecting against this if you have an XSS vuln every day). For example, Dropbox would love to adopt this. While we are reasonably popular, we aren't as popular as Google :)

In terms of comparison to previous web standards, I suspect this will be a lot easier for security teams to adopt than CSP. Additionally, I will note that this header isn't just about "defense in depth". There is a whole class of side channel attacks, demonstrated many a time in previous research, that are impossible to prevent right now on the web platform. This header will at least make it possible to defend against these attacks. If 2018 has taught us anything, it is that we had better start protecting against side channel attacks before they become trivial :)

Finally, I believe one use case for this would also be protecting internal webapps from attacks. While these apps won't show up in any popularity contests, they are pretty sensitive apps and the impact of protecting them is huge.

@torgo

Member

torgo commented Nov 28, 2018

Discussed on call 28-Nov. We agreed to close this based on the feedback provided. Also noted @mnot's blog post which is related to the http header "bloat" issue. Thanks all.

@torgo torgo closed this Nov 28, 2018

@annevk annevk referenced this issue Nov 28, 2018

Closed

Separate headers #6

@mikewest

mikewest commented Dec 3, 2018

Split the header into a million little pieces based on this conversation: https://mikewest.github.io/sec-metadata/. Thanks, all.
