Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The algorithm for parsing metadata should be described in more detail #84

Open
baek9 opened this issue Dec 10, 2019 · 12 comments
Open

The algorithm for parsing metadata should be described in more detail #84

baek9 opened this issue Dec 10, 2019 · 12 comments

Comments

@baek9
Copy link
Contributor

baek9 commented Dec 10, 2019

3.4.3. Parse metadata describes an algorithm for integrity metadata parsing. We need to describe it in detail for every possible situation. This will unify the behavior of browsers implementing SRI. Chrome and Firefox, which currently implement SRI, have different behaviors when parsing integrity metadata.

For example, after parsing metadata that contains non-base 64 digest, Firefox returns it by the step 4 of the algorithm since the digest is not empty. Chrome, on the other hand, returns "no metadata" since digest is not valid. See more details in here.

This is just one example. So, this issue suggests that 3.4.3 Parse metadata should be described in more detail in order to prevent unknown cases.

@annevk
Copy link
Member

annevk commented Dec 10, 2019

Yeah, the validity/parsing step should be replaced by an algorithm that parses the string using primitives from https://infra.spec.whatwg.org/#strings. (Whenever we use ABNF this is the result. So we should stop doing that for normative requirements.)

It's also weird how that ABNF references CSP2, but only CSP1 is in the normative references. Furthermore, parsing of the base64 value isn't defined. Is this https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

@baek9
Copy link
Contributor Author

baek9 commented Dec 14, 2019

As @annevk say CSP2 is not in the normative reference even if it is referenced by 3.4.6 The integrity attribute. And @annevk also said :

Furthermore, parsing of the base64 value isn't defined. Is this
https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

In fact, there is no need to decode the base64 encoded digest contained in the metadata. In the metadata parsing algorithm corresponding to 3.4.3 Parse metadata, It seems to mean checking only that the digest is in base64 according to the following ABNF grammar from CSP L2, section 4.2.

base64-value = 1*( ALPHA / DIGIT / "+" / "/" )*2( "=" ) <from [CSP L2, section 4.2]>
hash-expression = hash-algo "-" base64-value

In the integrity checking algorithm corresponding to 3.4.5 Does response match metadataList? and 3.4.1 Apply algorithm to response, We should return false when the digest and the base64 encoded value of the content's hash do not match. Also, 3.4.1 references base64 encoding algorithm from the RFC document, Section 4 of RFC 4648. And this is in the normative reference.

@annevk
Copy link
Member

annevk commented Dec 14, 2019

Ah okay, so you encode the hash using base64 and then compare it with the base64 value given here. We should have tests for the edge cases in that case, e.g., incorrect number of equal signs at the end.

@domfarolino
Copy link
Member

So it seems like we should just define hash-expression as hash-algo "-" value, instead of hash-algo "-" base64value. That way, the parsing of non-base64 developer-supplied metadata will still populate the metadata set, and will inevitably fail to match the actual base64-encoded hashes. Does that sound right?

@baek9
Copy link
Contributor Author

baek9 commented Dec 20, 2019

That approach seems to eliminate the confusion that can occur in implementing SRI without modifying the algorithm. It will also fix the bug that @domfarolino originally raised. However, the spec defines digest as "the base64-encoded result of executing a cryptographic hash function on an arbitrary block of data". Does the definition need to be changed? If so, the original author's opinion is also needed. Thank you.

@domfarolino
Copy link
Member

That makes sense, someone else's opinion would indeed be good to get /cc @mikewest.

@yutakahirano
Copy link

@domfarolino's idea sounds good (though I'm not an expert either).

We may want to have a step which allows us to reject the request before making a network fetch when the value is not a valid base64-encoded string.

@annevk
Copy link
Member

annevk commented Jan 4, 2020

I think the current setup should be replaced by a proper processing model along the lines of my earlier comment. Tweaking the ABNF doesn't seem sufficient.

@domfarolino
Copy link
Member

@baek9 Do you have any intentions on getting back to this PR, by any chance?

@baek9
Copy link
Contributor Author

baek9 commented Nov 25, 2021

@domfarolino As it is now, I am not familiar with the w3c spec, so I thought someone better could have addressed this issue. I'm submitting a new commit(I'm not sure if the commit are correct) for this issue that hasn't been done yet. To go further, I'd like to know if that commit is sufficient to address the feedback raised. The feedback I have in mind is:

  1. No use ABNF grammar by @annevk
    Previously, metadata had to be parsed according to the ABNF grammar. However, the base64-value doesn't actually need to be in base64 format, as the name suggests. I think this is the root cause of this issue.

  2. Fail-open for forward-compatibility by @mikewest
    User agent will load a script with integrity="gibberish" by 3.8 of the new algorithm, for instance, which seems reasonable behavior to keep, since it gives us grammatical flexibility in the future.

  3. option-expression
    The specification says that option are also included in the metadata.

@baek9
Copy link
Contributor Author

baek9 commented Jan 11, 2022

Yeah, the validity/parsing step should be replaced by an algorithm that parses the string using primitives from https://infra.spec.whatwg.org/#strings. (Whenever we use ABNF this is the result. So we should stop doing that for normative requirements.)

It's also weird how that ABNF references CSP2, but only CSP1 is in the normative references. Furthermore, parsing of the base64 value isn't defined. Is this https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

@annevk [CSP1] appears to have been removed from 68fd65e. Also, it seems that [CSP L3 2.3.1 Source Lists] can be used instead of [CSP L2 4.2 Source List Syntax]. Do we need to add [CSP L3 2.3.1 Source Lists] as a normative reference for 3.1 Integrity metadata and 3.5 The integrity attribute?

@annevk
Copy link
Member

annevk commented Jan 12, 2022

Yeah, referencing https://w3c.github.io/webappsec-csp/#grammardef-hash-source directly makes sense to me for both those sections. We can probably also simplify some of the prose to no longer talk about options. No need for that until it's actually a thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants