The algorithm for parsing metadata should be described in more detail #84

baek9 · 2019-12-10T15:20:33Z

3.4.3. Parse metadata describes an algorithm for integrity metadata parsing. We need to describe it in detail for every possible situation. This will unify the behavior of browsers implementing SRI. Chrome and Firefox, which currently implement SRI, have different behaviors when parsing integrity metadata.

For example, after parsing metadata that contains non-base 64 digest, Firefox returns it by the step 4 of the algorithm since the digest is not empty. Chrome, on the other hand, returns "no metadata" since digest is not valid. See more details in here.

This is just one example. So, this issue suggests that 3.4.3 Parse metadata should be described in more detail in order to prevent unknown cases.

annevk · 2019-12-10T15:54:06Z

Yeah, the validity/parsing step should be replaced by an algorithm that parses the string using primitives from https://infra.spec.whatwg.org/#strings. (Whenever we use ABNF this is the result. So we should stop doing that for normative requirements.)

It's also weird how that ABNF references CSP2, but only CSP1 is in the normative references. Furthermore, parsing of the base64 value isn't defined. Is this https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

baek9 · 2019-12-14T03:41:51Z

As @annevk say CSP2 is not in the normative reference even if it is referenced by 3.4.6 The integrity attribute. And @annevk also said :

Furthermore, parsing of the base64 value isn't defined. Is this
https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

In fact, there is no need to decode the base64 encoded digest contained in the metadata. In the metadata parsing algorithm corresponding to 3.4.3 Parse metadata, It seems to mean checking only that the digest is in base64 according to the following ABNF grammar from CSP L2, section 4.2.

base64-value = 1*( ALPHA / DIGIT / "+" / "/" )*2( "=" ) <from [CSP L2, section 4.2]>
hash-expression = hash-algo "-" base64-value

In the integrity checking algorithm corresponding to 3.4.5 Does response match metadataList? and 3.4.1 Apply algorithm to response, We should return false when the digest and the base64 encoded value of the content's hash do not match. Also, 3.4.1 references base64 encoding algorithm from the RFC document, Section 4 of RFC 4648. And this is in the normative reference.

annevk · 2019-12-14T07:50:16Z

Ah okay, so you encode the hash using base64 and then compare it with the base64 value given here. We should have tests for the edge cases in that case, e.g., incorrect number of equal signs at the end.

domfarolino · 2019-12-17T19:24:31Z

So it seems like we should just define hash-expression as hash-algo "-" value, instead of hash-algo "-" base64value. That way, the parsing of non-base64 developer-supplied metadata will still populate the metadata set, and will inevitably fail to match the actual base64-encoded hashes. Does that sound right?

baek9 · 2019-12-20T16:26:11Z

That approach seems to eliminate the confusion that can occur in implementing SRI without modifying the algorithm. It will also fix the bug that @domfarolino originally raised. However, the spec defines digest as "the base64-encoded result of executing a cryptographic hash function on an arbitrary block of data". Does the definition need to be changed? If so, the original author's opinion is also needed. Thank you.

domfarolino · 2019-12-20T16:41:37Z

That makes sense, someone else's opinion would indeed be good to get /cc @mikewest.

yutakahirano · 2019-12-25T10:24:29Z

@domfarolino's idea sounds good (though I'm not an expert either).

We may want to have a step which allows us to reject the request before making a network fetch when the value is not a valid base64-encoded string.

annevk · 2020-01-04T08:19:20Z

I think the current setup should be replaced by a proper processing model along the lines of my earlier comment. Tweaking the ABNF doesn't seem sufficient.

domfarolino · 2021-09-02T04:57:32Z

@baek9 Do you have any intentions on getting back to this PR, by any chance?

baek9 · 2021-11-25T02:28:15Z

@domfarolino As it is now, I am not familiar with the w3c spec, so I thought someone better could have addressed this issue. I'm submitting a new commit(I'm not sure if the commit are correct) for this issue that hasn't been done yet. To go further, I'd like to know if that commit is sufficient to address the feedback raised. The feedback I have in mind is:

No use ABNF grammar by @annevk
Previously, metadata had to be parsed according to the ABNF grammar. However, the base64-value doesn't actually need to be in base64 format, as the name suggests. I think this is the root cause of this issue.
Fail-open for forward-compatibility by @mikewest
User agent will load a script with integrity="gibberish" by 3.8 of the new algorithm, for instance, which seems reasonable behavior to keep, since it gives us grammatical flexibility in the future.
option-expression
The specification says that option are also included in the metadata.

baek9 · 2022-01-11T02:19:55Z

Yeah, the validity/parsing step should be replaced by an algorithm that parses the string using primitives from https://infra.spec.whatwg.org/#strings. (Whenever we use ABNF this is the result. So we should stop doing that for normative requirements.)

It's also weird how that ABNF references CSP2, but only CSP1 is in the normative references. Furthermore, parsing of the base64 value isn't defined. Is this https://infra.spec.whatwg.org/#forgiving-base64-decode or another?

@annevk [CSP1] appears to have been removed from 68fd65e. Also, it seems that [CSP L3 2.3.1 Source Lists] can be used instead of [CSP L2 4.2 Source List Syntax]. Do we need to add [CSP L3 2.3.1 Source Lists] as a normative reference for 3.1 Integrity metadata and 3.5 The integrity attribute?

annevk · 2022-01-12T14:39:43Z

Yeah, referencing https://w3c.github.io/webappsec-csp/#grammardef-hash-source directly makes sense to me for both those sections. We can probably also simplify some of the prose to no longer talk about options. No need for that until it's actually a thing.

jonathanKingston mentioned this issue Jan 3, 2020

Specify hash function naming #89

Closed

abitrolly mentioned this issue Sep 16, 2021

Use : as a separator for hash format #108

Closed

baek9 mentioned this issue Nov 25, 2021

Edit 3.3.2 Parse metadata in more detail #110

Merged

baek9 mentioned this issue Jan 25, 2022

Update references and clarify section 3.5 in the current version of the spec #112

Open

annevk mentioned this issue Feb 15, 2023

Script not blocked when integrity attribute is malformed ? #120

Open

annevk mentioned this issue Apr 21, 2023

base64-related requirements are broken #121

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The algorithm for parsing metadata should be described in more detail #84

The algorithm for parsing metadata should be described in more detail #84

baek9 commented Dec 10, 2019

annevk commented Dec 10, 2019

baek9 commented Dec 14, 2019

annevk commented Dec 14, 2019

domfarolino commented Dec 17, 2019

baek9 commented Dec 20, 2019 •

edited

Loading

domfarolino commented Dec 20, 2019

yutakahirano commented Dec 25, 2019

annevk commented Jan 4, 2020

domfarolino commented Sep 2, 2021

baek9 commented Nov 25, 2021 •

edited

Loading

baek9 commented Jan 11, 2022 •

edited

Loading

annevk commented Jan 12, 2022

The algorithm for parsing metadata should be described in more detail #84

The algorithm for parsing metadata should be described in more detail #84

Comments

baek9 commented Dec 10, 2019

annevk commented Dec 10, 2019

baek9 commented Dec 14, 2019

annevk commented Dec 14, 2019

domfarolino commented Dec 17, 2019

baek9 commented Dec 20, 2019 • edited Loading

domfarolino commented Dec 20, 2019

yutakahirano commented Dec 25, 2019

annevk commented Jan 4, 2020

domfarolino commented Sep 2, 2021

baek9 commented Nov 25, 2021 • edited Loading

baek9 commented Jan 11, 2022 • edited Loading

annevk commented Jan 12, 2022

baek9 commented Dec 20, 2019 •

edited

Loading

baek9 commented Nov 25, 2021 •

edited

Loading

baek9 commented Jan 11, 2022 •

edited

Loading