-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework content integrity section based on @jyasskin's feedback. #1484
Conversation
index.html
Outdated
<tr> | ||
<td>`digestMultibase`</td> | ||
<td> | ||
A cryptographic digest, as defined in [[[VC-DATA-INTEGRITY]]]. | ||
</td> | ||
</tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pulls the use of digestMultibase into the VCDM specification, which I object to, per previous discussions. This is particularly unnecessary, when we already have a perfectly good W3C digest standard to use - digestSRI. Choosing one digest standard would increase interoperability.
Please delete the lines adding the use of digestMultibase to the specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't have an opinion on whether to include multibase in the core VC data model, I agree that it's more than an editorial change and not necessary to fix the issue I commented on in #1285.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete the lines adding the use of digestMultibase to the specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither approving nor denying, noting that I believe it's necessary we find consensus on whether we include references to multibase before going forward with this PR.
index.html
Outdated
<tr> | ||
<td>`digestMultibase`</td> | ||
<td> | ||
A cryptographic digest, as defined in [[[VC-DATA-INTEGRITY]]]. | ||
</td> | ||
</tr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't have an opinion on whether to include multibase in the core VC data model, I agree that it's more than an editorial change and not necessary to fix the issue I commented on in #1285.
index.html
Outdated
The value of the `relatedResource` property MUST be associated with one or | ||
more objects of the following form: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've gotten lost in the RDF/JSON-LD terminology here. As I understand it, the relatedResource
property appears in a claim that has a subject and a value. In this document, most uses of "X is associated with Y" appear to mean that there's a claim in which Y is the property and X is the value, although I think I see some where Y is the subject instead.
If we interpret "associated with" to just mean "appears in the same claim as", then this sentence means that there's one claim (X, relatedResource
, Y), and then there must be other claims of the form (Y, P, Z) where Z is an object with id
, mediaType
, etc. fields. Do you actually just mean that Y must be an object with those properties?
We might have another editorial issue that "associated with" should generally be replaced with better-defined terms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, "associated with" in this case is meant to mean: "The value of the relatedResource
property MUST be one or more objects of the following form". IOW, I think we can just delete the word "associated with" here and be fine.
I deleted "associated with" in commit 6d5ab3f.
in Section [[[#identifiers]]]. The value MUST be unique among the list of | ||
related resource objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this uniqueness just fall out of how JSON-LD works? Any two objects in a JSON-LD serialization with the same ID are just the same RDF object, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this uniqueness just fall out of how JSON-LD works?
Yes.
Any two objects in a JSON-LD serialization with the same ID are just the same RDF object, right?
Correct.
However, there are people in the WG that don't want to do "full JSON-LD processing" and this language helps them avoid writing code that merges duplicates in the list.
We advise against going "full RDF" and creating multiple nodes with the same identifier in VCs; people that do that shouldn't expect to interoperate as well as if they would just avoid doing that.
I agree that the benefit of this language is debatable.
index.html
Outdated
If a `mediaType` is listed, implementations that retrieve the resource | ||
using [[[?RFC9110]]] SHOULD: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What resource? I think it's the ID of Y in the (X, relatedResource
, Y) claim, but it could also be X's ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified in b0896f2.
index.html
Outdated
<li> | ||
match the retrieved content media type, such as via the `Content-Type` HTTP | ||
Header. | ||
use the media type in the `Content-Type` HTTP Header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the requester doesn't set the Content-Type
header, do you mean "reject the response if the response includes a Content-Type
HTTP header with a different media type"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed in ac98e17.
constructed using the method specified in | ||
<a href="https://www.w3.org/TR/SRI/#integrity-metadata">Subresource Integrity</a>. | ||
To extend integrity protection to a related resource, an [=issuer=] of a | ||
[=verifiable credential=] MAY include the `relatedResource` property: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any constraints on the allowable subject of the relatedResource
property? If the constraints are based on the id
of the value of that property, I think the current text requires implementations to recursively scan everything the issuer could be said to have "included" in order to find all uses of the property. That'd be easy for an RDF processor but probably not a JSON processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any constraints on the allowable subject of the relatedResource property?
No, there are no constraints. Any URL would do.
That'd be easy for an RDF processor but probably not a JSON processor.
Hmm, I don't understand why? Why wouldn't a "JSON processor" just look for "id" values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I think you have here is serialized as
{ "id": "vc://id",
"type": "VerifiableCredential",
"relatedResource": {
"id": "the://resource",
"mediaType": "media/type",
"digestSRI": "sha384-hash"
}
}
In RDF, that's
<the://resource> <https://schema.org/encodingFormat> "media/type" .
<the://resource> <https://www.w3.org/2018/credentials#digestSRI> "sha384-hash" .
<vc://id> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <vc://VerifiableCredential> .
<vc://id> <https://www.w3.org/ns/credentials/issuer-dependent#relatedResource> <the://resource> .
But I don't see any constraint in the text that the "relatedResource" property has to map from the VC to the constrained resource. The issuer could as easily "include the relatedResource property" as:
{ "id": "vc://id",
"type": "VerifiableCredential",
"foo": { "bar": { "baz": { "quux": {
"relatedResource": {
"id": "the://resource",
"mediaType": "media/type",
"digestSRI": "sha384-hash"
}
}}}}
}
A JSON processor then has to recursively scan all the objects to look for relatedResource
properties. RDF APIs seem to have native queries for "find all the triples with a property of X", but that's not a natural operation on JSON.
index.html
Outdated
property MAY be annotated with integrity information as specified in this | ||
section by inclusion of `digestSRI` | ||
in the object. | ||
section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section doesn't directly say how to annotate an object that contains an id
property. In particular, do you make it the value of a relatedResource
property or the subject? Normally I'd "annotate" something by making it the subject of a claim, but I think relatedResource
is the opposite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f72d137.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this seems to say that relatedResource
isn't an essential part of constraining resource contents. Processors actually have to look for any object that has a digestSRI
field, and constrain fetches of that object's id
. If that's the case, it seems like this section should start by discussing the mediaType
and digestSRI
properties ("claims"?) that can be added to any object, and then mention relatedResource
as a way to constrain URLs that don't otherwise appear as objects in the VC. This also deserves some tricky cases in the test suite, to make sure implementations don't miss the implications.
index.html
Outdated
<td>`digestSRI`</td> | ||
<td> | ||
A cryptographic digest, as defined in [[[SRI]]]. | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>`digestMultibase`</td> | ||
<td> | ||
A cryptographic digest, as defined in [[[VC-DATA-INTEGRITY]]]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can an object have more than one of each kind of digest specified? If you write "a", I bet JSON-based processors will forget to look for arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Fixed in eed1251.
specification MUST produce a validation error unless the resource matches the | ||
expected media type and cryptographic digest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this constraint only for objects that are the value of a relatedResource
claim, or any subject of the mediaType
or digestSRI
claims?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its for any subject of the digestSRI
or digestMultibase
claims. The previous sentence specifies that expectation.
index.html
Outdated
<p class="issue" title="Unification of cryptographic hash expression formats are under discussion"> | ||
The Working Group is currently attempting to determine whether cryptographic hash | ||
expression formats can be unified across all of the VCWG core specifications. | ||
Candidates for this mechanism include `digestSRI` and `digestMultibase`. There | ||
are arguments for and against unification that the WG is currently debating. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there should be a numbered issue associated with this "Unification of cryptographic hash expression formats" discussion, where the "arguments for and against unification" (and description of what is meant by "unification", here) are (or can be) clearly enumerated, including those for and against inclusion of digestMultibase
and/or digestSRI
, also including possibly removing both (since including either one alone can be read as picking a winner where we perhaps should not), and thus debated more clearly....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue #1489 has been raised to track the disagreement in this PR, and previous PRs related to digest mechanism, and that issue number is now cited from the specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the related resource additions are not pure editorial changes (as requested) but are technical additions so they should be in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires implementers that want to verify the integrity of related resources to handle both SRI and multibase. There is already enough optionality in VCs. If the WG can't come to consensus on SRI vs multibase it should decide on one hash algorithm and use it (ie sha256).
"digest": "c2ac819c487c5fb6380a9c5d97c33ed28d540e881ecf067d0f692ebbc356326e"
The issue was discussed in a meeting on 2024-05-08
View the transcript2.1. Rework content integrity section based on @jyasskin's feedback. (pr vc-data-model#1484)See github pull request vc-data-model#1484. Manu Sporny: The content integrity section has a number of objections on it. I presume we'll talk about those at some point. The rest of the specs are doing their normal thing, that's it from me.
See github pull request vc-data-model#1484. Brent Zundel: We have some time to look at the content integrity PR. |
1 similar comment
The issue was discussed in a meeting on 2024-05-08
View the transcript2.1. Rework content integrity section based on @jyasskin's feedback. (pr vc-data-model#1484)See github pull request vc-data-model#1484. Manu Sporny: The content integrity section has a number of objections on it. I presume we'll talk about those at some point. The rest of the specs are doing their normal thing, that's it from me.
See github pull request vc-data-model#1484. Brent Zundel: We have some time to look at the content integrity PR. |
The issue was discussed in a meeting on 2024-05-08
View the transcript2.3. Rework content integrity section based on @jyasskin's feedback. (pr vc-data-model#1484)See github pull request vc-data-model#1484. Brent Zundel: This PR has had a number of comments on it, number of requests for changes, Manu could you talk us through it? Manu Sporny: This PR started out as a request from Jeffrey Yaskin to rewrite the content integrity section so that it matched the pattern of the other sections. This section was originally written by Mike Prorock, it was the first attempt at it and it went through a lot of churn and needed clean up. Michael Jones: Obviously, I'm the one who had made the point that we shouldn't be normatively adding multibase to the VCDM spec since we have been careful to not include it before. I understand that there is an issue comment, but I'm not willing to elevate multibase beyond that issue comment. David Chadwick: This new text should have removed the note, suggest removing that bit, rework editorial, and discuss later. Manu Sporny: Yes, I think that the new text, actually, if you look at it carefully, the changes have been made, it's clearly addressing Jeffrey's comments. I think the new text on multibase isn't replacing anything. We should have struck out the note. Then we can put that into a new PR and just work on that PR alone. Brent Zundel: ok, so smaller more controversial bit to its own PR. Dmitri Zagidulin: I'm curious, with regards to Manu Sporny: Sure. I think there's a misunderstanding about what's in the spec today. Today we normatively reference DI and VC-JOSE-COSE -- because this group decided that because we're publishing those we will normatively reference those from VCDM. As a result, we can pull in those normative things from those specs. If the concern is not defining them in VCDM, that's not happening. Brent Zundel: Just to summarize, currently the VCDM normatively references Manu Sporny: No, it references SRI, which is a web browser thing. We use a tiny piece of SRI but we can't just do that. We have to define what the value is in the Brent Zundel: That was helpful. Michael Jones: As Dmitri points out, the problem with digestMultibase is its dependent upon multibase, where inexplicably it defines something like 26 encodings for the same data, which is a travesty for interop.
Dave Longley: 2nd Dmitri's proposal to resolve this, to say that we only support the base64url prefix, which is the multibase prefix for base64url, nopad encoded, if we do that, then that gets us past the objection, then we get data savings, and so on. Brent Zundel: proposal on the table is we specify which base and do that in this section of VCDM. Mike, would that address your objections? Dmitri Zagidulin: I like Dave's suggestion, but if we're fixing that we're just fixing base64url, and drop the 'u' and save ourselves an extra character. Michael Jones: Well, Dmitri's suggestion is rational, threw me off. In response to brent's suggestion, I'm unwilling to define any form of digestMultibase in our core spec, I understand the irony, which defines this in controller document, and I do appreciate we got to compromise to DI where we don't normatively ref 26 choices, but we decide w/ what Dmitri said, what choices we make, I do not know whether a form of digestMultibase is defined in controller document, if we're going to define that w/ base64url encoding I'd do it there to keep everything together. The other question, I do support defining the fields necessary to use digestSRI in our spec in the Content Integrity section. Finally, I haven't read Jeffrey's comments, read them w/ eye, is he asking for Multibase to be specified. or is he asking unrelated editorial things? Dave Longley: One of the advantages of using multibase is it introduces layers and a separation of concerns, so you can compress any multibase data regardless of whichever data is done, if you want to compress using CBOR-LD, it can do that sort of compression, or use tech that is not a VC, different choices on encoding, here in our group we can say base64url, no one has to go rewrite new tech, type is multibase, no new tech needs to be written to that particular field, that other tech continues to work, we get benefit that there is interop around that prefix, we get best of both worlds if we make that choice.
Dmitri Zagidulin: Dave's point about CBOR-LD processors makes a lot of sense, since signal processors are working off of, RDF type multibase, why dont' we define RDF type for base64url, same logic, same compression can be invoked?
Dave Longley: What we're suggesting now is a new RDF type for every encoding type, or we can re-use work out there, already implementations using that... if the world had decided to take a different approach, and communities were using RDF, then that might have worked, but there are separations of concerns, trying to re-use technologies, if we re-invent yet another way to do it, we're less interoperable. Michael Jones: I guess I disagree with the characterization, it's not inventing something, it's something that's been around for a decade and a half. If we need context and vocabulary term, let's create it.
Michael Jones: To Dave's point, that's what people have already used, that people chose to not make a choice doesn't mean we should promulgate that or promote that. Dave Longley: First to clarify, I didn't mean to imply that base64url encoding , the alphabet is what's being invented. We'd be inventing a new RDF type for a single base-encoding. Regarding failure to make a choice, depends on whether or not people feel things are equivalent. Community that built Multibase had their reasons. JWK allows different keys to be parameterized instead of saying one key format. We tried to re-use that work, and we've successfully been interoping with that technology. I would prefer us to not invent a new thing and cut it there because it seems like picking a prefix of U and putting it on there solves all the problems doesn't solve... we're building on other communities to greatest extend that we can profile those down, we should do that, instead of inventing something new. Manu Sporny: There are a number of misconceptions around what multibase is including in that post that Mike wrote. Brent Zundel: It sounds like, maybe overly optimistic, it sounds like we're dancing around a middle ground that not everyone likes but we can live with, use base64url encoding, that's the choice we're making for multibase option for content integrity section.
Michael Jones: Mainly going to respond to Dave's comment, defining base64url RDF type would be invention, perhaps, but it's not a very big invention, it's just tagging what this field uses. Creating the tag is perhaps an invention, but that's a good thing, it let's people know what choice was made.
Brent Zundel: please chime in in the PR, let's explore that middle ground. |
692947e
to
0480bf4
Compare
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
index.html
Outdated
<tr> | ||
<td>`digestMultibase`</td> | ||
<td> | ||
One or more cryptographic digests, as defined in [[[VC-DATA-INTEGRITY]]]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use a direct reference to the multibase specification in the DI spec, that is §2.4:
One or more cryptographic digests, as defined in [[[VC-DATA-INTEGRITY]]]. | |
One or more cryptographic digests, as defined in <a data-cite="VC-DATA-INTEGRITY#multibase-0">VC Data Integrity</a> specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the digestMultibase
entry due to @selfissued's objection. We'll pick that item up in another PR.
index.html
Outdated
<tr> | ||
<td>`digestSRI`</td> | ||
<td> | ||
One or more cryptographic digests, as defined in [[[SRI]]]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reference to the SRI specification may be a bit problematic; that specification describes a general framework for HTML, and we use only one part of it, namely the integrity attribute. To be even more precise, we use the BNF definition therein.
Note that we have, in our spec, a precise definition for an SRI string definition which does that precise reference. Wouldn't it be better to refer directly to that section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 4e7d4b3.
@selfissued wrote:
Removed in 869ebeb with issue #1489 raised to track the disagreement in this PR (and previous PRs related to digest mechanism). |
Editorial, multiple reviews, changes requested and made, objections moved to issue #1489, no remaining objections for this PR, merging. |
This PR is an attempt to partially address issue #1348 by reworking the content integrity section in the specification based on feedback from @jyasskin.
NOTE: It contains content that was objected to by @selfissued in PR #1464 and is, thus, not mergeable in its current state. This PR will be used to drive the discussion and come to consensus on what can be merged by consensus of the group.
Specifically:
5.4 Integrity of Related Resources
Preview | Diff