IDs Not Differentiated Sufficiently #120
Comments
Great point! Would love to see this solved. Here and in DID spec. |
'id' is a part of the underlying linked data model, and is used to uniquely identify the object that is being described. Most developers that are familiar with JSON-LD are used to this pattern where "id" identifies the thing that you're attaching attributes to. Usually, we depend on "type" to determine the type of thing identified by "id"... so "type": "VerifiableProfile", "type": "VerifiableCredential", "type": "RevocationList2017"... that sort of thing. That said, even the type is optional as most of the data structures depend on duck typing. We've found that developers find it inconvenient to repeat "VerifiableCredential" all over the place when the data itself makes it clear what is being described.
There is a JSON-LD feature called "aliasing". In fact, this is where "id" comes from... "id" is an alias for "@id". We could alias "subjectID" to "@id", "holderID" to "@id", and so on... but developers will ultimately skip this and just use "id" as they have in the past. At that point, we'll have a choice on how stringent we will want to be with the validating schemas, and again, given history, the developers will insist that we allow both, which then makes understanding the data model even more complex because developers will be allowed to use both "id" and "subjectID" in the same location.
It's for these reasons that I don't think this is a good idea. It feels like complexity that is being added because of a fundamental misunderstanding of the underlying data model. Once a developer understands the data model, they will see these additions as unnecessary. I do note that many developers may start out where @David-Chadwick did, and that is certainly a concern. |
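To make the aliasing point above concrete, here is a minimal Python sketch. It is not a JSON-LD processor: it only rewrites keyword aliases and does not expand terms to IRIs. The `CONTEXT` table, the `expand_keywords` helper, and the `did:example:alice` identifier are all assumptions made for illustration. It shows that if both "id" and "subjectID" were aliased to "@id", the two spellings would describe the same data, and that using both on one object would be ambiguous.

```python
# Hypothetical context: each key is a term aliased to a JSON-LD keyword.
# "subjectID" is the alias proposed in this issue, not something the spec defines.
CONTEXT = {"id": "@id", "subjectID": "@id", "type": "@type"}


def expand_keywords(node, context=CONTEXT):
    """Recursively replace aliased terms with the keywords they stand for."""
    if isinstance(node, list):
        return [expand_keywords(item, context) for item in node]
    if not isinstance(node, dict):
        return node
    expanded = {}
    for key, value in node.items():
        keyword = context.get(key, key)
        if keyword in expanded:
            # Two aliases of the same keyword on one object: which one wins?
            raise ValueError(f"{key!r} collides with another alias of {keyword!r}")
        expanded[keyword] = expand_keywords(value, context)
    return expanded


with_id = {"claim": {"id": "did:example:alice", "name": "Alice"}}
with_subject_id = {"claim": {"subjectID": "did:example:alice", "name": "Alice"}}

# Both spellings collapse to the same "@id", so the data is identical.
print(expand_keywords(with_id) == expand_keywords(with_subject_id))
```

Under these assumptions the alias changes only how the document reads, not what it says, which is why allowing both spellings in the same location adds validation work without adding meaning.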
I may have misunderstood this comment:
Could you provide an example where this isn't the case? Are you talking about where IDs are merely references (e.g. they appear as strings) rather than full JSON objects? AFAIK, we're following the JSON-LD/RDF model properly and every JSON object is a node in a graph (with an open world assumption) that is uniquely identified by the So, looking at the most common example, inside of "claim" the object represents a subject, so its
represented as JSON:
{
"id": "<credential id>",
"issuer": "<issuer id>",
"claim": {
"id": "<subject id>",
"name": "Alice"
}
}
...mean that there is a credential, identified by If you wanted to say more about the issuer you could have done:
{
"id": "<credential id>",
"issuer": {
"id": "<issuer id>",
"name": "Bob"
},
"claim": {
"id": "<subject id>",
"name": "Alice"
}
}
If we're not following this model somewhere in the spec then I believe it's a mistake that should be corrected. I don't think we need to introduce more terms. |
Well @dlongley you have put your finger on it your example above |
What is being claimed is: "There is a subject with identifier X and attribute Y". You could more verbosely represent that there's a graph hiding inside of claim like this:
{
"claim": {
"@graph": {
"id": "<subject id>",
"name": "Alice"
}
}
}
That syntax is actually equivalent (in the data model) to omitting So it is hidden by syntactic sugar, but, for precision with the data model, it's hiding out there. If you wanted an identifier for the claim object you'd put it right in there at the same level as |
or you presumably could state "subject": { To my mind this would be better, as the credential is a statement about the subject by the issuer. We don't need the claim baggage, do we? |
Well, that approach would actually mean that the predicate "subject" would point at a graph without an identifier (like |
Sorry my post got corrupted during the transfer. What I typed was "subject": { and the transmitted version omitted it entirely. I should have checked in Preview first :-) |
Hm. Can I chime in and support the concerns voiced here?
"claim": {
"id": "<someid>",
"name": "Alice"
}
I am tempted to read the JSON above as "claim has id <someid>", and
the fact that the id actually belongs to the subject is not so clear.
the data model even more complex because developers will be allowed to use both "id" and "subjectID" in the same location
It would be nice if JSON-LD allowed us to do something like unalias "@id"
in the scope of claim.
…On 26 February 2018 at 20:05, Dave Longley ***@***.***> wrote:
@David-Chadwick <https://github.com/david-chadwick>,
Well, that approach would actually mean that the predicate "subject" would
point at a graph without an identifier (like claim does now) and what we
refer to currently as "subject" would be a node inside of that graph. That
may lead to more confusion, not less. It might help with a naive mental
digestion of the syntax but, honestly, it seems incorrect from a data
modeling perspective -- and changes the meaning of "subject", IMO. But if
the group feels like that's a step in the right direction then it's a
consideration.
|
I don't think it changes my response. I do think "claim" has always been the most awkward part of the syntax. |
Dave, after your lengthy explanations, it is now my opinion that "claim" should become "subject". I can see no reason for treating issuer and subject differently in either the data model or the JSON syntax. They are both objects in the same RDF graph. After all, subjects become issuers when they delegate their credentials. |
I'm not against such a change, but I want to point out this change, IMO, would be done to support the priority of a naive (and intentionally less flexible) reading of the syntax, not to support a more "correct" (or at least consistent) data model. In your proposal to replace "claim" with "subject", using the predicate "subject" would link a credential to an isolated graph of statements in the data model. Those statements would include in them, a node that is the "subject" as we've used the term previously. And we can make a naive reading of the syntax "look like" we're just saying "here's the subject of this credential". Using the term "claim" to link a credential to a set of statements "claimed" by an issuer was the original impetus behind that particular predicate choice. It was also stated, in the definition of a credential that the claim was (typically) about a single subject. I understand that many now want it to be more narrowly targeted at a particular subject -- and that a simple reading of the current syntax makes the subject's location in the credential less than obvious. So the argument here is to make the subject relationship more explicit and narrow in nature within the syntax, i.e. a credential does not merely set out a claim containing a set of statements about one or more subjects, but rather, it "is about a very particular subject". That's the main difference with what's being proposed here, IMO. |
@dlongley
|
If you add three backticks before and after your example, it will preserve your spacing:
|
But a credential does NOT contain a set of claims about ONE subject. That section is in error. The RDF-based data model allows--in fact depends on--multiple statements about, potentially, multiple subjects (in the RDF sense of subject, predicate, object). The idea that VCs might be thought of as about a single subject is the heart of the confusion. For some claims it may make sense for the issuer to assert that there is a single subject (in this vague meaning), but it actually seems to make things harder. This was the opening point in my other post. I think the stricter notion of subject helps us clarify things. The profile can make statements about specific subjects in specific credentials without the issuer specifying a "single subject". |
@jandrieu |
Hmmmm... I had thought multiple subjects were already supported. It is innate in the RDF graph model. Every statement has a subject. Each "claim" is in fact an RDF graph, which you can see in Example 16:
That is, every property in "claim", other than "id", is a predicate whose subject is the value of "id" and whose object is the value of the property. But all of those have the same subject. If I'm following this correctly, then "claim" should probably be "claims" and should be an array. Because an individual "claim" with its singular "id" literally can only have one subject. I think I'm beginning to see @David-Chadwick's point. |
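The reading described above, where each property of "claim" other than "id" is a predicate on the subject named by "id", can be sketched in a few lines of Python. This is a simplified illustration, not an RDF library: the function names and the `did:example:...` identifiers are hypothetical, and nested objects are left as plain values rather than expanded into further nodes.

```python
def claim_to_triples(claim):
    """Return (subject, predicate, object) triples for one claim node."""
    subject = claim["id"]
    triples = []
    for predicate, value in claim.items():
        if predicate == "id":
            continue  # "id" names the subject; it is not a statement about it
        values = value if isinstance(value, list) else [value]
        for obj in values:
            triples.append((subject, predicate, obj))
    return triples


def claims_to_triples(claims):
    """Accept a single claim node or an array of them."""
    nodes = claims if isinstance(claims, list) else [claims]
    return [triple for node in nodes for triple in claim_to_triples(node)]


single = {"id": "did:example:alice", "name": "Alice"}
several = [
    {"id": "did:example:alice", "name": "Alice"},
    {"id": "did:example:bob", "name": "Bob"},
]

for triple in claims_to_triples(single) + claims_to_triples(several):
    print(triple)
```

With one node, every triple shares the same subject. Statements about a second subject only appear once "claim" holds more than one node, which is the array form suggested above.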
Yes. |
Sorry, @dlongley I edited out that question when I updated my comment. I think David's point is on the mark. (But thanks for the clarification.) |
The value of "claim", in the data model, is actually an isolated graph with some set of statements in it -- which matches the concept. The choice of syntax does happen to promote the use of a single subject as "the main subject" or "entry point" for that claim because that's expected to be the most common (and simplest) case. You could also use an array as the value of "claim" and this would make But a quick note about plural key/term names -- they are frowned upon. If you want more than one value, just use an array as stated above and in your example; there's no need to muck about with the term name. |
Agreed about not using the term "name". I don't think there are good examples or language explaining how to rigorously specify multiple subjects in a single credential (that's baked into the JSON-LD data model). We should add that. FWIW, @stonematt and I are working on an update to the use cases document that will include the citizenship sticky wicket above and would be a good example use case for multiple subjects in a credential and multiple credentials in a profile. |
Part of the problem is that JSON does not have a formal schema, unlike XML, and the current specification does not help as it is ambiguous or simply lacking in explanation of many of the properties. Conventionally I would expect that if a property is expected to have a single value it would be written as : "value", whereas if it is expected to have a set of values it would be written as ["value"]. We would not expect to see Consequently I think we need to tighten up the current specification and make it clear where sets/lists are allowed and where they are not. |
I think one of the problems may be the implicit assumption that JSON-LD is fundamental to (understanding) the data model. But I do not think this is or should be the case. Early on it was said that VCs should work without any reliance on JSON-LD, so that VCs formatted in pure JSON would work, as would those with JSON-LD constructs such as Context added to them, without affecting implementations that don't understand JSON-LD. I think the same principles should apply to the data model document as well. JSON is very widely accepted, but I am not sure that JSON-LD is so widely accepted. In order to gain the widest possible acceptance of VCs, the reader of the VC data model document should not be required to have any previous knowledge about or understanding of JSON-LD. The same should go for implementers of VCs. This means that: |
There is no such restriction as "if a value can be an array then it MUST be an array" in JSON. JSON values can be anything, they can be mixed and matched, and so on. |
@dlongley |
Why? Why not just have examples with single values and examples with more than one? |
Because I would like the data model document to remove ambiguity. It should state quite categorically which values should be single values (such as ID) and which can be lists. |
I'm in agreement with that. But that some "can be lists" doesn't mean that they "must be". We should show examples both ways. |
Agreed. But we should have some way of telling readers this fact. We can do it per property definition, by stating it is a list of values, or by putting [..] in the examples, or both. But to be silent on the issue is not good enough in my opinion. |
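One way to picture both positions in this exchange is a small Python sketch: a consumer that accepts a bare value or a list interchangeably, plus a per-property rule saying where lists are not allowed. The `SINGLE_VALUED` table is purely hypothetical (the data model document defines no such table), and the placeholder identifiers are the ones used in the examples earlier in the thread.

```python
# Hypothetical cardinality table; the data model document does not define one.
SINGLE_VALUED = {"id", "issuer"}


def as_list(value):
    """Treat a bare value and a one-element list the same way."""
    return value if isinstance(value, list) else [value]


def cardinality_errors(node, single_valued=SINGLE_VALUED):
    """Report properties that carry a list where only one value is allowed."""
    return [
        f'"{key}" must be a single value'
        for key, value in node.items()
        if key in single_valued and isinstance(value, list)
    ]


credential = {
    "id": "<credential id>",
    "issuer": "<issuer id>",
    "claim": {"id": "<subject id>", "name": "Alice"},
}

# "claim" may be a single object or a list; the consumer does not care which.
for claim in as_list(credential["claim"]):
    print(claim["id"])

print(cardinality_errors(credential))
```

The design choice here is that "can be a list" is handled by normalizing on read, while "must be a single value" is stated once per property, so examples can show both forms without leaving the rule implicit.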
About "subject", JWT already has "sub" and is used to carry claims. Would there be any value in maintaining a consistent naming with JWT on that aspect? |
We still have not resolved this issue. Take Example 4 in the latest version (18 April 2018)
The first "id" is the ID of an unknown object, as there is no outer type. However, we are told we have to read the following "type" property to find out what type of object this is. It is "Credential". So this is the ID of a Credential, of subtype ProofOfAgeCredential. Terms of Use from Example 10 is more inconsistent
This does not have an id parameter, but has a uid parameter instead. What is the difference? And the object type is inconsistent. Is it the outer termsOfUse object type, or the inner type Policy? Proposed Solution. The following proposed solution follows that in Example 18, viz.:
Example 4 would become
Terms of Use from Example 10 would become
|
We now have PRs for fixing the IDs of TermsOfUse and CredentialStatus, but not for Claim. Can we please change "claim" to "subject", viz.:
|
Migrating to new issue #207. Closing. |
Our data model contains several instances of different IDs, but each ID is simply called "id" and has no qualifying descriptor. We cannot interpret what the ID refers to as the data model does not contain any descriptive text to tell us. We cannot assume it is the ID of the encapsulating JSON data object, as this is not always the case. Furthermore, with the enhancements that are currently being suggested in order to make the data model more precise and unambiguous, more IDs are currently being added.
It is proposed that each occurrence of ID is replaced by a descriptive ID so that it is obvious to the reader (and verifier) to what the ID refers. It would also be helpful to add descriptive text to the data model document to state unambiguously to what each ID refers.
Thus inside a claim (example 1), "id" would become "subjectID".
Inside a profile (no example yet), "id" would become "holderID"
Inside evidence (example 8), "id" would become "evidenceID"
Inside a credential (example 7), "id" would become "credentialID"
Inside credential status (example 4), "id" would become "statusID"
The initial "id" (all examples) would be replaced by either "credentialID" or "profileID" so that the type field is used to further refine what sort of credential or profile this is.