IDs Not Differentiated Sufficiently #120

David-Chadwick · 2018-02-26T10:57:07Z

Our data model contains several instances of different IDs, but each ID is simply called "id" and has no qualifying descriptor. We cannot interpret what the ID refers to as the data model does not contain any descriptive text to tell us. We cannot assume it is the ID of the encapsulating JSON data object, as this is not always the case. Furthermore, with the enhancements that are currently being suggested in order to make the data model more precise and unambiguous, more IDs are currently being added.

It is proposed that each occurrence of ID is replaced by a descriptive ID so that it is obvious to the reader (and verifier) to what the ID refers. It would also be helpful to add descriptive text to the data model document to state unambiguously to what each ID refers.

Thus inside a claim (example 1), "id" would become "subjectID".
Inside a profile (no example yet), "id" would become "holderID".
Inside evidence (example 8), "id" would become "evidenceID".
Inside a credential (example 7), "id" would become "credentialID".
Inside credential status (example 4), "id" would become "statusID".
The initial "id" (all examples) would be replaced by either "credentialID" or "profileID" so that the type field is used to further refine what sort of credential or profile this is.

Drabiv · 2018-02-26T11:12:39Z

Great point! Would love to see this solved. Here and in DID spec.

msporny · 2018-02-26T13:56:33Z

> It is proposed that each occurrence of ID is replaced by a descriptive ID so that it is obvious to the reader (and verifier) to what the ID refers. It would also be helpful to add descriptive text to the data model document to state unambiguously to what each ID refers.

'id' is a part of the underlying linked data model, and is used to uniquely identify the object that is being described. Most developers that are familiar with JSON-LD are used to this pattern where "id" identifies the thing that you're attaching attributes to. Usually, we depend on "type" to determine the type of thing identified by "id"... so "type": "VerifiableProfile", "type": "VerifiableCredential", "type": "RevocationList2017"... that sort of thing. That said, even the type is optional as most of the data structures depend on duck typing. We've found that developers find it inconvenient to repeat "VerifiableCredential" all over the place when the data itself makes it clear what is being described.

There is a JSON-LD feature called "aliasing". In fact, this is where "id" comes from... "id" is an alias for "@id". We could alias "subjectID" to "@id", "holderID" to "@id", and so on... but developers will ultimately skip this and just use "id" as they have in the past. At that point, we'll have a choice on how stringent we will want to be with the validating schemas, and again, given history, the developers will insist that we allow both, which then makes understanding the data model even more complex because developers will be allowed to use both "id" and "subjectID" in the same location.
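
A minimal Python sketch of the aliasing concern, assuming a hand-written alias table rather than a real JSON-LD processor (the `ALIASES` table and the `expand_aliases` helper are hypothetical and exist only for illustration). It suggests that once both "id" and "subjectID" are accepted, two different-looking documents end up describing the same data:

```python
# Hypothetical alias table: each key is a term that a context could alias to "@id".
ALIASES = {"id": "@id", "subjectID": "@id", "holderID": "@id"}


def expand_aliases(node):
    """Recursively replace aliased keys with the keyword they stand for."""
    if isinstance(node, list):
        return [expand_aliases(item) for item in node]
    if isinstance(node, dict):
        return {ALIASES.get(key, key): expand_aliases(value)
                for key, value in node.items()}
    return node


with_id = {"claim": {"id": "<subject id>", "name": "Alice"}}
with_subject_id = {"claim": {"subjectID": "<subject id>", "name": "Alice"}}

# Both spellings expand to the same structure.
same = expand_aliases(with_id) == expand_aliases(with_subject_id)
```

A validating schema would then have to decide whether to accept one spelling or both, which is the choice described above.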

It's for these reasons that I don't think this is a good idea. It feels like complexity that is being added because of a fundamental misunderstanding of the underlying data model. Once a developer understands the data model, they will see these additions as unnecessary. I do note that many developers may start out where @David-Chadwick did, and that is certainly a concern.

dlongley · 2018-02-26T16:17:43Z

@David-Chadwick,

I may have misunderstood this comment:

> We cannot assume it is the ID of the encapsulating JSON data object, as this is not always the case.

Could you provide an example where this isn't the case? Are you talking about where IDs are merely references (e.g. they appear as strings) rather than full JSON objects?

AFAIK, we're following the JSON-LD/RDF model properly and every JSON object is a node in a graph (with an open world assumption) that is uniquely identified by the id property, if present. If an identifier appears as merely a string (no encapsulating JSON object), then it is just syntactic sugar that avoids having to create a JSON object with a single id property (and no other properties). It also allows for referencing other nodes in a document without having to embed all of its properties at every location.

So, looking at the most common example, inside of "claim" the object represents a subject, so its id is the ID of the subject. These statements:

<credential id> <issuer> <issuer id>
<credential id> <claim> <subject id>
<subject id> <name> "Alice"

represented as JSON:

{
  "id": "<credential id>",
  "issuer": "<issuer id>",
  "claim": {
    "id": "<subject id>",
    "name": "Alice"
  }
}

...mean that there is a credential, identified by <credential id> that was issued by an issuer identified by <issuer id> that includes a claim that a subject, identified by <subject id> has the name "Alice". Every id is in its right place and identifies the encapsulating JSON object. There's nothing more to say about the issuer here other than its ID, so it appears as a string. But the credential identifier and the subject identifier appear in encapsulating JSON objects.

If you wanted to say more about the issuer you could have done:

{
  "id": "<credential id>",
  "issuer": {
    "id": "<issuer id>",
    "name": "Bob"
  },
  "claim": {
    "id": "<subject id>",
    "name": "Alice"
  }
}

If we're not following this model somewhere in the spec then I believe it's a mistake that should be corrected. I don't think we need to introduce more terms.
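
The correspondence between the statements and the JSON above can be sketched in Python. This is a toy flattening routine, not a JSON-LD processor: the `to_triples` helper is hypothetical, it treats every string value the same way (a real processor uses the context to tell an ID reference from a literal), and it invents `_:bN` labels for objects that carry no "id".

```python
import itertools


def to_triples(node, counter=None):
    """Flatten a nested JSON object into (subject, predicate, object) triples.

    The "id" of each object names the subject of that object's other
    properties; objects without an "id" get a generated blank-node label.
    """
    counter = counter if counter is not None else itertools.count()
    subject = node.get("id") or f"_:b{next(counter)}"
    triples = []
    for predicate, value in node.items():
        if predicate == "id":
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # A nested object is its own node; link to it, then describe it.
                child_subject, child_triples = to_triples(item, counter)
                triples.append((subject, predicate, child_subject))
                triples.extend(child_triples)
            else:
                triples.append((subject, predicate, item))
    return subject, triples


credential = {
    "id": "<credential id>",
    "issuer": "<issuer id>",
    "claim": {"id": "<subject id>", "name": "Alice"},
}
_, triples = to_triples(credential)
```

Here `triples` holds the same three statements listed earlier in this comment, in the same order, with each "id" used as the subject of its enclosing object's properties.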

David-Chadwick · 2018-02-26T16:38:47Z

Well @dlongley, you have put your finger on it with your example above.
If "x": {"id": "the id of x"} is meant to be the model, then "claim" does not fit it, as the subject id should actually be the claim id.

dlongley · 2018-02-26T16:46:42Z

@David-Chadwick,

"claim" is best thought of as a predicate, not an object.
The syntax actually further hides that the claim itself is a separate graph of information that has no explicit identifier. Within that graph of information is the subject.

What is being claimed is: "There is a subject with identifier X and attribute Y".

You could more verbosely represent that there's a graph hiding inside of claim like this:

{
  "claim": {
    "@graph": {
      "id": "<subject id>",
      "name": "Alice"
    }
  }
}

That syntax is actually equivalent (in the data model) to omitting @graph. However, the JSON-LD @context assures people that they don't need to see that complexity upfront and we're better off for it.

So it is hidden by syntactic sugar, but, for precision with the data model, it's hiding out there. If you wanted an identifier for the claim object you'd put it right in there at the same level as @graph, but there's no reason to have one -- a claim object is merely existential and needs no reference to it other than the one from the credential via the predicate "claim".
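
As a rough illustration, the following Python sketch treats the short and verbose spellings of "claim" alike and shows where an identifier for the claim graph itself would sit. The `split_claim` helper and the `<claim id>` placeholder are hypothetical; in practice the equivalence comes from the JSON-LD @context, not from code like this.

```python
def split_claim(value):
    """Return (claim_graph_id, subject_node) for either spelling of "claim"."""
    if "@graph" in value:
        # Verbose form: an "id" next to "@graph" would identify the graph itself.
        return value.get("id"), value["@graph"]
    # Short form: the graph is implicit and has no identifier of its own.
    return None, value


short_form = {"claim": {"id": "<subject id>", "name": "Alice"}}
verbose_form = {"claim": {"@graph": {"id": "<subject id>", "name": "Alice"}}}
identified_form = {
    "claim": {
        "id": "<claim id>",  # hypothetical identifier for the claim graph
        "@graph": {"id": "<subject id>", "name": "Alice"},
    }
}

short_result = split_claim(short_form["claim"])
verbose_result = split_claim(verbose_form["claim"])
identified_result = split_claim(identified_form["claim"])
```

In all three cases the subject node is the same; only the third carries an identifier for the claim graph.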

David-Chadwick · 2018-02-26T16:53:10Z

or you presumably could state

"subject": {
"id": "",
"name": "Alice"
}

To my mind this would be better, as the credential is a statement about the subject by the issuer. We don't need the claim baggage, do we?

dlongley · 2018-02-26T17:05:17Z

@David-Chadwick,

Well, that approach would actually mean that the predicate "subject" would point at a graph without an identifier (like claim does now) and what we refer to currently as "subject" would be a node inside of that graph. That may lead to more confusion, not less. It might help with a naive mental digestion of the syntax but, honestly, it seems incorrect from a data modeling perspective -- and changes the meaning of "subject", IMO. But if the group feels like that's a step in the right direction then it's a consideration. I don't mean that passive-aggressively; it could be argued that naive syntax reading is a higher priority than data model purity.

David-Chadwick · 2018-02-26T17:18:44Z

Sorry my post got corrupted during the transfer. What I typed was

"subject": {
"id": " < subject id without the spaces > ",
"name": "Alice"
}

and the transmitted version omitted it entirely. I should have checked in Preview first :-)
Is that better for you?

Fak3 · 2018-02-26T17:31:04Z

Hm. Can I chime in and support the concerns voiced here? "claim": { "id": "<someid>", "name": "Alice" } I am tempted to read the json above as "claim has id <someid>", and the fact that the id actually belongs to the subject is not so clear.

> the data model even more complex because developers will be allowed to use both "id" and "subjectID" in the same location

It would be nice if json-ld allowed to do something like unalias "@id" in the scope of claim.

dlongley · 2018-02-26T17:33:13Z

@David-Chadwick,

> Is that better for you?

I don't think it changes my response. I do think "claim" has always been the most awkward part of the syntax.

dlongley · 2018-02-26T17:34:35Z

@Fak3,

> It would be nice if json-ld allowed to do something like unalias "@id" in the scope of claim.

JSON-LD 1.1 (which we are using) has that feature.

David-Chadwick · 2018-02-26T18:00:02Z

Dave, after your lengthy explanations, it is now my opinion that "claim" should become "subject". I can see no reason for treating issuer and subject differently in either the data model or the JSON syntax. They are both objects in the same RDF graph. After all, subjects become issuers when they delegate their credentials.

dlongley · 2018-02-26T18:22:27Z

@David-Chadwick,

> Dave, after your lengthy explanations, it is now my opinion that "claim" should become "subject".

I'm not against such a change, but I want to point out this change, IMO, would be done to support the priority of a naive (and intentionally less flexible) reading of the syntax, not to support a more "correct" (or at least consistent) data model.

In your proposal to replace "claim" with "subject", using the predicate "subject" would link a credential to an isolated graph of statements in the data model. Those statements would include in them, a node that is the "subject" as we've used the term previously. And we can make a naive reading of the syntax "look like" we're just saying "here's the subject of this credential".

Using the term "claim" to link a credential to a set of statements "claimed" by an issuer was the original impetus behind that particular predicate choice. It was also stated, in the definition of a credential that the claim was (typically) about a single subject. I understand that many now want it to be more narrowly targeted at a particular subject -- and that a simple reading of the current syntax makes the subject's location in the credential less than obvious.

So the argument here is to make the subject relationship more explicit and narrow in nature within the syntax, i.e. a credential does not merely set out a claim containing a set of statements about one or more subjects, but rather, it "is about a very particular subject". That's the main difference with what's being proposed here, IMO.

David-Chadwick · 2018-02-26T18:58:00Z

@dlongley
I think you have read too much into the change I am proposing. Since a credential contains a set of claims about ONE subject, according to the definition in Section 2 viz: (credential - A set of one or more claims made by the same entity about a subject) then it would not be a change in the data model to say it "is about a very particular subject" since this conforms to the current definition.
However, if you want to change the definition in Section 2 to say:
credential - A set of one or more claims made by the same entity
claim - An assertion that a subject has one or more properties.
Then you would need to revise the syntax to the following (sorry, I can't seem to get the spacing correct):

{
  "id": "< credential id >",
  "issuer": {
    "id": "< issuer id >",
    "name": "Bob"
  },
  "claims": [ 
  {
    "subject": {
          "id": "< subject id >",
          "name": "Alice"
           }
   },
  {
    "subject": {
          "id": "< subject id >",
          "name": "Bob"
           }
   } 
]
}

dlongley · 2018-02-26T19:01:57Z

@David-Chadwick,

If you add three backticks before and after your example, it will preserve your spacing:

```
{
  "stuff": "foo"
}
```

jandrieu · 2018-02-26T19:14:25Z

But a credential does NOT contain a set of claims about ONE subject. That section is in error.

The RDF-based data model allows--in fact depends on--multiple statements about, potentially, multiple subjects (in the RDF sense of subject, predicate, object).

The idea that VCs might be thought of as about a single subject is the heart of the confusion. For some claims it may make sense for the issuer to assert that there is a single subject (in this vague meaning), but it actually seems to make things harder. This was the opening point in my other post.

I think the stricter notion of subject helps us clarify things. The profile can make statements about specific subjects in specific credentials without the issuer specifying a "single subject".

David-Chadwick · 2018-02-26T19:31:54Z

@jandrieu
Hi Joe. My above post suggests a change to the credential JSON to make it much clearer that multiple subjects can be supported

jandrieu · 2018-02-26T21:33:25Z

Hmmmm... I had thought multiple subjects were already supported. It is innate in the RDF graph model. Every statement has a subject. Each "claim" is in fact an RDF graph, which you can see in Example 16:

   "claim": {
      "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
      "name": "Alice Bobman",
      "birthDate": "1985-12-14",
      "gender": "female",
      "nationality": {
        "name": "United States"
      }
    }

That is, every property in "claim", other than "id", is a predicate whose subject is the value of "id" and whose object is the value of the property. But all of those have the same subject.

If I'm following this correctly, then "claim" should probably be "claims" and should be an array. Because an individual "claim" with its singular "id" literally can only have one subject.

I think I'm beginning to see @David-Chadwick's point.

dlongley · 2018-02-26T21:37:48Z

@jandrieu,

> Can you simply use an array as a value for multiple statements with the same subject & predicate?

Yes.

jandrieu · 2018-02-26T21:40:31Z

Sorry, @dlongley I edited out that question when I updated my comment. I think David's point is on the mark. (But thanks for the clarification.)

dlongley · 2018-02-26T21:44:53Z

@jandrieu,

> If I'm following this correctly, then "claim" should probably be "claims" and should be an array. Because an individual "claim" with its singular "id" literally can only have one subject.

The value of "claim", in the data model, is actually an isolated graph with some set of statements in it -- which matches the concept. The choice of syntax does happen to promote the use of a single subject as "the main subject" or "entry point" for that claim because that's expected to be the most common (and simplest) case.

You could also use an array as the value of "claim"; this would make a single independent claim (one separate isolated graph) with multiple subjects, each a disjoint "main subject". This is possible now with the data model and syntax as is.

But a quick note about plural key/term names -- they are frowned upon. If you want more than one value, just use an array as stated above and in your example; there's no need to muck about with the term name.
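
A small Python sketch of what "just use an array" means for a consumer, assuming hypothetical helper names (`as_list`, `subject_ids`) and placeholder identifiers. The term name stays "claim" in both cases; only the value changes shape.

```python
def as_list(value):
    """Treat a single value and an array of values the same way."""
    return value if isinstance(value, list) else [value]


def subject_ids(credential):
    """Collect the "id" of every node that appears as a value of "claim"."""
    return [node.get("id") for node in as_list(credential["claim"])]


single = {
    "id": "<credential id>",
    "claim": {"id": "<subject id 1>", "name": "Alice"},
}
multiple = {
    "id": "<credential id>",
    "claim": [
        {"id": "<subject id 1>", "name": "Alice"},
        {"id": "<subject id 2>", "name": "Bob"},
    ],
}

single_subjects = subject_ids(single)
multiple_subjects = subject_ids(multiple)
```

Code that normalizes values this way does not need a separate plural term to handle more than one subject.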

jandrieu · 2018-02-26T22:04:28Z

Agreed about not using the term "name". I don't think there are good examples or language explaining how to rigorously specify multiple subjects in a single credential (that's baked into the JSON-LD data model). We should add that.

FWIW, @stonematt and I are working on an update to the use cases document that will include the citizenship sticky wicket above and would be a good example use case for multiple subjects in a credential and multiple credentials in a profile.

David-Chadwick · 2018-02-27T00:03:08Z

Part of the problem is that JSON does not have a formal schema, unlike XML, and the current specification does not help as it is ambiguous or simply lacking in explanation of many of the properties.

Conventionally I would expect that if a property is expected to have a single value it would be written as : "value", whereas if it is expected to have a set of values it would be written as ["value"].

We would not expect to see
"issuer": ["https://dmv.example.gov", "https://example.com/jdoe/"]
although this is allowed according to @dlongley and the JSON syntax. But this does not make any sense.

Consequently I think we need to tighten up the current specification and make it clear where sets/lists are allowed and where they are not.

David-Chadwick · 2018-02-28T18:22:36Z

I think one of the problems may be the implicit assumption that JSON-LD is fundamental to (understanding) the data model. But I do not think this is or should be the case.

Early on it was said that VCs should work without any reliance on JSON-LD, so that VCs formatted in pure JSON would work, as would those with JSON-LD constructs such as Context added to them, without affecting implementations that don't understand JSON-LD. I think the same principles should apply to the data model document as well.

JSON is very widely accepted, but I am not sure that JSON-LD is so widely accepted. In order to gain the widest possible acceptance of VCs, the reader of the VC data model document should not be required to have any previous knowledge about or understanding of JSON-LD. The same should go for implementers of VCs.

This means that:
i) any features which are implicitly imported from JSON-LD should be explicitly described in the data model document
ii) it is OK to repeat things in the data model document that are already stated in JSON-LD documents.
iii) the JSON examples should conform to normal JSON (and not to some imported semantics from JSON-LD). So if a value is meant or allowed to be an array then it should start with [ and end with ], otherwise it can only be a simple value.

dlongley · 2018-02-28T18:30:01Z

> iii) the JSON examples should conform to normal JSON (and not to some imported semantics from JSON-LD). So if a value is meant or allowed to be an array then it should start with [ and end with ], otherwise it can only be a simple value.

There is no such restriction in JSON as "if a value can be an array then it MUST be an array". JSON values can be anything; they can be mixed and matched, and so on.

David-Chadwick · 2018-02-28T18:44:28Z

@dlongley
You have misinterpreted what I said. To elaborate: if a value in our data model is meant or allowed to be an array, then it should start with [ and end with ] in our examples (even if it only contains a single element), and our descriptive text of the value should state that it is an array. If the value is only allowed to be a simple value, then this should be stated in the description of the name.

dlongley · 2018-02-28T18:53:53Z

@David-Chadwick,

> To elaborate, if a value in our data model is meant or allowed to be an array then it should start with [ and end with ] in our examples (even if it only contains a single element), and our descriptive text of the value should state it is an array.

Why? Why not just have examples with single values and examples with more than one?

David-Chadwick · 2018-02-28T19:34:15Z

Because I would like the data model document to remove ambiguity. It should state quite categorically which values should be single values (such as ID) and which can be lists.

dlongley · 2018-02-28T19:52:50Z

> Because I would like the data model document to remove ambiguity. It should state quite categorically which values should be single values (such as ID) and which can be lists.

I'm in agreement with that. But that some "can be lists" doesn't mean that they "must be". We should show examples both ways.

David-Chadwick · 2018-02-28T20:51:39Z

Agreed. But we should have some way of telling readers this fact. We can do it per property definition, by stating it is a list of values, or by putting [..] in the examples, or both. But to be silent on the issue is not good enough in my opinion.
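
One way to make the per-property statement concrete is a cardinality table that a checker can apply. The Python sketch below is only an illustration: the `RULES` table is hypothetical (the thread has not agreed which properties are single-valued), and `cardinality_errors` is an invented helper name.

```python
# Hypothetical cardinality rules; the actual choices are still under discussion.
RULES = {
    "id": "single",
    "issuer": "single",
    "type": "single-or-list",
    "claim": "single-or-list",
}


def cardinality_errors(obj):
    """Report properties that appear as a list where only one value is allowed."""
    errors = []
    for key, value in obj.items():
        if RULES.get(key) == "single" and isinstance(value, list):
            errors.append(f'"{key}" must be a single value, not a list')
    return errors


ok = {
    "id": "<credential id>",
    "type": ["Credential", "ProofOfAgeCredential"],
    "issuer": "https://dmv.example.gov",
}
bad = {
    "id": "<credential id>",
    "issuer": ["https://dmv.example.gov", "https://example.com/jdoe/"],
}

ok_errors = cardinality_errors(ok)
bad_errors = cardinality_errors(bad)
```

Under these assumed rules, a "type" written as either a string or an array passes, while the two-issuer example from earlier in the thread is rejected.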

davux · 2018-04-20T17:28:02Z

About "subject", JWT already has "sub" and is used to carry claims. Would there be any value in maintaining a consistent naming with JWT on that aspect?

David-Chadwick · 2018-04-24T17:13:21Z

We still have not resolved this issue. Take Example 4 in the latest version (18 April 2018)

Example 4: Usage of status property

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:23:24Z",
  "claim": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },
  "credentialStatus": {
    "id": "https://dmv.example.gov/status/24",
    "type": "CredentialStatusList2017"
  },
  "proof": { ... }
}

The first "id" is the ID of an unknown object, as there is no outer type. However, we are told we have to read the following "type" property to find out what type of object this is. It is "Credential". So this is the ID of a Credential, of subtype ProofOfAgeCredential.
The second "id" has an outer type, and does not have a "type" parameter, but the ID is not the ID of "claim" but is the ID of the subject.
The third "id" has an outer type and a "type" parameter, but these are inconsistent. The outer type is "credentialStatus" and the "type" parameter is "CredentialStatusList2017". So is the ID the ID of "credentialStatus" or "CredentialStatusList2017"?

The Terms of Use from Example 10 is even more inconsistent:

 "termsOfUse": [{
    "type": "Policy",
    "uid": "http://example.com/policies/credential/4",
    "profile": "http://example.com/profiles/credential",
    "prohibition": [{
      "assigner": "https://dmv.example.gov/issuers/14",
      "assignee": "AllVerifiers",
      "target": "http://dmv.example.gov/credentials/3732",
      "action": ["Archival"]
    }]

This does not have an id parameter, but has a uid parameter instead. What is the difference? And the object type is inconsistent. Is it the outer termsOfUse object type, or the inner type Policy?

Proposed solution: the following follows the pattern in Example 18, viz.:

Example 18: A simple verifiable profile

{
  "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
  "credential": [{
    "id": "http://dmv.example.gov/credentials/3732",
    "type": ["Credential", "ProofOfAgeCredential"],
...

Example 4 would become

Example 4: Usage of status property

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:23:24Z",
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },
  "credentialStatus": {
    "id": "https://dmv.example.gov/status/24",
    "type": ["CredentialStatus", "CredentialStatusList2017"]
  },
  "proof": { ... }
}

Terms of Use from Example 10 would become

 "termsOfUse": [{
    "id": "http://example.com/policies/credential/4",
    "type": ["TermsOfUse", "Policy"],
    "profile": "http://example.com/profiles/credential",
    "prohibition": [{
      "assigner": "https://dmv.example.gov/issuers/14",
      "assignee": "AllVerifiers",
      "target": "http://dmv.example.gov/credentials/3732",
      "action": ["Archival"]
    }]
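
The proposed changes can be summarized as a small transformation. The Python sketch below is an illustration under assumptions, not part of any PR: `with_base_type` and `apply_proposal` are hypothetical helper names, and "termsOfUse" is assumed to be an array as in Example 10.

```python
import copy


def with_base_type(value, base):
    """Return a type list that starts with the base type for the enclosing property."""
    types = value if isinstance(value, list) else ([value] if value else [])
    return types if base in types else [base, *types]


def apply_proposal(credential):
    """Return a copy with "claim" renamed, "uid" renamed, and base types added."""
    updated = copy.deepcopy(credential)
    if "claim" in updated:
        updated["subject"] = updated.pop("claim")
    status = updated.get("credentialStatus")
    if isinstance(status, dict):
        status["type"] = with_base_type(status.get("type"), "CredentialStatus")
    for terms in updated.get("termsOfUse", []):
        if "uid" in terms:
            terms["id"] = terms.pop("uid")
        terms["type"] = with_base_type(terms.get("type"), "TermsOfUse")
    return updated


before = {
    "id": "http://dmv.example.gov/credentials/3732",
    "claim": {"id": "did:example:ebfeb1f712ebc6f1c276e12ec21", "ageOver": 21},
    "credentialStatus": {
        "id": "https://dmv.example.gov/status/24",
        "type": "CredentialStatusList2017",
    },
    "termsOfUse": [{
        "type": "Policy",
        "uid": "http://example.com/policies/credential/4",
    }],
}
after = apply_proposal(before)
```

With this change every "id" sits in an object whose first "type" entry matches the enclosing property, which is the consistency the proposal is after.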

David-Chadwick · 2018-05-19T20:35:20Z

I believe PR #170 and PR #176 resolve this issue

David-Chadwick · 2018-06-30T16:09:20Z

We now have PRs for fixing the IDs of TermsOfUse and CredentialStatus, but not for Claim

Can we please change "claim" to "subject", viz:

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:23:24Z",
  "claim": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },

TO:

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:23:24Z",
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },

msporny · 2018-07-24T15:47:56Z

Migrating to new issue #207. Closing.

msporny assigned msporny and David-Chadwick and unassigned msporny Feb 27, 2018

burnburn added the ready for PR This issue is ready for a Pull Request to be created to resolve it label Mar 13, 2018

David-Chadwick mentioned this issue Mar 21, 2018

Profile ID is ambiguous and potentially improper #133

Closed

burnburn added pr exists and removed ready for PR This issue is ready for a Pull Request to be created to resolve it labels Jun 25, 2018

msporny mentioned this issue Jul 24, 2018

Rename "claim" to "subject" #207

Closed

msporny closed this as completed Jul 24, 2018
