
IDs Not Differentiated Sufficiently #120

Closed
David-Chadwick opened this issue Feb 26, 2018 · 35 comments

@David-Chadwick
Contributor

Our data model contains several instances of different IDs, but each ID is simply called "id" and has no qualifying descriptor. We cannot interpret what the ID refers to as the data model does not contain any descriptive text to tell us. We cannot assume it is the ID of the encapsulating JSON data object, as this is not always the case. Furthermore, with the enhancements that are currently being suggested in order to make the data model more precise and unambiguous, more IDs are currently being added.

It is proposed that each occurrence of ID is replaced by a descriptive ID so that it is obvious to the reader (and verifier) to what the ID refers. It would also be helpful to add descriptive text to the data model document to state unambiguously to what each ID refers.

Thus inside a claim (example 1), "id" would become "subjectID".
Inside a profile (no example yet), "id" would become "holderID".
Inside evidence (example 8), "id" would become "evidenceID".
Inside a credential (example 7), "id" would become "credentialID".
Inside credential status (example 4), "id" would become "statusID".
The initial "id" (all examples) would be replaced by either "credentialID" or "profileID", so that the type field is used to further refine what sort of credential or profile this is.

@Drabiv

Drabiv commented Feb 26, 2018

Great point! Would love to see this solved, both here and in the DID spec.

@msporny
Member

msporny commented Feb 26, 2018

> It is proposed that each occurrence of ID is replaced by a descriptive ID so that it is obvious to the reader (and verifier) to what the ID refers. It would also be helpful to add descriptive text to the data model document to state unambiguously to what each ID refers.

'id' is a part of the underlying linked data model, and is used to uniquely identify the object that is being described. Most developers that are familiar with JSON-LD are used to this pattern where "id" identifies the thing that you're attaching attributes to. Usually, we depend on "type" to determine the type of thing identified by "id"... so "type": "VerifiableProfile", "type": "VerifiableCredential", "type": "RevocationList2017"... that sort of thing. That said, even the type is optional as most of the data structures depend on duck typing. We've found that developers find it inconvenient to repeat "VerifiableCredential" all over the place when the data itself makes it clear what is being described.

There is a JSON-LD feature called "aliasing". In fact, this is where "id" comes from... "id" is an alias for "@id". We could alias "subjectID" to "@id", "holderID" to "@id", and so on... but developers will ultimately skip this and just use "id" as they have in the past. At that point, we'll have a choice on how stringent we will want to be with the validating schemas, and again, given history, the developers will insist that we allow both, which then makes understanding the data model even more complex because developers will be allowed to use both "id" and "subjectID" in the same location.
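A minimal Python sketch of the aliasing point above, assuming a hypothetical `CONTEXT` in which "id", "subjectID" and "holderID" all alias "@id". The helper `resolve_aliases` is illustrative and models only keyword aliasing, not full JSON-LD expansion (ordinary terms such as "name" are not expanded to IRIs). It shows that the two spellings mean the same thing, and what happens when a document uses both in one object.

```python
# Hypothetical context, for illustration only: "id" is the usual alias for the
# JSON-LD keyword "@id"; "subjectID" and "holderID" are the aliases proposed
# in this issue.
CONTEXT = {
    "id": "@id",
    "type": "@type",
    "subjectID": "@id",
    "holderID": "@id",
}


def resolve_aliases(value, context):
    """Rewrite aliased keys to the JSON-LD keyword they stand for.

    Only the aliasing step is modelled; other keys are left untouched.
    """
    if isinstance(value, list):
        return [resolve_aliases(item, context) for item in value]
    if not isinstance(value, dict):
        return value
    resolved = {}
    for key, item in value.items():
        target = context.get(key, key)
        if target in resolved:
            # e.g. "id" and "subjectID" both present in the same object
            raise ValueError(f"{key!r} duplicates another alias of {target!r}")
        resolved[target] = resolve_aliases(item, context)
    return resolved


plain = resolve_aliases({"claim": {"id": "did:example:123", "name": "Alice"}}, CONTEXT)
aliased = resolve_aliases({"claim": {"subjectID": "did:example:123", "name": "Alice"}}, CONTEXT)
print(plain == aliased)  # True: both spellings resolve to "@id"
```

Whether the collision case (both "id" and "subjectID" in one object) should be an error or silently accepted is the validation choice described above.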

It's for these reasons that I don't think this is a good idea. It feels like complexity that is being added because of a fundamental misunderstanding of the underlying data model. Once a developer understands the data model, they will see these additions as unnecessary. I do note that many developers may start out where @David-Chadwick did, and that is certainly a concern.

@dlongley
Contributor

dlongley commented Feb 26, 2018

@David-Chadwick,

I may have misunderstood this comment:

> We cannot assume it is the ID of the encapsulating JSON data object, as this is not always the case.

Could you provide an example where this isn't the case? Are you talking about where IDs are merely references (e.g. they appear as strings) rather than full JSON objects?

AFAIK, we're following the JSON-LD/RDF model properly and every JSON object is a node in a graph (with an open world assumption) that is uniquely identified by the id property, if present. If an identifier appears as merely a string (no encapsulating JSON object), then it is just syntactic sugar that avoids having to create a JSON object with a single id property (and no other properties). It also allows for referencing other nodes in a document without having to embed all of its properties at every location.

So, looking at the most common example, inside of "claim" the object represents a subject, so its id is the ID of the subject. These statements:

```
<credential id> <issuer> <issuer id>
<credential id> <claim> <subject id>
<subject id> <name> "Alice"
```

represented as JSON:

```
{
  "id": "<credential id>",
  "issuer": "<issuer id>",
  "claim": {
    "id": "<subject id>",
    "name": "Alice"
  }
}
```

...mean that there is a credential, identified by <credential id> that was issued by an issuer identified by <issuer id> that includes a claim that a subject, identified by <subject id> has the name "Alice". Every id is in its right place and identifies the encapsulating JSON object. There's nothing more to say about the issuer here other than its ID, so it appears as a string. But the credential identifier and the subject identifier appear in encapsulating JSON objects.
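The reading above can be sketched in Python by flattening the JSON into subject/predicate/object triples. This is a simplification, not a JSON-LD processor: `to_triples` and `REFERENCE_KEYS` are hypothetical, and which string values count as node references (rather than literals) is hard-coded here, whereas in JSON-LD it comes from the @context.

```python
import itertools
import json

# Assumption for this sketch: string values of these keys are references to
# other nodes. In JSON-LD this is declared in the @context ("@type": "@id").
REFERENCE_KEYS = {"issuer"}


def to_triples(node, blank_ids=None):
    """Flatten a JSON object into (subject, predicate, object) triples.

    Returns the node's identifier and the triples it, and anything embedded
    in it, contributes.
    """
    blank_ids = itertools.count() if blank_ids is None else blank_ids
    # An object's own "id" names it; objects without one get a blank node.
    subject = node["id"] if "id" in node else f"_:b{next(blank_ids)}"
    triples = []
    for key, value in node.items():
        if key == "id":
            continue
        # An array means several statements with the same subject and predicate.
        for item in (value if isinstance(value, list) else [value]):
            if isinstance(item, dict):
                obj, nested = to_triples(item, blank_ids)
                triples.append((subject, key, obj))
                triples.extend(nested)
            elif key in REFERENCE_KEYS:
                triples.append((subject, key, item))  # bare string = reference
            else:
                triples.append((subject, key, json.dumps(item)))  # literal
    return subject, triples


credential = {
    "id": "<credential id>",
    "issuer": "<issuer id>",
    "claim": {"id": "<subject id>", "name": "Alice"},
}
_, triples = to_triples(credential)
for triple in triples:
    print(*triple)
# <credential id> issuer <issuer id>
# <credential id> claim <subject id>
# <subject id> name "Alice"
```

Each embedded object contributes triples under its own "id", never its parent's; an object without an "id" gets a blank node, and an embedded issuer object would add its own triples in the same way.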

If you wanted to say more about the issuer you could have done:

```
{
  "id": "<credential id>",
  "issuer": {
    "id": "<issuer id>",
    "name": "Bob"
  },
  "claim": {
    "id": "<subject id>",
    "name": "Alice"
  }
}
```

If we're not following this model somewhere in the spec then I believe it's a mistake that should be corrected. I don't think we need to introduce more terms.

@David-Chadwick
Contributor Author

Well, @dlongley, you have put your finger on it in your example above.
If "x": {"id": "the id of x"} is meant to be the model, then "claim" does not fit it, as the "id" inside "claim" should then be the ID of the claim, not of the subject.

@dlongley
Contributor

dlongley commented Feb 26, 2018

@David-Chadwick,

  1. "claim" is best thought of as a predicate, not an object.
  2. The syntax actually further hides that the claim itself is a separate graph of information that has no explicit identifier. Within that graph of information is the subject.

What is being claimed is: "There is a subject with identifier X and attribute Y".

You could more verbosely represent that there's a graph hiding inside of claim like this:

```
{
  "claim": {
    "@graph": {
      "id": "<subject id>",
      "name": "Alice"
    }
  }
}
```

That syntax is actually equivalent (in the data model) to omitting @graph. However, the JSON-LD @context ensures that people don't need to see that complexity up front, and we're better off for it.

So it is hidden by syntactic sugar, but, for precision with the data model, it's hiding out there. If you wanted an identifier for the claim object you'd put it right in there at the same level as @graph, but there's no reason to have one -- a claim object is merely existential and needs no reference to it other than the one from the credential via the predicate "claim".
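A rough Python sketch of the same credential split into quads (subject, predicate, object, graph), assuming, as described above, that the @context treats the value of "claim" as a graph. `credential_to_quads` is a hypothetical helper; it keeps every value as a plain string and ignores objects nested inside the claim.

```python
import itertools


def credential_to_quads(credential):
    """Split a credential into (subject, predicate, object, graph) quads.

    Sketch only: "claim" is treated as a graph container, all values are kept
    as plain strings, and objects nested inside the claim are not handled.
    """
    blank_ids = itertools.count()
    cred_id = credential["id"]
    quads = []
    for key, value in credential.items():
        if key == "id":
            continue
        if key == "claim":
            # The claim graph has no explicit identifier, so it is named by a
            # fresh blank node that only the credential points to.
            graph = f"_:g{next(blank_ids)}"
            quads.append((cred_id, "claim", graph, "default"))
            subject = value["id"] if "id" in value else f"_:b{next(blank_ids)}"
            for prop, obj in value.items():
                if prop != "id":
                    quads.append((subject, prop, str(obj), graph))
        else:
            quads.append((cred_id, key, str(value), "default"))
    return quads


quads = credential_to_quads({
    "id": "<credential id>",
    "issuer": "<issuer id>",
    "claim": {"id": "<subject id>", "name": "Alice"},
})
for quad in quads:
    print(*quad)
# <credential id> issuer <issuer id> default
# <credential id> claim _:g0 default
# <subject id> name Alice _:g0
```

The only reference to `_:g0` is the credential's "claim" statement, which is why the claim graph can stay anonymous.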

@David-Chadwick
Contributor Author

or you presumably could state

```
"subject": {
  "id": "",
  "name": "Alice"
}
```

To my mind this would be better, as the credential is a statement about the subject by the issuer. We don't need the claim baggage, do we?

@dlongley
Contributor

dlongley commented Feb 26, 2018

@David-Chadwick,

Well, that approach would actually mean that the predicate "subject" would point at a graph without an identifier (like claim does now) and what we refer to currently as "subject" would be a node inside of that graph. That may lead to more confusion, not less. It might help with a naive mental digestion of the syntax, but, honestly, it seems incorrect from a data modeling perspective -- and changes the meaning of "subject", IMO. But if the group feels like that's a step in the right direction then it's a consideration. I don't mean that passive-aggressively; it could be argued that naive syntax reading is a higher priority than data model purity.

@David-Chadwick
Contributor Author

Sorry, my post got corrupted in transit. What I typed was:

```
"subject": {
  "id": " < subject id without the spaces > ",
  "name": "Alice"
}
```

and the transmitted version omitted it entirely. I should have checked in Preview first :-)
Is that better for you?

@Fak3

Fak3 commented Feb 26, 2018 via email

@dlongley
Contributor

dlongley commented Feb 26, 2018

@David-Chadwick,

> Is that better for you?

I don't think it changes my response. I do think "claim" has always been the most awkward part of the syntax.

@dlongley
Contributor

@Fak3,

> It would be nice if json-ld allowed to do something like unalias "@id" in the scope of claim.

JSON-LD 1.1 (which we are using) has that feature.
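A sketch of how a scoped context could address this, assuming a hypothetical context in which the term "claim" carries its own "@context" that adds "subjectID" as an alias for "@id" and clears "id" (JSON-LD 1.1 allows a term definition to carry such a context). The `resolve` function is a simplified model of alias resolution, not a JSON-LD processor; in particular, it keeps a cleared term as an ordinary key, where a real processor would treat it as undefined.

```python
# Hypothetical context, for illustration only. At the top level "id" aliases
# "@id". The term "claim" carries a scoped context that, inside its value,
# adds "subjectID" as an alias and clears "id" by mapping it to None (null).
CONTEXT = {
    "id": "@id",
    "claim": {"@context": {"subjectID": "@id", "id": None}},
}


def resolve(node, context):
    """Resolve keyword aliases, applying scoped contexts while descending."""
    out = {}
    for key, value in node.items():
        definition = context.get(key, key)
        # A string definition is an alias; anything else leaves the key as is.
        keyword = definition if isinstance(definition, str) else key
        if isinstance(value, dict):
            scoped = definition.get("@context", {}) if isinstance(definition, dict) else {}
            # Values of the term see the outer context overlaid with its scoped one.
            value = resolve(value, {**context, **scoped})
        out[keyword] = value
    return out


doc = {"id": "urn:example:credential", "claim": {"subjectID": "did:example:123", "name": "Alice"}}
print(resolve(doc, CONTEXT))
# {'@id': 'urn:example:credential', 'claim': {'@id': 'did:example:123', 'name': 'Alice'}}
```

Inside "claim", a key spelled "id" is no longer an alias for "@id", which is the "unalias in the scope of claim" behaviour asked about.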

@David-Chadwick
Copy link
Contributor Author

Dave, after your lengthy explanations, it is now my opinion that "claim" should become "subject". I can see no reason for treating issuer and subject differently in either the data model or the JSON syntax. They are both objects in the same RDF graph. After all, subjects become issuers when they delegate their credentials.

@dlongley
Contributor

@David-Chadwick,

> Dave, after your lengthy explanations, it is now my opinion that "claim" should become "subject".

I'm not against such a change, but I want to point out that this change would, IMO, be made to support the priority of a naive (and intentionally less flexible) reading of the syntax, not to support a more "correct" (or at least consistent) data model.

In your proposal to replace "claim" with "subject", the predicate "subject" would link a credential to an isolated graph of statements in the data model. Those statements would include a node that is the "subject" as we've used the term previously. And we can make a naive reading of the syntax "look like" we're just saying "here's the subject of this credential".

Using the term "claim" to link a credential to a set of statements "claimed" by an issuer was the original impetus behind that particular predicate choice. It was also stated, in the definition of a credential, that the claim was (typically) about a single subject. I understand that many now want it to be more narrowly targeted at a particular subject, and that a simple reading of the current syntax makes the subject's location in the credential less than obvious.

So the argument here is to make the subject relationship more explicit and narrow in nature within the syntax, i.e. a credential does not merely set out a claim containing a set of statements about one or more subjects, but rather, it "is about a very particular subject". That's the main difference with what's being proposed here, IMO.

@David-Chadwick
Contributor Author

David-Chadwick commented Feb 26, 2018

@dlongley
I think you have read too much into the change I am proposing. According to the definition in Section 2 (credential - A set of one or more claims made by the same entity about a subject), a credential contains a set of claims about ONE subject, so it would not be a change in the data model to say it "is about a very particular subject"; that conforms to the current definition.
However, if you want to change the definitions in Section 2 to say:
credential - A set of one or more claims made by the same entity
claim - An assertion that a subject has one or more properties.
then you would need to revise the syntax to the following (sorry, I can't seem to get the spacing correct):

```
{
  "id": "< credential id >",
  "issuer": {
    "id": "< issuer id >",
    "name": "Bob"
  },
  "claims": [
    {
      "subject": {
        "id": "< subject id >",
        "name": "Alice"
      }
    },
    {
      "subject": {
        "id": "< subject id >",
        "name": "Bob"
      }
    }
  ]
}
```

@dlongley
Contributor

@David-Chadwick,

If you add three backticks before and after your example, it will preserve your spacing:

```
{
  "stuff": "foo"
}
```

@jandrieu
Contributor

But a credential does NOT contain a set of claims about ONE subject. That section is in error.

The RDF-based data model allows--in fact depends on--multiple statements about, potentially, multiple subjects (in the RDF sense of subject, predicate, object).

The idea that VCs might be thought of as about a single subject is the heart of the confusion. For some claims it may make sense for the issuer to assert that there is a single subject (in this vague meaning), but it actually seems to make things harder. This was the opening point in my other post.

I think the stricter notion of subject helps us clarify things. The profile can make statements about specific subjects in specific credentials without the issuer specifying a "single subject".

@David-Chadwick
Contributor Author

@jandrieu
Hi Joe. My post above suggests a change to the credential JSON that makes it much clearer that multiple subjects can be supported.

@jandrieu
Contributor

jandrieu commented Feb 26, 2018

Hmmmm... I had thought multiple subjects were already supported. It is innate in the RDF graph model: every statement has a subject. Each "claim" is in fact an RDF graph, which you can see in Example 16:

```
"claim": {
  "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
  "name": "Alice Bobman",
  "birthDate": "1985-12-14",
  "gender": "female",
  "nationality": {
    "name": "United States"
  }
}
```

That is, every property in "claim", other than "id", is a predicate whose subject is the value of "id" and whose object is the value of the property. But all of those have the same subject.

If I'm following this correctly, then "claim" should probably be "claims" and should be an array. Because an individual "claim" with its singular "id" literally can only have one subject.

I think I'm beginning to see @David-Chadwick's point.

@dlongley
Contributor

@jandrieu,

> Can you simply use an array as a value for multiple statements with the same subject & predicate?

Yes.

@jandrieu
Contributor

Sorry, @dlongley, I edited out that question when I updated my comment. I think David's point is on the mark. (But thanks for the clarification.)

@dlongley
Contributor

dlongley commented Feb 26, 2018

@jandrieu,

> If I'm following this correctly, then "claim" should probably be "claims" and should be an array. Because an individual "claim" with its singular "id" literally can only have one subject.

The value of "claim", in the data model, is actually an isolated graph with some set of statements in it -- which matches the concept. The choice of syntax does happen to promote the use of a single subject as "the main subject" or "entry point" for that claim because that's expected to be the most common (and simplest) case.

You could also use an array as the value of "claim". This would express either "multiple independent claims" (separate isolated graphs), each with a different subject as its "main subject", or "a single independent claim" (one isolated graph) containing multiple subjects, each a disjoint "main subject". This is possible now with the data model and syntax as is.

But a quick note about plural key/term names -- they are frowned upon. If you want more than one value, just use an array as stated above and in your example; there's no need to muck about with the term name.
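To illustrate the array case, a small Python helper that lists the "main subject" of each top-level object in a claim value, whether that value is a single object or an array. `main_subjects` and the example DIDs are hypothetical; whether an array yields one graph or several is decided by the @context, which this sketch does not model.

```python
def main_subjects(claim):
    """Return the "id" of each top-level node in a claim value.

    The claim value may be one JSON object or an array of them. Objects
    without an "id" (blank nodes) are skipped in this sketch.
    """
    nodes = claim if isinstance(claim, list) else [claim]
    return [node["id"] for node in nodes if "id" in node]


single = {"id": "did:example:alice", "name": "Alice"}
several = [
    {"id": "did:example:alice", "name": "Alice"},
    {"id": "did:example:bob", "name": "Bob"},
]
print(main_subjects(single))   # ['did:example:alice']
print(main_subjects(several))  # ['did:example:alice', 'did:example:bob']
```

The term stays "claim" in both cases; only the shape of its value changes.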

@jandrieu
Contributor

Agreed about not changing the term name. I don't think there are good examples or language explaining how to rigorously specify multiple subjects in a single credential (which is baked into the JSON-LD data model). We should add that.

FWIW, @stonematt and I are working on an update to the use cases document that will include the citizenship sticky wicket above and would be a good example use case for multiple subjects in a credential and multiple credentials in a profile.

@David-Chadwick
Contributor Author

Part of the problem is that JSON does not have a formal schema, unlike XML, and the current specification does not help as it is ambiguous or simply lacking in explanation of many of the properties.

Conventionally, I would expect that if a property is expected to have a single value it would be written as "value", whereas if it is expected to have a set of values it would be written as ["value"].

We would not expect to see
"issuer": ["https://dmv.example.gov", "https://example.com/jdoe/"]
although this is allowed according to @dlongley and the JSON syntax. But this does not make any sense.

Consequently, I think we need to tighten up the current specification and make it clear where sets/lists are allowed and where they are not.
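One way to make such rules checkable is a per-property cardinality table. The sketch below assumes a hypothetical table: the property names come from the examples in this thread, but which ones are single-valued is illustrative, not taken from the specification. It flags the "issuer" array from the example above.

```python
# Hypothetical cardinality rules, for illustration only.
SINGLE_VALUED = {"id", "issuer", "issued"}
MAY_BE_ARRAY = {"type", "claim", "termsOfUse"}


def cardinality_errors(credential):
    """Report top-level properties whose value shape breaks the rules above."""
    errors = []
    for key, value in credential.items():
        if key in SINGLE_VALUED:
            if isinstance(value, list):
                errors.append(f"{key}: must be a single value, not an array")
        elif key not in MAY_BE_ARRAY:
            errors.append(f"{key}: cardinality not specified")
    return errors


print(cardinality_errors({
    "id": "http://dmv.example.gov/credentials/3732",
    "issuer": ["https://dmv.example.gov", "https://example.com/jdoe/"],
    "type": ["Credential", "ProofOfAgeCredential"],
}))
# ['issuer: must be a single value, not an array']
```

Reporting properties with no declared cardinality is a deliberate choice here: it surfaces exactly the places where the specification is silent.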

msporny assigned msporny and David-Chadwick and unassigned msporny Feb 27, 2018
@David-Chadwick
Contributor Author

I think one of the problems may be the implicit assumption that JSON-LD is fundamental to (understanding) the data model. But I do not think this is or should be the case.

Early on it was said that VCs should work without any reliance on JSON-LD, so that VCs formatted in pure JSON would work, as would those with JSON-LD constructs such as @context added to them, without affecting implementations that don't understand JSON-LD. I think the same principles should apply to the data model document as well.

JSON is very widely accepted, but I am not sure that JSON-LD is so widely accepted. In order to gain the widest possible acceptance of VCs, the reader of the VC data model document should not be required to have any previous knowledge about or understanding of JSON-LD. The same should go for implementers of VCs.

This means that:
i) any features which are implicitly imported from JSON-LD should be explicitly described in the data model document
ii) it is OK to repeat things in the data model document that are already stated in JSON-LD documents.
iii) the JSON examples should conform to normal JSON (and not to some imported semantics from JSON-LD). So if a value is meant or allowed to be an array then it should start with [ and end with ], otherwise it can only be a simple value.

@dlongley
Contributor

> iii) the JSON examples should conform to normal JSON (and not to some imported semantics from JSON-LD). So if a value is meant or allowed to be an array then it should start with [ and end with ], otherwise it can only be a simple value.

There is no such restriction ("if a value can be an array then it MUST be an array") in JSON. JSON values can be anything; they can be mixed and matched, and so on.

@David-Chadwick
Contributor Author

@dlongley
you have misinterpreted what I said. To elaborate, if a value in our data model is meant or allowed to be an array, then it should start with [ and end with ] in our examples (even if it only contains a single element), and our descriptive text of the value should state that it is an array. If the value is only allowed to be a simple value, then this should be stated in the description of the property.

@dlongley
Contributor

@David-Chadwick,

> To elaborate, If a value in our data model is meant or allowed to be an array then it should start with [ and end with ] in our examples (even if it only contains a single element), and our descriptive text of the value should state it is an array.

Why? Why not just have examples with single values and examples with more than one?

@David-Chadwick
Copy link
Contributor Author

Because I would like the data model document to remove ambiguity. It should state quite categorically which values should be single values (such as ID) and which can be lists.

@dlongley
Contributor

dlongley commented Feb 28, 2018

> Because I would like the data model document to remove ambiguity. It should state quite categorically which values should be single values (such as ID) and which can be lists.

I'm in agreement with that. But that some "can be lists" doesn't mean that they "must be". We should show examples both ways.
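Showing examples both ways works for readers if consumers normalize on input. A minimal Python sketch, with `as_list` as a hypothetical helper name:

```python
def as_list(value):
    """Treat a bare value and an array uniformly when reading a property."""
    return value if isinstance(value, list) else [value]


# Both spellings of a property that "can be a list" read the same way.
print(as_list("Credential") == as_list(["Credential"]))  # True
print(as_list(["Credential", "ProofOfAgeCredential"]))   # ['Credential', 'ProofOfAgeCredential']
```

This only suits properties that may hold several values; for single-valued ones an array should still be rejected rather than normalized.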

@David-Chadwick
Contributor Author

Agreed. But we should have some way of telling readers this fact. We can do it per property definition, by stating it is a list of values, or by putting [..] in the examples, or both. But to be silent on the issue is not good enough in my opinion.

burnburn added the "ready for PR" label Mar 13, 2018
@davux

davux commented Apr 20, 2018

About "subject": JWT already has "sub" (subject), and JWTs are used to carry claims. Would there be any value in keeping naming consistent with JWT in that respect?

@David-Chadwick
Contributor Author

We still have not resolved this issue. Take Example 4 in the latest version (18 April 2018):

Example 4: Usage of status property

```
{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:73:24Z",
  "claim": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },
  "credentialStatus": {
    "id": "https://dmv.example.gov/status/24",
    "type": "CredentialStatusList2017"
  },
  "proof": { ... }
}
```

The first "id" is the ID of an unknown object, as there is no outer type. However, we are told we have to read the following "type" property to find out what type of object this is. It is "Credential", so this is the ID of a Credential, of subtype ProofOfAgeCredential.
The second "id" has an outer type and no "type" parameter, but it is not the ID of "claim"; it is the ID of the subject.
The third "id" has an outer type and a "type" parameter, but these are inconsistent: the outer type is "credentialStatus" and the "type" parameter is "CredentialStatusList2017". So is the ID the ID of "credentialStatus" or of "CredentialStatusList2017"?
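The three cases can be checked mechanically. The Python sketch below, with `describe_ids` as a hypothetical helper, walks Example 4 (without "issued" and the elided "proof" block) and reports, for each object carrying an "id", the property that encloses it and its "type", if any.

```python
def describe_ids(node, enclosing="(top level)", found=None):
    """Collect (enclosing property, id, type) for every object with an "id"."""
    found = [] if found is None else found
    if isinstance(node, dict):
        if "id" in node:
            found.append((enclosing, node["id"], node.get("type")))
        for key, value in node.items():
            describe_ids(value, key, found)
    elif isinstance(node, list):
        for item in node:
            describe_ids(item, enclosing, found)
    return found


# Example 4, minus "issued" and the elided "proof" block.
example_4 = {
    "id": "http://dmv.example.gov/credentials/3732",
    "type": ["Credential", "ProofOfAgeCredential"],
    "issuer": "https://dmv.example.gov/issuers/14",
    "claim": {"id": "did:example:ebfeb1f712ebc6f1c276e12ec21", "ageOver": 21},
    "credentialStatus": {
        "id": "https://dmv.example.gov/status/24",
        "type": "CredentialStatusList2017",
    },
}
for row in describe_ids(example_4):
    print(row)
# ('(top level)', 'http://dmv.example.gov/credentials/3732', ['Credential', 'ProofOfAgeCredential'])
# ('claim', 'did:example:ebfeb1f712ebc6f1c276e12ec21', None)
# ('credentialStatus', 'https://dmv.example.gov/status/24', 'CredentialStatusList2017')
```

The output lines up with the three observations above: the enclosing property and the "type" answer "what does this id identify?" differently in each row.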

The Terms of Use in Example 10 is even more inconsistent:

```
"termsOfUse": [{
  "type": "Policy",
  "uid": "http://example.com/policies/credential/4",
  "profile": "http://example.com/profiles/credential",
  "prohibition": [{
    "assigner": "https://dmv.example.gov/issuers/14",
    "assignee": "AllVerifiers",
    "target": "http://dmv.example.gov/credentials/3732",
    "action": ["Archival"]
  }]
```

This does not have an id parameter, but has a uid parameter instead. What is the difference? And the object type is inconsistent. Is it the outer termsOfUse object type, or the inner type Policy?

Proposed solution. This follows the pattern used in Example 18:

Example 18: A simple verifiable profile

```
{
  "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
  "credential": [{
    "id": "http://dmv.example.gov/credentials/3732",
    "type": ["Credential", "ProofOfAgeCredential"],
...
```

Example 4 would become:

Example 4: Usage of status property

```
{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:73:24Z",
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },
  "credentialStatus": {
    "id": "https://dmv.example.gov/status/24",
    "type": ["CredentialStatus", "CredentialStatusList2017"]
  },
  "proof": { ... }
}
```

Terms of Use from Example 10 would become:

```
"termsOfUse": [{
  "id": "http://example.com/policies/credential/4",
  "type": ["TermsOfUse", "Policy"],
  "profile": "http://example.com/profiles/credential",
  "prohibition": [{
    "assigner": "https://dmv.example.gov/issuers/14",
    "assignee": "AllVerifiers",
    "target": "http://dmv.example.gov/credentials/3732",
    "action": ["Archival"]
  }]
```

@David-Chadwick
Contributor Author

David-Chadwick commented May 19, 2018

I believe PR #170 and PR #176 resolve this issue.

burnburn added the "pr exists" label and removed the "ready for PR" label Jun 25, 2018
@David-Chadwick
Contributor Author

We now have PRs for fixing the IDs of TermsOfUse and CredentialStatus, but not for Claim.

Can we please change "claim" to "subject", viz:

```
{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:73:24Z",
  "claim": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },
```

TO:

```
{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:73:24Z",
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },
```

@msporny
Member

msporny commented Jul 24, 2018

Migrating to new issue #207. Closing.

msporny closed this as completed Jul 24, 2018

8 participants