Rename "claim" to "subject" #207

Closed
msporny opened this issue Jul 24, 2018 · 30 comments

Comments

@msporny
Member

msporny commented Jul 24, 2018

@David-Chadwick would like to change "claim" to "subject", viz:

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:23:24Z",
  "claim": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },

TO:

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["Credential", "ProofOfAgeCredential"],
  "issuer": "https://dmv.example.gov/issuers/14",
  "issued": "2010-01-01T19:23:24Z",
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "ageOver": 21
  },

Related to #120
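
For anyone who wants to see the proposed rename mechanically, here is a minimal Python sketch. The helper name `rename_claim_to_subject` is hypothetical, and the credential is the excerpt above with the `issued` field left out for brevity; nothing here comes from the spec itself.

```python
import json


def rename_claim_to_subject(credential: dict) -> dict:
    """Return a copy with the top-level "claim" key renamed to "subject"."""
    # Rebuilding the dict keeps the original key order intact.
    return {
        ("subject" if key == "claim" else key): value
        for key, value in credential.items()
    }


credential = {
    "id": "http://dmv.example.gov/credentials/3732",
    "type": ["Credential", "ProofOfAgeCredential"],
    "issuer": "https://dmv.example.gov/issuers/14",
    "claim": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "ageOver": 21,
    },
}

renamed = rename_claim_to_subject(credential)
print(json.dumps(renamed, indent=2))
```

Only the property name changes; the nested object, including its `id`, is carried over untouched.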

@stonematt
Contributor

stonematt commented Jul 27, 2018

I have a few thoughts on this suggestion.

  1. This change might reduce confusion between "credential" and "claim" - which is a positive
  2. This change could confuse the term "subject" as it relates to "subject != holder" - I don't think this is the same "subject" and subject is already too overloaded as a term in the wide world.
  3. Finally, it's very late in the process to make such a substantial change to fundamental terminology in our group

As a result, I would vote against this request, unless someone can make a compelling argument for it.

@David-Chadwick
Contributor

I think the term subject is very clear in the context of VCs. The subject is the subject of the credential. This is exactly the same as the use of subject in X.509 as well. The solution to this is to ensure that the definitions section clearly defines subject, and clearly defines holder.

Concerning the timing of this issue, I would like to make two points.

  1. It has been an issue for many months, so it is not new. When it was discussed several months ago, @dlongley said he was sympathetic to this change. But other issues (e.g. the PR on subject NE holder) took precedence for the group, so it got parked until I raised it again.
  2. We should be concerned about the long term use of VCs and their use becoming ubiquitous. Consequently our objective should be for the data model to be understood by the widest group of users. Anything we can do now to make the data model more understandable to the average person should be welcomed with open arms, before it goes for review by other W3C groups such as the security group.

@msporny
Member Author

msporny commented Jul 31, 2018

Since we're bikeshedding the name, other choices include:

  • assertion
  • attestation
  • affirmation
  • declaration

I do agree with @David-Chadwick that now is the time to get this right. I disagree that "subject" would be the proper solution, primarily because of points made by @stonematt and @TallTed -- it's an overloaded term.

@msporny
Member Author

msporny commented Jul 31, 2018

If we change to subject, the corresponding data model would be strange:

<_:credentialId> cred:subject <_:subjectGraph> _:defaultGraph .
_:subjectGraph {
  <_:subjectDID> some:property "someValue" .
}

That is, cred:subject points to a graph of statements containing the subject. The plain English reading of this would be "The subject of the credential is THIS_GRAPH."... but that's clearly not what we're trying to say. We're trying to say that the credential makes assertions about a particular subject.

I do admit that this is a pedantic argument and the more important thing is to make the markup look reasonable and immediately graspable for authors. We should pick something that fits both worlds, and I think the following words meet those criteria: assertion, attestation, affirmation, declaration.

Of those, I'd argue against attestation and affirmation... as they convey the idea that the statements are more official than they are and have the same problem as "verifiable claim".

@David-Chadwick
Contributor

Take example 1 from the current data model document, with claim changed to subject:

{
  "id": "http://dmv.example.gov/credentials/3732",   -- this is the credential id
  "type": ["VerifiableCredential", "ProofOfAgeCredential"],
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21", -- this is the subject id
    "ageOver": 21  -- this is the subject's attribute
  },
  "proof": { ... }
}

It makes eminent sense to call the object a subject, as the id is the id of the subject. To call it anything other than subject is no better than calling it claim.

If you want to keep claim, attestation, or other assertion term, we could further elaborate the data model as follows

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["VerifiableCredential", "ProofOfAgeCredential"],
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "claim|attestation|affirmation": {
        "ageOver": 21}
  },
  "proof": { ... }
}

The above makes much more sense and is easier to read and understand. It has the correct semantics for the "id"s which the existing data model does not have. And it correctly attaches the claim to the subject.
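
A rough Python sketch of the difference between the two layouts above. `nest_claims` is a hypothetical helper written only for illustration, and the `proof` member is omitted because the examples elide it.

```python
def nest_claims(credential: dict, claim_key: str = "claim") -> dict:
    """Move every property of "subject" except "id" under a nested claim object."""
    subject = credential["subject"]
    nested_subject = {
        "id": subject["id"],  # stays the identifier of the subject
        claim_key: {k: v for k, v in subject.items() if k != "id"},
    }
    # Copy the credential and swap in the restructured subject.
    return {**credential, "subject": nested_subject}


flat = {
    "id": "http://dmv.example.gov/credentials/3732",
    "type": ["VerifiableCredential", "ProofOfAgeCredential"],
    "subject": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "ageOver": 21,
    },
}

nested = nest_claims(flat)
print(nested["subject"])
```

In the nested form the `id` sits directly on the subject, and the statements about that subject are grouped one level down.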

@David-Chadwick
Contributor

@msporny. Could you please comment on the proposed new syntax for a VC, which keeps the concept of a claim, but also ensures that the subject is correctly identified, viz:

{
  "id": "http://dmv.example.gov/credentials/3732",
  "type": ["VerifiableCredential", "ProofOfAgeCredential"],
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
    "claim": {
        "ageOver": 21}
  },
  "proof": { ... }
}

I will then produce a PR with this global edit to the examples. I suggest that this edit could be made without any changes to the accompanying text, but please correct me if I am wrong.

@dhh1128
Contributor

dhh1128 commented Sep 18, 2018

I am in favor of the last suggestion from @David-Chadwick . Example 9 shows why this makes sense:

"claim": {
    "id": "did:example:abcdef1234567",
    "name": "Jane Doe",
    "favoriteFood": "Papaya"
  },

In this example, we have diverged from the definition of "claim" given earlier in the spec: "A claim is statement about a subject." Yet we have one "claim" element of the doc model containing two statements. I get why we did this--for terseness--but what we should really be doing if we want to allow this is calling the element "subject", with multiple claims inside it.

I would also point out that "id" inside a claim is not the ID of the claim, but ID of the subject. And this does violence to the notion of id as a unique identifier of a portion of the doc, since the same subject could be identified more than once.
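
To make the point concrete, here is a small Python sketch that unpacks the Example 9 object into individual statements. The function name `statements` is an assumption for illustration; it treats `id` as the subject identifier and every other property as one statement about that subject.

```python
def statements(claim: dict) -> list[tuple[str, str, object]]:
    """List the (subject, property, value) statements packed into one "claim" object."""
    subject_id = claim["id"]  # identifies the subject, not the claim itself
    return [
        (subject_id, prop, value)
        for prop, value in claim.items()
        if prop != "id"
    ]


claim = {
    "id": "did:example:abcdef1234567",
    "name": "Jane Doe",
    "favoriteFood": "Papaya",
}

for statement in statements(claim):
    print(statement)
```

One "claim" element yields two statements here, both about the same subject.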

@RieksJ

RieksJ commented Oct 10, 2018

I would like to go for assertion. Indeed, the id inside is not the ID of the claim/assertion, but of the subject. The stuff in between { and } is what is asserted about the identified subject (by the issuer), and hence could very well be called assertion. Then, only when the issuer signs it does it become something to which the issuer attests, making the assertion(s) stronger.

@burnburn
Contributor

burnburn commented Oct 26, 2018

From TPAC 2018:

  • Document that Claim uses an @graph container
  • Go through and find inconsistencies in examples

@msporny
Member Author

msporny commented Oct 28, 2018

@jandrieu @David-Chadwick Can you check the PR directly above this comment to see if it addresses both of your concerns? We may add an example to note the sugar... but I didn't do that yet because I didn't know if the current text would be good enough.

@jandrieu
Contributor

It doesn't look like the implicit @graph is explained. This was the heart of the confusion for most of us discussing this at TPAC.

If you don't explain how the @graph is an implicit property of the claim object--and show examples of how you can use it explicitly to put additional kinds of statements in the "claim" value, which are NOT about the subject--then there's a huge gap for those trying to understand the boundaries and flexibility of the data model.

@David-Chadwick
Contributor

I think it might be useful to draw an RDF graph of a VC. My opinion now is that there is no such thing as a claim inside a VC. Rather, an unsigned VC (or a simple credential) is actually a claim. I would say that the RDF for a credential or a claim is:

Claim is about subject
Claim is claimed by issuer
Subject has property e.g. subject is aged 21, subject was born on date, subject holds passport etc.

On the other hand, a verifiable credential is a different object than a credential (or claim) and we should not confuse the two. The RDF for this is:
VC is about subject
VC is issued by issuer
VC is verified by proof
VC expires on date
Subject has property e.g. subject is aged 21, subject was born on date, subject holds passport etc.

In this way there is no need for the @graph concept.

@TallTed
Member

TallTed commented Oct 30, 2018

I appreciate the above effort, but I think that drawing a graph requires graphical imagery, not simple text. I think that drawing such a graph would indeed help clarify things, and hopefully clarify both the differences and the similarities between verifiable and unverifiable claims/credentials/whatever-we-call-them.

To my mind, the differences between verifiable and unverifiable are all about verification/proof -- which I still believe is all about verification that the Issuer did indeed Issue the claims/credentials/whatever-we-call-them -- but the above appears (I think unintentionally) to me to be about "proof of the assertion within the credential" rather than "proof that the credential came from the issuer who made the assertions therein".

Sticking with the text stack, but putting it more into pseudo-RDF form, I'd say it looks something like this --

            Assertion  about               Subject ;
                       madeby              Issuer ;
                       says                [ Subject attribute value ] .
OPTIONAL  { Assertion  expires             date }

            UnVC       contains            Assertion ;
                       unprovablyIssuedBy  Issuer.
OPTIONAL  { UnVC       expires             date }

            VC         contains            Assertion ;
                       provablyIssuedBy    Issuer.
OPTIONAL  { VC         expires             date }

In other words -- the only difference between a VC and an UnVC is whether or not the Issuance is Provable.

@David-Chadwick
Contributor

This is perhaps something we should have done a long time ago, because there are potentially several different ways in which we can model a VC and a claim. I believe that having an RDF diagram does help our understanding, but of course this discussion medium does not facilitate the diagrammatic way of representing RDF. So unfortunately we will have to stick with text.

If you refer to Figure 4 in the data model document you will see the diagram for
"Pat knows Sam
Sam has title Professor"

This is the model I have used when I said
"Claim is about subject
Subject has property"
This could be collapsed into "Claim says Subject has property". The diagram will look the same in both cases. (This is why diagrams help.)

In contrast your model appears to have
"Claim is about subject
Claim says subject has property" (if I have interpreted Subject attribute value correctly)

Thus you appear to have introduced two different subject nodes into the RDF graph. Or alternatively, you have put two different arcs between the claim and subject. You do not need your first statement "Claim is about subject" as this is implied in your second statement.

Secondly "UnVC contains Assertion" is unnecessary. An assertion is the same as an unsigned credential. All the properties that can be attributed to one can be attributed to the other. If you can see a difference between the two, could you please tell me? I think this similarity is adequately illustrated because you say "Assertion made by Issuer" and "UnVC unprovably issued by Issuer". This is duplication of the same fact.

@TallTed
Member

TallTed commented Nov 1, 2018

A little refinement of my earlier text sketch --

            Assertion   about       Subject ;
                        madeby      Asserter ;
                        says        [ Subject attribute value ] .
OPTIONAL  { Assertion   expires     date } .

            Credential  contains    Assertion ;
                        issuedBy    Issuer .
OPTIONAL  { Credential  verifiable  Verifiability } .
OPTIONAL  { Credential  expires     date }

I can have an Unverifiable Credential, which contains Unverifiable Assertions. If you cannot tell that the laminated "license" I present to you was produced and issued by the relevant Department of Motor Vehicles (or whatever) of the jurisdiction named thereon -- it's still a Credential, containing Assertions. None of those Assertions are the same as the Unverifiable Credential.
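
The refined sketch above can also be written down as plain data types, which makes the "a Credential contains Assertions but is not itself an Assertion" point checkable. This is only an illustration in Python: the class and field names mirror the pseudo-RDF, while the string `"placeholder-proof"` and the example identifiers are stand-ins, not real proof material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    about: str                      # Subject
    made_by: str                    # Asserter
    says: tuple                     # (attribute, value) about the Subject
    expires: Optional[str] = None   # OPTIONAL


@dataclass(frozen=True)
class Credential:
    contains: tuple                     # one or more Assertion objects
    issued_by: str                      # Issuer
    verifiable: Optional[str] = None    # OPTIONAL Verifiability
    expires: Optional[str] = None       # OPTIONAL

    def is_verifiable(self) -> bool:
        # The only difference between the two kinds of credential.
        return self.verifiable is not None


dmv = "https://dmv.example.gov/issuers/14"
age = Assertion(
    about="did:example:ebfeb1f712ebc6f1c276e12ec21",
    made_by=dmv,
    says=("ageOver", 21),
)

laminated_license = Credential(contains=(age,), issued_by=dmv)
signed_license = Credential(
    contains=(age,), issued_by=dmv, verifiable="placeholder-proof"
)

print(laminated_license.is_verifiable(), signed_license.is_verifiable())
```

Both credentials carry the same assertion; they differ only in whether anything is attached that lets a recipient check the issuance.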

I think one rather big wall we're hitting is that figures 2, 3, and 4 --

[Images: Figure 2, Figure 3, Figure 4]

-- get quite (usefully!) atomic, while figures 5 and 6 --

[Images: Figure 5, Figure 6]

-- are just colored-box handwaves -- and these are the core of everything we're trying (and, I'm afraid, failing) to MODEL.

(While yes, github doesn't directly support diagrams, we cannot limit ourselves to text for this discussion. There are numerous graphical tools out there, which can be used to sketch things as we go, and to produce drawings like those already in the document. I often use OmniGraffle v5; there are numerous alternatives. Something was used to create the diagrams in the current document; I would suggest that the same tool might be well employed here.)

@msporny
Member Author

msporny commented Nov 1, 2018

> Something was used to create the diagrams in the current document; I would suggest that the same tool might be well employed here.

That tool was Google Draw -- with export to SVG, which is the file type used in the spec.

@David-Chadwick
Contributor

@TallTed your message shows quite clearly to me how RDF diagrams help. Would you like to draw diagrams for your textual RDF, and then compare it to my diagram, which is fig 4 with the following mappings
Pat -> Claim
knows -> says
Sam -> Subject
jobTitle -> has
Professor -> Property
Ignore ageOver 21 as they are not needed.

@David-Chadwick
Contributor

Actually you could add the following mapping
ageOver -> issuedBy
21 -> Issuer

@msporny
Member Author

msporny commented Nov 12, 2018

Here is a summary of our problems at present (as well as other issues that resolving this issue may address):

  • claim points to a @graph, which many people in the group were not aware of (because we hid it using the JSON-LD Context to try and make it easier to understand).
  • Renaming to subject would follow the same pattern we use elsewhere in the specification (e.g. evidence, issuer, termsOfUse, etc.)... claim feels strange in that respect.
  • Removing claim would be strange because the spec is about making claims and nothing in the data model would contain the word claim if we were to change it.

After much debate, mostly with @dlongley last week in the DB office, I think we've come to the following proposal for resolving this item:

  • Remove @graph container from that term, making claim no longer a graph container (we believe we can do this safely now).
  • Rename claim to subject (or something more specific if people protest ... xyzSubject).

If we do this, I think it resolves the many issues swimming around this particular item AND simplifies the data model and our use of advanced JSON-LD features such that developers will be far less likely to shoot themselves in the foot when making claims.
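
The two steps of the proposal can be pictured as a change to a JSON-LD term definition. The sketch below is in Python only so the difference can be checked mechanically. The term IRIs `cred:claim` and `cred:subject` and the helper `is_graph_container` are assumptions made for illustration; they are not copied from the group's actual context file.

```python
# Hypothetical "before": the term is a graph container, so its value
# is treated as a separate (hidden) graph of statements.
before = {
    "claim": {"@id": "cred:claim", "@type": "@id", "@container": "@graph"},
}

# Hypothetical "after": renamed, and no longer a graph container, so the
# statements become part of the credential's own graph.
after = {
    "subject": {"@id": "cred:subject", "@type": "@id"},
}


def is_graph_container(term_definition: dict) -> bool:
    """True if the term definition lists "@graph" in its "@container"."""
    container = term_definition.get("@container", [])
    if isinstance(container, str):
        container = [container]  # "@container" may be a string or an array
    return "@graph" in container


print(is_graph_container(before["claim"]))
print(is_graph_container(after["subject"]))
```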

@David-Chadwick
Contributor

Fantastic. How many +1s do you want?

@msporny
Member Author

msporny commented Nov 12, 2018

> Fantastic. How many +1s do you want?

Well, you only get the one. :p

We will need to raise this on the call tomorrow to see if there are any objections and then I can do the PR for it.

@jandrieu
Contributor

I hate to be the negative nelly, but what happens when there are multiple subjects? You just use [] and have each set of statements within each element in the array have a separate subject that must be independently reified within each?
"subject": [ { "id": "abc", "name": "Joe" }, { "id": "xyz", "name": "David" } ]
I guess that works, actually. Is that what's intended?

@msporny
Member Author

msporny commented Nov 12, 2018

> I guess that works, actually. Is that what's intended?

Yep, what you wrote is the most common way that you would achieve that use case (two subjects with no relation to one another).
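
For consumers, one consequence is that `subject` may hold either a single object or an array of objects. A minimal Python sketch of handling both shapes follows; the helper name `subjects` is hypothetical, and the identifiers are the placeholder values from the question above.

```python
def subjects(credential: dict) -> list[dict]:
    """Return the credential's subjects as a list, whether one or many were given."""
    value = credential.get("subject", [])
    return value if isinstance(value, list) else [value]


single = {"subject": {"id": "abc", "name": "Joe"}}
multiple = {
    "subject": [
        {"id": "abc", "name": "Joe"},
        {"id": "xyz", "name": "David"},
    ]
}

for credential in (single, multiple):
    print([subject["id"] for subject in subjects(credential)])
```

Each element of the array carries its own `id`, so the statements in one element say nothing about the subject of another.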

@msporny
Member Author

msporny commented Nov 12, 2018

re: "subject"

I've been anti-"let's call it subject" for a long time now because:

  1. All nodes in the graph can be a subject.
  2. claim really points to a hidden graph, not the subject.
  3. "subject" is a common word that someone else might want to use and that will definitely cause problems if overridden via a JSON-LD Context.
  4. Credentials make "claim"s, so what does a credential do if it doesn't do that?

... but, changing a few things at the same time caused the following to happen:

  1. Yes, everything is a subject, but in context... this is credential.subject -- or the "subject of the credential", which feels fairly easy to talk about. It's certainly more straightforward than explaining what "claim" is and why "claim.id" is not the identifier for the claim.
  2. If it's no longer a hidden graph, and is just a part of the main graph, nothing bad happens and it cuts away at the "hidden graph" argument.
  3. The word "subject" is not reserved in schema.org. If it was common, it would've been reserved at this point. We avoid "subject" being overridden using sealed/frozen contexts, which are going to be in JSON-LD 1.1, so this danger is no longer there.
  4. We can still include the word "claim" in the prose.

I do think that RDF folks are going to flip out over the use of subject and we may have to back off to credentialSubject.

@dlongley
Contributor

dlongley commented Nov 12, 2018

> If it's no longer a hidden graph, and is just a part of the main graph, nothing bad happens and it cuts away at the "hidden graph" argument.

Note that I'm pro using subject now because we're using a "hidden graph" for proof and a "hidden graph" for verifiableCredential when doing Verifiable Presentations (both of which we should still talk about in the advanced concepts section). This will ensure the integrity and boundaries we need for trusting what was said by whom. We also don't have the same "weirdness" of the property name not matching what it points to for either of those... which for claim was made harder to understand.

We will lose the ability to encapsulate the "claims" made from other meta data about a credential (which is all signed by the issuer anyway), but I think this is an acceptable compromise.

@TallTed
Member

TallTed commented Nov 12, 2018

> I do think that RDF folks are going to flip out over the use of subject and we may have to back off to credentialSubject.

Well... Speaking as one of the more RDF-oriented, I wouldn't say I'll flip out, but I would continue to argue strongly against using the ridiculously overloaded unqualified subject. I am much more in favor of the qualified credentialSubject which is almost perfectly unloaded (8 whole results found by Google!).

@msporny
Member Author

msporny commented Nov 15, 2018

PR is in -- #277 -- please review.

@TallTed What if we do subject in the JSON, but that maps to credentialSubject in the RDF? So, this:

{
  "id": "http://dmv.example.gov/credentials/3732",   -- this is the credential id
  "subject": {
    "id": "did:example:ebfeb1f712ebc6f1c276e12ec21", -- this is the subject id
    "ageOver": 21  -- this is the subject's attribute
  },
}

translates to this:

<http://dmv.example.gov/credentials/3732> <https://w3.org/2018/credentials#credentialSubject> <did:example:ebfeb1f712ebc6f1c276e12ec21> .
<did:example:ebfeb1f712ebc6f1c276e12ec21> <https://example.com/example#ageOver> "21"^^xsd:integer .

Note that subject -> https://w3.org/2018/credentials#credentialSubject
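
The mapping can be sketched by hand in Python, without a JSON-LD processor. The `TERMS` table stands in for what a JSON-LD context would declare and uses the two IRIs from the example above; the function name `to_ntriples` is an assumption, and the sketch only handles integer-valued properties and writes the datatype IRI out in full.

```python
# Stand-in for a JSON-LD context: short JSON term -> full property IRI.
TERMS = {
    "subject": "https://w3.org/2018/credentials#credentialSubject",
    "ageOver": "https://example.com/example#ageOver",
}
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def to_ntriples(credential: dict) -> list[str]:
    """Emit one triple linking credential to subject, then one per subject property."""
    subject = credential["subject"]
    lines = [f'<{credential["id"]}> <{TERMS["subject"]}> <{subject["id"]}> .']
    for prop, value in subject.items():
        if prop == "id":
            continue  # "id" names the node; it is not a property of it
        # Simplification: every value is written as an xsd:integer literal.
        lines.append(
            f'<{subject["id"]}> <{TERMS[prop]}> "{value}"^^<{XSD_INTEGER}> .'
        )
    return lines


credential = {
    "id": "http://dmv.example.gov/credentials/3732",
    "subject": {
        "id": "did:example:ebfeb1f712ebc6f1c276e12ec21",
        "ageOver": 21,
    },
}

print("\n".join(to_ntriples(credential)))
```

The short name `subject` appears only in the JSON; the RDF side sees the longer `credentialSubject` IRI.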

How does that make you feel?

@TallTed
Member

TallTed commented Nov 15, 2018

@msporny -

I think using subject in one place/serialization and credentialSubject in another, with the same semantic meaning, will inevitably cause trouble, especially as it seems these would not be true aliases but carry different (if not no) meaning in the other language. Looking forward, what would happen if someone put credentialSubject (instead of subject) into their JSON? Or if they put subject into their RDF expression of that certificate?

What's your goal with this thought?

@TallTed
Member

TallTed commented Nov 16, 2018

@msporny - I don't know whether the following is better put here, or on #277...

I'm with @dlongley on preferring consistency (so credentialIssuer would be preferred over issuer, given we're going with credentialSubject which I think is far preferred over subject). If literal-length is more important than clarity (I don't think this), I would suggest credIssuer and credSubject, again because these are not likely to be misinterpreted as issuer or subject of something else.

That said, regretfully, I have to say that having read over this change several times in context, it's still problematic. (I should say "partial context," as Sections 1 thru 4 will need careful review and many changes to ensure that they match these revisions of Sections 5 thru 10. The contradicting definitions and such in those earlier sections are part of why I am finding evaluation of the later sections so challenging.) This may well be dismissed as "bikeshedding", but I do not think such dismissal is appropriate.

We're still working with textual "illustrations" of everything, whether in English prose or JSON[-LD] or otherwise, and I think that these textual representations are still failing to communicate clearly, and that we are going in circles because of that lack of clarity. I also think there's a significant blur of data and metadata in this textually "illustrated" model.

I am certain that I am not clearly understanding what was meant by the writer(s) in some areas, and I am fairly confident that some readers who think everything is perfectly clear would find that their mental images do not match up with the mental images held by either the writer(s) or by other readers who think everything is perfectly clear. Indeed, my "understanding" has changed multiple times with repeated readings.

I think things would get somewhat clearer if we used more real-world credential examples, such as "passport" or "driver's license", rather than making up a "ProofOfAgeCredential" or whatever, even if the purpose of this presentation is to verify the holder's age. (I'm not aware of any real-world "ProofOfAgeCredential", though I am aware of many real-world IDs which include a "Date Of Birth" and which are therefore used for ProofOfAge.)

@msporny
Member Author

msporny commented Dec 4, 2018

This was resolved in PR #277.
