Does DID Document metadata belong in the Document? #65

dmitrizagidulin · 2019-10-09T13:00:54Z

Does metadata about the DID Document (such as when it was created, updated, or who it was signed by) belong in that DID Document?

Note that this question is not about a) the metadata for the subject of the DID (keys, service endpoints) or b) the metadata about the resolution of a particular DID Document (proof added by a resolver, caching data, what servers/nodes were used for resolution) -- that belongs either in the Resolver metadata or Method metadata sections.

So far, there have been arguments both for and against placing this metadata in the DID Document itself (vs outside of it, say in the Resolver metadata sections).

A) This metadata is already in the registry

A - against: Since much of this metadata (specifically, the created and updated timestamps and the proof which includes authorship metadata and document integrity protection) will also likely reside in the underlying DID registry mechanism (distributed ledger, etc), a Resolver should be able to figure out this data from the registry, and include it in the resolution metadata.

A - for: In many (most?) cases, these are two separate sets of metadata - one about the document itself, and one about the underlying registry mechanism.

Also: The DID Document should be self-contained, in terms of critical metadata, in case it is archived or otherwise separated from its underlying ledger or storage medium.
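Here is a minimal Python sketch of the "A - against" position. It assumes a registry that records when a DID was first and last anchored. The `RegistryRecord` type, its field names, and the `didDocument`/`resolutionMetadata` keys are hypothetical illustrations, not taken from any DID method or resolver spec. It only shows that a resolver can report created/updated alongside the document without writing them into it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RegistryRecord:
    """Hypothetical registry entry for one DID (field names are assumptions)."""
    did: str
    document: dict            # the DID document as stored/anchored
    first_anchored: datetime  # when the DID was first written to the registry
    last_anchored: datetime   # when the most recent update was written


def resolve(record: RegistryRecord) -> dict:
    """Return the DID document unchanged, with created/updated derived from
    the registry and reported as resolution metadata instead."""
    return {
        "didDocument": record.document,
        "resolutionMetadata": {
            "created": record.first_anchored.isoformat(),
            "updated": record.last_anchored.isoformat(),
        },
    }


record = RegistryRecord(
    did="did:example:123",
    document={"id": "did:example:123", "authentication": ["did:example:123#key-1"]},
    first_anchored=datetime(2019, 1, 1, tzinfo=timezone.utc),
    last_anchored=datetime(2019, 6, 1, tzinfo=timezone.utc),
)
result = resolve(record)
```

The "A - for" reply still applies: a timestamp derived this way describes the registry write, which need not coincide with when the document itself was authored.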

B) Potential for developer confusion

B - against: If the DID Doc metadata (such as when the document was created) differs from the did registry metadata (when the document was registered on a ledger, for example), this may confuse developers.

B - for: @TallTed

You want to talk about "confused developers"? Check out "last accessed", "last modified", and "created", among other Unix-y timestamps attached to documents in Unix-y filesystems.

In other words, these two categories of metadata are separate, and developers constantly have to keep this difference in mind anyway.

C) Use cases

C - against: There are no use cases currently for this metadata. (Or, the use cases are unclear.)

C - for: There are use cases -- this topic is highly relevant to any DID registry using a mutable storage mechanism, such as the BTCR mutable extension documents or did:web method documents.

Also, as @peacekeeper points out:

Perhaps the strongest argument for a proof on a DID Document is to link DIDs to already existing PKI such as X.509 or the E.U.'s eIDAS infrastructure. You could include an eIDAS signature (this is called "eSignature" or "eSeal") on a DID Document to link the DID to a legal identity.

D) Offload this topic to DID method specific specs

D - against: Even if this metadata does belong in the DID Document, perhaps we should hand this off to each DID method to decide (rather than the main DID spec).

D - for: @ChristopherA

However, if there are any other DID methods that use mutable storage for DID Documents, they would need to solve the same problem we do, and they might do it in different ways, which could be good (for innovation) or bad (for security, if they don't understand it well, as our scenario is complicated).

In other words, this is going to be a common enough problem that we should address this in the main spec.

E) Conceptual elegance

E - against: @dlongley:

I want to point out that the way that we've avoided the HTTP-Range-14 argument (which we should absolutely continue to do) is by deciding that you can, for most practical purposes, conflate a DID Document and a DID subject (they have the same identifier). There's a danger that we may lose this simplicity by encouraging expressing information in a way that stretches the limits of that conflation.

E - for: ... an excellent point. Perhaps we can continue to benefit from this conceptual simplicity (of having the DID Doc be mostly about the DID subject) by making it clear via the attribute names that the metadata is about the doc, not the subject? Like, having the field be named docCreated instead of just created, to prevent ambiguity?
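To make the naming suggestion concrete, here is a small Python sketch. It assumes a hypothetical convention in which every document-level field carries a `doc` prefix followed by a capital letter (`docCreated`, `docUpdated`). Nothing in the spec defines this convention. The sketch only shows that such names let a consumer mechanically separate statements about the document from statements about the subject.

```python
# Hypothetical convention: document-level fields are named "doc" + CapitalizedName.
DOC_PREFIX = "doc"

did_document = {
    "id": "did:example:123",
    "authentication": ["did:example:123#key-1"],
    "docCreated": "2019-01-01T00:00:00Z",  # about the document, not the subject
    "docUpdated": "2019-06-01T00:00:00Z",
}


def is_document_field(name: str) -> bool:
    """True for names like 'docCreated'; False for 'id', 'authentication', 'document'."""
    rest = name[len(DOC_PREFIX):]
    return name.startswith(DOC_PREFIX) and rest[:1].isupper()


def split_metadata(doc: dict) -> tuple[dict, dict]:
    """Partition a DID document into (document-level fields, subject statements)."""
    about_doc = {k: v for k, v in doc.items() if is_document_field(k)}
    about_subject = {k: v for k, v in doc.items() if not is_document_field(k)}
    return about_doc, about_subject


about_doc, about_subject = split_metadata(did_document)
```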


dlongley · 2019-11-19T17:13:49Z

A DID Document is a graph of information. That information is primarily about the DID subject. If we want to make statements about the graph itself, those statements do not belong in that very graph. There may perhaps be exceptions we can make for things like proof that get special treatment, but we should otherwise avoid this. If the statement we want to make can be reasonably understood to apply to the DID subject then we can put it in the graph. This gives us wiggle room to avoid the http-range-14 problem.
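This can be read in terms of RDF datasets (the "quads" Joe Andrieu raises later in the call). The DID document's statements sit in a named graph, and anything said about that graph is stated outside it. The Python sketch below uses plain tuples rather than an RDF library. The graph name `_:didDoc`, the `created` predicate, and the shortened predicate names are assumptions for illustration.

```python
DID = "did:example:123"
DOC_GRAPH = "_:didDoc"  # hypothetical blank-node name for the DID document graph
DEFAULT_GRAPH = None    # None stands for the dataset's default graph

# Quads as (subject, predicate, object, graph). Predicates are shortened
# names, not the full IRIs a JSON-LD processor would produce.
quads = [
    (DID, "authentication", f"{DID}#key-1", DOC_GRAPH),
    (f"{DID}#key-1", "publicKeyPem", "....", DOC_GRAPH),
    # A statement *about* the DID document graph, kept out of that graph.
    (DOC_GRAPH, "created", "2019-01-01T00:00:00Z", DEFAULT_GRAPH),
]


def triples_in(graph, dataset):
    """Return the (s, p, o) triples asserted in one graph of the dataset."""
    return [(s, p, o) for s, p, o, g in dataset if g == graph]


did_doc_triples = triples_in(DOC_GRAPH, quads)
about_did_doc = [t for t in triples_in(DEFAULT_GRAPH, quads) if t[0] == DOC_GRAPH]
```

In this arrangement the DID document graph contains only statements whose subject is the DID or a DID URL, and the one statement about the graph lives outside it.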

iherman · 2019-11-19T17:21:53Z

This issue was discussed in a meeting.

No actions or resolutions


Brent Zundel: #65
Markus Sabadello: this issue is about the question of having data in the DID document some of which is about the subject and some of which is about the DID document itself
… there were ideas to maybe remove some properties like created or updated
… or proof property. That’s where this discussion started
… we thought created, updated, proof is it about the subject or the DID document itself
… dmitriz wrote a really good summary
… The question that we need to agree on is are we okay with data that is sometimes about the subject and sometimes about the document
… or do we want to separate that somehow
… I think there’s a third category which is data about the resolution process, some metadata may be added about that in the result
… but the primary question is are we fine having data about the DID subject like services and public keys
… as well as about the document like proof
… and if we’re not, do the subject and the document have separate identifiers?
… we spent a lot of time discussing that
… we felt we are comfortable with combining that, they don’t need separate identifiers
… the same identifier for the subject and the did document
Manu Sporny: #27
Manu Sporny: #28
Manu Sporny: I want to point out that we have two PRs pending, 27 and 28, when I put those PRs in there my assumption was that the created and updated were being used to describe metadata about the DID document
… but after putting it in I can see how people thought they were about the DID itself, the identifier
… this is really a metadata discussion, if created and updated are truly about the identifier itself and not metadata about the DID document then it’s fine to keep them in there
… but if it is metadata about the DID document I feel strongly we should take it out, we shouldn’t be conflating those two things
… we need to decide whether or not it’s okay to use the same identifier to kind of sort of refer to two different things
Ivan Herman: +1 to manu
Manu Sporny: that is a huge red flag in the linked data space, your semantics get really messy
… similarly you do not need to have an identifier for everything
… you can do autogenerated identifiers, that’s a common thing, we use it in VCs
… we could have metadata about the DID document that’s outside of the DID document itself, much cleaner separation
Daniel Burnett: The DID document is not the resource. It is an explicit representation of access mechanisms (to use the HTTP URI analogy)
Manu Sporny: if we come to that philosophy it’s much easier for us to determine if a particular item is in or outside of the DID document
… I thought the original issue was about metadata about the DID document, interested to see if anyone hears differently
Jonathan Holt: I thought these were for convenience, and if you wanted to find the original source of truth you spin up a resolver or your own node and verify the assertions being made in the DID document
Manu Sporny: I’m hearing Jonathan say “issued” and “created” are about the DID Document.
Jonathan Holt: my interpretation was they were self-asserted, related to creation of the DID document, and are there for convenience
… what markus mentioned for identifiers, the keys ed25519, hiding keys.. was that what you were talking about as far as the subject identifier? you have to have a self-asserted key identifier in the DID document that’s only about its own keys?
… or are we having this conceptual framework of referring to delegate keys or controller keys?
… what are the semantics we are working with?
Markus Sabadello: we’re not talking about identifiers for keys, we’re talking about whether the DID is an identifier for the subject, that’s where we ended up after a few months of httpRange-14, or is the DID the identifier for the document, or is it both?
… I think the community thinks it should be both
… but understand that’s ambiguous from a linked data perspective
Jonathan Holt: the subject is the identifier around the DID document, not a human subject?
Markus Sabadello: the DID subject is the person, org, thing, whatever, resource, identified by the DID
Daniel Burnett: “The DID subject is the subject of the DID.” <- Official definition :)
Dmitri Zagidulin: interpretation about created, updated, my summary in issue 65, I was interpreting them to be metadata about the DID document
… I’m not sure it makes sense to have metadata about the DID because it doesn’t apply separately to the DID document
Markus Sabadello: +1 to dmitriz that created, updated are metadata about the DID document
Dmitri Zagidulin: On the issue of does metadata about the document belong there. On which grounds is manu objecting?
… I laid out several arguments that I’ve seen you make in the various issues against it, which are relevant; how do you feel about the counterpoints?
Joe Andrieu: I’ve flipflopped on this issue
… one aha for me right now is the definitive way to find out if a given DID document is the correct DID document for a given DID is to execute the resolution process
Daniel Burnett: markus_sabadello, does created not apply to BTCR DIDs where DID documents are generated rather than stored?
Joe Andrieu: If that’s correct, there’s not necessarily a baked in way for a document to demonstrate on its own as a set of bytes that it’s the authoritative one, I think whatever metadata you need to verify the process needs to be in the DID document
… otherwise that separation feels a little false to me
Manu Sporny: there is a certain subset of things I’m strongly objecting to, and that’s the conflation of any kind of semantics
… it’s not clear to me what the group thinks “issued” and “attributed” mean yet
… one thing that might be helpful, there are two categories of information we are talking about
… information about the identifier itself, the DID string, and whatever it may identify
… and then information about the DID document itself
… those are two distinct categories that i think we should keep distinct
… if we conflate them there’s nasty stuff that can happen
… that’s where my concern comes from
… Let’s say that we say that updated is the time the identifier was updated. Semantically that’s meaningless. I know the identifier was updated but it doesn’t tell me anything more than that
… whereas if the DID document was updated, there’s a change the resolver can check, that’s about the document itself not the identifier. The semantics are very different
Dmitri Zagidulin: nobody is proposing that it would be about the identifier
Manu Sporny: i’m not convinced
… I think some people are and some people aren’t, and some people don’t understand what conflating those two things does to the entire data model
… You may not be proposing that and I think other people might be, we need to get down to the definition of what created and updated really means to people, and then see if those definitions are the problem
Dmitri Zagidulin: the topic of this issue is does metadata about the document belong in the document
… that’s a separate httpRange-14 discussion
… nobody is conflating, just discussing whether data about the document belongs in the document
Ted Thibodeau Jr: “How do we identify the identifier which identifies an entity?”
Dmitri Zagidulin: Having metadata about the DID document in the document allows portability
… it allows for standardizing of that metadata among mutable DID methods that don’t have underlying ledger mechanisms
Markus Sabadello: manu is saying sometimes we’re talking about metadata about the identifier, I don’t think that makes much sense
… it always identifies something, and the data is about that resource
… we can’t have data about the identifier, we can only have data about the thing being identified
… with data about the subject, data about the DID document
… I agree it’s better to separate them, even though conflating was the outcome of a few months of discussion of httpRange-14, makes more sense to keep separate
… agree with dmitriz that the metadata about the document inside the DID document is the issue. Inside the DID document we need a separate object or level of JSON-LD structure
… on one level describe the document, on another describe the subject
Dave Longley: when we’re talking about updating or applying an update to a DID document, e.g., adding a key, we’re really updating the subject
Daniel Burnett: yes, explicitly marking any meta data as such by placing it in a separate subtree in the DID doc would at least make clear that it is different
Dave Longley: The predicates in a DID document are things like authorization, which describe the subject (some aspect of a person or some thing); when you add a key you say this person authorizes this key for some purpose
… that’s the statement you’re making
… if you make that kind of update you’re updating information about the subject, not the document
… these update times that are metadata might actually be information about the subject not the DID document
Manu Sporny: I agree with dlongley, that’s the point I’m attempting to make.
Dave Longley: dmitri also brought up portability, we’re talking about porting information about the subject, not the document
… the information inside the DID document is about the DID subject. That’s what you’d want to port
… I think we disagree less than we think because a lot of these things we’re talking about are really just more information about the DID subject
… manu was talking about the identifier, I think he really meant information about the subject not the DID, we’re not changing DIDs, that doesn’t make any sense
… a lot of the disagreements go away because we’re not talking about metadata that happens to live on some registry somewhere, we’re talking about the subject
Joe Andrieu: manu, you conflated the identifier with the subject. A lot of people have been responding in confusion because of that. I don’t think anyone is talking about putting information about the subject in a DID, that would be a privacy antipattern
… we have a did that’s a string, we don’t need metadata about that
… The subject.. the DID document is how you get from the DID to secure interaction with that subject
… We need to be much more careful about the language we use here, it’s confusing us, going to be more confusing for others
… we have this weird issue of the definitive DID document is not a string of bytes anywhere, it’s the output of a resolution process
… to understand if it’s definitive, whatever metadata we use, needs to be part of the DID document
Daniel Burnett: I wanted to bump up a level here
… the mental model that led to where we are
… as long as we can keep that mental model we’ll be fine
… what joe said matches what manu said
… we wanted our use of DIDs as URIs to work similarly to the way other URIs work
… such as http URIs
… if you look at the definition that we always refer to of a URI there is a resolution process and a dereferencing process
… the resolution process is where you discover what the access method and operation methods with the resource are, including any kinds of authn approaches that are necessary
… we’re different from http - we put a lot of that information that is part of the resolution process in the DID document
… we’re getting confused by making the DID document something more magical than it is intended to be
… which is a representation about how you access and update the resource
… it’s not the access to the resource itself, it is the things you can do with the resource and how you can authenticate yourself for that
… That may help. We may still decide that there is information that is not about the resource itself but we still may put it inside the DID document
… joe is correct that conceptually the resource access methods all of this exists even for DID methods that do not explicitly store a representation of the DID document
… the DID document can be generated if necessary, and does not have to live at a location somewhere
Ivan Herman: from a linked data / semantic web point of view
… with JSON-LD for the did document, that means we define in a particular syntax a bunch of RDF triples and if I can imagine a linked data environment which includes lots of triples, includes the triples in the did document
… according to the JSON-LD and RDF, there are triples, and all what I see in the DID document. The triple consists of subject, predicate, object, and the subject is a DID URL
… that’s what happens in RDF
Manu Sporny: yep, to Ivan.
Ivan Herman: none of those triples have to say anything about the DID document itself because the DID document is just a collection of triples
Manu Sporny: exactly, Ivan.
Ivan Herman: if we want to say something about the DID document, we need another subject that identifies it, in order to play properly with the linked data world
… if you link it to any other process that wants to use these identifiers, we have to be careful because you will get wrong triples
… triples that say things you don’t want
… and someone may use those triples to deduce things that semweb technologies can deduce, you will get wrong statements, you cannot mix these two up
Brent Zundel: we have 9 mins left
Dmitri Zagidulin: to draw a parallel with the VC data model
… we had the same discussion about the created metadata, and there we have two separate sections, subgraphs
… one about the credential and the other about the credential subject
… we label it
… we standardized the created timestamp for the verifiable credential
Dave Longley: +1 to ivan, the DID Document is a graph/dataset with triples about the DID subject in it
Dmitri Zagidulin: this is the same thing that’s being proposed for the DID document
… we standardize it for the DID document, not to the person or org
… if we need to have a separate linked data section so that the triples don’t get confused, that’s fine, let’s talk about that
Ivan Herman: +1 to dmitriz
Dmitri Zagidulin: but I want to re-emphasize the need for storing the data about the document not the subject in the document itself
Joe Andrieu: the conversation isn’t about triples. it’s about quads. about statements about statements.
Dmitri Zagidulin: the counter that manu seems to be proposing is we let each did method standardize their own. That doesn’t seem right
Manu Sporny: dmitriz that is absolutely not what I’m suggesting
… I think there’s some miscommunication
… we need some concrete examples
Dmitri Zagidulin: let’s take ‘created’ as a concrete example.
Manu Sporny: The thing that you raised is spot on - in VC we had two subgraphs, one for the credential, the other for the credential subject
… this is the exact same thing
… the issue is that.. what we need is put some concrete examples and ways we could address this problem
… we can use created and issued as examples
… that would help people see how the philosophy applies to an actual concrete solution
… we only need two examples, there are two ways we can go
… that’s what we need for the next time we discuss this
Ivan Herman: +1 to manu, we need specific examples
Manu Sporny: people can see what’s being proposed
Joe Andrieu: +1 for specific examples
… The thing is we’re not talking about triples, we’re talking about quads
Kenneth Ebert: I like the examples, too
Joe Andrieu: I’m not familiar enough with JSON-LD spaghetti, methods for representing quads
Brent Zundel: +1 for examples
Joe Andrieu: we’re talking about the context in which the triples are stated
Dave Longley: {id: <did>, authentication: [<key>]} means "the DID subject (identified by <did>) has authorized <key> for the purpose of authentication" ... that's a statement about the DID subject
Joe Andrieu: we need to make statements about that context
… we need to be able to in the DID document say something about the DID document
… metadata about the resolution is part of proof
… why do we believe this? here’s some metadata about the process to increase your confidence that this is legitimate
… What needs to be in there we should figure out at the DID document level, not at the DID resolution level
Markus Sabadello: +1 to keep the triples/quads clean and separate. Strictly speaking we would need a separate identifier for the DID document
Daniel Burnett: +1 dlongley
Manu Sporny: you don’t need to give the DID Document a separate identifier… can be a blank node… works just fine.
Markus Sabadello: the problem with that which we’ve discussed before for a few months, if we give the DID document a separate identifier we ran into problems defining the dereferencing process with URLs, especially if the DID URL has a fragment
… the way you dereference a fragment is you first deref the primary resource, without the fragment. The result has a mime type and dereferencing the fragment depends on the mime type
Ivan Herman: +1 to markus_sabadello
Markus Sabadello: if it’s an identifier for the subject, we can’t dereference it because it’s a real world resource and doesn’t have a mime type
… I like what dmitri said, parallel with VC, separate sections about the document and the subject
Dave Longley: a DID Document itself is much more ephemeral – you generally don’t “talk about it”, except perhaps to make statements in a resolution process
Brent Zundel: we had a recommendation to present real world examples so we can have something more concrete to discuss about
… The issue, 65 is assigned to markus
Manu Sporny: {resolution_things… didDocument: {did document things}}
Brent Zundel: markus, comfortable working to arrange some concrete examples?
Markus Sabadello: I can come up with some examples
Manu Sporny: {metadata_about_did_document… didDocument: {did_document_stuff}}
Daniel Burnett: yes, dlongley, this is what I meant by giving a DID document more reality than it should have, which is a physical representation of resolution info
Dave Longley: I think it helps to think of the DID Document as a graph … for which we generally don’t give an identifier
Ted Thibodeau Jr: DID document … is { .ttl owl:sameAs .jsonld owl:sameAs .rdfxml }? Can you speak of one serialization? Or only of all?
Ted Thibodeau Jr: It can be important to track when info about a subject was changed, as well as when the subject changed, as well as when the info about the subject was logged (which may be different from when it changes)…
Ted Thibodeau Jr: VERY complex!
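Manu's two inline sketches near the end of the call ({resolution_things… didDocument: {did document things}} and {metadata_about_did_document… didDocument: {did_document_stuff}}) describe an envelope. Metadata about the DID document sits beside it rather than inside it, much as the VC data model keeps the credential apart from the credential subject. A minimal Python sketch of that shape follows. Only the `didDocument` key comes from the transcript; the metadata field names are placeholders.

```python
DOC_KEY = "didDocument"  # key name taken from the sketches in the call


def wrap(did_document: dict, document_metadata: dict) -> dict:
    """Place metadata about the DID document next to it, not inside it."""
    if DOC_KEY in document_metadata:
        raise ValueError(f"metadata must not use the reserved key {DOC_KEY!r}")
    return {**document_metadata, DOC_KEY: did_document}


def unwrap(envelope: dict) -> tuple[dict, dict]:
    """Split an envelope back into (DID document, metadata about it)."""
    metadata = {k: v for k, v in envelope.items() if k != DOC_KEY}
    return envelope[DOC_KEY], metadata


envelope = wrap(
    {"id": "did:example:123", "authentication": ["did:example:123#key-1"]},
    {"created": "2019-01-01T00:00:00Z", "updated": "2019-06-01T00:00:00Z"},  # placeholder fields
)
doc, meta = unwrap(envelope)
```

A consumer that only cares about the subject can take `envelope["didDocument"]` and ignore the rest, which is the "much cleaner separation" Manu describes.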

jandrieu · 2019-11-19T17:24:38Z

@dlongley I believe that statement is fundamentally incorrect.

That information is primarily about the DID subject.

The DID document provides the information necessary to interact securely with a DID Subject. That's it. It is NOT about the DID subject. Yes, I can see how you could argue that how you interact with a Subject is indirectly and ultimately about the subject, but that is just going to get us in trouble. It's the wrong mental model. The defining line here is that the DID Document provides the information needed to interact securely with the Subject. If it isn't about interacting securely with the subject--potentially including meta-data about why we should believe the rest of the content is itself secure--then it doesn't belong in the DID Document.

Statements about Subjects don't belong in DID Documents.

If we don't toe that line, we are inviting a privacy nightmare with this work.

dlongley · 2019-11-19T17:43:37Z

@jandrieu,

The DID document provides the information necessary to interact securely with a DID Subject. That's it. It is NOT about the did subject. Yes, I can see how you could argue that how you interact with a Subject is indirectly and ultimately about the subject, but that is just going to get us in trouble.

Yes, this information is about the subject. That there are risks is not a reason to break the model, IMO.

It's the wrong mental model. The defining line here is that the DID Document provides the information needed to interact securely with the Subject. If it isn't about interacting securely with the subject--potentially including meta-data about why we should believe the rest of the content is itself secure--then it doesn't belong in the DID Document.

I think it would be confusing to create a new model here (both mentally and technically) -- i.e., "public information about a subject is not about the subject, but private information is". The issue isn't with whether or not the information is about the subject. It's about public, discoverable information vs. private information. What we need to do is provide clear guidance on what should be said where. This is no different from talking about people in general and I suspect moving away from that will only create more confusion. I think it is better to draw on what people already know about public vs. private to help avoid trouble rather than try to obscure it away with a special model.

Statements about Subjects don't belong in DID Documents. If we don't toe that line, we are inviting a privacy nightmare with this work.

Privacy is always going to be a consideration no matter what we do. We have to be clear and upfront about what kind of information should be in a DID Document that is publicly available or on a blockchain, for example. And, yes, no private information should ever be there.

iherman · 2019-11-19T17:52:43Z

@jandrieu @dlongley chiming in again with my Semantic Web hat on; maybe this is one of those cases when the RDF terminology does help. (It does help me, but I am biased by my background.)

If I look at the DID document, then I only see triples like

<did:example:123456789abcdefghi> authentication <did:example:123456789abcdefghi#key> .
<did:example:123456789abcdefghi#key> publicKeyPem "...." .
etc.

I.e., strictly speaking, we are making statements about the DID (URI). The RDF Semantics doesn't require anything more about the DID URI and what it "denotes" (in our case about the relationship between the DID URI and the DID Subject). It says:

IRI meanings may also be determined by other constraints external to the RDF semantics; when we wish to refer to such an externally defined naming relationship, we will use the word identify and its cognates.

(Emphasis is mine).

In other words: the only thing the DID document contains are statements about the DID as a URI, and any relationship between the DID and the DID subject is defined "outside" of the DID document. You guys tell me exactly where.

Does this help?
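Ivan's point can be illustrated mechanically. The Python sketch below flattens a simplified DID document into triples without a JSON-LD processor. Its assumptions: property names stay as bare strings rather than expanded IRIs, nested objects must carry an `id`, and string values are treated as plain objects. Every resulting subject is the DID or a DID URL built from it; nothing in the output has the DID document itself as subject.

```python
def to_triples(node: dict) -> list[tuple[str, str, str]]:
    """Flatten a simplified JSON DID document into (subject, predicate, object)."""
    subject = node["id"]
    triples = []
    for prop, value in node.items():
        if prop == "id":
            continue
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict):
                # Nested node: link to it, then describe it with its own id as subject.
                triples.append((subject, prop, item["id"]))
                triples.extend(to_triples(item))
            else:
                triples.append((subject, prop, item))
    return triples


DID = "did:example:123456789abcdefghi"
doc = {
    "id": DID,
    "authentication": [{"id": f"{DID}#key", "publicKeyPem": "...."}],
}
triples = to_triples(doc)
```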

jandrieu · 2019-11-19T18:51:55Z

@dlongley The distinction between "private" and "public" is a false dichotomy. I've been writing and speaking about this for years. http://blog.joeandrieu.com/2011/04/10/constellations-of-privacy/

MANY people have repeatedly argued that once a piece of information is public it is no longer private. This is grossly incorrect. It is also usually a bald-faced justification for the kinds of broken Big Data business models which have inspired many in this community to create a better alternative. Semantically, these terms are essentially meaningless. As such, it is incorrect scoping for determining what is or is not in the DID Document.

What goes in the document should ONLY be information that enables secure resolution of appropriate resources, within the meaning of RFC 3986 https://tools.ietf.org/html/rfc3986#page-28:

URI "resolution" is the process of determining an access mechanism and the appropriate parameters necessary to dereference a URI;

You wouldn't say that a DNS record is about the owner of the record. It's about how you turn that identifier into service endpoints. In the same way, what is in the DID Document is not about the Subject, it is about how you interact with the Subject securely. That is a very specific subset of information "about the Subject".
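Read this way, a consumer uses a DID document much as it uses a DNS lookup: to pick an access mechanism. Here is a minimal Python sketch. It assumes a `service` list whose entries carry `type` and `serviceEndpoint` fields. The `MessagingService` type and the URL are hypothetical.

```python
from typing import Optional


def access_mechanism(did_document: dict, service_type: str) -> Optional[str]:
    """Return the endpoint of the first service with the given type, or None.
    Under Joe's reading, this is the document's whole job: turning an
    identifier into a way to reach something, as a DNS lookup does."""
    for service in did_document.get("service", []):
        if service.get("type") == service_type:
            return service.get("serviceEndpoint")
    return None


did_document = {
    "id": "did:example:123",
    "service": [{
        "id": "did:example:123#msg",
        "type": "MessagingService",                      # hypothetical service type
        "serviceEndpoint": "https://example.com/inbox",  # hypothetical endpoint
    }],
}
endpoint = access_mechanism(did_document, "MessagingService")
```

As dlongley's reply in this thread points out, the endpoint string itself can still carry private information, so the shape of the lookup alone does not settle what may appear in the document.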

Asserting the broader statement will lead to inappropriate information included in DID Documents rather than expressing them through other secure or verifiable mechanisms, like VCs. This would directly undermine the separation of concerns that underlies the entire framework of VCs and DIDs and the idea of decentralized identity as we--as a community--have been working on for years.

If we don't make the distinction about what goes in a DID Document clearly, early, and consistently, we will be enabling massive global tracking systems such as that proposed by GADI http://didalliance.org/.

jandrieu · 2019-11-19T18:56:09Z

@iherman I think you have the gist of it, with one clarification. The statements are not about the DID-URI, but rather about how you use the DID. The distinction between DID-URIs and DIDs is an unfortunate one, but the DID Document can't know the full DID-URI that might be ultimately dereferenced. All the statements are relative to the DID.

This makes for some delicate nuance between a DID-URI (whose ABNF is in the spec) and a DID as a URI, both of which might be referred to as a DID URI.

dlongley · 2019-11-19T20:21:27Z

@jandrieu,

The distinction between "private" and "public" is a false dichotomy. I've been writing and speaking about this for years. http://blog.joeandrieu.com/2011/04/10/constellations-of-privacy/

In my view, this is in support of not drawing some artificial line at the data modeling layer between public and private. The data is about the subject -- the only question is about whether it is appropriate to express certain pieces of information in places where anyone can read them.

MANY people have repeatedly argued that once a piece of information is public it is no longer private. This is grossly incorrect. It is also usually a bald-faced justification for the kinds of broken Big Data business models which have inspired many in this community to create a better alternative. Semantically, these terms are essentially meaningless. As such, it is incorrect scoping for determining what is or is not in the DID Document.

I don't think the terms are meaningless -- though they can be tricky to pin down, and expectations can be violated. I think we'll find a similar problem with other approaches, too, as I mention below.

What goes in the document should ONLY be information that enables secure resolution of appropriate resources, within the meaning of RFC 3986 https://tools.ietf.org/html/rfc3986#page-28:
You wouldn't say that a DNS record is about the owner of the record. It's about how you turn that identifier into service endpoints. In the same way, what is in the DID Document is not about the Subject, it is about how you interact with the Subject securely. That is a very specific subset of information "about the Subject".

Yes, but you could say that "how you interact with the Subject did:123" is you "must call him by the name Joe Andrieu". Similarly, you could say "how you interact with Subject did:123" is you use endpoint "https://my-website.com/my-SSN/my-other-private-info/foo". Perhaps we'll end up debating the semantics of "secure" instead. Who knows? But I'm sure a nearly unbounded set of examples like this can be used to violate expectations here as well.

None of this changes (or should change) that we have a graph data model that expresses information about subjects. Again, this is a debate about what should be expressed and where. You may have argued that "private" and "public" are semantically meaningless, but they clearly get across some meaning, even in this conversation. I don't think the distinction "how you interact with the Subject securely" solves the problem you want it to solve. I also don't think we should shy away from terms that are more commonly understood; they get us closer to where we want to be and help establish the very expectations we worry may be violated.

Perhaps it would be simpler and better to talk about the information in a DID Document in terms of who can read the DID Document.

TallTed · 2019-11-19T21:25:25Z

"Subject" is causing trouble again, still, forever.

Also, a DID document may contain a representation of a graph -- but a DID document is not itself a graph!

We interact with entities (that may be humans, organizations, or otherwise).

Those entities may be identified by DIDs (but those entities are not DIDs). If identified by DIDs, those entities should be the subjects of DID documents. Those documents contain sentences describing the entities identified by the DIDs, and might also contain sentences describing the documents themselves (as they should in a Linked Data world). In that case, the documents should be identified with a different identifier from the one that identifies the entity (the DID) whose description is the purpose of the DID document.

jandrieu · 2019-11-20T06:03:18Z

@dlongley I'm not saying they are meaningless terms, I'm saying they aren't black & white. What is private in one context may not be in another. Privacy is innately contextual and the context in which a DID Document might be read is unknowable. In fact, ANY data might be considered private, depending on context. Therefore, private v public is an ineffective way to distinguish between what should be in a DID Document and what should not. There will absolutely be service endpoints that some would consider private, while others will bend over backwards to keep correlatable yet non-private pseudonyms out. It's up to the DID Controller whether or not to use service endpoints (or other data) that might be correlatable and thereby, in some context, be considered private. It's not up to us, in the specification to define, embed, and then police some abstract notion of what should be private and what should be public. That way lies madness.

@TallTed is right. In one lens, of course graphs are about subjects. That's how RDF works. I'm using Subject as the term is defined in VCs and in the spec: the entity referred to by the DID. It's unclear how you mean it.

If the defining criterion for what should and should not go in a DID Document is whether or not a statement is about a subject (RDF sense), then there is no meaningful distinction; ALL RDF statements are about a subject. Equally so, if the litmus test is whether or not the statement is about the Subject (in the VC and DID sense), that is equally meaningless AND invites putting inappropriate information in a DID Document.

If, instead, you build on the RFC3986 distinction about resolution, then the ONLY thing that should be in a DID Document are statements that enable secure interactions with the Subject, including, IMO, the provenance of the DID Document itself, because it tells you why you should believe any of those statements are "secure".

That's my litmus test. @dlongley, is there anything you want to put in a DID Document that doesn't pass that test?

The examples you gave made my point more than yours. It's trivial (and yet potentially useful) to put in information about secure interactions that violates some notion of privacy. That's why private is a horrible litmus test. In contrast, any information that isn't about secure interactions with the subject absolutely should not go in the DID Document.

Back to the point of this issue...

For ALL DIDs, the only way to know you have the authentic DID Document is to exercise DID resolution according to the DID's method. As such, any supporting meta-data for why you should believe that resolution returned a correct DID Document is provenance that, IMO, should be included in the DID Document itself. Data without provenance is meaningless; therefore, we should embed the provenance WITH the data.

You said

I don't think the distinction "how you interact with the Subject securely" solves the problem you want it to solve.

Could you unpack that? All I want it to solve is defining a litmus test of what should and should not go into a DID Document. The distinction I offer is actually a distinction. Your statements about subjects (or Subjects) provide no distinction whatsoever.

You also said

I also don't think we should shy away from terms that are more commonly understood

"Privacy" is one of the least understood terms in this industry. Talk to anyone who has been working on the problem professionally for more than a freshman year and they will tell you that regulators, legislators, developers, end-users, and entrepreneurs constantly put forth different notions of what privacy means to them. To some it means being left alone (Brandeis), to others it means agency (Gropper), and to still others it means avoiding PII leaks. There is no commonly accepted definition of what is "private". For a hot minute, Personally Identifiable Information (PII) was the red herring many thought would provide a functional way to manage privacy. It turned out that was a horrible way to try to discuss privacy, much less regulate it.

Public and private are not well defined terms. Period.

dlongley · 2019-11-20T15:21:40Z

@jandrieu,

... the ONLY thing that should be in a DID Document are statements that enable secure interactions with the Subject...

That's my litmus test. @dlongley, is there anything you want to put in a DID Document that doesn't pass that test?

I think the problem is with this test -- I suspect just about anything can be construed to meet its demand. Any piece of information about the subject could be understood to be required to have a secure interaction with the subject, depending on the context. The subject's cat's name? Well, on catville.com, that's key. I think this test is actually less useful than thinking about who can read the contents of the DID Document.

jandrieu · 2019-11-20T18:14:07Z

Exactly. So the requirements for catville are different than those for others. But let's take your offer and talk about who can read the contents of a DID Document.

To date, there are zero authorization mechanisms for who can read a DID Document. Are you proposing we add some?

Asking who can read a DID Document when deciding what goes into a DID Document per the specification is, IMO, almost as useless as asking who can read an HTML document to inform the HTML standard. Controlling access to DID Documents is not currently part of the DID specification.

For all of the use cases currently in the DID Use Case document, it is presumed that DID Documents are accessible to anyone who has the DID and access to the mechanisms of resolution per its method. Notable exceptions in the community discussion are contextual DIDs such as did:git and did:peer, where if you aren't a part of the context, you can't resolve the DID.

I expect adding authorization isn't what you mean. Some notion of baking authorization to read a DID Document into the DID Document would be a significant departure from current conversations.

So, from a specifications standpoint, we should assume that ANYONE might read any given DID Document. Which is why ONLY that information directly relevant to secure interactions with the subject should be included.

Putting your favorite cat, a street address, or an email address into a DID Document is an anti-pattern, UNLESS it, in fact, contributes to secure interactions with the Subject. Not that it might--that would lead us to potentially putting the entire data warehouse worth of PII in--but that it specifically DOES. A service endpoint of http://twitter.com/JoeAndrieu IS completely reasonable if that is how the controller chooses to present a channel for secure interaction. Arbitrary statements like "The Subject is known to the State of California as Joseph Andrieu" are NOT.

In fact, that service endpoint MUST NOT be interpreted as saying the Subject is the person who controls http://twitter.com/JoeAndrieu, but rather simply that http://twitter.com/JoeAndrieu is a means to interact with the Subject. That interaction may be understood to be posting @JoeAndrieu publicly--which is, in fact, interpreted by others as sort of a digital drop of messages never even intended for Joe Andrieu.

Can you unpack the insights you think we'd get by asking who gets to read a DID Document?

dlongley · 2019-11-21T16:48:16Z

@jandrieu,

To date, there are zero authorization mechanisms for who can read a DID Document. Are you proposing we add some?

Asking who can read a DID Document when deciding what goes into a DID Document per the specification is, IMO, almost as useless as asking who can read an HTML document to inform the HTML standard. Controlling access to DID Documents is not currently part of the DID specification.

No, I'm not suggesting we propose any. I'm suggesting that we're using an open world data model, and that whether or not something appears in a DID Document depends on a combination of what the DID controller wants to put there and what the DID method allows. These, in turn, should be governed, at the very least, by an understanding of who is able to read the DID Document.

If anyone can read the DID Document -- then only put information in the DID Document that you're ok with anyone reading. I don't think it has to be more complicated than that in terms of data visibility.

Beyond this, all we're doing in the spec is saying: if you're going to represent verification methods, controllers, services, etc., here's the interoperable way of doing that.

Side note: There are still discussions this group needs to have on GDPR-compliant "proxy/see also" services that can appear in DID Documents registered on blockchains. These services would direct people to more information about the DID subject, including additional service endpoints that may not be able to be written to the blockchain in a GDPR compliant way. This other graph of information could potentially require some authorization to get access to it ... which is one thing I was alluding to.

peacekeeper · 2019-11-25T18:11:01Z

I think I'm mostly with @dlongley in this thread. The RDF statements in the DID document are about the DID subject. The intention is that these statements contain only public information, and the primary motivation is that they will be used for secure interaction with the DID subject. I'm also supportive of the open world model, i.e. a DID document could contain arbitrary other statements, if the DID controller wants that and the DID method supports it. We had a long discussion about "hardening" (i.e. strongly constraining) DID documents about 2 years ago.

The DNS record analogy is partially useful when talking about resolution, but one difference is that a DID is an identifier for a real-world entity, whereas a domain name is not (an HTTP URI containing the domain name might be).

To get back to the original topic, if we want to make statements about the DID document itself, then as @TallTed has noted we would strictly speaking need a separate identifier, and we would therefore need to change the overall JSON-LD structure.

Example 1:

{
    "@context": "...",
    "type": "DidDocument",
    "created": "...",
    "updated": "...",
    "proof": [ ... ],
    "didSubject": {
        "id": "did:ex:1234",
        "authentication": [ ... ],
        "service": [ ... ]
    }
}

In this example, the identifier of the DID subject is did:ex:1234, and the DID document has a separate blank node identifier (it could also have its own IRI). There are a number of problems with this, such as the RFC 3986 rules for dereferencing DID URLs with fragments the way we've been using them (e.g. did:ex:1234#key-1).
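A minimal Python sketch of the fragment problem, under stated assumptions: resolver output is modeled as plain dicts (`FLAT_DOC` and `WRAPPED_DOC` are hypothetical), and `dereference` applies a naive, assumed rule that the top-level node must be identified by the DID before a fragment such as `#key-1` is looked up by its full `id`. This is not the DID Resolution algorithm, only an illustration of why nesting the subject breaks the current pattern.

```python
from urllib.parse import urldefrag

# Hypothetical documents for illustration; not real DID method output.
FLAT_DOC = {  # current shape: the top-level node is the DID subject
    "id": "did:ex:1234",
    "authentication": [{"id": "did:ex:1234#key-1", "type": "ExampleKey"}],
}

WRAPPED_DOC = {  # Example 1 shape: the top-level node is the document (a blank node)
    "type": "DidDocument",
    "created": "2019-11-26",
    "didSubject": {
        "id": "did:ex:1234",
        "authentication": [{"id": "did:ex:1234#key-1", "type": "ExampleKey"}],
    },
}


def find_node(node, target_id):
    """Depth-first search for a JSON object whose "id" equals target_id."""
    if isinstance(node, dict):
        if node.get("id") == target_id:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_node(child, target_id)
        if found is not None:
            return found
    return None


def dereference(did_url, document):
    """Naive rule (an assumption): the top-level node must be identified by the
    DID; a fragment then selects the node whose id is the full DID URL."""
    did, fragment = urldefrag(did_url)
    if document.get("id") != did:
        return None  # the retrieved object is not the resource the DID names
    return find_node(document, did_url) if fragment else document
```

With the flat shape, `dereference("did:ex:1234#key-1", FLAT_DOC)` returns the key object. With the Example 1 shape, the top-level node is the document rather than `did:ex:1234`, so the same call returns `None` unless the dereferencing rule is changed to look inside `didSubject`.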

Example 2:

{
    "@context": "...",
    "meta": {
        "id": "#meta",     // could be omitted to use a blank node identifier instead
        "created": "...",
        "updated": "...",
        "proof": [ ... ]
    },
    "id": "did:ex:1234",
    "authentication": [ ... ],
    "service": [ ... ]
}

Or similar, with several possible variations. I believe this has similar problems with regard to DID URL dereferencing as Example 1.

Or we just leave things the way they are (maybe prepending something to certain property names, as in "docCreated", as suggested by @dmitrizagidulin). This means that we would accept a certain "conflation" (aka "simplification") of identifiers for the DID subject and the DID document.

I believe we have had this conflation for a long time anyway, due to the two assumptions that 1. the DID identifies the DID subject, and 2. we want to use DID URLs such as did:ex:1234#keys-1. I believe if we wanted to be super correct about RDF semantics and URI dereferencing rules, we would have to drop one of these two assumptions; the implications would be quite significant.

iherman · 2019-11-26T08:06:27Z

Looking at the first pattern of @peacekeeper, with a little additional JSON-LD trick it can be turned into a semantically perfectly sound structure. I have turned example 1 into finished JSON-LD with an additional statement in the context:

{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    {
      "didSubject": "@graph"
    }
  ],
  "type": "DidDocument",
  "created": "2019-11-26",
  "didSubject": {
    "id": "did:ex:1234",
    "authentication": [
      "did:example:123456789abcdefghi#keys-1",
      {
        "id": "did:example:123456789abcdefghi#keys-2",
        "controller": "did:example:123456789abcdefghi",
        "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
      }
    ]
  }
}

This translates into the following set of TriG statements:

_:b0 
     dcterms:created "2019-11-26"^^<xsd:dateTime> ;
     a <https://json-ld.org/playground/DidDocument> .

_:b0 {
    <did:ex:1234> did:authenticationMethod
         <did:example:123456789abcdefghi#keys-1> , 
         <did:example:123456789abcdefghi#keys-2> .
    <did:example:123456789abcdefghi#keys-2> 
        did:controller <did:example:123456789abcdefghi> ;
        did:publicKeyBase58 "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV" .   
}

(see JSON-LD Playground to experiment with this further.)
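To show which statements land where, here is a toy Python sketch that mimics the partition produced by the `"didSubject": "@graph"` alias. It is not a JSON-LD processor: property names stay as bare JSON keys rather than expanded IRIs, every nested object is assumed to carry an `id`, and the blank-node label `_:b0` is simply passed in. Outer properties become statements about the document node in the default graph; everything under `didSubject` becomes statements in a named graph labelled by that same node, as in the TriG output above.

```python
# Toy partition of statements, mimicking the "didSubject": "@graph" alias.
# Not a JSON-LD processor: keys are not expanded to IRIs, and every nested
# object is assumed to have an "id".

EXAMPLE = {
    "@context": ["https://www.w3.org/ns/did/v1", {"didSubject": "@graph"}],
    "type": "DidDocument",
    "created": "2019-11-26",
    "didSubject": {
        "id": "did:ex:1234",
        "authentication": [
            "did:example:123456789abcdefghi#keys-1",
            {
                "id": "did:example:123456789abcdefghi#keys-2",
                "controller": "did:example:123456789abcdefghi",
                "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV",
            },
        ],
    },
}


def to_quads(doc, doc_node="_:b0"):
    """Return (subject, predicate, object, graph) tuples; graph None means default graph."""
    quads = []

    def emit(node, subject, graph):
        for key, value in node.items():
            if key in ("@context", "id", "didSubject"):
                continue
            for v in value if isinstance(value, list) else [value]:
                if isinstance(v, dict):
                    quads.append((subject, key, v["id"], graph))  # link to nested node
                    emit(v, v["id"], graph)  # describe the nested node in the same graph
                else:
                    quads.append((subject, key, v, graph))

    # Outer properties describe the document node itself, in the default graph.
    emit(doc, doc_node, None)
    # Properties under didSubject go into a named graph labelled by the document node.
    if "didSubject" in doc:
        inner = doc["didSubject"]
        emit(inner, inner["id"], doc_node)
    return quads
```

The design point is that `created` is attached to `_:b0` (the document) while `authentication` is attached to `did:ex:1234` (the subject), so the two are never conflated.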

I have not looked at example 2 but, at first glance, that seems semantically a bit less clear.

msporny · 2019-11-26T22:08:31Z

Thanks to @peacekeeper for the examples, building on what he has said above.

Example 1 is sort of how we dealt with this topic with Verifiable Credentials.
Example 2 is sort of how we dealt with this topic with the proof property.

Both are valid ways of expressing metadata about information, but here's the real issue:

We made a mistake by calling something a "DID Document". There is no such thing. There is a DID, that identifies a resource, and when you dereference it, you get a representation of that resource. It's information at that point in time... and that's all it is... and calling it a DID Document is confusing people.

There is information, and metadata about information.

Sometimes you serialize that information, and some people call that serialization "a document"... but it isn't. It isn't a unique parchment of which there is only one copy in the entire universe. It's this ephemeral thing, and sometimes you need to say things about that ephemeral thing.

We got this right with Verifiable Credentials. The outermost thing was metadata about the information (metadata about the credential), and the innermost thing was the information itself (the subject(s) of the credential).

I really worry about both Examples, I think they're both wrong.

Example 1 is wrong because it breaks all blockchain-based mechanisms. Submitting Example 1 to Veres One would mean that the DID subject would be setting the created and updated dates, and they have no right to do that. It's the consensus algorithm that decides when entries in the ledger are created and updated.

Example 2 is wrong for the same reason. The DID subject has no right to set the created/updated dates except in the fringe case where they actually control that information (like for did:web).

So, I think the correct solution is this (Example 3):

{
    "@context": "...",
    "type": "DidResolutionResponse",
    "created": "...", // when the DID Resolution was created
    "didCreated": "...", // when the DID identifier was created
    "didUpdated": "...", // when the DID Document was updated
    "didSubject": { // this is what we traditionally call the DID Document
        "id": "did:ex:1234",
        "authentication": [ ... ],
        "service": [ ... ]
    },
    "proof": [ ... ] // proof from the resolver
}

The proposal above (Example 3) is nuanced in its difference from Example 1. It works for did:web and did:v1/did:btcr/did:ethr where Example 1 is very problematic in the latter use cases. Here's how it could work: the did:web Method would state that any file written to a web server MUST be a DID Resolution response. This means that a resolver will hit a did:web method and pull a raw resolution response (that contains a didSubject) from the web server. If a developer just wants the "DID Document", the resolver pulls the didSubject field out and gives it back to the developer. This creates the proper separation of concerns and doesn't require us to rearchitect what a DID Document is (and frankly, I don't think that would be the right thing to do at this stage anyway).
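A minimal Python sketch of that unwrapping step, assuming the did:web file holds JSON shaped like Example 3 (without the explanatory comments, which JSON does not allow). The function name `split_resolution_response` is hypothetical, and verifying the resolver's `proof` is deliberately left out.

```python
import json


def split_resolution_response(raw: str):
    """Split a serialized DidResolutionResponse (Example 3 shape) into
    (resolution metadata, did_subject). The did_subject is what has
    traditionally been called the DID Document."""
    response = json.loads(raw)
    if response.get("type") != "DidResolutionResponse":
        raise ValueError("not a DidResolutionResponse")
    subject = response.get("didSubject")
    if not isinstance(subject, dict) or "id" not in subject:
        raise ValueError("missing didSubject with an id")
    # Everything else (created, didCreated, didUpdated, proof, ...) is metadata.
    metadata = {k: v for k, v in response.items() if k not in ("@context", "didSubject")}
    return metadata, subject
```

Returning the metadata and the `didSubject` as separate values mirrors the separation of concerns described above: a caller that only wants the "DID Document" ignores the first value.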

We do have the authority in the Working Group to specify a data model for "where metadata about the DID Document should go". The trick is doing this w/o opening a massive can of worms that is DID Resolution. So, we have a few options going forward:

State that metadata about a DID Document is out of scope for the DID WG, and it should go in the DID Resolution spec. This one is easy and keeps our scope limited while providing an answer for the did:web folks.
State that the data model for specifying metadata about the DID Document is in scope, but the resolution protocol is out of scope. This one is a slippery slope.

jonnycrunch · 2019-11-26T23:28:55Z

this puts a lot of power in the resolver.

jonnycrunch · 2019-11-27T00:01:31Z

Also, just to highlight the self-sovereign cryptographic signature: I, as the author of the DID data, assert the time created and updated. That is the point, not that it is necessarily convenient for it to be there.

jandrieu · 2019-11-27T02:08:43Z

@jonnycrunch IMO, you shouldn't trust a resolver you aren't running any more than a bitcoin node you aren't running, for the same reasons. DLT-based resolution generally requires a full node under the hood. The point of meta-data about DID Document resolution is for a given resolver to provide some level of assurance (mechanism TBD and per-method) for making a trust decision about that result.

The kind of meta-data we are talking about could include just about anything, including identifying information about the resolver, so that one could rely on specific resolvers (either pseudonymous with some notion of reputation or bound to legal entities and their reputations). Another kind of "meta-data" could include the block height of the tip (for BTCR) or even a merkle bloom filter that could be used elsewhere to prove existence on chain of the root of the DID Document. I'm just speculating about these cryptographic assurances, but they are definitely part of the "stack" for deciding whether or not to rely on the result from a given resolver.
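The "prove existence on chain" idea can be illustrated with a generic Merkle inclusion proof. This Python sketch swaps in a plain binary SHA-256 Merkle tree (duplicating the last node on odd levels) rather than any specific ledger's construction or a bloom filter, so it shows only the shape of the assurance: a short list of sibling hashes lets a verifier check that a given DID Document leaf is committed to by a known root. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root of a binary Merkle tree over non-empty leaves; odd levels duplicate the last node."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes from leaf to root, each tagged with whether the sibling sits on the left."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1  # the other member of this pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify_inclusion(leaf, proof, root):
    """Recompute the path from the leaf; it must end at the trusted root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The verifier needs only the root (for example, one anchored on a ledger), the leaf, and the proof; it never needs the other leaves.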

msporny · 2019-11-27T02:23:26Z

The discussion on the WG call today was all over the place, and I think the root cause was because no one, including me, has defined what "created" means. At least six definitions popped up during the discussion today:

1. The time that the DID subject is asserting they created the DID.
2. The time that the resolver is asserting that the DID was created.
3. The time that the ledger consensus algorithm is asserting that the DID was created.
4. The time that the DID subject is asserting they created the DID Document.
5. The time that the resolver is asserting that the DID Document was created.
6. The time that the ledger consensus algorithm is asserting that the DID Document was created.

I think @dmitrizagidulin was talking about either 1 or 4, I was talking about 3 or 6, and I'm not sure which one @peacekeeper was talking about.

Let's go at this from the other direction and get very specific about the items being discussed. I don't think having the conversation in the abstract is helping us. Let's just focus on "created" and all make sure we're talking about the same definition before we start talking about where the data should be stored.
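One way to keep the six definitions straight is to notice that they are the cross product of who is asserting (three parties) and what was created (the DID or the DID Document). This Python sketch, with hypothetical enum names, enumerates them in the same order as the list above, so "1 or 4" are the subject-asserted pair and "3 or 6" the consensus-asserted pair.

```python
from enum import Enum
from itertools import product


class Asserter(Enum):  # who makes the claim (hypothetical names)
    DID_SUBJECT = "the DID subject"
    RESOLVER = "the resolver"
    LEDGER_CONSENSUS = "the ledger consensus algorithm"


class Target(Enum):  # what the claim is about
    DID = "the DID"
    DID_DOCUMENT = "the DID Document"


# Numbered 1..6 in the order of the list: all three asserters for the DID,
# then all three for the DID Document.
CREATED_DEFINITIONS = {
    n: (asserter, target)
    for n, (target, asserter) in enumerate(product(Target, Asserter), start=1)
}


def describe(n: int) -> str:
    asserter, target = CREATED_DEFINITIONS[n]
    return f"The time that {asserter.value} is asserting that {target.value} was created."
```

For instance, `describe(3)` yields the wording of definition 3 verbatim. Naming the (asserter, target) pair, rather than the bare word "created", is what removes the ambiguity.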

peacekeeper · 2019-11-27T02:42:03Z

We made a mistake by calling something a "DID Document". There is no such thing. There is a DID, that identifies a resource, and when you dereference it, you get a representation of that resource.

As an httpRange-14 nerd, I would say: What is that resource that the DID identifies? The DID subject, right? Well that's not an "information resource", therefore it has no representation that can be retrieved, has no media type, and there is no way to dereference fragments like did:ex:123#keys-1. From an RDF semantics perspective, we treat DIDs like identifiers for the DID subject, but from a URI dereferencing perspective, we treat DIDs like identifiers for the DID document.

I think this is the reason why originally we didn't really mind having properties like "created", "updated", "services", "authentication" side by side without distinction.

dmitrizagidulin · 2019-11-27T02:42:09Z

@msporny - The point I (and @peacekeeper) was trying to make is not that there are multiple definitions of 'created'. It's that there are multiple timestamps that need to be tracked. Which may include:

1. The time that the DID subject is asserting they created the DID Document. (And if that particular DID method uses a proof section in DID documents, this assertion will be signed by the creator.)
2. The time that the resolver has retrieved the document (currently tracked in the resolverMetadata.retrieved property of the resolution result).
3. The time that the registry (ledger or other mechanism) asserts that the DID was registered. (This is method-specific, and would go into the methodMetadata section of the resolution result.)

2 and 3 already have mechanisms in the (DID Resolution) data model. And what we're arguing is that item 1, the self-asserted creation date of the document, belongs in the DID document.

(We were not talking about the timestamp that the DID was created (as a separate entity from the DID Document), because it's not really possible to record or keep track of.)
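A sketch of where each of the three timestamps would be read from, assuming a resolution result shaped as `{"didDocument": ..., "resolverMetadata": ..., "methodMetadata": ...}` and the `docCreated` name proposed in this thread. The `registered` key inside `methodMetadata` is hypothetical, since that content is method-specific.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DidTimestamps:
    doc_created: Optional[str]  # 1. self-asserted, inside the DID document
    retrieved: Optional[str]    # 2. when the resolver retrieved the document
    registered: Optional[str]   # 3. registry's assertion (method-specific)


def collect_timestamps(result: dict) -> DidTimestamps:
    """Read the three timestamps from a resolution result (assumed shape)."""
    return DidTimestamps(
        doc_created=result.get("didDocument", {}).get("docCreated"),
        retrieved=result.get("resolverMetadata", {}).get("retrieved"),
        registered=result.get("methodMetadata", {}).get("registered"),  # hypothetical key
    )
```

Each value comes from a different place, which is the point being argued: only the first, self-asserted timestamp lives in the DID document itself.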

dmitrizagidulin · 2019-11-27T02:45:20Z

@msporny I agree with you, btw, that the current property, created, is ambiguous, and should be changed to something like docCreated, to indicate which of the timestamps it refers to.

peacekeeper · 2020-02-18T10:16:35Z

At the Amsterdam F2F meeting in January 2020, @gannan08 ran a session on this topic (see slides).

We then started a document to collect (meta-)data items related to DIDs and DID documents.

The next steps are:

Propose more items in that document (please everybody add items you think are missing!)
Then decide what "buckets" or "types" of (meta-)data we will have.
Then decide where they will go (e.g. DID document, DID resolution result, new to-be-invented data structure, etc.).

burnburn · 2020-03-03T16:42:05Z

Chairs set a 2 week deadline on the document from today after which we can move to the next step.

jricher · 2020-03-03T16:42:14Z


