DID Doc Encoding: Abstract Data Model in JSON #128
@SmithSamuelM Thanks for posting such an exhaustive case for an abstract data model. This was the issue I raised in #103, and I think this largely supersedes that thread. Although I originally proposed defining the abstract data model using a modeling language like UML, I am persuaded by your argument that doing it in a simple, universal encoding like JSON will make it more approachable to developers and thus better for adoption. I fully understand that this is ripping off the bandaid on the tension between JSON and JSON-LD for DID documents. Given how low DIDs and DID documents are in the trust infrastructure stack, I am heavily in favor of "the simplest thing that could possibly work"—above all because of the need for this layer to be as rock-solid as possible from a security standpoint. |
I am in favor of this proposal. While I recognize that JSON-LD provides some expressive power that ordinary JSON does not, I think the cost-vs-benefit for that expressive power is not a good tradeoff at the DID level. DID docs should be simple; they are a foundation that many things build on, and should not introduce onerous dependencies. Developers shouldn't have to learn JSON-LD to process DID docs. I think the case for the expressive power of semantic-web-style constructs like RDF/JSON-LD is stronger at the VC level than at the DID doc level. |
From a requirements perspective, the simplest necessary and sufficient representation should be preferred over any unnecessary but sufficient representation, especially if the latter is more complex than the former. This proposal does not forbid the latter but merely enables the former.
I'm very concerned that we don't have good examples of DID Methods that don't use JSON-LD at all. So people who don't understand JSON-LD just kind of hack around it, which leads to weakening JSON-LD... I'm happy to help clarify this issue. I think we need to provide some clear examples for how to use the DID Core spec without JSON-LD and with it, how to not muddy the waters, and how to improve the security understanding for either decision. To be clear, I'm actually a huge fan of JSON-LD, and intend to keep using it with... To actually address this issue proposal directly: I'm in favor of this proposal, and I would like to see JSON-Schema used to help provide better clarity on what is and isn't allowed.
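For illustration, a minimal JSON Schema along those lines might look like the following sketch. The property names mirror the current draft's vocabulary; the schema itself is hypothetical, not an agreed profile.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Hypothetical DID Document core profile (illustrative only)",
  "type": "object",
  "required": ["id"],
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^did:[a-z0-9]+:.+"
    },
    "publicKey": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["id", "type", "controller"],
        "properties": {
          "id": { "type": "string" },
          "type": { "type": "string" },
          "controller": { "type": "string" }
        }
      }
    },
    "service": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["id", "type", "serviceEndpoint"],
        "properties": {
          "id": { "type": "string" },
          "type": { "type": "string" },
          "serviceEndpoint": { "type": "string" }
        }
      }
    }
  },
  "additionalProperties": true
}
```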
My question is: do we intend to define a DID document exhaustively, i.e., will we define all keys (terms) that can be used in a DID document, or do we envisage that other actors (methods, applications, controllers, whatever) may add keys to a DID document that are not defined in this spec? The power of JSON-LD comes if we allow for the latter. On the other hand, if we want to define all possible keys a DID document may contain, then the advantages of using JSON-LD become questionable.
@iherman I don't believe that we need to exhaustively define all keys up front. JSON is an extensible, self-documenting data format that supports hierarchical mapping constructs. This makes it possible to discover extended content. The NoSQL database world is filled with examples of document-oriented databases where this is a standard practice. The RDF construct imbues a specific semantic that many find useful, especially if one is building a graph database, but a graph database is not necessary to provide extensibility, especially at the low level where DID Docs operate. Verifiable Credentials, on the other hand, are a different story. But my concern is that RDF has become a greedy paradigm that, at least for the DID spec, has resulted in unwarranted complexity and, due to its unfamiliarity, causes unproductive confusion. This proposal does not preclude a JSON-LD implementation; it merely facilitates a specification that does not have the RDF data model as a dependency, in order to better foster universal adoption.
+1 Exactly. I think this is the next step. In many previous attempts to do this we have become bogged down by the complications of the "right way" to do this in RDF as opposed to not using RDF as the mental model. IMHO given the primary purposes of a DID Doc outlined above, the cryptographic considerations are paramount. |
I think we absolutely intend to do the former. This is self-sovereign technology with an aim at decentralized extensibility. I disagree with the premise that there is a "lot of time being expended unproductively in community meetings" on this subject. I would also argue that we will spend significantly more time rewriting/reinventing the parts of the JSON-LD standard that we're using here to accomplish the same goals. Either that, or we will have to head in an entirely different direction, and start assuming we know everything about how things should work and close off innovation at the edges. In other words, while I think this proposal is well intentioned, I suspect, if we were to adopt it, the outcome would be a need to duplicate significant complexity into our own spec instead of relying upon the work others have already put in (and that has already been standardized). All of this would also come at the cost of interoperability. I don't think people realize all of the benefits we're getting from piggybacking on top of JSON-LD (e.g., SS/decentralized extensibility, generic data model that can be understood by tools (already) written once, ability to reference objects in the data model by ID using an existing standard, hash resolution rules, and more would come to light as we painfully discover what we've lost....). Taking any other approach will be necessarily closed world or a reinvention of the wheel. Furthermore, I think our spec already insufficiently expresses all of the things we're assuming work a certain way and we're working hard to improve this. To cut out the layers it depends on would only increase this burden as the benefits we assumed we had slip away. |
This is all siloed data that cannot be combined with anything else. This is exactly what we want to avoid and exactly why having a more generic data model that expresses relationships is useful for decentralized extensibility. |
There is much value in what has already been done. This need not and should not be discarded. The problem is that the full syntax and semantics of the RDF model are not replicable in other encodings, at least not without major effort. Consequently we want just the good stuff. The essential constructs that are both valuable and universally applicable. An abstract data model does this and what is proposed is that this abstract data model be expressed in JSON. It certainly can have the "right" semantics that may be essentially the same as JSON-LD without requiring all that JSON-LD requires. This makes it not siloed. Siloing is not the same as not using JSON-LD. Any standard representation with agreed upon syntax and semantics is not siloed. An extensible hierarchical mapping data construct is perfectly adequate for expressing interoperable semantics. The process of defining those semantics is important. This allows for extensibility over time. Attempting to canonicalize a universal data graph up front is a difficult if not impossible task and is one reason not to be drawn into an RDF approach. |
@iherman hits the vital point. If we want a DID Document to be extensible without namespace conflicts, we need JSON-LD (or its equivalent). If we want to define a concise and limited set of specific properties that define a DID Document, JSON alone is fine. There may be other JSON-LD features we'd lose (I seem to recall something about language-specific things like character order), but it is the extensibility that appears to be the most significant. One thing I keep seeing as a point of confusion from advocates of JSON is that UNLESS someone exercises extensibility, JSON-LD is JSON. So all of the tools and practices for a fixed-schema JSON work just fine with an un-extended JSON-LD serialization. As long as the context is unchanged and the JSON properties are of the constrained set, then you can treat JSON-LD as JSON. It is only when the document is extended that you need to evaluate the contexts. Which is exactly when JSON alone runs into trouble. That makes the real question the one I started with. Is extensibility important? @SmithSamuelM's last comment came in as I wrote this and I'm not sure how to interpret his comments on extensibility. No one is proposing a universal data graph up front. Certainly not the JSON-LD advocates. The point of advocacy is an open world data model where extensibility is afforded from the start. "The process of defining those semantics" sounds like you mean that a DID v2 could extend the specification. Yes, that's true, but you could only do so through testing non-compliant implementations unless you start out with an extensible serialization. It is that limited definition of properties implied by JSON only, that I believe @dlongley means by siloed. |
You can encode the RDF model in JSON (this is what JSON-LD is) -- and the argument here is to use JSON. JSON-LD is JSON. Could you provide a concrete example of the problem you're highlighting?
My reading of this is exactly what we want and already have... but it translates to: use JSON-LD and keep the core simple for JSON-only consumers. Someone treating JSON-LD as any other JSON (unless they want to use the extensibility features) shouldn't notice any difference. This is the same approach we took with VC with success.
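To make that concrete, here is a minimal, hypothetical DID document (the context URL and key values are illustrative). A plain-JSON consumer reads it as ordinary key/value data; a JSON-LD consumer additionally interprets the `@context` entry; deleting that one entry changes nothing for the JSON-only reader.

```json
{
  "@context": "https://www.w3.org/ns/did/v1",
  "id": "did:example:123456789abcdefghi",
  "publicKey": [
    {
      "id": "did:example:123456789abcdefghi#keys-1",
      "type": "Ed25519VerificationKey2018",
      "controller": "did:example:123456789abcdefghi",
      "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
    }
  ]
}
```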
There are libraries to do this and specs in the works for future standardization (Note: I don't think we say anywhere that you must do this anyway). I don't think this is a strong reason to avoid the approach, especially given the other benefits we get from it. But, again, I feel like we are already where we need to be with respect to getting extensibility from JSON-LD/RDF and simplicity from JSON. |
I see this more from a philosophical perspective than from a practical one. I don't think it's super hard to process JSON-LD if you only have plain JSON tools and knowledge, and vice versa I don't think JSON-LD provides that much extra needed functionality for DID documents that can't also be done with plain JSON. So in terms of how hard, or secure, or extensible it is, I think it doesn't matter that much. For me the main purpose of DIDs is to try and model digital identity in a way that approximates as much as possible how identity works in the physical world. This is why I'm a big fan of @SmithSamuelM 's KERI work, where the root of trust is entropy alone which is available to everyone without dependency on anything else. This also means that for me, the question of data format of the DID document is primarily about describing who you are in the digital world, and how to interact with you. This is also why it's important to talk about metadata about the DID subject vs metadata about the DID document (#65), about httpRange-14, and similar very theoretical topics. From this perspective, I believe a description of (the core of) my physical identity in the digital world can be more appropriately done with a semantic RDF graph model, than with a plain JSON object tree of keys and values. So I like JSON-LD DID documents better than plain JSON DID documents. I believe getting these conceptual foundations right is more important than mass adoption. I'm also in favor of describing the data model in an abstract way and then allowing different formats such as JSON-LD, plain JSON, CBOR, XML (#103). But the abstract description should be in UML or in English, not in JSON, because that wouldn't be abstract anymore. |
+1 to that. A JSON serialization is a very concrete one, not abstract. |
@peacekeeper Unfortunately, I disagree in the strongest terms with these two statements:
and
This is the wrong mental framing. If you see DID Documents as about the Subject, you are creating a privacy nightmare. Period. DIDs and DID Documents only present ways for securely interacting with Subjects. They MUST say nothing about the Subject except that these are alleged means of secure interaction. DIDs are NOT your physical identity--online or off. They are a means to communicate with a counterparty how to bootstrap secure interactions. I give you a DID and, in theory, I'm giving you a way to interact with the Subject. That's it. FULL STOP.

DIDs should never be tied to a specific person, because that can change. Yes. If you didn't get that, you need to understand that a given DID's Subject can change from one physical person to another. If that's outside the scope of what you have imagined so far, simply consider a DID with the Subject of the King of England. Not only has that Subject changed from time to time, it actually doesn't refer to any specific person at this moment in November 2019. Sometime in the next decade or two, it almost certainly will. And that is completely independent of whatever might be recorded in a DID Document.

Similarly, DID Documents should never contain information about a specific person other than that which enables specific secure interaction. I've made this argument already. Imagining the DID Document as about the Subject, without filter, will absolutely create privacy harms. Real ones. And when we achieve the scope of ambition we have for these identifiers, those harms will escalate to loss of liberty and even life. Don't imagine for a minute that privacy leaking DID Documents won't eventually kill someone. This is EXACTLY why many definitions of "persistence" as a goal for DIDs are flat out wrong. I've voiced this before and I'll voice it until my dying breath. DIDs are intentionally, and should always be, a fundamental separation of concerns between the physical and the digital. Framing it any other way paves the path for exceptional abuses of this technology.
Agreed, but those means of secure interaction can still be considered statements about the person, semantically not so different from saying what your name or address is. I am not saying DID documents should contain any more than the minimum amount of information for secure interaction, but semantically, DIDs are still identifiers for the DID subject. They are more than just something like an IP address for reaching the DID subject. At least that's my own personal perspective, and I won't insist on it strongly. I can also understand the arguments for simple, constrained, robust, plain JSON documents that are similar to DNS records, and that fulfill their well-defined purpose on a lower, separate layer than the actual "identity layer" that establishes your digital self.
Are you suggesting we drop the "persistence" principle of DIDs? How would you be able to cryptographically prove that control of the DID has been transferred to the new King? The traditional thinking has been that in this case, the new King would have a different DID than the old King. The old King's Verifiable Credential would get revoked, and a new Verifiable Credential would get issued to the new King.
Agreed that it's super important to avoid privacy leaking DID documents. I think I could also argue that if the DID is only seen as a lookup key for some technical metadata, not as the root of your digital existence, then that wouldn't fully set you free and make you "self-sovereign". But I can also understand your view, see above. I am not sure if there's any contradiction here, does the question of the DID document format have anything to do with the goal of avoiding privacy leaking data in those documents? |
I just wanted to chime in to say that I agree with this limited conception of DID Documents, which I think is close in spirit to the one Joe is arguing for. I do not agree with a richer conception that overloads them with lots of meaning and infinite extensibility. I think other resources, accessed through service endpoints, is where that belongs. Simpler is better, at the relatively primitive communication-enabling level where DIDs belong. |
Can we get more precise? If I parse this statement very, very carefully, I don't disagree with it--but a lighter reading gives what I consider a faulty impression. Here is what the DID spec currently says about persistence:
Now, Joe's statement doesn't say that the DID subject can change; it says that the person associated with the DID subject can change. I agree with that. If a DID's subject is "King of England", then the DID's subject hasn't changed when the person playing the role of "King of England" changes. The subject is stable; the person associated with that subject is what changed. This is more or less how we expect organizational DIDs to work. The staff of a company evolves over time, but the DID's subject--the company--remains constant. But this is not how DIDs for ordinary people are expected to work. For ordinary people, the people are the subject. And a DID like this can't be an identifier for Alice today, and Bob tomorrow. So when the subject of a DID is a person instead of a role, the person in question is immutable. Agreed? |
JSON is a notation. Hence the 'N' in its name. It is true that it can also be a serialization format--but we do not have to view it that way for the purposes of writing a spec. JSON as a notation is terser, clearer, and easier to work with in text than UML or fuzzy human language. Expressing the hierarchy and sequences of a data model with {...} and [...] makes much better sense to me than deliberately picking something clunky and less precise. As long as we say that the notation can be rendered in various serializations (including JSON-as-serialization, CBOR, etc), I think it's an optimal choice. |
A couple of historical complications of JSON-LD
Both of these suffer from the complication of making external unexpressed dependencies part of the DID Doc. At least in an abstract data model we can make all dependencies internal. Implementers of an optional JSON-LD encoding could expand their dependency space at their leisure without encumbering the spec for everyone else.
For example if a user sees two different versions of a DID Doc that are both signed with the same key pair(s), how does the user know which one to trust or which one is the most recent? There are many mechanisms for helping the user make this determination such as a sequence number, a hash in a chained set of hashes, a date time stamp, a version number etc. That information needs to be inside the signature. It needs to be unique in the document. But these sorts of questions often take a long time to answer when encumbered by JSON-LD semantics and syntax. |
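As a hypothetical illustration of that point, such ordering metadata could sit alongside the other properties inside the signed content. The field names `sequence`, `previous`, and `updated` are invented for this sketch, not proposed spec terms.

```json
{
  "id": "did:example:123456789abcdefghi",
  "sequence": 3,
  "previous": "hash of the sequence-2 version of this document (placeholder)",
  "updated": "2019-11-22T18:05:00Z"
}
```

Any one of these, covered by the signature, would let a verifier pick the more recent of two documents signed with the same keys.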
We are using two different definitions of extensible. What I mean by extensible is that the document has a core set of defined contents and may be extended by adding additional contents. What the JSON-LD folks appear to mean by extensible is that a DID-Doc is extended by an external world model; in other words, a DID-Doc is an intensive part of an extensive data model. The latter definition is the problem. It explodes the dependency space. It makes discussion difficult.

We need to discover the authoritative keys for the DID. Once we discover those, we need to discover a few other things, like how to access service endpoints that provide other functions or resources, but in a cryptographically verifiable way. That discovery needs to be authoritative. There are a few core things we need to know to make that discovery authoritative. Once we have made it authoritative, there are a few other common things we need to discover, like service endpoints and how to talk to them. We can define these in JSON and then add others as they become important over time (extend the core contents of the document to allow that).

Extending the document to include a world data model is mixing the larger question of identity with the smaller questions of how to do discovery of the authoritative keys and services. I frankly am having a hard time appreciating why a DID Doc has become the source of this greater problem. It makes doing the simple tasks harder. It is mixing concerns. A DID Doc is meta-data to bootstrap authoritative verification of attestations made by the controller of a DID. All these extended world model usages could much more appropriately be included in a verifiable credential about the controlling entity. Let's just have a bootstrap to a service endpoint that provides the verifiable credential. The verifiable credential then has the extensible world model available to it.

This is what I call paradigm greed: trying to apply the schema-centric approach of an extended world model to the bootstrap needed to credibly verify a document (verifiable credential) describing the intensive part of an extensive world. Not every computational task is best described via an extensive world data model. We need clean separation of concerns to do secure cryptographic bootstrapping to a state where a verifiable credential can then provide the world model. Many of the things I see being suggested for the DID Doc could be put in a verifiable credential. Let's do that and keep the DID Doc simple.

Indeed, I propose this criterion: any information about the subject entity of the DID that could be provided via a verifiable credential obtained from a service endpoint should not be in the DID-Doc. The only things that should be in a DID-Doc are those items needed to first bootstrap the control authority needed to bootstrap secure communication to such an endpoint and validation of said verifiable credentials. Verifiable credentials are wonderful things. Let's have more of them. But not disguised as a DID Doc.
DID Docs should be "extensible" in the same way and to about the same extent as HTTP headers are extensible: you can add extra stuff without breaking anything, and if the entity you're communicating with groks that extra stuff, fine. Otherwise, it has no effect. We do not need "extensibility" if it means namespacing, a complex resolution/processing model, @contexts, etc. Those additional complexities are very reasonable when you need a true semantic graph (as with VCs)--but the power of DIDs is tied more strongly to their simplicity than to the semantic power of DID docs. If you want semantic power, use services at endpoints, not the DID doc itself. |
The more I think about the above suggestions @SamuelSmithM @dhh1128, the more I am convinced that they cut to the root of the issue. Do not put anything in a DID Doc that can be provided by a verifiable credential at a service endpoint. Only put in the DID Doc what is essential to access the verifiable credential. With that filter we will have very little left to put in the DID Doc, merely the bare essentials, and these will hardly need an extensive semantic model. Because if they did, then they could be provided by a verifiable credential. We just need the minimum to bootstrap.
It might help crystallize the mental model to change from DID-Doc to DID Discovery Doc or simply DID Discovery Data.
The problem, as I see it, is did:peer - that should not be a DID method. In the context of pairwise communications the semantic issues are vastly different than in a 1:* communication about an identifier - where we desperately need JSON-LD. In pairwise communications we do not need machine processable semantics, the semantics should be determined as part of the communication protocol - but in terms of VCs and 1:* DIDs, general purpose, extensible semantics are critical. |
I need to understand a use-case where two communicating parties, or a small group of parties, are communicating and need to appeal to some global semantic mechanism. I just do not see it - and with that, issues like service endpoints, fragment processing, persistent reference get in the way. If I have a DID assigned to a specific pairwise communication, or to a specific credential, the need to discover how to communicate is unclear - it is like calling someone on their phone and then asking them for their phone number. If you can call someone on their phone you do not need a zero-knowledge disclosure process for discerning and validating their phone number - you have that already. On the other hand, when trying to discover and communicate with people, organizations, and things - when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of "fat DID documents" - and when you deal with fat DID documents you need machine processable semantics - which is what JSON-LD provides. Opting for a 2nd layer of meta-configuration about the semantic milieu adds enormous and unwarranted complexity - avoiding JSON-LD in order to re-create semantic negotiation adds tremendous complexity and inhibits adoption. We need to separate out PIDs (Peer/Pairwise DIDs) and DIDs (public decentralized identifiers) - if a ledger is involved, it is a public DID - why else go to the trouble of anchoring it to some global oracle of authoritative state? PIDs are critical - but they are so dramatically different in their use-case domain that trying to get a one-size-fits-all DID document leads to exactly the sort of confusion we are struggling with. I want to see DID:peer -> PIDs and I want to see pairwise DIDs removed from our lexicon - let JSON-LD rule the landscape of interoperable, multi-system, multi-platform identification. They can share some root utilities - like KERI, but these are apples and oranges. |
@ewelton : That's a fascinating take. Initially I hated it, but now I'm stepping back and trying to evaluate more thoughtfully. I'm curious about the broad claim that "when trying to discover and communicate with people, organizations, and things--when you are looking them up in a registry or confirming their claimed identity, then you definitely need all the other mechanics of 'fat DID documents.'" This seems doubtful to me, because we have systems like this today (only not as decentralized), and I don't see them as needing what you claim. But say more about that; maybe you can lead me along... (Possibly we should drop off this comment thread, though, if we veer too much into a tangent from Sam's original intent for this issue...) |
I agree with this statement:
and @dhh1128 - I believe, strongly, in your vision of pushing computation to the edge. I remember attending your presentation in Basel, and that picture with client-server and client-blockchain stuck in my head. I think you are right, but I also believe strongly in JSON-LD - at a recent meeting with a government I was pitching DIDs as an alternative to centralized governmental certificate authorities. One of the selling points was that "you don't have to be bound to semantics, @context gives you control". With Sam's proposal, I lose the flexibility that, just last week, I used to try to sell DIDs to a government in lieu of a centralized authority.

What it comes down to is "why are you resolving a DID" - the reason is that you want to engage it - either to perform authentication, or to open a communication channel. That is completely reasonable in the context of people, organizations, and things who participate in a society using DIDs. Participation in a society, especially when it crosses borders, requires semantic negotiation - and I think that JSON-LD is about the best offering on that front in the last few decades.

On the other hand - there is absolutely no need for that level of semantic capacity when in a "micro-society" - one of the whole points of a private communication is the benefit of a shared semantic. This is what drives "inside jokes between friends". In fact - I've been working (since Basel) on an actual mathematical result around this - it should be possible to exceed the naive Shannon information capacity of a channel through "inside jokes" - between friends you can benefit from a form of steganography, so that communication remains secure even if the raw crypto is cracked. This is because the sender/receiver have a semantically tuned system - pairwise communications should not just be about syntax, they should be semantically pairwise - and that means that JSON-LD is useless overhead.
I understand what @ewelton is saying, however I strongly disagree, both about peer DIDs needing separate treatment and about the requirement to have an extensible semantic graph model at the DID level of decentralized infrastructure. Ironically peer DIDs prove the point that @SmithSamuelM is making: all DID-to-DID communications require bootstrapping a cryptographically secure connection—whether the connection is peer-to-peer or one-to-many. The same underlying mechanisms—persistent identifiers, public keys and service endpoints—are needed in both cases. Sam's argument is that this is all that is needed, and that adoption will be easier and security (and privacy) will be stronger if this is all that is included in the data model (in other words, follow the dictum of the simplest thing that could possibly work.)
While I can understand someone coming from this POV, let me make sure it is clear why Sam and I and others on this thread have been arguing the exact opposite: if what you're trying to solve is a generalized discovery problem, then you not only need tools like a semantic graph model, you also need name services, directory services, search protocols, etc. That's a whole different problem space. And there are tools and technologies that already work very well for that problem space. All those tools and technologies need to do is add DIDs to become even more useful for discovery. If OTOH the problem you are trying to solve is the Decentralized Identifier problem space: entity-controlled persistent decentralized identification and bootstrapping of cryptographically secure communications, you neither need nor want any of those other features. Think of it like the difference between DNS and Web searching. The former uses a highly constrained type of identifier and very simple flat record format to solve one specific problem very well at scale. The latter uses a rich set of identifier schemes (URIs) and highly extensible markup languages. The latter is where you need a semantic graph model, not the former. |
One more point for everyone on this thread: allowing a JSON encoding to be defined in pure JSON without reference to a semantic graph model does not prevent those who want to use a semantic graph model from defining an encoding in JSON-LD, or N-Triples, or N-Quads, or Turtle. Nor does it prevent the CBOR community from defining an encoding in CBOR. |
An open world information model is not a subset of a closed world information model. This is the disconnect. That the concrete data model uses syntax that could be classified as an extension of a simpler syntax does not make the expanded information model simpler but makes it more complex. Semantics and syntax are two different types of complexity. This is why the information model agreement is so vital. This hard functional dependency is exactly why layering is the appropriate model. We can now encapsulate and separate the functionality of the two layers. This removes the most difficult security problems from layer 2. |
I didn't argue this -- so I don't think it's the disconnect. I think we'd all like to get to more common understanding. This is another reason for us to try and focus on a concrete PR. We may find there's actual agreement on whatever you put forward -- or, we may find out where the disconnects really are!
I agree with this, I'm not sure why you think otherwise. Perhaps you think I agree with the maxim "the simplest thing possible is always the best approach". If so, I don't -- what I think is that complexity trade offs should be worth it. Adding constraints (as JSON-LD does) increases complexity for producers of extensions. However, these constraints are added because they have a net decrease in complexity for consumers of extensions and create an increase in interop and reusability. Of course, this is generally why constraints are added. Saying we won't have any constraints just means that there won't be any interop -- what we've created is "too simple" for it. The priority of constituencies allows for spec writers and extension authors to take on more complexity such that consumers may take on less. Consumers know that extensions must all abide by those constraints -- and devs can often write applications or tools just once that are able to consume any information that uses the same approach. Often the complexities we must deal with in writing specs and constructing data models need not be understood at all by other parties -- yet they reap benefits from this approach. An alternative approach would be to force anyone who wants to write an interoperable extension to form a WG and go through the standardization process. Anyway, it's fine to argue that you think we could simplify things to only support the use cases you're interested in. However, a WG is about compromise -- where we attempt to support everyone's use cases to the best of our ability. I think it would be much easier to decide whether certain use cases will be harmed if there's a concrete proposal (a PR) on the table rather than talking about all this in the abstract. |
I'm really impressed by @SmithSamuelM's Information Model Agreement post. It contains a lot of actionable truth that should help us focus the discussion and reach consensus on a simple, secure, privacy-preserving information model to inform the DID specification. |
I think this is insufficient. I believe it is essential to also be able to discover other information about a DID subject at the root level. In fact, some DID subjects may not have any keys or may not have keys that can be used to make assertions, so you go straight from the root of trust to these other pieces of information. This use case is missing in your analysis and I believe explains at least one of the disconnects in this issue. |
I do think this conversation has been insightful and valuable, and I definitely think @SmithSamuelM brought a lot of very clear, valuable, and excellently expressed insight. I think we all brought positive insights to the table despite the echoes of tooth-grinding. At the end of the day we are not doing this to prove points to one another, or to vie for rightness, it is about advancing the core of the internet forward in a very meaningful way. I would like to second the efforts to move the discussion into a PR that reflects the restricted use-cases and limited semantics of the proposal. Issue #65 for example, touches on many of the same abstract discussion points - it would be very helpful to be able to evaluate issues like the discussion in #65 against a concrete vision of a semantically restricted spec - we could then clearly evaluate the impact of the proposed spec in the issues of DID-resolution, DID-metadata, the role of service endpoints, and cases where the DID is not referencing an Aries attached layer-2 agent. I would like to see the reduced, restricted, and simplified model of DIDs in a formal PR - that PR should also clear up the ambiguity introduced by the terminology in the current reference draft. How can we best move towards that PR? |
I am shocked and rather appalled by this statement, reportedly coming from someone who should be an expert in the areas of which they speak, but who demonstrates with this statement that they understand neither JSON-LD nor the Open World data model. (I won't dig into the logical fallacy of [unverifiably] Appealing to Authority, but that's also worth noting.) JSON-LD is not a data model, it is a data serialization format, which is a subset of JSON. If JSON is viable for trusted computing, JSON-LD is also viable. If JSON-LD is not viable, neither is JSON. Note: I believe both are viable for such use, depending primarily on the data serialized therein. The Open World Assumption in this context basically says that "anything that isn't explicitly stated, is unknown" (and that "anyone can say anything about anything", but says nothing about the veracity of those assertions) -- which is a much stronger base for security than the Closed World Assumption, which is basically that "anything that isn't explicitly stated, is not so". I'm guessing that the speaker described above was referring to the common "anything not explicitly permitted is forbidden" security mantra (which is commonly placed in opposition to "anything not explicitly forbidden is permitted"), which has nothing to do with the Open World Assumption, nor with JSON-LD. |
This is easy, someone does the work and puts forward a pull request on the spec in this repo. Any member of the WG (including any employee of any organization that is a member and invited experts) can raise a PR against the spec. If you are not a member of the group, you can always talk to the Chairs/Staff to see if they'll grant you Invited Expert status if you do the work to put together this PR and it looks like it's going somewhere. |
I'm sure there are longer W3C issue threads than this one, but it's definitely the longest one I've ever been involved in. I was at the Hyperledger Aries Connect-a-thon in Provo all this week and each night when I tried to catch up with it I could never read it to the end ;-) However today on my flight back to Seattle I was finally able to finish. So let me share two thoughts.

First, I believe this discussion, as long as it has been, has been valuable to the community as it has drawn in a wider set of views about the purpose and information architecture of DIDs and DID documents than have been present at the Credentials Community Group stage of the spec.

Second, RE next steps, a number of posts have asked for a "concrete PR" so we could stop arguing in the abstract. While of course someone could simply draft a PR redefining the data model in JSON and removing all dependencies and references to the JSON-LD spec, it's not at all clear to me that's the right next step. Rather I expect it might simply result in triggering the same discussions all over again and polarize us further. Instead, I believe this discussion shows there are deeper issues we need to come to agreement on first. But rather than argue those in the abstract, what I would like to suggest is that we break them down into a series of relatively concrete decisions we can discuss and make together. And that will result in steady progress towards consensus on the way forward. Once we have done that, what should be in an eventual PR (or set of PRs) will likely be far more obvious and far less controversial.

My plane having landed, I am going to grab a Lyft and then start a new issue on the first of those concrete decisions I think we can make together.
For the sake of argument I created a DID Method based on did:key, but using JOSE, that has no `@context`: https://github.com/transmute-industries/did-jose. As I note on this issue, which is related: w3c/vc-json-schema#7
Also noted on that issue are my thoughts on this approach of requiring the `@context`. Without the context, it's just normal JSON and all the features of JSON-LD are lost... how will we maintain interop? what is the extension model? There are so many different ways we could solve these issues, and each feature that we lose will need to be addressed in some fashion.

The more I think about trying to solve this by somehow getting rid of JSON-LD and replacing it with more relaxed, normal JSON, the more I feel like it's maybe not a good idea... because while it's easy to delete the `@context`, and sure, not everyone uses all these features, we get them for the price of an `@context`.
@OR13 You are going right to the heart of what I believe is at the very center of this debate (and the reason that this thread is so long): there are two different worldviews in conflict here. One worldview, which I'll call the "JSON-LD worldview" or more generally the "open world semantic graph worldview", believes in the power of semantic graphs and wants DID docs to have all (or most) of the features that @msporny describes here. The other worldview, which I'll call the "plain JSON worldview" or more generally the "hierarchical deterministic worldview", feels just the opposite. They do not want to deal with semantic graph models and do not want most of those features because in their view those features represent challenges to: a) simplicity, b) security, and c) privacy, all of which make life more complex for developers and threaten to hinder adoption.

In my experience, there are no simple solutions to worldview problems. Almost by definition, both groups are starting not just from different assumptions, but more importantly, from different value models, i.e., views of what is important and what is not. Again, that's why this discussion has gone so deep and so wide. Each group is trying to convince the other about its entire worldview. That's a hard, hard problem.

The reason I started issue #140 was to start to explore one potential solution, which I'll describe briefly here since it's relevant to this issue as well and also to #103 (which started this whole discussion). The essence of the idea is to stop trying to get the two groups to agree on a worldview before we can move forward. Instead, turn things on their head and do this:
Then, when both groups are done (or far enough along to be ready), get the two groups together and compare/contrast/discuss where they have landed and why. My guess is that the plain JSON folks will have developed a hierarchical deterministic model that is an easy-to-describe subset of the JSON-LD model. If so, aligning the two will actually be pretty easy. We'd end up with two encodings: one in plain JSON that's fairly restrictive (but meets the plain JSON folks' requirements), and one in JSON-LD that's much richer (and meets all the JSON-LD folks' requirements). And both can work! I'm very curious what you (and others) think of this possible path for moving us forward.
@OR13 @SmithSamuelM @dhh1128 @ewelton (and others) - Please always wrap `@context` (and similar `@`-prefixed terms) in backticks -- except where you are intentionally tagging a github user. (Optimally, go back and edit your previously posted comments to do the same.) There is a github user with the `context` handle.
@talltree. Well stated +1. When the JSON folks say they want the simplicity of not having an open world extensible model, the JSON-LD folks respond that simplicity comes from that very same extensibility. These are two different types of simplicity and they are based on two different design aesthetics. The JSON folks have a very clear view of what they want to do and how to do it, and they rationally have concluded that they don't need JSON-LD. Likewise the JSON-LD folks have a very clear view of what they want to do and how to do it, and they rationally have concluded that they need JSON-LD. It's like someone telling someone else they are irrational for preferring pizza over ice cream. What is irrational is to believe that the other side is irrational and that one can persuade them to change their aesthetic. It takes more than that; it takes finding a common aesthetic that overrides the conflicting world model aesthetics. So absent that, the practical question is how best to support both aesthetics. And an abstract data model is likely the only approach that could work for both.
I am a bit worried by the approach that you propose #128 (comment), @talltree; you may underestimate the difficulty of "merging" the two approaches at the end of such a process. My approach would be a little bit different, namely to do this jointly with some principles in mind.
Is this a viable design method moving forward? |
@SmithSamuelM has posted his comment almost at the same time :-) He said:
and that is perfectly fine and true. But the abstract data model has to be embodied in a syntax, and I am worried that creating too many syntaxes in parallel might backfire on us.
I think that making The proposal to use JSON as the default encoding would minimize syntaxes and would be compatible with having JSON-LD syntax be a MAY versus a MUST. But that proposal did not seem to go over well. Hence the alternative of an abstract syntax. But I agree that your proposal is a reasonable way to enable the two approaches to the world model to co-exist. |
Please note that github user Also, github users We really need to be more careful in how we refer to entities! |
@iherman I see your point and agree what you suggest could be a constructive way for the two groups (representing the two worldviews) to work together on the semantics. I'd like to explore that in more detail as it may be the fastest way forward. RE the
When I started looking at the
That would be a very clean way for us to have our cake and eat it too, i.e., for all DID documents to share the "simple JSON" syntax and then for DID document authors who want to use the features of JSON-LD to be able to do that with a clear indication of that processing model.
Looking at it from the point of view of testing (that will become a core issue in the rec process later), what can be tested is the presence of, or the absence of, the @context. But I must admit I do not have a strong feeling about this, I see it as a stylistic difference. I let the document editor work this out :-) We should all be careful about the usage of @context.
@talltree wrote:
Finally, something we can work with! Thank you @talltree!
Good, this is a concrete requirement that enables me to write a concrete PR against the requirement.
Yes! Another good requirement. For the rest of you in this thread, these are the sorts of things that help editors write text that may achieve consensus. Ok, so I've now spent close to 5 hours reading and re-reading this thread and just spent two hours trying to construct text that I think may have a chance at achieving consensus. Here's a concrete PR that attempts to synthesize this issue into a concrete spec change: Please jump to the PR and let's see if we can hammer on the language and get something that achieves consensus (note: I didn't say "makes everyone happy"... everyone in this thread is going to have to start compromising). |
@iherman -
I have to note that the quoted section of your comment includes two unwrapped instances of `@context`.
Can we close this? We are trying to support 3 representations using the new https://github.com/w3c/did-core-registry. IMO this issue was resolved at the F2F... and if it was not, we should focus our criticism on the did core registry.
I am fine with closing it. |
I too am fine with closing it. |
Thanks, closing because the issue submitter (and concerned parties) are ok with closing it, there was a resolution to specify an abstract data model, the specification has been changed to include an abstract data model section (that is waiting for content, but everyone expects that content to be written soon), and there is now a registry that assumes the existence of an abstract data model.
DID Doc Encoding: Abstract Data Model in JSON
This is a proposal to simplify DID-Docs by defining a simple abstract data model in JSON and then permitting other encodings such as JSON-LD, CBOR, etc. This would eliminate an explicit dependency on the RDF data model.
Universal Adoptability
For universal interoperability, DIDs and DID-Docs need to follow standard representations. One goal of the DID specification is to achieve universal adoption. Broad adoption is fostered by using familiar representations or encodings for the DID and DID Doc. The DID syntax itself is derived from the widely adopted and highly familiar URI/URL identifier syntax. This takes advantage not only of familiarity but also of the tooling built up around that syntax. Likewise, greater adoption is fostered to the degree that the DID Doc representation or encoding uses a familiar, widely adopted representation with extant tooling.
The only reason not to use a highly familiar representation is if the requirements for representation demand or greatly benefit from a less familiar representation. The appendix at the end of this document provides some detail about the main purposes of a DID Doc. This shows that a complex representation is not required and may not be beneficial.
In addition, having only a single representation or encoding, albeit highly familiar and widely adopted, may be insufficient to achieve universal adoption. It may require multiple representations or encodings.
Multiple encodings require a standard base encoding from which they may be derived. Or in other words the least common denominator from which other encodings may be derived.
One way to accomplish this is to use an abstract data model as the standard encoding and then allow for other encodings. This was proposed in the following issue:
#103 (comment)
The problem with an abstract data model is that the syntax is expressed in some abstract modeling language, typically a kind of pseudo code. Pseudo code is usually less familiar than real code. This means that even in the major case the spec is written in a language that is unfamiliar. This runs counter to fostering broader adoption. A solution to this problem is to pick a real language encoding for the abstract data model that then provides both an abstracted standard encoding that other encodings can more easily be derived from and also provides the lowest common denominator standard encoding.
Clearly given the web roots of the DID syntax itself as a derivation of URL syntax, JSON's web roots would make it the ideal candidate for an abstract data model language. Of any encoding available, JSON is the closest to a universally adopted encoding. JSON is simple but has sufficient expressive power to model the important data elements needed. It is therefore a sufficient encoding. Annotated JSON could be used to model additional data types such as an ordered mapping (in the event that they are needed). Many of the related standards popular among implementors such as the JWT standards are based on JSON. Casual conversations with many others in the community seem to suggest that a super majority of implementors would support JSON as the standard encoding for the combined abstract data model and default encoding.
Given JSON's rampant familiarity, it should not pose a barrier to implementors of other optional encodings such as JSON-LD or CBOR. Compared to pseudo-code, it should be just as easy, if not easier, to translate JSON to another encoding.
The Elephant in the Room
The result of this proposal would be to make JSON the standard encoding for the DID Doc specification and demote JSON-LD to be an optional encoding. The current DID spec uses JSON-LD as the preferred encoding but does not prevent the use of naive JSON as an encoding. However the DID spec mandates JSON-LD elements that show up as artifacts when using JSON that a JSON implementer must handle specially. Moreover, the semantics of JSON-LD are much more restrictive than JSON. This results in a lot of time being expended unproductively in community meetings discussing the often highly arcane and non-obvious details of JSON-LD syntax and semantics. The community is largely unfamiliar with JSON-LD. It is clear that JSON is sufficient to accomplish the main purposes of the DID Doc. Although JSON-LD may provide some advantages in some cases, its extra complexity runs counter to the goal of fostering more universal adoption. This proposal does not exclude JSON-LD but would encapsulate and isolate discussion about the esoteric syntax and semantics of JSON-LD to that subset of the community that really wants JSON-LD. Each optional encoding including JSON-LD would have a companion specification to the DID spec that defines how to implement that encoding. This structure will make it easier to implement other encodings in the future because JSON is much closer to a lowest common denominator data model than JSON-LD.
The relevant questions up for decision are:
The purpose of this proposal is not to debate the general good and bad of JSON-LD and RDF. There is much good in JSON-LD for many applications. But, relevant here is that JSON-LD is not as well aligned as JSON with the goal of fostering universal adoption. More specifically the RDF model employed by JSON-LD complicates the implementation of other encodings that do not share the RDF data model and RDF semantics. JSON does not suffer from this complication. This complication has the deleterious effect of slowing adoption.
Appendix
Purpose of DID-Doc
The current DID specification includes a specification for a DID Document (DID-Doc). The main purpose of the DID-Doc is to provide information needed to use the associated DID in an authoritative way.
A distinguishing feature of a DID (Decentralized Identifier) is that the controller (entity) of the DID obtains and maintains its control authority over that DID using a decentralized root of trust. Typically this is self-derived from the entropy in a random number (expressed as collision resistance) that is then used to create a cryptographic public/private key pair. When the identifier is universally uniquely derived from this entropy then the identifier has the property of self-certifiability. Another somewhat less decentralized root of trust for an identifier is a public ledger or registry with decentralized governance.
In any event, a more-or-less decentralized root of trust only has value if other entities recognize and respect that root of trust. Hence portable interoperable decentralized identifiers must be based on an interoperable standard representation. Hence the DID standard.
In contrast, "administrative" identifiers obtain and maintain their control authority from a centralized administrative entity. This control authority is not derived from the entropy in a random number. This statement may be confusing to some because administrative identifiers often use cryptographic public/private key pairs. To explain, PKI with public/private key pairs and cryptographic digital signatures enables the conveyance of control authority via signed non-repudiable attestations. But the source of that control authority may or may not be decentralized. Thus an administrative entity may convey trust via PKI (public/private keys pairs) but does not derive its control authority therein. Whereas a decentralized entity may derive its control authority over a DID solely from the entropy in the random seed used to generate the private key in a PKI public/private key pair.
A key technology underpinning DIDs is cryptographic signatures, by which the control authority over the associated DID and affiliated resources may be verified by any user of the DID. In contrast, an administrative identifier always has, as a last recourse, appeal to the authority of the administrative entity and to whatever means by which that authority is established.
Indeed, given the foregoing explanation, the most important task facing a user of a DID is to cryptographically verify control authority over the DID so that the user may then further cryptographically verify any attestations of the controller (entity) about the DID itself and/or affiliated resources. The verifications must be cryptographic because, with a decentralized root of trust, the original control authority was established cryptographically and the conveyance of that control authority may only be verified cryptographically. With DIDs it's cryptographic verification all the way down.
From this insight we can recognize that a DID-Doc should support a primary purpose and a secondary purpose as follows:
Primary: Aid the user in cryptographically verifying the current control authority over the DID.
Secondary: Aid the user in discovering and verifying anything else affiliated with the DID based on the current control authority.
If the user cannot determine the current control authority over the DID then the information in the DID Doc cannot be authoritatively cryptographically verified. Consequently, absent verified control authority, any use of the DID Doc for any purpose whatsoever is at best problematic.
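Read this way, a DID Doc pared down to those two purposes need contain little more than the following hypothetical sketch (property names follow the current draft; values are invented). The key material serves the primary purpose; the service entry serves the secondary one.

```json
{
  "id": "did:example:123456789abcdefghi",
  "publicKey": [
    {
      "id": "did:example:123456789abcdefghi#keys-1",
      "type": "Ed25519VerificationKey2018",
      "controller": "did:example:123456789abcdefghi",
      "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
    }
  ],
  "authentication": ["did:example:123456789abcdefghi#keys-1"],
  "service": [
    {
      "id": "did:example:123456789abcdefghi#creds",
      "type": "CredentialRepositoryService",
      "serviceEndpoint": "https://creds.example.com/"
    }
  ]
}
```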
Process Model for Establishing Cryptographic Control Authority
As mentioned above a fully decentralized identifier is self-certifiable. Other partially decentralized identifiers may be created on a ledger or registry with decentralized governance. The first case is the most important from a process model point of view. The second case is less informative.
The root of trust in a self-certifying identifier is the entropy used to create a universally unique random number or seed. Sufficient entropy ensures that the random seed is unpredictable (collision resistant) to a degree that exceeds the computational capability of any potential exploiter for some significant amount of time. Currently 128 bits of entropy is considered sufficient.
That random seed is then converted to a private key for a given cryptographic digital signature scheme. Through a one-way function, that private key is used to produce a public key. The simplest form of self-certifying identifier includes that public key in the identifier itself. Often the identifier syntax enables it to become a self-certifying name-space where the public key is used as a prefix to a family of identifiers. Any attestation signed with the private key may be verified with the public key. Because of its universal collision resistance, no other identifier may be verifiably associated with attestations made by that key pair. This makes the identifier self-certifying.
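As a rough sketch of this derivation chain (assuming Python with the PyNaCl Ed25519 bindings; any signature scheme whose key pair is derived from a seed would do), the simplest self-certifying identifier just embeds the derived public key:

```python
import base64
import secrets

from nacl.signing import SigningKey  # assumes PyNaCl is installed

# 32 bytes (256 bits) of entropy comfortably exceeds the 128-bit minimum noted above.
seed = secrets.token_bytes(32)

# One-way derivations: seed -> private (signing) key -> public (verify) key.
signing_key = SigningKey(seed)
verify_key = signing_key.verify_key

# Simplest self-certifying identifier: embed the public key in the identifier itself.
identifier = "did:example:" + base64.urlsafe_b64encode(verify_key.encode()).decode().rstrip("=")

# Any attestation signed with the private key verifies against the public key embedded
# in the identifier; no other identifier can be verifiably associated with it.
signed = signing_key.sign(b"attestation payload")
verify_key.verify(signed)  # raises nacl.exceptions.BadSignatureError if tampered with
```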
Furthermore, instead of the public key itself, the identifier may include a fingerprint of the public key. In order to preserve the cryptographic strength of the root of trust in the random seed, the fingerprint must have collision resistance comparable to the original random seed. Further one-way functions may be applied successively to produce successive derived fingerprints. This is similar to how hierarchically deterministic key chains are generated. To restate, a one-way function may be applied to the public key producing a derived fingerprint, then another to that fingerprint, and so on. The collision resistance must be maintained across each application of a one-way function.
Instead of merely deriving a simple fingerprint, one could take the public key and use it as a public seed that when combined with some other data may be transformed with a one-way function (such as a hash) to produce yet another fingerprint. As long as the process of creation of any derived fingerprint may be ascribed universally uniquely to the originating public/private key pair, the resultant derived identifier may be uniquely associated with attestations signed with the private key and verifiable with the public key. This makes the eventually derived identifier also self-certifiable.
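Continuing the earlier sketch, such derived fingerprints might look like the following (SHA-256 is used purely as a stand-in for whatever one-way function a scheme actually chooses; its 256-bit digest preserves the collision resistance of the original seed):

```python
import hashlib

def fingerprint(data: bytes) -> bytes:
    # Stand-in one-way function; any function that preserves collision resistance works.
    return hashlib.sha256(data).digest()

public_key = verify_key.encode()  # from the previous sketch

# Successive fingerprints, each derived one-way from the last.
fp1 = fingerprint(public_key)
fp2 = fingerprint(fp1)

# Or: use the public key as a public seed combined with some other data, so the
# derived identifier is still uniquely ascribable to the originating key pair.
derived = fingerprint(public_key + b"did:example|inception-data")
```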
Rotation
The problem is that over time any public/private key pair used to sign attestations becomes weakened due to exposure via that usage. In addition, a given digital signature scheme may become weak due to a combination of increased compute power and better exploit algorithms. Thus, to preserve cryptographic control of the identifier in the face of exposure, the originating public/private key pair may need to be rotated to a new key pair. In this case the identifier is not changed; only the public/private key pair that is authoritative for the identifier is changed. This provides continuity of the identifier across changes in the authoritative key pair. It also poses a problem for verification, because there is no longer any apparent connection between the newly authoritative public/private key pair and the identifier. That connection must be established by a rotation operation that is signed by the previously authoritative private key. This signed rotation operation is the attestation that transfers authoritative control from one key pair to another. Each successive rotation operation performs a transfer of control.
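A hypothetical shape for such a signed rotation operation, continuing the earlier sketch (the field names here are invented for illustration and are not proposed spec terms): the operation names the unchanged identifier, a sequence number, and the newly authoritative public key, and it is signed with the previously authoritative private key.

```python
import base64
import json
import secrets

from nacl.signing import SigningKey

# Newly authoritative key pair (the identifier itself does not change).
new_signing_key = SigningKey(secrets.token_bytes(32))
new_verify_key = new_signing_key.verify_key

# Hypothetical rotation event; field names are illustrative only.
rotation_event = {
    "id": identifier,  # from the earlier sketch; the identifier stays the same
    "sn": 1,           # sequence number of this rotation
    "newKey": base64.urlsafe_b64encode(new_verify_key.encode()).decode(),
}

# Signing with the *previously* authoritative private key is what transfers
# control authority from the old key pair to the new one.
serialized = json.dumps(rotation_event, sort_keys=True).encode()
rotation_attestation = signing_key.sign(serialized)
```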
State Machine Model of Control Authority
To summarize, control authority over a decentralized identifier is originally established through a self-certification process that uniquely associates an identifier with a public/private key pair. Successive signed rotation operations may then be used to transfer that control authority to a sequence of public/private key pairs. The current control authority at any time may be established by starting at the originating key pair and then applying the successive rotation operations in order. Each operation is verified via its cryptographic signature.
The process and data model for this is a state machine. In a state machine there is a current state, an input event, and a resultant next state determined by state transition rules. Given an initial state and a set of state transition rules, replaying a sequence of events will always result in the same terminal (current) state. This is a simple, unambiguous process model. The data model is also simple: it must describe the state and the input events. There is no other data needed. The state is unambiguously and completely determined by the initial state, the transition rules, and the events. No other context or inference is needed. A simple representation will suffice.
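A minimal sketch of that replay logic (the event shape and transition rule here are simplified placeholders): start from the inception key, apply each rotation event in order, and accept an event only if it verifies against the key that was authoritative immediately before it.

```python
from nacl.signing import VerifyKey

def current_authoritative_key(inception_key: bytes, events: list) -> bytes:
    """Replay ordered (new_key, signed_operation) rotation events.

    Transition rule: an event is valid only if its signature verifies against
    the key that was authoritative immediately before the event.
    """
    authoritative = inception_key
    for new_key, signed_operation in events:
        VerifyKey(authoritative).verify(signed_operation)  # raises on an invalid event
        authoritative = new_key
    return authoritative
```

Given the same initial state and transition rule, any verifier replaying the same event sequence arrives at the same current key, which is exactly the determinism the state machine model provides.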
Once the current control authority for a DID has been established to be a given key pair (or key pairs), then any other information affiliated with that DID may be cryptographically verified via a signed attestation using the current key pair(s). To establish the authoritative status of any additional information, such as encryption keys or service endpoints, a user needs only the current authoritative signing key pair(s) for the identifier and assurance that the version of the information in the DID Doc is sourced from the entity controlling the current key pair(s). This means the DID Doc may benefit from an internal identifier that corresponds to the latest rotation event establishing the current key pair(s), or some other identifier that associates the DID Doc with specific signing key pair(s). This process of first establishing authoritative key pair(s) greatly simplifies the cryptographic establishment of all the other data.
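In code, that anchoring step might look like the following sketch, continuing the rotation example above (the version field name is hypothetical): the DID Doc carries an identifier tying it to the rotation event that established the current key pair(s), and the whole document is signed with the currently authoritative key.

```python
import json

versioned_did_doc = {
    "id": identifier,        # from the earlier sketches
    "rotationEventSn": 1,    # hypothetical: ties this version to the rotation event sn=1
    "service": [],           # affiliated information: endpoints, encryption keys, etc.
}

# One signature from the currently authoritative key pair makes every other field
# in this version of the DID Doc cryptographically verifiable.
doc_attestation = new_signing_key.sign(json.dumps(versioned_did_doc, sort_keys=True).encode())
```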
There are various mechanisms that may be employed to maintain the state and associated event sequence. These could be as simple as a set of servers with immutable logs for the events/states that also run the code for the state transition logic. A more complex setup might rely on a distributed consensus ledger to maintain the state.
The DID Doc in and of itself, however, is insufficient to fully establish the current authoritative key pair(s). Other infrastructure is required. Merely including a set of rotation events in a DID Doc only establishes control authority up to the latest included rotation event; other rotation events may have happened since that version of the DID Doc was created. Consequently, a DID Doc's main role in this respect is to help a user discover the mechanisms used to establish current control authority. This must be done with some care because, in a sense, the DID Doc is bootstrapping discovery of the authority by which one may trust the discovery provided in the DID Doc. Nonetheless, in order to be authoritative, the other information in the DID Doc that is not part of discovering authoritative control does not need an event history but merely a version identifier linking it to the authoritative key pair(s) and an attached signature from the current authoritative key pair(s).
In other words the DID Doc is used to bootstrap discovery of the current authoritative controlling keys and then to provide authoritative versioned discovery of affiliated information.
RDF Complications
The RDF model uses triples to canonicalize a directed graph. This graph may be used to make inferences about data. This model attaches a context to a given DID Doc that must itself be verified as authoritative. This expansion complicates the process of producing an authoritative versioned discovery document or an evented state machine. Clearly a clever implementation of a cyclical directed graph could be used to implement versioned discovery documents or evented state machines. Many implementations of RDF, however, use directed acyclic graphs, making the implementation of evented state machines at best problematic and versioned discovery documents more cumbersome. This forces a particular, potentially unnecessarily complex, methodology for implementing versioned discovery documents or evented state machines, rather than whatever might be easiest or most convenient for the implementer.