Use absolute file paths for did:web DIDs. #52

Open
msporny opened this issue Dec 14, 2021 · 31 comments

Comments

@msporny
Contributor

msporny commented Dec 14, 2021

There is a lot of translation that's done with a did:web URL that feels like it could be simplified. For example, why don't we just require that all DID URL paths are absolute?

did:web:did.example/did.json   // site-wide DID
did:web:did.example/alice.json     // specific entity DID
did:web:did.example/some/arbitrary/path/foo.json   // you can place DIDs at arbitrary paths

I just checked DID syntax, all the above is valid IIUC... why aren't we just doing that instead? It would allow people to just replace "https://" with "did:web:" to get a valid DID Web URL.
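A minimal sketch of the prefix swap proposed above. It assumes the proposed syntax, not the current did:web rules; the function names `httpsUrlToDidWeb` and `didWebToHttpsUrl` are hypothetical, and ports, query strings, and fragments are not treated specially.

```javascript
// Hypothetical helpers for the PROPOSED syntax; this is not what the
// did:web spec specifies today.
const HTTPS_PREFIX = "https://";
const DID_WEB_PREFIX = "did:web:";

// https://did.example/alice.json -> did:web:did.example/alice.json
function httpsUrlToDidWeb(url) {
  if (!url.startsWith(HTTPS_PREFIX)) {
    throw new Error(`Expected an https:// URL, got: ${url}`);
  }
  return DID_WEB_PREFIX + url.slice(HTTPS_PREFIX.length);
}

// did:web:did.example/alice.json -> https://did.example/alice.json
function didWebToHttpsUrl(did) {
  if (!did.startsWith(DID_WEB_PREFIX)) {
    throw new Error(`Expected a did:web identifier, got: ${did}`);
  }
  return HTTPS_PREFIX + did.slice(DID_WEB_PREFIX.length);
}
```

Under this assumption the two functions are exact inverses, which is the "just replace the prefix" property the proposal is after.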

@OR13
Collaborator

OR13 commented Dec 15, 2021

@tplooker this would be a breaking change for did web, but IMO something so simple may actually be worth it...

@msporny
Contributor Author

msporny commented Dec 15, 2021

@tplooker this would be a breaking change for did web

I think there is a way for it to not be a breaking change. It'll be ugly/hacky, but we can provide an optional code path where if you detect more than two colons in the DID URL for did:web, you can do the weird replacement thing... but we make it a MAY. Or we could do something like -- dereference, and if you don't get anything back, and there are more than two colons, do the weird replacement thing, and try dereferencing again (and if you get a document back, cache that so future dereferences don't have to go through all that rigmarole).

All that said, I'd really like to avoid doing all that (because it adds complexity and attack surface).
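The optional fallback described above could look roughly like the following. This is a sketch under stated assumptions, not a proposed normative algorithm: `fetchJson` is a hypothetical injected function that returns the parsed document or `null`, and the cache is a plain in-memory `Map`.

```javascript
// Sketch of the optional fallback: try the proposed direct mapping first,
// then the legacy colon-to-slash mapping, and remember which URL worked.
// `fetchJson` is hypothetical: (url) => Promise<object | null>.
const resolvedUrlCache = new Map();

function directUrl(did) {
  return "https://" + did.slice("did:web:".length);
}

function legacyUrl(did) {
  // The "weird replacement thing": colons become slashes, did.json is appended.
  return "https://" + did.slice("did:web:".length).replace(/:/g, "/") + "/did.json";
}

async function resolveWithFallback(did, fetchJson) {
  const cached = resolvedUrlCache.get(did);
  if (cached) return fetchJson(cached);

  const first = directUrl(did);
  let doc = await fetchJson(first);
  if (doc) {
    resolvedUrlCache.set(did, first);
    return doc;
  }
  // More than two colons means the method-specific id itself contains a colon.
  if (did.split(":").length - 1 > 2) {
    const second = legacyUrl(did);
    doc = await fetchJson(second);
    if (doc) resolvedUrlCache.set(did, second);
  }
  return doc;
}
```

Even in this small form the fallback doubles the number of requests on a miss and needs cache invalidation, which illustrates the extra complexity and attack surface mentioned above.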

@gribneau
Contributor

It would allow people to just replace "https://" with "did:web:" to get a valid DID Web URL.

There is a further positive (imo) implication here. Consider the following cases:

did:web:example.com/alice
did:web:example.com/alice/

In both cases, the server would be returning an index document if alice is a directory. This implicitly enables content negotiation and variable representations, if configured server-side without requiring any complexity in the specification.

I think there is a way for it to not be a breaking change. It'll be ugly/hacky..

+1 on breaking over hacky.

@msporny
Contributor Author

msporny commented Dec 15, 2021

This implicitly enables content negotiation and variable representations, if configured server-side without requiring any complexity in the specification.

That would certainly be my preference -- stay silent, leave web architecture alone. That said, it seems like @OR13 and @mprorock are both -1s to that?

As an alternative, I suggested the absolute paths thing, where you can be specific about the file extension and therefore, media type... and by doing so, you lock yourself into a media type for that particular DID.

I do agree that disabling conneg is probably going to result in multiple objections in the future. My suggestion is that we don't say anything about conneg right now and just say "use absolute URLs"... dodge the question for the time being so we can make some progress w/o painting ourselves into a corner.

@gribneau
Contributor

If we feel compelled to lock ourselves into .json today, I suppose the path forward to enable conneg and multiple representations in the future would be a very simple PR that removes the conditional requiring .json at the end of the string.

We would eventually wind up very nearly at the original proposal. We might have been there a couple of years ago, but the core spec was different then.

@msporny
Contributor Author

msporny commented Dec 15, 2021

We would eventually wind up very nearly at the original proposal.

Yeah, I do agree that that's where we'll probably end up... which is why both @OR13 and @mprorock need to provide some justification that content negotiation will result in "something bad happening".

The only argument that I know of that is of concern is when one representation differs from the other in a way that matters (verification method information or service description change). It is a valid concern, but not one where the solution needs to be: Ban content negotiation. Mitigations could be "if conneg is supported, implementations SHOULD derive the serialized content from a single source of data", "if conneg is supported, implementations SHOULD NOT serve serialized content that semantically differ for verification methods or service descriptions".

If we feel compelled to lock ourselves into .json today

I'll also mention the elephant in the room... application/json is not a valid media type for a DID Document... but we all knew this was going to happen and many of us warned the JSON-only folks in the DID WG that insisting that there should be two JSON-based media types for DID Documents would result in this happening.

@gribneau
Contributor

gribneau commented Dec 15, 2021

One other disadvantage to ignoring the accept header is that non-DID representations at a DID url are ruled out. If alice is the identity of a user on example.com, one would generally anticipate that https://example.com/alice would present a human readable page for Alice.

Returning the DID at the same URL based on a request header would be rather elegant, in my opinion.
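As an illustration of serving several representations from one URL, here is a small sketch that derives every representation from a single source object, in line with the "single source of data" mitigation mentioned earlier in the thread. The source object, the HTML template, and the function name `negotiate` are all hypothetical, and q-values are ignored for brevity.

```javascript
// Sketch: one source object, several representations chosen by the Accept
// header. All names and the HTML template are hypothetical.
const aliceSource = { id: "did:web:example.com:alice", name: "Alice" };

const representations = new Map([
  [
    "application/did+ld+json",
    (src) => JSON.stringify({ "@context": "https://www.w3.org/ns/did/v1", id: src.id }),
  ],
  ["application/did+json", (src) => JSON.stringify({ id: src.id })],
  ["text/html", (src) => `<h1>${src.name}</h1><p>${src.id}</p>`],
]);

// Picks the first listed media type the server can produce. q-values and
// wildcards are ignored, so anything unrecognized falls back to HTML.
function negotiate(acceptHeader, fallback = "text/html") {
  const wanted = (acceptHeader || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase());
  const type = wanted.find((t) => representations.has(t)) || fallback;
  return { type, body: representations.get(type)(aliceSource) };
}
```

Because both JSON bodies are generated from `aliceSource`, their `id` values cannot drift apart, whichever media type the client asks for.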

@msporny
Contributor Author

msporny commented Dec 15, 2021

Returning the DID at the same URL based on a request header would be rather elegant, in my opinion.

Also remember that if we serve up HTML, you can embed a JSON-LD DID Document in HTML using this feature:

https://www.w3.org/TR/json-ld11/#embedding-json-ld-in-html-documents

... and that's a mechanism that search indexers (like Google and Bing) use to catalog data into their search indexes. If a site can produce that sort of thing at https://did.example/did.json or in the HTML document served from https://did.example/ ... it could then publish Verifiable Credentials in HTML using the same mechanism. Food for thought.
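A rough sketch of that embedding, assuming the DID Document is placed in a `<script type="application/ld+json">` element as the linked JSON-LD section describes. The function names are hypothetical, the `title` argument is assumed to be trusted text, and the regex-based extraction is for illustration only; a real consumer would use an HTML parser.

```javascript
// Sketch: embed a DID Document in an HTML page as a JSON-LD script element
// and read it back. Function names are hypothetical.
function embedInHtml(didDocument, title) {
  // Escape "<" so the JSON can never terminate the script element early.
  const json = JSON.stringify(didDocument).replace(/</g, "\\u003c");
  return [
    "<!DOCTYPE html>",
    `<html><head><title>${title}</title>`,
    `<script type="application/ld+json">${json}</script>`,
    "</head><body></body></html>",
  ].join("\n");
}

// Illustration only: pulls the first JSON-LD script element out of the page.
function extractJsonLd(html) {
  const match = html.match(
    /<script type="application\/ld\+json">([\s\S]*?)<\/script>/
  );
  return match ? JSON.parse(match[1]) : null;
}
```

The same pair of functions would work for any JSON-LD payload, which is what makes the "publish Verifiable Credentials in HTML" idea above plausible with the same mechanism.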

@gribneau
Contributor

Yes, I've been thinking that the human readable profile should embed the structured data of the ID and populate the view using that data.

+1

Whether that creates a compatibility issue remains open. It is possible that there would be some value in a specifically tailored exemplary method for this limited set of potential use cases.

@OR13
Collaborator

OR13 commented Dec 16, 2021

I think folks are expecting all interoperable did methods to support the same DID URL structures... here is an example:

did:example:123/path/foo?query=bar#fragment-baz

In did web today, this looks like:

did:web:did.actor:alice/path/foo?query=bar#fragment-baz

If we accept this proposal, this becomes:

did:web:did.example/some/arbitrary/path/foo.json/path/foo?query=bar#fragment-baz

^ This seems not great.
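To make the concern concrete, here is a sketch of a generic DID URL splitter based on the DID Core syntax, in which the DID ends at the first "/", "?", or "#". The function name `splitDidUrl` is hypothetical.

```javascript
// Sketch: split a DID URL into the DID and its path, query, and fragment.
// Per the DID Core syntax, the DID ends at the first "/", "?" or "#".
function splitDidUrl(didUrl) {
  const match = didUrl.match(/^([^/?#]+)(\/[^?#]*)?(?:\?([^#]*))?(?:#(.*))?$/);
  if (!match) throw new Error(`Cannot parse DID URL: ${didUrl}`);
  const [, did, path = "", query = "", fragment = ""] = match;
  return { did, path, query, fragment };
}

const today = splitDidUrl(
  "did:web:did.actor:alice/path/foo?query=bar#fragment-baz"
);
const proposed = splitDidUrl(
  "did:web:did.example/some/arbitrary/path/foo.json/path/foo?query=bar#fragment-baz"
);
```

With the current syntax, `today.did` is `did:web:did.actor:alice` and `today.path` is `/path/foo`. With the proposed syntax, `proposed.did` is only `did:web:did.example`, and a generic parser cannot tell where the document location ends and the DID URL path begins.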

@dmitrizagidulin
Collaborator

I'm not a huge fan of requiring absolute file paths. It takes us to the bad old days of specifying https://w3c.org/index.php instead of just https://w3c.org

@OR13
Collaborator

OR13 commented Dec 16, 2021

I'll also mention the elephant in the room... application/json is not a valid media type for a DID Document...

Very few developers are going to hardcode application/did+ld+json into their client code... but a lot will probably hard code application/json... yeah, we told them this would be a problem...

Google still returns JSON-LD as application/json and so will everyone else who cares about developers.

I will continue to object to any attempt to destroy application/json interoperability, either by considering accept headers, or playing games with file extensions... it is a huge security hole, for no real value... as is the DID Core decision to have 2 JSON representations that are essentially identical.

I think application/did+ld+json should be withdrawn, and so should application/did+json... there should be only 1 standard JSON representation... and luckily we can fix this in the next DID WG charter... the only common requirement that DID Core has for JSON, JSON-LD and any future representations is that id is required and must be a DID (not a DID URL)... that's not enough to justify a new content type... it's laughable... as is attempting to add header processing to support it.

{"id": "did:web:did.actor:alice"}

^ This is a valid JSON DID Document.

{"@context":  "https://www.w3.org/ns/did/v1", "id": "did:web:did.actor:alice"}

^ This is a valid JSON-LD DID Document.

@OR13
Collaborator

OR13 commented Dec 16, 2021

However, regardless of the underlying JSON document, I am in favor of simplifying resolution rules, so long as we don't lose interoperability with other did methods... We can forbid folks from using ":" in their did web paths...that would be smarter than encouraging it IMO, since directories that use reserved characters like that are not interoperable and in fact, allowing that kind of stuff can lead to security issues. (/../../pwd.json), etc...
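A sketch of the kind of path check this implies. The allowed character set and the function name `isSafeDidWebPath` are assumptions made for illustration; nothing here is taken from the did:web spec.

```javascript
// Sketch: reject did:web paths whose segments do not map cleanly onto
// directories. The allowed character set is an assumption.
function isSafeDidWebPath(path) {
  if (path === "") return true; // no path at all
  if (!path.startsWith("/")) return false;
  return path
    .slice(1)
    .split("/")
    .every(
      (segment) =>
        // Letters, digits, ".", "_" and "-" only: no ":" and no "%".
        /^[A-Za-z0-9._-]+$/.test(segment) &&
        // No traversal segments.
        segment !== "." &&
        segment !== ".."
    );
}
```

Under these assumptions `/../../pwd.json` and `/a:b/did.json` are rejected, as are percent-encoded variants and empty segments, while `/alice/did.json` is accepted.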

@gribneau
Contributor

I can understand the frustration around multiple representations that are all, at the end of the day, valid JSON.

I am more interested in future representations if and when they emerge (i.e. CBOR for tighter transmissions) and the reuse of URLs that make sense intuitively as human readable destinations.

I realize the complexity I've laid out above will be daunting and generally unnecessary for most use cases, and is therefore not really appropriate for a general standard.

@msporny
Contributor Author

msporny commented Dec 16, 2021

@OR13 wrote:

did:example:123/path/foo?query=bar#fragment-baz

What is the use case for this URL? Why would someone create something that looks like this?

@msporny
Contributor Author

msporny commented Dec 16, 2021

@dmitrizagidulin wrote:

I'm not a huge fan of requiring absolute file paths.

Do you think that we should disable content negotiation for did:web?

I'm trying to figure out what the matrix of requirements is...

@dmitrizagidulin
Collaborator

dmitrizagidulin commented Dec 16, 2021

@msporny

Do you think that we should disable content negotiation for did:web?

That question implies that conneg is somehow enabled, or in any way supported, currently. :)

I think several things:

  1. We want to enable did:web files to be hosted on dumb web servers.
  2. We absolutely must not allow multiple representations to be present as multiple hosted files.
  3. I think we should specify the did v1 @context as the default context. Other contexts possible, but not required.

@msporny
Contributor Author

msporny commented Dec 16, 2021

That question implies that conneg is somehow enabled, or in any way supported, currently. :)

Conneg is defined by RFC 7231, and the vast majority of HTTP clients support it. I disagree with any notion that conneg is not enabled by default. This is the Web, it's enabled by default.

Here's what's confusing: it seems like @OR13 and @dmitrizagidulin are saying that they want DID URIs like did:web:did.example:alice, which would expand to an HTTP GET on https://did.example/alice but NOT conneg, but still allow alice.json to be served from the "dumb web server". It's hard for me to reconcile the rules you guys are wanting w/ how HTTP servers are implemented. You might be saying "if .json doesn't exist on a did:web URL, add it and then dereference"? -- but I've re-read this thread multiple times and I don't see that being said.

We want to enable did:web files to be hosted on dumb web servers.

Agree. Dumb web servers support content negotiation. What should happen when an Accept: header is sent for "application/did+json" and nothing else? What should the server do? I'd prefer if you'd point to RFCs and justify the response based on default behavior of HTTP clients and servers.

We absolutely must not allow multiple representations to be present as multiple hosted files.

What specific normative statement would you write to do that?

I think we should specify the did v1 @context as the default context. Other contexts possible, but not required.

I'm sorta fine with that if most people do the right thing, but remember that all expressed key formats could be broken if we do that and implementers don't do the right thing (include the other contexts). That is, if we serve as application/json and the only thing in @context is the did v1 context, then all verification methods will be dropped if processed in JSON-LD mode. How is that helping interop?

@msporny
Contributor Author

msporny commented Dec 16, 2021

@OR13 wrote:

I will continue to object to any attempt to destroy application/json interoperability

Who in this thread is attempting to destroy application/json interoperability? The response comes across as hyperbolic, which makes it difficult to figure out what you're arguing for/against. What level of hyperbole is associated with your other statements? This is all written communication, so while it might be obvious to you, it's really hard to tell for other people that are reading (including me).

either by considering accept headers

Accept headers are processed by default by the vast majority of web servers today per RFC7231. Are you saying that you want to disable default behavior for all HTTP clients and servers? Or are you saying something else?

or playing games with file extensions...

We're playing games with file extensions because you and @mprorock are (it seems) objecting to content negotiation.

it is a huge security hole, for no real value...

Please detail the security hole -- what is the attack you are concerned about? Please be specific.

as is the DID Core decision to have 2 JSON representations that are essentially identical.

That decision was made by the DID WG and achieved consensus. Neither you nor I agree with it, but conflating that w/ content negotiation isn't helping the discussion. At this point, I'm having a hard time tracking your arguments.

Could you please list a set of rules that you'd like to see implemented that we can analyze?

@dlongley

dlongley commented Dec 16, 2021

I think we should specify the did v1 @context as the default context. Other contexts possible, but not required.

I'm sorta fine with that if most people do the right thing, but remember that all expressed key formats could be broken if we do that and implementers don't do the right thing (include the other contexts).

Yeah, some supported key formats (and contexts) should be included (by default) as well in some way. This should only be for those people that are looking to do the simplest thing anyway -- so they shouldn't care if their choices are restricted to recommended crypto suites. There should be some consideration for an upgrade path as well; perhaps reserving something in the DID string could signal a new resolver is needed. It might be a good idea to do this anyway. This would put a shelf-life on simplistic DIDs but they'd have an upgrade path to use new crypto suites without changing their DID (they'd just have to list those new contexts).

@dmitrizagidulin
Collaborator

@msporny

Here's what's confusing: it seems like @OR13 and @dmitrizagidulin are saying that they want DID URIs like did:web:did.example:alice, which would expand to an HTTP GET on https://did.example/alice but NOT conneg, but still allow alice.json to be served from the "dumb web server". It's hard for me to reconcile the rules you guys are wanting w/ how HTTP servers are implemented. You might be saying "if .json doesn't exist on a did:web URL, add it and then dereference"? -- but I've re-read this thread multiple times and I don't see that being said.

Ah, right, ok, so, the main key to this is that the mapping is /not/ done on the server. The mapping is always done client side, meaning the resolver client library sees did:web:did.example:alice and would always translate it to https://did.example/alice/did.json, and request that from the web server directly.

@OR13
Collaborator

OR13 commented Dec 16, 2021

Here is a complete did:web resolver:

// specification
const resolve = (did) => {
  // Map a did:web identifier to the HTTPS URL of its DID Document.
  const didToUrl = (did) => {
    const regex = new RegExp(
      `did:web:(?<origin>[a-zA-Z0-9/.\\-_]+)(?<path>[a-zA-Z0-9/.:\\-_]*)`
    );
    const match = did.match(regex);
    if (!match) {
      throw new Error(`Not a did:web DID: ${did}`);
    }
    if (match.groups.path) {
      // Colon-delimited path segments become URL path segments.
      return `https://${match.groups.origin}${match.groups.path.replace(
        /:/g,
        "/"
      )}/did.json`;
    }
    // Bare domain: use the well-known location.
    return `https://${match.groups.origin}/.well-known/did.json`;
  };

  return fetch(didToUrl(did)).then((response) => response.json());
};

// invocation
await resolve('did:web:did.actor:alice')

Notice that no http header was specified.

https://www.w3.org/TR/did-core/#did-resolution

I think did web does not need to support representations other than JSON... doing so increases attack surface, decreases interoperability.

There is nothing in DID Core that says a method must implement resolution for all possible did document representations.

It's a bad idea to implement resolution for representations other than JSON, and it's a bad idea to make JSON and JSON-LD different...

https://did.actor/alice/did.json -> `application/json`
https://did.actor/alice/did/index.json -> `application/json`
https://did.actor/alice/did/ -> `application/json`
https://did.actor/alice/index.json -> `application/json`
https://did.actor/alice/ -> `application/json`

Any of these would be fine IMO... clients know how to handle json.

Should did web allow for a client to request another representation other than JSON? no.

Today did web only supports JSON, and tomorrow will be the same unless we change the resolution rules... we should not change the rules, except to make them simpler.

In order to make the rules simpler, we should remove well-known...

Adding support for application/did+json or application/did+ld+json makes things worse (more complicated)... if folks really want to do that, let them build a proxy...

  // ADM is a placeholder for an abstract data model helper, not a real library.
  fastify.get('/resolveRepresentation/:did', async (req, reply) => {
     const representation = req.headers.accept;
     const didDocument = await resolve(req.params.did);
     const abstractDataModel = ADM.consume(didDocument, 'application/did+json');
     const didDocumentRepresentation = abstractDataModel.produce(representation);
     // possible error thrown due to missing `@context`...
     // but only if 'application/did+ld+json' is in the accept header.
     // bad guys might inject `@context` here... because they don't understand JSON-LD....
     // ... or because they do...
     reply.type(representation); // hope this is understood by the client!
     return reply.send({ didDocument: didDocumentRepresentation });
  })

Cool now we have a did resolve representation implementation that implements did core correctly.... that 99% of clients won't understand the responses from, since nobody knows what 'application/did+json' or 'application/did+ld+json' is... and we didn't need to change the did web resolution rules... to do it... so we shouldn't change them...

let folks who want to implement resolver middleware like this do it outside the did method... we should not be doing this inside the did method, it increases the attack surface, content negotiation is not needed if the only representation is JSON...

put a proxy up if you want to translate did:web to yaml or XML... don't force did web implementers to support representations other than JSON... they will shoot their eyes out.


@jceb

jceb commented Dec 17, 2021

I like @OR13 's proposal as it simplifies the resolution and makes the implementation easy.

I'm for keeping .well-known/did.json as a default a) because this ensures compatibility with existing implementations, b) the added complexity isn't high and c) the root namespace of the server isn't cluttered with a did.json file.

@sbutterfield
Member

sbutterfield commented Apr 15, 2022

FWIW, here's some feedback from a recent early platformitization of did:web over here:

  1. application/ld+json, application/did+ld+json, and application/did+json conneg isn't quite usable yet with our layered servlets approach. Changes are required and we're working on them, but application/json is "for free" and works with customers' existing extensions.

  2. Supporting tenant sub-domain specific /.well-known/ in a multi-tenant architecture is a PITA and basically requires entirely new secure domain routing infra. We're fighting with complex edge networking, constrained internal DNS proxies for customers to CNAME to our zones then host content with Akamai blah blah blah... it's too much.

Looking at did:web as a pure interpretation of browser/client/server app URL resource resolution is how we went about it (for now). That means:
did:web:{customer-sub-domain-CNAME}/{customer-tenant-identifier}?v={p1}&d={p2}#{multibase-multihash-kid} --> application/json publicKeyJwk.kid
gets us to a place where we are able to put together client application frameworks that are cross-platform/browser/library compatible without worrying about documenting custom translations or rules when crazy customer developers want to do predictably crazy things to extend those frameworks.

One way or another, I'm invested in did:web for the duration. So, I'm on the side of simplicity - whatever that looks like.

@gribneau
Contributor

Thanks @sbutterfield , I'm glad did:web is flexible enough to fit in.

One way or another, I'm invested in did:web for the duration. So, I'm on the side of simplicity - whatever that looks like.

@msporny
Contributor Author

msporny commented Apr 16, 2022

@sbutterfield wrote:

Supporting tenant sub-domain specific /.well-known/ in a multi-tenant architecture is a PITA

Yes, exactly, which is why Digital Bazaar has been arguing against the /.well-known/ design pattern for a while now. Magic URL transforms are problematic for URLs -- you expect to copy/paste them, especially when they're Web-based URLs -- and have your Web client give you the resource... not have your web client rewrite the URL using some non-standard URL transformation rule (e.g., .well-known or colon-slash replacement).

gets us to a place where we are able to put together client application frameworks that are cross-platform/browser/library compatible without worrying about documenting custom translations or rules

Yes, and this was the driving force behind this issue: The did:web path-colon syntax was an early attempt to shove URL paths into the method-specific-id... and the assertion is that doing so was a design mistake that makes did:web an unnecessarily awkward mapping to HTTPS URLs. What we should have done instead (and can still do) is make the mapping a very clean search/replace of https:// with did:web: -- that's an easy one-liner for developers.

d={p2}#{multibase-multihash-kid}

You might be interested in using a parameterized Hashlink parameter there.

I'm on the side of simplicity - whatever that looks like.

The straight translation of an HTTPS URL into a did:web URL is probably the simplest thing.

The only issue is the DID Subject identifier in the DID Document. For did:web, it really needs to be the canonical URL -- so, ideally something like did:web:subdomain.example/jane -- the one remaining issue, of course, is that the DID Core spec says that id must be a DID (not a DID URL). The latter was what we should've done, and can still do in a future version of the spec. At present, the id field is restricted to the [did](https://www.w3.org/TR/did-core/#did-syntax) ABNF; what we should do in the next iteration of the spec is expand it to allow for the [did-url](https://www.w3.org/TR/did-core/#did-url-syntax) -- we can't easily return parameterized DID Documents (which are important) if we don't support did-url as the subject identifier in a DID Document.
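The restriction can be illustrated with a small check that follows the `did` ABNF linked above. This is a sketch: the function name `isBareDid` is hypothetical, and the pattern is a hand transcription of the ABNF rather than an official one.

```javascript
// Sketch: distinguish a bare DID from a DID URL, following the DID Core `did`
// ABNF. Under DID Core v1 only a bare DID is allowed as a document's `id`.
// idchar = ALPHA / DIGIT / "." / "-" / "_" / pct-encoded
const IDCHAR = "(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})";
// did = "did:" method-name ":" method-specific-id
// method-specific-id = *( *idchar ":" ) 1*idchar
const DID_PATTERN = new RegExp(
  `^did:[a-z0-9]+:(?:${IDCHAR}*:)*${IDCHAR}+$`
);

function isBareDid(value) {
  return DID_PATTERN.test(value);
}
```

With this check, `did:web:subdomain.example:jane` passes, while `did:web:subdomain.example/jane` fails because "/" is not an `idchar`. That is why the straight HTTPS translation cannot currently appear as the `id` of a DID Document.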

I'll also note that we can probably do all of the above AND keep supporting the legacy colon-to-path translation.

At present, I believe the following approach might be able to achieve consensus:

  • Allow returning JSON-LD documents as application/json
  • Include @context in the DID Document for all JSON-based formats
  • Do straight translation from HTTPS URLs to did:web URLs (replace "https://" with "did:web:" and you're done)
  • Allow fallback to "legacy resolution rules" -- colons-as-slashes syntax, .well-known, etc.
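The first two points of that list could be checked on the client roughly as follows. This is a sketch under assumptions: the function name `checkDidWebResponse` and the exact set of accepted media types are illustrative, not proposed normative text.

```javascript
// Sketch: accept JSON-LD served under any of the JSON media types named in
// this thread, and require @context. The function name is hypothetical.
const ACCEPTED_TYPES = new Set([
  "application/json",
  "application/did+json",
  "application/did+ld+json",
]);

function checkDidWebResponse(contentType, bodyText) {
  // Ignore parameters such as "; charset=utf-8".
  const mediaType = (contentType || "").split(";")[0].trim().toLowerCase();
  if (!ACCEPTED_TYPES.has(mediaType)) {
    return { ok: false, reason: `unexpected media type: ${mediaType}` };
  }
  let doc;
  try {
    doc = JSON.parse(bodyText);
  } catch {
    return { ok: false, reason: "body is not JSON" };
  }
  if (doc === null || typeof doc !== "object" || !("@context" in doc)) {
    return { ok: false, reason: "missing @context" };
  }
  return { ok: true, document: doc };
}
```

A document served as plain application/json passes as long as it carries `@context`, so a server needs no special media type configuration.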

I have a clear enough idea of the normative statements to create a PR at this point. I'll raise a PR when I get to it in my work queue.

@sbutterfield
Member

@msporny wrote:

You might be interested in using a parameterized Hashlink parameter there.

We're planning to make use of hl= param elsewhere. Intention here was to support direct key addressing when a did doc has multiple kids ... Required by some of our FIPS customers.

Include @context in the DID Document for all JSON-based formats

Just curious - I didn't think @context was required in the DID Doc?

the DID Core spec says that id must be a DID (not a DID URL)

Could did:web spec extend did-core and allow for the id property to be the fully qualified did:web URL so long as the DID is a valid did:web identifier? Easier than changing did-core?

The DID for one of our customers at their issuing agent address would be: did:web:{customer-sub-domain-CNAME}
The extended DID for one of their subsidiary units or special subject approver identifiers would be: did:web:{customer-sub-domain-CNAME}/{customer-context-specific-identifier}<? optional rest of path for target resources>

@OR13
Collaborator

OR13 commented Apr 17, 2022

Here is an example of using did web with GitHub https://github.com/OR13/signor

@sbutterfield
Member

@OR13 for the win

@gribneau
Contributor

gribneau commented Apr 20, 2022

make the mapping a very clean search/replace of https:// with did:web: -- that's an easy one-liner for developers.

Strong +1
My initial proposal to extend did:web to support multiple DIDs looked quite like this. The history is visible in the PR.

the one remaining issue, of course, is that the DID Core spec says that id must be a DID (not a DID URL). The latter was what we should've done, and can still do in a future version of the spec. At present, the id field is restricted to the [did](https://www.w3.org/TR/did-core/#did-syntax) ABNF; what we should do in the next iteration of the spec is expand it to allow for the [did-url](https://www.w3.org/TR/did-core/#did-url-syntax) -- we can't easily return parameterized DID Documents (which are important) if we don't support did-url as the subject identifier in a DID Document.

+1
The core needs this improvement. At present, it is inconsistent with the underlying IETF work.

I'll also note that we can probably do all of the above AND keep supporting the legacy colon-to-path translation.

I would support the revision even without backward compatibility.

At present, I believe the following approach might be able to achieve consensus:

  • Allow returning JSON-LD documents as application/json
  • Include @context in the DID Document for all JSON-based formats
  • Do straight translation from HTTPS URLs to did:web URLs (replace "https://" with "did:web:" and you're done)
  • Allow fallback to "legacy resolution rules" -- colons-as-slashes syntax, .well-known, etc.

Happy to assist if needed.

@gribneau
Contributor

@sbutterfield wrote:

the DID Core spec says that id must be a DID (not a DID URL)

Could did:web spec extend did-core and allow for the id property to be the fully qualified did:web URL so long as the DID is a valid did:web identifier? Easier than changing did-core?

There is a bit of a work-around in that the method handles resolution and can simply handle that resolution inconsistently with the core specification, but that leaves the method in violation of the core specification.

I did this with did:psqr to move development of that project forward expeditiously, but @msporny has identified the proper path above.
