Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add discussion on unknown and reserved properties #394

Closed
wants to merge 1 commit into from

Conversation

jricher
Copy link
Contributor

@jricher jricher commented Sep 15, 2020

@jricher
Copy link
Contributor Author

jricher commented Sep 15, 2020

Addresses #205, alternative to text in #388

<p>
When a DID Document is serialized by a producer into a compliant
representation, any reserved properties not for the current
representation MUST NOT be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
representation MUST NOT be used.
representation MUST NOT be present in the serialized representation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this means anywhere... in other words... no JSON-LD can be present ANYWHERE in a JSON DID representation... in other words... no @context nested inside a verificationMethod or service endpoint as well.... or does this only apply to the JSON path $["@context"] ?

Copy link
Member

@msporny msporny Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is highly problematic language... it means that you can't envelope any JSON-LD/JSON Schema/etc... so any sub-dialect of JSON. Right off of the bat you can't envelope JWTs that carry VCs (as a simple example to demonstrate why we didn't have this language in the spec before). Seemingly simple restrictions like this end up becoming highly problematic as an ecosystem develops.

The complexity of generalizing the rule to filter out @context is going to result in unnecessary complexity in the specification that is going to have very negative knock-on effects to any sub-dialect of JSON, like JSON Schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OR13 No, it means as a property, which is a top-level element in the document. Properties can have as many of whichever sub-properties they want.

I disagree with the doom and gloom from @msporny -- now that we have a way to define a reserved property, this language actually limits the fall-on effects of keeping things out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its just a top level property... doesn't that mean that the JSON document MUST allow nested JSON-LD?...
Probably some additional language stating this would be necessary, as its not obvious today.... and then we're one logical step away from realizing that JSON-LD is JSON.... in the same way way that a tiger is a cat... we don't define all cats as not tigers... we should not be defining JSON as NOT JSON-LD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you'd be a bit surprised if the pet store gave you a tiger when you asked for a cat, right? Your metaphor is accurate but not relevant here. I understand that JSON-LD is syntactically JSON, but what you're missing is that by dying on that hill you're declaring, effectively, that all JSON processors now need to deal with JSON-LD stuff and expect it to show up. We decided strongly against this at the Amsterdam F2F, and the representation text which this PR strengthens came from that discussion.

The JSON document doesn't need to allow anything that's not defined on a property. If a property is defined in terms of JSON-LD then it's up to the property to define how to process its value. This is very different from managing the document itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you be surprised to learn that Google and Microsoft return JSON-LD with content type application/json?

The JSON document doesn't need to allow anything that's not defined on a property.

I agree.... JSON Documents do not need to allow anything.... but they do allow anything, for extensibility... thats why JSON is valuable.

<h2>Unknown Properties</h2>
<p>
When a DID Document is created by consuming a compliant representation, any properties
in the representation that are not known to the consumer MUST be added to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "not known to the consumer"... is this essentially "any properties"... I assume the consumer adds known properties as well.... can we define this more concretely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Shouldn't this just be "preserve all properties"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"unknown" means "not a property that the consumer has been configured to understand (ie, core data model, registry, local extensions) and not a property that's known to be reserved (ie, registry, local extensions)". It'll have a data type in the source format that needs to map to a default data type in the abstract data model.

I think you're both conflating the JSON document with the abstract data model. They are not the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a "local extension" thats not defined anywhere.... in JSON, we just add properties and they are preserved... if you are suggesting that somehow did documents deviate from what JSON developers will expect... i will continue to object.

@@ -2438,12 +2473,12 @@ <h3>
</p>

<p>
The value of the <code>@context</code> object member MUST be ignored as this is
reserved for JSON-LD consumers.
If a <code>@context</code> object member is found, the consumer MUST raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If a <code>@context</code> object member is found, the consumer MUST raise

redundant normative statement.

Copy link
Contributor Author

@jricher jricher Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's redundant but also true, and it's something that JSON production and consumption will specifically need to look out for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in a world where DID Core redefines what JSON means.

The value of the <code>@context</code> object member MUST be ignored as this is
reserved for JSON-LD consumers.
If a <code>@context</code> object member is found, the consumer MUST raise
an error, as this field is reserved for JSON-LD processing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
an error, as this field is reserved for JSON-LD processing.

When a DID Document is created by consuming a compliant representation, if any
properties in the representation are reserved by another representation, as
indicated in the registry entry for representations, then the consumer MUST
raise an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it this statement means that 2 representations cannot both contain the same reserved property.... so if JSON-LD reserved $schema it could not be present in JSON Documents or vice versa?

I suppose this will cut down on representations and reservations :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the intent, as a representation could declare that it uses another representation's reserved property in a way that's compatible with that. The important thing about a reserved property is that it doesn't end up in the abstract data model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, deny-list and allow-list inheritance... the road to a super confusing data model...

But I do agree that there will be representations that want to specifically support things allowed and prohibited by other representations... the first one would be application/did+vanilla+json... which allowed directives like @context and $schema.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important thing about a reserved property is that it doesn't end up in the abstract data model.

What concrete attack or issue are we attempting to prevent here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No inheritance. Concrete declaration of using the same property to mean the same thing. So JSON-LD and CBOR-LD could both claim "@context" as reserved, which signals everyone else that if they see that field and don't know what to do with it (ie, it's not reserved for them also) then it's an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON does not throw errors when random properties are added.... what you are describing is not JSON.

Copy link
Contributor

@OR13 OR13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of all these changes, but not for the content type application/did+json....

I would accept these changes if they came with a profile which described the prohibition of all processing directives... for example

application/did+json; profile="https://www.hyperledger.org/ns/hyperledger-json"

As is, this PR does not address unknown properties, it simply describes a class of "known reserved" property and then prevents interoperability and extensibility that relies on the presence of those properties.

When a DID Document is serialized by a producer into a compliant
representation, any unknown properties MUST be serialized into the
representation. When the data type for the property is unknown by the producer,
a string serialization of the property SHOULD be used.
Copy link
Contributor

@OR13 OR13 Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is it possible that the abstract data model would have properties of unknown types?... seems like a security issue to serialize an unknown type to a string.... Buffer.from("40636f6e74657874", "hex").toString()...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question -- if an unknown property value can't be converted into the abstract data model such that it can be re-exported to the same representation losslessly, there is a fundamental problem. You can either take the input representation and losslessly place it into the abstract data model (so you can at least pump it back out to the original incoming representation) or you can't. If you can't, you should probably throw an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstract data model is going to have a type for every property, because it's in the abstract data model.

This is why I have argued for the properties within the data model to have abstract types defined, and all representations have clear and deterministic mappings for all known types into and out of the serializations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstract data model is going to have a type for every property, because it's in the abstract data model.

This PR states:

When a DID Document is created by consuming a compliant representation, any properties in the representation that are not known to the consumer MUST be added to the data model as properties without any further processing on the property value.

It won't have a type for unknown properties... right? What am I missing?

Copy link
Contributor

@OR13 OR13 Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have loved for formal types for the abstract data model... I even tried to help define them using JSON Schema... nobody contributed to it... I removed it.

If the abstract data model cannot represent a type, it cannot support representations that rely on that type... which means right now, that we have an abstract data model with no type system, and no way to actually answer this question....

If we say the abstract data model is modeling JSON, we know it supports JSON types... if we say its INFRA, we know it supports infra types...

It's not possible for it to have a property with an unknown type.... unless the abstract data model itself does not live in a type system....

What type system does the abstract data model live in? Haskell? INFRA? JSON? CBOR?... if we don't know, we can't answer the question about which representations are possible with the abstract data model.

Haskell would work for everything (supports CBOR / JSON)...

JSON would not work for CBOR.

CBOR would work for JSON.

INFRA.... TBD... not sure about .... numbers... lol.

Imo this is solved in 1 of 2 ways...

either we define the abstract data model as anything that can be represented in JSON (for now, and later we fix that to allow CBOR / ProtoBuff)....

or we forbid "unknown" properties from being round tripped from it... because serializing an unknown type to a string is dangerous.... what happens to protobuff / cbor ?... a bunch of nonsense binary encoded as a utf-8 string?...doesn't that mean we can't round trip a binary format?

I would actually be in favor of saying that JSON is the abtract data model (for now)... then this problem goes away for JSON and JSON-LD.

In the future, we can define a new abstract data model that is a super set of JSON that can be mapped to it... and have backwards compatibility :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstract data model currently uses Infra types for all declared properties, as well as metadata, so I don't see why we wouldn't do that here. We also have additional formats like URIs and Dates that are easily encoded as formatted strings.

When you create the abstract data model, your source for that (JSON, CBOR, Whatever) will have a data type associated with it. The consumption rules MUST declare a default abstract data type to map that concrete data type to. Same idea with the production direction: all representations declare rules for mapping all allowed abstract data types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I assume infra is capable of preserving strings, arrays and maps.... so let it preserve them, and let did method authors decide if they want to include processing directives or not in THEIR json :) its not for this WG to redefine what JSON is.

@OR13
Copy link
Contributor

OR13 commented Sep 16, 2020

Even without considering the error cases, this PR and the current state of JSON representation in did core remain problematic... here are some simple tests which demonstrate what I mean:

const crypto = require("crypto");

const digest = (representation) => {
  return crypto.createHash("sha256").update(representation).digest("hex");
};

const JSON_WITHOUT_DIRECTIVES_PRODUCER = (abstract) => {
  delete abstract[Buffer.from("@context")];
  delete abstract[Buffer.from("$schema")];
  if (!abstract[Buffer.from("id")]) {
    throw new Error("DID Document requires id");
  }
  return JSON.stringify(abstract, null, 2);
};

const JSON_WITH_DIRECTIVES_PRODUCER = (abstract) => {
  if (!abstract[Buffer.from("id")]) {
    throw new Error("DID Document requires id");
  }
  return JSON.stringify(abstract, null, 2);
};

const JSON_WITH_DIRECTIVES_CONSUMER = (representation) => {
  const parsed = JSON.parse(representation);
  if (!parsed.id) {
    throw new Error("DID Document requires id");
  }
  return parsed;
};

const JSON_WITHOUT_DIRECTIVES_CONSUMER = (representation) => {
  const parsed = JSON.parse(representation);
  if (parsed["@context"]) {
    throw new Error("DID Document is not valid");
  }
  if (!parsed.id) {
    throw new Error("DID Document requires id");
  }
  return parsed;
};

it("json with directives", () => {
  // let the abstract data model be an infra object.
  const abstract = {
    [Buffer.from("@context")]: Buffer.from(
      "https://example.com/context/v1"
    ).toString(),
    [Buffer.from("$schema")]: Buffer.from(
      "https://example.com/schema/v1"
    ).toString(),
    [Buffer.from("id")]: Buffer.from("did:example:123").toString(),
  };
  const digested_abstract = digest(JSON.stringify(abstract, null, 2));
  const produced_representation = JSON_WITH_DIRECTIVES_PRODUCER(abstract);
  const consumed_representation = JSON_WITH_DIRECTIVES_CONSUMER(
    produced_representation
  );
  const digested_consumption = digest(
    JSON.stringify(consumed_representation, null, 2)
  );
  expect(digested_abstract).toEqual(digested_consumption);
});

it("json without directives", () => {
  // let the abstract data model be an infra object.
  const abstract = {
    [Buffer.from("@context")]: Buffer.from(
      "https://example.com/context/v1"
    ).toString(),
    [Buffer.from("$schema")]: Buffer.from(
      "https://example.com/schema/v1"
    ).toString(),
    [Buffer.from("id")]: Buffer.from("did:example:123").toString(),
  };
  const digested_abstract = digest(JSON.stringify(abstract, null, 2));
  const produced_representation = JSON_WITHOUT_DIRECTIVES_PRODUCER(abstract);
  const consumed_representation = JSON_WITHOUT_DIRECTIVES_CONSUMER(
    produced_representation
  );
  const digested_consumed_production = digest(
    JSON.stringify(consumed_representation, null, 2)
  );
  expect(digested_abstract).not.toEqual(digested_consumed_production);
});

Essentially, we have broken JSON by trying to avoid requiring JSON-LD.

JSON with directives is what JSON developers will expect... JSON without directives is what DID Core has currently, and what this PR attempts to pour additional concrete around.

@OR13
Copy link
Contributor

OR13 commented Sep 16, 2020

Also, google knowledge graph https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html

(which returns JSON-LD ONLY):

GET https://kgsearch.googleapis.com/v1/entities:search?query=taylor+swift&key=API_KEY&limit=1&indent=True

returns

content-type: application/json; charset=UTF-8

... in other words, they do exactly what I and everyone else will do... return JSON when asked for JSON.

If the DID Core Spec remains the way it is / this PR were merged, JSON representations of did documents will never be capable of being represented in the knowledge graph.

JSON should have not restrictions beyond the presence of an id property....

If we want to define a profile of JSON which excludes processing directives, I am all for it.

</h3>

<p>
This representation format reserves the <code>@context</code> property.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to add this section, which is a terrible idea, here's the full list of reserved properties:

https://www.w3.org/TR/json-ld/#syntax-tokens-and-keywords

So that means:

@base 
@container 
@context 
@direction 
@graph 
@id 
@import 
@included 
@index 
@json 
@language 
@list 
@nest 
@none 
@prefix 
@propagate 
@protected 
@reverse 
@set 
@type 
@value 
@version 
@vocab 

This is going to complicate JSON-only processors to the point where I would question whether implementing a JSON-only processor would be worth it.

The simpler approach is just to preserve everything when going from one format to the abstract data model, and if you can't, throw an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should also point out that this PR will require us to have prior knowledge of other representation formats... for example, if someone were to come up with a JSON Schema representation format (in order to include a signed validation mechanism with their DID Documents), then we'd also have to make sure that the JSON-only implementation would need to be aware of the following:

string
number
integer
boolean
object
array
null
id
type
description
default
patternProperties
dependencies
extends
$ref
$schema
allOf, anyOf, oneOf
enum
format
minLength, maxLength
pattern
minimum, maximum
exclusiveMinimum, exclusiveMaximum
multipleOf
array
items
minItems, maxItems
uniqueItems
additionalItems
object
properties
minProperties, maxProperties
patternProperties
additionalProperties
required

... we're clearly not going to do that as the implementation burden is very high and it conflicts directly with some of the property names we've already picked for the DID Core abstract data model (such as id, type, etc.). The reason that it would be fine to preserve these things is because nesting matters. JSON, JSON-LD, and other JSON formats would be just fine w/ embedded JSON Schema (and embedded JSON-LD) because those formats were designed around two primary features of JSON -- not dropping anything and nested objects providing context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't buy the slippery slope argument here. It's not true that if we do one thing we must do everything possible forever in all time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well sure, we can have a deny-list that is short, or a long one... both are still worse than having an allow-list... IMO, probably best to leave the security engineering to the did method authors... I don't have confidence this WG has the expertise necessary to design a safe deny-list.

@OR13
Copy link
Contributor

OR13 commented Sep 16, 2020

https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html#whitelisting-vs-blacklisting

It is a common mistake to use deny list validation in order to try to detect possibly dangerous characters and patterns like the apostrophe ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach as it is trivial for an attacker to bypass such filters.

Allow list validation is appropriate for all input fields provided by the user. Allow list validation involves defining exactly what IS authorized, and by definition, everything else is not authorized.

If the JSON representation is going to rely on a deny-list and allow unknown properties as its default... it is specifically going against security engineering best practices.

@OR13
Copy link
Contributor

OR13 commented Sep 17, 2020

This PR does not address the security concerns raised in #398

@msporny
Copy link
Member

msporny commented Oct 1, 2020

This PR is still under debate... we had a special topic call to try to resolve some things:

https://www.w3.org/2020/09/29-did-topic-minutes.html

@msporny msporny added the do not merge Do not merge - waiting on resolution to issue label Oct 12, 2020
@msporny msporny added the needs-special-call Needs a special topic call to make progress label Oct 26, 2020
@rhiaro
Copy link
Member

rhiaro commented Nov 16, 2020

I believe this is now superceded by #454

@rhiaro rhiaro added the pending close Issue will be closed shortly if no objections label Nov 16, 2020
@msporny
Copy link
Member

msporny commented Dec 1, 2020

This PR will be closed once #454, which defines what must be done with "unknown/reserved/ignored" properties, is merged.

@msporny
Copy link
Member

msporny commented Dec 1, 2020

#454 has been merged, which achieves the intent of this PR. Closing.

@msporny msporny closed this Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not merge Do not merge - waiting on resolution to issue needs-special-call Needs a special topic call to make progress pending close Issue will be closed shortly if no objections
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants