add discussion on unknown and reserved properties #394

jricher · 2020-09-15T23:17:43Z

jricher · 2020-09-15T23:18:16Z

Addresses #205, alternative to text in #388

OR13 · 2020-09-15T23:25:23Z

index.html

+          <p>
+    When a DID Document is serialized by a producer into a compliant
+    representation, any reserved properties not for the current
+    representation MUST NOT be used.


Suggested change

representation MUST NOT be used.

representation MUST NOT be present in the serialized representation.

I assume this means anywhere... in other words... no JSON-LD can be present ANYWHERE in a JSON DID representation... in other words... no @context nested inside a verificationMethod or service endpoint as well.... or does this only apply to the JSON path $["@context"] ?

Yeah, this is highly problematic language... it means that you can't envelope any JSON-LD/JSON Schema/etc... so any sub-dialect of JSON. Right off of the bat you can't envelope JWTs that carry VCs (as a simple example to demonstrate why we didn't have this language in the spec before). Seemingly simple restrictions like this end up becoming highly problematic as an ecosystem develops.

The complexity of generalizing the rule to filter out @context is going to result in unnecessary complexity in the specification that is going to have very negative knock-on effects to any sub-dialect of JSON, like JSON Schema.

@OR13 No, it means as a property, which is a top-level element in the document. Properties can have as many of whichever sub-properties they want.

I disagree with the doom and gloom from @msporny -- now that we have a way to define a reserved property, this language actually limits the fall-on effects of keeping things out.

if its just a top level property... doesn't that mean that the JSON document MUST allow nested JSON-LD?...
Probably some additional language stating this would be necessary, as its not obvious today.... and then we're one logical step away from realizing that JSON-LD is JSON.... in the same way way that a tiger is a cat... we don't define all cats as not tigers... we should not be defining JSON as NOT JSON-LD.

But you'd be a bit surprised if the pet store gave you a tiger when you asked for a cat, right? Your metaphor is accurate but not relevant here. I understand that JSON-LD is syntactically JSON, but what you're missing is that by dying on that hill you're declaring, effectively, that all JSON processors now need to deal with JSON-LD stuff and expect it to show up. We decided strongly against this at the Amsterdam F2F, and the representation text which this PR strengthens came from that discussion.

The JSON document doesn't need to allow anything that's not defined on a property. If a property is defined in terms of JSON-LD then it's up to the property to define how to process its value. This is very different from managing the document itself.

would you be surprised to learn that Google and Microsoft return JSON-LD with content type application/json?

The JSON document doesn't need to allow anything that's not defined on a property.

I agree.... JSON Documents do not need to allow anything.... but they do allow anything, for extensibility... thats why JSON is valuable.

OR13 · 2020-09-15T23:30:57Z

index.html

+          <h2>Unknown Properties</h2>
+          <p>
+    When a DID Document is created by consuming a compliant representation, any properties
+    in the representation that are not known to the consumer MUST be added to the


what is "not known to the consumer"... is this essentially "any properties"... I assume the consumer adds known properties as well.... can we define this more concretely?

Agreed. Shouldn't this just be "preserve all properties"?

"unknown" means "not a property that the consumer has been configured to understand (ie, core data model, registry, local extensions) and not a property that's known to be reserved (ie, registry, local extensions)". It'll have a data type in the source format that needs to map to a default data type in the abstract data model.

I think you're both conflating the JSON document with the abstract data model. They are not the same.

what is a "local extension" thats not defined anywhere.... in JSON, we just add properties and they are preserved... if you are suggesting that somehow did documents deviate from what JSON developers will expect... i will continue to object.

OR13 · 2020-09-15T23:58:11Z

index.html

@@ -2438,12 +2473,12 @@ <h3>
        </p>

        <p>
-The value of the <code>@context</code> object member MUST be ignored as this is
-reserved for JSON-LD consumers.
+If a <code>@context</code> object member is found, the consumer MUST raise


Suggested change

If a <code>@context</code> object member is found, the consumer MUST raise

redundant normative statement.

Yes it's redundant but also true, and it's something that JSON production and consumption will specifically need to look out for.

Only in a world where DID Core redefines what JSON means.

OR13 · 2020-09-15T23:58:26Z

index.html

-The value of the <code>@context</code> object member MUST be ignored as this is
-reserved for JSON-LD consumers.
+If a <code>@context</code> object member is found, the consumer MUST raise
+an error, as this field is reserved for JSON-LD processing.


Suggested change

an error, as this field is reserved for JSON-LD processing.

OR13 · 2020-09-16T00:07:53Z

index.html

+    When a DID Document is created by consuming a compliant representation, if any 
+    properties in the representation are reserved by another representation, as 
+    indicated in the registry entry for representations, then the consumer MUST
+    raise an error. 


I take it this statement means that 2 representations cannot both contain the same reserved property.... so if JSON-LD reserved $schema it could not be present in JSON Documents or vice versa?

I suppose this will cut down on representations and reservations :)

That's not the intent, as a representation could declare that it uses another representation's reserved property in a way that's compatible with that. The important thing about a reserved property is that it doesn't end up in the abstract data model.

Ahh, deny-list and allow-list inheritance... the road to a super confusing data model...

But I do agree that there will be representations that want to specifically support things allowed and prohibited by other representations... the first one would be application/did+vanilla+json... which allowed directives like @context and $schema.

The important thing about a reserved property is that it doesn't end up in the abstract data model.

What concrete attack or issue are we attempting to prevent here?

No inheritance. Concrete declaration of using the same property to mean the same thing. So JSON-LD and CBOR-LD could both claim "@context" as reserved, which signals everyone else that if they see that field and don't know what to do with it (ie, it's not reserved for them also) then it's an error.

JSON does not throw errors when random properties are added.... what you are describing is not JSON.

OR13

I'm in favor of all these changes, but not for the content type application/did+json....

I would accept these changes if they came with a profile which described the prohibition of all processing directives... for example

application/did+json; profile="https://www.hyperledger.org/ns/hyperledger-json"

As is, this PR does not address unknown properties, it simply describes a class of "known reserved" property and then prevents interoperability and extensibility that relies on the presence of those properties.

OR13 · 2020-09-16T01:36:44Z

index.html

+    When a DID Document is serialized by a producer into a compliant
+    representation, any unknown properties MUST be serialized into the
+    representation. When the data type for the property is unknown by the producer,
+    a string serialization of the property SHOULD be used.


how is it possible that the abstract data model would have properties of unknown types?... seems like a security issue to serialize an unknown type to a string.... Buffer.from("40636f6e74657874", "hex").toString()...

Same question -- if an unknown property value can't be converted into the abstract data model such that it can be re-exported to the same representation losslessly, there is a fundamental problem. You can either take the input representation and losslessly place it into the abstract data model (so you can at least pump it back out to the original incoming representation) or you can't. If you can't, you should probably throw an error.

The abstract data model is going to have a type for every property, because it's in the abstract data model.

This is why I have argued for the properties within the data model to have abstract types defined, and all representations have clear and deterministic mappings for all known types into and out of the serializations.

The abstract data model is going to have a type for every property, because it's in the abstract data model.

This PR states:

When a DID Document is created by consuming a compliant representation, any properties in the representation that are not known to the consumer MUST be added to the data model as properties without any further processing on the property value.

It won't have a type for unknown properties... right? What am I missing?

I would have loved for formal types for the abstract data model... I even tried to help define them using JSON Schema... nobody contributed to it... I removed it.

If the abstract data model cannot represent a type, it cannot support representations that rely on that type... which means right now, that we have an abstract data model with no type system, and no way to actually answer this question....

If we say the abstract data model is modeling JSON, we know it supports JSON types... if we say its INFRA, we know it supports infra types...

It's not possible for it to have a property with an unknown type.... unless the abstract data model itself does not live in a type system....

What type system does the abstract data model live in? Haskell? INFRA? JSON? CBOR?... if we don't know, we can't answer the question about which representations are possible with the abstract data model.

Haskell would work for everything (supports CBOR / JSON)...

JSON would not work for CBOR.

CBOR would work for JSON.

INFRA.... TBD... not sure about .... numbers... lol.

Imo this is solved in 1 of 2 ways...

either we define the abstract data model as anything that can be represented in JSON (for now, and later we fix that to allow CBOR / ProtoBuff)....

or we forbid "unknown" properties from being round tripped from it... because serializing an unknown type to a string is dangerous.... what happens to protobuff / cbor ?... a bunch of nonsense binary encoded as a utf-8 string?...doesn't that mean we can't round trip a binary format?

I would actually be in favor of saying that JSON is the abtract data model (for now)... then this problem goes away for JSON and JSON-LD.

In the future, we can define a new abstract data model that is a super set of JSON that can be mapped to it... and have backwards compatibility :)

The abstract data model currently uses Infra types for all declared properties, as well as metadata, so I don't see why we wouldn't do that here. We also have additional formats like URIs and Dates that are easily encoded as formatted strings.

When you create the abstract data model, your source for that (JSON, CBOR, Whatever) will have a data type associated with it. The consumption rules MUST declare a default abstract data type to map that concrete data type to. Same idea with the production direction: all representations declare rules for mapping all allowed abstract data types.

so I assume infra is capable of preserving strings, arrays and maps.... so let it preserve them, and let did method authors decide if they want to include processing directives or not in THEIR json :) its not for this WG to redefine what JSON is.

OR13 · 2020-09-16T02:35:53Z

Even without considering the error cases, this PR and the current state of JSON representation in did core remain problematic... here are some simple tests which demonstrate what I mean:

const crypto = require("crypto");

const digest = (representation) => {
  return crypto.createHash("sha256").update(representation).digest("hex");
};

const JSON_WITHOUT_DIRECTIVES_PRODUCER = (abstract) => {
  delete abstract[Buffer.from("@context")];
  delete abstract[Buffer.from("$schema")];
  if (!abstract[Buffer.from("id")]) {
    throw new Error("DID Document requires id");
  }
  return JSON.stringify(abstract, null, 2);
};

const JSON_WITH_DIRECTIVES_PRODUCER = (abstract) => {
  if (!abstract[Buffer.from("id")]) {
    throw new Error("DID Document requires id");
  }
  return JSON.stringify(abstract, null, 2);
};

const JSON_WITH_DIRECTIVES_CONSUMER = (representation) => {
  const parsed = JSON.parse(representation);
  if (!parsed.id) {
    throw new Error("DID Document requires id");
  }
  return parsed;
};

const JSON_WITHOUT_DIRECTIVES_CONSUMER = (representation) => {
  const parsed = JSON.parse(representation);
  if (parsed["@context"]) {
    throw new Error("DID Document is not valid");
  }
  if (!parsed.id) {
    throw new Error("DID Document requires id");
  }
  return parsed;
};

it("json with directives", () => {
  // let the abstract data model be an infra object.
  const abstract = {
    [Buffer.from("@context")]: Buffer.from(
      "https://example.com/context/v1"
    ).toString(),
    [Buffer.from("$schema")]: Buffer.from(
      "https://example.com/schema/v1"
    ).toString(),
    [Buffer.from("id")]: Buffer.from("did:example:123").toString(),
  };
  const digested_abstract = digest(JSON.stringify(abstract, null, 2));
  const produced_representation = JSON_WITH_DIRECTIVES_PRODUCER(abstract);
  const consumed_representation = JSON_WITH_DIRECTIVES_CONSUMER(
    produced_representation
  );
  const digested_consumption = digest(
    JSON.stringify(consumed_representation, null, 2)
  );
  expect(digested_abstract).toEqual(digested_consumption);
});

it("json without directives", () => {
  // let the abstract data model be an infra object.
  const abstract = {
    [Buffer.from("@context")]: Buffer.from(
      "https://example.com/context/v1"
    ).toString(),
    [Buffer.from("$schema")]: Buffer.from(
      "https://example.com/schema/v1"
    ).toString(),
    [Buffer.from("id")]: Buffer.from("did:example:123").toString(),
  };
  const digested_abstract = digest(JSON.stringify(abstract, null, 2));
  const produced_representation = JSON_WITHOUT_DIRECTIVES_PRODUCER(abstract);
  const consumed_representation = JSON_WITHOUT_DIRECTIVES_CONSUMER(
    produced_representation
  );
  const digested_consumed_production = digest(
    JSON.stringify(consumed_representation, null, 2)
  );
  expect(digested_abstract).not.toEqual(digested_consumed_production);
});

Essentially, we have broken JSON by trying to avoid requiring JSON-LD.

JSON with directives is what JSON developers will expect... JSON without directives is what DID Core has currently, and what this PR attempts to pour additional concrete around.

OR13 · 2020-09-16T12:55:32Z

Also, google knowledge graph https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html

(which returns JSON-LD ONLY):

GET https://kgsearch.googleapis.com/v1/entities:search?query=taylor+swift&key=API_KEY&limit=1&indent=True

returns

content-type: application/json; charset=UTF-8

... in other words, they do exactly what I and everyone else will do... return JSON when asked for JSON.

If the DID Core Spec remains the way it is / this PR were merged, JSON representations of did documents will never be capable of being represented in the knowledge graph.

JSON should have not restrictions beyond the presence of an id property....

If we want to define a profile of JSON which excludes processing directives, I am all for it.

msporny · 2020-09-16T13:41:53Z

index.html

+         </h3>
+
+         <p>
+This representation format reserves the <code>@context</code> property.


If we are going to add this section, which is a terrible idea, here's the full list of reserved properties:

https://www.w3.org/TR/json-ld/#syntax-tokens-and-keywords

So that means:

@base @container @context @direction @graph @id @import @included @index @json @language @list @nest @none @prefix @propagate @protected @reverse @set @type @value @version @vocab

This is going to complicate JSON-only processors to the point where I would question whether implementing a JSON-only processor would be worth it.

The simpler approach is just to preserve everything when going from one format to the abstract data model, and if you can't, throw an error.

I should also point out that this PR will require us to have prior knowledge of other representation formats... for example, if someone were to come up with a JSON Schema representation format (in order to include a signed validation mechanism with their DID Documents), then we'd also have to make sure that the JSON-only implementation would need to be aware of the following:

string number integer boolean object array null id type description default patternProperties dependencies extends $ref $schema allOf, anyOf, oneOf enum format minLength, maxLength pattern minimum, maximum exclusiveMinimum, exclusiveMaximum multipleOf array items minItems, maxItems uniqueItems additionalItems object properties minProperties, maxProperties patternProperties additionalProperties required

... we're clearly not going to do that as the implementation burden is very high and it conflicts directly with some of the property names we've already picked for the DID Core abstract data model (such as id, type, etc.). The reason that it would be fine to preserve these things is because nesting matters. JSON, JSON-LD, and other JSON formats would be just fine w/ embedded JSON Schema (and embedded JSON-LD) because those formats were designed around two primary features of JSON -- not dropping anything and nested objects providing context.

I don't buy the slippery slope argument here. It's not true that if we do one thing we must do everything possible forever in all time.

well sure, we can have a deny-list that is short, or a long one... both are still worse than having an allow-list... IMO, probably best to leave the security engineering to the did method authors... I don't have confidence this WG has the expertise necessary to design a safe deny-list.

OR13 · 2020-09-16T15:43:20Z

https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html#whitelisting-vs-blacklisting

It is a common mistake to use deny list validation in order to try to detect possibly dangerous characters and patterns like the apostrophe ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach as it is trivial for an attacker to bypass such filters.

Allow list validation is appropriate for all input fields provided by the user. Allow list validation involves defining exactly what IS authorized, and by definition, everything else is not authorized.

If the JSON representation is going to rely on a deny-list and allow unknown properties as its default... it is specifically going against security engineering best practices.

OR13 · 2020-09-17T13:57:50Z

This PR does not address the security concerns raised in #398

msporny · 2020-10-01T18:27:03Z

This PR is still under debate... we had a special topic call to try to resolve some things:

https://www.w3.org/2020/09/29-did-topic-minutes.html

rhiaro · 2020-11-16T12:36:24Z

I believe this is now superceded by #454

msporny · 2020-12-01T15:32:02Z

This PR will be closed once #454, which defines what must be done with "unknown/reserved/ignored" properties, is merged.

msporny · 2020-12-01T18:49:02Z

#454 has been merged, which achieves the intent of this PR. Closing.

add discussion on unknown and reserved properties

8aec4a2

jricher mentioned this pull request Sep 15, 2020

Address Unknown Properties in JSON-LD #388

Closed

OR13 reviewed Sep 15, 2020

View reviewed changes

OR13 reviewed Sep 16, 2020

View reviewed changes

OR13 requested changes Sep 16, 2020

View reviewed changes

OR13 reviewed Sep 16, 2020

View reviewed changes

msporny reviewed Sep 16, 2020

View reviewed changes

msporny added the do not merge Do not merge - waiting on resolution to issue label Oct 12, 2020

msporny added the needs-special-call Needs a special topic call to make progress label Oct 26, 2020

rhiaro added the pending close Issue will be closed shortly if no objections label Nov 16, 2020

msporny closed this Dec 1, 2020

	representation MUST NOT be used.
	representation MUST NOT be present in the serialized representation.

add discussion on unknown and reserved properties #394

add discussion on unknown and reserved properties #394

Conversation

jricher commented Sep 15, 2020 • edited by pr-preview bot

jricher commented Sep 15, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

msporny Sep 16, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jricher Sep 16, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

OR13 left a comment • edited

Choose a reason for hiding this comment

OR13 Sep 16, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

OR13 Sep 16, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

OR13 commented Sep 16, 2020 • edited

OR13 commented Sep 16, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

OR13 commented Sep 16, 2020

OR13 commented Sep 17, 2020

msporny commented Oct 1, 2020

rhiaro commented Nov 16, 2020

msporny commented Dec 1, 2020 • edited

msporny commented Dec 1, 2020

jricher commented Sep 15, 2020 •

edited by pr-preview bot

msporny Sep 16, 2020 •

edited

jricher Sep 16, 2020 •

edited

OR13 left a comment •

edited

OR13 Sep 16, 2020 •

edited

OR13 Sep 16, 2020 •

edited

OR13 commented Sep 16, 2020 •

edited

OR13 commented Sep 16, 2020 •

edited

msporny commented Dec 1, 2020 •

edited