Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated cbor section #420

Merged
merged 1 commit into from Dec 1, 2020
Merged

updated cbor section #420

merged 1 commit into from Dec 1, 2020

Conversation

jonnycrunch
Copy link
Contributor

@jonnycrunch jonnycrunch commented Sep 29, 2020

updated CBOR to expand discussion regarding consumption and production.


Preview | Diff

@@ -0,0 +1,58 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, not sure why the .json-ld files got included.

@iherman
Copy link
Member

iherman commented Sep 29, 2020

A formal comment: the DagCBOR is referenced to as a possibility for a special form of CBOR encoding in 6.3.3.1., which is a normative section. This means that DagCBOR must have a normative reference (and hopefully a reference that is indeed stable enough to be considered as normative).

That being said, I wonder whether the whole of section should really be normative. Normative means that it must have testable implementations of that version, and two mutually independent ones at that. It means having a separate serialization format alongside the vanilla CBOR. I wonder whether we should not instead label 6.3.3 as informative (while possibly keeping it in the document).

</section>
</section>
<section>
<h3>Converting Data between Core Representations</h3>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section may be problematic.. Even though CBOR-to-JSON conversion may be a special case where optimizations are possible, "converting" between representations should always go via the abstract data model (ADM). I.e. to "convert" from CBOR to JSON, you would first run the CBOR consumption process, and then the JSON production process. The end result should be the same anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the core data model is defined by https://w3c.github.io/did-core/#data-model (essentially 2 sentences)... I find this paragraph extremely helpful.

If we just replace the word "JSON" with "INFRA" I think most of this still works.... I would prefer to see this section retained, with changes to make it between CBOR and INFRA, and I would argue that the other representations need a similar section or should be removed.

Suggest changing the section name to "Abstract Data Model Conversion" or similar.

@jonnycrunch I assume CBOR -> ADM -> JSON will retain all "properties", you may want to add an explicit sentence about that here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's changed to say "CBOR to INFRA" instead of "CBOR to JSON", then shouldn't that simply be the "Consumption" section (which right now seems almost empty in this PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OR13 , it would be cleaner to use CDDL as the ADM, in which case this is simple. I am new to writing spec-text so I may have glossed over the details and would be grateful for any help. personally, I find the definition of an Abstract Data Model, ... well too abstract, which is why I like the Data Definition Language that CDDL provides much more precise and something I can wrap my head around and do testing, while it does provide great support for explicit Type definition, Classes and well-defined required and optional Properties across representations. so CBOR <--> CDDL <--JSON --> and even YAML.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonnycrunch I completely agree, would prefer to use CDDL instead of english and handwaving at INFRA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether or not CDDL should replace INFRA as the language of the ADM is a question that needs to be resolved by the working group. Until such time as the working group decides to use CDDL in the ADM, I agree with @OR13 that replacing "JSON" with "INFRA" is the appropriate thing to do here. And I agree with @peacekeeper that a description of CBOR -> INFRA belongs in the consumption section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this section could be as simple as
"Conversion from the CBOR representation to another supported representation is a process of using the CBOR consumption rules defined here, followed by the using the production rules for the chosen representation. Conversely, conversion from another representation to CBOR is a process of following that representation's consumption rules, followed by using the CBOR production rules defined here."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would be helpful to flush out the Abstract Data Model and I'd like to make my case that CDDl as a Data Definition Language can stand in for the data model aspects of the ADM as I have demoed before. Happy to present next week for the TPAC sessions next week to make my argument. @cabo, as I have discussed with the Chair and with their approval, I'd like to extend an invitation to be an external Invited Expert and help use through this dilemma. Despite being invited, there is a formal process to accept and join our working group.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the conversion section should be removed from here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this this section doesn't make sense given the direction of the specification. We have an ADM for a reason and general rules for production/consumption as well. The Editors will be cleaning these sections up in future PRs.

index.html Outdated
<li>
Indefinite-length items must be made into definite-length items.
</li>
<li>Duplicate key in the same CBOR map MUST through and Error.</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

through -> throw

index.html Outdated Show resolved Hide resolved
index.html Outdated

<ul>
<li>Property names MUST be represented as a byte string array (major type 2) and contain only UTF-8 strings. </li>
<li>All member properties of a CBOR encoded DID document MUST be either represented in the `@context`, be present in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would a CBOR representation have a @context? Is this the same as CBOR-LD?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peacekeeper

Totally valid to have @context in CBOR representation. That is why I included:

The property name @context MAY be used in the CBOR map and 
has the same semantic meaning as JSON-LD producers of a DID document in JSON representation.

As CBOR can support all JSON data types (including JSON-LD) for translation between the DID document model as the JSON generic data model is just a subset of the CBOR generic data model as such including the @context, which gives semantics to the properties, but doesn't force json-LD processing. My understanding of CBOR-LD is externally dictionary by reference using the @context in combination with an algorithm for iterating over properties and will require proper declaration to give the appropriate syntax for CBOR-LD processing required, something like content-type = application/cbor+ld. We can do this, I think, with the current wording, but will eventually need to add that subsection on how that includes the appropriate CBOR tags and algorithm, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I understand that all JSON including JSON-LD can be directly mapped to CBOR. But if we define a CBOR representation of the DID document ADM, then the production rules should clearly say if a @context gets added or not.

The CBOR production rules should also make it clear whether or not @context gets added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peacekeeper why? should every section specifically state if properties related to processing directives are included or excluded?

Seems better to just say what is required, and how preserve by default works.

In a world where there are 10 representations, the approach of having each section comment about the other sections will not work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @OR13 -- what is being suggested by @peacekeeper is not a scalable approach; it would require representations to know about other representations, which is strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, the ADM using JSON-LD is strange to me as evident by the @type issue that we are currently having. @peacekeeper, I think that from the discussion regarding ADM that we agreed that @context is optional in the JSON representations. It is only when you combine it with MIME = `application/did+json+ld that you explicitly state that JSON-LD processing is required.

Copy link
Contributor

@OR13 OR13 Oct 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically JSON-LD processing is still not required with application/did+json+ld per the spec language today, its a MAY not a MUST IIRC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, thanks for the clarification!

tag 98 for CBOR Object Signing and Encryption (COSE) [[RFC8152]]
</p>
<p>
When interoperating with a text-based representation such as JSON or JSON-LD it is helpful to utilize additional sematics to aid convertion. CBOR Tags numbers 21 to 23 indicate that a byte string value in DID document represented in CBOR might require a specific encoding to represent properly in JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Language like this should really be avoided. The point of the ADM and production and consumption processes is really that you should be able to losslessly convert between ANY of the representations. If you add custom instructions for the special CBOR->JSON case, then would you also need custom instructions for every single pair of representations e.g. CBOR->YAML or CBOR->JSON-LD or CBOR->XML, etc.?

I think the Production section in this PR is really strong (perhaps minus the mention of @context which I don't really understand), but this section here about the special CBOR->JSON conversion case should be removed, and the "Consumption" section should be filled instead to explain how to go from a CBOR DID document to the ADM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peacekeeper I get where you are coming from. My mandate was lossless conversion between JSON <--> CBOR and this give specific instructions. I'm having a bit of difficulty with the Abstract Data Model as being too abstract for my taste. If we were to focus on CDDL as representative of the Abstract Data Model, then this would be effortless. CBOR <--> CDDL <--> JSON <--> CDDL <--> YAML. Being new to spec text, I welcome help to nail this down and make it more concrete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this section helpful, especially given the lack of clarity regarding the ADM... I think its possible it could be refactored into the "Data Model Section" and then referenced there, but thats probably not a good idea for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this information may be valuable, the mandate was:

Clear specification in DID Core for producing and consuming CBOR such that it can be "losslessly converted" (via the abstract data model) to the other representations.

Any additional information or guidance for how to better ensure that the CBOR values may be converted into JSON might fit better in the implementation guide.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely should not be defining a direct conversion like this, or encouraging people to do this kind of thing. This is a semantically-driven specification and so the ADM is the only defined method of translation. Anything else (including treating JSON-LD as plain JSON) is liable to work based on luck.

<section>
<h4>COSE signatures</h4>
<section>
<h3>CBOR to JSON</h3>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, highly problematic because this is going from CBOR right through to JSON -- not via Abstract Data Model. if we go this route, we'll have to define NxN algorithms for each representation to go to each other representation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect every developer who reads the spec to essentially do this... the ADM is the kind of complexity people avoid reading, and just guess at... in this case, the guess accurate, you can in fact convert CBOR to JSON or vice versa....without reading about INFRA or the opinions of this WG on data models... we should avoid making the "guessing" incompatible with normative statements, but I think it might also be wise to specifically call out sections like this with a disclaimer that "you have to go through the ADM to convert CBOR to JSON".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a section in the CBOR spec that speaks to this -- can't we just refer to that section?

https://tools.ietf.org/html/rfc7049#section-4

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to https://www.ietf.org/archive/id/draft-ietf-cbor-7049bis-16.html which is already in RFC-EDITOR stage and probably will have an RFC number before you are done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second @msporny 's concerns.
Representation descriptions should define production and consumption rules to and from the ADM, and should not include translation rules between specific representations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to argue for CDDL to facilitate the data representation aspects of the Abstract Data Model and thus JSON <--> CDDL <--> CBOR. Infra just isn't doing it and I think CDDL will.

Copy link
Member

@msporny msporny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm torn about this PR. It's good in that it provides a concrete and fairly detailed approach to how a CBOR encoding could work. So, many thanks to @jonnycrunch for proposing something concrete.

The problem with the PR is that it is providing a CBOR encoding with none of the advantages of CBOR. The payload size isn't greatly reduced, which is one of the primary reasons to use CBOR. I expect that there will be multiple -1s to the approach because it exposes that the group fundamentally does not understand why developers use CBOR.

index.html Outdated
companyURL: "https://www.consensyshealth.com",
w3cid: 95341
}
name: "Jonathan Holt, DO, MS",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add yourself as an Editor or Author of a spec. Those decisions are WG decisions and happen toward the end, when we can holistically look at who has contributed what to the specification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the amount of work I've put into this document, it only seems fair.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the amount that I have contributed, only seems fair.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at what others in the group have contributed to the specification... there are many others that are not listed that have (arguably) contributed more: https://github.com/w3c/did-core/graphs/contributors?from=2017-07-02&to=2020-10-08&type=c

The goal here is that everyone that should be recognized for contributing to the specification will be recognized in time, and doing so is a WG decision after (ideally) taking a data-based approach to contributions across all areas (issues, PRs, calls, etc.). We typically do this right before we publish the Proposed Recommendation (many months from now). Until then, adding and removing people from the authors list tends to lead to a lot of gnashing of teeth.

If you feel strongly about this, you could always escalate it to the Chairs to get a Chair opinion on the matter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brentzundel @burnburn for future reference, if we retain CBOR, jonathan holt should be listed as a contributor... potentially he should regardless, given how little contribution this note has seen from WG members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msporny I'm actually already listed as a contributor, this PR only updates my affiliation. Given the amount that I have contributed, it seems only fair.

Copy link
Member

@msporny msporny Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh -- I missed that. Looks like it happened here: 7a4bac4 ... which never should have gotten into the spec.

I had asked that you remove your name at that point as well and thought you had done that:

https://github.com/w3c/did-core/pull/282/files#diff-0eb547304658805aad788d320f10bf1f292797b5e6d745a3bf617584da017051R127-R132

I'll raise the issue with the chairs and see how they want to proceed.

In the meantime, since the text is in the spec, we should update it to your latest affiliation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonnycrunch Manu is right that individual contributors should NEVER add themselves to the author/editor/etc. lists. These determinations are complex and can best be handled as we go to CR or even PR. @manu , since Jonathan's name is currently there please update his affiliation. @jonnycrunch understand that the lists of editors, authors, and contributors will be updated later and your name may shift to a different list or drop off as appropriate at that time. I am speaking in my official capacity as Co-chair.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@burnburn wrote:

@msporny, since Jonathan's name is currently there please update his affiliation.

Acknowledged, will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second what @burnburn has said.
The final determination of who will be listed as an author/editor/etc. has yet to be made and is complex. The editors and chairs will be the ones to make the determination, which will be based primarily on data-driven evaluation of contributions.

@OR13
Copy link
Contributor

OR13 commented Oct 8, 2020

The problem with the PR is that it is providing a CBOR encoding with none of the advantages of CBOR.

I could say the same thing of JSON (for which 0 examples exist today, at least CBOR has 1).

The question remains does this CBOR description preclude DAG_CBOR / CBOR-LD / FancyCBOR2025 ?

If it does than IMO, its a mistake, but if it does not preclude them, and defines a simple first approach to CBOR, I think we should take the PR.

@msporny
Copy link
Member

msporny commented Oct 8, 2020

I could say the same thing of JSON (for which 0 examples exist today, at least CBOR has 1).

Yes, this is true.

The question remains does this CBOR description preclude DAG_CBOR / CBOR-LD / FancyCBOR2025 ?

It does not preclude DAG_CBOR / CBOR-LD / FancyCBOR2025 as long as it has a specific mimetype.

I'll be sad if this goes through and grabs the application/did+cbor MIME Type...

That said, this PR is better than the current CBOR section, which DOES preclude CBOR-LD / FancyCBOR2025.

If it does than IMO, its a mistake, but if it does not preclude them, and defines a simple first approach to CBOR, I think we should take the PR.

The question is -- will anyone find value in the representation? Other than "Hey, I implemented application/did+cbor! (and it gives me no benefit beyond application/did+json, application/did+ld+json)...

@jonnycrunch
Copy link
Contributor Author

The problem with the PR is that it is providing a CBOR encoding with none of the advantages of CBOR.

@msporny, I disagree

According to the CBOR RFC 7049 authors including the late @jimsch,

"Remember that CBOR prioritizes the simplicity of the encoder and decoder over the size of the encoded data."

That said, it doesn't preclude additional representation and flavors of CBOR including CBOR-LD, dagCBOR, etc which can take advantage of data compaction of CBOR. But out of the gate, with this approach implementors get about a 25% size reduction compared to JSON approaches. This helps implementations such as myself with the cost of overhead of storing DID documents immediately and has a framework for use to move forward with additional more compact representations.

@OR13
Copy link
Contributor

OR13 commented Oct 9, 2020

I'll be sad if this goes through and grabs the application/did+cbor MIME Type...

I don't agree.... this is actually what I would expect, I would expect:

application/did+cbor => CBOR
application/did+ld+cbor => CBOR-LD
application/did+dag+cbor => DAG_CBOR

index.html Outdated Show resolved Hide resolved
</p>
A DID document encoded in CBOR must be represented as a CBOR Map (major type 5) conforming to [[RFC7049]].
Similar to the JSON encoding of a DID document, all top-level properties of the DID document MUST be represented by using the
property name as the name of the member of the CBOR map. The values of properties of the data model described in <a href="#core-properties"> § 5. Core Properties</a>, including
Copy link
Member

@msporny msporny Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should point to the CBOR spec for all type definitions. See how we do that for INFRA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but it just highlighted all of the problems of the ADM including the lack of clarity on number representation that we discussed and the gap with INFRA. I added a bunch more details regarding bigNum and BigFloat and yet we don't need those types in the specification. If we want to dwell on these, then I can make lossless conversions between CBOR and JSON via an ADM (ideally in CDDL), but it may be a exercise in futility.

index.html Outdated
all extensions, MUST be encoded in CBOR [[RFC7049]] by mapping property values to CBOR types as follows: </p>

<ul>
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Numeric</a> values representable as IEEE754 MUST be represented as a Number type (major type 0 or 1).</li>
Copy link
Member

@msporny msporny Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the links to CBOR are non-normative, they point to the BIS version, which is not an IETF standard yet. We need to point to the existing specification.

For example, this link to numbers is already broken as it points to the section on arrays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific line states that floating point numbers should be expressed as integers... which would lose all precision for those IEEE754 values. We need to be clear on how floats are expressed, how integers are expressed, and what the acceptable value ranges are if an implementation hopes to round-trip information between different languages -- ECMA defines these ranges, see the JSON spec for how they talk about acceptable ranges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, thanks for catching that, floating point numbers are major type 7. will update and point to normative version. As I mentioned numbers are a pain and I think we just need to settle on a representation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language for floating points is that to enforce precision even when the floating point value is equivalent to an integral value, ie. 1.0 is still 1.0 not 1, which is just explicitly stating how to handle numbers and thus achieve lossless conversion while retaining canonical representation.

index.html Outdated
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Sequence</a> value MUST be represented as an Array type (major type 4) .</li>
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Unordered sets</a> of values MUST be represented as an Array type (major type 4).</li>
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Sets of properties</a> MUST be represented as an map type (major type 5).</li>
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Empty values</a> MUST be represented as a null literal (primitive type 22). </li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain that the abstract data model supports null values... I don't think we have anything in the spec on expressing null values (as we haven't had that need). So, the option here is to stay silent or to define what a null value is in the abstract data model. I suggest we stay silent, we don't have a use case for a null value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the counter argument is What happens when a DID document does not contain a required property after conversion to a different core representation?. Seem like Null would be appropriate then to explicitly state that this required value as determined in the ADM isn't there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out there is a value of undefined, which may be more appropriate. But, I'm open for additional discussion on how to handle this in the context of unknown properties. I'm also more in line with @OR13, that it is critical to explicitly drop unknown properties rather than preserve/retain them as this will lead to a buffer overflow and definite security vulnerability.

index.html Outdated Show resolved Hide resolved
index.html Outdated
A DID document encoded in CBOR must be represented as a CBOR Map (major type 5) conforming to [[RFC7049]].
Similar to the JSON encoding of a DID document, all top-level properties of the DID document MUST be represented by using the
property name as the name of the member of the CBOR map. The values of properties of the data model described in <a href="#core-properties"> § 5. Core Properties</a>, including
all extensions, MUST be encoded in CBOR [[RFC7049]] by mapping property values to CBOR types as follows: </p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should really be talking about how the INFRA values map to CBOR... we don't define any of the CBOR values in the spec. The mappings are INFRA -> CBOR, not CBOR Abstract Type -> CBOR representation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think INFRA is still lacking in some areas for all mappings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bunch more explicit declarations for numbers, again an area that INFRA is lacking.

index.html Outdated
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Empty values</a> MUST be represented as a null literal (primitive type 22). </li>
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Other values</a> MUST be represented as a String type. (major type 5)</li>
<li><a href="https://tools.ietf.org/id/draft-ietf-cbor-7049bis-12.html#section-3.2.2">Indefinite-length items</a> MUST be made definite legth.</li>
<li><a href="https://www.rfc-editor.org/rfc/rfc3339.txt">Datetime </a> MUST be encoded in UTF-8 (major type 3). This datetime value MUST be normalized to UTC 00:00, as indicated by the trailing "Z".</li>
Copy link
Member

@msporny msporny Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This raises the number of options for Datetime to 3... ISO8601, RFC3339, and XMLDateTime -- we need to pick one as a group. The spec currently uses XML DateTime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is even more constrained as it is XMLDateTime with timezone set to Z.

index.html Outdated Show resolved Hide resolved
index.html Outdated
<li>Property names in each CBOR map MUST be unique. </li>
<li>Integer encoding MUST be as short as possible.</li>
<li>The expression of lengths in CBOR major types 2 through 5 MUST be as short as possible. </li>
<li>All floating point values MUST be encoded as 64-bits, even for integral values.</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to discuss loss of precision at 56-bits in environments like node.js / javascript / browser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see: #435

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, see #361

index.html Outdated
</li>
</ul>
<p>
The property name `@context` MAY be used in the CBOR map and has the same semantic meaning as JSON-LD producers of a DID document in JSON representation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange for CBOR to talk about JSON-LD. While the statement is true, it should probably just stay silent on this item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is in-line with the conclusion of last Thursday's special topic call. @context MAY exist in application/json DID documents, NO json-ld processing is expected. Same is true of the application/cbor DID document.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to talk about what to do when consuming properties that are not defined in DID Core nor the registries, rather than specifically mentioning a particular property from another representation.

index.html Outdated
</p>

<p>
All properties of the DID document MUST be included in the root map (major type 5).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is a bit strange as it repeats something stated earlier in the document. I don't quite understand what you're going for here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just being explicitly clear on production rules for CBOR representation. Once we have an unambiguous ADM then it should be defined there.

index.html Outdated
Comment on lines 2747 to 2972
data that follows. Extensibility with CBOR tags also facilitates lossless conversion to
othere core representations. CBOR Tags number 21 to 23 indicate that a following byte string
might require a specific encoding when interoperating with a text-based
representation such as JSON. These tags are useful when an encoder knows that the
byte string data it is writing is likely to be later converted to a
particular JSON-based usage. These three tag numbers suggest conversions to three of the base data
encodings defined in [[RFC4648]]. Tag number 21 suggests conversion to
base64url encoding (Section 5 of [[RFC4648]]), where padding is not used
(see Section 3.2 of [[RFC4648]]); that is, all trailing equals signs
("=") are removed from the encoded string. Tag number 22 suggests
conversion to classical base64 encoding (Section 4 of [[RFC4648]] ), with
padding as defined in [[RFC4648]]. For both base64url and base64,
padding bits are set to zero (see Section 3.5 of [[RFC4648]] ), and
encoding is performed without the inclusion of any line breaks,
whitespace, or other additional characters. Tag number 23 suggests
conversion to base16 (hex) encoding, with uppercase alphabetics (see
Section 8 of [[RFC4648]]). Note that, for all three tag numbers, the
encoding of the empty byte string is the empty text string of other representations.
</p>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what I would need to do as an implementer with this text. How does my implementation change? I don't think any of this PR suggests that we use tags 21-23, so why are we talking about them here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CBOR tags facilitate a lossless conversion between core representations with use of semantic sugar. It would be helpful to implementors who need to represent binary data and give batteries included clear instruction for how to losslessly convert to other representations during DID Resolution. This section highlights the method to convert between core representations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the desire to provide a means for possible lossless conversion to another representation, but am unsure what this description has to do with CBOR Extensibility. Perhaps the section should be re-named?

Copy link
Member

@msporny msporny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general structure is fine, there are some details in the PR that need to be sorted. We probably also need to define a MIME Type for application/did+cbor, but that can happen in another PR.

<section>
<h4>COSE signatures</h4>
<section>
<h3>CBOR to JSON</h3>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a section in the CBOR spec that speaks to this -- can't we just refer to that section?

https://tools.ietf.org/html/rfc7049#section-4

@msporny
Copy link
Member

msporny commented Oct 26, 2020

"Remember that CBOR prioritizes the simplicity of the encoder and decoder over the size of the encoded data."

I was skeptical of this statement, so decided to go and look at how simple CBOR encoders/decoders are wrt. JSON encoders/decoders. Turns out that CBOR encoders/decoders are more complex, sometimes by an order of magnitude from JSON encoders/decoders. So, while that was a design goal of the CBOR WG, they failed to achieve something simpler than JSON in the end due to all the variational stuff done to ensure that encoding size was optimized (within reason).

For example, there are many JSON encoder/decoders that are implemented purely as C header files, and small ones for microcontrollers:

421 sloc -- https://github.com/rafagafe/tiny-json
1367 sloc -- https://github.com/cesanta/frozen

The CBOR encoder/decoders I found were multiple times that size and complexity, here are the best ones I could find:

785 sloc -- https://github.com/cabo/cn-cbor
2275 sloc -- https://github.com/intel/tinycbor

All that to say that the statement is a wash... you can do simple constrained encoder/decoders in JSON, JSON-LD, and CBOR. This isn't something that is only applicable to CBOR, it's applicable to each representation wrt. encoder/decoder simplicity.

@msporny
Copy link
Member

msporny commented Oct 27, 2020

@jonnycrunch -- apologies for the delay in merging... all three times that I've sat down to re-review this PR, I've been pulled away. I hope to do a final review on this during the week.

Copy link
Member

@brentzundel brentzundel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR introduces a lot of great content, but that the organization of that content needs some work.
I leave it to the editors to decide if it would be better to fix the organizational issues before or after merging this PR.

index.html Outdated
</li>
</ul>
<p>
The property name `@context` MAY be used in the CBOR map and has the same semantic meaning as JSON-LD producers of a DID document in JSON representation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to talk about what to do when consuming properties that are not defined in DID Core nor the registries, rather than specifically mentioning a particular property from another representation.

index.html Outdated
</li>
</ul>
<li>Duplicate key in the same CBOR map MUST throw an Error.</li>
<li>All member properties of a CBOR encoded DID document MUST be either represented in the `@context`, be present in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That this CBOR representation actively supports an @context property is strange to me.

index.html Outdated
Comment on lines 2747 to 2972
data that follows. Extensibility with CBOR tags also facilitates lossless conversion to
othere core representations. CBOR Tags number 21 to 23 indicate that a following byte string
might require a specific encoding when interoperating with a text-based
representation such as JSON. These tags are useful when an encoder knows that the
byte string data it is writing is likely to be later converted to a
particular JSON-based usage. These three tag numbers suggest conversions to three of the base data
encodings defined in [[RFC4648]]. Tag number 21 suggests conversion to
base64url encoding (Section 5 of [[RFC4648]]), where padding is not used
(see Section 3.2 of [[RFC4648]]); that is, all trailing equals signs
("=") are removed from the encoded string. Tag number 22 suggests
conversion to classical base64 encoding (Section 4 of [[RFC4648]] ), with
padding as defined in [[RFC4648]]. For both base64url and base64,
padding bits are set to zero (see Section 3.5 of [[RFC4648]] ), and
encoding is performed without the inclusion of any line breaks,
whitespace, or other additional characters. Tag number 23 suggests
conversion to base16 (hex) encoding, with uppercase alphabetics (see
Section 8 of [[RFC4648]]). Note that, for all three tag numbers, the
encoding of the empty byte string is the empty text string of other representations.
</p>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the desire to provide a means for possible lossless conversion to another representation, but am unsure what this description has to do with CBOR Extensibility. Perhaps the section should be re-named?

</section>
</section>
<section>
<h3>Converting Data between Core Representations</h3>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether or not CDDL should replace INFRA as the language of the ADM is a question that needs to be resolved by the working group. Until such time as the working group decides to use CDDL in the ADM, I agree with @OR13 that replacing "JSON" with "INFRA" is the appropriate thing to do here. And I agree with @peacekeeper that a description of CBOR -> INFRA belongs in the consumption section.

<section>
<h4>COSE signatures</h4>
<section>
<h3>CBOR to JSON</h3>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second @msporny 's concerns.
Representation descriptions should define production and consumption rules to and from the ADM, and should not include translation rules between specific representations.

tag 98 for CBOR Object Signing and Encryption (COSE) [[RFC8152]]
</p>
<p>
When interoperating with a text-based representation such as JSON or JSON-LD it is helpful to utilize additional sematics to aid convertion. CBOR Tags numbers 21 to 23 indicate that a byte string value in DID document represented in CBOR might require a specific encoding to represent properly in JSON.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this information may be valuable, the mandate was:

Clear specification in DID Core for producing and consuming CBOR such that it can be "losslessly converted" (via the abstract data model) to the other representations.

Any additional information or guidance for how to better ensure that the CBOR values may be converted into JSON might fit better in the implementation guide.

</section>
</section>
<section>
<h3>Converting Data between Core Representations</h3>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this section could be as simple as
"Conversion from the CBOR representation to another supported representation is a process of using the CBOR consumption rules defined here, followed by the using the production rules for the chosen representation. Conversely, conversion from another representation to CBOR is a process of following that representation's consumption rules, followed by using the CBOR production rules defined here."

@jonnycrunch
Copy link
Contributor Author

jonnycrunch commented Nov 5, 2020

@msporny section 4 of RFC 7049 give non-normative advice about converting between JSON and CBOR. It is up to spec implementers to be clear about what rules they want to explicitly convert between the two formats. Seems that this group is converging on an Abstract Data Model that I believe is too abstract to be effectively implemented. It would be great to formalize the ADM (including whether or not that includes the '@context`) then work on clear and precise rules such as some of the ones that I suggested in this PR.

with regard to MIME type: my previous PR had it included it,

see line 4686:

<section>

@cabo
Copy link

cabo commented Nov 5, 2020 via email

@msporny
Copy link
Member

msporny commented Nov 17, 2020

Now that #455 has been merged, I'm going to try to focus on this PR and work with @jonnycrunch to get it across the line.

My suggestion here is:

  1. For @jonnycrunch to fix the merge conflicts in this PR as well as the merge history to reflect what he wants to go into the main branch.
  2. For @jonnycrunch to rework the PR in whatever way he sees fit based on Rework Data Model section. #455 and Add representation language to address syntax and "unknown" properties. #454.
  3. For @jonnycrunch to mark the entire section as a work in progress via an issue marker.
  4. For the Editors to merge the PR as-is, as it's an improvement on the current CBOR section, and put in PRs to update the CBOR section to align with other data model changes in the specification.

I want to be very clear that this section isn't done, we're not getting good engagement to improve it (aside from the work that @jonnycrunch has done on it) and changes are going to have to be made to get it into shape for CR. I'm willing to do the work to make that happen... thus the request to merge as-is after @jonnycrunch makes a final pass over the PR.

Folks should object to this path forward for this PR if they have alternatives that may help us get better progress on the vanilla CBOR representation.

@OR13
Copy link
Contributor

OR13 commented Nov 19, 2020

Merge conflict and suggested changes need to be incorporated so we can get this in.

@talltree
Copy link
Contributor

I too am in favor of merging this as soon as the merge conflict and suggested changes can be attended to. The sooner the better.

Copy link
Contributor

@peacekeeper peacekeeper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to add this warning per the various comments in this PR, to point out that conversion must happen via the abstract data model, not between any two representations directly. With this added, I'd be happy to approve this PR.

index.html Outdated
</section>
<section>
<h3>Converting Data between Core Representations</h3>
<p></p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<p></p>
<div class="warning" title="Converting via DID document data model">
<p>
This section conflicts with other parts of the specification, since
conversion has to be defined not between any two representations directly,
but rather in terms of parsing and serializing via the abstract
DID document data model (see <a href="#data-model"></a> and
<a href="#representations"></a>). The content of this section can
potentially be adapted and re-used in other sections such as the one
about the CBOR representation (see <a href="#cbor"></a>).
</p>
</div>

@msporny
Copy link
Member

msporny commented Nov 30, 2020

Ping @jonnycrunch -- still waiting on merge conflicts to be resolved and the PR to be brought up to date with the rest of the specification.

@jonnycrunch
Copy link
Contributor Author

updated CBOR section with particular focus on CBOR <- ADM -> JSON language as per reviewer suggestions and merge conflicts with head branch. Please let me know if there are additional details that I missed.

@w3cbot
Copy link

w3cbot commented Dec 1, 2020

iherman marked as non substantive for IPR from ash-nazg.

@msporny
Copy link
Member

msporny commented Dec 1, 2020

There are still merge conflicts, possibly due to previously merged PRs, please fix the merge conflicts.

@jonnycrunch while you're fixing the merge conflicts, can you clean up the merge history so I don't have to squash-merge... namely, these commits (which modified files that shouldn't have been modified):

Delete package.json - jonnycrunch committed on Sep 29 - 7755fc1
Delete did-v1.jsonld - jonnycrunch committed on Sep 29 - a9dabdf
Delete did-v0.11.jsonld - jonnycrunch committed on Sep 29 - 77f4c39

There are also a lot of "Update index.html" commits that either need to have better commit comments, or be rolled up into a single commit. It's optional if you want to do this or not... if you don't do it, I'll just squash-merge everything together.

Comment on lines +2859 to +2861
<p>
The property name `@context` MAY be present in the CBOR representation of a DID Document and if present SHOULD be ignored as this property is reserved for JSON-LD processing.
</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block should be removed as a similar statement is removed from the JSON section in another PR and we shouldn't add them. If anything, this should just be a note that it might show up, but no normative language here as it's not a defined property.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the Editors have discussed items such as this in the PR and will clean it up in another PR given how long this PR has been hanging out there. At this point, it's an improvement on what is in the specification and the Editors will do their best to align it with the rest of the specification while giving folks the opportunity to object/revise in a future PR.

</section>
</section>
<section>
<h3>Converting Data between Core Representations</h3>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the conversion section should be removed from here.

tag 98 for CBOR Object Signing and Encryption (COSE) [[RFC8152]]
</p>
<p>
When interoperating with a text-based representation such as JSON or JSON-LD it is helpful to utilize additional sematics to aid convertion. CBOR Tags numbers 21 to 23 indicate that a byte string value in DID document represented in CBOR might require a specific encoding to represent properly in JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely should not be defining a direct conversion like this, or encouraging people to do this kind of thing. This is a semantically-driven specification and so the ADM is the only defined method of translation. Anything else (including treating JSON-LD as plain JSON) is liable to work based on luck.

@msporny
Copy link
Member

msporny commented Dec 1, 2020

Normative, multiple reviews, changes requested and made, Editors will align this section with the rest of the specification over the next several weeks, merging as a work in progress to avoid objections over the details.

@msporny msporny merged commit 11622f3 into w3c:master Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet