Reinventing the wheel? #3

Open
davek2 opened this issue May 15, 2016 · 2 comments


davek2 commented May 15, 2016

You cite as a CBOR deficiency:
2. Transmitting/storing lots of objects is expensive because keys are encoded for every object

Yet you are not transmitting objects on the wire; you are transmitting arrays.

[1100, [
     ["http://example.com", "Example Com"],
     ["http://example.org", "Example Org", "Example organization"]]]

The schema using a mature, well-known, very widely-deployed, standard language would be:

SearchResults ::= SEQUENCE {
    totalResults    INTEGER,
    results         SEQUENCE OF Page }

Page ::= SEQUENCE {
    url      UTF8String,
    title    UTF8String,
    snippet  UTF8String OPTIONAL }

Using JSON encoding rules on the example search results would give exactly the JSON text shown, and using CBOR encoding rules would give the equivalent CBOR message. Since tags don't appear in the serialization there's no point in including them in the schema except as comments:

Page ::= SEQUENCE {
    url      UTF8String,            -- 0
    title    UTF8String,            -- 1
    snippet  UTF8String OPTIONAL }  -- 2

... but if you did want tags transmitted in the serialization for some reason they could be generated automatically without cluttering up the schema.

Perhaps a standard schema language would do what you want, avoiding the need to invent a new one.
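
To make the "arrays, not objects" point concrete, here is a minimal Python sketch that serializes the SearchResults/Page schema positionally. It is not an implementation of any ASN.1 encoding rules; the class and function names are hypothetical, and dropping a missing trailing field is an assumption that only works because snippet is the last field.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    url: str
    title: str
    snippet: Optional[str] = None  # OPTIONAL in the schema


@dataclass
class SearchResults:
    total_results: int
    results: List[Page]


def page_to_array(page: Page) -> list:
    """Emit fields by position; field names never reach the wire."""
    row = [page.url, page.title]
    # Assumption: an absent trailing optional field is simply omitted.
    if page.snippet is not None:
        row.append(page.snippet)
    return row


def search_results_to_array(sr: SearchResults) -> list:
    return [sr.total_results, [page_to_array(p) for p in sr.results]]


example = SearchResults(
    1100,
    [
        Page("http://example.com", "Example Com"),
        Page("http://example.org", "Example Org", "Example organization"),
    ],
)
encoded = json.dumps(search_results_to_array(example))
```

Decoding `encoded` gives back the nested arrays from the example search results, with no keys repeated per page.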

@tailhook
Owner

Thank you for taking a look.

Well, I'm not sure what you mean by "tags":

If you're talking about string names like "url" and "title", then:

  1. They are the names of the fields after unpacking
  2. You can always turn on "debugging mode", in which they are always written in the serialized format (the reader always supports them too, just not as efficiently)

If you're talking about the numbers, they are there for backwards compatibility, i.e. if you decide to drop a field from the page object, you skip its number:

url @0
snippet @2

In this case it will be transmitted as:

["http://...", null, "xxx"]

(note that null is more efficient in CBOR than in JSON)

Or if there are many skipped fields:

{0: "http://..", 125: "test"}

(note that this is not valid JSON because of the integer keys, but it is valid CBOR)
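
The two layouts above can be sketched in Python as follows. This is only an illustration of the idea, not the library's actual code: the function name `encode_fields` and the `sparse_threshold` cutoff for switching from an array to an integer-keyed map are assumptions made for the example.

```python
def encode_fields(values, sparse_threshold=2):
    """Lay out {field_number: value} as a positional array or a sparse map.

    Assumption: once more than `sparse_threshold` field numbers are
    skipped, an integer-keyed map is used instead of padding with nulls.
    """
    if not values:
        return []
    highest = max(values)
    skipped = (highest + 1) - len(values)
    if skipped > sparse_threshold:
        # Integer keys: fine for CBOR, not representable as JSON keys.
        return dict(sorted(values.items()))
    # Dense enough: pad the skipped numbers with None (null on the wire).
    return [values.get(i) for i in range(highest + 1)]


dense = encode_fields({0: "http://...", 2: "xxx"})
sparse = encode_fields({0: "http://..", 125: "test"})
```

Here `dense` is a three-element list with `None` in the slot of the dropped field 1, and `sparse` stays a dict keyed by 0 and 125.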

At the end of the day, it has compatibility features very similar to protobuf.

Does it make sense? Should I document some of these better?

You wrote: "The schema using a mature, well-known, very widely-deployed, standard language would be:"

By the way, what language are you talking about? Off the top of my head, I can't think of any well-known, widely deployed schema language that is not XML.

@GregoryEAllen

Looks like ASN.1
