Context and property optimization/compression #6

Open
msporny opened this issue Mar 31, 2020 · 0 comments

msporny commented Mar 31, 2020

I've run a few tests of this locally over the years, resulting in some pretty great outcomes. I'll start with a few statements and work from there:

  • It is possible to use cryptographic hashes to represent URLs.
  • The BLAKE2 and KangarooTwelve algorithms support variable-length outputs, sized according to the desired collision resistance.
  • A JSON-LD Context specifies how to interpret the semantics of a document, and JSON-LD Contexts, like terms, are expressed as URLs.
  • It is possible to use integers as CBOR keys and values.
  • It is possible to create a 16-bit lookup table that stores all well-known JSON-LD Contexts associated with standards (a rough sketch of the hashing and lookup-table ideas follows this list).
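
As a rough sketch of the hashing and lookup-table ideas above (assuming Python's standard hashlib; the registry contents and digest lengths are purely illustrative, not anything specified by CBOR-LD):

```python
import hashlib

# A hypothetical 16-bit registry of well-known, standards-related JSON-LD
# Contexts; the table entries here are illustrative only.
WELL_KNOWN_CONTEXTS = {
    0x0001: "https://www.w3.org/2018/credentials/v1",
    0x0002: "https://w3id.org/security/v2",
}

def context_hash(url: str, length: int = 8) -> bytes:
    """Return a variable-length BLAKE2b digest of a JSON-LD Context URL.

    `length` is chosen to match the desired collision resistance; a few
    bytes is often enough when the set of contexts in play is small.
    """
    return hashlib.blake2b(url.encode("utf-8"), digest_size=length).digest()

# A registered context compresses to a 2-byte integer key; an unregistered
# one can still be reduced to a short hash of its URL.
print(context_hash("https://example.org/my-context/v1", length=6).hex())
```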

What this means is that, in certain cases, we can:

  • Compress every JSON-LD Context used in a document down to a variable-length cryptographic hash... that is, down to a few bytes... and use that hash as a "base URL" for all terms used in a CBOR-LD document.
  • Compress all expanded terms and RDF Class URLs used in a document down to a few bytes using the same approach as the previous step, but with even fewer bytes, because the JSON-LD Context hash at the start of the CBOR payload already gives us a global identification mechanism (a sketch of such a compression pass follows this list).
  • Tag these documents as "compressed CBOR-LD" documents.
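
A minimal sketch of such a compression pass, assuming the third-party cbor2 package for encoding; the tag value and the term table are hypothetical and invented for the example, not part of any specification:

```python
import hashlib

import cbor2  # third-party CBOR package, assumed here for encoding

# Hypothetical semantic tag marking a "compressed CBOR-LD" document.
COMPRESSED_CBORLD_TAG = 0x0501

def compress(doc: dict, term_table: dict) -> bytes:
    """Encode a JSON-LD document as tagged CBOR with integer term keys.

    `term_table` maps term strings (defined by the document's context)
    to small integers; key 0 is reserved for the context hash.
    """
    ctx_hash = hashlib.blake2b(
        doc["@context"].encode("utf-8"), digest_size=6).digest()
    compressed = {0: ctx_hash}  # key 0: short hash standing in for the context URL
    for term, value in doc.items():
        if term != "@context":
            compressed[term_table[term]] = value
    return cbor2.dumps(cbor2.CBORTag(COMPRESSED_CBORLD_TAG, compressed))

# Example use; the term integers are invented for illustration.
doc = {
    "@context": "https://example.org/my-context/v1",
    "name": "Alice",
    "knows": "did:example:bob",
}
print(len(compress(doc, {"name": 1, "knows": 2})), "bytes")
```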

If we do all of those things, in certain cases, we get:

  • Single-byte to sub-byte values for terms and classes in a CBOR-LD document.
  • Global uniqueness (read: excellent collision resistance) for all terms in a CBOR-LD document without sacrificing storage size.
  • An efficient, semantically meaningful normalization mechanism that relies on byte comparisons (similar to JCS, but without tons of string comparisons) -- we could replace RDF Dataset Normalization in certain scenarios (a short sketch of the byte-compare idea follows this list).
  • An efficient, semantically meaningful binary template format.
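
On the byte-compare normalization point: once every key is a small integer, a canonical CBOR encoding of two semantically equivalent maps yields identical byte strings, so equality checks, hashing, and signing reduce to byte comparisons. A minimal sketch, again assuming the cbor2 package and its canonical encoding option:

```python
import cbor2  # third-party CBOR package; canonical=True sorts map keys deterministically

# Two maps with the same integer-keyed content, built in different orders.
a = {2: "did:example:bob", 1: "Alice", 0: bytes.fromhex("a1b2c3d4e5f6")}
b = {0: bytes.fromhex("a1b2c3d4e5f6"), 1: "Alice", 2: "did:example:bob"}

# Canonical encoding yields identical bytes for both, so comparing, hashing,
# or signing the documents needs no string-based normalization step.
assert cbor2.dumps(a, canonical=True) == cbor2.dumps(b, canonical=True)
```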

In short, we could achieve compression rates up to 75% for small documents.
