Update algorithm output to be the serialized canonical form #97

Merged
merged 4 commits into from
May 5, 2023
34 changes: 19 additions & 15 deletions spec/index.html
@@ -296,11 +296,10 @@ <h3>Terms defined by this specification</h3>
<dd>The abstract <a>RDF dataset</a> that is provided as input to
the algorithm.</dd>
<dt><dfn>normalized dataset</dfn></dt>
-<dd>The immutable, abstract <a>RDF dataset</a> and a set of normalized
-<a>blank node identifiers</a> that are produced as output by the algorithm.
-A <a>normalized dataset</a> is a restriction on an <a>RDF dataset</a>
-where all nodes are labeled, and <a>blank nodes</a> are labeled with canonical <a>blank node identifiers</a>
-consistent with running this algorithm on a base <a>RDF dataset</a>.
+<dd>A <a>normalized dataset</a> is the combination of an <a>RDF dataset</a>
+and a <a>map</a> where [=map/keys=]
+are <a>blank nodes</a> from the dataset
+and [=map/values=] are the associated canonical <a>blank node identifiers</a>.
A concrete serialization of a <a>normalized dataset</a> MUST label
all <a>blank nodes</a> using these stable <a>blank node identifiers</a>.</dd>
<dt><dfn>identifier issuer</dfn></dt>
@@ -409,16 +408,19 @@ <h3>Terms defined by cited specifications</h3>
<h2>Canonicalization</h2>

<p>Canonicalization is the process of transforming an
-<a>input dataset</a> to a <a>normalized dataset</a>. That
-is, any two <a>input datasets</a> that contain the same
-information, regardless of their arrangement, will be transformed into
-identical <a>normalized dataset</a>. The problem requires directed
+<a>input dataset</a> to its <a>serialized canonical form</a>.
+That is, any two <a>input datasets</a> that contain the same information,
+regardless of their arrangement,
+will be transformed into the same <a>serialized canonical form</a>.
+The problem requires directed
graphs to be deterministically ordered into sets of nodes and edges. This
is easy to do when all of the nodes have globally-unique identifiers, but
can be difficult to do when some of the nodes do not. Any nodes without
globally-unique identifiers must be issued deterministic identifiers.</p>

-<p class="ednote">Strictly speaking, the normalized dataset must be serialized to be stable, as within a dataset, blank node identifiers have no meaning. This specification defines a <a>normalized dataset</a> to include stable identifiers for blank nodes, but practical uses of this will always generate a canonical serialization of such a dataset.</p>
+<p class="note">
+This specification defines a <a>normalized dataset</a> to include stable identifiers for blank nodes,
+practical uses of which will always generate a canonical serialization of such a dataset.</p>

<p>In time, there may be more than one canonicalization algorithm and,
therefore, for identification purposes, this algorithm is named the
@@ -523,7 +525,6 @@ <h2>Blank Node Identifier Issuer State</h2>
<h2>Canonicalization Algorithm</h2>

<p class="ednote">At the time of writing, there are several open issues that will determine important details of the canonicalization algorithm.</p>
-<div class="issue" data-number="4"></div>
<div class="issue" data-number="7"></div>
<div class="issue" data-number="8"></div>
<div class="issue" data-number="10"></div>
@@ -576,7 +577,8 @@ <h3>Overview</h3>
<a href="#issue-identifier" class="sectionRef"></a>.
If more than one node produces the same N-degree hash,
the order in which these nodes receive a canonical identifier does not matter.</li>
-<li id="ca-hl.6"><strong>Finish</strong>. Return the normalized dataset.</li>
+<li id="ca-hl.6"><strong>Finish</strong>.
+Return the <a>serialized canonical form</a> of the <a>normalized dataset</a>.</li>
</ol>
</section>

@@ -1252,7 +1254,8 @@ <h3>Algorithm</h3>
</pre>
</details>
</li>
-<li id="ca.7">Return the <a>normalized dataset</a>.</li>
+<li id="ca.7">Return the <a>serialized canonical form</a>
+of the <a>normalized dataset</a>.</li>
</ol>
</section>
</section>
@@ -2630,7 +2633,7 @@ <h2>Serialization</h2>
<p>This section describes the process of creating a serialized [[N-Quads]] representation
of a <a>normalized dataset</a>.</p>

-<p>The <dfn class="lint-ignore">serialized canonical form</dfn> of a <a>normalized dataset</a>
+<p>The <dfn>serialized canonical form</dfn> of a <a>normalized dataset</a>
is an N-Quads document [[N-QUADS]]
created by representing each <a>quad</a> from the <a>normalized dataset</a>
in <a>canonical n-quads form</a>,
@@ -2643,7 +2646,8 @@ <h2>Serialization</h2>

<p>When serializing quads in <a>canonical n-quads form</a>,
components which are <a>blank nodes</a> MUST be serialized using the
-canonical label associated with each <a>blank node</a> in the <a>normalized dataset</a>.</p>
+canonical label associated with each <a>blank node</a>
+from the <a>map</a> component of the <a>normalized dataset</a>.</p>

<aside id="ex-ser-unique-hashes"
class="example"
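To illustrate the shape of the change, here is a minimal Python sketch of a normalized dataset as "an RDF dataset plus a map from blank nodes to canonical identifiers", and of producing a serialized form from it. It is not part of the PR or the specification. The names `IRI`, `BlankNode`, `NormalizedDataset`, `term_to_nquads`, and `serialized_canonical_form` are hypothetical, literals and N-Quads escaping are omitted, map values are assumed to be stored without the `_:` prefix, and the sort into code point order is an assumption because the Serialization hunk cuts off before the end of that sentence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    # The input label is arbitrary and carries no meaning on its own.
    input_label: str


Term = Union[IRI, BlankNode]  # literals omitted to keep the sketch short
Quad = Tuple[Term, Term, Term, Optional[Term]]  # subject, predicate, object, graph


@dataclass
class NormalizedDataset:
    quads: List[Quad]
    # Map component: keys are blank nodes from the dataset,
    # values are the associated canonical identifiers (assumed without "_:").
    canonical_ids: Dict[BlankNode, str]


def term_to_nquads(term: Term, canonical_ids: Dict[BlankNode, str]) -> str:
    # Blank nodes are always written with the label taken from the map.
    if isinstance(term, BlankNode):
        return "_:" + canonical_ids[term]
    return "<" + term.value + ">"


def serialized_canonical_form(nd: NormalizedDataset) -> str:
    lines = []
    for s, p, o, g in nd.quads:
        parts = [term_to_nquads(t, nd.canonical_ids) for t in (s, p, o)]
        if g is not None:
            parts.append(term_to_nquads(g, nd.canonical_ids))
        lines.append(" ".join(parts) + " .\n")
    # Assumption: lines are sorted in code point order before concatenation.
    # Python compares str values by code point, so sorted() is sufficient here.
    return "".join(sorted(lines))
```

With this shape, two datasets that differ only in their input blank node labels serialize to the same string once their maps assign the same canonical identifiers, which is the property the updated Canonicalization paragraph states in terms of the serialized canonical form.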