Support ordered input dataset or "list of quads" and optional mapping from input indices to output indices #89

Closed
dlongley opened this issue Apr 3, 2023 · 21 comments · Fixed by #100

dlongley commented Apr 3, 2023

Starting with my comment here: #86 (comment)

Some discussion spawned around the need for an optional output: a mapping of quad input indices to quad output indices, for selective disclosure use cases.

@gkellogg made this comment:

Just to be clear on what we're talking about, the input dataset is unordered and no blank node labels are persistent or possibly even present.

Which implies that we might want to also take an ordered list of quads as an optional alternative input to the algorithm. Or perhaps we can describe the RDF abstract dataset as being optionally represented as such -- for the case where this mapping output is desirable. Notably, the presence (or lack thereof) of input blank node labels in this case is not relevant.

dlongley commented May 3, 2023

To elaborate on the problem a bit more here:

The goal is to support selective disclosure use cases with verifiable credentials.

The main scenario is this:

A holder of a verifiable credential must be able to reveal a subset of the quads from the verifiable credential to a verifier. The verifier must be able to reproduce the same bnode labels that were used when all of the quads from the verifiable credential were canonized and cryptographically signed by the issuer.

The problem here lies in the fact that canonizing a subset of a dataset can produce different bnode labels (for the same blank nodes) from canonizing the entire dataset.

The process the holder goes through to prepare to selectively disclose a verifiable credential to a verifier looks something like this:

  1. Canonize the VC's dataset to get canonical bnode labels.
  2. Optionally, modify those bnode labels in a pseudo-random way.
  3. Pick out just the quads they want to disclose.
  4. Knowing the verifier will only see those quads, canonize only the selected quads to get the bnode labels the verifier will see.
  5. Somehow produce a map of the bnode labels the verifier will see to the labels used after step 2.
  6. Share that mapping with the verifier when selectively disclosing the verifiable credential.

Step 5 is the step we need to enable for the holder.

Once we've enabled step 5 for the holder, then the verifier can do this:

  1. Canonize the selectively disclosed VC's dataset to get bnode labels that need to be mapped.
  2. Use the mapping from the holder to transform the bnode labels.
  3. Cryptographically verify the transformed data.
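To make step 2 above concrete, here is a minimal Python sketch of applying a holder-provided label mapping to canonical N-Quads. The relabel_nquads helper, the sample quads, and the mapping are hypothetical and not proposed spec text.

# Python sketch (illustrative): apply a holder-provided label mapping
import re

def relabel_nquads(nquads_lines, label_map):
    """Rewrite blank node labels in canonical N-Quads lines using label_map.

    label_map maps labels produced by canonizing the disclosed subset
    (e.g. "c14n0") to the labels used when the full dataset was signed.
    A real implementation would parse the quads rather than use a regex.
    """
    def swap(match):
        label = match.group(1)
        return "_:" + label_map.get(label, label)

    return [re.sub(r"_:([A-Za-z0-9]+)", swap, line) for line in nquads_lines]

# Hypothetical disclosed quads and holder-provided mapping:
disclosed = [
    '_:c14n0 <http://example.org/vocab#p> "Foo" .',
    '_:c14n1 <http://example.org/vocab#q> _:c14n0 .',
]
print(relabel_nquads(disclosed, {"c14n0": "c14n2", "c14n1": "c14n0"}))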

yamdan commented May 8, 2023

@dlongley
I apologize for my delayed response. I agree with you that for selective disclosure use cases, we require a form of mapping from the input dataset to a normalized dataset.

While one potential approach is (1) mapping from input indices to output indices, as you suggested, I believe a less intrusive alternative is (2) mapping from input blank node identifiers to output blank node identifiers. This mapping is already represented by the canonical issuer in the current specification draft.

By using the canonical issuer for this mapping, we can avoid the need for ordered versions of both the input and normalized datasets. The only addition required to the current specification is to include the canonical issuer as an additional intermediate output of the canonicalization algorithm.

As we addressed in #4 (comment), canonical issuers might not be uniquely determined in certain cases. Note that this issue also arises in the case of indices mapping (see the example below). Despite this, it does not pose a problem for selective disclosure use cases. We do not require deterministic and unique canonical issuers (or indices mappings) for the selective disclosure you mentioned, as all possible issuers yield the same serialized canonical form. As a result, we must explicitly indicate that the canonical issuer serves as an intermediate output for selective disclosure (and other use cases, if any) and should not be considered canonical output, due to its non-deterministic nature.

Example

Assume that we have the following input dataset:

_:e0 <http://example.org/vocab#next> _:e1 .
_:e1 <http://example.org/vocab#next> _:e0 .

(This N-Quads representation should be interpreted as an ordered list of quads in the case of (1). We can interpret it as an unordered RDF dataset in the case of (2).)

Then the output of the canonicalization algorithm (the serialized canonical form of the normalized dataset) should look like this:

_:c14n0 <http://example.org/vocab#next> _:c14n1 .
_:c14n1 <http://example.org/vocab#next> _:c14n0 .

As for (1), we have two possible indices mappings:
(1-1) { "0": "0", "1": "1" }
(1-2) { "0": "1", "1": "0" }

As for (2), we also have two possible canonical issuers:
(2-1) { "e0": "c14n0", "e1": "c14n1" }
(2-2) { "e0": "c14n1", "e1": "c14n0" }

We can ensure that all of the above mappings result in the same serialized output (otherwise, there would be a flaw in the existing analysis). Therefore the choice between these mappings does not matter for the selective disclosure usage mentioned above (#89 (comment)).
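For illustration, here is a small Python check that both candidate issuer mappings above serialize to the same canonical form; only the quads and the two mappings are taken from the example, and the serialize helper is hypothetical.

# Python sketch (illustrative): both issuers yield the same sorted N-Quads
input_quads = [
    ("e0", "<http://example.org/vocab#next>", "e1"),
    ("e1", "<http://example.org/vocab#next>", "e0"),
]

def serialize(quads, issuer):
    lines = [f"_:{issuer[s]} {p} _:{issuer[o]} ." for s, p, o in quads]
    return "\n".join(sorted(lines))  # canonical N-Quads are sorted

issuer_2_1 = {"e0": "c14n0", "e1": "c14n1"}
issuer_2_2 = {"e0": "c14n1", "e1": "c14n0"}

assert serialize(input_quads, issuer_2_1) == serialize(input_quads, issuer_2_2)
print(serialize(input_quads, issuer_2_1))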

gkellogg commented May 8, 2023

The mapping of blank nodes to stable identifiers is now part of the definition of a normalized dataset, and this is effectively the same as the map maintained by the canonical issuer.

One thing we could provision for is, if the input is a normalized dataset, initialize the canonical issuer from the map component of the dataset.

dlongley commented May 10, 2023

The lower level steps here are:

  1. We have an original total dataset, OD.
  2. We canonize that to create a canonized total dataset, COD.
  3. We select N quads from COD to produce a selective disclosure dataset, SD.
  4. We canonize SD to produce CSD including a mapping of the bnode labels from SD to the new bnode labels in CSD.

Note: A reversal of the mapping from step 4 will be sent along with the selectively disclosed dataset (the latter of which may have its bnode labels changed at any point, but can then have them transformed back by recanonizing and applying the map).

The above step 4 is what we must enable. We must allow another spec to describe the above process where it references our spec here in steps 2 and 4.

peacekeeper commented May 10, 2023

On the 10 May 2023 WG call, @dlongley, @gkellogg and @yamdan expressed their interest to set up a dedicated meeting to resolve this issue, perhaps via Doodle on the WG mailing list, so that other interested members can also participate.

(UPDATE: see message here: https://lists.w3.org/Archives/Public/public-rch-wg/2023May/0011.html)

dlongley commented May 10, 2023

@gkellogg,

One thing we could provision for is, if the input is a normalized dataset, initialize the canonical issuer from the map component of the dataset.

Hmm, this is where my confusion sets in. What does "initialize the canonical issuer" mean, precisely? We don't want to use the values from the map component of the normalized dataset, otherwise we will not produce the new blank node labels (as just the original ones would be output again). We need the new canonical labels as well as the original ones.

While it's true that the holder (the party generating the selectively disclosed dataset) could run the algorithm as you suggest to get output with the original labels, this information would not produce the needed mapping to hand to the verifier (the party receiving the selectively disclosed dataset).

What's key is that the verifier will not have access to the normalized dataset and cannot run the algorithm in this way. The holder also can't "cull" from the normalized dataset and then send it along because it is abstract and a concrete serialization is required for transport (where the bnode labels could change and invalidate the abstract mapping).

So we have a situation with asymmetrical knowledge, where the party that knows the total dataset must produce a transportable mapping that can be applied to the selectively disclosed dataset, post-canonicalization.

The holder thus needs to produce both the original canonical labels and the new labels that would be produced from canonizing just the selectively disclosed quads. So those new labels must be known -- as well as a way to map them back. So, we can't just output the same original labels by passing in the normalized dataset again on the verifier side -- because that's not a thing the verifier has (nor can have).

@gkellogg

Okay, I've obviously remained confused about the steps involved, but as I understand it, the desire is to be able to take, as an additional input, a map of blank nodes in the input dataset to canonical identifiers previously established (which could be a normalized dataset, either retrieved from a previous run or constructed). The problem is that both the map in the normalized dataset and the canonical issuer used within the algorithm take specific, not abstract, blank nodes from the dataset.

If an input were constructed as a normalized dataset which includes that original mapping from blank nodes to canonical identifiers, but where the quads in the dataset represent a subset of the original input dataset, then we could maintain the mapping created when run against the original dataset and correlate it with a mapping from a run against just the subset of quads. I think you would get what you want, but it is important that the blank node objects, used as keys in both maps, are the same thing.

One way this might be done would be as follows:

  1. Canonicalize a dataset, extracting the normalized dataset from the algorithm prior to serialization as N-Quads.
  2. Update the dataset (original or normalized) in such a way that the mapping from blank nodes to canonical identifiers is maintained, e.g., via a SPARQL UPDATE or LD-Patch. This will result in a dataset which is a subset of the original, along with a map from blank nodes in that dataset to their canonical identifiers, which may include entries for blank nodes that no longer exist within the modified dataset.
  3. Run the canonicalization algorithm again on the updated dataset, retrieving a new normalized dataset containing a new map from blank nodes to canonical identifiers.
  4. The keys from the map in the second normalized dataset MUST be a subset of those from the first normalized dataset, while the associated canonical identifiers may differ. This allows blank nodes in the original dataset to be correlated with blank nodes in the reduced dataset.

Note that the key is that the objects representing the blank nodes remain the same across different runs of the algorithm.

Note that the normalized dataset can be considered to be a combination of the original dataset and a map from blank nodes in that dataset to calculated canonical identifiers.

dlongley commented May 10, 2023

@gkellogg,

Ok, my read is that you're proposing this:

Holder does:

  1. Canonize a dataset, D, to get C and a label mapping L1 (the combination is a "normalized dataset").
  2. Remove quads from C, maintaining the label mapping L1.
  3. Canonize C and get a new label mapping L2.
  4. Reconcile L1 and L2 to produce a mapping of L2 labels to L1 labels, L3.
  5. Transmit C and L3 to the verifier.

The above requires the same abstract blank nodes to be used throughout.

Verifier does:

  1. Canonize C to get C2 and a label mapping, L4.
  2. Replace label entries in L4 using L3.
  3. Serialize C2 and (updated) L4 to canonical N-Quads.
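A minimal sketch, assuming the same abstract blank nodes are used throughout (here plain strings stand in for the abstract blank node objects), of how step 4 above — reconciling L1 and L2 into L3 — might look. All names and values are illustrative.

# Python sketch (illustrative): reconcile L1 and L2 into L3
L1 = {"bnodeA": "c14n0", "bnodeB": "c14n1"}   # full-dataset labels (abstract bnode -> label)
L2 = {"bnodeB": "c14n0"}                      # subset labels after quads were removed

# L3 maps the labels the verifier will compute (L2 values) back to the
# labels used when the full dataset was signed (L1 values).
L3 = {new_label: L1[bnode] for bnode, new_label in L2.items()}
print(L3)  # {'c14n0': 'c14n1'}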

dlongley commented May 10, 2023

@gkellogg,

If the above is what you meant, it has the unfortunate requirement that the same abstract blank nodes must be used throughout the process for the holder. This adds complexity to the quad selection mechanism.

Today, quads can be selected by:

  1. Temporarily skolemizing bnodes and applying a JSON-LD frame to filter the data to what is to be selectively disclosed.
  2. Converting the selectively disclosed data back to N-Quads.
  3. Deskolemizing the bnodes to produce the selected N-Quads.

An additional step would be required, I think, which would involve comparing the selected N-Quads from step 3 with the original dataset -- and producing a set of matches to re-run through the canonicalization algorithm.

I wonder if we can avoid that by exposing the canonical identifier issuer instead.

@dlongley

@gkellogg,

Suppose that we expose the canonical issuer identifier for external use.

This seems like it would practically solve the problem but might run afoul of some abstract RDF rules:

Holder does:

  1. Canonize a dataset, D, to get C (the output label mapping is irrelevant).
  2. Remove quads from C using any method that preserves the labels in C.
  3. Canonize C to get C2 and the exposed canonical issuer identifier, CI.
  4. Reverse the label mappings in CI to produce a mapping of C2 labels to C labels, L1.
  5. Transmit C and L1 to the verifier.

Verifier does:

  1. Canonize C to get C2 and a label mapping, L2.
  2. Replace label entries in L2 using L1.
  3. Serialize C2 and (updated) L2 to canonical N-Quads.
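As a rough illustration of holder step 4 above (inverting the exposed canonical issuer map), with purely illustrative values:

# Python sketch (illustrative): invert the exposed canonical issuer map CI
CI = {"c14n0": "c14n1", "c14n2": "c14n0"}                    # label in C -> label issued for C2
L1 = {issued: original for original, issued in CI.items()}   # C2 label -> C label
print(L1)  # {'c14n1': 'c14n0', 'c14n0': 'c14n2'}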

gkellogg commented May 10, 2023

Yes, it struck me that the map of blank nodes to canonical identifiers is effectively just the canonical issuer; the definition of normalized dataset could be updated to describe it as the combination of the input dataset and the canonical issuer. Or, perhaps the notion of a normalized dataset isn't important, and it is just the canonical issuer state at the end of the algorithm. But, as this relates to the specific blank node objects in the input dataset, this only gets part way there.

As we can't rely on JSON-LD Framing as being the sole mechanism for removing quads, and we can't rely on systems preserving the blank node identifiers during transforms along the way, it seems that skolemization is the appropriate route.

  1. Canonize D, to get the canonized N-Quads representation C.
  2. Create C1 from C by systematically replacing each blank node with a skolem IRI that uses the blank node's identifier, based on a standardized IRI template.
  3. Perform an operation on C1 which results in a strict subset of quads C2, re-serialized in the same partial order as C1.
  4. A new algorithm iterates over the quads in C2, transforming skolem IRIs to blank nodes and issuing new identifiers along the way. The issued identifiers map will now map each canonical identifier from C1 to the new canonical identifier issued for C2.
  5. Serialize C2 to N-Quads to C3, maintaining the issued identifiers map.

This is somewhat different than re-canonicalizing C3, although strictly linear, and avoids potential tooling problems in maintaining blank node identifiers during external operations.

Example:

Input dataset D:

# original dataset
_:b0 <http://schema.org/address> _:b1 .
_:b0 <http://schema.org/familyName> "Jarrett" .
_:b0 <http://schema.org/gender> "Female" .  # gender === Female
_:b0 <http://schema.org/givenName> "Ali" .
_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
_:b1 <http://schema.org/addressCountry> "United States" .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PostalAddress> .

Canonicalized result C:

# normalized dataset
_:c14n0 <http://schema.org/addressCountry> "United States" .
_:c14n0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PostalAddress> .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/familyName> "Jarrett" .
_:c14n1 <http://schema.org/gender> "Female" .  # gender === Female
_:c14n1 <http://schema.org/givenName> "Ali" .
_:c14n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .

Skolemized result C1:

<https://w3c.org/ns/rch/skolem#c14n0> <http://schema.org/addressCountry> "United States" .
<https://w3c.org/ns/rch/skolem#c14n0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/PostalAddress> .
<https://w3c.org/ns/rch/skolem#c14n1> <http://schema.org/address> <https://w3c.org/ns/rch/skolem#c14n0> .
<https://w3c.org/ns/rch/skolem#c14n1> <http://schema.org/familyName> "Jarrett" .
<https://w3c.org/ns/rch/skolem#c14n1> <http://schema.org/gender> "Female" .  # gender === Female
<https://w3c.org/ns/rch/skolem#c14n1> <http://schema.org/givenName> "Ali" .
<https://w3c.org/ns/rch/skolem#c14n1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .

Subset dataset C2 (removes address):

<https://w3c.org/ns/rch/skolem#c14n1> <http://schema.org/familyName> "Jarrett" .
<https://w3c.org/ns/rch/skolem#c14n1> <http://schema.org/gender> "Female" .  # gender === Female
<https://w3c.org/ns/rch/skolem#c14n1> <http://schema.org/givenName> "Ali" .
<https://w3c.org/ns/rch/skolem#c14n1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .

Regenerate skolem identifiers and resulting map (C3):

_:c14n0 <http://schema.org/familyName> "Jarrett" .
_:c14n0 <http://schema.org/gender> "Female" .  # gender === Female
_:c14n0 <http://schema.org/givenName> "Ali" .
_:c14n0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .

Map of identifiers in C3 to identifiers in C

  • _:c14n0 => _:c14n1
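Steps 2 and 4 above hinge on moving between blank node labels and skolem IRIs. A minimal sketch, assuming the <https://w3c.org/ns/rch/skolem#> template shown in the example; the helper functions are illustrative, not proposed spec text.

# Python sketch (illustrative): skolemize / de-skolemize canonical labels
import re

SKOLEM_BASE = "https://w3c.org/ns/rch/skolem#"

def skolemize(nquads):
    # _:c14n0  ->  <https://w3c.org/ns/rch/skolem#c14n0>
    return re.sub(r"_:([A-Za-z0-9]+)", rf"<{SKOLEM_BASE}\1>", nquads)

def deskolemize(nquads):
    # <https://w3c.org/ns/rch/skolem#c14n0>  ->  _:c14n0
    return re.sub(rf"<{re.escape(SKOLEM_BASE)}([A-Za-z0-9]+)>", r"_:\1", nquads)

quad = '_:c14n1 <http://schema.org/familyName> "Jarrett" .'
assert deskolemize(skolemize(quad)) == quad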

@dlongley

@gkellogg,

For step 4:

  4. A new algorithm iterates over the quads in C2, transforming skolem IRIs to blank nodes and issuing new identifiers along the way. The issued identifiers map will now map each canonical identifier from C1 to the new canonical identifier issued for C2.

The first sentence here tripped me up. Can't this just be broken into these steps:

  1. Deskolemize IRIs in C2 to blank nodes.
  2. Re-canonize C2 and get access to the canonical issuer identifier used, which will map canonical identifiers from C1 to those issued for C2.

If so, it looks like it matches the outcome of what I said in step 3 in my comment above. All that would remain after this would be to build a reverse mapping from the canonical issuer identifier such that the canonical identifiers in C2 could be later mapped back to C1 by the verifier.

If you agree, then I think we're on the same page and we just need to see if it passes muster with others, and if so, add some spec text that makes it clear that implementations may expose internal state, such as the canonical issuer identifier, to enable other specs to reference / make use of it in their own custom algorithms.

gkellogg commented May 11, 2023

The problem is that, formally, blank nodes in datasets do not have stable identifiers, even though many/most implementations may retain those from the serialization. This is why the step uses skolem IRIs, which are stable.

yamdan commented May 11, 2023

Thank you @gkellogg and @dlongley; based on your discussion, I can now correct my idea of using the canonical issuer. I describe it below; it is somewhat redundant, but it might help to capture the whole picture (at least for me).

The following example contains several elements that are not in the scope of this WG. What I think we additionally have to define here is an extended canonicalization algorithm (= "new algorithm," as @gkellogg said) like canonicalize(D, withMap=true) -> (C, M). Here D is an input dataset that MUST be skolemized (to avoid unstable blank node identifiers); D has to be deskolemized before the canonicalization (as mentioned by @dlongley in the above comment). C is the serialized canonicalization output; M is an issued identifiers map produced in this process. We should assert that the input dataset D does not already include skolem IDs (e.g., urn:bnid:_:*); otherwise deskolemization will fail.
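In Python-like terms, the proposed extension might have an interface along these lines. This is a hypothetical signature only, not an implementation; the parameter and return names simply follow the description above.

# Python sketch (hypothetical interface, not an implementation)
from typing import Dict, Tuple

def canonicalize(dataset: str, with_map: bool = False) -> Tuple[str, Dict[str, str]]:
    """Run the canonicalization algorithm on `dataset` (serialized N-Quads).

    Returns the serialized canonical form C and, when `with_map` is true,
    the issued identifiers map M (input blank node identifier -> canonical
    identifier) as an additional output.
    """
    raise NotImplementedError("interface sketch only")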

Holder

  1. have input dataset D
# dataset D
_:e0 <http://example.com/#p0> _:e1 .
_:e1 <http://example.com/#p1> _:e2 .
_:e2 <http://example.com/#p2> "Foo" .
  2. canonicalize(D) -> C: canonicalize input dataset D to get serialized canonical form C of normalized dataset
# n-quads C
_:c14n0 <http://example.com/#p2> "Foo" .
_:c14n1 <http://example.com/#p0> _:c14n2 .
_:c14n2 <http://example.com/#p1> _:c14n0 .
  3. skolemize C to create C1
# n-quads C1
<urn:bnid:_:c14n0> <http://example.com/#p2> "Foo" .
<urn:bnid:_:c14n1> <http://example.com/#p0> <urn:bnid:_:c14n2> .
<urn:bnid:_:c14n2> <http://example.com/#p1> <urn:bnid:_:c14n0> .
  4. selectively disclose a part of C1, which results in a partial dataset C2
# dataset C2
<urn:bnid:_:c14n0> <http://example.com/#p2> "Foo" .
<urn:bnid:_:c14n2> <http://example.com/#p1> <urn:bnid:_:c14n0> .
  5. canonicalize(C2, withMap=true) -> (C3, Mb): deskolemize C2 and run the canonicalization algorithm on it to get serialized canonical form C3 with issued identifiers map Mb
# n-quads C3
_:c14n0 <http://example.com/#p1> _:c14n1 .
_:c14n1 <http://example.com/#p2> "Foo" .
# map Mb  (C->C3)
- "c14n0" => "c14n1"
- "c14n2" => "c14n0"
  6. invert Mb to get Mb'
# map Mb'  (C3->C)
- "c14n0" => "c14n2"
- "c14n1" => "c14n0"
  7. apply the map Mb' to the blank node identifiers in C3 to get C4
# n-quads C4
_:c14n2 <http://example.com/#p1> _:c14n0 .
_:c14n0 <http://example.com/#p2> "Foo" .
  8. compare C4 and C to obtain indices map Mi
# map Mi  (C4->C)
- "0" => "2"
- "1" => "0"

(this can also be represented as an array [2, 0])

  9. generate proof and send (C3, Mb', Mi, L, proof) to the verifier, where L is the number of quads in C (to be used by the verifier to restore the original layout)
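A minimal sketch of holder step 8 above (comparing C4 with C to build the indices map Mi), using the quads from this example; the code itself is illustrative.

# Python sketch (illustrative): derive the indices map Mi from C4 and C
C = [
    '_:c14n0 <http://example.com/#p2> "Foo" .',
    '_:c14n1 <http://example.com/#p0> _:c14n2 .',
    '_:c14n2 <http://example.com/#p1> _:c14n0 .',
]
C4 = [
    '_:c14n2 <http://example.com/#p1> _:c14n0 .',
    '_:c14n0 <http://example.com/#p2> "Foo" .',
]

Mi = {str(i): str(C.index(quad)) for i, quad in enumerate(C4)}
print(Mi)                              # {'0': '2', '1': '0'}
print([int(v) for v in Mi.values()])   # array form: [2, 0]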

Verifier

  1. get (C3', Mb', Mi, L, proof), where C3' is a dataset isomorphic to C3, but whose blank node identifiers may have changed at some point before the verifier receives it.

  2. canonicalize(C3') -> C3: the Holder's C3 can be recovered by canonicalizing C3'

# n-quads C3
_:c14n0 <http://example.com/#p1> _:c14n1 .
_:c14n1 <http://example.com/#p2> "Foo" .
  3. apply the map Mb' to the blank node identifiers in C3 to get C4
# n-quads C4
_:c14n2 <http://example.com/#p1> _:c14n0 .
_:c14n0 <http://example.com/#p2> "Foo" .
  4. restore the original layout of C (as it was when signed) using Mi and L
# n-quads C'
_:c14n0 <http://example.com/#p2> "Foo" .
###     ###                      ###     .
_:c14n2 <http://example.com/#p1> _:c14n0 .

(here the verifier restores the (original) first and third quads, and learns that the second quad has not been disclosed by the holder)

  5. verify them with proof
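And a minimal sketch of verifier steps 3 and 4 above (relabel C3 with Mb', then restore the original layout using Mi and L), again with this example's values; the use of None as a placeholder for undisclosed quads is illustrative.

# Python sketch (illustrative): relabel C3 and restore the original layout
import re

C3 = [
    '_:c14n0 <http://example.com/#p1> _:c14n1 .',
    '_:c14n1 <http://example.com/#p2> "Foo" .',
]
Mb_inv = {"c14n0": "c14n2", "c14n1": "c14n0"}   # Mb' (C3 label -> C label)
Mi = {"0": "2", "1": "0"}                       # C4 index -> C index
L = 3                                           # number of quads in C

C4 = [re.sub(r"_:([A-Za-z0-9]+)", lambda m: "_:" + Mb_inv[m.group(1)], q)
      for q in C3]

restored = [None] * L                 # None marks an undisclosed quad
for i, quad in enumerate(C4):
    restored[int(Mi[str(i)])] = quad
print(restored)                       # position 1 stays None (undisclosed)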

dlongley commented May 11, 2023

@gkellogg,

The problem is that, formally, blank nodes in datasets do not have stable identifiers, even though many/most implementations may retain those from the serialization. This is why the step uses skolem IRIs, which are stable.

I think we should find a way to address that without creating a new algorithm in our spec here. It seems that, for the implementations you describe, the process can work like this:

  1. Deskolemize C2 such that the output is an abstract dataset, C3, and a mapping of bnodes to deskolemized labels, M1.
  2. Pass C3 into the canonicalization algorithm to obtain a normalized dataset with a new mapping of bnodes to labels, M2.
  3. Reconcile M1 and M2 to produce a mapping of labels from M2 to labels from M1, M3, which is given to the verifier.

Then, the verifier can do this:

  1. Canonize the selective disclosure dataset to get a normalized dataset (dataset C with mapping of bnodes to labels, M4).
  2. Update M4 by replacing the labels using M3.
  3. Serialize C + updated M4 to canonical N-Quads.

So it seems that exposing the canonical identifier issuer and the normalized dataset with the bnode => label mapping can enable either style of implementation to perform the bnode mapping task. It's just that implementations that do not keep blank nodes stable may have an extra step.

@dlongley

@yamdan,

I just saw your comment now -- I'll take a look when I can to see if it matches what I just put in my comment; it looks similar at a high level.

gkellogg commented May 11, 2023

@yamdan said:

  5. canonicalize(C2, withMap=true) -> (C3, Mb): deskolemize C2 and run the canonicalization algorithm on it to get serialized canonical form C3 with issued identifiers map Mb

The problem with this step is that, after deskolemizing C2, you lose any association between the quads from the original dataset and the subset, which is why I kept the skolemized versions until the very end. This might require updating the notion of the issuer to map nodes (either IRIs or blank nodes) to canonical labels, invoking it on blank nodes in the primary algorithms, and on IRIs matching the skolem pattern in the de-skolemizing version.

In hindsight, one way would have been to cast the algorithm as immediately transforming blank nodes to skolem IRIs, using those to create new skolem IRIs with canonical labels, and turning those back into blank nodes with their canonical labels when serializing to N-Quads.

I'll respond to @dlongley later, as I'm tied up for most of the rest of the day.

yamdan commented May 12, 2023

IIRC, we introduced skolemization to fix the blank node identifiers so that they do not change during the selective disclosure process, outside the canonicalization.
I am wondering if we can assume some stability inside the canonicalization process (including deskolemization); in fact, we already implicitly assume in 4.4.3 (Canonicalization Algorithm) that the blank node identifiers are unchanged between step 2 and step 6.

@gkellogg

Summary of today's discussion

Objective is to allow a canonicalized dataset to be subsetted and re-canonized such that the canonical identifiers from the original dataset can be correlated with the canonical identifiers from the subsetted dataset.

The problem to overcome is that, presently, the input to the C14N algorithm is a dataset, which abstractly contains no blank node identifiers, although many/most implementations do retain such identifiers.

The solution that emerged from today's discussion was to allow a concrete N-Quads serialization of a dataset as an input, and to use this to seed the issued identifiers map in the canonical issuer. Language needs updating, but this effectively maps blank nodes in the input dataset (via their identifiers) to canonical identifiers. Doing so requires ensuring that each blank node in the input dataset has an identifier. This can be created when parsing an N-Quads document as an input, and could be maintained in something like the normalized dataset. If the input is a dataset without identifiers (or where identifiers are only partially assigned), the algorithm would ensure that each blank node has a unique identifier.

It is probably worth renaming the normalized dataset to something like stabilized dataset, and replacing the map component with an identifier issuer, which effectively records the same thing.

A system trying to use the C14N algorithm for something such as selective disclosure would need to ensure that the blank node identifiers resulting from the first canonicalization are retained and used when the algorithm is called with a subset dataset, possibly by skolemizing the blank nodes using the canonical identifiers, so that they can be re-created when de-skolemizing the subset dataset.
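As a very rough illustration of the "seeding" idea (ensuring every blank node in a concrete N-Quads input has an identifier recorded in an issuer-like map), with an entirely illustrative data structure:

# Python sketch (illustrative): seed an identifier map from concrete N-Quads
import re

def seed_identifiers(nquads, prefix="b"):
    issued = {}   # input blank node label -> stable identifier
    for label in re.findall(r"_:([A-Za-z0-9]+)", nquads):
        if label not in issued:
            issued[label] = f"{prefix}{len(issued)}"
    return issued

doc = """_:e0 <http://example.com/#p0> _:e1 .
_:e1 <http://example.com/#p1> _:e0 ."""
print(seed_identifiers(doc))   # {'e0': 'b0', 'e1': 'b1'}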

@peacekeeper

Great, thanks for the summary! Is this ready for a PR, or does it still need some more discussion?

@gkellogg

No, ready for PR, I believe. Still some things to be worked out in the process, but the goal seems clear.

@gkellogg added the enhancement, spec:enhancement, and ready for pr labels, and removed the needs discussion label, on May 12, 2023