Logging points for testing #42

Closed
iherman opened this issue Nov 24, 2022 · 25 comments · Fixed by #63

@iherman
Member

iherman commented Nov 24, 2022

This is not a fully baked issue, but more a matter for discussion: testing implementations will be fairly difficult. A simple comparison of the end result would not help too much. I wonder whether it is possible to build into the algorithms some "logging points": implementations would be welcome to log some specific values at those points in the code, comparing with log result of other implementations, thereby helping to locate possible bugs more quickly. The combination of this with the test files themselves may be helpful.

Is this a viable thing to do?

@ericprud
Member

ericprud commented Nov 24, 2022

I usually set a breakpoint at createJWS to peek at the verifyData and signer parameters, so that might be a logical logging point. How much harder should developers' lives be made in order to reduce the opportunities for Eve to exploit a logging API to grab a signer?

@iherman
Member Author

iherman commented Dec 13, 2022

I would like to continue this discussion, @dlongley @gkellogg. I have now run my implementation through all the tests and (of course) there are a few tests on which my implementation fails. But it is extremely difficult to locate where the bug might be, simply looking at the expected results (consider, for example, test #27). We have to find a way to define some logging points where essential values are logged (tables, hash values, etc). A bit like the example we already have in the spec. By comparing those points we can locate whether an implementation is buggy or, God Forbid!, the spec is buggy...

@gkellogg
Member

Maybe taking some lessons from the RDFa processor graph, there could be a mode where a named graph is added to the normalization graph, which contains an RDF representation of the processing work flow. A JSON-LD frame might render this in a way which provides a relatively intuitive structure. Perhaps, it could even be used to create a form of HTML processing example.

@dlongley
Contributor

dlongley commented Dec 13, 2022

I agree that we should have some log points and "example vectors" for each of those for a more complex example case. If an implementation works and produces the same logging outputs for a sufficiently complex use case (the so-called "evil tests" come to mind from the test suite), then I would expect it to pass the test suite generally.

We could mention some kind of logging hook to print out the input parameters and the resulting hash from each of the hash algorithms as a starting point. If we need additional logging, I would think it would be in logging paths and potentially identifier issuers prior to hashing. IMO, the trickiest parts to get right are the copying and appropriate use of identifier issuers across permutations.

I almost rewrote the algorithm a few times to use less recursion to simplify this (replaced with loops instead and without changing the output) but never found the time to do it.

Something we would want to make sure we avoid is making it a challenge (or non-conformant) for someone to implement the algorithm in a different way but that still produces the same output -- and what is logged when could have some kind of impact on that. This shouldn't be a major concern, just making a note of it so we can add some language to assuage the concern as needed.

These logging points could even be "less formal" -- and just be a few points in the spec we give names so that we can link to the "example vectors".
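
A minimal sketch of the kind of logging hook described above: a decorator that records the input parameters on entry and the resulting hash on return, without altering what the wrapped function computes. The names `log_point`, `LOG` and `hash_nquads` are hypothetical, and `hash_nquads` is only a stand-in for a hash step, not the algorithm in the spec.

```python
import functools
import hashlib

LOG = []  # collected log entries, in call order (hypothetical global sink)


def log_point(name):
    """Record the inputs on entry and the result on exit of the wrapped function."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            entry = {"log_point": name, "input": {"args": list(args), "kwargs": kwargs}}
            LOG.append(entry)              # logged right after entry
            result = fn(*args, **kwargs)
            entry["result"] = result       # completed right before return
            return result
        return wrapper
    return decorate


@log_point("h1dq")
def hash_nquads(nquads):
    # Stand-in hash step (an assumption for illustration):
    # SHA-256 over the sorted N-Quads lines, each terminated by a newline.
    data = "".join(line + "\n" for line in sorted(nquads))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    hash_nquads(["_:a <http://example.com/#p> _:z ."])
    print(LOG[-1])
```

Because the hook only observes arguments and return values, an implementation that structures its internals differently can still emit the same entries at the same named points.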

@iherman
Member Author

iherman commented Dec 14, 2022

> We could mention some kind of logging hook to print out the input parameters and the resulting hash from each of the hash algorithms as a starting point.

+1. A log right after entry and right before return

> If we need additional logging, I would think it would be in logging paths and potentially identifier issuers prior to hashing. IMO, the trickiest parts to get right are the copying and appropriate use of identifier issuers across permutations.

Yes. That permutation part as well as the recursion is the most complex part. Maybe a log of the recursion variables right after "entering" a new cycle should be added, by also showing the values of the key variables (path, etc).

> Something we would want to make sure we avoid is making it a challenge (or non-conformant) for someone to implement the algorithm in a different way but that still produces the same output -- and what is logged when could have some kind of impact on that. This shouldn't be a major concern, just making a note of it so we can add some language to assuage the concern as needed.
>
> These logging points could even be "less formal" -- and just be a few points in the spec we give names so that we can link to the "example vectors".

I think making it a little bit more explicit in the algorithm would be helpful. Something along the lines of the current explanations, ie, entries that make it clear that those steps are not part of the formal algorithm specification. I trust @gkellogg would find a way to present it.

@iherman
Member Author

iherman commented Dec 14, 2022

> Maybe taking some lessons from the RDFa processor graph, there could be a mode where a named graph is added to the normalization graph, which contains an RDF representation of the processing work flow. A JSON-LD frame might render this in a way which provides a relatively intuitive structure. Perhaps, it could even be used to create a form of HTML processing example.

I am a bit worried about spending too much time on defining a separate set of terms, representation, JSON-LD frame, etc. A log consisting of the key variables, a bit like what a debug system does, should be enough.

@iherman
Member Author

iherman commented Dec 14, 2022

I spent some time today systematically adding some logging points; I list them below to see if they are fine as a first shot for discussion. For obvious reasons, I added many more log points to the Hash N-Degree Quads part; the rest of the algorithm is, after all, really simple.

  • Issue Identifier
    • I did not really add a logging point, rather I chose to display the content of the issue identifier data as part of the logging points (when applicable). The debug issue to consider is that we need some ways of identifying a particular issue identifier, but that may be implementation dependent. (E.g., I use a class to implement the feature, and I add a unique identifier to each class instance).
  • Canonicalization algorithm log points:
    • Right after step 2, showing the value of the blank node to quads map
    • At Step 4.2, showing the value of identifier and its canonical equivalent
    • After the loop in 5.2 showing the value of hash and the return value of 5.4.2. (This may be superfluous.)
    • Before leaving the function, showing the canonical IS issuer
  • Hash 1st degree Quads log points:
    • When entering the function, showing the value of identifier
    • When leaving the function, showing the value of identifier, nquads and hash
  • Hash Related Blank Node log points:
    • When entering the function, showing the value of related and quad (in nquad syntax)
    • When leaving the function, showing the value of input and hash
  • Hash N-Degree Quads log points:
    • When entering the function, showing identifier and issuer
    • Right after step (3) showing the value of Hn
    • When entering the loop in (5) showing the value of hash and data to hash
    • When entering the permutation loop in (5.4) showing the value of the permutation p
    • When entering the loop in 5.4.4 for the values in p, showing the value of related and path
    • After the loop in 5.4.4 but before 5.4.5, showing the value of recursion list and path
    • After the recursion within the loop between 5.4.5.4 and 5.4.5.5, showing path and the issuer copy
    • After step 5.5, showing chosen path and data to hash
    • When leaving the function, showing the returned hash and the returned issuer
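
The first item in the list above, tagging each identifier issuer with a debug ID so that copies can be told apart in a log, could be sketched roughly as follows. This is only an illustration under stated assumptions: the class layout, the `issuer_id` counter and the `to_log` method are hypothetical and are not taken from any of the implementations discussed here.

```python
import itertools


class IdentifierIssuer:
    # Hypothetical source of per-instance debug IDs; not part of the algorithm.
    _ids = itertools.count(1)

    def __init__(self, prefix="c14n"):
        self.issuer_id = next(IdentifierIssuer._ids)
        self.prefix = prefix
        self.counter = 0
        self.issued = {}  # existing identifier -> issued identifier, in issue order

    def issue(self, existing):
        """Return the identifier issued for `existing`, issuing a new one if needed."""
        if existing not in self.issued:
            self.issued[existing] = f"{self.prefix}{self.counter}"
            self.counter += 1
        return self.issued[existing]

    def copy(self):
        """Copy the state but take a fresh debug ID, so logs can distinguish copies."""
        clone = IdentifierIssuer(self.prefix)
        clone.counter = self.counter
        clone.issued = dict(self.issued)
        return clone

    def to_log(self):
        """Snapshot of the issuer state, suitable for attaching to a log point."""
        return {
            "issuer ID": self.issuer_id,
            "prefix": self.prefix,
            "counter": self.counter,
            "mappings": [f"{old}=>{new}" for old, new in self.issued.items()],
        }


if __name__ == "__main__":
    canonical = IdentifierIssuer("c14n")
    canonical.issue("e2")
    canonical.issue("e3")
    print(canonical.to_log())
```

Giving each copy its own ID is a debugging aid only; the IDs themselves are implementation dependent and would not be comparable across implementations.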

@gkellogg
Member

> Maybe taking some lessons from the RDFa processor graph, there could be a mode where a named graph is added to the normalization graph, which contains an RDF representation of the processing work flow. A JSON-LD frame might render this in a way which provides a relatively intuitive structure. Perhaps, it could even be used to create a form of HTML processing example.

> I am a bit worried about spending too much time on defining a separate set of terms, representation, JSON-LD frame, etc. A log consisting of the key variables, a bit like what a debug system does, should be enough.

Presumably, this logging is non-normative, but to be useful, the output should be fairly consistent between implementations and structured. YAML might be a good format, or at least an embedded micro-syntax. Free-form isn't as useful for comparing implementations.

Creating a frame, or even defining terms is probably something left to infrastructure, although a defined context along the lines of what we do for tests would be easy enough.
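
To illustrate why structured output helps when comparing implementations, here is a rough sketch that walks two already-parsed logs (nested dicts and lists, as any YAML or JSON parser would produce) and reports the first place where they differ. The function name `first_divergence` and the slash-separated path format are assumptions made for this example.

```python
def first_divergence(a, b, path=""):
    """Return (path, value_in_a, value_in_b) for the first difference, or None."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in a:
            if key not in b:
                return (f"{path}/{key}", a[key], None)
            found = first_divergence(a[key], b[key], f"{path}/{key}")
            if found:
                return found
        for key in b:
            if key not in a:
                return (f"{path}/{key}", None, b[key])
        return None
    if isinstance(a, list) and isinstance(b, list):
        for index, (x, y) in enumerate(zip(a, b)):
            found = first_divergence(x, y, f"{path}/{index}")
            if found:
                return found
        if len(a) != len(b):
            # One list is a strict prefix of the other: report the first extra item.
            n = min(len(a), len(b))
            return (f"{path}/{n}",
                    a[n] if n < len(a) else None,
                    b[n] if n < len(b) else None)
        return None
    return None if a == b else (path, a, b)


if __name__ == "__main__":
    mine = {"ca3": [{"identifier": "e0", "hash": "aa"}, {"identifier": "e1", "hash": "bb"}]}
    theirs = {"ca3": [{"identifier": "e0", "hash": "aa"}, {"identifier": "e1", "hash": "bc"}]}
    print(first_divergence(mine, theirs))
```

A comparison like this only works if both implementations agree on key names and nesting, which is the consistency argued for above.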

@iherman
Member Author

iherman commented Dec 15, 2022

> Presumably, this logging is non-normative, but to be useful, the output should be fairly consistent between implementations and structured. YAML might be a good format, or at least an embedded micro-syntax. Free-form isn't as useful for comparing implementations.

In the long term, sure. In the short term, it would already be great if we had this in some other implementations (yours? :-) already; I am currently stuck on some tests for which I have no clear means of debugging...

@gkellogg
Member

My implementation does have many of these log points, if you set the logging level to debug (or use the option on the distiller).

@gkellogg
Member

Just for clarity, this is the debug output from my implementation for the shared hashes example:

[rdf-normalize] script/run examples/shared-hashes.ttl --debug
DEBUG ca: 1deg: _:e0
DEBUG  1deg: input: <http://example.com/#p> <http://example.com/#q> _:e0 . _:e0 <http://example.com/#p> _:e2 .
DEBUG  1deg: nquads: <http://example.com/#p> <http://example.com/#q> _:a . _:a <http://example.com/#p> _:z .
DEBUG  1deg: hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
DEBUG ca: 1deg: _:e1
DEBUG  1deg: input: <http://example.com/#p> <http://example.com/#q> _:e1 . _:e1 <http://example.com/#p> _:e3 .
DEBUG  1deg: nquads: <http://example.com/#p> <http://example.com/#q> _:a . _:a <http://example.com/#p> _:z .
DEBUG  1deg: hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
DEBUG ca: 1deg: _:e2
DEBUG  1deg: input: _:e0 <http://example.com/#p> _:e2 . _:e2 <http://example.com/#r> _:e3 .
DEBUG  1deg: nquads: _:z <http://example.com/#p> _:a . _:a <http://example.com/#r> _:z .
DEBUG  1deg: hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659
DEBUG ca: 1deg: _:e3
DEBUG  1deg: input: _:e1 <http://example.com/#p> _:e3 . _:e2 <http://example.com/#r> _:e3 .
DEBUG  1deg: nquads: _:z <http://example.com/#p> _:a . _:z <http://example.com/#r> _:a .
DEBUG  1deg: hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd
DEBUG ca: single node: node: _:e2, hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659, id: _:c14n0
DEBUG ca: single node: node: _:e3, hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd, id: _:c14n1
DEBUG ca: multiple nodes: node: _:e0,_:e1, hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
DEBUG  ndeg: canon issuer: Issuer: _:e2: _:c14n0, _:e3: _:c14n1
DEBUG  ndeg: identifier: _:e0
DEBUG  ndeg: issuer: Issuer: _:e0: _:b0
DEBUG  ndeg: quads: <http://example.com/#p> <http://example.com/#q> _:e0 . _:e0 <http://example.com/#p> _:e2 .
DEBUG    hrel: related: _:e2, position: o
DEBUG    hrel: predicate: <http://example.com/#p>, position: o
DEBUG    hrel: input: "o<http://example.com/#p>_:c14n0", hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa
DEBUG  ndeg: hn: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa: ["_:e2"]
DEBUG   ndeg: perm: _:e2
DEBUG   ndeg: hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa, path: _:c14n0, recursion: []
DEBUG  ndeg: datatohash: "29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa_:c14n0", hash: fbc300de5afafd97a4b9ee1e72b57754dcdcb7ebb724789ac6a94a5b82a48d30
DEBUG  ndeg: canon issuer: Issuer: _:e2: _:c14n0, _:e3: _:c14n1
DEBUG  ndeg: identifier: _:e1
DEBUG  ndeg: issuer: Issuer: _:e1: _:b0
DEBUG  ndeg: quads: <http://example.com/#p> <http://example.com/#q> _:e1 . _:e1 <http://example.com/#p> _:e3 .
DEBUG    hrel: related: _:e3, position: o
DEBUG    hrel: predicate: <http://example.com/#p>, position: o
DEBUG    hrel: input: "o<http://example.com/#p>_:c14n1", hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216
DEBUG  ndeg: hn: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216: ["_:e3"]
DEBUG   ndeg: perm: _:e3
DEBUG   ndeg: hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216, path: _:c14n1, recursion: []
DEBUG  ndeg: datatohash: "b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216_:c14n1", hash: 2c0b377baf86f6c18fed4b0df6741290066e73c932861749b172d1e5560f5045
DEBUG ca: hash_path_list: [["fbc300de5afafd97a4b9ee1e72b57754dcdcb7ebb724789ac6a94a5b82a48d30", Issuer: _:e0: _:b0], ["2c0b377baf86f6c18fed4b0df6741290066e73c932861749b172d1e5560f5045", Issuer: _:e1: _:b0]]
DEBUG -->: node: _:e1, id: _:c14n2
DEBUG -->: node: _:e0, id: _:c14n3
<http://example.com/#p> <http://example.com/#q> _:c14n2 .
<http://example.com/#p> <http://example.com/#q> _:c14n3 .
_:c14n0 <http://example.com/#r> _:c14n1 .
_:c14n2 <http://example.com/#p> _:c14n1 .
_:c14n3 <http://example.com/#p> _:c14n0 .
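
The `hrel` lines in the debug output above pair an input string with a hash. The following is a minimal sketch of how such an input might be assembled and hashed so that it can be checked against a log line. It assumes SHA-256 over the UTF-8 bytes of the input (the 64-hex-digit values in the log are consistent with that, but the log itself does not state it), and the helper names are hypothetical.

```python
import hashlib


def hash_related_input(position, predicate_iri, identifier):
    """Assemble the input string in the shape shown by the `hrel: input:` log lines."""
    return f"{position}<{predicate_iri}>{identifier}"


def sha256_hex(text):
    """Hex digest of the UTF-8 encoding of `text` (assumed hash function)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    hrel_input = hash_related_input("o", "http://example.com/#p", "_:c14n0")
    print(hrel_input)              # compare with the `hrel: input:` value in the log
    print(sha256_hex(hrel_input))  # compare with the `hash:` value on the same line
```

Recomputing a single logged value in isolation like this is a quick way to tell whether a mismatch comes from the input string or from the hashing step.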

@gkellogg
Member

gkellogg commented Dec 20, 2022

The following is a target YAML output I'm working on for the shared hashes example:

---
ca:
  ca2:
    bn_to_quads:
      e0: ["<http://example.com/#p> <http://example.com/#q> _:e0 .", "_:e0 <http://example.com/#p> _:e2 ."]
      e1: ["<http://example.com/#p> <http://example.com/#q> _:e1 .", "_:e1 <http://example.com/#p> _:e3 ."]
      e2: ["_:e0 <http://example.com/#p> _:e2 .", "_:e2 <http://example.com/#r> _:e3 ."]
      e3: ["_:e1 <http://example.com/#p> _:e3 .", "_:e2 <http://example.com/#r> _:e3 ."]
  ca3:
  - identifier: e0
    nquads: ["<http://example.com/#p> <http://example.com/#q> _:a .", "_:a <http://example.com/#p> _:z ."]
    hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
  - identifier: e1
    nquads: ["<http://example.com/#p> <http://example.com/#q> _:a .", "_:a <http://example.com/#p> _:z ."]
    hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
  - identifier: e2
    nquads: ["_:z <http://example.com/#p> _:a .", "_:a <http://example.com/#r> _:z ."]
    hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659
  - identifier: e3
    nquads: ["_:z <http://example.com/#p> _:a .", "_:z <http://example.com/#r> _:a ."]
    hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd
  ca4:
  - identifier: e2
    hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659
    cid: c14n0
  - identifier: e3
    hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd
    cid: c14n1
  ca5:
  - identifier_list: ["e0", "e1"]
    hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
    ca5.2:
      - hndq:
          - identifier: e0
            Issuer: {e0: b0}
            hndq2:
              - quads: ["<http://example.com/#p> <http://example.com/#q> _:e0 .", "_:e0 <http://example.com/#p> _:e2 ."]
                hrbn:
                  - related: e2
                    position: o
                    quad: "_:e0 <http://example.com/#p> _:e2 ."
                    input: "o<http://example.com/#p>_:c14n0"
                    hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa
            hndq3:
              hn: {"29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa": ["e2"]}
            hndq5:
              - hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa
                data_to_hash: ""
                hndq5.4:
                  - perm: ["e2"]
                    hndq5.4.4:
                      - related: e2
                        path: ""
                    hndq5.4.5:
                      recursion_list: []
                      path: _:c14n0
                hndq5.6:
                  chosen_path: _:c14n0
                  data_to_hash: "29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa_:c14n0"
            hndq6:
              hash: 60ae0eabcc4decbe6ac113a9bd5b5225c92bddaf9f4281b6989b86eaed1171b2
              Issuer: {e0: b0}
          - identifier: e1
            Issuer: {e1: b0}
            hndq2:
              - quads: ["<http://example.com/#p> <http://example.com/#q> _:e1 .", "_:e1 <http://example.com/#p> _:e3 ."]
                hrbn:
                  - related: e3
                    position: o
                    quad: "_:e1 <http://example.com/#p> _:e3 ."
                    input: "o<http://example.com/#p>_:c14n1"
                    hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216
            hndq3:
              hn: {"b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216": ["e3"]}
            hndq5:
              - hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216
                data_to_hash: ""
                hndq5.4:
                  - perm: ["e3"]
                    hndq5.4.4:
                      related: e3
                      path: ""
                    hndq5.4.5:
                      recursion_list: []
                      path: _:c14n1
                hndq5.6:
                  chosen_path: _:c14n1
                  data_to_hash: "b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216_:c14n1"
            hndq6:
              hash: 01dd31165cbc828baf45258099571bf0f14ce5e8a3f80045ef834c8916777496
              Issuer: {e1: b0}
    ca5.3:
      - result: 2c0b377baf86f6c18fed4b0df6741290066e73c932861749b172d1e5560f5045
        Issuer: {e1: b0}
        ca5.3.1:
          - existing_identifier: e1
            cid: c14n2
      - result: fbc300de5afafd97a4b9ee1e72b57754dcdcb7ebb724789ac6a94a5b82a48d30
        Issuer: {e0: b0}
        ca5.3.1:
          - existing_identifier: e0
            cid: c14n3
  ca6:
    Issuer: {e2: c14n0, e3: c14n1, e1: c14n2, e0: c14n3}
normalized_quads:
  - "<http://example.com/#p> <http://example.com/#q> _:c14n2 ."
  - "<http://example.com/#p> <http://example.com/#q> _:c14n3 ."
  - "_:c14n0 <http://example.com/#r> _:c14n1 ."
  - "_:c14n2 <http://example.com/#p> _:c14n1 ."
  - "_:c14n3 <http://example.com/#p> _:c14n0 ."

It's a moving target, but it is as readable as my raw debug log output, parses as YAML, and is more useful for generating HTML output.
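
For readers who want to reproduce the `bn_to_quads` entry at the top of the YAML above, here is a rough sketch that builds the map from N-Quads lines. It is a simplification under stated assumptions: blank node labels are found with a regular expression over the whole line (which would also match `_:` inside a literal), and the function name is hypothetical.

```python
import re

# Simplified blank node label pattern; a real parser should be used for
# arbitrary N-Quads, since this would also match "_:x" inside literals.
BNODE = re.compile(r"_:([A-Za-z0-9]+)")


def bnode_to_quads(nquads):
    """Map each blank node label to the quads that mention it, in input order."""
    result = {}
    for quad in nquads:
        # dict.fromkeys drops duplicates while keeping first-seen order
        for label in dict.fromkeys(BNODE.findall(quad)):
            result.setdefault(label, []).append(quad)
    return result


if __name__ == "__main__":
    quads = [
        "<http://example.com/#p> <http://example.com/#q> _:e0 .",
        "<http://example.com/#p> <http://example.com/#q> _:e1 .",
        "_:e0 <http://example.com/#p> _:e2 .",
        "_:e1 <http://example.com/#p> _:e3 .",
        "_:e2 <http://example.com/#r> _:e3 .",
    ]
    for label, mentioned in bnode_to_quads(quads).items():
        print(label, mentioned)
```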

@iherman
Member Author

iherman commented Dec 20, 2022

(Just from my vacationing sidelines...) I like the YAML output. At the moment, my logging output is a bit messy; I may work on redoing the logging using YAML. But I do not think I will do it this year...

Happy holidays everyone.

@gkellogg
Member

For a fairly simple example that exercises more of the algorithm, I get quite a bit of output. My software is now natively producing the YAML logging output. Even for a simple example, trying to present this so it can be followed will still be challenging.

BASE <http://example.com/>
PREFIX : <#>

_:e0 :p1 _:e1 .
_:e1 :p2 "Foo" .
_:e2 :p1 _:e3 .
_:e3 :p2 "Foo" .

Log output:

ca:
  ca2:
    bn_to_quads:
      e0:
        - _:e0 <http://example.com/#p1> _:e1 .
      e1:
        - _:e0 <http://example.com/#p1> _:e1 .
        - _:e1 <http://example.com/#p2> "Foo" .
      e2:
        - _:e2 <http://example.com/#p1> _:e3 .
      e3:
        - _:e2 <http://example.com/#p1> _:e3 .
        - _:e3 <http://example.com/#p2> "Foo" .
  ca3:
  - identifier: e0
    h1dq:
      nquads:
        - _:a <http://example.com/#p1> _:z .
      hash: 24da9a4406b4e66dffa10ad3d4d6dddc388fbf193bb124e865158ef419893957
  - identifier: e1
    h1dq:
      nquads:
        - _:z <http://example.com/#p1> _:a .
        - _:a <http://example.com/#p2> "Foo" .
      hash: a994e40b576809985bc0f389308cd9d552fd7c89d028c163848a6b2d33a8583a
  - identifier: e2
    h1dq:
      nquads:
        - _:a <http://example.com/#p1> _:z .
      hash: 24da9a4406b4e66dffa10ad3d4d6dddc388fbf193bb124e865158ef419893957
  - identifier: e3
    h1dq:
      nquads:
        - _:z <http://example.com/#p1> _:a .
        - _:a <http://example.com/#p2> "Foo" .
      hash: a994e40b576809985bc0f389308cd9d552fd7c89d028c163848a6b2d33a8583a
  ca4:
  ca5:
  - hash: 24da9a4406b4e66dffa10ad3d4d6dddc388fbf193bb124e865158ef419893957
    identifier_list: [ "e0", "e2"]
    ca5.2:
    - hdnq:
        identifier: e0
        issuer: {e0: b0}
        hndq2:
        - quads:
          - _:e0 <http://example.com/#p1> _:e1 .
        hndq3:
        - quad: _:e0 <http://example.com/#p1> _:e1 .
          hndq3.1:
          - position: o
            related: e1
            h1dq:
              nquads:
                - _:z <http://example.com/#p1> _:a .
                - _:a <http://example.com/#p2> "Foo" .
              hash: a994e40b576809985bc0f389308cd9d552fd7c89d028c163848a6b2d33a8583a
            input: "o<http://example.com/#p1>a994e40b576809985bc0f389308cd9d552fd7c89d028c163848a6b2d33a8583a"
            hash: 3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c
          hn: { "3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c": [  "e1"]}
        hndq5:
        - hash: 3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c
          data_to_hash: ""
          hndq5.4:
          - perm: [ "e1"]
            hndq5.4.4:
            - related: e1
              path: ""
            hndq5.4.5:
              recursion_list: [ "e1"]
              path: "_:b1"
              hndq5.4.5.5:
              - related: e1
                hdnq:
                  identifier: e1
                  issuer: {e0: b0, e1: b1}
                  hndq2:
                  - quads:
                    - _:e0 <http://example.com/#p1> _:e1 .
                    - _:e1 <http://example.com/#p2> "Foo" .
                  hndq3:
                  - quad: _:e0 <http://example.com/#p1> _:e1 .
                    hndq3.1:
                    - position: s
                      related: e0
                      input: "s<http://example.com/#p1>_:b0"
                      hash: 924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677
                  - quad: _:e1 <http://example.com/#p2> "Foo" .
                    hndq3.1:
                    hn: { "924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677": [  "e0"]}
                  hndq5:
                  - hash: 924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677
                    data_to_hash: ""
                    hndq5.4:
                    - perm: [ "e0"]
                      hndq5.4.4:
                      - related: e0
                        path: ""
                      hndq5.4.5:
                        recursion_list: []
                        path: "_:b0"
                    hndq5.6:
                      chosen_path: "_:b0"
                      data_to_hash: "924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677_:b0"
                  hndq6:
                    hash: c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c
                    issuer: {e0: b0, e1: b1}
                path: "_:b1_:b1<c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c>"
                issuer_copy: {e0: b0, e1: b1}
          hndq5.6:
            chosen_path: "_:b1_:b1<c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c>"
            data_to_hash: "3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c_:b1_:b1<c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c>"
        hndq6:
          hash: 39d609fcd8236b74c70744f492cd2baaf0a55765b380ff9e0811ce23e2f409d7
          issuer: {e0: b0, e1: b1}
    - hdnq:
        identifier: e2
        issuer: {e2: b0}
        hndq2:
        - quads:
          - _:e2 <http://example.com/#p1> _:e3 .
        hndq3:
        - quad: _:e2 <http://example.com/#p1> _:e3 .
          hndq3.1:
          - position: o
            related: e3
            h1dq:
              nquads:
                - _:z <http://example.com/#p1> _:a .
                - _:a <http://example.com/#p2> "Foo" .
              hash: a994e40b576809985bc0f389308cd9d552fd7c89d028c163848a6b2d33a8583a
            input: "o<http://example.com/#p1>a994e40b576809985bc0f389308cd9d552fd7c89d028c163848a6b2d33a8583a"
            hash: 3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c
          hn: { "3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c": [  "e3"]}
        hndq5:
        - hash: 3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c
          data_to_hash: ""
          hndq5.4:
          - perm: [ "e3"]
            hndq5.4.4:
            - related: e3
              path: ""
            hndq5.4.5:
              recursion_list: [ "e3"]
              path: "_:b1"
              hndq5.4.5.5:
              - related: e3
                hdnq:
                  identifier: e3
                  issuer: {e2: b0, e3: b1}
                  hndq2:
                  - quads:
                    - _:e2 <http://example.com/#p1> _:e3 .
                    - _:e3 <http://example.com/#p2> "Foo" .
                  hndq3:
                  - quad: _:e2 <http://example.com/#p1> _:e3 .
                    hndq3.1:
                    - position: s
                      related: e2
                      input: "s<http://example.com/#p1>_:b0"
                      hash: 924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677
                  - quad: _:e3 <http://example.com/#p2> "Foo" .
                    hndq3.1:
                    hn: { "924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677": [  "e2"]}
                  hndq5:
                  - hash: 924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677
                    data_to_hash: ""
                    hndq5.4:
                    - perm: [ "e2"]
                      hndq5.4.4:
                      - related: e2
                        path: ""
                      hndq5.4.5:
                        recursion_list: []
                        path: "_:b0"
                    hndq5.6:
                      chosen_path: "_:b0"
                      data_to_hash: "924a034861aa3fbdaf67a939abc4a2f4e233351bccb26718cb8c151b1746f677_:b0"
                  hndq6:
                    hash: c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c
                    issuer: {e2: b0, e3: b1}
                path: "_:b1_:b1<c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c>"
                issuer_copy: {e2: b0, e3: b1}
          hndq5.6:
            chosen_path: "_:b1_:b1<c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c>"
            data_to_hash: "3d96946f27fc34a78e8d067135a1cb1b77083aebc4b2c6cbdc536f067242686c_:b1_:b1<c484f98e6cbf9e21f287433c8b1caa7f1486fd61d84ab220a494bf8184751b8c>"
        hndq6:
          hash: 39d609fcd8236b74c70744f492cd2baaf0a55765b380ff9e0811ce23e2f409d7
          issuer: {e2: b0, e3: b1}
    ca5.3:
    - result: 39d609fcd8236b74c70744f492cd2baaf0a55765b380ff9e0811ce23e2f409d7
      issuer: {e0: b0, e1: b1}
      ca5.3.1:
      - existing_identifier: e0
        cid: c14n0
      - existing_identifier: e1
        cid: c14n1
    - result: 39d609fcd8236b74c70744f492cd2baaf0a55765b380ff9e0811ce23e2f409d7
      issuer: {e2: b0, e3: b1}
      ca5.3.1:
      - existing_identifier: e2
        cid: c14n2
      - existing_identifier: e3
        cid: c14n3
  - hash: a994e40b576809985bc0f389308cd9d552fd7c89d028c163848a6b2d33a8583a
    identifier_list: [ "e1", "e3"]
    ca5.2:
  ca6: {canonical_issuer: {e0: c14n0, e1: c14n1, e2: c14n2, e3: c14n3}}

@iherman
Member Author

iherman commented Jan 6, 2023

> The following is a target YAML output I'm working on for the shared hashes example:

... and here is what mine produces:

- log point: "[info] Entering the canonicalization function (4.5.3 (2))."
  with:
    - Bnode to quads:
        e0:
          - <http://example.com/#p> <http://example.com/#q> _:e0 .
          - _:e0 <http://example.com/#p> _:e2 .
        e1:
          - <http://example.com/#p> <http://example.com/#q> _:e1 .
          - _:e1 <http://example.com/#p> _:e3 .
        e2:
          - _:e0 <http://example.com/#p> _:e2 .
          - _:e2 <http://example.com/#r> _:e3 .
        e3:
          - _:e1 <http://example.com/#p> _:e3 .
          - _:e2 <http://example.com/#r> _:e3 .
- log point: "[info] Entering Hash First Degree Quads function (4.7.3)"
  with:
    - identifier: e0
- log point: "[info] Leaving Hash First Degree Quads function (4.7.3)."
  with:
    - identifier: e0
      quads:
        - <http://example.com/#p> <http://example.com/#q> _:a .
        - _:a <http://example.com/#p> _:z .
      hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
- log point: "[info] Entering Hash First Degree Quads function (4.7.3)"
  with:
    - identifier: e1
- log point: "[info] Leaving Hash First Degree Quads function (4.7.3)."
  with:
    - identifier: e1
      quads:
        - <http://example.com/#p> <http://example.com/#q> _:a .
        - _:a <http://example.com/#p> _:z .
      hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
- log point: "[info] Entering Hash First Degree Quads function (4.7.3)"
  with:
    - identifier: e2
- log point: "[info] Leaving Hash First Degree Quads function (4.7.3)."
  with:
    - identifier: e2
      quads:
        - _:a <http://example.com/#r> _:z .
        - _:z <http://example.com/#p> _:a .
      hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659
- log point: "[info] Entering Hash First Degree Quads function (4.7.3)"
  with:
    - identifier: e3
- log point: "[info] Leaving Hash First Degree Quads function (4.7.3)."
  with:
    - identifier: e3
      quads:
        - _:z <http://example.com/#p> _:a .
        - _:z <http://example.com/#r> _:a .
      hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd
- log point: "[info] Canonicalization function (4.5.3 (4))."
  with:
    - Identifier in first pass: e2=>c14n0
- log point: "[info] Canonicalization function (4.5.3 (4))."
  with:
    - Identifier in first pass: e3=>c14n1
- log point: "[info] Entering Hash N-Degree Quads function (4.9.3)."
  with:
    - identifier: e0
      issuer:
        issuer ID: "1235"
        prefix: c14n
        counter: "2"
        mappings:
          - e2=>c14n0
          - e3=>c14n1
- log point: "[info] Entering Hash Related Blank Node function (4.8.3)"
  with:
    - related: e2
      quad: _:e0 <http://example.com/#p> _:e2 .
- log point: "[info] Leaving Hash Related Blank Node function (4.8.3 (4))"
  with:
    - input to hash: o<http://example.com/#p>_:c14n0
      hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa
- log point: "[info] Hash N-Degree Quads function (4.9.3 (3))"
  with:
    - Hash to bnodes:
        29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa: &a1
          - e2
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5)), entering loop"
  with:
    - hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa
      data to hash: ""
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.4)), entering loop"
  with:
    - p: *a1
      chosen path: ""
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.4.4)), entering loop"
  with:
    - related: e2
      path: ""
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.4.5)), before possible
    recursion."
  with:
    - recursion list: []
      path: _:c14n0
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.5). End of current
    loop with Hn hashes"
  with:
    - chosen path: _:c14n0
      data to hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa_:c14n0
- log point: "[info] Leaving Hash N-Degree Quads function (4.9.3)."
  with:
    - hash: fbc300de5afafd97a4b9ee1e72b57754dcdcb7ebb724789ac6a94a5b82a48d30
      issuer:
        issuer ID: "1237"
        prefix: b
        counter: "1"
        mappings:
          - e0=>b0
- log point: "[info] Entering Hash N-Degree Quads function (4.9.3)."
  with:
    - identifier: e1
      issuer:
        issuer ID: "1235"
        prefix: c14n
        counter: "2"
        mappings:
          - e2=>c14n0
          - e3=>c14n1
- log point: "[info] Entering Hash Related Blank Node function (4.8.3)"
  with:
    - related: e3
      quad: _:e1 <http://example.com/#p> _:e3 .
- log point: "[info] Leaving Hash Related Blank Node function (4.8.3 (4))"
  with:
    - input to hash: o<http://example.com/#p>_:c14n1
      hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216
- log point: "[info] Hash N-Degree Quads function (4.9.3 (3))"
  with:
    - Hash to bnodes:
        b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216: &a2
          - e3
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5)), entering loop"
  with:
    - hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216
      data to hash: ""
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.4)), entering loop"
  with:
    - p: *a2
      chosen path: ""
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.4.4)), entering loop"
  with:
    - related: e3
      path: ""
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.4.5)), before possible
    recursion."
  with:
    - recursion list: []
      path: _:c14n1
- log point: "[info] Hash N-Degree Quads function (4.9.3 (5.5). End of current
    loop with Hn hashes"
  with:
    - chosen path: _:c14n1
      data to hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216_:c14n1
- log point: "[info] Leaving Hash N-Degree Quads function (4.9.3)."
  with:
    - hash: 2c0b377baf86f6c18fed4b0df6741290066e73c932861749b172d1e5560f5045
      issuer:
        issuer ID: "1239"
        prefix: b
        counter: "1"
        mappings:
          - e1=>b0
- log point: "[info] Canonicalization function, after (4.5.3 (5.2))"
  with:
    - computed for: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
      hash path list:
        - hash: fbc300de5afafd97a4b9ee1e72b57754dcdcb7ebb724789ac6a94a5b82a48d30
          issuer:
            issuer ID: "1237"
            prefix: b
            counter: "1"
            mappings:
              - e0=>b0
        - hash: 2c0b377baf86f6c18fed4b0df6741290066e73c932861749b172d1e5560f5045
          issuer:
            issuer ID: "1239"
            prefix: b
            counter: "1"
            mappings:
              - e1=>b0
- log point: "[info] Leaving the canonicalization function (4.5.3)"
  with:
    - issuer:
        issuer ID: "1235"
        prefix: c14n
        counter: "4"
        mappings:
          - e2=>c14n0
          - e3=>c14n1
          - e1=>c14n2
          - e0=>c14n3

Yes, it is verbose. But I believe it is useful for others developing the algorithm...

@gkellogg
Member

gkellogg commented Jan 6, 2023

The "log point" values could be comments instead. As written, nothing directly associates an entry with a step of the algorithm, other than extracting the step number from the string. Using a step identifier as the key and a comment for the description might solve this, so it could be something like the following:

ca5.4.3:
  # Entering the canonicalization function (4.5.3 (2)).
    - Bnode to quads:
        e0:
          - <http://example.com/#p> <http://example.com/#q> _:e0 .
          - _:e0 <http://example.com/#p> _:e2 .
        e1:
          - <http://example.com/#p> <http://example.com/#q> _:e1 .
          - _:e1 <http://example.com/#p> _:e3 .
        e2:
          - _:e0 <http://example.com/#p> _:e2 .
          - _:e2 <http://example.com/#r> _:e3 .
        e3:
          - _:e1 <http://example.com/#p> _:e3 .
          - _:e2 <http://example.com/#r> _:e3 .

Note that, unlike JSON, key order is significant in YAML, so maps can be used for the various steps. This would keep it reasonably close to the system I worked out, while providing more human-readable comments (although these could be properties, too).

Moreover, maybe we could align the keys more directly with the fragment identifiers for each algorithm or algorithm step, to aid in constructing an HTML view. Either way, making the YAML easier for humans to interpret is worthwhile.

@iherman
Member Author

iherman commented Jan 7, 2023

Well... comments might be complicated in my case: my logger system builds a JavaScript object along the way, which is serialized into YAML at the very end. I am not sure whether comments can be introduced. But a key of the form "@comment" may work instead. (The advantage is that the log can also be turned into JSON if needed.)

Using the same keys for each log point might be a good idea indeed. I can modify my script to do that.
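
To illustrate the "build an object, serialize at the end" approach with an "@comment" key, here is a minimal sketch in Python. It is not the actual logger from any implementation discussed here: the class name `LogCollector` and its methods are hypothetical, and JSON is used for the final serialization only because it is in the standard library.

```python
import json


class LogCollector:
    """Hypothetical collector: accumulates entries in memory, serializes once."""

    def __init__(self):
        self.entries = []

    def log(self, key, comment, **data):
        # "@comment" stands in for a YAML comment, so the text survives
        # a round trip through JSON or any other object serializer.
        entry = {"@comment": comment}
        entry.update(data)
        self.entries.append({key: entry})

    def to_json(self):
        return json.dumps(self.entries, indent=2)


collector = LogCollector()
collector.log(
    "ca.4",
    "Canonicalization function (4.5.3 (4)).",
    identifier="e2",
    canonical_label="c14n0",
)
print(collector.to_json())
```

Because the comment is an ordinary property, a consumer that does not care about it can simply ignore or delete the "@comment" key.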

@iherman
Member Author

iherman commented Jan 7, 2023

Can we agree on the keys?

  • ca.X.Y.Z
  • hfdq.X.Y.Z
  • hrbn.X.Y.Z
  • hndq.X.Y.Z

Where X.Y.Z. corresponds to the point in the spec?

@gkellogg
Member

gkellogg commented Jan 7, 2023

Can we agree on the keys?

  • ca.X.Y.Z
  • hfdq.X.Y.Z
  • hrbn.X.Y.Z
  • hndq.X.Y.Z

Where X.Y.Z. corresponds to the point in the spec?

We could do that, but we might want to change the anchors we use in the spec to correspond. For example, Step 2.1 of the Canonicalization algorithm currently uses the fragment ca-2-1. I don't think there's any reason that an identifier can't use ., so we could update these all to be ca.2.1.

For completeness:

  • step 3.1 of the Hash First Degree Quads algorithm is h1d-3-1 (in retrospect, this probably should have been h1dq-3-1 for consistency)
  • step 4 of the Hash Related Blank Nodes algorithm is hrbn-4
  • step 3.1 of the Hash N-Degree Quads algorithm is hndq-3-1

Keeping the keys the same as the fragment identifiers facilitates generating links back into the spec, but with consistent naming, these could also be transformed programmatically.
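
As a rough sketch of such a programmatic transformation, the following Python function turns a dotted log key into a hyphenated fragment, using only the fragment shapes mentioned above (`ca-2-1`, `h1d-3-1`, `hrbn-4`, `hndq-3-1`). The function name, the prefix table, and the assumption that every log key has exactly this `prefix.N.N...` shape are illustrative, not something the spec defines.

```python
import re

# Assumed mapping from log-key prefix to the fragment prefix used in the spec.
# "h1dq" maps to "h1d" because the fragment above is h1d-3-1.
FRAGMENT_PREFIX = {"ca": "ca", "h1dq": "h1d", "hrbn": "hrbn", "hndq": "hndq"}

KEY_PATTERN = re.compile(r"^([a-z0-9]+)((?:\.\d+)*)$")


def key_to_fragment(key):
    """Convert a log key such as 'ca.2.1' into a fragment such as '#ca-2-1'."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a recognized log key: {key!r}")
    prefix, steps = match.groups()
    if prefix not in FRAGMENT_PREFIX:
        raise ValueError(f"unknown algorithm prefix: {prefix!r}")
    return "#" + FRAGMENT_PREFIX[prefix] + steps.replace(".", "-")


print(key_to_fragment("ca.2.1"))    # → #ca-2-1
print(key_to_fragment("h1dq.3.1"))  # → #h1d-3-1
```

If the anchors in the spec were renamed to use dots, the table and the `replace` call would become unnecessary.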

I'll update my implementation and post a revised YAML log for one of the examples over the weekend.

@gkellogg
Member

gkellogg commented Jan 7, 2023

Well... comments might be complicated in my case: my logger system builds a JavaScript object along the way, which is serialized into YAML at the very end. I am not sure whether comments can be introduced. But a key of the form "@comment" may work instead. (The advantage is that the log can also be turned into JSON if needed.)

Using the same keys for each log point might be a good idea indeed. I can modify my script to do that.

Of course, we can use a term such as “log point” for such comments, which could always be aliased away in a context, or simply left undefined, in which case it would be dropped. But thus far I haven’t really seen a need to define such a context.

@iherman
Member Author

iherman commented Jan 7, 2023

Can we agree on the keys?

  • ca.X.Y.Z
  • hfdq.X.Y.Z
  • hrbn.X.Y.Z
  • hndq.X.Y.Z

Where X.Y.Z. corresponds to the point in the spec?

We could do that, but we might want to change the anchors we use in the spec to correspond. For example, Step 2.1 of the Canonicalization algorithm currently uses the fragment ca-2-1. I don't think there's any reason that an identifier can't use ., so we could update these all to be ca.2.1.

Being consistent with the anchors is a great idea.

For completeness:

  • step 3.1 of the Hash First Degree Quads algorithm is h1d-3-1 (in retrospect, this probably should have been h1dq-3-1 for consistency)
  • step 4 of the Hash Related Blank Nodes algorithm is hrbn-4
  • step 3.1 of the Hash N-Degree Quads algorithm is hndq-3-1

So should we use h1dq.X.Y.Z and hndq.X.Y.Z?

@iherman
Member Author

iherman commented Jan 7, 2023

Here is what I have now: iherman/rdfjs-c14n#2

@gkellogg
Member

gkellogg commented Jan 7, 2023

Here's my updated and suggested output for the shared hashes example (updated Jan 09 and again Jan 11):

ca:
  log point: Entering the canonicalization function (4.5.3).
  ca.2:
    log point: Extract quads for each bnode (4.5.3 (2)).
    Bnode to quads:
      e0:
        - <http://example.com/#p> <http://example.com/#q> _:e0 .
        - _:e0 <http://example.com/#p> _:e2 .
      e1:
        - <http://example.com/#p> <http://example.com/#q> _:e1 .
        - _:e1 <http://example.com/#p> _:e3 .
      e2:
        - _:e0 <http://example.com/#p> _:e2 .
        - _:e2 <http://example.com/#r> _:e3 .
      e3:
        - _:e1 <http://example.com/#p> _:e3 .
        - _:e2 <http://example.com/#r> _:e3 .
  ca.3:
    log point: Calculated first degree hashes (4.5.3 (3)).
    with:
      - identifier: e0
        h1dq:
          log point: Hash First Degree Quads function (4.7.3).
          nquads:
            - <http://example.com/#p> <http://example.com/#q> _:a .
            - _:a <http://example.com/#p> _:z .
          hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
      - identifier: e1
        h1dq:
          log point: Hash First Degree Quads function (4.7.3).
          nquads:
            - <http://example.com/#p> <http://example.com/#q> _:a .
            - _:a <http://example.com/#p> _:z .
          hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
      - identifier: e2
        h1dq:
          log point: Hash First Degree Quads function (4.7.3).
          nquads:
            - _:z <http://example.com/#p> _:a .
            - _:a <http://example.com/#r> _:z .
          hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659
      - identifier: e3
        h1dq:
          log point: Hash First Degree Quads function (4.7.3).
          nquads:
            - _:z <http://example.com/#p> _:a .
            - _:z <http://example.com/#r> _:a .
          hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd
  ca.4:
    log point: Create canonical replacements for hashes mapping to a single node (4.5.3 (4)).
    with:
      - identifier: e2
        hash: 15973d39de079913dac841ac4fa8c4781c0febfba5e83e5c6e250869587f8659
        canonical label: c14n0
      - identifier: e3
        hash: 7e790a99273eed1dc57e43205d37ce232252c85b26ca4a6ff74ff3b5aea7bccd
        canonical label: c14n1
  ca.5:
    log point: Calculate hashes for identifiers with shared hashes (4.5.3 (5)).
    with:
      - hash: 3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36
        identifier list: [ "e0", "e1"]
        ca.5.2:
          log point: Calculate hashes for identifiers with shared hashes (4.5.3 (5.2)).
          with:
            - identifier: e0
              hndq:
                log point: Hash N-Degree Quads function (4.9.3).
                identifier: e0
                issuer: {e0: b0}
                hndq.2:
                  log point: Quads for identifier (4.9.3 (2)).
                  quads:
                  - <http://example.com/#p> <http://example.com/#q> _:e0 .
                  - _:e0 <http://example.com/#p> _:e2 .
                hndq.3:
                  log point: Hash N-Degree Quads function (4.9.3 (3)).
                  with:
                    - quad: <http://example.com/#p> <http://example.com/#q> _:e0 .
                      hndq.3.1:
                        log point: Hash related bnode component (4.9.3 (3.1))
                        with:
                    - quad: _:e0 <http://example.com/#p> _:e2 .
                      hndq.3.1:
                        log point: Hash related bnode component (4.9.3 (3.1))
                        with:
                          - position: o
                            related: e2
                            input: "o<http://example.com/#p>_:c14n0"
                            hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa
                  Hash to bnodes:
                      29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa:
                        - e2
                hndq.5:
                  log point: Hash N-Degree Quads function (4.9.3 (5)), entering loop.
                  with:
                    - related hash: 29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa
                      data to hash: ""
                      hndq.5.4:
                        log point: Hash N-Degree Quads function (4.9.3 (5.4)), entering loop.
                        with:
                        - perm: [ "e2"]
                          hndq.5.4.4:
                            log point: Hash N-Degree Quads function (4.9.3 (5.4.4)), entering loop.
                            with:
                              - related: e2
                                path: ""
                          hndq.5.4.5:
                            log point: Hash N-Degree Quads function (4.9.3 (5.4.5)), before possible recursion.
                            recursion list: []
                            path: "_:c14n0"
                      hndq.5.5:
                        log point: Hash N-Degree Quads function (4.9.3 (5.5). End of current loop with Hn hashes.
                        chosen path: "_:c14n0"
                        data to hash: "29cf7e22790bc2ed395b81b3933e5329fc7b25390486085cac31ce7252ca60fa_:c14n0"
                hndq.6:
                  log point: Leaving Hash N-Degree Quads function (4.9.3).
                  hash: fbc300de5afafd97a4b9ee1e72b57754dcdcb7ebb724789ac6a94a5b82a48d30
                  issuer: {e0: b0}
            - identifier: e1
              hndq:
                log point: Hash N-Degree Quads function (4.9.3).
                identifier: e1
                issuer: {e1: b0}
                hndq.2:
                  log point: Quads for identifier (4.9.3 (2)).
                  quads:
                  - <http://example.com/#p> <http://example.com/#q> _:e1 .
                  - _:e1 <http://example.com/#p> _:e3 .
                hndq.3:
                  log point: Hash N-Degree Quads function (4.9.3 (3)).
                  with:
                    - quad: <http://example.com/#p> <http://example.com/#q> _:e1 .
                      hndq.3.1:
                        log point: Hash related bnode component (4.9.3 (3.1))
                        with:
                    - quad: _:e1 <http://example.com/#p> _:e3 .
                      hndq.3.1:
                        log point: Hash related bnode component (4.9.3 (3.1))
                        with:
                          - position: o
                            related: e3
                            input: "o<http://example.com/#p>_:c14n1"
                            hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216
                  Hash to bnodes:
                      b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216:
                        - e3
                hndq.5:
                  log point: Hash N-Degree Quads function (4.9.3 (5)), entering loop.
                  with:
                    - related hash: b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216
                      data to hash: ""
                      hndq.5.4:
                        log point: Hash N-Degree Quads function (4.9.3 (5.4)), entering loop.
                        with:
                        - perm: [ "e3"]
                          hndq.5.4.4:
                            log point: Hash N-Degree Quads function (4.9.3 (5.4.4)), entering loop.
                            with:
                              - related: e3
                                path: ""
                          hndq.5.4.5:
                            log point: Hash N-Degree Quads function (4.9.3 (5.4.5)), before possible recursion.
                            recursion list: []
                            path: "_:c14n1"
                      hndq.5.5:
                        log point: Hash N-Degree Quads function (4.9.3 (5.5). End of current loop with Hn hashes.
                        chosen path: "_:c14n1"
                        data to hash: "b7956ea1d654d5824496eb439a1f2b79478bd7d02d4a115f4c97cbff6b098216_:c14n1"
                hndq.6:
                  log point: Leaving Hash N-Degree Quads function (4.9.3).
                  hash: 2c0b377baf86f6c18fed4b0df6741290066e73c932861749b172d1e5560f5045
                  issuer: {e1: b0}
        ca.5.3:
          log point: Canonical identifiers for temporary identifiers (4.5.3 (5.3)).
          issuer:
              - blank node: e1
                canonical identifier: c14n2
              - blank node: e0
                canonical identifier: c14n3
  ca.6:
    log point: Replace original with canonical labels (4.5.3 (6)).
    canonical issuer: {e2: c14n0, e3: c14n1, e1: c14n2, e0: c14n3}

(There are more examples at https://github.com/ruby-rdf/rdf-normalize/tree/develop/examples).

It does most of what Ivan's version does, with some changes so that the steps of a called algorithm are nested under the step that calls it. Note that adding the "log point" as a property sometimes requires introducing the "with" key to allow array values to be listed.
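
The need for the "with" key can be seen in a small Python sketch of the same structure: a mapping cannot carry a "log point" property and also be a list, so the list goes under its own key. The helper `step` is hypothetical; the values are taken from the log above.

```python
import json


def step(log_point, items=None, **props):
    """Hypothetical helper: build one log entry as a plain dict.

    A mapping cannot be a list at the same time, so any list of
    per-item records is attached under the "with" key.
    """
    entry = {"log point": log_point}
    entry.update(props)
    if items is not None:
        entry["with"] = list(items)
    return entry


# The called algorithm (h1dq) is nested inside the item of the calling step.
h1dq = step(
    "Hash First Degree Quads function (4.7.3).",
    nquads=[
        "<http://example.com/#p> <http://example.com/#q> _:a .",
        "_:a <http://example.com/#p> _:z .",
    ],
    hash="3b26142829b8887d011d779079a243bd61ab53c3990d550320a17b59ade6ba36",
)
ca3 = step(
    "Calculated first degree hashes (4.5.3 (3)).",
    items=[{"identifier": "e0", "h1dq": h1dq}],
)
log = {"ca": {"log point": "Entering the canonicalization function (4.5.3).", "ca.3": ca3}}
print(json.dumps(log, indent=2))
```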

The nesting gets deeper with examples like test022 and duplicate paths, which involve more recursion.

Once we've reached some consensus, I'll update the PR.

@yamdan
Contributor

yamdan commented Jan 11, 2023

I generated the logs for test022 with the implementation I am developing, and the results look similar to @gkellogg's test022. However, mine is not yet correctly indented for recursively called h1dq and hndq.
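
For comparing two such logs once they are parsed into nested maps and lists, a simple recursive walk can report the first place where they diverge. This is only a sketch: the function name `first_difference` is made up, and it assumes both logs use the same keys and nesting.

```python
def first_difference(left, right, path=()):
    """Return the path to the first differing value, or None if equal."""
    if isinstance(left, dict) and isinstance(right, dict):
        for key in left:
            if key not in right:
                return path + (key,)
            found = first_difference(left[key], right[key], path + (key,))
            if found is not None:
                return found
        for key in right:
            if key not in left:
                return path + (key,)
        return None
    if isinstance(left, list) and isinstance(right, list):
        for index, (a, b) in enumerate(zip(left, right)):
            found = first_difference(a, b, path + (index,))
            if found is not None:
                return found
        if len(left) != len(right):
            # One list is a prefix of the other: report the first extra index.
            return path + (min(len(left), len(right)),)
        return None
    return None if left == right else path


mine = {"ca.4": {"with": [{"identifier": "e2", "canonical label": "c14n0"}]}}
theirs = {"ca.4": {"with": [{"identifier": "e2", "canonical label": "c14n1"}]}}
print(first_difference(mine, theirs))  # → ('ca.4', 'with', 0, 'canonical label')
```

The returned path starts with the log key of the step where the two implementations first disagree.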

@gkellogg
Member

I generated the logs for test022 with the implementation I am developing, and the results look similar to @gkellogg's test022. However, mine is not yet correctly indented for recursively called h1dq and hndq.

Fantastic that you've gotten this far! Along with @dlongley's implementation (there are likely others I'm unaware of), we certainly have achieved the broad implementation requirements for the spec (at least the C14N algorithm part).

We should probably do some annotation of the algorithms to identify some subset of tests that exercise each bit, and cross-link with the test suite description, which is fairly straightforward, and probably doesn't really need to be in place until PR.

To get the proper indenting, my implementation outputs YAML natively, with methods accepting an indentation depth, and some other extra explicit indentation. It's pretty sensitive to changes, so we depend on the algorithms being fairly mature.
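
The idea of passing an indentation depth through recursive calls can be sketched as follows. This is a toy Python stand-in, not the Ruby implementation and not the real Hash N-Degree Quads algorithm: `toy_recursive_log`, `log_line`, and the `related` map are hypothetical, and no YAML escaping is attempted.

```python
def log_line(lines, depth, text):
    """Append one line of output at the given indentation depth."""
    lines.append("  " * depth + text)


def toy_recursive_log(identifier, related, lines, depth=0):
    """Toy recursion that writes YAML directly, indenting by call depth.

    `related` maps an identifier to the identifiers it recurses into.
    """
    log_line(lines, depth, "hndq:")
    log_line(lines, depth + 1, f"identifier: {identifier}")
    children = related.get(identifier, [])
    if children:
        log_line(lines, depth + 1, "hndq.5.4.5:")
        log_line(lines, depth + 2, "with:")
        for child in children:
            # "- " takes two columns, so the nested keys of this list item
            # must sit one level deeper than the dash itself.
            log_line(lines, depth + 3, f"- related: {child}")
            toy_recursive_log(child, related, lines, depth + 4)


lines = []
toy_recursive_log("e0", {"e0": ["e2"]}, lines)
print("\n".join(lines))
```

The explicit `depth + 3` and `depth + 4` offsets show why this style is sensitive to changes: every structural change to the log shifts the offsets of everything nested below it.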

My plan is to add some hand-written examples to #63 (markup logging) and abandon #64 (rendered log results).

I think another PR should add log results for each test, and update the HTML version of the test suite (at least) to reference these.
