Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random graph theorist's remarks on WD 22 May 2023 #111

Closed
lars-hellstrom opened this issue Jun 7, 2023 · 5 comments
Closed

Random graph theorist's remarks on WD 22 May 2023 #111

lars-hellstrom opened this issue Jun 7, 2023 · 5 comments

Comments

@lars-hellstrom
Copy link

Some points in the text that I feel a need to remark upon:

You never explain the abbreviation c14n. In the text proper it only seems to appear as part of identifiers, so technically not in need of an explanation, but it's still rather mysterious until you google it.

§1, explanation of NP-intermediate

it is believed to be an NP-Intermediate problem, that is, neither known to be solvable in polynomial time nor NP-complete

Whether something is an NP-intermediate problem does not depend on what is known about it. The 'believed' already covers for the fact that its status is not known.

As a scholar, I would also find it natural if there somewhere in the document was explained how (undirected) Graph Isomorphism reduces to RDF Graph C14n. Easiest reduction I can think of would be that each vertex and each edge has a blank node, and the triples all encode the incidence relation (i.e., all predicates are the same, having vertex node as subject and edge node as object). If however you say that as a matter of style going into such detail is not appropriate, then I certainly would believe that.

§3.1, definition of gossip path

The definition here is of a gossip path, not some uniquely determined the gossip path. (Later, the URDNA2015 algorithm selects one gossip path, but the details of that is singled out are not found here.)

At times the later text also seems to identify a gossip path (graph-theoretical object) with a string representing that gossip path. You may want to consider promoting that latter concept to something explicitly defined; it could make the exposition clearer.

§3.1 or §3.2: Dataset isomorphism

Dataset isomorphism is not mentioned as something defined (whether in this document or by reference), but it really should be, since it is central to defining the problem of canonicalisation.

§4, editor's note

I strongly support this sentiment, that the algorithm should go into an appendix.

Things that should be normatively defined include:

  • canonicalisation function: function mapping RDF datasets to isomorphic RDF datasets, with the property that two datasets are mapped to the same thing if and only if they are isomorphic.
  • canonicalisation partial function: function mapping some RDF datasets to isomorphic RDF datasets. One probably wants the situation that c14n(G) defined but c14n(H) undefined to mean that G and H are not isomorphic. (ISTR there was some issue that mentioned having the algorithm give up if presented with too difficult problems, but right now I can't find the number. When the algorithm is configured to do that, it implements a partial function.)
  • runtime guarantees for algorithm. Where DoS is a concern, one wants to know an algorithm at least terminates within a reasonable time.

§4.6 and 4.8, degree as in "Hash First Degree Quads" and "Hash N-Degree Quads"

This use of 'degree' is just plain wrong, although I fear it might be too deeply entrenched for you to consider changing. The standard meaning of 'degree' in the context of graphs is as a synonym of valency (number of incidences of edges to the vertex under consideration). The concept at play here is rather that of radius of the neighbourhood being considered. I suppose for bounded valency graphs you could arrive at the radius being the degree of the polynomial bound on the cardinality of the neighbourhood considered, but even that is a far-fetched choice of terminology; depth (as in 'recursion depth') seems more natural.

However, the risk of confusion is probably low.

§4.8.1, definition of shortest gossip path

Am I reading this right, that the shortest gossip path need not be the path of shortest length in the graph sense, if some predicates have very long IRIs and other predicates very short IRIs? Two 50 character IRIs are longer than three 20 character IRIs. (Of course, I may have missed some hashing step that would give components uniform length.)

Multiple graphs dataset example?

As far as I can tell, all examples only have the default graph in the dataset. With quads, the graph is just one more endpoint of the edge, so probably no increased complexity as far as cases in the algorithm are concerned, but it looks a bit strange that it speaks about quads when throughout there are just triples in the examples (especially when serialised for hashing).

I find myself wondering how hard the algorithm would have to determine that a Möbius ladder graph is not isomorphic to a circular ladder graph of same order. (For graph isomorphism to appear as a subproblem of c14n, consider a dataset with two graphs whose names are blank nodes. At what point can it tell those nodes get different hashes?) Or for that matter, Möbius ladder to a circular ladder of order 2 more. The algorithm does not directly use multiplicity of hashes, does it? Only whether hashes are unique or not. Hence a 10-vertex 3-regular graph is not so obviously distinct from a 12-vertex 3-regular graph.

Canonicalise versus normalise

I would suggest 'canon*' as the better term. Whereas 'normal form' certainly describes what you are going for, just plain 'normalise' admits many alternative meanings, for example on the lines of "getting rid of silly parts we don't care about". In the context of graph colouring, "normalising" could easily mean "reduce edge multiplicities to 1".

gkellogg added a commit that referenced this issue Jun 13, 2023
Adds some motivation for why algorithms are considered normative.
Adds some definitions for "canonicalization function" and "canonicalization partial function" as suggested in #111.

Fixes #112.
@iherman
Copy link
Member

iherman commented Jun 14, 2023

Admin question: shouldn't this issue be closed, now that the various points have been "delegated" to individual issues?

@TallTed
Copy link
Member

TallTed commented Jun 14, 2023

I suggest that @gkellogg change the From https://github.com/w3c/rdf-canon/issues/111: in the delegated issues to From @lars-hellstrom in https://github.com/w3c/rdf-canon/issues/111: such that @lars-hellstrom receives notifications of updates to those issues and it's clear to whom we should address, "Does this handling sufficiently address your issue/question?" on each.

@gkellogg
Copy link
Member

I think to change this, we need feedback specifically to @lars-hellstrom so that we can change the status of this PR to another "wr" state. @lars-hellstrom is welcome to follow the delegated issues if he has more concerns; I don't think we need to do this by proxy, as many of the issues we were addressing anyway.

gkellogg added a commit that referenced this issue Jun 14, 2023
* Adds text to indicate that the algorithms are normative.
* Adds some motivation for why algorithms are considered normative.
* Adds some definitions for "canonicalization function" and "canonicalization partial function" as suggested in #111.

Fixes #112.

---------

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Dan Yamamoto <dan@iij.ad.jp>
Co-authored-by: Ivan Herman <ivan@ivan-herman.net>
@peacekeeper
Copy link
Contributor

@lars-hellstrom As co-chair of the RCH WG, I'd like to thank you for your review and feedback about the document! The group discussed your points on a call this week, here are minutes: https://www.w3.org/2023/06/13-rch-minutes.html#t04

As you can see above, @gkellogg created multiple more granular issues, in order to address specific points you raised.

You are of course welcome to comment more on the individual issues in this repo. Perhaps you would also be interested in joining the WG as a member and contributing PRs yourself, to help address some of the topics you identified?

@gkellogg
Copy link
Member

@lars-hellstrom, thanks again for your comments. Separate issues were raised and addressed based upon this input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants