Random graph theorist's remarks on WD 22 May 2023 #111
Comments
Admin question: shouldn't this issue be closed, now that the various points have been "delegated" to individual issues?
I suggest that @gkellogg change the
I think to change this, we need feedback specifically from @lars-hellstrom so that we can change the status of this PR to another "wr" state. @lars-hellstrom is welcome to follow the delegated issues if he has more concerns; I don't think we need to do this by proxy, as many of the issues are ones we were addressing anyway.
* Adds text to indicate that the algorithms are normative.
* Adds some motivation for why algorithms are considered normative.
* Adds some definitions for "canonicalization function" and "canonicalization partial function" as suggested in #111.

Fixes #112.

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Dan Yamamoto <dan@iij.ad.jp>
Co-authored-by: Ivan Herman <ivan@ivan-herman.net>
@lars-hellstrom As co-chair of the RCH WG, I'd like to thank you for your review and feedback about the document! The group discussed your points on a call this week; here are the minutes: https://www.w3.org/2023/06/13-rch-minutes.html#t04 As you can see above, @gkellogg created multiple, more granular issues in order to address the specific points you raised. You are of course welcome to comment further on the individual issues in this repo. Perhaps you would also be interested in joining the WG as a member and contributing PRs yourself, to help address some of the topics you identified?
@lars-hellstrom, thanks again for your comments. Separate issues were raised and addressed based upon this input.
Some points in the text that I feel a need to remark upon:
You never explain the abbreviation "c14n". In the text proper it only seems to appear as part of identifiers, so technically it is not in need of an explanation, but it remains rather mysterious until you google it.
§1, explanation of NP-intermediate
Whether something is an NP-intermediate problem does not depend on what is known about it; the 'believed' already covers the fact that its status is not known.
As a scholar, I would also find it natural if there somewhere in the document was explained how (undirected) Graph Isomorphism reduces to RDF Graph C14n. Easiest reduction I can think of would be that each vertex and each edge has a blank node, and the triples all encode the incidence relation (i.e., all predicates are the same, having vertex node as subject and edge node as object). If however you say that as a matter of style going into such detail is not appropriate, then I certainly would believe that.
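The reduction suggested above can be written down concretely. The following is a minimal sketch of my proposed encoding, not anything from the document under review; the predicate `ex:incident` is a placeholder of my choosing, and any single fixed IRI would do.

```python
def encode_incidence(edges):
    """Encode an undirected graph as blank-node-only RDF-style triples.

    Each vertex u becomes a blank node _:v{u}, each edge becomes a blank
    node _:e{i}, and every incidence yields one triple with the single
    fixed (placeholder) predicate ex:incident.  Two graphs are then
    isomorphic iff their encodings are isomorphic as RDF graphs, so any
    canonicalizer for the latter decides Graph Isomorphism.
    """
    triples = set()
    for i, (u, v) in enumerate(sorted(edges)):
        triples.add((f"_:v{u}", "ex:incident", f"_:e{i}"))
        triples.add((f"_:v{v}", "ex:incident", f"_:e{i}"))
    return triples

# A triangle: 3 vertices, 3 edges, hence 6 incidence triples.
triangle = encode_incidence([(0, 1), (1, 2), (0, 2)])
```

Since every term except the predicate is a blank node, all the structure of the input graph must be recovered by the blank-node labelling step, which is the point of the reduction.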
§3.1, definition of gossip path
The definition here is of a gossip path, not of a uniquely determined "the gossip path". (Later, the URDNA2015 algorithm selects one gossip path, but the details of how that one is singled out are not found here.)
At times the later text also seems to identify a gossip path (a graph-theoretical object) with a string representing that gossip path. You may want to consider promoting the latter concept to something explicitly defined; it could make the exposition clearer.
§3.1 or §3.2: Dataset isomorphism
Dataset isomorphism is not mentioned as something defined (whether in this document or by reference), but it really should be, since it is central to defining the problem of canonicalisation.
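To make the needed notion concrete, here is a brute-force sketch of the isomorphism check for the graph (triple) case; a dataset version would additionally map blank nodes appearing in the graph-name position. This is an illustration of the definition, exponential in the number of blank nodes, and not a sketch of any algorithm in the document.

```python
from itertools import permutations

def graphs_isomorphic(g1, g2):
    """Check whether two sets of (s, p, o) triples differ only by a
    bijective relabelling of blank nodes (strings starting with "_:")."""
    def bnodes(g):
        return sorted({t for tri in g for t in (tri[0], tri[2])
                       if t.startswith("_:")})

    b1, b2 = bnodes(g1), bnodes(g2)
    if len(b1) != len(b2) or len(g1) != len(g2):
        return False

    def relabel(g, mapping):
        return {tuple(mapping.get(t, t) for t in tri) for tri in g}

    # Try every bijection from the blank nodes of g1 to those of g2.
    return any(relabel(g1, dict(zip(b1, perm))) == set(g2)
               for perm in permutations(b2))
```

For example, `{("_:a", "ex:p", "_:b"), ("_:b", "ex:p", "_:a")}` and `{("_:x", "ex:p", "_:y"), ("_:y", "ex:p", "_:x")}` are isomorphic under the mapping a↦x, b↦y.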
§4, editor's note
I strongly support this sentiment, that the algorithm should go into an appendix.
Things that should be normatively defined include:
§4.6 and 4.8, degree as in "Hash First Degree Quads" and "Hash N-Degree Quads"
This use of 'degree' is just plain wrong, although I fear it might be too deeply entrenched for you to consider changing. The standard meaning of 'degree' in the context of graphs is as a synonym of valency (the number of incidences of edges to the vertex under consideration). The concept at play here is rather that of the radius of the neighbourhood being considered. I suppose that for bounded-valency graphs you could arrive at the radius being the degree of the polynomial bound on the cardinality of the neighbourhood considered, but even that is a far-fetched choice of terminology; 'depth' (as in 'recursion depth') seems more natural.
However, the risk of confusion is probably low.
§4.8.1, definition of shortest gossip path
Am I reading this right, that the shortest gossip path need not be the path of shortest length in the graph sense, if some predicates have very long IRIs and other predicates very short IRIs? Two 50 character IRIs are longer than three 20 character IRIs. (Of course, I may have missed some hashing step that would give components uniform length.)
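The arithmetic behind that remark is easy to make concrete. The IRIs below are made up purely so that the lengths work out; this illustrates the worry, not the document's actual serialisation (which may well hash components to uniform length).

```python
# Hypothetical IRIs, chosen only so the lengths match the remark above.
long_iri = "<http://example.org/" + "x" * 29 + ">"   # 50 characters
short_iri = "<http://ex.org/" + "y" * 4 + ">"        # 20 characters

two_hop = long_iri * 2       # a "path" of 2 long components
three_hop = short_iri * 3    # a "path" of 3 short components

# Fewer hops in the graph sense, yet the longer string:
assert len(two_hop) == 100 and len(three_hop) == 60
assert len(two_hop) > len(three_hop)
```

So if "shortest" is read as shortest serialised string, a two-edge path can lose to a three-edge path, which is exactly the ambiguity being asked about.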
Multiple graphs dataset example?
As far as I can tell, all examples only have the default graph in the dataset. With quads, the graph is just one more endpoint of the edge, so there is probably no increased complexity as far as the cases in the algorithm are concerned, but it looks a bit strange that the text speaks about quads when throughout there are just triples in the examples (especially when serialised for hashing).
I find myself wondering how hard the algorithm would have to work to determine that a Möbius ladder graph is not isomorphic to a circular ladder graph of the same order. (For graph isomorphism to appear as a subproblem of c14n, consider a dataset with two graphs whose names are blank nodes. At what point can it tell that those nodes get different hashes?) Or for that matter, a Möbius ladder versus a circular ladder of order 2 more. The algorithm does not directly use the multiplicity of hashes, does it? Only whether hashes are unique or not. Hence a 10-vertex 3-regular graph is not so obviously distinct from a 12-vertex 3-regular graph.
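To see why these graphs make awkward test cases, note that both families are 3-regular, so any per-node signature computed from immediate valency alone (a caricature of a first-degree hash, not the document's actual Hash First Degree Quads step) is constant across every node of both graphs. A small sketch:

```python
from collections import defaultdict

def mobius_ladder(k):
    """Möbius ladder on 2k vertices: a 2k-cycle plus k 'rungs' i -- i+k."""
    n = 2 * k
    return ({frozenset({i, (i + 1) % n}) for i in range(n)}
            | {frozenset({i, i + k}) for i in range(k)})

def circular_ladder(k):
    """Circular ladder (k-prism): two k-cycles joined by k rungs."""
    return ({frozenset({i, (i + 1) % k}) for i in range(k)}
            | {frozenset({k + i, k + (i + 1) % k}) for i in range(k)}
            | {frozenset({i, k + i}) for i in range(k)})

def valencies(edges):
    deg = defaultdict(int)
    for e in edges:
        for v in e:
            deg[v] += 1
    return deg

# 10-vertex and 12-vertex 3-regular graphs: every node looks locally alike.
m10 = valencies(mobius_ladder(5))
c12 = valencies(circular_ladder(6))
```

Every vertex in both graphs has valency 3, so a first-step signature of this kind yields a single repeated value: no uniqueness anywhere, and hence (if only uniqueness is consulted) no immediate way to separate the two graphs.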
Canonicalise versus normalise
I would suggest 'canon*' as the better term. Whereas 'normal form' certainly describes what you are going for, plain 'normalise' admits many alternative meanings, for example along the lines of "getting rid of silly parts we don't care about". In the context of graph colouring, "normalising" could easily mean "reduce edge multiplicities to 1".