Add more N-Quads escaping tests. #81
- Add 4 and 8 hex char Unicode tests.
- Add IRI escaping tests.
- Add graph name escaping tests.
This is actually related to my comments in https://lists.w3.org/Archives/Public/public-rch-wg/2023Mar/0000.html:
The test suite currently plays two roles:
Clearly, (2) is important, and essential for the deployment of any URDNA implementation. However, we should realize that only (1) is required to pass the CR phase per W3C Process. Put another way, this test suite is not required to test the underlying RDF environment. Very specifically: say my implementation fails on some nasty Unicode or IRI escaping issue; what should I do? Re-implement the underlying RDF environment? Obviously not. The only thing I could do is hope that the environment will, eventually, take care of the issue. In the meantime, however, my implementation will not pass the full CR test suite for this Working Group specification: we shoot ourselves in the foot. I would therefore propose to separate the URDNA-necessary tests from the others. The non-necessary tests should be marked as such and should be explicitly flagged as non-required for the CR exit criteria for this specification. Furthermore, these tests should be submitted to the WG that maintains the RDF test suites, because they are of general value. Cc @pchampin
Understood and agree. Making tests optional as needed or moving them to other projects sounds great. From an implementor and interop view, it would be good to make it easy to find and use related test suites.
Also note that some of this may be affected by potential changes to N-Quads canonicalization.
When I ran the current test suite by an experimental update, I found that this (the original version) was the only test that was affected.
+1 I'd also break down the test into smaller bits testing different aspects of escaping. Otherwise, finding a specific issue can be a pain. For instance, separate tests for IRI representation, datatyped literal representation, and the representation of escaped and non-escaped characters. Such tests should also go into the N-Quads test suite eventually.
I suggest we not modify test0060, other than the anticipated results considering upcoming changes in canonicalization. Instead, we should create new tests. That said, N-Quads parsing and serialization tests probably belong with N-Quads tests (currently here).
As for what should be tested, note that, use of UCHAR escapes aside, the IRI grammar still has a narrow range of ASCII characters that are part of isegment/ipchar:
isegment = *ipchar
ipchar = iunreserved / pct-encoded / sub-delims / ":" / "@"
iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
/ %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
/ %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
/ %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
/ %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
/ %xD0000-DFFFD / %xE1000-EFFFD
In the ASCII range, this is really just ALPHA / DIGIT / "-" / "." / "_" / "~". Testing UCHAR versions of those characters, or higher Unicode characters, is fine, but we can't throw in characters (other than explicitly %-encoded) that are not allowed. A parser should replace UCHAR escapes with the unescaped Unicode characters. I don't think that any characters in ucschar would be represented by either ECHAR or UCHAR when canonicalizing.
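To make the point above concrete, here is a minimal Python sketch of what a test author could use to check candidate characters. It decodes 4- and 8-hex-digit UCHAR escapes and then tests whether the resulting character falls in iunreserved, using the ranges quoted from the grammar above. The function names (`decode_uchar`, `is_iunreserved`) are hypothetical and not taken from any implementation in this thread; the check deliberately ignores pct-encoded, sub-delims, ":" and "@", which ipchar also permits.

```python
import re
import string

# N-Quads UCHAR escapes: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
UCHAR_RE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_uchar(text: str) -> str:
    """Replace each UCHAR escape with the Unicode character it denotes."""
    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return UCHAR_RE.sub(repl, text)


# ucschar ranges as quoted in the grammar above.
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    # Planes 1 through 0xD: %x10000-1FFFD ... %xD0000-DFFFD
    + [(plane * 0x10000, plane * 0x10000 + 0xFFFD) for plane in range(1, 14)]
    + [(0xE1000, 0xEFFFD)]
)

# ASCII part of iunreserved: ALPHA / DIGIT / "-" / "." / "_" / "~"
ASCII_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def is_iunreserved(ch: str) -> bool:
    """Return True if a single character matches iunreserved."""
    cp = ord(ch)
    if cp < 0x80:
        return ch in ASCII_UNRESERVED
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


if __name__ == "__main__":
    for escaped in ["\\u0041", "\\u00E9", "\\U0001F600", "\\u0020", "\\u003C"]:
        ch = decode_uchar(escaped)
        print(escaped, "allowed" if is_iunreserved(ch) else "not allowed")
```

Under these assumptions, an escaped space or "<" decodes to a character outside iunreserved, so it would not belong in an IRI test unless %-encoded.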
Add generate.yml for GH Action to build and update a PR for tests and reports when changed. * Use git-auto-commit-action. * Add permissions.
the previous phrasing gave the wrong impression of *defining* what it meant to be isomorphic
… .html output file.
* Adds serialization section Fixes #86. * Clarify that blank nodes are serialized using their canonical label. --------- Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com> Co-authored-by: Dan Yamamoto <dan@iij.ad.jp>
* Mark introduction as informative. * Mark Security Considerations and Privacy Considerations as informative. * No separator when concatenating; better punc & language --------- Co-authored-by: Gregg Kellogg <gregg@greggkellogg.net> Co-authored-by: Dan Yamamoto <dan@iij.ad.jp>
@davidlehn This should make the test use legal ranges.
@yamdan and @davidlehn I've added comprehensive characters in literals from U+0000 through U+008F (the last don't really display). See if you think these are reasonable and we can merge this.
@gkellogg Thank you for the comprehensive cases with tests for capitalizing HEX. It seems reasonable to me.
Although C1 control characters could benefit from escaping, as you note, this is not commonly done, and I don't think there's quite the same problem with control characters making a visual representation of a string behave unreasonably; they just don't necessarily display properly.
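As an illustration of the literal-escaping behavior discussed in the last few comments, the following Python sketch escapes a literal value for serialization. The exact rule set is an assumption made for illustration, not something stated in this thread: the common short ECHAR forms are used where they exist, the remaining C0 controls and DEL are written as `\u` followed by four uppercase hex digits, and everything else (including C1 controls) is emitted unescaped. The name `escape_literal` is hypothetical.

```python
# Assumed ECHAR short forms for characters that have one.
ECHAR = {
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
    '"': '\\"',
    "\\": "\\\\",
}


def escape_literal(value: str) -> str:
    """Escape a literal's lexical form under the assumed rules above."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch in ECHAR:
            out.append(ECHAR[ch])
        elif cp <= 0x1F or cp == 0x7F:
            # Remaining C0 controls and DEL: UCHAR with uppercase hex digits.
            out.append("\\u%04X" % cp)
        else:
            # Everything else, including C1 controls, is written as-is.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_literal("tab\there"))
    print(escape_literal("vertical tab: \x0b"))
```

Whether a given implementation matches this depends on the canonicalization changes mentioned earlier in the thread, so the sketch is best treated as a starting point for comparing outputs.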
This LGTM, but, of course, by its nature it's hard to know for sure until implementations have been updated to run against it. Seems OK to merge at this time, and we can sort out any trouble if we find it later. Thanks!
It would be great if DB would update their implementation and submit an implementation report. For now, it's just the Ruby implementation and @iherman's TypeScript implementation, both of which will need to be updated after this PR is merged.