
Proposal for a unified validation reporting language #15

Open
gimsieke opened this Issue Jun 12, 2018 · 12 comments

6 participants
@gimsieke
gimsieke commented Jun 12, 2018

From Matthieu’s email message of 2017-12-10:

Hi all,

In my company we need a generic XML format to collect errors and warnings from multiple validations (schema + Schematron).
We’ve been looking at existing languages like SVRL, PSVI, XSV and the Saxon report language, but none of them exactly matches our needs.
At first we thought SVRL was the best candidate, but we found it is actually too closely tied to Schematron validation.

So we finally created an internal grammar, which we call XVRL (for “XML Validation Report Language”).

At the Amsterdam meetup we discussed unifying the report ports of validation steps in XProc, as described in xproc/1.0-specification#135, and we thought XVRL could help in this direction.
Please find attached a sample and the grammar (both Relax NG and XML Schema).
If you find this an interesting candidate, I guess we could put it on GitHub under an open source licence.

Any comments about the format itself and about its use with XProc are welcome.

Best regards

Matthieu Ricaud.

PS: Do you think adding this proposal to issue xproc/1.0-specification#135 makes sense? Or somewhere else (the xml-dev list?)
PPS: We also made an XVRL-to-JSON conversion using the XPath 3.1 xml-to-json() function.

reporting.zip
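The XPath-based conversion mentioned in the PPS is not shown in the thread. As a rough analogue only (Python rather than xml-to-json(), and a hypothetical, simplified report structure rather than the actual XVRL grammar in reporting.zip), mapping a report document to JSON could look like this:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified report. Element and attribute names are
# illustrative only and do not reproduce the actual XVRL grammar.
SAMPLE = """<validation-report>
  <report severity="error" path="/doc/p[2]">
    <message xml:lang="en">Element "p" not allowed here.</message>
  </report>
  <report severity="warning">
    <message xml:lang="en">Paragraph is empty.</message>
  </report>
</validation-report>"""

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def report_to_json(xml_text: str) -> str:
    """Map each report element to a JSON object; optional attributes are omitted."""
    root = ET.fromstring(xml_text)
    reports = []
    for rep in root.findall("report"):
        entry = {"severity": rep.get("severity")}
        if rep.get("path") is not None:  # path is optional
            entry["path"] = rep.get("path")
        entry["messages"] = [
            {"lang": m.get(XML_LANG), "text": m.text or ""}
            for m in rep.findall("message")
        ]
        reports.append(entry)
    return json.dumps({"reports": reports}, indent=2)


print(report_to_json(SAMPLE))
```

Omitting absent attributes (rather than emitting null) keeps the JSON close to the XML; either choice would be defensible.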

@eriksiegel

Contributor

eriksiegel commented Sep 10, 2018

Matthieu writes to me about this 20181006:

Let me explain why I went for this XVRL format:

I need an RNG reporting file for my project. At first I thought about SVRL (I don't like reinventing things, especially when there is a standard, and an ISO one at that).
But looking more closely at SVRL, I realized it is really tied to Schematron validation:

  • An SVRL file starts with svrl:schematron-output.
  • Each time a sch:rule in the Schematron fires on a context node, it generates an <svrl:fired-rule context="...">.
  • Within a rule, for each assert that fails, SVRL generates an <svrl:failed-assert test="…" location="…">.
  • The same holds for reports, with svrl:successful-report.
  • The message can also reference a diagnostic.
  • The patterns active in the selected phase are listed with svrl:active-pattern.
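To make the SVRL structure listed above concrete, here is a minimal Python sketch that reads a small, hand-written SVRL sample (illustrative, not the output of a real Schematron run) with xml.etree.ElementTree and flattens the failed-assert and successful-report elements into generic records. It assumes the SVRL namespace http://purl.oclc.org/dsdl/svrl and that findings follow, as siblings, the fired-rule they belong to, as in typical Schematron output. The record fields are hypothetical.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"
NS = {"svrl": SVRL_NS}

# Hand-written sample for illustration only.
SAMPLE_SVRL = f"""<svrl:schematron-output xmlns:svrl="{SVRL_NS}">
  <svrl:active-pattern id="p1"/>
  <svrl:fired-rule context="para"/>
  <svrl:failed-assert test="string-length(.) gt 0" location="/doc/para[2]" role="warning">
    <svrl:text>Paragraph must not be empty.</svrl:text>
  </svrl:failed-assert>
  <svrl:successful-report test="@rend" location="/doc/para[3]">
    <svrl:text>Deprecated attribute rend used.</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>"""


def svrl_findings(svrl_text):
    """Return one dict per failed-assert / successful-report, in document order.

    Each record remembers the context of the most recent fired-rule,
    because SVRL places findings after the fired-rule they belong to.
    """
    root = ET.fromstring(svrl_text)
    context = None
    findings = []
    for child in root:
        local = child.tag.split("}", 1)[-1]  # strip the namespace URI
        if local == "fired-rule":
            context = child.get("context")
        elif local in ("failed-assert", "successful-report"):
            text_el = child.find("svrl:text", NS)
            findings.append({
                "kind": local,
                "test": child.get("test"),
                "location": child.get("location"),
                "role": child.get("role"),  # often (ab)used to carry severity
                "rule_context": context,
                "message": (text_el.text or "").strip() if text_el is not None else "",
            })
    return findings


for finding in svrl_findings(SAMPLE_SVRL):
    print(finding["kind"], finding["location"], finding["message"])
```

Fields such as test and rule_context have no natural counterpart in an error from an RNG or XSD validator, which is exactly the mismatch Matthieu points out.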

A schema language like RNG doesn’t use assertions or reports; it only describes the structure. If the XML is not valid against this structure, the schema processor generates an error.
This error may be interpreted differently from one processor to another: is the attribute foo missing, or is the element name wrong?
This is completely different from what Schematron does and what SVRL was designed for.

That’s why I went for a more generic error format.
As a survey of the state of the art, I looked at:
2) XSV
Example: report.xsv.xml, report.xsv.html
3) PSVI
Schema: https://www.w3.org/2001/05/PSVInfoset.xsd
Example: http://www.ukoln.ac.uk/metadata/dcmi/dcxml/psvi/psvi4-3.xml
4) Saxon report for XSD validation
cf. https://www.saxonica.com/documentation/index.html#!functions/saxon/validate
saxon.validation.report.xml

5) Other
Cf. https://stackoverflow.com/questions/39974143/validate-xml-with-schema-and-get-validation-errors-in-xml

In the end, I found the easiest way to reach my goal was to create this XVRL format.

XVRL was just a proposal; I quickly invented this syntax for my company’s needs. It can be improved, renamed, released as open source, versioned, etc.
I can do that; my company will very probably agree to this.

@eriksiegel

Contributor

eriksiegel commented Sep 10, 2018

I think his reasoning is sound. I've looked at the other formats (where possible; the first one was behind a login) and there's nothing that completely fits.

The only problem is that his proposal has no status whatsoever. But if we invented something ourselves, the same problem would occur.

So I suggest:

  1. Ask Matthieu to open source his proposal and put it on GitHub
  2. Reference this format in our step descriptions

The only thing I don't know is whether this (referencing "just" some standard published on GitHub) is allowed, given our W3C connection.

Thoughts?

@gimsieke

Author

gimsieke commented Sep 10, 2018

I agree that SVRL is too focused on Schematron. However, there is nothing in XSD/RNG/DTD validation outputs that couldn’t be squeezed into SVRL.

A couple of things that we rely on in SVRL are lacking in XVRL:

  • Messages need to be able to contain markup. We use this for HTML hyperlinks, tables or lists in the message text, and also for transporting formalized, non-free-text message metadata for which there is no SVRL attribute like @role. Examples include the ubiquitous @srcpath attribute, which is a location identifier that is kept across multiple conversion steps (e.g. <span class="srcpath">file:/C:/cygwin/home/gerrit/Springer/docx2app-git/test_after/Drews_334495_1_En/M_0_004.docx.tmp/word/document.xml?xpath=/w:document[1]/w:body[1]/w:p[6]/w:r[12]</span>), or classification (e.g. <span class="category">Typesetting</span>, with span as the SVRL span element, not the HTML one).
  • Instead of @role, which is often used to transport severity information, we’d prefer a severity attribute with a fixed vocabulary (info, warning, error, fatal-error).
  • In addition to arbitrary markup in the messages (maybe only in namespaces other than the XVRL namespace), people should be allowed to use arbitrary attributes. They probably need to be in other namespaces, too. For transpect, we can put the srcpath in such a custom attribute, @tr:srcpath. Some attributes will pertain to individual localized messages, but most of them will relate to what is called report in XVRL.
  • It might make sense to standardize some of the custom metadata fields that we use right now, like srcpath, category and family, where family is a name for what is now the validation-report element. We called it family because phase was already taken. Maybe purpose or kind would also be OK. The idea is to attach a name to the validation in order to distinguish the validation report of an intermediate XML format from a validation of an EPUB OPF document, for example. The schema element in the metadata for each validation-report should already provide such a unique name, but only as a file name (system identifier), which might not be descriptive enough. Also, two schemas with different system identifiers may implement the same validation family.
  • The family is meant for grouping different messages in human-readable reports. Another view groups the validation messages by category or aspect, such as “Typography” or “Style name conventions”. In order to be able to provide these alternative groupings for users, the category names must be localizable. Therefore they may not be just attributes on report. report elements should rather have category children, where category may carry @xml:lang attributes. There may be multiple categories per report and language.
  • It is probably a good idea to split the message into the main message and supplementary information (for which Schematron’s diagnostic was originally intended, until it became used for L10N only). Supplementary information may hold the aforementioned tables, lists, or links to documentation. See screenshot below for an example.
  • Whether the element names validation-report, report, and message are most intuitive remains to be discussed.
  • We should check whether all SVRL peculiarities (phase, fired rule and its context, …) can be accommodated. There should be an XSLT transformation from SVRL to XVRL. SVRL is
  • It is desirable to be able to have a summary after the metadata, directly below the top-level element. It lists the highest severity that occurs, the number of distinct messages (reports in the current XVRL sense) for each severity, and the total number of messages for each severity.
  • If some standardization body (maybe the XProc CG) publishes the spec, it should use a namespace URI other than http://www.lefebvre-sarrut.eu/ns/els/xvrl (something with xproc.org in it).
  • Since some schema validators are unable to report the error path, the path attribute should be optional.

I propose that we as the XProc CG publish a modified spec (and give Matthieu due credit for the original spec). We can call it XVRL or GVRL (G for generalized).

[Screenshot: supplementary_message]
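Two of the points in the list above lend themselves to a small sketch: the fixed severity vocabulary (info, warning, error, fatal-error) as an ordered enumeration, and the summary giving the highest severity, the number of distinct reports per severity, and the total number of messages per severity. The Finding record and its report_id field are hypothetical stand-ins for whatever the final schema calls a report, and counting "distinct" by report identifier is only one possible reading of the proposal.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Proposed fixed vocabulary, ordered from least to most severe."""
    INFO = 1
    WARNING = 2
    ERROR = 3
    FATAL_ERROR = 4

    @classmethod
    def parse(cls, token: str) -> "Severity":
        # 'fatal-error' -> FATAL_ERROR; raises KeyError outside the vocabulary.
        return cls[token.upper().replace("-", "_")]

    @property
    def token(self) -> str:
        # FATAL_ERROR -> 'fatal-error'
        return self.name.lower().replace("_", "-")


@dataclass(frozen=True)
class Finding:
    report_id: str              # hypothetical: identifies the distinct message type
    severity: Severity
    path: Optional[str] = None  # optional: not every validator reports a path


def summarize(findings):
    """Return (highest severity, distinct reports per severity, messages per severity)."""
    if not findings:
        return None, {}, {}
    highest = max(f.severity for f in findings)
    distinct = Counter()
    seen = set()
    for f in findings:
        key = (f.severity, f.report_id)
        if key not in seen:  # count each report once per severity
            seen.add(key)
            distinct[f.severity.token] += 1
    totals = Counter(f.severity.token for f in findings)
    return highest.token, dict(distinct), dict(totals)


findings = [
    Finding("empty-para", Severity.parse("warning"), "/doc/p[2]"),
    Finding("empty-para", Severity.parse("warning"), "/doc/p[5]"),
    Finding("bad-style", Severity.parse("error")),
    Finding("note-1", Severity.parse("info")),
]
print(summarize(findings))
```

Using an ordered enumeration rather than free strings means "highest severity" is a plain max(), and an unknown token fails loudly instead of silently producing an unranked value.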

@eriksiegel

Contributor

eriksiegel commented Sep 11, 2018

Ehh, ok... Given the rather big list up there, is making such a spec something you or Le-tex can do? It wouldn't make sense for me to be just a scribe for somebody else's strong ideas about some subject...

And of course it can't wait very long (a few months?) to give the implementors enough time.

@gimsieke

Author

gimsieke commented Sep 11, 2018

Yes, I can write the spec, a Relax NG schema and the SVRL→XVRL XSLT

@eriksiegel eriksiegel assigned gimsieke and unassigned eriksiegel Sep 11, 2018

@mricaud

mricaud commented Sep 19, 2018

Hi all

Thanks for reporting this and for the improvements!
I like your proposal, Gerrit; it makes sense. And yes, feel free to write a new spec; XVRL was just a first attempt.

Maybe the new namespace should not be bound to XProc, in case someone wants to use it in another context? But which organization then? Well, maybe it's easier to bind it to XProc after all!

Most validation engines use a kind of dictionary to display error messages. As with i18n, they use IDs for "parts of sentences". Would it be a good idea to represent this in the new format?
Example: https://github.com/IDPF/epubcheck/tree/master/src/main/resources/com/thaiopensource
This is specific to the schema language, so maybe it should be resolved before producing the final XVRL?

If you need any help just tell me !

Cheers
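The message-dictionary mechanism Matthieu describes can be pictured with a minimal Python sketch. The catalogue, message IDs, wording, and the render() helper are all hypothetical and do not reproduce the resource files of epubcheck or any other validator; the sketch only shows how an ID plus parameters resolves to localized text, which is the step that could happen either before the final report is written or when it is displayed.

```python
import string

# Hypothetical message catalogue keyed by language, then by message ID.
# IDs and wording are invented for illustration only.
CATALOGUE = {
    "en": {
        "missing_attribute": 'element "$element" is missing required attribute "$attribute"',
        "unexpected_element": 'element "$element" not allowed here',
    },
    "fr": {
        # Deliberately incomplete, to show the fallback.
        "unexpected_element": "l'élément \"$element\" n'est pas autorisé ici",
    },
}


def render(message_id, params, lang="en", fallback="en"):
    """Resolve a message ID to text, falling back to another language if missing."""
    table = CATALOGUE.get(lang, {})
    template = table.get(message_id) or CATALOGUE[fallback][message_id]
    # $name placeholders are filled from params; a missing key raises KeyError.
    return string.Template(template).substitute(params)


print(render("unexpected_element", {"element": "p"}, lang="fr"))
print(render("missing_attribute", {"element": "img", "attribute": "src"}, lang="fr"))
```

If a report format kept only the resolved text, the message ID and parameters would be lost; keeping them would allow re-rendering in another language later, at the cost of tying the report to a particular validator's catalogue.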

@ndw ndw transferred this issue from xproc/3.0-specification Nov 1, 2018

@AndrewSales

AndrewSales commented Jan 31, 2019

Hello,
@gimsieke directed me here, and after also consulting @sgmlguru and @Gertone, I would like to offer to pitch in.
For reference, we also did something along these lines (but inadequate for these purposes) a while back.
Not wishing to tread on any toes, @gimsieke, @mricaud - but I stand ready :)

@gimsieke

Author

gimsieke commented Jan 31, 2019

Hi Andrew, by all means, please join us here. I will create a draft Schema (starting with a preliminary RNC that Norm created a couple weeks ago) in another repo over the weekend. Then we can discuss on, before, and after Thursday’s unconference track whether the proposed model seems adequate and what else people might need in “XVRL”.

@ndw

Collaborator

ndw commented Jan 31, 2019

Yes, please, @AndrewSales
I'd welcome a coherent proposal. I've scratched at it a bit, but haven't produced anything I'm confident about.

@ndw

Collaborator

ndw commented Feb 6, 2019

@gimsieke

Author

gimsieke commented Feb 12, 2019

@hrennau suggested that we also consider supporting SHACL validation (there’s issue #8 that already mentions SHACL), and then there needs to be a bidirectional mapping (if not an outright identity for overlapping areas covered) between the XVRL vocabulary and the SHACL Validation Report Vocabulary. The serialization format is then probably everything that an RDF graph can be serialized as.
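The "bidirectional mapping, if not an identity" can be illustrated for severities alone. SHACL's validation report vocabulary has three built-in severities (sh:Info, sh:Warning, sh:Violation), whereas the vocabulary proposed earlier in this thread has four. The following Python sketch is an assumption, not an agreed mapping; it only shows where a round trip would be lossy.

```python
SH = "http://www.w3.org/ns/shacl#"

# Assumed mapping from the proposed severity tokens to SHACL severity IRIs.
# SHACL has no separate "fatal" level, so 'error' and 'fatal-error'
# both collapse onto sh:Violation.
TO_SHACL = {
    "info": SH + "Info",
    "warning": SH + "Warning",
    "error": SH + "Violation",
    "fatal-error": SH + "Violation",
}

# Reverse direction: sh:Violation cannot tell 'error' from 'fatal-error',
# so this sketch picks 'error'. Any other IRI maps to a caller-supplied default.
FROM_SHACL = {
    SH + "Info": "info",
    SH + "Warning": "warning",
    SH + "Violation": "error",
}


def from_shacl(severity_iri, default="error"):
    return FROM_SHACL.get(severity_iri, default)


def round_trips(token):
    """True if mapping a token to SHACL and back yields the same token."""
    return from_shacl(TO_SHACL[token]) == token


for token in TO_SHACL:
    print(token, "->", TO_SHACL[token], "round-trips:", round_trips(token))
```

Everything except fatal-error survives the round trip, so a lossless mapping would need either an extra severity IRI on the SHACL side or an additional attribute on the XVRL side.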

@Gertone

Gertone commented Feb 12, 2019

There is also ShEx as an alternative to SHACL. Implementations are not yet at a level where everyone is entirely happy using them, so it might be too early to implement them as a step. But it is worth looking into, of course. If we do so, though, we need to give whatever comes out of https://tools.ietf.org/pdf/draft-handrews-json-schema-00.pdf at least the same attention.
