
Specific release .zip, or just readme text, for using v.Nu RELAX NG schemas in XML editors? #251

Closed
GrahamHannington opened this issue Feb 29, 2016 · 13 comments


@GrahamHannington

I want to use the XHTML5 RELAX NG schema files from v.Nu with the jEdit validating XML editor.

The jEdit XML plugin and v.Nu both use Jing, so I figure this should be relatively straightforward.

What I've done

I've spent some time hitting my head against a brick wall trying to get Jing in jEdit to recognize the "http://whattf.org/datatype-draft" datatype library I'd transplanted from v.Nu, but a jEdit XML plugin developer has kindly come to my rescue, so I'm over that bump.

Now I'm trying to figure out the best method for transplanting the XHTML5 RELAX NG schema files into jEdit.

So far, I've considered two methods:

  1. Downloading the schema files from the GitHub repo source
  2. Extracting the schema files from the release .jar

Method 1: Downloading the schema files from the GitHub repo source

I downloaded the RELAX NG schema files from the schema directory of the v.Nu GitHub repo.

I noted that the .drivers/xhtml5.rnc file referred to the following nonexistent file:

include "html5/xhtml5full-xhtml.rnc"

but I decided to worry about that later, and instead pointed the jEdit XML plugin at html5/xhtml5.rnc as the schema.

That works - jEdit performs as-you-type validation of XHTML5 documents - but only up to a point. The schema does not encompass the full set of XHTML5 markup. For example, it doesn't define role attributes.

That include directive, which refers to a file path missing from the source, was already a clue that just downloading the "pre-build" schema files from the repo source was probably the wrong approach, so I went looking in the release (built) vnu.jar...

Method 2: Extracting the schema files from the release .jar

Apologies if you already know this, but just to establish the context of this method: the release .jar contains the schema files in a nu/validator/localentities/files/ directory, "flattened" into a single directory, and without file extensions. Relative directory paths and file extensions have been subsumed into extension-less file names, like this:

schema_html5_xhtml5full_xhtml_rnc

with an entitymap file in the same (files) directory that maps those extension-less file names to URLs, like this:

http://s.validator.nu/html5/xhtml5full-xhtml.rnc    schema_html5_xhtml5full_xhtml_rnc

I plan to do some regex-based replacement on the lines in that entitymap to turn it into a batch (Windows .bat) file of copy commands that will re-create the original relative paths and file extensions. That way, I think I'll be able to present jEdit with a set of schema files that accurately reflects what the built v.Nu uses, rather than attempting to work with the "pre-build" source files. I imagine that if I spent a little more time looking at the source files and the build process, I could probably figure out what parts I needed to add, or change, but that strikes me as potentially wasted effort.
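For illustration, here is a minimal Python sketch of that regex-based conversion (not the script actually used). It assumes each entitymap line is a URL, whitespace, then the flattened file name, and that the generated batch file runs from a directory containing the extracted `files` directory; the function name `to_batch` and the output directory `schemas` are hypothetical.

```python
import re

# Assumed entitymap line shape: "<URL><whitespace><flattened file name>".
ENTRY = re.compile(r"^https?://[^/\s]+/(?P<path>\S+)\s+(?P<name>\S+)\s*$")


def to_batch(entitymap_text, src_dir="files", dest_dir="schemas"):
    # Turn the .rnc entries of an entitymap into Windows batch commands that
    # copy each flattened file to a path rebuilt from its URL path
    # (for example html5/xhtml5full-xhtml.rnc under dest_dir).
    commands = []
    created = set()  # directories that already have a mkdir command
    for line in entitymap_text.splitlines():
        match = ENTRY.match(line)
        if not match or not match.group("path").endswith(".rnc"):
            continue  # skip blank lines and non-schema entries
        rel = match.group("path").replace("/", "\\")  # URL path -> Windows path
        name = match.group("name")
        folder = rel.rpartition("\\")[0]
        if folder and folder not in created:
            created.add(folder)
            commands.append(f'if not exist "{dest_dir}\\{folder}" mkdir "{dest_dir}\\{folder}"')
        commands.append(f'copy "{src_dir}\\{name}" "{dest_dir}\\{rel}"')
    return commands
```

Writing `"\r\n".join(to_batch(text))` to a .bat file would give the copy script; because the file names alone are ambiguous (underscores replace `/`, `-` and `.`), the URL column is what makes the original paths recoverable.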

What do I want?

I want a programmatically repeatable method for getting the latest schema files from v.Nu into jEdit with the fewest steps. Both methods I've considered involve steps that might, due to my ignorance, be unnecessary: for example, it makes some kind of sense to me to get the "built" schema files exactly as v.Nu uses them, but I don't really want to have to unflatten those file names in the release .jar.

It occurs to me that I might not be alone in wanting to integrate the (XHTML5 and other) RELAX NG schemas in v.Nu with other applications such as validating XML editors; in particular, editors such as jEdit that support the Java interface for RELAX NG datatype libraries.

And I imagine others have already done this; if not for jEdit, then for other editors or tools. But I've not found any written accounts. Which is why I characterized this issue as a request for a specific release .zip, or even just some additional readme text; although, I'd be happy just to get some advice on the best method. For example, download - and augment? - the schema source files, or extract and rename the schema files in the release .jar?

@sideshowbarker
Contributor

I understand what you want and why but I don’t plan to provide myself from this repo/project a .zip release with the relaxng files, nor a readme on how to use the files outside the context of the checker.

I don’t mean to sound dismissive but that use case is not something I personally want to directly support—for example, I don’t want to have to respond to bug reports or requests for help about it.

But if you or somebody else would be interested in separately distributing on your own some version of the relaxng files for use in jEdit and emacs and such, I would be willing to provide some initial help as far as responding to questions around assembling it all.

But even then it would frankly take me a lot of time myself to go back and look at the structure of it all and figure out how to make a standalone set of static files for it (rather than the generated files the checker itself uses), and I’m not really sure it would take me any less time to do that than it would you or somebody else who’s determined and who actually has real need/motivation to do it (which I don’t).

@GrahamHannington
Author

With no irony: I sincerely appreciate your candour.

The following statement in your reply is valuable information to me, and motivates me to do this myself:

it would frankly take me a lot of time myself to go back and look at the structure of it all and figure out how to make a standalone set of static files

I'm leery of saying "I'll do it," because I've made similar promises (mostly to myself) in the past, and paid work has taken precedence. But I don't think it will take me that long.

I mostly live under a rock made of proprietary schemas, so my perspective is skewed, but it seems to me that v.Nu is carrying the torch for validation of the XML serialization of HTML. (I was going to write "the HTML Living Standard", but I'm slowly catching on to the politics between W3C and WHATWG.) There are a few other players, but v.Nu is the current benchmark. Please keep up the good work.

@GrahamHannington
Author

This issue is closed, and I'm happy with that, but I thought I'd report on what I've done, and ask a question. Feel free to refer me to some other channel of correspondence.

What I've done

I've developed a Windows PowerShell script (you're allowed to smile) that extracts the *_rnc files from the nu\validator\localentities\files directory in vnu.jar, and renames and copies them into a directory structure that corresponds to the HTTP URLs in the entitymap. I'm using those files - that is, an extracted copy of the same RELAX NG schema files that v.Nu uses, in the same directory structure - to validate XHTML5 documents in jEdit.
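The script itself is PowerShell. As a rough cross-language sketch of the same idea (not a port of that script), Python's standard `zipfile` module can read vnu.jar directly, since a .jar is a zip archive. It assumes the `nu/validator/localentities/files/entitymap` location described earlier and exactly two whitespace-separated fields per entitymap line; `extract_schemas` is a hypothetical name, and the sketch does not guard against URL paths containing `..`.

```python
import zipfile
from pathlib import Path
from urllib.parse import urlsplit

FILES = "nu/validator/localentities/files/"  # assumed directory inside vnu.jar


def extract_schemas(jar, out_dir):
    # Copy each *_rnc entry to a path that mirrors its entitymap URL path.
    out = Path(out_dir)
    written = []
    with zipfile.ZipFile(jar) as archive:
        entitymap = archive.read(FILES + "entitymap").decode("utf-8")
        for line in entitymap.splitlines():
            parts = line.split()
            if len(parts) != 2 or not parts[1].endswith("_rnc"):
                continue  # only RELAX NG compact syntax schemas
            url, name = parts
            rel = urlsplit(url).path.lstrip("/")  # e.g. "html5/xhtml5full-xhtml.rnc"
            target = out / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(FILES + name))
            written.append(rel)
    return written
```

Calling `extract_schemas("vnu.jar", "schema-release")` would, under those assumptions, recreate the directory structure implied by the entitymap URLs.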

I'm considering forking the schema directory from this v.Nu repo, but before I do that, I need to get the time to understand and extract the subset of code from build.py that relates to the schema files, and customize it to generate what I have now (via that PowerShell script). That way, I can pull updates to the source schema files from this repo, rather than extracting the distribution ("built") schema files from vnu.jar. That appeals to me, but I can see arguments for each approach.

Who maintains the "master copy" of the schema files for XHTML5, and where?

I think the answer is you (@sideshowbarker) and here (in this repo).

What do I mean by "master copy"?

I can see that you commit updates to the .rnc files in this repo.

Do you apply those changes based on updates made to some other schema files, maintained in some other public (or private?) location, or is this repo the primary public source for the latest XHTML5 schema? That is, do you apply changes to these schema files based directly on changes to the HTML specification, or based on changes to schema files maintained somewhere else, by someone else?

I understand that anyone is free to follow the development of HTML, and maintain their own schema. I guess what I want to know is: are you - via this repo - effectively the "wellspring", the "keeper" of the XML serialization of HTML as it continues to evolve, or are you a secondary source?

@GrahamHannington
Author

Another "progress report", for what it's worth...

Extracting the schemas from vnu.jar and comparing them with the source schema files in this repo was a useful exercise - and it gave me a set of schema files on my local file system that I could use, in exactly the same structure as used by vnu.jar - but it seemed like too much work, and inelegant. I didn't really want to be repeating that extraction process.

After spending more time looking at the related code in build.py, and having had the idea to leverage github.io URLs, I'm now using the following RELAX NG compact syntax schema (.rnc) file with jEdit to validate XHTML5:

include "http://validator.github.io/validator/schema/html5/xhtml5.rnc"
include "http://validator.github.io/validator/schema/html5/aria.rnc"
# Cannot retrieve the following file via github.io from .drivers directory,
# I think because of the leading period in the directory name,
# even when escaped as %2E (does the period get "normalized away"?).
# Happily, I personally don't want to validate legacy markup.
# If you do, then download a copy of legacy.rnc to the same directory
# as this file, and refer to it via a local URI (include "legacy.rnc")
# include "http://validator.github.io/validator/schema/.drivers/legacy.rnc"
include "http://validator.github.io/validator/schema/html5/microdata.rnc"
include "http://validator.github.io/validator/schema/html5/web-components.rnc"

This works for me. (But it also stirs a memory of W3C's excessive DTD traffic.)
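On the question in the schema comments about the period: standard URL resolution does not strip a leading period. RFC 3986 dot-segment removal only collapses segments that are exactly `.` or `..`, so `.drivers` survives; and because `.` is an unreserved character, `%2E` is equivalent to a literal period, so escaping it would not be expected to make a difference. The Python check below only demonstrates client-side resolution; why github.io did not serve the file is not established here.

```python
from urllib.parse import unquote, urljoin

BASE = "http://validator.github.io/validator/schema/html5/"

# "..", a real dot segment, is removed; ".drivers" is an ordinary segment and stays.
legacy = urljoin(BASE, "../.drivers/legacy.rnc")
print(legacy)  # http://validator.github.io/validator/schema/.drivers/legacy.rnc

# %2E decodes to a plain period, so the escaped form names the same resource.
print(unquote("%2Edrivers"))  # .drivers
```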

The only reason I have any local schema file at all is because the aria.rnc, microdata.rnc, and web-components.rnc files are not included by the repo source file schema/html5/xhtml5.rnc.

(All of these files are included by the distributed file schema/html5/xhtml5full-xhtml.rnc, but that file is created by build.py; it is not present in the source repo.)
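A hypothetical helper for checking which of those modules a given .rnc file actually includes. It is a simplification, not an RNC parser: it only recognizes double-quoted URI literals on lines that start with `include` (so commented-out lines are ignored), skips `inherit` clauses and override blocks, and compares file names only.

```python
import re

# Matches: include "some/uri.rnc"  (double-quoted literal at the start of a line)
INCLUDE = re.compile(r'^\s*include\s+"([^"]+)"', re.MULTILINE)


def missing_includes(rnc_text, wanted=("aria.rnc", "microdata.rnc", "web-components.rnc")):
    # Return the wanted module file names that no include directive refers to.
    included = {uri.rsplit("/", 1)[-1] for uri in INCLUDE.findall(rnc_text)}
    return [name for name in wanted if name not in included]
```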

Are you feeling generous?

I know your view on supporting my particular use case, but if you (@sideshowbarker) felt like it, you could save me from having to keep any local schema file at all, by adding the following lines to schema/html5/xhtml5.rnc:

include "aria.rnc"
include "microdata.rnc"
include "web-components.rnc"

(I don't much care about legacy.rnc)

then I could simply point jEdit at:

http://validator.github.io/validator/schema/html5/xhtml5.rnc

(I'd still be missing SVG, but I could live with creating local schema files for "XHTML5 + SVG" et al.)

If you haven't responded to this comment in the next few days, I'll consider raising a separate issue for this: whether or not you want to directly support my particular use case, I don't see why those files are not included - in particular, aria.rnc - especially since build.py includes them in the schemaDriver... variable assignments that it uses to build the distributed files.
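Pointing jEdit at a single remote URL works because a relative `include` URI is resolved against the URI of the schema file that contains it, as the RELAX NG compact syntax specifies. A minimal sketch with Python's `urllib.parse.urljoin`, assuming the github.io layout shown above (`resolve_includes` is a hypothetical name):

```python
from urllib.parse import urljoin

SCHEMA_URL = "http://validator.github.io/validator/schema/html5/xhtml5.rnc"


def resolve_includes(schema_url, hrefs):
    # Resolve each include URI against the URL of the including schema file,
    # the same way a relative reference is resolved under RFC 3986.
    return [urljoin(schema_url, href) for href in hrefs]


for url in resolve_includes(SCHEMA_URL, ["aria.rnc", "microdata.rnc", "web-components.rnc"]):
    print(url)
```

Each include then lands next to xhtml5.rnc in the same `schema/html5/` directory, so no local copy is needed.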

The two xhtml5.rnc files in the repo source

I notice that there are two xhtml5.rnc files, with different contents, in this repo. One is distributed in vnu.jar, the other is not.

I'm asking you to add some lines to the source file schema/html5/xhtml5.rnc. I believe this will have no effect whatsoever on the distributed files (in vnu.jar), because, as far as I can tell, this file is not distributed, and is not used as the base for any distributed file.

I can see that the separate source file schema/.drivers/xhtml5.rnc is distributed (as schema/xhtml5.rnc).

@sideshowbarker
Contributor

I know your view on supporting my particular use case, but if you (@sideshowbarker) felt like it, you could save me from having to keep any local schema file at all, by adding the following lines to schema/html5/xhtml5.rnc:

include "aria.rnc"
include "microdata.rnc"
include "web-components.rnc"

Done in c946eac

Sorry for not having responded yet to your message from a couple days back. Will re-read now and try to respond to any open questions in your recent comments here.

@sideshowbarker
Contributor

is this repo the primary public source for the latest XHTML5 schema?

Yes. It’s actually the only canonical source for the XHTML5 schema.

But as I alluded to in previous comments, it’s not the goal for it to be that, but instead basically just a consequence of the fact that the schema is a necessary component of the code for the HTML checker under its current architecture.

That is, do you apply changes to these schema files based directly on changes to the HTML specification

Yes.

I guess what I want to know is: are you - via this repo - effectively the "wellspring", the "keeper" of the XML serialization of HTML as it continues to evolve,

Yes

or are you a secondary source?

There is no other source I know of.

But all that said, it’s important to note that the schema is not official in any way. It is not a standard. The only standard for HTML is the spec itself, and the only standard formalism for HTML is the prose of the HTML Standard. The schema is an implementation (and of only a part of it).

@sideshowbarker
Contributor

Regarding legacy.rnc, it’s not intended to be useful outside the context of the checker service itself, so there should never be any need for anybody to ever use a copy of it anywhere else in any other context.

@sideshowbarker
Contributor

After re-reading through the comments here, I think there aren’t any open questions I’ve not responded to yet—but if I missed anything, please lemme know.

@GrahamHannington
Author

Thank you for your answers. Especially this:

The only standard for HTML is the spec itself, and the only standard formalism for HTML is the prose of the HTML Standard.

👍 👍 Crystal clear and quoteworthy.

@GrahamHannington
Author

@sideshowbarker, here's another "progress report"...

Regarding your earlier comment:

I don’t plan to provide myself from this repo/project ... a readme on how to use the files outside the context of the checker. ...

I've made a start on that readme in a new GitHub repository under a new GitHub organization that I have created specifically for this purpose:

https://github.com/unsoup/validator

Right now, that repo is just a readme describing what I've done so far: a mashup of my correspondence with you and with the jEdit plugin developer.

@sideshowbarker
Contributor

I've made a start on that readme in a new GitHub repository under a new GitHub organization that I have created specifically for this purpose:

https://github.com/unsoup/validator

I would be happy to include a link to that in the project docs here somewhere—so feel free to either open a PR with a suggested addition, or I will get something added the next time I need to make another update of some kind to the docs.

@GrahamHannington
Author

Re:

I would be happy to include a link to that...

Thank you @sideshowbarker, that's very kind of you. I'll work on a suggested addition (but if you get there first, please go ahead).

Re:

https://github.com/unsoup/validator

I've edited the readme, adding a more concise explanation for the existence of the repo, and pushing the (tl;dr?) history to the bottom.

I've also added the first real content to the repo: a "shim schema" (shchema, pronounced "shkeema") for XHTML5 + MathML + SVG + RDFa, as a workaround for the fact that the corresponding schema in the v.Nu repo refers to paths that exist only in the distributed packages (that is, inside vnu.jar).

@GrahamHannington
Author

@sideshowbarker,

FYI, another progress report (I'm not going to keep adding such comments to this issue forever, but I thought you might be interested in what I've just done).

Regarding your comment:

how to make a standalone set of static files for it (rather than the generated files the checker itself uses)

I've added to the unsoup/validator GitHub repo a Windows PowerShell script that automates the extraction of the schemas from the latest vnu.jar into a directory structure that corresponds to the URIs used by v.Nu:

https://github.com/unsoup/validator/blob/gh-pages/tools/Get-vnu-Schemas.ps1

The script also copies the main license file from the root directory of the v.Nu repo into the root directory of the output, and the more specific license file from the schema/html5 directory of the v.Nu repo to the html5 directory of the output. This seemed to me the polite, prudent, and legally correct thing to do. Please let me know if you think I should do something different in this regard.

I've also added to unsoup/validator the results of running that script:

https://github.com/unsoup/validator/tree/gh-pages/schema-release

In brief: I've created a standalone set of static files from the generated files the checker itself uses.

tripu pushed a commit to tripu/validator that referenced this issue Aug 19, 2016