Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have ReSpec run in JSDOM, stop using RespecDocWriter #106

Merged
merged 2 commits into from May 15, 2018
Merged

Conversation

tidoust
Copy link
Member

@tidoust tidoust commented May 15, 2018

Reffy needs to run ReSpec on Editor's Drafts. It relied on RespecDocWriter until now, which uses Puppeeter, which in turn relies on a headless version of Chrome. That requires downloading 100s of MB of dependencies and is a tad heavy for the task at hands, especially as we consider running Reffy as an HTTP service.

With this update, RespecDocWriter is no longer being used. Instead, Reffy makes ReSpec run in JSDOM, which is much more lightweight (and JS-based).

Problem is that JSDOM does not yet support enough features to run ReSpec. Concrete issues are documented in:
w3c/respec#1469 (comment)

This update monkey-patches JSDOM's code to add good-enough implementations of missing features, as well as to regain control over network requests so that all network requests go through our cache. It also monkey-patches ReSpec to:

  1. drop a couple of modules that cannot run because they use non-implemented features of JSDOM that cannot be easily patched
  2. update the code of a couple of ReSpec dependencies for the same reason

The whole thing remains fragile: updates to ReSpec or to its dependencies may well break the regular expressions used to monkey patch the code. Most of the monkey-patching should resist minor changes. A couple of them may not though. I suggest we try this for some time and see whether that proves too hard to maintain.

Reffy needs to run ReSpec on Editor's Drafts. It relied on RespecDocWriter
until now, which uses Puppeeter, which in turn relies on a headless version
of Chrome. That requires downloading 100s of MB of dependencies and is a tad
heavy for the task at hands, especially as we consider running Reffy as an
HTTP service.

With this update, RespecDocWriter is no longer being used. Instead, Reffy makes
ReSpec run in JSDOM, which is much more lightweight (and JS-based).

Problem is that JSDOM does not yet support enough features to run ReSpec.
Concrete issues are documented in:
w3c/respec#1469 (comment)

This update monkey-patches JSDOM's code to add good-enough implementations of
missing features, as well as to regain control over network requests so that
all network requests go through our cache. It also monkey-patches ReSpec to:

1. drop a couple of modules that cannot run because they use non-implemented
features of JSDOM that cannot be easily patched
2. update the code of a couple of ReSpec dependencies for the same reason

The whole thing remains fragile: updates to ReSpec or to its dependencies may
well break the regular expressions used to monkey patch the code. Most of the
monkey-patching should resist minor changes. A couple of them may not though.
* The whole thing is PRETTY UGLY.
*
* NB: This code WILL LIKELY BREAK when switching to new versions of JSDOM or
* ReSpec.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a simple test we can build to detect that early?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For JSDOM, we control the switch to a new version, so running a crawl locally before pushing the version bump could be good-enough (but it would certainly be better to write a few test cases!).

For ReSpec, without test cases, it's a bit more difficult, because the latest version of ReSpec is used "on the fly". We could force the use of a specific version of ReSpec to make sure updates are explicit and remain under our control.

// so let's pretend they are just @media rules
// https://github.com/jsdom/jsdom/issues/2026
// (NB: this replacement is just for convenience, to avoid JSDOM reporting
// lengthy errors (including a full dump of the CSS) to stderr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in practice we're not interested in CSS parsing - cf jsdom/jsdom#2005

// HTML (with <pre> and <p> intertwined). For some reason, this does
// not bother regular browsers. It does bother JSDOM though.
data = data.replace(/r\.createTextNode\("\\n"\)/, 'r.createTextNode("\\n\\n")');

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe drop r. to make less fragile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there's another call to createTextNode("\n") somewhere else in the code which would be modified as well. I'll stick with that code for now, but will investigate upstreaming that change to ReSpec (after all, the HTML produced is wrong in regular browsers as well, it just happens not to have any bad consequence there)

reffy.js Outdated
.catch(err => console.error(err));
return promise.then(_ => console.log('-- THE END -- '))
.catch(err => {
console.error('-- ERROR CAUCHT --');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/UCH/UGH/

const {window} = new JSDOM(html, {
url: responseUrl,
resources: 'usable',
runScripts: 'dangerously',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thrilling!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, any interaction with dom is potentially dangerous, I've always said that :)

Copy link
Member

@dontcallmedom dontcallmedom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of nits, but looks good (as in "awfully ugly")

@tidoust tidoust merged commit 9c5160c into master May 15, 2018
@tidoust tidoust deleted the drop-dep-respec branch May 15, 2018 10:24
// needs them around <pre> tags, otherwise it produces really weird
// HTML (with <pre> and <p> intertwined). For some reason, this does
// not bother regular browsers. It does bother JSDOM though.
data = data.replace(/r\.createTextNode\("\\n"\)/, 'r.createTextNode("\\n\\n")');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tidoust We would be interested in fixing this in jsdom (or potentially in parse5). Do you have a sample of the HTML which behaves differently between jsdom and other browsers?

Your list in w3c/respec#1469 (comment) is very nice as well. It'd be great if you could file new issues if you find something not covered by the existing ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Zirro! I don't think there is any bug in jsdom, actually. What I get from jsdom is pretty much what I expect to get (wrong markup in, wrong markup out). The output I do not understand is the one in regular browsers... but I suspect that the wrong markup gets fixed by chance by ReSpec afterwards when it runs the core/highlight module (module which cannot run in jsdom). I ran out of time on this issue, so did not investigate further, but the real bug is in marked.js who produces wrong markup, not in jsdom.

I didn't file any issue in jsdom from the list because all of the current missing features either appear as real issues already, or are explicit pointed out as missing in code comments or commit messages. So that all seemed well-known already. Reffy now runs ReSpec in JSDOM (with the monkey-patching above), so we'll keep an eye on potential new problems that we may find!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants