Exposing the DOMParser API in Worker Contexts #11068

Open
sqwr opened this issue Feb 24, 2025 · 1 comment
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest

Comments

sqwr commented Feb 24, 2025

What problem are you trying to solve?

Web workers are widespread on the Web. Service workers, in particular, are used in Manifest V3 extensions and in serverless platforms such as Cloudflare Workers. As part of our web security research, we recently experimented with various HTML manipulation features within workers. Unlike web pages, which have a document object and access to the DOMParser API, workers have no DOM and the DOMParser API is not exposed to them. We therefore tested several third-party solutions, including pure JavaScript HTML parsers such as JSDOM (which we could not port to the browser due to various system dependencies) and Cheerio. We also experimented with the lol_html crate provided by Cloudflare, which we ported to worker contexts with WebAssembly (Wasm). Finally, we ran performance measurements and found that DOMParser outperforms the third-party solutions, i.e., Wasm and Cheerio, in all browsers. Based on these results, we would like to open a feature request to expose the DOMParser API in worker contexts. We would be glad to share the results of our measurements to better support this request.

What solutions exist today?

Using third-party solutions such as Cheerio, or lol_html compiled to WebAssembly.

How would you solve it?

Expose the DOMParser API to worker contexts: service workers, shared workers, and dedicated workers.
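To make the request concrete, here is a minimal sketch of what worker code could look like if DOMParser were exposed. The function name `appendScript`, its `scope` parameter, and the small structural types are hypothetical and exist only for illustration; the DOM calls themselves (`parseFromString`, `createElement`, `appendChild`, `outerHTML`) are the ones available on web pages today. In current worker contexts the guard at the top is expected to throw.

```typescript
// Minimal structural types (assumptions) so the sketch compiles without the DOM lib.
interface ParsedDocumentLike {
  createElement(tag: string): { src: string };
  body: { appendChild(node: unknown): unknown };
  documentElement: { outerHTML: string };
}
interface DOMParserCtor {
  new (): { parseFromString(html: string, type: string): ParsedDocumentLike };
}

// Hypothetical helper: parse HTML, append a <script>, and return the serialized result.
// `scope` defaults to the current global object (a window or a worker global scope).
function appendScript(
  html: string,
  scriptSrc: string,
  scope: { DOMParser?: DOMParserCtor } = globalThis as any
): string {
  const Parser = scope.DOMParser;
  if (typeof Parser !== "function") {
    // This is the situation in worker contexts today.
    throw new Error("DOMParser is not exposed in this context");
  }
  const doc = new Parser().parseFromString(html, "text/html");
  const script = doc.createElement("script");
  script.src = scriptSrc;
  doc.body.appendChild(script);
  // Note: outerHTML of the root element does not include the doctype.
  return doc.documentElement.outerHTML;
}
```

On a web page, `appendScript(html, "/injected.js")` would run as written; in a worker it would currently throw, which is where a third-party parser has to take over.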

Anything else?

We are attaching the performance results. The X axis shows the sizes of the HTML files we collected from the top 100 Tranco list. The Y axis shows the costs in milliseconds. Each experiment consisted of parsing the HTML, appending a script element to the DOM, and returning the resulting HTML. The costs shown are averages computed over 10 runs for each resource. The vertical bars mark the scaling factors (20x, 3x, 0.25x) we applied to make the results easier to read, in particular for the small resources.
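For readers who want to reproduce this kind of measurement, the averaging step could be sketched roughly as follows. This is not the harness we used; `averageCostMs`, `HtmlTransform`, `nowMs`, and `naiveAppend` are hypothetical names, and `naiveAppend` is a string-concatenation placeholder rather than a real HTML parser. A real run would pass in a DOMParser-, Cheerio-, or Wasm-based transform instead.

```typescript
// A transform takes an HTML string and returns the modified HTML string.
type HtmlTransform = (html: string) => string;

// Use the high-resolution timer when available, otherwise fall back to Date.now().
function nowMs(): number {
  const perf = (globalThis as any).performance;
  return perf && typeof perf.now === "function" ? perf.now() : Date.now();
}

// Average wall-clock cost in milliseconds of `transform` over `runs` executions.
function averageCostMs(transform: HtmlTransform, html: string, runs = 10): number {
  if (runs <= 0) throw new RangeError("runs must be positive");
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = nowMs();
    transform(html);
    total += nowMs() - start;
  }
  return total / runs;
}

// Placeholder transform (assumption): appends a script tag by plain concatenation.
const naiveAppend: HtmlTransform = (html) =>
  html + '<script src="/injected.js"></script>';
```

Calling `averageCostMs(naiveAppend, someHtml)` returns the mean cost over 10 runs; swapping in a different `HtmlTransform` keeps the timing logic identical across the candidates being compared.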

domparser.pdf

domenic (Member) commented Feb 25, 2025

The problem with this is that it requires reimplementing the entire DOM in a worker context (since DOMParser returns a Document full of Nodes), and then figuring out what all the parts of the DOM should do in workers, even when they're not visual. (E.g., if you parse <canvas> you get an HTMLCanvasElement. If you call parsedDocument.body.getBoundingClientRect(), what should that return? And supporting parsedLinkEl.sheet or parsedDiv.style would mean reimplementing all of the style system.)

This is a huge amount of work for both a specification and an implementation, so it's very unlikely to ever happen. I'm tempted to just straightaway close this, but I suppose we have a lot of "needs implementer interest" proposals left open indefinitely, and one more doesn't hurt.
