Proposal: getHTML() for Document interface #10280

lukewarlow · 2024-04-17T08:29:38Z

What problem are you trying to solve?

There's currently no way to serialise a document object in a way that includes shadow roots.

What solutions exist today?

document.documentElement.outerHTML is the closest we have, but it won't serialize shadow roots, and it won't serialize the DOCTYPE node either.

How would you solve it?

Define a new document.getHTML() function that would return a string such as <!DOCTYPE html><html><head><title>Title</title></head><body><p>Contents</p></body></html>

Anything else?

No response

keithamus · 2024-04-17T09:08:12Z

/cc @mfreed7

annevk · 2024-04-17T14:20:21Z

It seems reasonable, but do we have use cases for this? I think we mainly added Document.parseHTMLUnsafe() because there's a bunch of existing usage of DOMParser for custom sanitizer purposes, but it's not clear they also need to (or do) serialize at a document level.

lukewarlow · 2024-04-17T14:42:06Z

One case that I can think of is cobrowsing functionality, where the current page's DOM is serialised, sent across the wire, and reconstructed on the other side. In practice, though, those situations might actually use something like a tree walker to serialise to their own format.
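
As a rough illustration of the tree-walker approach mentioned above, here is a minimal sketch that snapshots a node tree, including open shadow roots, into a plain JSON-friendly structure. The function name `snapshotNode` and the snapshot format are hypothetical; they are not part of this proposal or of any existing cobrowsing tool. Closed shadow roots are not reachable through `element.shadowRoot`, so this sketch cannot see them.

```javascript
// Hypothetical snapshot format, for illustration only.
function snapshotNode(node) {
  switch (node.nodeType) {
    case 1: { // Element
      const snapshot = {
        type: "element",
        name: node.localName,
        attributes: Array.from(node.attributes ?? [], (a) => [a.name, a.value]),
        children: Array.from(node.childNodes ?? [], (c) => snapshotNode(c)).filter(Boolean),
      };
      // Only open shadow roots are exposed via element.shadowRoot.
      if (node.shadowRoot) {
        snapshot.shadow = {
          mode: node.shadowRoot.mode,
          children: Array.from(node.shadowRoot.childNodes, (c) => snapshotNode(c)).filter(Boolean),
        };
      }
      return snapshot;
    }
    case 3: // Text
      return { type: "text", data: node.data };
    case 8: // Comment
      return { type: "comment", data: node.data };
    case 10: // DocumentType
      return { type: "doctype", name: node.name };
    default: // Other node types are skipped in this sketch.
      return null;
  }
}
```

In a browser, something like `Array.from(document.childNodes, (n) => snapshotNode(n))` would produce the top-level list that gets sent across the wire.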

lukewarlow · 2024-04-17T14:47:28Z

It would also be a nice HTML equivalent of new XMLSerializer().serializeToString(document), I guess. Not sure how much that is used.

mfreed7 · 2024-04-23T00:01:45Z

If there's a use case, this is relatively trivial to implement, so I'm supportive.

lukewarlow · 2024-04-23T08:43:51Z

@domenic you mentioned you might have a use case for this in jsdom. I know jsdom isn't actually using the platform (it's recreating it), but the use case might transfer?

Edit: Here's the method https://github.com/jsdom/jsdom?tab=readme-ov-file#serializing-the-document-with-serialize

FND · 2024-05-21T20:11:46Z

In terms of use cases, people like myself who create/use local-only applications (AKA self-saving HTML documents) might need to reconstruct the current document's HTML in order to persist changed state. To that end, I typically use something like the following:

let doctype = new XMLSerializer().serializeToString(document.doctype);
let html = [doctype, document.documentElement.outerHTML].join("\n");

However, I typically do not want to persist whatever might have happened within the shadow DOM, as all relevant state should reside within the light DOM (think <li>s representing tasks, each of which might turn into an interactive widget at runtime - persisting the latter's interaction state would only complicate things, plus core content remains accessible even if JavaScript is unavailable for whatever reason).

Happy to elaborate if this is deemed relevant or interesting - though I might be slow to respond for the next week or so.

lukewarlow · 2024-05-22T08:56:37Z

This function in Puppeteer is effectively doing what getHTML() would do: https://pptr.dev/api/puppeteer.page.content

It uses the XML serialiser for the doctype plus documentElement.outerHTML, so swapping it to use this new getHTML() function would be easy. Puppeteer could then be used for server rendering web components, as one example use case. See https://github.com/puppeteer/puppeteer/blob/97a4951d52b95b4815db989d30e82a00f5dc3d2b/packages/puppeteer-core/src/api/Frame.ts#L719

One key bit that would be needed to swap that for this new document.getHTML() is that comment nodes outside the html element would need to be serialised too, like we would do for the doctype.
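
To make the point about top-level comment nodes concrete, here is a minimal sketch that walks the document's direct children and serialises the doctype, any comments outside the html element, and the root element. The name `serializeDocument` is hypothetical, and this is not how Puppeteer or the proposed getHTML() is implemented. It relies on outerHTML, so it still does not include shadow roots.

```javascript
// Hypothetical helper, for illustration only.
function serializeDocument(doc) {
  const parts = [];
  for (const node of doc.childNodes) {
    switch (node.nodeType) {
      case 10: // DocumentType: HTML serialisation emits only the name.
        parts.push(`<!DOCTYPE ${node.name}>`);
        break;
      case 8: // Comment before or after the root element.
        parts.push(`<!--${node.data}-->`);
        break;
      case 1: // The root element; outerHTML omits shadow roots.
        parts.push(node.outerHTML);
        break;
      // Other node types (e.g. processing instructions) are skipped here.
    }
  }
  return parts.join("");
}
```

In a browser this would be called as `serializeDocument(document)`.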

argyleink · 2024-05-22T15:40:57Z

VisBug, a more designer-centric version of what Firebug was, has had a long-outstanding feature request to save changes made via the extension. We've attempted things like outerHTML and then posting it to Netlify (for example), but the results were lossy with shadow DOM and a few other things. Users would also like a way to save the changes to a local file, which they could then send around to share their modifications with the team. I could see VisBug using this API to aid in saving page modifications.

myfonj · 2024-05-22T17:38:47Z

Archiving tools

For example, SingleFile has to do quite a bit of dark magic to preserve the contents of the shadow DOM, and I guess they would be happy to switch to some native API instead.

By almost telepathic coincidence, I recently tried to make a "poor man's SingleFile static snapshooter": a simple bookmarklet that constructs a data URI of the displayed page by stripping all scripts from it, injecting a base tag (for eventually loading images, styles, …), and (partially) URI-encoding the documentElement.outerHTML (https://myfonj.github.io/utils/bookmarklets/dataurize-page.html). It was a quick exploration, with a vile idea to "back up" smaller documents into data URI bookmarks (I carry loads of useful stuff this way).
N.b. it:

- contains a funny const doctype = document.compatMode == 'CSS1Compat' ? '<!doctype html>': ''; statement that frankly feels unnecessary (EDIT: now I see a few comments above that it's possible to do the simpler new XMLSerializer().serializeToString(document.doctype); which feels more robust), and
- does not do any shadow DOM unwrapping magic.

So, for example, it works quite OK for this GH issue page, but it completely misses the main content on Caniuse (since it is a custom element).
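
For readers unfamiliar with the data URI approach, here is a minimal sketch of the last two steps (injecting a base tag and encoding the markup). It is not the linked bookmarklet's code: the name `toDataUri` is hypothetical, it encodes the whole string rather than partially, and it leaves out script stripping, which is more reliably done on the DOM before serialising.

```javascript
// Hypothetical helper, for illustration only.
function toDataUri(html, baseHref) {
  // Escape the attribute value minimally so it cannot break out of the quotes.
  const safeHref = baseHref.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  const baseTag = `<base href="${safeHref}">`;
  // Matches a <head> start tag (with or without attributes), but not <header>.
  const headStart = /<head(\s[^>]*)?>/i;
  // Insert the base tag right after <head> if present, otherwise prepend it.
  const withBase = headStart.test(html)
    ? html.replace(headStart, (match) => match + baseTag)
    : baseTag + html;
  return "data:text/html;charset=utf-8," + encodeURIComponent(withBase);
}
```

In a browser, the input would typically be the serialised page and `document.baseURI`, so that relative image and stylesheet URLs keep resolving against the original location.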

lukewarlow added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Apr 17, 2024

keithamus added the topic: shadow Relates to shadow trees (as defined in DOM) label Apr 17, 2024

lukewarlow changed the title ~~Proposal: GetHTML() for Document interface~~ Proposal: getHTML() for Document interface Apr 17, 2024
