Proposal: getHTML() for Document interface #10280
Comments
/cc @mfreed7 |
It seems reasonable, but do we have use cases for this? I think we mainly added |
One case that I can think of is cobrowsing functionality, where the current page's DOM is serialised, sent across the wire, and reconstructed on the other side. In practice, though, those situations might actually use something like a tree walker to serialise to their own format. |
It would be a nice HTML equivalent of |
If there's a use case, this is relatively trivial to implement, so I'm supportive. |
@domenic you mentioned you might have a use case for this in jsdom. I know jsdom isn't actually using the platform, it's recreating it, but the use case might transfer? Edit: Here's the method https://github.com/jsdom/jsdom?tab=readme-ov-file#serializing-the-document-with-serialize |
In terms of use cases, people like myself who create/use local-only applications (AKA self-saving HTML documents) might need to reconstruct the current document's HTML in order to persist changed state. To that end, I typically use something like the following:
let doctype = new XMLSerializer().serializeToString(document.doctype);
let html = [doctype, document.documentElement.outerHTML].join("\n");
However, I typically do not want to persist whatever might have happened within the shadow DOM, as all relevant state should reside within the light DOM (think
Happy to elaborate if this is deemed relevant or interesting - though I might be slow to respond for the next week or so. |
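For readers who want to try the light-DOM-only approach from the comment above without a browser, here is a minimal sketch. The helper names `serializeDoctype` and `serializeLightDom` are hypothetical, and the doctype formatting is an assumption that mirrors what XMLSerializer typically emits for a DocumentType node. The functions only read `doctype.name`, `doctype.publicId`, `doctype.systemId` and `documentElement.outerHTML`, so they accept a real `document` or a plain object with the same shape.

```javascript
// Hypothetical helper: format a doctype from its name, publicId and systemId.
// Assumption: this matches XMLSerializer output for the common cases.
function serializeDoctype(doctype) {
  if (!doctype) return "";
  let out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
    if (doctype.systemId) out += ' "' + doctype.systemId + '"';
  } else if (doctype.systemId) {
    out += ' SYSTEM "' + doctype.systemId + '"';
  }
  return out + ">";
}

// Hypothetical helper: doctype plus the root element's outerHTML.
// outerHTML never includes shadow roots, so only light DOM is persisted.
function serializeLightDom(doc) {
  const parts = [serializeDoctype(doc.doctype), doc.documentElement.outerHTML];
  // Drop the empty doctype string for documents that have no doctype.
  return parts.filter((part) => part !== "").join("\n");
}
```

In a page, `serializeLightDom(document)` would be the call to persist; any state held only inside shadow roots is deliberately left out.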
The page.content() function in Puppeteer (https://pptr.dev/api/puppeteer.page.content) is effectively doing what getHTML() would do. It uses the XML serialiser for the doctype plus documentElement.outerHTML, so swapping it to use this new getHTML() function would be easy, and Puppeteer could then be used for server-rendering web components, as one example use case. See https://github.com/puppeteer/puppeteer/blob/97a4951d52b95b4815db989d30e82a00f5dc3d2b/packages/puppeteer-core/src/api/Frame.ts#L719 One key bit needed to swap that for this new document.getHTML() is that comment nodes outside the html element would need to be serialised too, as we would do for the doctype. |
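The point about comment nodes outside the html element can be illustrated with a small sketch that walks the document's direct children rather than starting at documentElement. The function name `serializeTopLevel` is hypothetical, the doctype is written in its short `<!DOCTYPE name>` form as a simplifying assumption, and shadow roots are still not covered because the sketch relies on outerHTML.

```javascript
// Standard DOM nodeType values.
const ELEMENT_NODE = 1;
const COMMENT_NODE = 8;
const DOCUMENT_TYPE_NODE = 10;

// Hypothetical helper: serialize every direct child of the document,
// so comments before or after <html> are kept alongside the doctype.
function serializeTopLevel(doc) {
  let out = "";
  for (const node of Array.from(doc.childNodes)) {
    switch (node.nodeType) {
      case DOCUMENT_TYPE_NODE:
        out += "<!DOCTYPE " + node.name + ">";
        break;
      case COMMENT_NODE:
        out += "<!--" + node.data + "-->";
        break;
      case ELEMENT_NODE:
        out += node.outerHTML;
        break;
      // Other node types are not expected as document children here.
    }
  }
  return out;
}
```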
VisBug, a more designer-centric version of what Firebug was, has had a long-outstanding feature request to save changes made via the extension. We've attempted things like outerHTML and then posting it to Netlify (for example), but things were lossy with shadow DOM and a few other things. Users would also like a way to save the changes to a local file, which they could then send around to share their modifications with the team. I could see VisBug using this API to aid in saving page modifications. |
Archiving tools
For example, SingleFile has to do quite a bit of dark magic to preserve the contents of the shadow DOM, and I guess they would be happy to switch to some native API instead. By almost telepathic coincidence, I recently tried to make a "poor-man's SingleFile static snapshooter": a simple bookmarklet that constructs a data URI of the displayed page, stripping it of all scripts, injecting a base tag (for eventually loading images, styles, …), and (partially) URI-encoding the
So, for example, it works quite OK for this GH issue page, but it completely misses the main content on Caniuse (since that is a custom element). |
What problem are you trying to solve?
There's currently no way to serialise a document object in a way that includes shadow roots.
What solutions exist today?
document.documentElement.outerHTML is the closest we have, but it won't serialize shadow roots, and it won't serialize the DOCTYPE node either.
How would you solve it?
Define a new document.getHTML() function that would return a string such as
<!DOCTYPE html><html><head><title>Title</title></head><body><p>Contents</p></body></html>
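Until something like document.getHTML() exists, a rough approximation can be sketched on top of the element-level getHTML() method, which serializes an element's contents and can include shadow roots marked as serializable. The function name `getDocumentHTML` and the helper `escapeAttr` are hypothetical, the attribute escaping is deliberately simplified, and top-level comments are ignored. It is an illustration of the proposed output shape, not the proposed algorithm.

```javascript
// Simplified attribute escaping (assumption: enough for illustration only).
function escapeAttr(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/"/g, "&quot;");
}

// Hypothetical approximation of the proposed document.getHTML().
// Element.getHTML() returns the element's inner serialization, so the
// root element's start and end tags are rebuilt by hand.
function getDocumentHTML(doc, options = { serializableShadowRoots: true }) {
  const root = doc.documentElement;
  let out = doc.doctype ? "<!DOCTYPE " + doc.doctype.name + ">" : "";
  out += "<" + root.localName;
  for (const attr of Array.from(root.attributes)) {
    out += " " + attr.name + '="' + escapeAttr(attr.value) + '"';
  }
  out += ">" + root.getHTML(options) + "</" + root.localName + ">";
  return out;
}
```

In a browser that supports Element.getHTML(), `getDocumentHTML(document)` would yield a string of the shape shown above, with declarative shadow DOM templates for any serializable shadow roots.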
Anything else?
No response