Proposal: getHTML() for Document interface #10280

Open
lukewarlow opened this issue Apr 17, 2024 · 10 comments
Labels
addition/proposal (New features or enhancements)
needs implementer interest (Moving the issue forward requires implementers to express interest)
topic: shadow (Relates to shadow trees, as defined in DOM)

Comments

@lukewarlow
Member

What problem are you trying to solve?

There's currently no way to serialise a document object in a way that includes shadow roots.

What solutions exist today?

document.documentElement.outerHTML is the closest we have, but it won't serialize shadow roots, and it won't serialize the DOCTYPE node either.

How would you solve it?

Define a new document.getHTML() function that would return a string such as <!DOCTYPE html><html><head><title>Title</title></head><body><p>Contents</p></body></html>
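For comparison, here is a rough userland sketch of the shadow-root part of this, modelled loosely on how Element.getHTML() writes shadow roots as declarative <template shadowrootmode> elements. It is written against plain objects shaped like DOM nodes (nodeType, localName, attributes, childNodes, shadowRoot, data) so it can run outside a browser. The function names are hypothetical, and it ignores namespaces, <template> contents, and raw-text elements such as <script>.

```javascript
// Node type constants (same values as Node.ELEMENT_NODE etc. in the DOM).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Elements that never get an end tag in HTML (subset of the spec's list).
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Minimal escaping so text and attribute values re-parse unchanged.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/\u00a0/g, "&nbsp;")
    .replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s) {
  return s.replace(/&/g, "&amp;").replace(/\u00a0/g, "&nbsp;")
    .replace(/"/g, "&quot;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function serializeChildren(parent) {
  let out = "";
  for (const child of parent.childNodes) out += serializeNode(child);
  return out;
}

// Hypothetical serializer: like outerHTML, but each reachable shadow root
// is written as a declarative <template shadowrootmode="..."> placed as the
// host's first child. (Real getHTML() output may also carry attributes such
// as shadowrootserializable; they are omitted here.)
function serializeNode(node) {
  switch (node.nodeType) {
    case TEXT_NODE:
      return escapeText(node.data);
    case COMMENT_NODE:
      return `<!--${node.data}-->`;
    case ELEMENT_NODE: {
      const tag = node.localName;
      // Array.from works for both a NamedNodeMap and a plain array.
      const attrs = Array.from(node.attributes,
        (a) => ` ${a.name}="${escapeAttr(a.value)}"`).join("");
      if (VOID_ELEMENTS.has(tag)) return `<${tag}${attrs}>`;
      let inner = "";
      if (node.shadowRoot) {
        inner += `<template shadowrootmode="${node.shadowRoot.mode}">` +
          serializeChildren(node.shadowRoot) + "</template>";
      }
      inner += serializeChildren(node);
      return `<${tag}${attrs}>${inner}</${tag}>`;
    }
    default:
      return ""; // other node types are skipped in this sketch
  }
}
```

In a browser, element.shadowRoot only exposes open shadow roots, so a sketch like this cannot reach closed ones; a native document.getHTML() would not have that limitation if it accepted shadow roots explicitly, as Element.getHTML() does.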


@lukewarlow added the addition/proposal and needs implementer interest labels Apr 17, 2024
@keithamus added the topic: shadow label Apr 17, 2024
@keithamus
Contributor

/cc @mfreed7

@lukewarlow changed the title from "Proposal: GetHTML() for Document interface" to "Proposal: getHTML() for Document interface" Apr 17, 2024
@annevk
Member

annevk commented Apr 17, 2024

It seems reasonable, but do we have use cases for this? I think we mainly added Document.parseHTMLUnsafe() because there's a bunch of existing usage of DOMParser for custom sanitizer purposes, but it's not clear they also need to (or do) serialize at a document level.

@lukewarlow
Member Author

One case that I can think of is cobrowsing functionality, where the current page's DOM is serialised, sent across the wire, and reconstructed on the other side. Though in practice those situations might actually use something like a tree walker to serialise to their own format.
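A tree-walker approach like that might look something like the sketch below. The wire format (type/tag/attrs/children/shadow) is invented here purely for illustration, the function names are hypothetical, and the input is assumed to be duck-typed DOM nodes so the sketch runs outside a browser.

```javascript
// Node type constants (same values as Node.ELEMENT_NODE etc. in the DOM).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Hypothetical wire format for a cobrowsing tool: every node becomes a
// plain object that JSON.stringify can send. Shadow roots get their own
// "shadow" field, so they are not silently dropped as with outerHTML.
function toWireFormat(node) {
  switch (node.nodeType) {
    case ELEMENT_NODE: {
      const out = {
        type: "element",
        tag: node.localName,
        attrs: Object.fromEntries(Array.from(node.attributes, (a) => [a.name, a.value])),
        children: serializeList(node.childNodes),
      };
      if (node.shadowRoot) {
        out.shadow = {
          mode: node.shadowRoot.mode,
          children: serializeList(node.shadowRoot.childNodes),
        };
      }
      return out;
    }
    case TEXT_NODE:
      return { type: "text", data: node.data };
    case COMMENT_NODE:
      return { type: "comment", data: node.data };
    default:
      return null; // other node types are skipped in this sketch
  }
}

// Converts a child list, dropping skipped (null) entries.
function serializeList(nodes) {
  return Array.from(nodes, (n) => toWireFormat(n)).filter((n) => n !== null);
}
```

The receiving side would then rebuild nodes from these objects, for example calling attachShadow() on hosts that carry a shadow field. As with any script-based walker, only open shadow roots are reachable via shadowRoot.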

@lukewarlow
Member Author

It would also be a nice HTML equivalent of new XMLSerializer().serializeToString(document), I guess? Not sure how much that is used.

@mfreed7
Contributor

mfreed7 commented Apr 23, 2024

If there's a use case, this is relatively trivial to implement, so I'm supportive.

@lukewarlow
Member Author

lukewarlow commented Apr 23, 2024

@domenic you mentioned you might have a use case for this in jsdom. I know jsdom isn't actually using the platform (it's recreating it), but the use case might transfer?

Edit: Here's the method https://github.com/jsdom/jsdom?tab=readme-ov-file#serializing-the-document-with-serialize

@FND

FND commented May 21, 2024

In terms of use cases, people like myself who create/use local-only applications (AKA self-saving HTML documents) might need to reconstruct the current document's HTML in order to persist changed state. To that end, I typically use something like the following:

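// document.doctype is null when the page has no DOCTYPE, and serializeToString(null) throws
let doctype = document.doctype ? new XMLSerializer().serializeToString(document.doctype) : "";
let html = [doctype, document.documentElement.outerHTML].join("\n");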

However, I typically do not want to persist whatever might have happened within the shadow DOM, as all relevant state should reside within the light DOM (think <li>s representing tasks, each of which might turn into an interactive widget at runtime; persisting the latter's interaction state would only complicate things, plus core content remains accessible even if JavaScript is unavailable for whatever reason).

Happy to elaborate if this is deemed relevant or interesting, though I might be slow to respond for the next week or so.

@lukewarlow
Member Author

lukewarlow commented May 22, 2024

This function in Puppeteer, https://pptr.dev/api/puppeteer.page.content, is effectively doing what getHTML() would do.

It uses the XML serialiser for the doctype plus documentElement.outerHTML, so swapping it to use this new getHTML() function would be easy; Puppeteer could then be used for server-rendering web components, as one example use case. See https://github.com/puppeteer/puppeteer/blob/97a4951d52b95b4815db989d30e82a00f5dc3d2b/packages/puppeteer-core/src/api/Frame.ts#L719

One key bit that would be needed to swap that for this new document.getHTML() is that comment nodes outside the html element would need to be serialised too, like we would do for the doctype.
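Concretely, that means walking document.childNodes rather than starting at documentElement. Below is a minimal sketch with a hypothetical serializeDocument() name and duck-typed nodes. Unlike the XMLSerializer route, it writes the doctype the way HTML serialization does (name only), and it keeps comments that sit before or after <html>. The element serializer is injectable, e.g. el => el.outerHTML or a shadow-root-aware function.

```javascript
// Node type constants (same values as Node.ELEMENT_NODE etc. in the DOM).
const ELEMENT_NODE = 1;
const COMMENT_NODE = 8;
const DOCUMENT_TYPE_NODE = 10;

// Hypothetical userland stand-in for document.getHTML(): visits the
// document's own children in order, so the doctype and any top-level
// comments are kept alongside the document element.
function serializeDocument(doc, serializeElement = (el) => el.outerHTML) {
  let out = "";
  for (const node of doc.childNodes) {
    switch (node.nodeType) {
      case DOCUMENT_TYPE_NODE:
        // HTML serialization writes only the doctype's name.
        out += `<!DOCTYPE ${node.name}>`;
        break;
      case COMMENT_NODE:
        out += `<!--${node.data}-->`;
        break;
      case ELEMENT_NODE:
        out += serializeElement(node);
        break;
      default:
        break; // other node types are skipped in this sketch
    }
  }
  return out;
}
```

In a browser, serializeDocument(document) should then match today's doctype-plus-outerHTML approach for pages without top-level comments or legacy doctype identifiers, while also picking up those comments.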

@argyleink
Contributor

VisBug, a more designer-centric version of what Firebug was, has had a long-outstanding feature request to save changes made via the extension. We've tried things like taking outerHTML and posting it to Netlify (for example), but the result was lossy with shadow DOM and a few other things. Users would also like a way to save the changes to a local file, which they could then send around to share their modifications with the team. I could see VisBug using this API to aid in saving page modifications.

@myfonj

myfonj commented May 22, 2024

Archiving tools

For example, SingleFile has to do quite a bit of dark magic to preserve the contents of the shadow DOM, and I guess they would be happy to switch to a native API instead.


By almost telepathic coincidence, I recently tried to make a "poor-man's SingleFile static snapshooter" as a quick exploration: a simple bookmarklet that builds a data URI of the displayed page by stripping all scripts, injecting a base tag (so images, styles, … can still load), and (partially) URI-encoding the documentElement.outerHTML (https://myfonj.github.io/utils/bookmarklets/dataurize-page.html). The vile idea is to "back up" smaller documents as data URI bookmarks (I carry loads of useful stuff this way).
N.b. it:

  1. contains a funny const doctype = document.compatMode == 'CSS1Compat' ? '<!doctype html>': ''; statement that frankly feels unnecessary (EDIT: I now see from a few comments above that the simpler new XMLSerializer().serializeToString(document.doctype) is possible, which feels more robust), and
  2. does not do any shadow DOM unwrapping magic.

So it works quite OK for this GH issue page, for example, but it completely misses the main content on Caniuse (since it is a custom element).
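For the doctype part, here is a minimal sketch of what XMLSerializer produces for a DocumentType node (name plus any public/system identifiers, following my reading of the DOM Parsing spec), combined with a hypothetical data-URI wrapper. Unlike the compatMode check, which maps both no-quirks and limited-quirks pages to <!doctype html>, this keeps legacy identifiers, which can affect the rendering mode. Both function names are illustrative only.

```javascript
// Sketch of the XML serialization of a DocumentType node, as I read the
// DOM Parsing spec. `doctype` is duck-typed ({ name, publicId, systemId })
// or null, which is what document.doctype returns when there is none.
function serializeDoctype(doctype) {
  if (!doctype) return "";
  let out = `<!DOCTYPE ${doctype.name}`;
  if (doctype.publicId) out += ` PUBLIC "${doctype.publicId}"`;
  if (doctype.systemId && !doctype.publicId) out += " SYSTEM";
  if (doctype.systemId) out += ` "${doctype.systemId}"`;
  return out + ">";
}

// Hypothetical helper: wraps a serialized page in a data: URI.
// encodeURIComponent encodes everything outside its unreserved set; a
// bookmarklet could encode fewer characters to keep the URI shorter.
function toHtmlDataURI(doctype, outerHTML) {
  const html = serializeDoctype(doctype) + outerHTML;
  return "data:text/html;charset=utf-8," + encodeURIComponent(html);
}
```

In a bookmarklet this would be called as toHtmlDataURI(document.doctype, document.documentElement.outerHTML), after scripts have been stripped. Shadow DOM content is still missing, which is exactly the gap a native document.getHTML() would close.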

Development

No branches or pull requests

7 participants