Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose document fragment serialization algorithm #965

Open
LeaVerou opened this issue Apr 1, 2021 · 16 comments
Open

Expose document fragment serialization algorithm #965

LeaVerou opened this issue Apr 1, 2021 · 16 comments

Comments

@LeaVerou
Copy link

LeaVerou commented Apr 1, 2021

This came up during TAG review of Sanitizer API.

They have a sanitize() method that returns a DocumentFragment and a sanitizeToString() which merely calls sanitize() and then serializes the fragment.

Instead of adding ad hoc methods in various Web Platform APIs to invoke the HTML fragment serialization algorithm, there should be a way to serialize document fragments directly. I imagine there are reasons not to add innerHTML to document fragments, otherwise it would already be there. Perhaps a new getter, or a method that also accepts an optional context type?

@domenic domenic transferred this issue from whatwg/html Apr 1, 2021
@domenic
Copy link
Member

domenic commented Apr 1, 2021

Hmm I transferred this to whatwg/dom but maybe it really belongs in https://w3c.github.io/DOM-Parsing/, at least until we finish merging that spec into HTML. Sorry about that.

As you say, I have the feeling this has been previously discussed, and maybe there were web compatibility reasons why it hasn't been added. We added innerHTML to ShadowRoot after all; why did we do that instead of adding it to its base class DocumentFragment?

In w3c/DOM-Parsing#21 there's some discussion of the problem of the context element, but I believe that only impacts the setter, not the getter. Maybe we could add a getter to DocumentFragment? Or maybe it was just compat issues...

@mozfreddyb
Copy link
Contributor

When building the prototype Sanitizer API in Gecko, I have actually thought the same, but didn't think of (dare thinking of?) suggesting this for wider implementation.

I even used this as a short-cut in Gecko. We currently have an internal, C++-only getter for innerHTML that we could easily expose to the web (added in bug 1653232).

I'm a bit unsure about the compat bit and how to resolve it though..

@Jamesernator
Copy link

Probably for the same reason as DocumentFragment, you can't serialize a document either (except via XMLSerializer but this forces xhtml rather than html).

The current proposal for declarative shadow dom contains a .getInnerHTML(...) function, so even if .innerHTML on DocumentFragment/Document is a web-compat issue, that could potentially be used instead.

@rniwa
Copy link
Collaborator

rniwa commented Apr 6, 2021

See the following public-webapps threads:

This wasn't really prevented / motivated by the compatibility concern. It was because innerHTML on template and ShadowRoot was conceptualized as a way to be able to support setter. With DocumentFragment which is not ShadowRoot, we don't know the context element so we need to decide which parsing mode to use. It would be indeed weird & controversial to introduce innerHTML that only supports getting but not setting since that could be a compatibility concern especially if we decide to throw an exception in the setter.

@annevk
Copy link
Member

annevk commented Apr 12, 2021

Yeah, I think this is a problem with the sanitizer API. The fragment parser takes a string and a context element. Both are significant.

@LeaVerou
Copy link
Author

Yeah, I think this is a problem with the sanitizer API. The fragment parser takes a string and a context element. Both are significant.

I'm not sure how to parse this comment. Are you saying the serialization algorithm shouldn't be exposed because it's Sanitizer API's problem, or that it shouldn't accept a context element as a parameter? Or something else?

@annevk
Copy link
Member

annevk commented May 12, 2021

I think (willing to be convinced otherwise) HTML string sanitization needs to account for the same input parameters as HTML string parsing, i.e., the string and the context element. Otherwise it might be tricky to insert into <tr> or some such.

@LeaVerou
Copy link
Author

I think (willing to be convinced otherwise) HTML string sanitization needs to account for the same input parameters as HTML string parsing, i.e., the string and the context element. Otherwise it might be tricky to insert into <tr> or some such.

I'm still confused. Are you saying the algorithm shouldn't be exposed on the DOM side, or making a comment to inform the API shape?

@annevk
Copy link
Member

annevk commented May 12, 2021

API shape / model.

@annevk
Copy link
Member

annevk commented May 18, 2021

The sanitize API question moved (back) to WICG/sanitizer-api#42.

With regards to OP, having considered the sanitization API more carefully, I don't think we should expose the serialization of a DocumentFragment directly (other than through innerHTML) as there is no context-free parse function that would give you back the DocumentFragment you serialized. The exception here is XML, but I don't think we should offer an API that ends up being problematic for all HTML users (which will be the majority).

(As a simple example, consider obj.innerHTML = "<td>test<td>test". If obj is an HTML body element, it will contain a Text node "testtest". If it's an HTML tr element, it will end up containing what you expect.)

So I'll leave this open for a bit to see if there is a compelling counterargument, but I'm inclined to close this given the above.

@Jamesernator
Copy link

So I'll leave this open for a bit to see if there is a compelling counterargument, but I'm inclined to close this given the above.

It might be a separate issue, but could we get a way to serialize a whole document? (other than XMLSerializer which forces XHTML even if the document is HTML).

Serializing a document doesn't suffer from the context-specific deserialization/serialization issue so it should be fine to expose.

@annevk
Copy link
Member

annevk commented May 18, 2021

Yeah, that seems like a reasonable request to me as an issue against either DOM Parsing or the HTML Standard. Ideally it would come with some use cases. (Browsers are already forced to implement it for XMLHttpRequest come to think of it, which should make it pretty cheap to add.)

@LeaVerou
Copy link
Author

@annevk This argues that the serialization function needs to accept a (mandatory?) context parameter, not that there shouldn't be a serialization function at all, no?

@annevk
Copy link
Member

annevk commented May 18, 2021

@LeaVerou the serialization doesn't depend on the context, but the parsing of it does. Given that mismatch, offering context-free serialization (i.e., documentFragmentInstance.something rather than someElement.innerHTML) will likely lead to subtle problems in practice. (As you know there's no way we can annotate the string and say that it can be only be handed to innerHTML successfully when used on a tr element or some such.)

@LeaVerou
Copy link
Author

Does this mean the sanitizer API should not be offering methods that only accept the string to sanitize as input?

@annevk
Copy link
Member

annevk commented May 18, 2021

Yeah, I discussed that with @mozfreddyb yesterday and that's why he reopened WICG/sanitizer-api#42 for further discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

6 participants