Expose document fragment serialization algorithm #965

LeaVerou · 2021-04-01T18:22:36Z

This came up during TAG review of Sanitizer API.

They have a sanitize() method that returns a DocumentFragment and a sanitizeToString() which merely calls sanitize() and then serializes the fragment.

Instead of adding ad hoc methods in various Web Platform APIs to invoke the HTML fragment serialization algorithm, there should be a way to serialize document fragments directly. I imagine there are reasons not to add innerHTML to document fragments, otherwise it would already be there. Perhaps a new getter, or a method that also accepts an optional context type?

The text was updated successfully, but these errors were encountered:

domenic · 2021-04-01T18:28:22Z

Hmm I transferred this to whatwg/dom but maybe it really belongs in https://w3c.github.io/DOM-Parsing/, at least until we finish merging that spec into HTML. Sorry about that.

As you say, I have the feeling this has been previously discussed, and maybe there were web compatibility reasons why it hasn't been added. We added innerHTML to ShadowRoot after all; why did we do that instead of adding it to its base class DocumentFragment?

In w3c/DOM-Parsing#21 there's some discussion of the problem of the context element, but I believe that only impacts the setter, not the getter. Maybe we could add a getter to DocumentFragment? Or maybe it was just compat issues...

mozfreddyb · 2021-04-06T11:18:17Z

When building the prototype Sanitizer API in Gecko, I have actually thought the same, but didn't think of (dare thinking of?) suggesting this for wider implementation.

I even used this as a short-cut in Gecko. We currently have an internal, C++-only getter for innerHTML that we could easily expose to the web (added in bug 1653232).

I'm a bit unsure about the compat bit and how to resolve it though..

Jamesernator · 2021-04-06T22:57:28Z

Probably for the same reason as DocumentFragment, you can't serialize a document either (except via XMLSerializer but this forces xhtml rather than html).

The current proposal for declarative shadow dom contains a .getInnerHTML(...) function, so even if .innerHTML on DocumentFragment/Document is a web-compat issue, that could potentially be used instead.

rniwa · 2021-04-06T23:59:53Z

See the following public-webapps threads:

This wasn't really prevented / motivated by the compatibility concern. It was because innerHTML on template and ShadowRoot was conceptualized as a way to be able to support setter. With DocumentFragment which is not ShadowRoot, we don't know the context element so we need to decide which parsing mode to use. It would be indeed weird & controversial to introduce innerHTML that only supports getting but not setting since that could be a compatibility concern especially if we decide to throw an exception in the setter.

annevk · 2021-04-12T08:00:19Z

Yeah, I think this is a problem with the sanitizer API. The fragment parser takes a string and a context element. Both are significant.

LeaVerou · 2021-05-12T15:58:32Z

Yeah, I think this is a problem with the sanitizer API. The fragment parser takes a string and a context element. Both are significant.

I'm not sure how to parse this comment. Are you saying the serialization algorithm shouldn't be exposed because it's Sanitizer API's problem, or that it shouldn't accept a context element as a parameter? Or something else?

annevk · 2021-05-12T16:21:26Z

I think (willing to be convinced otherwise) HTML string sanitization needs to account for the same input parameters as HTML string parsing, i.e., the string and the context element. Otherwise it might be tricky to insert into <tr> or some such.

LeaVerou · 2021-05-12T17:17:22Z

I think (willing to be convinced otherwise) HTML string sanitization needs to account for the same input parameters as HTML string parsing, i.e., the string and the context element. Otherwise it might be tricky to insert into <tr> or some such.

I'm still confused. Are you saying the algorithm shouldn't be exposed on the DOM side, or making a comment to inform the API shape?

annevk · 2021-05-12T17:22:54Z

API shape / model.

annevk · 2021-05-18T05:30:00Z

The sanitize API question moved (back) to WICG/sanitizer-api#42.

With regards to OP, having considered the sanitization API more carefully, I don't think we should expose the serialization of a DocumentFragment directly (other than through innerHTML) as there is no context-free parse function that would give you back the DocumentFragment you serialized. The exception here is XML, but I don't think we should offer an API that ends up being problematic for all HTML users (which will be the majority).

(As a simple example, consider obj.innerHTML = "<td>test<td>test". If obj is an HTML body element, it will contain a Text node "testtest". If it's an HTML tr element, it will end up containing what you expect.)

So I'll leave this open for a bit to see if there is a compelling counterargument, but I'm inclined to close this given the above.

Jamesernator · 2021-05-18T05:48:33Z

So I'll leave this open for a bit to see if there is a compelling counterargument, but I'm inclined to close this given the above.

It might be a separate issue, but could we get a way to serialize a whole document? (other than XMLSerializer which forces XHTML even if the document is HTML).

Serializing a document doesn't suffer from the context-specific deserialization/serialization issue so it should be fine to expose.

annevk · 2021-05-18T06:29:16Z

Yeah, that seems like a reasonable request to me as an issue against either DOM Parsing or the HTML Standard. Ideally it would come with some use cases. (Browsers are already forced to implement it for XMLHttpRequest come to think of it, which should make it pretty cheap to add.)

LeaVerou · 2021-05-18T10:26:23Z

@annevk This argues that the serialization function needs to accept a (mandatory?) context parameter, not that there shouldn't be a serialization function at all, no?

annevk · 2021-05-18T10:31:56Z

@LeaVerou the serialization doesn't depend on the context, but the parsing of it does. Given that mismatch, offering context-free serialization (i.e., documentFragmentInstance.something rather than someElement.innerHTML) will likely lead to subtle problems in practice. (As you know there's no way we can annotate the string and say that it can be only be handed to innerHTML successfully when used on a tr element or some such.)

LeaVerou · 2021-05-18T10:50:25Z

Does this mean the sanitizer API should not be offering methods that only accept the string to sanitize as input?

annevk · 2021-05-18T10:54:32Z

Yeah, I discussed that with @mozfreddyb yesterday and that's why he reopened WICG/sanitizer-api#42 for further discussion.

domenic transferred this issue from whatwg/html Apr 1, 2021

LeaVerou mentioned this issue May 12, 2021

Early design review: Sanitizer API w3ctag/design-reviews#619

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose document fragment serialization algorithm #965

Expose document fragment serialization algorithm #965

LeaVerou commented Apr 1, 2021

domenic commented Apr 1, 2021

mozfreddyb commented Apr 6, 2021

Jamesernator commented Apr 6, 2021

rniwa commented Apr 6, 2021

annevk commented Apr 12, 2021 •

edited

Loading

LeaVerou commented May 12, 2021

annevk commented May 12, 2021

LeaVerou commented May 12, 2021

annevk commented May 12, 2021

annevk commented May 18, 2021

Jamesernator commented May 18, 2021

annevk commented May 18, 2021

LeaVerou commented May 18, 2021

annevk commented May 18, 2021

LeaVerou commented May 18, 2021

annevk commented May 18, 2021

Expose document fragment serialization algorithm #965

Expose document fragment serialization algorithm #965

Comments

LeaVerou commented Apr 1, 2021

domenic commented Apr 1, 2021

mozfreddyb commented Apr 6, 2021

Jamesernator commented Apr 6, 2021

rniwa commented Apr 6, 2021

annevk commented Apr 12, 2021 • edited Loading

LeaVerou commented May 12, 2021

annevk commented May 12, 2021

LeaVerou commented May 12, 2021

annevk commented May 12, 2021

annevk commented May 18, 2021

Jamesernator commented May 18, 2021

annevk commented May 18, 2021

LeaVerou commented May 18, 2021

annevk commented May 18, 2021

LeaVerou commented May 18, 2021

annevk commented May 18, 2021

annevk commented Apr 12, 2021 •

edited

Loading