Allowing serialized access to custom element implementation DOM for web archival benefits #657

Closed
ernsheong opened this issue Aug 23, 2017 · 6 comments

Comments

@ernsheong

ernsheong commented Aug 23, 2017

I come from the perspective of wanting to archive web pages that are built out of custom elements, from the browser extension itself (as part of my efforts for https://pagedash.com). This might also benefit traditional server-side web archivers such as the Internet Archive, as well as efforts to enable server-side rendering for web apps built with custom elements.

As it currently stands, with Shadow DOM an element's actual HTML implementation is hidden from the parent (innerHTML returns "", and outerHTML returns only the element itself). This is problematic for my purpose and also for SEO renderers (even Sam Li's http://github.com/GoogleChrome/rendertron requires forcing the Shady DOM polyfill to get SSR to work).

We are totally fine on this front if web components contain largely static and self-contained data, but the reality is that most web components need "hydration" via a dynamic AJAX call, etc.

So I am proposing a way of exposing the custom element's DOM state in a serialized fashion so that SSRs and other tools can get a hold of the DOM and serve it "as is" without a hydration step. Ideally it can be exposed without compromising the encapsulation that guarantees that the parent DOM cannot meddle with the custom element's implementation.

@domenic
Collaborator

domenic commented Aug 23, 2017

There is already a good way of serializing: serialize a script element that sets up a shadow root.

@ernsheong
Author

ernsheong commented Aug 23, 2017

Can you provide an example of what you mean?

Here's a scenario:

<parent-element>
  // Shadow DOM
  <child-element>
    // Shadow DOM
    <foo-bar>...</foo-bar>
  </child-element> 
</parent-element>

So then <parent-element>.shadowRoot.innerHTML returns "<child-element></child-element>", but there is no way to know that <foo-bar></foo-bar> is there within the child's Shadow DOM.

@domenic
Collaborator

domenic commented Aug 23, 2017

Here's a serialization of that:

<parent-element>
</parent-element>
<script>
"use strict";

const shadowRoot = document.currentScript.previousElementSibling.attachShadow({ mode: "open" });
shadowRoot.innerHTML = `<child-element></child-element>`;

const shadowRoot2 = shadowRoot.firstElementChild.attachShadow({ mode: "open" });
shadowRoot2.innerHTML = `<foo-bar>...</foo-bar>`;
</script>

@ernsheong
Author

Thanks for the example 👍

This works, if:

  1. I own the page and know what goes into it all the way to the child leaf node.
  2. The APIs that the custom element depends on (and draws data from) are still alive (problematic for a web archive).

As we know, the initial state of a web app could be an empty index.html shell (Initial State), which is then followed by JS loading, AJAX calls, and injection of more DOM elements into the shell (-> Rendered State). Without custom elements, it is trivial to capture the entire DOM in the Rendered State: document.documentElement.outerHTML gives me everything.

So my goal is really to capture the Rendered State of the HTML DOM as it is. Neither of the two conditions above holds: I don't own the page, and the APIs that the elements depend on may not exist anymore. Without custom elements this is not an issue, as illustrated above: I can still reproduce the Rendered State regardless of API availability. With custom elements, it seems like there isn't a way to get the entire Rendered State of the page. Hope I make sense.

Is there a way to capture the Rendered State of a Custom Elements page all the way down to the child leaf node?

@matthewp

You can get the entire rendered state of the page by traversing all of the .shadowRoot properties and building up your HTML string that way.
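
A minimal sketch of that traversal, for illustration only. The function names and the `data-shadow-root` marker attribute are hypothetical (nothing in this thread defines an output format), and the sketch skips comments, doctypes, and the special text rules for `<script>` and `<style>`. It also only sees open shadow roots, since `.shadowRoot` is `null` for roots attached with `mode: "closed"`.

```javascript
// Node type constants as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// HTML elements that are serialized without a closing tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s) {
  return s.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Serialize every child of an element or shadow root.
function serializeChildren(parent) {
  let out = "";
  for (const child of Array.from(parent.childNodes)) {
    out += serializeWithShadow(child);
  }
  return out;
}

// Serialize a node, descending into any open shadow root it hosts.
function serializeWithShadow(node) {
  if (node.nodeType === TEXT_NODE) return escapeText(node.data);
  if (node.nodeType !== ELEMENT_NODE) return ""; // simplification: skip other node types

  const name = node.localName;
  let attrs = "";
  for (const attr of Array.from(node.attributes || [])) {
    attrs += ` ${attr.name}="${escapeAttr(attr.value)}"`;
  }
  if (VOID_ELEMENTS.has(name)) return `<${name}${attrs}>`;

  let inner = "";
  if (node.shadowRoot) {
    // Hypothetical marker: wrap shadow content so it stays distinguishable
    // from the host's light DOM children in the output string.
    inner += `<template data-shadow-root>${serializeChildren(node.shadowRoot)}</template>`;
  }
  inner += serializeChildren(node);
  return `<${name}${attrs}>${inner}</${name}>`;
}
```

In a browser, `serializeWithShadow(document.documentElement)` would return a string that includes the contents of each open shadow root next to its host's light DOM children. Turning that string back into live shadow roots is a separate step, for example with a script like the one domenic shows above.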

@ernsheong
Author

Wow, thanks. Yes, that is possible. Thank you all for your quick responses 😃😃😃
