Revamp about:blank and about:srcdoc iframe/popup base URL inheritance #421

Closed
foolip opened this issue Dec 18, 2015 · 19 comments · Fixed by #9464
Labels
interop (Implementations are not interoperable with each other), normative change, topic: browsing context, topic: navigation

Comments

@foolip
Member

foolip commented Dec 18, 2015

Update as of 2023-06-01 by @domenic: this issue has expanded to cover general base URL inheritance for about:blank iframes/popups, and about:srcdoc iframes. See #421 (comment) for the current proposal for how to update the spec and achieve interop.


Original 2015 post contents by @foolip:

This is about a Blink bug filed by @bzbarsky

For an iframe with no src, Edge uses the parent document's URL, while Firefox uses the parent document's base URL:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3791

It looks like the base URL is frozen in Edge, but can be affected by a later base URL change of the parent document in Firefox:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3792

In both cases, Blink uses "about:blank" as the base URL, but will actually look at the parent document's base URL while trying to resolve URLs in the iframe document.

Everyone seems to agree about the simpler case with a src attribute, that the iframe's URL is also its base URL:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/3793

This does not appear to be handled in https://html.spec.whatwg.org/#creating-a-new-browsing-context

@bzbarsky, can you describe what the model for this in Gecko is? Is it specific steps for iframe insertion, or does any of it also apply to frame elements, object elements, createDocument() or createHTMLDocument()?

@foolip
Member Author

foolip commented Dec 18, 2015

Looking closer, I can't find any connection at all between HTML's various base URL concepts and DOM's base URL. The string "concept-document-base-url" appears nowhere in the generated HTML source, and the only link with exactly the text "base URL" is in script settings for browsing contexts but in fact refers to HTML's document base URL concept like all the rest.

@annevk, is this all entirely broken? How many concepts are there supposed to be, and how do they fit together?

@Ms2ger
Member

Ms2ger commented Dec 18, 2015

The DOM definition doesn't seem quite right; I'd think we'd always want HTML's "document base URL" (except for resolving the base element's href attribute).

@annevk
Member

annevk commented Dec 18, 2015

My idea was that the DOM defines a base URL concept for documents and HTML takes care of setting it as appropriate. None of this is quite done.

@foolip
Member Author

foolip commented Dec 18, 2015

On the Blink bug, @bzbarsky said "My (and Firefox's) definition is the one that's in the spec, and the one that happens to make sense: .baseURI returns the thing that's used as the base URI."

Although I can't make sense of the specs, I agree that it would make a lot of sense if what document.baseURI returns is actually the base URL that's used to resolve URLs in that document. I also think it would be great if the base URL is an internal slot like in DOM, and not a computed value like in HTML. Then, script could do the exact equivalent by using new URL(relativeURL, document.baseURI). I'm not sure if fallback base URL makes that impossible.
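As a minimal sketch of the `new URL(relativeURL, document.baseURI)` idea, the following resolves a few relative references against a base. The base string and the `resolve` helper are illustrative assumptions; in a page the base would come from `document.baseURI`.

```javascript
// Hypothetical base; in a page this string would come from document.baseURI.
const base = "https://example.com/dir/page.html";

// Resolve a relative reference the way script can, via the URL constructor.
function resolve(relativeURL, baseURI) {
  return new URL(relativeURL, baseURI).href;
}

const sibling = resolve("images/logo.png", base); // relative to /dir/
const rooted = resolve("/top.html", base);        // relative to the origin
const upOne = resolve("../other.html", base);     // one directory up

// "about:blank" cannot act as a base for relative references, so the
// constructor throws. That is why an inherited base URL matters for
// about:blank and about:srcdoc documents.
let aboutBlankWorksAsBase = true;
try {
  new URL("images/logo.png", "about:blank");
} catch (error) {
  aboutBlankWorksAsBase = false;
}
```

If `document.baseURI` reported exactly the base used internally, this kind of script-side resolution would always agree with how the document itself resolves URLs.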

If somebody has a detailed model in mind for this, it would be interesting to see in which ways Blink differs.

@bzbarsky
Contributor

bzbarsky commented Jan 7, 2016

@foolip So on the HTML spec end, I'm not sure what this issue is even about. The "should inherit" behavior is already covered by the "fallback base URL" as defined at https://html.spec.whatwg.org/#fallback-base-url and the corresponding use in https://html.spec.whatwg.org/#document-base-url

Maybe the question is whether this is the right way to spec it, since it doesn't actually match any implementations?

> @bzbarsky, can you describe what the model for this in Gecko is?

Sure. Each document has a "baseURL" field, which may be null. Each document has a concept of "base URL", which is computed as follows: if the document is a srcdoc document and has a parent document (long story about why it might not have the latter, involving references to documents that outlive their browsing context), return the "base URL" of the parent document. Otherwise if the "baseURL" field is not null, return that. Otherwise return the "document's address" in HTML spec terms.

The "baseURL" field is set by several things; this may not be quite right because I'm trying to exclude codepaths that are only accessible to extensions, sorry:

  1. Cloning a document via cloneNode will copy the "baseURL" field.
  2. Adding/removing <base> elements or href attributes on them will set the field, possibly to null (because <base> elements assume they're the only ones who set the field, even though that's totally not the case).
  3. Some document-creation codepaths take an explicit value to store in the field; I haven't looked at them very carefully. Mostly relevant for stuff created via DOMImplementation and I can look up the details if they really matter.
  4. XMLHttpRequest responseXML documents will get the "baseURL" set to the final post-redirect URL, afaict.
  5. Initial about:blank gets the base URL of the frameElement (in the .baseURI sense) of the window it's in as its "baseURL", as far as I can tell. But note that Gecko's initial about:blank is not quite the same thing as the spec's at the moment.
  6. Documents loaded from various about: URIs, including non-initial about:blank (which is actually the spec's initial about:blank in some cases) get a "baseURL" in the following way: when initially converting the string to a Gecko URL object, we record the base URL the string was "resolved" relative to at the time (in quotes, because it's not a relative URL at all, of course; but we record the thing that would have been used as a base URL for resolving a relative URL if we'd had a relative URL). This base URL ends up in the "baseURL" field of the resulting document.

I fully expect that no one else does anything like item 6 in that list. It was a quick and easy hack to make about:blank generally more or less web-compatible a long time ago...
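The lookup order described above (srcdoc with a parent first, then the "baseURL" field, then the document's address) can be sketched roughly as follows. This is a toy model with plain objects, not Gecko code; the property names `isSrcdoc`, `parent`, `baseURLField`, and `address` are assumptions made for illustration.

```javascript
// Toy model of the lookup order in the description above.
// Plain objects stand in for documents; all property names are hypothetical.
function geckoBaseURL2016(doc) {
  // 1. A srcdoc document with a parent defers to the parent's base URL.
  if (doc.isSrcdoc && doc.parent) {
    return geckoBaseURL2016(doc.parent);
  }
  // 2. Otherwise a non-null "baseURL" field wins.
  if (doc.baseURLField !== null) {
    return doc.baseURLField;
  }
  // 3. Otherwise fall back to the document's address.
  return doc.address;
}

const parentDoc = {
  address: "https://example.com/a/page.html",
  baseURLField: "https://example.com/b/", // e.g. set by a <base> element
  isSrcdoc: false,
  parent: null,
};
const srcdocChild = {
  address: "about:srcdoc",
  baseURLField: null,
  isSrcdoc: true,
  parent: parentDoc,
};
// A srcdoc document that has outlived its browsing context: no parent.
const orphanSrcdoc = { ...srcdocChild, parent: null };

const childBase = geckoBaseURL2016(srcdocChild);
const orphanBase = geckoBaseURL2016(orphanSrcdoc);
```

In this model the srcdoc child follows the parent's `<base>`-derived value, while the orphaned document falls through to its own address.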

@foolip
Member Author

foolip commented Jan 11, 2016

> @foolip So on the HTML spec end, I'm not sure what this issue is even about. The "should inherit" behavior is already covered by the "fallback base URL" as defined at https://html.spec.whatwg.org/#fallback-base-url and the corresponding use in https://html.spec.whatwg.org/#document-base-url
>
> Maybe the question is whether this is the right way to spec it, since it doesn't actually match any implementations?

When filing this issue, I was looking for some kind of interaction with DOM's "base URL", and I expected the "base URL" field to be updated somewhere in https://html.spec.whatwg.org/#creating-a-new-browsing-context, which does not happen.

You're right that HTML's computed "document base URL" and "fallback base URL" explain why things are resolved correctly, but the DOM+HTML specs taken together don't explain why node.baseURI would ever be anything other than "about:blank".

So, we need to figure out at least:

  1. Which base URL fields there should be and when they are updated. (Blink has 3)
  2. What's used to resolve URLs.
  3. What's used for node.baseURI.

There are a lot of small differences here, and I'm not sure how to proceed.

@bzbarsky
Contributor

That's fair. My main design requirements here are (1) web compat, whatever that means and (2) having node.baseURI match what's actually used for relative URL resolution.

@foolip
Member Author

foolip commented Jan 11, 2016

Yeah, I agree with those goals.

It sounds like tests for a number of different documents are needed.

  1. <iframe> with no attributes (about:blank), <iframe src> and <iframe srcdoc>
  2. <frame>
  3. document.open()
  4. document.implementation.create*Document()
  5. XMLHttpRequest

For each of those, check node.baseURI and the actual base URL used to resolve:

  1. Initially
  2. After adding a <base> element
  3. After removing a <base> element
  4. If the parent document's base URL changes
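To get a feel for the size of that test plan, the two lists can be crossed into a matrix. This is only a bookkeeping sketch; the labels are copied from the lists above and the variable names are made up for illustration.

```javascript
// Document kinds from the first list (the first item is split into its three variants).
const documentKinds = [
  "<iframe> with no attributes (about:blank)",
  "<iframe src>",
  "<iframe srcdoc>",
  "<frame>",
  "document.open()",
  "document.implementation.create*Document()",
  "XMLHttpRequest",
];

// Moments at which to check, from the second list.
const checkpoints = [
  "initially",
  "after adding a <base> element",
  "after removing a <base> element",
  "after the parent document's base URL changes",
];

// The two things to compare at each checkpoint.
const observations = ["node.baseURI", "base URL actually used to resolve"];

// Cartesian product: one entry per (kind, checkpoint, observation).
const testMatrix = documentKinds.flatMap((kind) =>
  checkpoints.flatMap((when) =>
    observations.map((what) => ({ kind, when, what }))
  )
);
```

Not every combination is necessarily meaningful (for example, an XMLHttpRequest response document has no parent document), so a real suite would likely prune some cells.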

Ms2ger added a commit to Ms2ger/dom that referenced this issue May 31, 2016
This definition is more correct right now, and using a getter rather than a
mutable field makes it easier to figure out what is returned.

CC <whatwg/html#421>.
annevk pushed a commit to whatwg/dom that referenced this issue Jun 1, 2016
This definition is more correct right now, and using a getter rather than a mutable field makes it easier to figure out what is returned.

See also: whatwg/html#421.
@annevk added the topic: navigation and interop (Implementations are not interoperable with each other) labels May 4, 2017
@annevk
Member

annevk commented May 9, 2017

@bzbarsky you mostly describe base URL as a static field, but for srcdoc it's a computation? Or do we compute it once for srcdoc too and then update it as things change?

@annevk
Member

annevk commented May 9, 2017

@foolip tests for XMLHttpRequest: web-platform-tests/wpt#5863.

@bzbarsky
Contributor

bzbarsky commented May 9, 2017

> you mostly describe base URL as a static field, but for srcdoc it's a computation?

My comments above predated Gecko doing sane things for base URIs in srcdoc.

The current setup in Gecko is as follows:

  1. Documents have a "document base URI" field which contains a URI or null.
  2. This "document base URI" field is set in various places (creation of initial about:blank documents, some XSLT stuff, <base> elements being inserted/removed, something about DOMImplementation.createDocument, which apparently uses about:blank as the base URI?, various XHR/DOMParser bits, etc).
  3. Getting the document base URI checks that field. If non-null, the value in it is returned. If null, for a srcdoc document the base URI of the parent is returned; for non-srcdoc the document URI is returned.

This has the somewhat unintuitive behavior that if you start with an initial about:blank, insert a <base href>, and then remove it, you end up with a different base URI than you started with. In practice that hasn't been a problem.
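The insert-then-remove behavior can be reproduced in a small model of the three rules above. This is a sketch, not Gecko code: `ToyDocument`, `insertBase`, and `removeBase` are hypothetical names, and `insertBase` takes an already-absolute URL to keep the example short.

```javascript
// Toy stand-in for a document; not Gecko's actual data structure.
class ToyDocument {
  constructor(uri, { isSrcdoc = false, parent = null, initialField = null } = {}) {
    this.uri = uri;
    this.isSrcdoc = isSrcdoc;
    this.parent = parent;
    this.baseURIField = initialField; // the "document base URI" field, or null
  }

  get baseURI() {
    // Non-null field wins; otherwise srcdoc defers to the parent,
    // and everything else uses the document URI.
    if (this.baseURIField !== null) return this.baseURIField;
    if (this.isSrcdoc && this.parent) return this.parent.baseURI;
    return this.uri;
  }

  // Simplified <base href> handling: inserting sets the field,
  // removing resets it to null.
  insertBase(absoluteHref) { this.baseURIField = absoluteHref; }
  removeBase() { this.baseURIField = null; }
}

const parentDoc = new ToyDocument("https://example.com/dir/page.html");

// The initial about:blank gets the field populated at creation time.
const blank = new ToyDocument("about:blank", {
  parent: parentDoc,
  initialField: parentDoc.baseURI,
});

const before = blank.baseURI;
blank.insertBase("https://cdn.example/assets/");
const during = blank.baseURI;
blank.removeBase();
const after = blank.baseURI; // no longer the inherited value
```

In this model `after` falls back to the document's own URI because removing the `<base>` element cleared the field that originally held the inherited value.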

@annevk
Member

annevk commented May 9, 2017

Why can't we set it when creating a srcdoc document too?

@annevk
Member

annevk commented May 9, 2017

(Aside: the idea is that for all Document objects the default base URL would be "about:blank", since we didn't want baseURI to return a non-URL value.)

@bzbarsky
Contributor

bzbarsky commented May 9, 2017

> Why can't we set it when creating a srcdoc document too?

We can. The question is whether the srcdoc should dynamically track the base URI of the parent or not (i.e. snapshot it at creation time) and how <base> removal in srcdoc should behave. The current setup in Gecko was basically done to align closer with the current srcdoc spec.

@annevk
Member

annevk commented May 9, 2017

Okay. I tend to think we should just snapshot it at creation time (and then only change it for <base> changes inside the srcdoc document), but I guess at this point we might be looking at compat fallout :/

@hiroshige-g
Contributor

Drafted WPT just for <iframe srcdoc>: web-platform-tests/wpt#23130

Chromium: Takes snapshot of parent's base URL around the time of <iframe srcdoc>.
Firefox: Reflects parent's updates dynamically.

(The opposite direction from #5474, where Chromium reflects parent's referrer policy updates while Firefox takes snapshot)
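The difference between the two behaviors can be illustrated with a toy model. The `createSrcdocChild` helper and its `"snapshot"` / `"live"` modes are hypothetical names; they stand in for the behaviors described above, not for either engine's implementation.

```javascript
// Create a toy srcdoc child whose base URL either is captured once
// ("snapshot") or is re-read from the parent on every access ("live").
function createSrcdocChild(parent, mode) {
  const captured = parent.baseURL; // value at creation time
  return {
    get baseURL() {
      return mode === "snapshot" ? captured : parent.baseURL;
    },
  };
}

const parent = { baseURL: "https://example.com/one/" };
const snapshotChild = createSrcdocChild(parent, "snapshot");
const liveChild = createSrcdocChild(parent, "live");

// The parent's base URL changes later, e.g. via a <base> element.
parent.baseURL = "https://example.com/two/";

const snapshotResult = new URL("a.png", snapshotChild.baseURL).href;
const liveResult = new URL("a.png", liveChild.baseURL).href;
```

After the parent's change, the snapshot child still resolves against `/one/` while the live child resolves against `/two/`, which is the kind of divergence a test for this would need to observe.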

@domenic
Member

domenic commented Sep 28, 2022

Some parts of the Chrome team (@csreis @wjmaclean) have started investigating this area in https://bugs.chromium.org/p/chromium/issues/detail?id=1356658 . We'd love to get interop on base URL inheritance in general, and I've volunteered to help with the spec discussions.

As general background when talking about inheritance discussions, there are potentially two parties involved: creator (= embedder for iframes), and navigation initiator. These two are the same for the initial about:blank, but are not the same in general, including for non-initial about:blank or for about:srcdoc.

I believe the team's proposal is:

  • about:srcdoc frames inherit base URL from their embedder at document creation time, and it is snapshotted then.

  • about:blank frames+popups inherit base URL from their navigation initiator at document creation time, and it is snapshotted then.

Related issues are #2883 and #3989. (Plus #8105, which is a proposal for a change to limit how much of the base URL is inherited; but IMO we should only explore that after first getting interop.)

Implications of this proposal:

  • Disconnected iframes keep their base URL. The current spec "crashes" for srcdoc iframes in such cases.

  • Base URLs do not update in response to parent or creator base URLs changing. The current spec does such dynamic updates for both about:srcdoc and about:blank.

  • We will inherit the base URL for about:srcdoc iframes that are not srcdoc documents, as discussed in #3989 ("Fallback base URL computation for a document with about:srcdoc but not an iframe srcdoc document"). Separately we can continue discussion there about perhaps preventing such documents from ever being created in the first place...

  • We will align base URL inheritance with origin inheritance, for cases where people navigate to a non-initial about:blank. The current spec has such about:blank windows always inherit base URL from their creator and origin from the navigation initiator; we will align both on navigation initiator.
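A rough sketch of which party the proposal inherits from, under the assumption that both candidate base URLs are available as strings when the document is created. The function name and parameter names are hypothetical.

```javascript
// Pick the inherited base URL at document creation time.
// Strings are copied by value, so the result is a snapshot by construction.
function inheritedBaseURL(newDocumentURL, { embedderBaseURL, initiatorBaseURL }) {
  if (newDocumentURL === "about:srcdoc") {
    return embedderBaseURL; // srcdoc: inherit from the embedder
  }
  if (newDocumentURL === "about:blank") {
    return initiatorBaseURL; // about:blank: inherit from the navigation initiator
  }
  return null; // other documents do not inherit a base URL in this sketch
}

// Hypothetical parties: a frame embedded by one document and
// navigated to about:blank by another.
const parties = {
  embedderBaseURL: "https://embedder.example/app/",
  initiatorBaseURL: "https://initiator.example/tools/",
};

const srcdocBase = inheritedBaseURL("about:srcdoc", parties);
const blankBase = inheritedBaseURL("about:blank", parties);
const otherBase = inheritedBaseURL("https://example.com/", parties);
```

For the initial about:blank the embedder and the initiator are the same party, so the two branches only diverge for non-initial about:blank and for about:srcdoc.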

@domenic added the agenda+ (To be discussed at a triage meeting) label Sep 28, 2022
@annevk
Member

annevk commented Sep 28, 2022

This generally sounds good to me. Thank you for working on it! cc @cdumez

@past removed the agenda+ (To be discussed at a triage meeting) label Sep 30, 2022
@domenic
Member

domenic commented Oct 4, 2022

One addition to the plan: we believe we'll need to store the base URL in the session history entry for such documents, just like we do for the origin currently. This helps in cases like: https://example.com/ opens an about:blank popup, which has a base URL; that popup navigates to https://other.example/; the opener window closes; then the popup traverses back (no bfcache). In such cases we should keep the base URL of https://example.com/ for the popup even when it's back on about:blank.

There are a few other possible ways to get the desired behavior here, but we think using the session history entry is nicest because it's symmetrical with what we're already doing with origin.
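The popup scenario above can be sketched with a toy session history. The entry shape and the `storedBaseURL` field name are assumptions for illustration only; the point is that the base URL travels with the entry rather than being looked up from the opener.

```javascript
// Toy session history for the popup: each entry remembers the base URL
// that was snapshotted when its document was first created.
const sessionHistory = [];

function pushEntry(url, storedBaseURL = null) {
  sessionHistory.push({ url, storedBaseURL });
}

// Recreate a document from an entry (no bfcache): about:blank uses the
// stored base URL if there is one; everything else uses its own URL.
function createDocumentFromEntry(entry) {
  const useStored = entry.url === "about:blank" && entry.storedBaseURL !== null;
  return { url: entry.url, baseURL: useStored ? entry.storedBaseURL : entry.url };
}

// https://example.com/ opens an about:blank popup.
pushEntry("about:blank", "https://example.com/");
// The popup navigates to https://other.example/.
pushEntry("https://other.example/");
// The opener closes: there is no live document left to inherit from.
// The popup then traverses back to its first entry.
const restored = createDocumentFromEntry(sessionHistory[0]);
const resolved = new URL("style.css", restored.baseURL).href;
```

Because the entry carries the snapshot, the recreated about:blank document resolves relative URLs against `https://example.com/` even though the opener is gone.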

domenic added a commit that referenced this issue Oct 31, 2022
This monster completely rewrites everything to do with navigation and traversal.

It introduces the "navigable" and "traversable navigable" concepts, which take on many of the roles that browsing contexts previously did, but better. A navigable can present a sequence of browsing contexts, which to the user seem to all be the same, but due to browsing context group switches, have different WindowProxys and are allocated in different agent clusters. A traversable navigable manages the session history for itself and all its descendant navigables, providing a synchronization point and source of truth.

The general flow of navigation and traversal is now geared toward creating a session history entry, populated with the appropriate document, before finally applying the history "step". The step concept for session history, managed by the traversable, replaces the previous idea of joint session history, which was a sort of deduplicated union of individual session histories for each browsing context within a top-level browsing context.

Notable things we won't tackle this round, but are much easier to tackle in the future:

- Iframe restoration on (non-bfcache) history traversal is not yet specified.
- Overlapping navigations and traversals (see #6927) are not perfect yet, although this makes them better.
- Browsing context names (see #313) are not perfect yet, although this makes them better.
- Base URL inheritance and storage in session history (see #421, #2883, and #3989) is not yet specified.
- Sandbox flag storage in session history (see #6809) is not yet specified.
- Task queuing when creating agents/realms/windows/documents (see #8443) remains sketchy.
- Window object reuse is not yet rationalized (see #3267).

Closes #854 by clarifying the javascript: URL origin and origin-checking setup.

Closes #1073 by properly resetting active-ness of documents when they are removed.

Closes #1130 by removing the source browsing context concept, using a sourceDocument argument instead, and taking source snapshot params at the appropriate early time.

Closes #1191 by properly sharing document state across documents, as well as overlapping same-document navigations plus cross-document traversals.

Closes #1336 by properly handling child browsing contexts.

Closes #1382 by only unloading after we are sure we have a new document (i.e., not a 204 or download).

Closes #1454 by rewriting session history closer to what implementations do, with the nested history concept in particular taking care of the issues discussed there.

Closes #1524 by introducing the POST data concept and storing it in the document state.

Closes #2436 by rewriting the spec for history.go() to be clear about the results. Tests: web-platform-tests/wpt#36366.

Closes #2566 by introducing an explicit "history object" definition. Tests: web-platform-tests/wpt#36367.

Closes #2649 through clear creation of srcdoc documents, including during history traversal.

Closes #3215 by preserving POST data and reusing it on reloads.

Closes #3447 by specifying a precise mechanism (the ongoing navigation) for canceling navigations, and the points at which that mechanism is consulted. It also stops queuing a task for hyperlink navigations.

Closes #3497 by posting appropriate tasks for cross-event-loop navigations.

Closes #3615 by rewriting traverse a history by a delta, which eventually calls into apply the history step, to navigate all relevant navigables.

Closes #3625 by storing information in the document state (not just the URL), so that future traversals can reconstruct the request appropriately.

Closes #3730 by doing proper task queuing for navigation, including one for javascript: URLs but not including one for normal same-frame navigations. Tests: web-platform-tests/wpt#36358.

Closes #3734 by rewriting the definition of script-closable to use well-defined concepts.

Closes #3812 by removing all uses of "active document" as a predicate instead of a property.

Closes #4054 by introducing the session history traversal queue and renaming the previous "history traversal task source" to "navigation and traversal task source".

Closes #4121 by doing the "allowed to navigate" check at the top of apply the history step.

Closes #4428 by keeping a strong reference from documents (including bfcached documents) to their containing browsing context.

Closes #4782 by introducing the top-level traversable and navigable concepts.

Closes #4838 by doing sandbox checking in a much more precise manner, in particular snapshotting the relevant flags early in any traversals.

Closes #4852 by using document state (in particular history policy container, request referrer, and request referrer policy) in reloads.

Closes #5103 by properly restoring scroll positions for everything that is traversed, as part of properly traversing more than one navigable.

Closes #5350 by properly restoring window names across browsing context group switches, and going back to the same browsing context as was previously there when traversing back across a BCG switch boundary. (Implementations could create new browsing contexts, as long as they restore the WindowProxy scripting relationships and other browsing context features; the result is observably equivalent.)

Closes #5597 by rewriting "allowed to download" to just take booleans, derived from the appropriate snapshotted or computed sandboxing flags.

Closes #5767, modulo bugs and oversights we made, by rewriting everything :).

Closes #5877 by re-specifying "fully active" in terms of navigables, instead of browsing contexts.

Closes #6446 by properly firing beforeunload to all descendant navigables, although whether or not they actually prompt still allows implementation leeway.

Closes #6483 by introducing the distinction between current session history entry and active session history entry.

Closes #6514 by settling on using a single origin for these checks.

Closes #6628 by storing window.name values in the document state, so even in strange splitting situations like described there, they remain.

Closes #6652 by no longer changing history.state when reactivating a document from bfcache ("restore the history object state" is called only when documentsEntryChanged is true). Tests: web-platform-tests/wpt#36368.

Closes #6773 by having careful handling of synchronous navigations during traversals. Test updates: web-platform-tests/wpt#36364.

Closes #6798 by treating javascript: URL navigations as replacements.

Works towards #6809 by storing srcdoc resources in the document state.

Closes #6813 by storing referrer in the document state. Tests for the repopulation case: web-platform-tests/wpt#36352. (No tests yet for the reload case.)

Closes #6947 by rolling its contents into this change: PDF documents are put in the same category as other inaccessible, no-DOM documents.

Closes #7107 by clearing history state on redirects and when origin changes by other means, such as CSP.

Closes #7441 by making window.blur() a no-op because that was simpler than updating it to operate on navigables.

Closes #7722 by incorporating its contents into the rewritten version.

Closes #8295 by refactoring the iframe/frame load event specs to avoid the bug.

Helps with #8395 by at least ensuring the javascript: case does not fire beforeunload. Tests: web-platform-tests/wpt#36488. (The other cases remain open for investigation and testing.)

Closes #8449 by exporting "create a fresh top-level traversable" which is designed for the use case in question.

Co-authored-by: Domenic Denicola <d@domenic.me>
Co-authored-by: Dominic Farolino <domfarolino@gmail.com>
@domenic changed the title from "Iframes should inherit the base URL of the parent document in some cases" to "Revamp about:blank and about:srcdoc iframe/popup base URL inheritance" Jun 1, 2023
domenic pushed a commit that referenced this issue Jul 13, 2023
This gives an "about base URL" member to Document, document state, and navigation params. The intention is to capture a Document's creator's base URL when creating a new browsing context, and preserve it to (1) the newly created Document itself, and (2) the newly-created document state. Notably, preserving it in document state means that the same base URL is used when we recreate the Document while traversing the session history.

For the navigation case, we capture the initiator's base URL in the navigate algorithm as initiatorBaseURLSnapshot (alongside initiatorOriginSnapshot). This eventually threads through, via the document state and navigation params, to the point where we initialize a new Document object.

Finally, we remove the concept of a browsing context's creator base URL algorithm, and update the fallback base URL algorithm accordingly to refer to the relevant Document's new "about base URL" member.

This is all rather different from how the previous specification works. Previously, behavior differed between about:srcdoc and about:blank; base URL changes were supposed to be inherited in a live, not snapshotted, fashion; sometimes the navigation initiator was used and sometimes the browsing context creator/embedder; and the spec "crashed" for disconnected srcdoc iframes.

Closes #421. Closes #2883. Closes #3989.
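As a rough sketch of how a fallback base URL computation might consult the new "about base URL" member, consider the toy model below. The step order, the exact-string `about:blank` check, and the property names (`isIframeSrcdoc`, `aboutBaseURL`, `baseElementHref`) are assumptions made for illustration; the specification text is the authority on the real algorithm.

```javascript
// Toy fallback base URL: prefer the snapshotted about base URL for
// srcdoc and about:blank documents, otherwise use the document's URL.
function fallbackBaseURL(doc) {
  if (doc.isIframeSrcdoc) {
    return doc.aboutBaseURL;
  }
  // Simplification: exact string match instead of a URL comparison.
  if (doc.url === "about:blank" && doc.aboutBaseURL !== null) {
    return doc.aboutBaseURL;
  }
  return doc.url;
}

// Toy document base URL: a <base href>, if present, is resolved against
// the fallback base URL; otherwise the fallback is used directly.
function documentBaseURL(doc) {
  const fallback = fallbackBaseURL(doc);
  if (doc.baseElementHref === null) return fallback;
  return new URL(doc.baseElementHref, fallback).href;
}

const srcdocDoc = {
  url: "about:srcdoc",
  isIframeSrcdoc: true,
  aboutBaseURL: "https://example.com/dir/", // snapshotted at creation
  baseElementHref: "assets/",
};
const blankPopup = {
  url: "about:blank",
  isIframeSrcdoc: false,
  aboutBaseURL: "https://example.com/dir/",
  baseElementHref: null,
};
const ordinaryDoc = {
  url: "https://other.example/page.html",
  isIframeSrcdoc: false,
  aboutBaseURL: null,
  baseElementHref: null,
};

const srcdocBase = documentBaseURL(srcdocDoc);
const popupBase = documentBaseURL(blankPopup);
const ordinaryBase = documentBaseURL(ordinaryDoc);
```

Since the about base URL is a stored value rather than a lookup through the parent or creator, this model keeps working for disconnected iframes and does not change when the creator's base URL changes later.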
7 participants