-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Processing the manifest is no longer a function of document URL #834
Conversation
Needs review and also analysis of how implementations behave to see if this will actually break sites. From my vague understanding, at least Chrome already considers sites to be non-installable if they have no start URL, so this change doesn't affect how sites are treated. |
Firefox Preview also considers sites to be non-installable if the start URL is missing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :D
index.html
Outdated
present, or not <a>within scope</a> of the {{Document}}'s | ||
<a data-cite="DOM#dom-document-url">URL</a>, consider the document | ||
not <a>installable</a>. | ||
</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "in scope of a URL" defined? What does that mean for document URL https://example.com/a/b/c/page.html? That the start_url must have that as a prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a section in the navigation-scope section of the spec defining in scope. https://www.w3.org/TR/appmanifest/#navigation-scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that behaviour doesn't sound appropriate here. Would be good to have test cases for this (input -> output examples).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. It's been awhile since I wrote this so I'm not sure what my intention was, but I suspect this was simply a mistake. It should have said "not same origin as", not "not within scope of".
That matches the existing check (which I'm removing), that would check if start_url was same origin as the document URL, and if not, ignore the supplied start_url and default to the document URL. Only now, we just consider it non-installable.
Does that sound right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds right.
@@ -1375,9 +1380,7 @@ <h3> | |||
<a data-cite="FETCH#concept-request-body">body</a>. | |||
</li> | |||
<li>Let <var>manifest</var> be the result of running <a>processing a | |||
manifest</a> given <var>text</var>, <var>manifest URL</var>, and the | |||
URL that represents the address of the <a>top-level browsing | |||
context</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The money shot.
@@ -1492,8 +1494,7 @@ <h3> | |||
</li> | |||
<li>Set <var>manifest</var>["<a>start_url</a>"] to the result of | |||
running <a>processing the <code>start_url</code> member</a> given | |||
<var>manifest</var>["<a>start_url</a>"], <var>manifest URL</var>, and | |||
<var>document URL</var>. | |||
<var>manifest</var>["<a>start_url</a>"] and <var>manifest URL</var>. | |||
</li> | |||
<li>Set <var>manifest</var>["<a>lang</a>"] to the result of running | |||
<a>processing the <code>lang</code> member</a> given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is for line 1506. start_url can now be undefined, the processing for scope segfaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specs don't segfault :)
This is WAI. "undefined" here doesn't mean set it to nullptr in C++. It means (according to how a JS value is converted into a dictionary member) to leave it empty as if the key didn't exist at all. So this logic means to treat an invalid start_url
exactly the same as a missing start_url
.
(Note that the WebIDL data structure permits only two options for start_url
: missing or a valid-but-possibly-empty string. So it should not be possible to store some other null-type value in here.)
C++ implementations should make sure this is the case and of course not segfault.
URL</var>, and a <a>URL</a> <var>document URL</var>. This algorithm | ||
returns a <a>URL</a>. | ||
<a>USVString</a> <var>value</var>, and a <a>URL</a> <var>manifest | ||
URL</var>. This algorithm returns a <a>URL</a> or undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we default to processing '.' instead of returning undefined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, that would mean if start_url
is invalid, it defaults to the parent directory of the manifest file.
Currently, Chrome and Firefox (at least) do not consider a site installable if start_url
is missing or invalid, behaviour which I am attempting to enshrine by returning undefined (effectively treating it as missing) and by explicitly saying the document is not installable if start_url
is not present. I think it's desirable / conservative to fail in these bad cases, rather than try to come up with some "reasonable" default.
Also this default can go horribly wrong: if the manifest is hosted on a CDN, this would cause both the start_url
and scope
to be set to the parent directory on that CDN, which would likely be a 404 or a directory listing page.
1e76301
to
0699d45
Compare
Now the start_url is stripped from the manifest if it is invalid, rather than defaulting to the document URL. Also, we do not check that start_url is same-origin as document URL.
The document is not installable if there is no start URL or if the start URL is not within scope of the document URL.
This appears to have been a mistake.
0699d45
to
8d1c09f
Compare
@hober would you be able to confirm (or assign someone on Safari to confirm) whether this will break compatibility with how Safari works when installing a site. TL;DR: Currently the manifest says if start_url is missing or invalid, use the URL of the current document. We are proposing that it simply be non-installable (which is how Chrome and Firefox already behave). |
@othermaciej or @hober : Hi Safari folks, would you be able to give guidance on whether the above change would result in breakage to Safari? |
@mgiuca I asked the relevant folks internally and they think this change is reasonable. |
Thanks @hober . That means we can land this (after resolving merge conflicts). |
index.html
Outdated
@@ -304,6 +304,10 @@ <h2> | |||
</li> | |||
</ul> | |||
</li> | |||
<li>Otherwise, if |manifest|["{{WebAppManifest/start_url}}"] is not | |||
present, or not <a>same origin</a> as the {{Document}}'s | |||
{{Document/URL}}, consider the document not <a>installable</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that user agents let you install whatever you want (via "add to home screen") - maybe we should say:
{{Document/URL}}, consider the document not <a>installable</a>. | |
{{Document/URL}}, a user agent MAY consider the document not <a>installable</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this suggestion. Edge also has the "Install Site as an App" (or similar) menu option for what were "Pinned Sites".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't see this before. It's not so simple that we can simply say "MAY" here; otherwise that would allow user agents to install a document linking a manifest whose start_url
and scope
are on a different origin (where previous to this PR, we reset the start_url
and scope
to the document URL).
So I think if we change this, we need to have some wording similar to the "non-fatal" case of step 2 above, where we say "the user agent MAY either: 1. set start_url
to the document URL and scope
to match, or 2. consider the document not installable."
You're right, Chromium browsers allow this, but I'm sure we don't just let you install a manifest that has its start and scope on another origin. I'll figure out what they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Chromium we currently enforce that start_url
and scope
are same-origin with the document URL, which the spec requires. If that isn't met, we discard the values.
When this happens, Chromium browsers use the document URL in place of the manifest values when a site has a shortcut created for it. That's behaviour outside the scope of the manifest spec and a browser choice.
Since this PR removes that same-origin requirement, we would remove that enforcement in our parsing when updating to match this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Chromium we currently enforce that start_url and scope are same-origin with the document URL, which the spec requires. If that isn't met, we discard the values.
I think treating start_url
as it if it was not present when it's not same origin or outside the scope makes sense, as does assigning it a sensible default (document URL) when in error. That can then remove the "installable" decision at that point (that shouldn't be part of the processing algorithm, IMO).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so that does unfortunately mean that processing the manifest is still a function of the document URL (if we say "discard start_url
and scope
if they are not same-origin as document URL and replace them with document URL and the dir of the document URL, respectively"). That's explicitly what I was trying to avoid here.
I think I can live with it if we do that as a separate step outside the "processing the manifest".
That basically means we do "processing the manifest" which is NOT a function of the document URL, and then in a step outside of that, we do the above check if you're trying to install a document.
Marcos, I think you're right, having "installability of a document" doesn't make a lot of sense since user agents can always choose to install any document, and in any case, our current steps to determine installability are MAY MAY MAY. So we can scrap that. But I think we do need to replace this with some other steps for essentially "sanitizing a manifest for installation from a document".
What I want to allow is basically two separate high-level operations:
- Installing a manifest. (Not associated with a document at all.) That's essentially what's enabled by this PR, by allowing the user agent to process a manifest without a document URL, then install it.
- Installing a document. Get the associated manifest, if there is one. If there isn't, you can make one up. But this step should ensure that the manifest is same-origin as the document, to prevent sites from being able to install PWAs on another origin.
I propose that 1 is taken care of by just the definition "install" (what that means is UA-specific). And for 2, we should replace "steps to determine the installability of the document" with the following:
Steps to install a document, given a processed manifest manifest:
- Let manifest and manifest URL be the result of obtaining the manifest.
- If obtaining the manifest results in an error, set manifest to a processed manifest, populated from the top-level browsing context Document's metadata in a user-agent-specific way (e.g., setting
manifest["name"]
to the document title). - If
manifest["start_url"]
is not present, or not same origin as the Document URL, setmanifest["start_url"]
to the Document URL, and setmanifest["scope"]
to the result of parsing "." using the Document URL as the base URL. - Install the manifest.
This is too much work to do in this PR, so I think I'll just continue propping up the "installability of the document" concept right now, and do the above later after some more discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've updated the PR. I filed a bug #912 about the above issue (wherein Marcos and I agree, we shouldn't have "installability of the document") but I haven't tackled it here.
I've updated the text so the UA can now either consider the document not installable, OR can use the document URL to replace the off-origin start_url.
And I've updated the commit description. Actually, this is no longer a breaking change (!). You can carry on behaving exactly as before, if you like, which is replacing an invalid or off-origin start_url with the document URL, or you can error out (newly allowed behaviour). In one sense, this is now just a (useful) refactor, moving the replace-start-with-document logic up out of the manifest processor and into the "steps to determine installability of the document".
The next step in #912 will be rewriting those steps to be about postprocessing the manifest, not determining installability.
@marcoscaceres how is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine - see my one suggestion.... @mgiuca, do you still want to proceed with this?
Yep. |
Ok cool, please merge whenever you are ready. |
4f6f22c
to
b6cea81
Compare
Doing a last-minute re-read. There's a bit of a complication (as usual, because of the unfixable awkwardness that
So if Need to think a bit more about this before landing. |
oh, lol, "insatiability"... curse you autocorrect. |
Quick note that the bulk of my reply is on this discussion thread, in case you missed it. (I often can't find GitHub discussion threads so I wonder if I should have just pasted my reply at the bottom where people can see it :/) |
We have identified one potential breakage of existing behaviour with this change: If you have a manifest with no (or invalid)
(This is because under the current spec, the scope is set to its fallback value before other URLs are compared using "within scope". Under the proposed text, the scope is set to undefined, then the other URLs are compared using "within scope", and then the user agent has the option of setting the scope to its fallback value.) For example, say that you are in a document at {
"scope": "https://anothersite.com",
"icons": [{
"src": "https://somecdn.com/icon.png"
}],
"shortcuts": [{
"name": "Click me",
"url": "https://example.com/foo/click.html"
}]
} If you quickly eyeball this manifest, you'll see that both the icon and shortcut are on a different origin to the scope, so you might initially think both the icon and shortcut would be rejected. However, icons do not need to be on any particular origin, so those are OK. You'd think, though, that the shortcut would not be valid. What actually happens, under the current spec, though, is:
In fact, the shortcut is valid in this case, which I find quite gross, as the validity of the shortcut depends on which URL you were on when you installed the manifest. If you installed the exact same manifest from Under my proposed text for this PR, the above case would invalidate the shortcut unconditionally. Now what would happen is:
This is a minor breaking change, but I think it's a good change, because:
Edit: Adding the "breaking" label back due to the above scenario. Edit: Added protocol handlers. |
Another edge case that changes with this PR: For example, say that you are in a document at {
"scope": "https://example.com/"
} Current spec:
Proposed spec:
So the new scheme would be changing |
I think it's too risky to manipulate the So, if the scope is:
Continue checking the rest (invalidate from there). |
Yeah I just had a similar thought. What if you have an app whose I think we have to 🔥 the whole manifest. Actually I think this is the subject of #784. |
Chatting on the call with @marcoscaceres about nuking the whole manifest vs selectively cleaning it out. @mgiuca would prefer to clean only the members that would become suspect if we changed the scope. Thinking about creating a definition called "scope-dependent member", then If the manifest has no |
Wait, I just realised, this already happens automatically, due to the change to "within scope" to always return false if scope is undefined. So we don't need to worry about the case where the manifest has an invalid scope. The only case we need to explicitly invalidate these members is the #784 case: if you link to a manifest from a document whose origin does not match the manifest's valid scope. |
I did an HTTP Archive analysis, looking for manifests that would break based on the above change. Conclusion: This change will not break any existing sites, since there appear to be no sites relying on the document URL to provide an implicit scope for a share target, shortcut, file handler or protocol handler. (i.e., every manifest in the archive with a Further, there are 304 manifests that would experience a change of scope to the parent directory of the document URL, due to there being no |
Thanks @mgiuca for validating our assumptions! As part of this change, it would be great to have some WPTests to make sure we all implement this properly. |
Yeah... it looks like all the WPTs are manual right now. Is that sufficient? It kind of sucks because the manifest processor is in theory a very cleanly testable mathematical function (input string -> output data structure, no side effects). Chromium's test suite, for example, includes literally hundreds of tiny test cases that test this processor's expected output, and that's what I've been playing with during this experiment. But I don't see how to translate that to WPT, since we have no way of verifying the browser's internal representation of the manifest. We can do a handful of manual test cases, but I wouldn't want to translate Chromium's O(hundreds) of tests into manual tests. TBH, I've given this particular issue way (waaaay) more time than I really can afford, and I need to move on to other things. I don't know if I have the bandwidth to add new tests here (I could update existing tests, but it's a big investment to create them from scratch). FYI here is the corresponding Chromium CL: https://chromium-review.googlesource.com/c/chromium/src/+/2280807 |
We could expose the parsed manifest back to JS via the Document IDL. It might be useful for web app framework libraries to read. |
I agree that getting the processed manifest might be nice, but it has implications as it triggers fetching, etc. Probably not something we want to deal with in "v1". |
If I understand this right, we will have 3 notions/options of scope based on invalidations:
and I believe the proposal is to base the share targets/etc validation on the 2nd of these? I have a crazy idea here. Looking towards the goal of having the manifest being the source of truth for a web app (where we should be able to install a web app with just a manifest) what if we introduce a new noun into the spec - an 'isolated' or 'isolatable' manifest: An 'isolated' manifest is a manifest that has a start_url that includes an origin. 'isolated' manifests are applicable to installing/applying to a document_url IFF the document_url is the same origin as the start_url. This gets rid of the possibility of 3. above, and also doesn't seem to break anything currently (as per mgiuca@'s analysis above). This also starts reenforcing the idea of a manifest being 'all that you need' to install a web app, and starts that definition as something that has a start_url including an origin. It also doesn't invalidate manifest that, say, only have a theme_color, as that is fine and valid. This manifest is not 'isolated', and going towards the goal here, that manifest wouldn't be used to fully describe a webapp to be installed. |
Does this mean non-isolated manifests can't be installed? I think we still want to allow this and just block powerful capabilites like file handlers. |
No - all manifests can be installed except isolated manifests that have been invalidated due to the document url being a different origin from the start_url origin. Having an origin in your start_url basically just adds a new constraint on the manifest being valid (and installable)- that the document url origin is the same origin as the start_url |
I'm worried about doing this for v1, tbh... we should initially try to specify for the behavior of Chrome, Safari, and Firefox and how they work in relation to the document. We have a whole bunch of interop issues to deal with just with that. |
I'm hoping I didn't bikeshed this too much! You can ignore my crazy idea :) |
This change (choose one):
changes normative sections without changing behavior)
Implementation commitment (delete if not making normative changes):
Commit message:
If the start URL is missing or invalid, the processed manifest simply does not have a start_url, rather than defaulting to the document URL. However, in the steps to determine the installability of the document, the old behaviour is maintained (a manifest without a start_url is allowed to use the document URL).
This doesn't actually break existing behaviour, but it enables new user agent behaviours:
Closes #668 and #669.
Preview | Diff