<picture> and <source> elements: How should 404s with placeholder image data be handled? #8916

Open
DavidJCobb opened this issue Feb 20, 2023 · 6 comments

@DavidJCobb

It's possible for a request for an image to receive a 404 response that has an image Content-Type with valid image data attached, to be displayed as a placeholder. Currently, img elements are required to display this placeholder data (per 4.8.4.3 Processing Model):

Whether the image is fetched successfully or not (e.g. whether the response status was an ok status) must be ignored when determining the image's type and whether it is a valid image.

What's less clear is how this should apply to the case of picture elements with a list of sources, when one of those sources fails with a placeholder 404 image.

The status quo

My understanding of the spec is that if a source fails with a 404, regardless of whether placeholder data is received, the browser is supposed to try the next source in the list:

  • The media resource selection algorithm, step 9, case children, sub-step 8 requires that the source's target resource be fetched using the resource fetch algorithm.
  • The resource fetch algorithm, step 4, case remote, sub-step 8, sub-sub-step 4 requires that the network request used to fetch the resource be verified. Verification fails if the internal response's status code is not 200 or 206.
  • Some of the steps around there link to the HTTP fetch standard; I dug around and didn't find anything in that standard that would result in the browser "lying" about a 404 request with placeholder data not being a 404. Looking at the HTTP-network fetch steps, it seems like the server's original status code is applied to the response directly, at least for codes outside the [100, 199] range.
  • If the resource fetch algorithm fails, then the media resource selection algorithm, step 9, case children will not be aborted at sub-step 8, and so will continue to the next steps, which involve moving onto the next source (if any) and trying that one.

It seems that the spec-compliant behavior would be:

  • If a source fails with a 404, try the next source, regardless of whether the failing source had placeholder image data attached to the 404 response.
  • If all sources fail, and the img also fails, then use only any placeholder image data attached to the img's 404 response. If the img 404 response did not contain placeholder data, then display a broken image (i.e. no placeholder data from any source will be used).
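The reading above can be modeled as a small function. This is only a sketch of the interpretation described in this comment, not browser code; the name `pickUnderThisReading` and the `ok` and `hasImageData` fields are invented for illustration.

```javascript
// Hypothetical model of the reading described above, not real browser logic.
// Each candidate is { url, ok, hasImageData }:
//   ok           - the response had an ok status (e.g. 200)
//   hasImageData - the response body decoded as a valid image
function pickUnderThisReading(sources, img) {
  // Try each source in order; a non-ok status moves on to the next one,
  // even if the response carried displayable placeholder data.
  for (const source of sources) {
    if (source.ok && source.hasImageData) {
      return { url: source.url, shows: 'image' };
    }
  }
  // All sources failed: fall back to the img element.
  if (img.ok && img.hasImageData) {
    return { url: img.url, shows: 'image' };
  }
  // Only the img's own non-ok response may be shown as a placeholder.
  if (img.hasImageData) {
    return { url: img.url, shows: 'placeholder' };
  }
  return { url: img.url, shows: 'broken' };
}
```

Under this model, a source that 404s with placeholder data is always skipped, and placeholder data is only ever shown when it came from the img's own response.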

Current browser behavior

If Chromium and Mozilla Firefox encounter a source that 404s with placeholder content, they will display that placeholder content, instead of making any attempt to try the next source(s) in the list. Additionally, neither browser makes any attempt to try more than the first source with a supported MIME type. Consider the code below:

<!doctype html>
<html>
   <body>
      <picture>
         <!-- <source srcset="https://example.com/nonexistent.png" type="image/jpeg" /> -->
         <source srcset="https://i.ytimg.com/vi/asdf/nonexistent.jpg" type="image/jpeg" />
         <source srcset="https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg" type="image/jpeg" />
         <source srcset="https://i.ytimg.com/vi/dQw4w9WgXcQ/sddefault.jpg" type="image/jpeg" />
         <img src="https://i.ytimg.com/vi/dQw4w9WgXcQ/default.jpg" />
      </picture>
   </body>
</html>

Here, we are trying to display the YouTube thumbnail for Rick Astley's "Never Gonna Give You Up." There are multiple possible URLs for a YouTube thumbnail, and not all of them are available for every video. Additionally, if any image request to the YouTube thumbnail server fails, the server returns a 404 response with a 120x90px image/jpeg placeholder attached. This placeholder is what the above code displays in Firefox and Chromium. If we uncomment the obviously broken example.com URL, then Firefox and Chromium just display a broken image; watching the network requests shows that they never attempt to request anything else after failing to load nonexistent.png.
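Besides watching the network panel, one way to check which candidate a browser settled on is to read the img element's `currentSrc` and natural dimensions once loading finishes. The helper below is a rough sketch; `describeImage` is a made-up name, and it accepts any object with those three properties so it can also be tried outside a browser.

```javascript
// Summarize what an <img> (or any object with the same properties) ended up showing.
function describeImage(img) {
  // naturalWidth is 0 when no image data could be decoded.
  const size = img.naturalWidth > 0
    ? `${img.naturalWidth}x${img.naturalHeight}`
    : 'no image data';
  return `${img.currentSrc || '(no source selected)'} -> ${size}`;
}

// In a page, log the result for the <img> inside the <picture>.
if (typeof document !== 'undefined') {
  const img = document.querySelector('picture img');
  if (img) {
    const report = () => console.log(describeImage(img));
    if (img.complete) report();           // already finished loading
    img.addEventListener('load', report); // image data was decoded
    img.addEventListener('error', report); // nothing displayable arrived
  }
}
```

For the markup above, a line ending in `120x90` next to the first source's URL would be consistent with the placeholder being displayed.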

It seems, then, like a key portion of the spec for picture and source elements is completely unimplemented in every major browser.

So I guess my questions are...

  1. Should the spec be amended to match existing browser behavior? It seems like it'd be far more useful if web browsers actually implemented the "try each source until one works" part of the spec instead.

  2. Suppose that web browsers were to ignore 404 placeholder data on a source: if a source 404s, the browser tries the next source or, if none remain, the img, instead of just displaying any placeholder data that was sent along with the 404.

    If all sources fail, the img also fails, and one or more sources had placeholder data, then should the spec be amended to have the browser display one of those sources' placeholders? Should the browser prefer the earliest source in the list that had a placeholder? Should the browser prefer the img placeholder, if one was received, over all source placeholders?

    I don't know terribly much about how network requests are done under the hood, but I'd imagine that if browsers can use the earliest placeholder in the list, then they don't have to actually bother to fully load the placeholders for any failing source elements after that one.

  3. ...Am I even interpreting these parts of the spec properly to begin with?
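To make the options in question 2 concrete, they can be written as a policy switch. This is purely a hypothetical sketch: the function name, the `hasPlaceholder` field, and the policy names `'prefer-img'` and `'earliest-source'` are all invented here, and nothing like this exists in the spec.

```javascript
// Hypothetical sketch of the placeholder policies asked about in question 2.
// results: one { name, hasPlaceholder } per <source>, in document order.
// img:     { name, hasPlaceholder } for the <img> fallback.
// Assumes every candidate has already failed with a non-ok status.
function choosePlaceholder(results, img, policy) {
  const earliestSource = results.find((r) => r.hasPlaceholder);
  // 'prefer-img': the img's placeholder wins over any source placeholder.
  if (policy === 'prefer-img' && img.hasPlaceholder) {
    return img.name;
  }
  // 'earliest-source' (or no img placeholder): first source that sent one.
  if (earliestSource) {
    return earliestSource.name;
  }
  // No source placeholder: use the img's, or null for a broken image.
  return img.hasPlaceholder ? img.name : null;
}
```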

@DavidJCobb
Author

Even more bizarre is that the standard's text for the source element itself recommends using JavaScript to manually accomplish the behavior that the resource selection algorithm already seems to require from browsers:

If the author isn't sure if user agents will all be able to render the media resources provided, the author can listen to the error event on the last source element and trigger fallback behavior:

<script>
 function fallback(video) {
   // replace <video> with its contents
   while (video.hasChildNodes()) {
     if (video.firstChild instanceof HTMLSourceElement)
       video.removeChild(video.firstChild);
     else
       video.parentNode.insertBefore(video.firstChild, video);
   }
   video.parentNode.removeChild(video);
 }
</script>
<video controls autoplay>
 <source src='video.mp4' type='video/mp4; codecs="avc1.42E01E, mp4a.40.2"'>
 <source src='video.ogv' type='video/ogg; codecs="theora, vorbis"'
         onerror="fallback(parentNode)">
 ...
</video>

Of course, I'm not sure the above code complies with the spec behavior either. If a source element fails, the browser is supposed to trigger the onerror handler on the failing source element (resource selection algorithm, step 9, case children, sub-step 9). The JavaScript code snippet that the standard recommends above only places an onerror event handler on the last source; if the browser isn't already trying all sources one by one on its own, then it'd never get to that last source. I haven't tried it yet, but I'd expect that for a script like this to work, you'd need to put onerror on the first source or on the media element itself; and if you put it on the first source, then in the fallback function, you'd have to copy it to the next source before removing the failing source.
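One way to test this rather than guess is to listen on every source and record which ones actually fire error. The sketch below is untested against real browsers; `watchSourceErrors` is a made-up helper name, and it only assumes that each item has an `addEventListener` method.

```javascript
// Call onError(index, isLast) each time one of the given <source> elements
// fires an "error" event, so a page can see how far the browser got.
function watchSourceErrors(sources, onError) {
  sources.forEach((source, index) => {
    source.addEventListener('error', () => {
      onError(index, index === sources.length - 1);
    });
  });
}
```

In a page this might be called as `watchSourceErrors([...video.querySelectorAll('source')], (i, isLast) => { console.log('source', i, 'failed'); if (isLast) fallback(video); })`, where `video` is the media element and `fallback` is the function from the quoted example. If only the first index is ever logged, the browser is not walking the list.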

@domenic
Member

domenic commented Feb 20, 2023

Is "placeholder" just your word for "a 404 response"?

@DavidJCobb
Author

More or less, yes; in this context, I was using it to mean "a 404 response that has valid image data attached, that a browser can display." Apologies -- I should've been more explicit about that.

@annevk
Member

annevk commented Feb 24, 2023

This seems like a specification bug to me. We shouldn't use different fetching rules from <img> for <picture>.

cc @zcorpan @yoavweiss

@zcorpan
Member

zcorpan commented Feb 24, 2023

This algorithm does not apply to img (or picture), it only applies to media elements i.e. audio and video.

The relevant algorithm for img is https://html.spec.whatwg.org/multipage/images.html#updating-the-image-data which selects a URL to load and sticks with it; the response being a 404 does not cause the resource selection to change. So the browser behavior you see matches the spec for this case.
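The "select a URL and stick with it" behavior can be sketched roughly as follows. This is a simplified model, not the spec algorithm: it ignores media queries and srcset candidate selection, and the names `selectPictureUrl`, `resolvePicture`, `supportedTypes`, and `fetchResult` are assumptions made for illustration.

```javascript
// Hypothetical model of the behavior described in this comment.
// sources: [{ url, type }] in document order; img: { url }.
// supportedTypes: a Set of MIME types the browser can decode.
function selectPictureUrl(sources, img, supportedTypes) {
  // The first source with a supported (or absent) type wins; selection
  // happens before any request is made, so the status cannot affect it.
  const chosen = sources.find((s) => !s.type || supportedTypes.has(s.type));
  return chosen ? chosen.url : img.url;
}

// fetchResult: a stand-in function mapping a URL to { status, hasImageData }.
function resolvePicture(sources, img, supportedTypes, fetchResult) {
  const url = selectPictureUrl(sources, img, supportedTypes);
  const response = fetchResult(url);
  // The status is not consulted; only whether the body is a valid image.
  return { url, shows: response.hasImageData ? 'image data' : 'broken image' };
}
```

In this model, a first source that returns a 404 with image data yields that image data, and a first source that returns a 404 with no image data yields a broken image, with no further requests either way.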

The "try next source if the response was invalid" behavior for media elements was intentionally not copied for picture, since it causes unreasonable implementation complexity.

Maybe the spec should be clearer that the "resource selection algorithm" doesn't apply for img?

zcorpan added the "clarification" label (Standard could be clearer) on Feb 24, 2023

@DavidJCobb
Author

DavidJCobb commented Feb 24, 2023

Maybe the spec should be clearer that the "resource selection algorithm" doesn't apply for img?

That, and I need to be more attentive. I found the link to the resource selection algorithm at the bottom of the section for source elements, and didn't notice that it's only mentioned as being used when the source's parent node is a media element (i.e. not a picture). My apologies.

Some other notes re: clarity though:

The section for picture elements has a note box stating that "the resource selection algorithm is different," but it doesn't appear to specify which algorithm is used; linking to img's algorithm here could be helpful. The only specific remark that this note box makes about source elements is that their src attributes are not used, though it doesn't explain why (i.e. the srcset attribute isn't mentioned).

The section for source elements notes that picture and media elements have different behaviors, but doesn't seem to directly state that the former has a different resource selection algorithm. Not sure whether the section for source would need to state that if the section for picture states it explicitly, but stating it explicitly in both places could help prevent misinterpretations like mine, maybe.

The "try next source if the response was invalid" behavior for media elements was intentionally not copied for picture, since it causes unreasonable implementation complexity.

Thanks for the info. Did some research to try and understand this better... It looks like that was decided upon here in order to have picture avoid media element edge-cases that stemmed from sources being separate elements from their containing media. Makes me wonder whether having a more advanced selection algorithm on an attribute (like srcset on img) would work better, but I guess that'd be a separate discussion, if it's one worth having.


4 participants