Fix issue 31-media segment must have frame for all AV tracks #43
Conversation
Per a separate email from @jdsmith3000, this PR is being reviewed now.
@wolenetz: I would like to confirm the behavior with and without this change. With it, muxed streams that lack frame data in a given audio or video track per segment will trigger an error and signal end of stream for what could be a problem in a single segment. Our current implementation would play through this condition with some artifact, and continue parsing subsequent segments. The difference in parsing between having zero and one coded frame per track doesn't seem obvious to us. Can you elaborate a bit on why you believe it has benefit? I recall the discussion about having objective criteria, but we still don't see the benefit in adding this error for missing track data. From our view, it would be most beneficial to encourage equal duration data across tracks, though I'm not sure even there that signaling end of stream on unequal durations would be desirable.
Edge cases that may not interop well are left open without a change like this proposed one. Some examples (all with a muxed A/V SourceBuffer, with 1 audio track and 1 video track):
Example a) Append a video-only media segment that has valid coded frame groups in time range [2000,2200).
Example b)
Notably, the same coded frames were appended for each track in both examples, but example a) stalls on seek+play from time 0, though that may not be interoperably implemented. I'm for making it clear in the spec what should interoperably play with or without stall, and what SourceBuffer.buffered is expected to return for each of these example scenarios. I believe the change I've proposed simplifies these edge cases, but if it introduces playback failure for common MSE API users, I would of course be interested in alternative clarifications. Based on our preliminary chats, we think such playback failure would not be common.
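To make example a) concrete, here is a minimal sketch of the append sequence in question (the file names init-muxed.mp4 and video-only-2000-2200.mp4, and the mime/codec string, are assumptions for illustration); the points of interest are what SourceBuffer.buffered reports after the append and whether seek+play from time 0 stalls:

```ts
// Sketch of example a): muxed A/V SourceBuffer, video-only media segment appended.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');

  const append = (buf: ArrayBuffer) =>
    new Promise<void>((resolve) => {
      sb.addEventListener('updateend', () => resolve(), { once: true });
      sb.appendBuffer(buf);
    });

  // Init segment declares both an audio and a video track (hypothetical asset).
  await append(await (await fetch('init-muxed.mp4')).arrayBuffer());
  // Media segment contains coded frames for the video track only, covering [2000ms, 2200ms).
  await append(await (await fetch('video-only-2000-2200.mp4')).arrayBuffer());

  // What does the SourceBuffer report as buffered? Implementations may differ here.
  for (let i = 0; i < sb.buffered.length; i++) {
    console.log(`buffered range ${i}: [${sb.buffered.start(i)}, ${sb.buffered.end(i)})`);
  }

  // Does seek + play from time 0 stall, given that no audio frames were appended?
  video.currentTime = 0;
  video.play();
});
```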
@wolenetz: We see this as more of a quality of implementation issue and not one that should be formalized in MSE. For both your example a) and example b) (one muxed and one not), our MSE should play. If no data is available across all tracks, then we would stop and wait for data to be appended. As you note, issue #31 feels like a problem that would not commonly be encountered, though we do think it is real world, and sometimes intentional. It's possible for low frame rate videos to miss entire segments (e.g. a slideshow), and we believe it's relatively common at the conclusion of movies for audio or video to end first. We'd prefer not to make MSE changes that could break these scenarios, and advocate instead to play content whenever possible, even if it's audio or video only in gaps.
@jdsmith3000, web authors desire interoperability. If the spec is unclear, for example, about what the buffered ranges are for example a), authors risk playback quality either way (perhaps they do want to stall until media is available for all A/V streams, perhaps they don't). Ideally, authors shouldn't need to detect the browser vendor to condition their expectations for scenarios like this. I'm not against rolling back the pre-existing WebM restriction, but would still need some better clarification in the spec around what is expected (e.g., what would group_{start,end}_timestamps look like in the AppendMode transition that might occur in the middle of single-stream media segment appends to muxed SourceBuffers?). Do you have a suggestion for how to word this in the spec to improve the interoperability expectation for web authors while not regressing the scenarios you describe (low frame rate videos; jagged-ended A/V; etc.)?
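For reference, a sketch of the kind of AppendMode transition in question (segment file names are assumptions; the open question is what the coded frame group start/end timestamps would be after the mode switch when only one of the muxed tracks has received frames so far):

```ts
// Sketch: switching AppendMode between appends of single-track media segments
// to a muxed SourceBuffer. Segment file names are illustrative assumptions.
async function appendSegment(sb: SourceBuffer, url: string): Promise<void> {
  const data = await (await fetch(url)).arrayBuffer();
  await new Promise<void>((resolve) => {
    sb.addEventListener('updateend', () => resolve(), { once: true });
    sb.appendBuffer(data);
  });
}

async function demo(sb: SourceBuffer): Promise<void> {
  sb.mode = 'segments';
  // A media segment carrying frames for only the video track of the muxed buffer.
  await appendSegment(sb, 'video-only-segment-1.mp4');

  // The mode can only change between complete appends (changing it while a
  // media segment is still being parsed throws InvalidStateError).
  sb.mode = 'sequence';

  // Another single-track segment: where do its frames land now, and what are
  // the group start/end timestamps used by the discontinuity detection?
  await appendSegment(sb, 'audio-only-segment-1.mp4');
}
```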
@wolenetz: My read of your change was that it primarily required that at least one frame of data be present in each SourceBuffer track for playback to continue. We believe playback should continue if a single track has at least one frame. We expect the majority of content to be well-formed, and so would prefer to play unless all tracks are missing data. This is what we strive to do in Edge today. If there are other aspects of the change that I've not been discussing, please highlight them for me.
@jdsmith3000, for well-formed muxed content, this change makes no difference. The change is meant to give interoperable predictability to "not well-formed" muxed content. If it's too strict, I would be interested in alternate text. Somewhat related: how does Edge behave if there is 1 SourceBuffer each for audio and video track, and one of the SourceBuffers has a discontinuity (per the coded frame processing algorithm) at time X? My understanding of HTML5 and MSE is this should cause a playback stall at time X. Does Edge play through the discontinuity without a stall?
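As a concrete way to probe that question, here is a sketch (segment names and the gap position are assumptions) that leaves a gap in the video SourceBuffer and observes whether the element stalls when playback reaches it:

```ts
// Sketch: separate audio and video SourceBuffers, with a deliberate gap left
// in the video buffer around time X. Segment names and gap position are assumptions.
const video = document.querySelector('video') as HTMLVideoElement;
const ms = new MediaSource();
video.src = URL.createObjectURL(ms);

ms.addEventListener('sourceopen', async () => {
  const audioSb = ms.addSourceBuffer('audio/mp4; codecs="mp4a.40.2"');
  const videoSb = ms.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');

  const append = (sb: SourceBuffer, url: string) =>
    fetch(url)
      .then((r) => r.arrayBuffer())
      .then(
        (data) =>
          new Promise<void>((resolve) => {
            sb.addEventListener('updateend', () => resolve(), { once: true });
            sb.appendBuffer(data);
          })
      );

  // Audio is continuous over [0s, 20s); video is missing [8s, 10s).
  await append(audioSb, 'audio-0-20.mp4');
  await append(videoSb, 'video-0-8.mp4');
  await append(videoSb, 'video-10-20.mp4');

  // Per HTML5/MSE, playback is expected to stall when the current position
  // reaches the gap at X = 8s; 'waiting' fires if the UA honors that.
  video.addEventListener('waiting', () => {
    console.log('stalled at', video.currentTime);
  });
  video.play();
});
```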
I'm going to close this PR (without merging it), as we don't want to regress the lenient behavior some user agents (Edge, and Chrome soon) have for less-than-well-formed muxed A/V streams that may not have coded frames for all audio and video tracks in each media segment. One ad-hoc example where this leniency might be required would be low-latency muxed live streams where video framerate might be low, but audio needs to be rendered at low latency. For at least the ISO-BMFF bytestream, this implies using very small moofs, not all of which might contain video. Thanks for engaging in this discussion, Jerry, and helping prevent regressing at least this example scenario that could be important for some MSE API users :)
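A sketch of that kind of low-latency pattern (the endpoint URL and the assumption that each received chunk is a complete moof+mdat fragment are illustrative): each small fragment is appended as soon as it arrives, and many fragments carry only audio because the video frame rate is low:

```ts
// Sketch: low-latency live append loop over small ISO-BMFF fragments, where a
// given fragment may contain only audio frames. Endpoint URL is an assumption.
async function pumpLiveFragments(sb: SourceBuffer, url: string): Promise<void> {
  const response = await fetch(url);
  const reader = response.body!.getReader();

  while (true) {
    const { value, done } = await reader.read();
    if (done || !value) break;

    // Each chunk is assumed to be a complete (tiny) moof+mdat fragment; with a
    // low video frame rate, many of these contain no video coded frames.
    await new Promise<void>((resolve) => {
      sb.addEventListener('updateend', () => resolve(), { once: true });
      sb.appendBuffer(value);
    });
  }
}
```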
Happy New Year!
@jdsmith3000: please take a look. While I think this addresses the gap I noted in the original w3c bug 29188, I didn't include any "roughly equal duration" or "roughly same starting [decode?] timestamp" language in the non-normative note, since those should be taken care of by the explicit, normative logic in the coded frame processing algorithm's discontinuity detection. If you know of any gaps still in that logic related to this, please let me know. Otherwise, I prefer keeping the fix for this narrow and not introducing any potentially confusing/conflicting duration/timestamp language in the new note.
@acolwell: please take a look at the WebM bytestream format portion of this change. The intent is just to make it clear that the requirement that at least one coded frame exist in each media segment for each A/V track is now common to all bytestreams (this is no longer something specific to just the WebM bytestream format).
I also included a quick reference in the main spec's changelog to plh@'s recent heartbeat editorial fixes.