Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Follow up gap controller fixes #2453

Merged
merged 42 commits into from Dec 5, 2019
Merged

Conversation

robwalch
Copy link
Collaborator

@robwalch robwalch commented Nov 20, 2019

This PR will...

Align the gap-controller with changes from #2359 and #2370 (68fb401...c7c0e13)

Why is this Pull Request needed?

  1. Jump start gaps (fixes).
  2. Do not report stalls before playback has started or when seeking.

Sample streams with gaps:

https://playertest.longtailvideo.com/adaptive/short/short.m3u8 (start gap and partial)
https://playertest.longtailvideo.com/adaptive/sintel_aes/playlist.m3u8
https://dapian.video-yongjiu.com/20190901/9943_6699ca5d/index.m3u8 (Safari)

Resolves issues:

Fixes #2323
Fixes #2381
Fixes #2436

Checklist

  • changes have been done against master branch, and PR does not conflict
  • new unit / functional tests have been added (whenever applicable)
  • API or design changes are documented in API.md

@robwalch robwalch added this to the 0.13.0 milestone Nov 20, 2019
@robwalch robwalch added this to Needs review in 0.13.0 Nov 20, 2019
@robwalch robwalch force-pushed the bugfix/fix-and-improve-gap-controller branch from bbe18c1 to e9224d5 Compare November 21, 2019 00:29
Copy link
Collaborator

@tchakabam tchakabam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug in this currently used GapController base implementation (i.e the one from v1.0 if I understand well).

The bug is not part of the diff here, but in the common base.

Please take a look at this line: https://github.com/video-dev/hls.js/pull/2453/files#diff-6783293fa33ed21567e2a1971b008425R53

Line 53 of this gap-controller.js file here:

    if (media.seeking && BufferHelper.isBuffered(media, currentTime)) {
      return;
    }

This means we are returning from the poll method when we are seeking AND the data is in buffer.

This doesn't make any sense. Here is why: We want to detect stalls, meaning when the media SHOULD be playable, but is not. That is given exactly when the above condition is fulfilled.

On the other hand, we want to not detect stall when we are seeking and the data is NOT in the buffer. In that case, a playhead "halt" is EXPECTED, and we should return from poll.

Currently, we get because of the above bug in the logic a BUFFER_STALLED_ERROR on every seek.

To fix that, you need to change line 53 to:

if (media.seeking && !BufferHelper.isBuffered(media, currentTime)) {

That will mean, we expect the playhead to halt here, since we are seeking to a place we don't have buffered data yet. So we don't need to "detect stalls".

Inversely, any other case where we are seeking and have data (or not seeking at all) deserves stall detection.

Hope you can follow these probably straight forward explanations on the matter 👍

@robwalch robwalch force-pushed the bugfix/fix-and-improve-gap-controller branch from 45c8f71 to b930963 Compare November 21, 2019 19:22
Copy link
Collaborator

@tchakabam tchakabam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual problem with this patch now I see atm is we are not detecting a specific stall case which should be detected. See my comment on the relevant line.

if (!this.stalled) {

// There is no playable buffer (seeked, waiting for buffer)
if (bufferInfo.len === 0 && !bufferInfo.nextStart) {
Copy link
Collaborator

@tchakabam tchakabam Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this condition (bufferInfo.len === 0) be equivalent to the one we check just further above (!media.buffered.length) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what about bufferInfo.nextStart being 0 ? Also, why is it necessary to check on this value here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably change that to ! Number.isFinite(bufferInfo.nextStart) ?

Also, I am bit confused: How do we even get to this check, since we check for !media.buffered.length before already and return in that case?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understood now again the diff between !media.buffered.length and bufferInfo.len === 0. Since the latter results from only looking at time-ranges up-front the passed media position. So yes, this makes more sense to me now.

Fixes the false-positive detection we saw after seeking it seems ✔️

Copy link
Collaborator Author

@robwalch robwalch Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bufferInfo is relevant to currentTime and very different from media.buffered.

bufferInfo.len is the forward buffer for the buffered range that currentTime intersects.
bufferInfo.nextStart is the start of the next buffered range.

If we are not in a buffered range and there is no buffered range to jump to, there is no need to continue. We are waiting for media to buffer on start or seeked at this point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, remembered that just before :) see previous comment!

I totally agree on the necessity to check on bufferInfo.nextStart, but you should use
! Number.isFinite(bufferInfo.nextStart) to correctly handle the case when that value is 0.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not in a buffered range and there is no buffered range to jump to, there is no need to continue. We are waiting for media to buffer on start or seeked at this point.

Correct! Exactly that! And you should apply the same logical approach when the seeking flag is true, which you are avoiding with the line I had commented on above. I still don't understand why this implementation will simply want to ignore that specific case, while making it necessary to add logic further down to check against seeking again. But we don't need to discuss this further, as I dont see it beneficial for the current work (we can come back to it when we will see that as an issue, or address it in another patch to gain common understanding about it).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchakabam can we mark this as resolved?

Copy link
Collaborator

@tchakabam tchakabam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't fully agree on a detail (line 56), this passes my checks and works for the cases we wanted to fix, and the issues we initially wanted to address in the logic are fixed now (except the one I mentioned here).

So, after further checking and latest changes, I would agree to merge this.

👍

if (!this.stalled) {

// There is no playable buffer (seeked, waiting for buffer)
if (bufferInfo.len === 0 && !bufferInfo.nextStart) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are not in a buffered range and there is no buffered range to jump to, there is no need to continue. We are waiting for media to buffer on start or seeked at this point.

Correct! Exactly that! And you should apply the same logical approach when the seeking flag is true, which you are avoiding with the line I had commented on above. I still don't understand why this implementation will simply want to ignore that specific case, while making it necessary to add logic further down to check against seeking again. But we don't need to discuss this further, as I dont see it beneficial for the current work (we can come back to it when we will see that as an issue, or address it in another patch to gain common understanding about it).

@tchakabam
Copy link
Collaborator

Ok I realized also, the case where bufferInfo.nextStart is 0 can't happen at the same time as bufferInfo.len === 0 since media-time can't be negative. But still, I would prefer if it was more explicit what we are checking against (0 or undefined?)

tchakabam
tchakabam previously approved these changes Nov 21, 2019
@mtoczko
Copy link
Collaborator

mtoczko commented Nov 24, 2019

@robwalch you can take a look at #2207
After seek to 0, player is stalled.
stream:https://mtoczko.github.io/hls-test-streams/test-audio-pdt/playlist.m3u8

@robwalch
Copy link
Collaborator Author

After seek to 0, player is stalled.

Hi @mtoczko. Are you OK with me hosting these test streams for use in our functional tests?

Here's the fix for that and the functional test that was failing (different partial fragment stream):
1fb7e64

And here's a new functional test that covers #2207 with hls-test-streams/test-audio-pdt/playlist.m3u8 uploaded to my test stream bucket:
d77a7d9

If you prefer I not host this I can remove the last commit. We don't add github hosted streams to the functional tests, otherwise I would use your link.

@mtoczko
Copy link
Collaborator

mtoczko commented Nov 25, 2019

Hi @mtoczko. Are you OK with me hosting these test streams for use in our functional tests?

HI @robwalch. I don't see a problem with that.

Here's the fix for that and the functional test that was failing (different partial fragment stream):
1fb7e64

And here's a new functional test that covers #2207 with hls-test-streams/test-audio-pdt/playlist.m3u8 uploaded to my test stream bucket:
d77a7d9

Good job.

Copy link
Collaborator

@itsjamie itsjamie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested Chrome on Windows for the common start gap jumps using the preview.

One thing I noticed that goes against the constants at the top of gap-controller, it's still jumping the whole 10s gap on https://s3.amazonaws.com/qa.jwplayer.com/~alex/123633/new_master.m3u8. However the constant name MAX_START_GAP_JUMP is set to 2.0. This seems to indicate something is incorrect in the handling with regards to that constant. However, it does function how I imagine we want it to (in that it plays).

logger.js:40 [warn] > skipping hole, adjusting currentTime from 0 to 10.296643000000001
(anonymous) @ logger.js:40
_trySkipBufferHole @ gap-controller.js:179
_tryFixBufferStall @ gap-controller.js:122
poll @ gap-controller.js:105
_checkBuffer @ stream-controller.js:1271
doTick @ stream-controller.js:129
tick @ task-loop.ts:111
setInterval (async)
setInterval @ task-loop.ts:73
startLoad @ stream-controller.js:57
(anonymous) @ hls.ts:301
startLoad @ hls.ts:300
onMediaAttached @ stream-controller.js:667
onEventGeneric @ event-handler.ts:82
onEvent @ event-handler.ts:69
emit @ index.js:203
trigger @ observer.ts:13
BufferController._this._onMediaSourceOpen @ buffer-controller.ts:239
main.js:539 Error event: {type: "mediaError", details: "bufferSeekOverHole", fatal: false, reason: "fragment loaded with buffer holes, seeking from 0 to 10.296643000000001", frag: Fragment}

Copy link
Member

@michaelcunningham19 michaelcunningham19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, some of the complexity of these conditionals makes this more difficult to follow - but there are no quick improvements I can suggest to improve that. I think with good inline comments on the inner workings we can keep it clear enough.

I tested this out manually on one of the assets called out in the description and confirmed this does fix the stall issue. That was Chrome on macOS.

@tchakabam
Copy link
Collaborator

Overall looks good, some of the complexity of these conditionals makes this more difficult to follow

We should address that in another ticket (maybe inspired from my initial patch on this) once v1.0 is merged and we can all work together on one version again 🎉

tchakabam
tchakabam previously approved these changes Nov 26, 2019
Copy link
Collaborator

@tchakabam tchakabam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm this fixes things to an equivalent result as we had obtained with our previous patch!

Please proceed as you are satisfied with the results as well 👍

this.stallReported = false;
this.stalled = null;
this.moved = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be called playheadHasMoved to be more clear what it is for someone new to the codebase

return;
}

if (media.ended || !media.buffered.length || media.readyState > 2) {
return;
// Clear stalled state after seeked so that we don't report stalls coming out of a seek
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 that's good stuff, we didn't get there in the previous PR

if (!this.stalled) {
const isBuffered = bufferInfo.len > 0;

// There is no playable buffer (seeked, waiting for buffer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment about having seeked is confusing, isn't it? we are not checking for that variable in the below condition

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchakabam can we mark this as resolved?

const isBuffered = bufferInfo.len > 0;

// There is no playable buffer (seeked, waiting for buffer)
if (!isBuffered && !bufferInfo.nextStart) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just explain also, what are we checking against here on bufferInfo.nextStart ? Zero or undefined? When it is zero isBuffered would be true obviously, so no need to check that. We would expect it to be undefined if nothing upfront playhead is loaded yet, right?

However, it could be zero and could the playhead on some implementations go to negative values? (in the embedded world there might be other video-element/DOM/MSE custom impl)

Ok, that may seem a little bit far-fetched seen from here, where we rely on things complying with standard browser specs... But:

What really bugs me here is also that it is not explicit wether we check for zero or undefined, even if in this specific case the two specific checks collapse into one.

I would prefer if this check was explicit, in fact I think we really expect this to be undefined in some cases where isBuffered is false, right? So I would prefer it we would use something like Number.isFinite, also to be consistent with other places in the code where we get data-structs with conditionnally present fields.

Copy link
Collaborator Author

@robwalch robwalch Nov 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero or undefined?

Either. I don't think it's possible to have a nextStart of 0. If we're at 0, and there was a buffered range starting here, then bufferInfo.len would be 1 or more, and nextStart would only have a value if there was a second range further down. nextStart can also never be less than currentTime.

I would prefer if this check was explicit

I don't think it needs to be explicit based on the reasoning above. I prefer less code and consider the not operator to imply either.

I did change this slightly in the v1 branch so that nextStart is always a number and we don't have to worry about <> checks.

const nextStart = bufferInfo.nextStart || 0;

Copy link
Collaborator

@tchakabam tchakabam Nov 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your reasoning is same as mine and ...

nextStart can also never be less than currentTime.

... I explained in my precedent post precisely that it might be zero if your currentTime can go negative :)

But that is a case we don't need to think about now - it is not going to happen on standard browsers.

I don't think it needs to be explicit based on the reasoning above. I prefer less code and consider the not operator to imply either.

We have the same reasoning, but don't take the same action from it.

I consider that, since we do not expect zero (since you said yourself it should not happen at any point where we reach the check), we effectively really check wether this is a number or not (using for example Number.isFinite). And it should be explicit that we do that, and nothing else and that is why the check is done.

Any how, avoiding in the first place to have conditional fields and returned result objects like here with nextStart having the undefined value is fixing the issue at its root really, which I appreciate very very much (and we should in general make it a consistent guideline in Hls.js to avoid fields carrying undefined value at all).

Which in fact allow to take the elegant logical shortcut like you say for v1 (be able doing const nextStart = bufferInfo.nextStart || 0;).

So, let's all wait for v1, looking forward :)

Copy link
Collaborator

@tchakabam tchakabam Nov 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To sum up, after what you said on bufferInfo.nextStart in the v1 branch, I guess we can then dismiss this discussion and have this all converge to the state of how it is in v1 later on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchakabam can we mark this as resolved?

@robwalch
Copy link
Collaborator Author

One thing I noticed that goes against the constants at the top of gap-controller, it's still jumping the whole 10s gap

@itsjamie In that case the first segment was flagged as "partial" (It contains audio and video with no overlapping time ranges) so it must be skipped. The 2 second MAX_START_GAP_JUMP applies to jumping to the start of the next (first) buffered range. I added this constant to avoid jumping the first segment during an up-switch on start. This happens in one of our functional tests; The buffer for the first segment is flushed as the second higher quality segment is appended, then the first is rebuffered at a higher quality. I'm not a fan of that behavior, but it is what it is for now.

@robwalch
Copy link
Collaborator Author

@michaelcunningham19 Thanks for feedback. I'm hoping we can refactor this after feature/v1.0.0 is merged into master. Addressing the complexity should be one goal, but I think this module should not operate independently of the fragment tracker and stream controller. Awareness of what is loading and appending and where will help both stall reporting and gap jumping, and we might even strive to separate the logic for each of these features.

@tchakabam
Copy link
Collaborator

You should maybe pull in current master to get the CI flakieness fixed by #2445

@itsjamie
Copy link
Collaborator

itsjamie commented Nov 27, 2019

There seems to be an issue with the seek 5s from end functional test. Explicitly with the Oceans AES content.
Stream: https://playertest.longtailvideo.com/adaptive/oceans_aes/oceans_aes.m3u8

Depending on what has been loaded the duration element on the video is set to 118. This causes us to seek to 113, which is straight into a buffer hole that won't be filled because of what content falls in the last segment. This currently will stall and the test case fails. Depending on what has been loaded, this duration may be updated to the values the Media Source knows and we will seek not into the hole and the test will pass.

@robwalch robwalch force-pushed the bugfix/fix-and-improve-gap-controller branch from 826ab11 to 2cfbe31 Compare November 27, 2019 15:34
Copy link
Collaborator Author

@robwalch robwalch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JStackHouse I’ll look into that test issue. Are you able to reproduce that running the functional tests locally?

It looks like we also need to distinguish between the user seeking and the gap-controller seeking. If we’re seeking because we needed to nudge or jump a gap, at that point, these seek guards should not apply because additional nudges may be required, and if they fail then we should still report the stall.

@robwalch robwalch force-pushed the bugfix/fix-and-improve-gap-controller branch from 4a33848 to 189655b Compare December 3, 2019 17:30
@robwalch
Copy link
Collaborator Author

robwalch commented Dec 4, 2019

There seems to be an issue with the seek 5s from end functional test. Explicitly with the Oceans AES content.

@itsjamie 4fb2206 should address issues with seeking into buffer gaps.

The test run for that change failed on the DRM test stream. I don't have any problems running it locally. I changed the assertions so that it will print out the complete callback result when failing. If it's not a fluke hopefully that will give me a better idea of what's going on.

UPDATE 1
https://travis-ci.org/video-dev/hls.js/jobs/620379203?utm_medium=notification&utm_source=github_status
Yep! "keySystemNoKeys". Any ideas? The last two log statements suggest we got a license. Maybe a race condition in _attemptSetMediaKeys?

2019-12-03 23:49:18.171: "[log] > License request succeeded"
2019-12-03 23:49:18.172: "[log] > Received license data, updating key-session"

UPDATE 2
Added byteLength to the license data logs statement to help troubleshoot, but the build passed!

2019-12-05 00:22:22.773: "[log] > Received license data (length: 714), updating key-session"

If the "keySystemNoKeys" error comes up again we'll be able to see whether or not it has to do with the license data payload.

Copy link
Member

@michaelcunningham19 michaelcunningham19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!

0.13.0 automation moved this from Needs review to Reviewer approved Dec 5, 2019
@tchakabam
Copy link
Collaborator

If the "keySystemNoKeys" error comes up again we'll be able to see whether or not it has to do with the license data payload.

👍 Totally, it will help taking conclusions comparing the size of the license on a succeeding client

@robwalch robwalch merged commit ffe1569 into master Dec 5, 2019
0.13.0 automation moved this from Reviewer approved to Done Dec 5, 2019
@robwalch robwalch deleted the bugfix/fix-and-improve-gap-controller branch December 5, 2019 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
0.13.0
  
Done
5 participants