
Fix bandwidth sampling for small transfers and fast connections #3595

Closed · wants to merge 3 commits

Conversation

@kanongil (Contributor) commented Mar 9, 2021

This PR will...

Fix bandwidth sampling weight for small transfers and/or fast connections.

Why is this Pull Request needed?

The current logic is broken for fast transfers, whether due to small transfer sizes or a fast connection. See #3563 for details.

Are there any points in the code the reviewer needs to double check?

This changes the exact meaning of the abrEwma<X> options, which could be considered a breaking change.

I have also had to significantly lower the abrEwmaFastLive default to accommodate LL-HLS level downswitches. The old value meant that it took too long to discover a newly lowered bitrate, and could cause multiple successive emergency level switches and stalls. Note that the new value could make a temporary connection issue more likely to cause temporary downswitches. It might make sense to only use this low value for LL-HLS content, but that is outside the scope of this patch.
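For context, these options can already be tuned per player instance through the Hls.js config. A minimal sketch (the values shown are illustrative, not the defaults proposed by this PR):

```ts
import Hls from 'hls.js';

// Illustrative only: a lower fast half-life makes the estimator react more
// quickly to bandwidth drops on low-latency streams, at the cost of being
// more sensitive to short-lived dips.
const hls = new Hls({
  abrEwmaFastLive: 2, // half-life for the "fast" live estimate (example value)
  abrEwmaSlowLive: 9, // half-life for the "slow" live estimate (current default)
});
```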

Resolves issues:

Partly fixes #3563. With this PR, the ABR algorithm is more likely to switch up from a low level (still only when there is sufficient bandwidth). Low-latency content on high-latency links is still unable to measure a suitable bitrate.

Checklist

  • changes have been done against master branch, and PR does not conflict
  • new unit / functional tests have been added (whenever applicable)
  • API or design changes are documented in API.md

@robwalch (Collaborator) left a comment

Hi @kanongil,

Thanks for the change suggestion. I have to think about this one a bit more. I can't accept it based on how it would impact all streams. For now, a custom abrController is probably the best solution for your use-case until we can address the issue holistically.
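For reference, the ABR controller can be swapped out through the Hls.js config. A minimal sketch of that wiring (the `CustomAbrController` class is hypothetical and would need to implement the same interface as the built-in AbrController):

```ts
import Hls from 'hls.js';
// Hypothetical class with custom bandwidth sampling; it must expose the same
// interface as hls.js's built-in AbrController (e.g. the nextAutoLevel logic).
import { CustomAbrController } from './custom-abr-controller';

const hls = new Hls({
  abrController: CustomAbrController, // passed as a class, not an instance
});
```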

@kanongil (Contributor, Author) commented Mar 18, 2021

> I can't accept it based on how it would impact all streams.

That is the point of this PR: to fix the most egregious issue of #3563.

Did you see the detailed note in the commit, which gives an example of just how broken the current estimator is?

With 5 s segments of 100,000 bytes:
Start: 10x 5 s transfers => 160 Kbps (reference)

Rate change (old):
1x 0.5 s transfer (1.6 Mbps) => 216 Kbps
vs
1x 0.05 s transfer (16 Mbps) => 222 Kbps (10x faster is only 4% more!!)

Rate change (new):
1x 0.5 s transfer (1.6 Mbps) => 627 Kbps
vs
1x 0.05 s transfer (16 Mbps) => 5.3 Mbps

I.e. a sample with a 10x bandwidth increase can mean just a 1.35x increase in the estimate over a 5-second interval, while a 100x bandwidth increase only makes it 1.39x!

An estimation rework is essential to ever get LL-HLS to work with ABR switching. This PR fixes the fundamental issue, as well as the high-bandwidth estimation issues of the current implementation.

I really hope you will prioritise a fix before 1.0.0.

Note that the new default abrEwmaFastLive value is not essential to the fix, and could be omitted from the PR. Hls.js will need a mechanism to lower it for smooth near-edge LL-HLS ABR playback, though. Maybe the abrEwmaFastLive value could be capped at 1/2 the current time (in seconds) until buffer exhaustion?
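A minimal sketch of that capping idea (hypothetical; the inputs are assumptions, not existing hls.js APIs):

```ts
// Hypothetical illustration of the suggestion above: derive the fast
// half-life from the time left before the forward buffer runs dry.
// `bufferAheadSec` and `configuredFastHalfLife` are assumed inputs.
function fastHalfLifeFor(bufferAheadSec: number, configuredFastHalfLife: number): number {
  const timeToExhaustionSec = bufferAheadSec; // at 1x playback rate
  return Math.min(configuredFastHalfLife, timeToExhaustionSec / 2);
}
```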

The note from the referenced commit:

The estimator factored the transfer duration into the weight. This means that
very fast transfers will have negligible influence on the measured average.

The minDelayMs_ was used to cap this effect, but it also capped the bitrate.
This meant that a 1000 byte transfer, transferred in 10ms, would register at a
rate of 160Kbps instead of 800Kbps (on top of the low weight).

The fix is to use the duration of the transferred fragment / part for the weight.
This preserves the effect of samples that match the line rate, while
significantly increasing the effect of fast transfers (due to small size, or
from a fast connection).

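To make the weighting change concrete, here is a simplified sketch of the two sampling strategies, loosely modelled on the hls.js EWMA bandwidth estimator. It is an illustration of the idea only; the class and parameter names are simplified, and the exact figures quoted above also depend on the configured half-lives.

```ts
// Simplified EWMA: the estimate decays toward each new sample, with the decay
// rate controlled by a half-life expressed in the same units as the weight.
class Ewma {
  private estimate = 0;
  private totalWeight = 0;
  constructor(private halfLife: number) {}

  sample(weight: number, value: number): void {
    const alpha = Math.pow(Math.exp(Math.log(0.5) / this.halfLife), weight);
    this.estimate = value * (1 - alpha) + alpha * this.estimate;
    this.totalWeight += weight;
  }

  getEstimate(): number {
    // Correct for the bias toward the initial zero estimate.
    const alphaTotal = Math.pow(Math.exp(Math.log(0.5) / this.halfLife), this.totalWeight);
    return alphaTotal < 1 ? this.estimate / (1 - alphaTotal) : this.estimate;
  }
}

// Old scheme (sketch): the weight is the transfer duration, and the duration
// is also clamped to a minimum delay, which caps the measurable bitrate.
function sampleOld(ewma: Ewma, transferMs: number, bytes: number, minDelayMs = 50): void {
  const clampedMs = Math.max(transferMs, minDelayMs);
  const bps = (8 * bytes) / (clampedMs / 1000);
  ewma.sample(clampedMs / 1000, bps); // a 10 ms transfer weighs almost nothing
}

// New scheme (this PR's first approach, sketched): the weight is the media
// duration of the fragment/part, and the bitrate uses the real transfer time.
function sampleNew(ewma: Ewma, transferMs: number, mediaDurationSec: number, bytes: number): void {
  const bps = (8 * bytes) / (transferMs / 1000);
  ewma.sample(mediaDurationSec, bps); // fast transfers now carry full weight
}
```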
@kanongil (Contributor, Author) commented:

I had another look at the estimation, and found that it can be simplified and made to work better if the weight is always a fixed value for each sample.

So my initial patch tried to use the fragment / part duration for the weight, which makes some sense, and certainly works a lot better than the current logic. However, it meant that the halfLife needed to be quite different for LL-HLS vs normal content.

I came to realise that there are essentially 2 modes when estimating, part loading vs. fragment loading, and both need to adjust for bandwidth changes in sample time. I.e. when close to the live edge, both modes have roughly 2 parts'/fragments' worth of time to react to a bandwidth change.

Based on this realisation, I changed the abrEwmaFast/Slow values to just represent samples. This means that the same value will work quite nicely for both part and fragment loading, and the implementation can be simplified. I converted from the current fast=3 & slow=9 values using a normalised 6 second fragment duration. Besides improving the estimation responsiveness of part loading, I expect it will also work better for fragment loading on playlists with unusually high or low fragment durations, without tweaking abrEwmaFast/Slow.

As part of this revision I also removed the Live/VOD distinction of the config values. I did this since the default values were already the same, and because adjusting the values based on the playlist type is a very simplistic approach. It makes much more sense to adjust the values based on the current buffer level. Both live and VOD playback can have low and high buffer levels due to bandwidth conditions. The current values are tuned to a low buffer level, so it would be prudent to detect a high buffer level and raise the values dynamically, to avoid quality dips from a temporary bandwidth blip. This is probably outside of the scope of the current patch, though.
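Continuing the sketch from the commit note above, the revised idea amounts to giving every sample the same weight, so the half-life counts samples rather than seconds. The wiring and the conversion shown are illustrative assumptions, not the PR's exact code:

```ts
// Fixed per-sample weight: reuse the Ewma sketch above, passing weight = 1 for
// every sample, so a half-life of 1.5 means "the newest ~1.5 loads dominate",
// whether each load is a short LL-HLS part or a full 6 s fragment.
function sampleFixedWeight(ewma: Ewma, transferMs: number, bytes: number): void {
  const bps = (8 * bytes) / (transferMs / 1000);
  ewma.sample(1, bps);
}

// One plausible reading of the conversion mentioned above, normalising the old
// time-based half-lives to a 6 s fragment duration (assumption, not the PR's numbers):
const fast = new Ewma(3 / 6); // ≈ 0.5 samples
const slow = new Ewma(9 / 6); // ≈ 1.5 samples
```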

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the Stale label Apr 16, 2022
stale bot commented Apr 19, 2022

This issue has been automatically closed because it has not had recent activity. If this issue is still valid, please ping a maintainer and ask them to label it accordingly.

@stale stale bot closed this Apr 19, 2022
@robwalch robwalch reopened this Apr 19, 2022
@robwalch robwalch removed the Stale label Apr 19, 2022
@robwalch robwalch added this to In progress in Low-Latency HLS (LL-HLS) via automation Apr 19, 2022
@robwalch robwalch moved this from In progress to Awaiting Triage in Low-Latency HLS (LL-HLS) Jul 14, 2022
@robwalch (Collaborator) left a comment

Not a Contribution

What needs to happen is that we create separate estimates for round-trip time (RTT, or time to first byte, depending on browser JS API availability) and bandwidth. RTT will remain consistent between small and large segments from the same origin, and will inversely impact the estimate of time to load/append/play created by the ABR controller and used for variant switching. Splitting out these estimates based on available loader data and using them in our current stall/switch timing calculations should improve selection for streams with shorter (partial) segments.
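A sketch of how split estimates could feed the switch-timing math (a hypothetical illustration of the idea, not the implementation that later landed in #4825):

```ts
// Hypothetical illustration: predict the fetch time for a candidate level by
// combining a per-origin TTFB estimate with a throughput estimate, instead of
// folding round-trip time into a single blended bandwidth figure.
function estimatedFetchTimeSec(
  segmentBytes: number,
  ttfbEstimateSec: number, // roughly constant per origin, independent of segment size
  bandwidthEstimateBps: number // throughput measured after the first byte
): number {
  return ttfbEstimateSec + (8 * segmentBytes) / bandwidthEstimateBps;
}

// For short LL-HLS parts the TTFB term dominates, so a single blended
// "bandwidth" number would understate what the connection can sustain.
```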

robwalch added a commit that referenced this pull request Aug 4, 2022

Improve bandwidth estimation and adaptive switching with smaller segments and higher TTFB

Fixes #3578 (special thanks to @Oleksandr0xB for submitting #4283)
Fixes #3563 and Closes #3595 (special thanks to @kanongil)
@robwalch robwalch moved this from Awaiting Triage to In progress in Low-Latency HLS (LL-HLS) Aug 4, 2022
robwalch added a commit that referenced this pull request Aug 5, 2022 (same commit message as above)
@robwalch (Collaborator) commented Nov 4, 2022

This PR has been replaced by #4825

@robwalch robwalch closed this Nov 4, 2022
@robwalch robwalch removed this from In progress in Low-Latency HLS (LL-HLS) Nov 4, 2022
robwalch added a commit that referenced this pull request Nov 28, 2022 (same commit message as above)
silltho pushed a commit to silltho/hls.js that referenced this pull request Dec 2, 2022 (same commit message as above)
robwalch added a commit that referenced this pull request Jan 9, 2023 (same commit message as above)
robwalch added a commit that referenced this pull request Jan 12, 2023 (same commit message as above)
robwalch added a commit that referenced this pull request Jan 18, 2023
* Improve bandwidth estimation and adaptive switching with smaller segments and higher TTFB
Fixes #3578 (special thanks to @Oleksandr0xB for submitting #4283)
Fixes #3563 and Closes #3595 (special thanks to @kanongil)

* Load rate and loaded delay calculation fixes

* Convert ttfbEstimate to seconds

* Include main variant init segments in TTFB sampling

* Use ttfb estimate in abandon rules down-switch timing
robwalch added a commit that referenced this pull request Jan 26, 2023 (same commit message as above)
Successfully merging this pull request may close the linked issue: "Rate estimation is fundamentally broken" (#3563)