
Low-latency HLS Streaming #1

Open · wants to merge 22 commits into base: master
Conversation

@johnBartos (Contributor) commented Sep 28, 2018

Please leave all feedback in this PR. Nothing is set in stone; if you think something won't work, or there's a better way, please open a discussion so we can make it better. I'm still doing editing & cleanup but grammatical improvements are also very appreciated.

Also note that this PR isn't just for Hls.js - other clients are welcome and encouraged to participate. It'd be great to see more implementations in the wild.

Please see https://github.com/video-dev/hlsjs-rfcs/pull/1/files#diff-585ec6c4984f979b2e85a1ee0280b804R191 for the list of open questions. Everyone is welcome to help resolve them!

@shacharz commented Sep 28, 2018

Good job on the initiative!

Some initial notes:

  • The media server and the HTTP edge aren't necessarily the same machine; what happens when you have a prefetch segment but the edge isn't configured for chunked transfer?
  • Should the media container be part of the discussion?
  • Should playback rate and rebuffer handling be part of the spec?
  • When does the media server need to advertise the next prefetch segment? Specifically, race conditions (e.g., the edge doesn't have the file at all yet) should be considered to avoid 404s.
  • Spec terminology nit: "first" and "last" are a bit hard to follow; I prefer "newest" and "oldest".
@johnBartos (author) commented Oct 3, 2018

@shacharz Thanks for the feedback!

  1. I think this is out of scope for this proposal. I'm assuming that developers know how to deliver segments via HTTP transfer; this proposal creates a common language to signal them to a client.

  2. Personally I'm all for CMAF-only. But some developers want MPEG-TS and have provided compelling reasons; that discussion is ongoing. Whether it should be part of this spec is questionable, though. If a developer wants MPEG-TS and a client wants to support it, why should we stop them?

  3. I added this as an open question. For me the crux is whether it's valuable to force a common solution, or whether to let this behavior be up to the client. I'm personally leaning towards the client. A compromise may be another tag which specifies the maximum distance desired from the live edge.

  4. Good question. I'm open to merging language which guides developers on this. I'm not very skilled with delivery technology so it'd be great to have some help.

  5. I'll change the instances of first & last that I personally wrote, but in other places (like with MEDIA-SEQUENCE-NUMBER) I use the terminology of the spec.

@ScottKell commented Nov 13, 2018

Here are some additional comments. Nice work John! Overall, I don't see any trouble spots with implementing it.

  1. "MSE-based": In a couple places it sounds like the client must be MSE based. To me, LHLS is independent of MSE or an MSE-based client. It could be a native mobile client or a flash based player :)

  2. "A segment must remain available to clients for a period of time equal to the duration of the segment plus the duration of the longest Playlist file distributed by the server containing that segment."
    Not sure I understand the 2nd half of ^^. " duration of the longest Playlist file distributed by the server containing that segment."

  3. Prefetch Media Segments
    I think it should state that the prefetch segment may be advertised only after a single byte is available

  4. "A prefetch segment must not be advertised with an EXTINF tag. The duration of a prefetch segment must be equal to or less than what is specified by the EXT-X-TARGETDURATION tag."

I'm not sure how a media server can always guarantee that the duration of the prefetch segment is <= TARGETDURATION. The target duration in real-world live cases is typically the largest segment duration that has been seen up to that point. For example, if the media server is not transcoding and is only repackaging on GOP boundaries of the incoming encoded stream, the incoming stream may produce a longer prefetch segment than any it has previously seen. This may be an edge case, but it would be fairly common with Wowza's server.

  5. General server response
    Maybe worth calling out that the server must NOT respond with an error to any prefetch segment request. No 404s, etc.

  6. Unresolved questions
    6a. "Should the manifest only update after the prefetch frags have completed? Can prefetch frags be repeated if they are not yet completed?"
    I don't understand the 2nd question. What would a repeated prefetch tag be?

6b. I think rebuffering and playback rate are client specific. This spec does LHLS really well so IMO focus on that and leave these up to specific client implementations. Different use cases may call for different approaches (gradual catch up to live versus immediately jump to live in cases of drift, for example)

6c. I think CMAF is the logical companion of LHLS for many reasons, but no reason to restrict ts chunk delivery that I can see.

@johnBartos (author) commented Nov 20, 2018

Hey @ScottKell! Thanks for the in-depth review. I've been a bit tied up with the next Hls.js release, but I'll address your feedback shortly afterward.

@nicoweilelemental commented Nov 22, 2018

On point 3. Prefetch Media Segments:

For LHLS + chunked CMAF segments, if we want to align the logic with DASH (where the AvailabilityTimeOffset parameter handles this case), the client shall not be told to request a segment until the first CMAF chunk is available on the origin (as it is the smallest logical unit). If we reference a segment for prefetch one segment ahead, it will open multiple CDN connections to the origin (hopefully not too many if the CDN correctly collapses requests), and assuming that the origin won't reject those connection requests, the gain will just be the network connection opening time, which is negligible compared to the segment duration/load time.

For LHLS + TS segments I guess there are fewer constraints, so starting after a few uploaded bytes (equivalent to the TS headers?) might work fine.

The additional problem is that the origin can add some latency if it's buffering the data coming from the packager, so the AvailabilityTimeOffset defined at packager level won't be totally accurate - the packager will need to add the origin-generated latency to define the precise time when the segment can be advertised for prefetching in the playlist.

@johnBartos (author) commented Dec 10, 2018

@ScottKell

"MSE-based": In a couple places it sounds like the client must be MSE based. To me, LHLS is independent of MSE or an MSE-based client. It could be a native mobile client or a flash based player :)

Good point, I'll change language to reflect this.

"A segment must remain available to clients for a period of time equal to the duration of the segment plus the duration of the longest Playlist file distributed by the server containing that segment."
Not sure I understand the 2nd half of ^^. " duration of the longest Playlist file distributed by the server containing that segment."

This is Apple's language but I think I can do a better job of simplifying it.

Prefetch Media Segments
I think it should state that the prefetch segment may be advertised only after a single byte is available

Sounds reasonable, but I'm wondering if this needs to be a requirement. Is it fundamentally impossible (or ill-advised) to advertise before the first byte is available, or is this specific to Wowza?

"A prefetch segment must not be advertised with an EXTINF tag. The duration of a prefetch segment must be equal to or less than what is specified by the EXT-X-TARGETDURATION tag."

I'm not sure how a media server can always guarantee that duration of the prefetch segment is <= TARGETDURATION. The target duration in real world live cases is typically that largest segment that has been seen at that point. For example, if the media server is not transcoding and only repackaging on GOP boundaries of the incoming encoded stream, the incoming stream may end up with a longer prefetch segment based on GOP boundaries than it has previously seen. This may be an edge case, but it would be fairly common with Wowza's Server

General Server Resp
Maybe worth calling out that server must NOT respond with an error to any prefetch segments. No 404s, etc

Hmm, it seems like errors are allowed by RFC8216:

If the server wishes to remove an entire presentation, it SHOULD
provide a clear indication to clients that the Playlist file is no
longer available (e.g., with an HTTP 404 or 410 response).

The client should be able to handle bad status codes (and will have to), so I think it's better to imply that the client must be able to handle these cases.

Unresolved questions
6a. "Should the manifest only update after the prefetch frags have completed? Can prefetch frags be repeated if they are not yet completed?"
I don't understand 2nd question. What would a repeated prefetch tag be?

A prefetch tag is repeated if it's found in two manifests, e.g. it remains after refreshing. But now that I'm reading this again it doesn't really make sense. Will mark as resolved.

6b. I think rebuffering and playback rate are client specific. This spec does LHLS really well so IMO focus on that and leave these up to specific client implementations. Different use cases may call for different approaches (gradual catch up to live versus immediately jump to live in cases of drift, for example)

True, it's a bit of a can of worms to offer guidelines on this. I was thinking more from the perspective of an event publisher, who wants to guarantee that their viewers are as close to the live edge as possible, regardless of which client they're using. If no catch-up is required it's hard to guarantee without knowing how the client is implemented. But I agree, I think it should be left up to the client.

6c. I think CMAF is the logical companion of LHLS for many reasons, but no reason to restrict ts chunk delivery that I can see.

That's what I've been hearing as well. Will mark as resolved.

Again, my thanks to you and the Wowza team for the feedback! 👍

@johnBartos (author) commented Dec 11, 2018

@nicoweilelemental

For LHLS+Chunked CMAF segments, if we want to align the logic on DASH (where the AvailabilityTimeOffset parameter handles this case), the client shall not be told to request a segment until the first CMAF chunk is available on the origin

It'd be pretty difficult to make an AvailabilityTimeOffset analogue in HLS. PROGRAM-DATE-TIME could probably be used along with a PREFETCH tag, but I'm not sure how accurate this would be in practice. I believe that the HLS-thonic way is to have the server control availability via the manifest - if it's in the manifest the client is allowed to download it; otherwise it won't. The server can choose when it appends the segment, be it after the first byte or before. I think the only danger is if the client requests a refresh too soon and misses the update, and subsequently has to wait duration / 2 before re-requesting. I'll see if we can come up with anything to better synchronize playlist refresh between the server and client - I know of a method using ETags (or something similar) in the HTTP request, but I'm not sure if it's appropriate in a standard.

If we reference a segment for prefetch one segment ahead, it will open multiple CDN connections on the origin (hopefully not too much if the CDN correctly collapses requests)

We're limiting the max number of prefetch segments to 2 to deal with load. We wanted to do just one, but other encoders have setups where two segments can be transcoding at once (the example I was given was that the next segment begins while the B-frames of the last segment are being completed).

For LHLS+TS segments I guess that there is less constraints, so starting after a few uploaded bytes (equivalent to the TS headers?) might work fine.

I think this is generally a good practice, but it's up to the server. Putting stuff in the spec has its own danger - trying to remove/deprecate features becomes more difficult because devs may be relying on them.

The additional problem is that the origin can add some latency if it's buffering the data coming from the packager, so the AvailabilityTimeOffset defined at packager level won't be totally accurate - the packager will need to add the origin-generated latency to define the precise time when the segment can be advertised for prefetching in the playlist.

Yeah this will be a problem with manifest refreshing too - the actual refresh time should be the duration of a segment + whatever latency is incurred on the encoding/delivery side. As alluded to before, I believe this can be accomplished with the etag:

On the server:
response.etag = Tsegment + Tencoding

On the client:
refreshOffset = Tencoding = response.etag - Tsegment

Where refreshOffset is added to the manifest refresh time (usually equal the duration of the playlist).

Just some napkin math, but I believe that's the general idea. This is Will Law's idea; I'm going to follow up with him to see if this is correct. I don't yet know how necessary this will be for the success of LHLS, so it's not in the spec yet; it may be a "wait and see" thing.
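
To make the napkin math concrete, here is a minimal client-side sketch of the idea. It assumes a hypothetical server that writes Tsegment + Tencoding (in seconds) into the ETag header; none of these names come from the spec.

```ts
// Sketch only: assumes the server sets the ETag to (Tsegment + Tencoding), in seconds.
// `segmentDuration` is the playlist's segment duration; all names here are illustrative.
async function estimateRefreshOffset(playlistUrl: string, segmentDuration: number): Promise<number> {
  const response = await fetch(playlistUrl);
  // ETag values are quoted strings, so strip the quotes before parsing.
  const raw = (response.headers.get("ETag") ?? "").replace(/"/g, "");
  const etagSeconds = parseFloat(raw);
  if (Number.isNaN(etagSeconds)) {
    return 0; // no hint from the server; fall back to the normal refresh interval
  }
  // Client side of the napkin math above: Tencoding = etag - Tsegment.
  return etagSeconds - segmentDuration;
}
```

The returned offset would then be added to the usual refresh interval (roughly one playlist duration) before scheduling the next playlist request.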

Thanks for the feedback!

@TBoshoven left a comment

I left a few suggestions, mostly to fix some of the language.

Some more comments:

  1. I would also like to see documentation about which use cases are out-of-scope (ULL / real-time communication).
  2. The guide-level explanation specifically mentions Hls.js a number of times, even though the section describes a pretty generic client implementation (MSE and Fetch API excluded). I think it might be valuable to focus less on the Hls.js implementation in that section.
  3. In general, I think the spec should not depend on the client-side implementation being written in JavaScript or using the MSE/Fetch APIs. These are used in the concrete Hls.js implementation, but implementation of a client that does not have access to these APIs (for example in an FFmpeg-based client solution) should be possible using this RFC. However, the Hls.js implementation can be used to illustrate how a client could be implemented.

## Media Segment Tags

The server must not precede any prefetch segment with metadata other than those specified in this document, with the specified constraints.

@TBoshoven commented Dec 13, 2018

I think this constraint breaks extensibility. Since you explicitly list the #ext-x- tags that are not allowed in this section, this statement can be considered redundant.


* Transform a prefetch segment to a complete segment. ([Prefetch Transformation](#prefetch-transformation))

To each prefetch segment response, the server must append the `Transfer-Encoding: chunked` header. The server must maintain the persistent HTTP connection long enough for a client to receive the entire segment - this must be no less than the time from when the segment was first advertised to the time it takes to complete.

@TBoshoven commented Dec 13, 2018

This limits communication to HTTP/1.1, because HTTP/2 does not support this mechanism (as described in RFC7540, Section 8.1).
Furthermore, Chunked Transfer Encoding requires that we follow a specific protocol (described in RFC7230, Section 4.1) which is not referred to here.

I recommend making a distinction between HTTP versions and referring to the specs that describe streaming (chunked) data transfer.

@johnBartos (author) commented Jan 25, 2019

👍 Good idea, I've gotten similar feedback
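
For illustration, here is a minimal HTTP/1.1 origin sketch of the chunked-delivery requirement quoted above, using Node's built-in http module. The `chunksForSegment` generator is a made-up stand-in for the packager output, not anything defined by the proposal.

```ts
import { createServer } from "node:http";
import { setTimeout as delay } from "node:timers/promises";

// Stand-in for the packager: in a real origin these chunks would arrive from the
// encoder as the segment is produced. Purely illustrative.
async function* chunksForSegment(_uri: string): AsyncGenerator<Buffer> {
  for (let i = 0; i < 5; i++) {
    await delay(1000);               // pretend a new CMAF chunk lands every second
    yield Buffer.from(`chunk-${i}`); // stand-in for real media bytes
  }
}

const server = createServer(async (req, res) => {
  // HTTP/1.1 only: with no Content-Length set, Node switches to chunked transfer
  // encoding, so bytes can be flushed before the segment is complete.
  res.writeHead(200, { "Content-Type": "video/mp4" });

  for await (const chunk of chunksForSegment(req.url ?? "")) {
    res.write(chunk); // flush each chunk as soon as the packager produces it
  }
  res.end(); // the segment is now complete and can be advertised with an EXTINF
});

server.listen(8080);
```

Over HTTP/2 the same idea would be expressed as an open stream rather than chunked transfer encoding, which is why the distinction between HTTP versions matters here.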

## What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

- Alternative connection protocols (WebRTC, Websockets, etc.)
- Manifestless mode

@TBoshoven commented Dec 13, 2018

It might be worth linking to a resource that describes this concept.

@johnBartos (author) commented Jan 25, 2019

Yeah we can link out to some DASH docs for now, there isn't anything specified for HLS atm

#EXT-X-PREFETCH:https://foo.com/bar/7.ts

`5.ts`, `6.ts`, and `7.ts` all have a Discontinuity Sequence Number of 1. Note how the `PREFETCH-DISCONTINUITY` was transformed into the conventional `EXT-X-DISCONTINUITY` tag, and how that tag still applies to prefetch segments.

@TBoshoven commented Dec 13, 2018

A new #EXT-X-PROGRAM-DATE-TIME tag was also introduced. I cannot find this behavior described in earlier sections.

@johnBartos (author) commented Jan 25, 2019

It was in an old draft, I'll add it back

@jkarthic-akamai commented Dec 19, 2018

We're limiting the max amount of prefetch segments to 2 to deal with load. We wanted to do just one, but other encoders have setups where two segments can be transcoding at once (the example I was given was that the next segment begins while the b-frames of the last segment are being completed).

@johnBartos Thanks for this work! Please find my comment below.

I still don't understand why we would need 2 prefetch segments. Yes, I agree that the encoder might still be working on the B-frames of the last segment while it works on the I-frame and/or P/B-frames of the current segment. This is expected behavior when the encoder is multi-threaded based on frame parallelism. But I would still expect the encoder to output and upload the frames in monotonically increasing order of DTS. In that case, the upload of the current segment would start only after the upload of the previous segment is complete. One practical example is the x264 encoder + ffmpeg: x264 is frame-based multi-threaded and hence can work on frames across segments at the same time, but it still always outputs frames in order of monotonically increasing DTS. Hence I would suggest limiting the number of prefetch segments to 1.

Please feel free to correct me if my understanding is incomplete.

@nicoweilelemental commented Dec 20, 2018

@johnBartos I concur with @jkarthic-akamai: while it makes sense to prefetch the N+1 segment as soon as the first bytes or CMAF chunk are available on the origin, prefetching the N+2 segment would require supporting long polling over a duration longer than the segment duration. It's just going to open sockets with no data to transmit and generate timeouts and false-positive errors in the logs (like the wallclock misalignment problems in DASH).

Honestly I don't see CDNs or origins supporting the N+2 prefetch anytime soon; the benefit is absolutely not proven, and very long polling comes with a lot of security risks for the CDNs/origins. It's just not realistic to require N+2 prefetch support across the chain. I also suggest limiting the number of prefetch segments to 1.

@johnBartos (author) commented Jan 2, 2019

@jkarthic-akamai @nicoweilelemental I agree; thanks for the breakdown. I'll amend the spec for N+1 only

@nicoweilelemental commented Jan 3, 2019

Thanks @johnBartos - looking forward to the hls.js implementation!

# Summary
[summary]: #summary

Low-latency streaming is becoming an increasingly desired feature for live events, and is typically defined as a delay of two seconds or less from point of capture to playback (glass-to-glass). However, the current HLS specification precludes this possibility - within the HLS guidelines, the best attempts have achieved about four seconds glass-to-glass, with average implementations typically beyond thirty seconds. This RFC proposes modifications to the HLS specification ("HTTP Live Streaming 2nd Edition", IETF RFC 8216, draft 03) which aim to reduce the glass-to-glass latency of a live HLS stream to two seconds or below. The scope of these changes is centered on a new "prefetch" segment: its advertising, delivery, and interpretation within the client.

@heff (member) commented Jan 10, 2019

This is kind of a moot point because I think 2 seconds is a good goal for this project, but 2 seconds is probably the most aggressive definition of "low latency" I've seen. Wowza pegs it at 1-5s and @wilaw has it at 4-10s, with 2 seconds being closer to "Ultra low latency". Not sure what I expect you to do with that info but thought it was worth pointing out in case there's opportunity for industry consistency.

@nicoweilelemental commented Jan 11, 2019

Agreed. I've been working with Will on the new latency range definitions that you have seen in his Demuxed presentation. We defined low latency as what you can achieve with 1s and 2s segments of regular HLS/DASH (meaning 4 to 10 seconds of latency) and ultra low latency as what you can achieve with chunked CMAF (meaning between 1 and 4 seconds). We used this technology-based criterion because the previous latency range definitions by Wowza & Streaming Media mixed technology criteria with use-case requirements.
(attached image: latency_ranges-v3, the latency ranges chart)

So we maybe should say here: 'Low-latency and Ultra low-latency streaming are becoming increasingly desired features for live events, and are typically defined as a delay of 4 to 10 seconds for low latency and 1 to 4 seconds for ultra low latency, from point of capture to playback (glass-to-glass)'

@johnBartos (author) commented Jan 17, 2019

Yeah agreed. Thanks for keeping me up to date with Will's work - I want to be aligned with whatever he's doing where possible. Has he been factoring the Streamline project into his definitions? The LHLS fork of Exoplayer is getting 1.1s latency which is pretty crazy.

'Low-latency and Ultra low-latency streaming are becoming increasingly desired features for live events, and are typically defined as a delay of 4 to 10 seconds for low latency and 1 to 4 seconds for ultra low latency, from point of capture to playback (glass-to-glass)'

Sounds good to me!

As of writing this, Hls.js by default will have between 1 and 2 segments' duration of client-side latency. The plan is to start the stream at the last complete segment and begin buffering prefetch from there. I'm not sure what server-side latency looks like, but it should put us at the lower end of the "Low latency" definition.

@johnBartos (author) commented Jan 17, 2019

We defined low latency by what you can achieve with 1s and 2s segments of regular HLS/DASH (meaning : 4 to 10 seconds latency) and ultra low latency as what you can achieve with chunked CMAF (meaning : between 1 and 4 seconds).

So by this definition we're building an ultra low latency player.

@nicoweilelemental commented Jan 17, 2019

By default yes, but the player configuration shall allow a higher target latency. Which leads me to another related consideration: in DASH we are heading towards setting the target latency at the manifest level, and not at the player configuration level. Would it be interesting for LHLS to discuss a similar approach, like #EXT-X-TARGETLATENCY: 2500 (if we measure in milliseconds)?

@jkarthic-akamai commented Jan 21, 2019

@nicoweilelemental That's an interesting idea!!! There are a lot of advantages to being able to configure the playout latency in the manifest itself. If we go ahead with this, I propose a small modification though. Instead of setting a target latency, I suggest we set a target buffer size. Since the encoder's latency is not in the player's control, a latency-based definition could be a little misleading. Instead we could ask the player to maintain a specific target buffer size, with the condition that it continuously loads data to stay as close to the live edge as possible. Something like #EXT-X-TARGETBUFFERSIZE: 2500.

@nicoweilelemental commented Jan 21, 2019

You're right @jkarthic-akamai, it's hard for the player to determine the actual E2E latency. The only way would be to make at least one #EXT-X-PROGRAM-DATE-TIME insertion mandatory per child playlist. In DASH we recommend putting the Producer Reference Time in the prft mp4 box, which the player can parse to get the actual timecode of a segment; it could replace #EXT-X-PROGRAM-DATE-TIME for latency measurement.
Setting #EXT-X-TARGETBUFFERSIZE instead would roughly serve the same purpose while relaxing the dependency on absolute time.
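
To illustrate how a client might act on such a tag, here is a rough sketch of playback-rate-based buffer targeting. #EXT-X-TARGETBUFFERSIZE is only a proposal from this thread, the value is assumed to be in milliseconds, and the gain and clamp numbers are arbitrary.

```ts
// Sketch only: #EXT-X-TARGETBUFFERSIZE is a hypothetical tag proposed in this thread.
function adjustPlaybackRate(video: HTMLVideoElement, targetBufferMs: number): void {
  const buffered = video.buffered;
  if (buffered.length === 0) return;

  // Forward buffer: end of the last buffered range minus the current position.
  const forwardBufferMs = (buffered.end(buffered.length - 1) - video.currentTime) * 1000;
  const drift = forwardBufferMs - targetBufferMs;

  // Speed up slightly when the buffer exceeds the target (we have drifted behind the
  // live edge), slow down when it falls short; clamp to keep the change inaudible.
  const adjustment = Math.max(-0.05, Math.min(0.05, (drift / targetBufferMs) * 0.1));
  video.playbackRate = 1 + adjustment;
}

// e.g. setInterval(() => adjustPlaybackRate(video, 2500), 1000);
```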


The client may opt into an LHLS stream. If so, the client must choose a prefetch Media Segment to play first from the Media Playlist when playback starts. The client must choose prefetch Media Segments for playback in the order in which they appear in the Playlist; however, the client may open connections to as many prefetch segments as desired. If data from a newer prefetch Media Segment is received before an older one, the client should not append this data to the SourceBuffer; doing so may stall playback. If the client opts out of LHLS, it must ignore all prefetch Media Segments, and any additional constraints outlined in this specification.

The client may set a minimum amount of buffer to begin and maintain playback. The client should not impose a minimum buffered amount greater than one target duration; doing so may introduce undue latency.
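
As a rough illustration of the ordering rule above (not taken from any particular client), the sketch below opens connections to several prefetch segments at once but drains their bodies strictly in playlist order; `appendToSourceBuffer` stands in for an MSE append queue.

```ts
// Sketch only: `appendToSourceBuffer` is assumed to serialize appends onto a SourceBuffer.
async function loadPrefetchSegments(
  urls: string[],                                  // prefetch URIs in playlist order
  appendToSourceBuffer: (data: Uint8Array) => void
): Promise<void> {
  // Open all connections up front so chunked data starts flowing immediately...
  const responses = urls.map((url) => fetch(url));

  // ...but read the bodies strictly in playlist order. Data from a newer segment that
  // arrives early simply waits in its stream until the older segment has finished.
  for (const pending of responses) {
    const response = await pending;
    const reader = response.body!.getReader();
    while (true) {
      const { value, done } = await reader.read();
      if (done || !value) break;
      appendToSourceBuffer(value);
    }
  }
}
```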

@heff (member) commented Jan 17, 2019

This feels like somewhere we might want to provide a little more guidance, smart defaults, or open the door to configure the target min buffer somehow. With ultra low latency there will be a fine balance between lower latency and more rebuffering, that will likely be audience dependent. A lot of players don't give you any buffer config options today.

@johnBartos (author) commented Jan 17, 2019

Yeah I need to rework this section. It was trying to be an analogue of this section in the HLS spec but it didn't really come out right.

This feels like somewhere we might want to provide a little more guidance, smart defaults, or open the door to configure the target min buffer somehow.

We're trying to be as hands-off as possible with recommendations - clients should be able to build whatever experience best suits their use case. Maybe we can strike a balance with language (should instead of must), but it may be more productive to put any kind of guidance in some ancillary doc where we don't have to worry about spec compliance. Hls.js will be a reference implementation, too.

@johnBartos (author) commented Jan 17, 2019

clients should be able to build whatever experience best suits their use case.

Just to clarify this: they should build whatever low-latency use case best suits them. The original intent behind this section was to ensure that they were operating like a low-latency player (and not just ignoring prefetch segments or keeping, say, 30s of buffer), but I don't think it came out right. I'll take another stab at it.

@heff (member) commented Jan 21, 2019

@nicoweilelemental's other comment about providing the target latency in the manifest would actually solve my concerns here. I like that idea a lot. Assuming a player respects that tag, it wouldn't have to expose much else.

@ScottKell commented Jan 21, 2019

+1 to target latency in the manifest. There definitely needs to be a knob to turn for different use cases as they walk the line between latency and rebuffering

@johnBartos (author) commented Jan 23, 2019

I agree with Will and his summary. The client will play the LHLS stream as best as it can given the current conditions and configuration. The problem with a target is that it's not actionable - if the client is behind because the network is slow, there's nothing it can do to get ahead. I was checking to see if DASH had something similar but couldn't find anything (but I didn't have the chance to look very hard).

But even then, manifest updates and network delays can cause us to stall if we try to play at the live edge. So we will actually delay where we will play at so we don't stall. The time we adjust by can be specified in the manifest using the MPD@suggestedPresentationDelay attribute. This specifies the delay to give the live stream to allow for smooth playback. If it isn't specified, we will give a reasonable default value.

suggestedPresentationDelay looks interesting (it seems to specify a minimum latency), but I'm not sure how useful it is for LHLS. The encoder must update the manifest with new segments on an interval equal to the average length of a segment.

I think the manifest should add information to allow the latency to be estimated and I would support the spec saying that the package MUST add #EXT-X-PROGRAM-DATE-TIME to the media playlists.

I had #EXT-X-PROGRAM-DATE-TIME in an original draft just for this purpose, but deleted it - I couldn't come up with an acceptable definition that all encoders could follow. Is the timestamp the time when the segment begins transcoding or when it has finished, or is it something like when the manifest was created in memory (or between whatever stages of a transcoding pipeline)? Maybe it doesn't matter too much. Input appreciated here; I'd like to add it back.

@wilaw commented Jan 23, 2019

There are indeed many potential reference points on which to hang the definition of #EXT-X-PROGRAM-DATE-TIME. MPEG has defined 6, if I remember correctly, for the equivalent in DASH. There is no need for that complexity here. All you need is a COMMON reference point that all clients can access. They can then achieve synchronization (per @jkarthic-akamai's comment above) by targeting a fixed delta from that point. A practical point would be the wallclock time at which that frame of media entered the encoder. Assuming very small camera delay, the delta between that value and the wallclock time when the media frame is displayed by the client then represents the end-to-end latency. That's for lab confirmation. In the real production world, there is an unknown production delay upstream of the encoder (camera, OB truck, satellite contribution, broadcast profanity delay, etc.), so it's very difficult for the end client to calculate the true e2e latency. Luckily, we don't need to know the true e2e latency even for sync; we just need a consistent reference point between clients.
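
A small sketch of that latency estimate, assuming EXT-X-PROGRAM-DATE-TIME is anchored to the wallclock time the frame entered the encoder and that the client clock is roughly in sync (e.g. via NTP); the function and parameter names are illustrative.

```ts
// Sketch only: estimates the delta Will describes, not a spec-defined calculation.
function estimateLatencySeconds(
  programDateTime: Date, // PROGRAM-DATE-TIME of the segment currently playing
  segmentStart: number,  // media-timeline start of that segment, in seconds
  currentTime: number    // video.currentTime, in seconds
): number {
  // Wallclock time of the frame currently on screen.
  const frameWallclockMs = programDateTime.getTime() + (currentTime - segmentStart) * 1000;
  return (Date.now() - frameWallclockMs) / 1000;
}
```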

@heff (member) commented Jan 23, 2019

I can get behind the philosophy of the manifest just reporting the availability. But defining the target latency somewhere is critical in these use cases, so it brings me back to the original comment of needing more in the spec to get clients to expose configuration. i.e. if iOS Safari implements LHLS with a set latency target and no option to configure it, that's not good.

A practical point would be the wallclock time at which that frame of media entered the encoder.

I'm not sure about the other options, but that seems the most sensible. However, with a UGC platform, streamers can use anything that streams RTMP (e.g. OBS) to the central service, and in that case I don't believe the central service has access to when the media frame entered the original encoder. For this use case I think I'd be fine with just using the time the central service received the media frame in the stream. It ignores the time prior to that, but assuming I can configure all my players' target latencies, I can adjust for that.

@jkarthic-akamai commented Jan 24, 2019

I had #EXT-X-PROGRAM-DATE-TIME this in an original draft just for this purpose, but deleted it - i couldn't come up with an acceptable definition that all encoders could follow. Is the timestamp the time when the segment begins transcoding, has finished; or is it something like when the manifest was created in-memory (or between whatever stages in a transcoding pipeline). Maybe it doesn't matter too much. Input appreciated here, I'd like to add it back.

I agree that there is no need for us to define #EXT-X-PROGRAM-DATE-TIME strictly. We could just stick with the original definition from the official HLS spec: https://tools.ietf.org/html/draft-pantos-http-live-streaming-23#page-17. We just suggest making it a mandatory (MUST) parameter, so that all clients have a common reference point to sync against.

@biglittlebigben commented Jan 28, 2019

FWIW, we are relying on #EXT-X-PROGRAM-DATE-TIME to decide which segment to start playback at in order to reach our target latency. This approach has been successful for us.

heff and others added some commits Jan 17, 2019: Update proposals/0001-lhls.md (×7), Co-Authored-By: johnBartos <jbartos7@gmail.com>

TBoshoven and others added some commits Jan 25, 2019: Update proposals/0001-lhls.md (×10), Co-Authored-By: johnBartos <jbartos7@gmail.com>
@biglittlebigben commented Jan 26, 2019

@johnBartos I concur with @jkarthic-akamai : while it make sense to prefetch the N+1 segment as soon as the first bytes or CMAF chunk are available on the origin, prefetching the N+2 segment will require to support long polling over a duration superior to the duration segment. It's just gonna open sockets with no data to transmit, generate timeouts and false positive errors in the logs (like in dash with the wallclock misalignment problems).

Honestly I don't see CDNs or origins supporting the N+2 prefetch anytime, the benefit of it is absolutely not proven and very long polling comes with a lot of security risks for the CDNs/origins. It's just not realistic to require N+2 prefetch support across the chain. I also suggest to limit the number of prefetch segments to 1.

Some feedback coming from our experience at Periscope and Twitter, after managing a large-scale LHLS deployment for more than 2 years: having 2 prefetch segments is an important feature for us to constrain the total end-to-end latency. It allows the client to start receiving data for the next segment immediately after the current prefetch segment ends. Without this, the client would need to eagerly request a new playlist immediately after the server closes the current prefetch segment. This means that:

  • The client will need to keep a longer buffer to avoid a stall while it downloads the new playlist and requests the new prefetch segment.
  • This will cause a large request spike for the playlist asset at the time the server closes the prefetch segment, because all clients will request the new playlist at the same time.

We have had no issue with our CDN vendor supporting long-polling HTTP requests for these extra prefetch segments.
@johnBartos (author) commented Jan 26, 2019

@biglittlebigben

That was the conclusion @TBoshoven and I came to - I hadn't considered the impact of refresh misses when we were having the above discussion. I'm going to amend the spec to handle multiple prefetch segments. If your system can support a dozen prefetch segments, or just one, the spec shouldn't stop you; if you want to support only one, that's fine too. But I believe that two will be the natural default. We're trying to avoid being prescriptive wherever possible.

Thanks for sharing your experience! If there's anything else you see that could be improved I'd be happy to hear it.

(Edit: removed the requirement for at least two. There may be clients/servers who negotiate some other refresh scheme which allows for accurate refreshes with just 1.)

@jkarthic-akamai commented Jan 28, 2019

@biglittlebigben

That was the conclusion @TBoshoven and I came to - I hadn't considered the impact of refresh misses when we were having the above discussion. I'm going to amend the segment to be able to handle multiple segments. If your system can support a dozen prefetch segments, or just one, the spec shouldn't stop you; if you want to support one, that's fine too. But I believe that two will be the natural default. We're trying avoid being prescriptive wherever possible.

Thanks for sharing your experience! If there's anything else you see that could be improved I'd be happy to hear it.

(Edit: removed the requirement for at least two. There may be clients/servers who negotiate some other refresh scheme which allows for accurate refreshes with just 1.)

@biglittlebigben makes a very good point. I faced the exact same issue he mentioned when I tried to add LHLS support to Exoplayer with just 1 prefetch segment. Definitely, 2 prefetch segments in the playlist would be useful to reduce latency significantly. But for practical systems we could add a condition that the client should request the 2nd PREFETCH segment only after the 1st PREFETCH segment has loaded completely. That way we could achieve low latency without imposing strict conditions on CDNs and origins around supporting long polling.

Also I don't understand the reason behind more than 2 prefetch segments. Can we limit the maximum number of prefetch segments to 2? Is there a practical use-case for publishing 3 or more prefetch segments?

@biglittlebigben commented Jan 28, 2019

@biglittlebigben
That was the conclusion @TBoshoven and I came to - I hadn't considered the impact of refresh misses when we were having the above discussion. I'm going to amend the segment to be able to handle multiple segments. If your system can support a dozen prefetch segments, or just one, the spec shouldn't stop you; if you want to support one, that's fine too. But I believe that two will be the natural default. We're trying avoid being prescriptive wherever possible.
Thanks for sharing your experience! If there's anything else you see that could be improved I'd be happy to hear it.
(Edit: removed the requirement for at least two. There may be clients/servers who negotiate some other refresh scheme which allows for accurate refreshes with just 1.)

@biglittlebigben makes a very good point. I faced the exact same issue he mentioned when I tried to add LHLS support to Exoplayer with just 1 prefetch segment. Definitely 2 prefetch segments in the playlist would be useful to reduce the latency significantly. But for the purpose of practical systems we could add a condition that the client should request the 2nd PREFETCH segment, only after the 1st PREFETCH segment is loaded completely. In that way we could achieve low latency without imposing the strict conditions around CDNs and origins supporting long polling.

This requirement of waiting would delay arrival of data for the 2nd prefetch segment by a network round-trip time after the client is done downloading the 1st one. Any such added jitter will require the client to keep a longer buffer (by the round-trip duration here). Maybe that's an acceptable trade-off if there is a consensus that supporting long requests is a challenge for CDNs, but again, we have had no issue with our vendor.

Also I don't understand the reason behind more than 2 prefetch segments. Can we limit the maximum number of prefetch segments to 2? Is there a practical use-case for publishing 3 or more prefetch segments?

I would say that there is a trade-off between the number of prefetch segments and how often the client needs to refresh the playlist. This is particularly true for segments before a discontinuity, which can be shorter than the target duration.

[discontinuities]: #ext-x-discontinuity

A prefetch segment must not be advertised with an `EXT-X-DISCONTINUITY` tag. To insert a discontinuity just for prefetch segments, the server must insert the `EXT-X-PREFETCH-DISCONTINUITY` tag before the newest `EXT-X-PREFETCH` tag of the new discontinuous range.

@biglittlebigben commented Jan 28, 2019

In our use case, discontinuities are caused by a gap in timestamps in the stream sent by the broadcaster (because of a networking issue or a camera flip for instance). This means that we do not know if there is a discontinuity at the time we advertise the prefetch segment. We only know of the discontinuity at the point when we write the first data into the prefetch segment.

This means that we would add a EXT-X-PREFETCH-DISCONTINUITY tag to a prefetch segment after it has first been advertised, and that a player may not see the EXT-X-PREFETCH-DISCONTINUITY tag by the time it starts playing back media from that prefetch segment.

This behavior doesn't seem to be explicitly forbidden by the spec, but it does require the player to have some heuristic to detect discontinuities that were not advertised in time.

@johnBartos (author) commented Feb 1, 2019

This means that we would add a EXT-X-PREFETCH-DISCONTINUITY tag to a prefetch segment after it has first been advertised, and that a player may not see the EXT-X-PREFETCH-DISCONTINUITY tag by the time it starts playing back media from that prefetch segment.

I can foresee this being a problem if there is a large gap between the PTS values. Hls.js feeds a discontinuity tag into the muxer so that it can modify PTS values post-discontinuity to ensure that there is no gap. Without the tag we'd be inserting a gap into the SourceBuffer equal to the PTS gap in seconds. Depending on how much forward buffer exists we can jump it without a stall, but we'd still need to do a seek in userland, which would cause a momentary disruption.

@biglittlebigben commented Feb 1, 2019

How would you imagine the interaction between EXT-X-GAP and prefetch segments should work? Should it be allowed to add EXT-X-GAP to an already advertised prefetch segment? That would be another way to address the use case above I believe.

@johnBartos (author) commented Feb 1, 2019

This behavior doesn't seem to be explicitly forbidden by the spec

I'll add language which addresses this case. It doesn't make sense to add it as a prefetch discontinuity after the fact, but it must appear when the segment is transformed into a complete segment. Players should be able to handle gaps, so I don't believe this is a problem.

Maybe the prefetch discontinuity isn't even needed anymore - I need to think a bit harder about SSAI workflows. You can probably just stop adding prefetch segments and then insert a regular discontinuity before the ads.

@johnBartos (author) commented Feb 2, 2019

How would you imagine the interaction between EXT-X-GAP and prefetch segments should work? Should it be allowed to add EXT-X-GAP to an already advertised prefetch segment? That would be another way to address the use case above I believe.

I suppose a gap would a) cause the client to ignore the segment or b) abort the connection if it has already been requested, but data has not yet been received. Not sure if I understand how to fix gaps with this - would you mind giving me an example?

In general, the blocker for post-advertisement insertion of prefetch tags is that clients may not refresh the playlist in time to know that the segment has a new tag. I can see adding tags post-advertisement in the case that there are multiple prefetches and the tag is on N+1 or greater - the client should refresh the manifest before the encoder begins sending segments. But I think there will be some tricky race conditions with this kind of solution.

@biglittlebigben commented Feb 27, 2019

We currently detect the gap in timestamps in the backend. If it is over a threshold, we start a new segment and insert a discontinuity. This does mean that players watching over LHLS will not hear about the discontinuity in time from the playlist. Our player has heuristics to detect such gaps.

@itsjamie commented Mar 1, 2019

So, I don't believe this changes that workflow much.

Up until the point the packager detects the gap in the backend, the prefetch segment would continue to push data normally, like any prefetch. Upon detecting the gap, the packager would complete the segment and close the connection.

It would then update the manifest locally with that completed segment, advertising its duration with an EXTINF, and push that manifest to the origin with the discontinuity applied to the next prefetch.

Thoughts?

@biglittlebigben commented Mar 1, 2019

That's what we would do indeed. Is this allowed by the spec though? (inserting a discontinuity for a segment that didn't have a prefetch discontinuity?)

@johnBartos (author) commented Mar 1, 2019

@biglittlebigben Yes, and I believe it should also be mandatory

@biglittlebigben commented Mar 1, 2019

That would work for our use case then. Any feedback from player developers about having to detect and handle timestamp discontinuities?

@johnBartos (author) commented Apr 8, 2019

Hi all,

We've been working hard on a demo in order to prove out some of the ideas we've introduced here. You can find it here: http://demo.jwplayer.com/lhls/. A beta build of Hls.js will be made available in the near future, which you can play around with yourself. In this demo, you can use your own file with the ?file= query parameter.

The results have been very positive so far. We're able to reach ~1s of latency under ideal conditions, but 4s is typically a comfortable amount. Our tests have currently been done with LL-CMAF.

The demo has surfaced some new requirements, which are critical to a functional LHLS implementation:

  1. The ability for the client to compute its latency to the encoder
  2. The ability for the client to synchronize manifest refresh times with the encoder

Requirement 1 requires #EXT-X-PROGRAM-DATE-TIME to be present in the manifest, so this will be a MUST instead of a SHOULD. I'm still thinking of how exactly to define it; for example, our current stream does not put it on the prefetch segment, but it may be better to do that.

Requirement 2 is a bit trickier. Synchronization of updates with the server is critically important if you're using one prefetch tag, since a refresh miss can mean the client runs through whatever small forward buffer it has. Our current implementation estimates the next manifest refresh using a combination of PROGRAM-DATE-TIME and a few HTTP headers. We'll be codifying more of this as our implementation improves, but you can read our first attempt here.

What requirement 2 also exposes is that the current refresh-miss logic is insufficient for smooth playback at lower latencies. It states that the interval should be halved on a miss - we have found this to be much too slow in practice. In the case of a miss, the next reload needs to be made much sooner.
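
A hedged sketch of the faster retry-on-miss behaviour described above; the 250 ms retry value and the callback shape are illustrative, not part of the proposal.

```ts
// Sketch only: poll the playlist, retrying quickly when a refresh yields no new segments.
async function pollPlaylist(
  url: string,
  targetDurationMs: number,
  onUpdate: (playlistText: string) => boolean // returns true if new segments appeared
): Promise<void> {
  while (true) {
    const text = await (await fetch(url, { cache: "no-store" })).text();
    const advanced = onUpdate(text);

    // On a hit, wait roughly one target duration (plus any server-side offset we have
    // estimated); on a miss, retry much sooner than "half the target duration" so the
    // small forward buffer is not exhausted while we wait.
    await new Promise((resolve) => setTimeout(resolve, advanced ? targetDurationMs : 250));
  }
}
```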

In general, the demo has underscored the challenge of implementing LHLS. I'd like to continue a discussion on a companion guide, which outlines best practices/patterns for accomplishing common functionality, such as playback rate manipulation for latency targeting.

I know I've been a bit slack in integrating the current round of feedback, but I'll be getting around to that soon. Having a functioning codebase will be critical for testing out solutions to the more challenging requirements.

@jkarthic-akamai commented Apr 10, 2019

Requirement 2 is a bit trickier. Synchronization of updates with the server is critically important if you're using one prefetch tag, since a refresh miss can mean the client runs through whatever small forward buffer it has. Our current implementation estimates the next manifest refresh using a combination of PROGRAM-DATE-TIME and a few HTTP headers. We'll be codifying more of this as our implementation improves, but you can read our first attempt here.

What requirement #2 also exposes is how the current refresh miss logic is insufficient for smooth playback at lower latencies. It states that the interval should be halved on miss - we have found this to be much too slow in practice. In the case of a miss the next reload needs to be made much sooner.

I agree with you on the issues with timing the manifest refresh properly; it is a potential bottleneck for latency improvement. Based on this experience, should we modify the spec to mandate at least two PREFETCH tags for LHLS? If we are mandating two PREFETCH URLs, I would like the spec to clearly state that the fetch of the first prefetch segment must be COMPLETE before the request for the second prefetch segment is made. Such wording in the spec would relax the long-polling support requirements for the CDN or the origin HTTP server. Better yet would be to specify a mandatory delay between download completion of the nth segment and the download start of the (n+1)th segment. The encoder can set this delay equal to one frame duration, so that the existing set of HTTP servers and CDNs can support LHLS with two PREFETCH tags seamlessly.
Such behavior of specifying a small delay is already possible in DASH with the "availabilityTimeOffset" attribute. It is only logical for LHLS to also have some means of supporting the same, so that this manifest refresh logic is not in the critical path.

@heff (member) commented Apr 18, 2019

The buffer length in the demo seems a lot more spiky than I'd expect, compared to the DASH LL demo where the buffer level stays pretty flat. Does it include video that's already been played?


@htleeab commented Jun 10, 2019

Apple announced Low-Latency HLS at WWDC 2019:
https://developer.apple.com/videos/play/wwdc2019/502/
There are new tags:
EXT-X-SERVER-CONTROL:
EXT-X-PART-INF:
EXT-X-PART:
EXT-X-RENDITION-REPORT:
EXT-X-SKIP:
Protocol Extension for Low-Latency HLS (Preliminary Specification): https://developer.apple.com/documentation/http_live_streaming/protocol_extension_for_low-latency_hls_preliminary_specification

@zmousm commented Jun 11, 2019

It would seem to me the preliminary LHLS spec by Apple is vastly more complicated than what has been discussed here.

@kevleyski commented Jul 15, 2019

Was wondering about thoughts on community LHLS versus Apple Low-Latency HLS.
I'm doing a presentation on low latency this week and want to say something like "hls.js is implementing Apple Low-Latency HLS", but is that true?
Perhaps a better comment is that there will be a best-of-breed outcome here, where Apple might take what the community is up to and vice versa.

https://tinyurl.com/yyr2rz8m

@ScottKell commented Jul 15, 2019

Was wondering on thoughts around Community LHLS and Apple Low Latency HLS
I'm doing a presentation on low latency this week and want to say something like hls.js is implementing Apple Low Latency, but is that true?
Perhaps a better comment is along the lines of perhaps there will be a best-of-breed here where Apple might take what the community are up to and vice versus.

I think there is a 0% chance that Apple takes what the community has developed. They have decided on their path and are headed in that direction.

Not sure about hls.js, but I know other player vendors, including Wowza, are working on implementing the Apple LHLS spec.

ScottK
