[Feature Request] Sites offering alternative simultaneous media streams #389
Comments
This is a great idea and shouldn't be too difficult to implement. There are, however, a few things that need addressing:
Due to these, I propose that we keep the video|audio(|image?) selector as-is and add the stream selector on top of it. So the format selector would look like:

```python
r'''(?x)
(?P<merge>merge)?
(?P<which>b|w|all|best|worst)
(?P<what>v|a|video|audio)?
(?P<containing>\*)?
(?:\.(?P<stream>all|\w+))?
(?:\.(?P<n>[1-9]\d*))?
'''
```

Here's how it would address the above points:
We haven't changed the meaning of
That’s a good example, but I also thought about having a mechanism that would exclude certain streams from consideration by these ‘collective’ selectors (storyboards, audio tracks in languages you’re not interested in, etc.), which could mitigate this problem.
Necessary for #343.

* They are identified by `vcodec=acodec='none'`
* These formats show as the worst in `-F`
* Any postprocessor that expects audio/video will be skipped
* `b*` and all related selectors will skip such formats
* This commit also does not add any selector for downloading such formats. They have to be explicitly requested by the `format_id`. Implementation of a selector is left for when #389 is resolved
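The rules in the commit message above can be illustrated with a small Python sketch. The helper names and the example format dicts are hypothetical; only the `vcodec`, `acodec` and `format_id` keys come from the text. It shows no-media formats being skipped by `b*`-style candidate lists while remaining reachable by an explicit `format_id`.

```python
# Hypothetical helpers illustrating the commit message's rules; the format
# dicts below are made-up examples, not real extractor output.

def has_no_media(fmt):
    """A format with vcodec == acodec == 'none' carries neither audio nor video."""
    return fmt.get('vcodec') == 'none' and fmt.get('acodec') == 'none'


def candidates_for_b_star(formats):
    """Formats that `b*`-style selectors may consider: no-media formats are skipped."""
    return [f for f in formats if not has_no_media(f)]


def select_by_id(formats, format_id):
    """Explicit selection by format_id still reaches every format."""
    return next((f for f in formats if f['format_id'] == format_id), None)


formats = [
    {'format_id': 'sb0', 'vcodec': 'none', 'acodec': 'none'},
    {'format_id': '140', 'vcodec': 'none', 'acodec': 'mp4a.40.2'},
    {'format_id': '22', 'vcodec': 'avc1.64001F', 'acodec': 'mp4a.40.2'},
]

print([f['format_id'] for f in candidates_for_b_star(formats)])  # ['140', '22']
print(select_by_id(formats, 'sb0')['format_id'])                # sb0
```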
Related to this issue, YouTube now allows some videos to have two audio tracks, such as a descriptive audio track for the blind. So yt-dlp defaults to picking Both
Looking back, I think my design sketch was perhaps a tad too simplistic. It would make some sense to also attach metadata to the whole stream/feed (what DASH refers to as an ‘adaptation set’) instead of only to individual formats (‘representations’ in DASH parlance). This way we would be able to attach language and stream kind information like ‘original audio’, ‘dubbed translation’, ‘voice-over translation’, ‘audio description track’, ‘forced subtitle [i.e. to be paired with an audio translation]’, ‘video with burned-in subtitles’ to the feeds themselves, so that this metadata consistently propagates to all formats and can be used in selectors.
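A minimal Python sketch of the feed-level metadata idea, assuming a hypothetical `Feed` record and a hypothetical `feed_id` key on each format (neither is an existing yt-dlp field): metadata declared once on the feed is copied onto every format that serves it, so a selector can filter on it without each format repeating it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feed:
    """Hypothetical feed-level ('adaptation set') metadata."""
    feed_id: str
    kind: str                       # e.g. 'original audio', 'audio description track'
    language: Optional[str] = None


def propagate(feeds, formats):
    """Return copies of `formats` with their feed's metadata filled in.

    Each format points at its feed through a hypothetical 'feed_id' key;
    keys already present on a format take precedence over feed values.
    """
    by_id = {feed.feed_id: feed for feed in feeds}
    result = []
    for fmt in formats:
        feed = by_id[fmt['feed_id']]
        merged = {'feed_kind': feed.kind, 'language': feed.language}
        merged.update(fmt)  # format-level values win
        result.append(merged)
    return result


feeds = [
    Feed('audio-orig', 'original audio', 'en'),
    Feed('audio-desc', 'audio description track', 'en'),
]
formats = [
    {'format_id': 'a-low', 'feed_id': 'audio-orig'},
    {'format_id': 'a-high', 'feed_id': 'audio-orig'},
    {'format_id': 'ad-high', 'feed_id': 'audio-desc'},
]

# A selector can now exclude a whole feed kind in one condition.
enriched = propagate(feeds, formats)
print([f['format_id'] for f in enriched if f['feed_kind'] != 'audio description track'])
```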
In retrospect, neither of our original proposals quite works. There are fundamentally two separate features that should be addressed here:
So my new proposal is:
```python
r'''(?x)
(?P<which>b|w|best|worst)
(?P<what>v|a|video|audio)?
(?P<containing>\*)?
(?:\.(?P<n>[1-9]\d*))?
'''
```
```python
r'''(?x)
(?P<merge>merge)?all
(?P<what>v|a|video|audio)? # New
'''
```
```python
r'''(?x)
(?P<merge>merge)?all
(?: # This is the same pattern as single formats
(?P<which>b|w|best|worst)
(?P<what>v|a|video|audio)?
(?P<containing>\*)?
(?:\.(?P<n>[1-9]\d*))?
)
(?:\{(?P<field>\w+)\})
'''
```

E.g.:
Random thoughts:
This still addresses all my points from #389 (comment):
becomes
becomes
yt-dlp currently assumes (wrongly) that each format contains at most one video and one audio stream. The lack of a syntax to select this is a result of this assumption. We cannot really address it in this issue.
becomes

#3562 implements a subset of this, though its current syntax is not fully compatible with my suggestion.
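The three patterns in the revised proposal can be sanity-checked with Python's `re` module. This is a minimal sketch, not yt-dlp code: the named groups are written as `(?P<name>...)` because `re` rejects the `(?<name>...)` form, the literal braces are escaped, and the `parse` helper is hypothetical.

```python
import re

SINGLE = re.compile(r'''(?x)
    (?P<which>b|w|best|worst)
    (?P<what>v|a|video|audio)?
    (?P<containing>\*)?
    (?:\.(?P<n>[1-9]\d*))?
''')

ALL = re.compile(r'''(?x)
    (?P<merge>merge)?all
    (?P<what>v|a|video|audio)?
''')

ALL_BY_FIELD = re.compile(r'''(?x)
    (?P<merge>merge)?all
    (?P<which>b|w|best|worst)
    (?P<what>v|a|video|audio)?
    (?P<containing>\*)?
    (?:\.(?P<n>[1-9]\d*))?
    \{(?P<field>\w+)\}
''')

PATTERNS = (('single', SINGLE), ('all', ALL), ('all_by_field', ALL_BY_FIELD))


def parse(selector):
    """Return (pattern name, matched named groups) for the first full match, else None."""
    for name, pattern in PATTERNS:
        m = pattern.fullmatch(selector)
        if m:
            return name, {k: v for k, v in m.groupdict().items() if v is not None}
    return None


for sel in ('bv*.2', 'bestvideo', 'mergeallv', 'mergeallba{language}', 'foo'):
    print(sel, parse(sel))
```

Note that `fullmatch` lets the engine backtrack from `b` to `best`, so the short and long spellings can share one alternation.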
The Panopto extractor would benefit from this too (similar to Mediasite). I should also note that the streams may not all start/end at the same time. This is common with Panopto (e.g. the audio stream may start a little after the video stream begins). I've also seen cases where video streams end and start throughout, overlapping or not. Panopto provides timing data, which the web player uses to sync the streams.
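One way a start-time skew could be handled when muxing is ffmpeg's `-itsoffset` input option, which shifts the timestamps of the input that follows it. The sketch below assumes the timing data has already been reduced to a hypothetical per-stream `start` value in seconds; it does not handle streams that stop and restart mid-recording, and whether the offset survives stream copy depends on the output container (Matroska is only an assumed choice here).

```python
def ffmpeg_sync_args(streams, output):
    """Build an ffmpeg argument list that delays later-starting streams.

    `streams` is a list of dicts with hypothetical 'path' and 'start'
    (seconds) keys. Each input is offset relative to the earliest stream.
    """
    earliest = min(s['start'] for s in streams)
    args = ['ffmpeg']
    for s in streams:
        offset = s['start'] - earliest
        if offset:
            # -itsoffset applies to the next -i only
            args += ['-itsoffset', f'{offset:g}']
        args += ['-i', s['path']]
    for index in range(len(streams)):
        args += ['-map', str(index)]  # keep every stream of every input
    args += ['-c', 'copy', output]
    return args


print(ffmpeg_sync_args(
    [{'path': 'video.mp4', 'start': 0.0}, {'path': 'audio.m4a', 'start': 1.5}],
    'out.mkv'))
```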
This seems to be unrelated to this issue. We can create some field similar to
That looks like a good idea.
I don’t see a reason why a host couldn’t ever serve multiple different feeds of the same ‘type’. I think feeds ought to be distinguished based on their identity, not classified into a few rigid categories. So you need to determine which format belongs to which feed, and only then describe what the feeds contain and how they relate to each other: for example, to be able to warn when the user chooses to download a video feed and a dubbed audio feed without also getting the forced subtitle paired with the dub.

Speaking of subtitles, the current situation with subtitles seems very similar, except worse, because subtitles have no format IDs or selectors; you can choose the language or container format of a subtitle, but if there happen to be multiple subtitles with the same language and container, there is no way to distinguish them. For this reason, I think subtitle streams should be folded into format selection, and the existing subtitle command-line options translated into modifications of the format selector:
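To make the forced-subtitle example concrete, here is a minimal Python sketch in which feeds are keyed by identity and may declare a pairing requirement. The feed IDs, the `kind` values and the `requires` key are illustrative assumptions, not an existing yt-dlp field.

```python
# Hypothetical feed table: identity (the key), what the feed contains,
# and which other feed, if any, it is meant to be paired with.
FEEDS = {
    'video-main': {'kind': 'video'},
    'audio-orig': {'kind': 'original audio', 'language': 'ja'},
    'audio-dub-en': {'kind': 'dubbed translation', 'language': 'en',
                     'requires': 'subs-forced-en'},
    'subs-forced-en': {'kind': 'forced subtitle', 'language': 'en'},
}


def missing_pairings(selected, feeds):
    """Return (feed, required feed) pairs whose required feed was not selected."""
    chosen = set(selected)
    return [(feed_id, feeds[feed_id]['requires'])
            for feed_id in selected
            if 'requires' in feeds[feed_id]
            and feeds[feed_id]['requires'] not in chosen]


for feed_id, needed in missing_pairings(['video-main', 'audio-dub-en'], FEEDS):
    print(f'warning: {feed_id} is meant to be paired with {needed}')
```

Because the check works on feed identities rather than on categories, two feeds of the same kind (say, two screencasts) would still be told apart.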
Ouch. I’m not even sure which container formats support such skew-synchronized streams, if any.
I don't quite understand this part. Could you elaborate? In particular, what do you mean by "feed" and "identity" in this context?
Sure, I can do that. But it's not really important imo. The
The definitions I use here are:
The point is that feeds have identity beyond their describing metadata, and that the extractor must be able to declare ‘these are the available feeds, and those are the formats that serve them’, instead of simply classifying formats into fixed-in-advance buckets and hoping no host will ever serve two simultaneous screencasts from different devices, or serve each person attending a conference call as a separate video/audio feed. This is basically already the situation with subtitles, as they cannot be distinguished beyond language and format. Adding more buckets to categorise feeds may ameliorate the issue, but will not truly solve it.
I did not mean that
Isn't that just this? f1: type=screencast1, f2: type=screencast2.
I agree on the issue with subtitles, but I don't see how the same issue exists here. What am I missing? 🤔
Moved to #4846
Description
(Spun off from #343)
A handful of sites sometimes offer multiple media streams of a given type that are meant to be played simultaneously or as alternatives to each other. The biggest one is probably Mediasite, which often offers a screencast stream (and presentation slides) in addition to the video stream showing the speaker. Other such cases are possible: in the past, GDCVault offered an alternative audio track containing the translation of the speaker’s talk, and a video stream containing the slides; #347 is an issue with a site offering both subtitled and dubbed versions of the same video.
This is not very common, but when it happens, it’s something of a pain to support. Currently, alternative Mediasite streams are offered as separate formats with a negative preference value, which means they are not downloaded by default, even though we can do it just fine. Because it is nowhere even mentioned that they are available for download, they can be hard to discover (see ytdl-org/youtube-dl#20611, ytdl-org/youtube-dl#23003). Some kind of general framework for handling such cases would be useful.
Here’s my design sketch: each format can declare the set of streams it contains by including a `'streams'` key in its dict, containing a non-empty list of stream identifiers (strings). If two formats declare the same stream identifier, they shall be considered as containing two different quality versions of the same content. If a format doesn’t have a `'streams'` key, it will be synthesised based on the `'acodec'` and `'vcodec'` keys: the list will contain `'audio'` unless `'acodec'` is `'none'`, and it will contain `'video'` unless `'vcodec'` is `'none'`.

The meaning of `best` would then be modified, and a couple of other selectors added:

* `best`: Picks the single best pre-multiplexed format that contains all streams;
* `allbest`: For each stream offered by the download, picks the best format containing it, and downloads them separately;
* `mergeallbest`: For each stream offered by the download, picks the best format containing it, and merges them all afterwards. If merging is not possible, nothing is selected.

The default value of the `-f` option would then become `mergeallbest/allbest`. Analogous selectors for the worst formats could be provided as well.

List of extractors that could potentially benefit (with example URLs if possible):
--no-check-certificates for the moment)
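As a rough illustration of the design sketch in the description (stream synthesis from `acodec`/`vcodec`, plus the proposed `best`, `allbest` and `mergeallbest` selectors), here is a minimal Python sketch. It assumes `formats` is sorted worst-to-best and that every format has a `format_id`; the `can_merge` predicate and the Mediasite-like stream names are hypothetical stand-ins, since deciding whether merging is possible is outside this sketch.

```python
def streams_of(fmt):
    """Declared 'streams', or 'audio'/'video' synthesised from acodec/vcodec."""
    if 'streams' in fmt:
        return list(fmt['streams'])
    synthesised = []
    if fmt.get('acodec') != 'none':
        synthesised.append('audio')
    if fmt.get('vcodec') != 'none':
        synthesised.append('video')
    return synthesised


def offered_streams(formats):
    """All stream identifiers offered by the download, in first-seen order."""
    seen = []
    for fmt in formats:
        for stream in streams_of(fmt):
            if stream not in seen:
                seen.append(stream)
    return seen


def select_best(formats):
    """`best`: the best single format that contains every offered stream."""
    wanted = set(offered_streams(formats))
    complete = [f for f in formats if wanted <= set(streams_of(f))]
    return complete[-1] if complete else None


def select_allbest(formats):
    """`allbest`: for each stream, the best format containing it (no duplicates)."""
    picks = []
    for stream in offered_streams(formats):
        best = [f for f in formats if stream in streams_of(f)][-1]
        if best['format_id'] not in [p['format_id'] for p in picks]:
            picks.append(best)
    return picks


def select_mergeallbest(formats, can_merge):
    """`mergeallbest`: like `allbest`, but selects nothing if merging is impossible."""
    picks = select_allbest(formats)
    return picks if can_merge(picks) else []


def select_default(formats, can_merge):
    """`mergeallbest/allbest`: fall back to separate downloads if merging fails."""
    merged = select_mergeallbest(formats, can_merge)
    return ('merge', merged) if merged else ('separate', select_allbest(formats))


formats = [  # hypothetical Mediasite-like listing, worst to best
    {'format_id': 'slides-360p', 'streams': ['slides']},
    {'format_id': 'speaker-360p', 'streams': ['speaker', 'audio']},
    {'format_id': 'slides-720p', 'streams': ['slides']},
    {'format_id': 'speaker-720p', 'streams': ['speaker', 'audio']},
]
print(select_best(formats))  # None: no single format has every stream
print([f['format_id'] for f in select_allbest(formats)])
print(select_default(formats, can_merge=lambda picks: True)[0])
```

Under these assumptions, `best` finds nothing for a two-feed lecture, while the default `mergeallbest/allbest` still picks the best slides and speaker formats.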