[Discussion] Upcoming changes to format selection #4846

pukkandan · 2022-09-04T14:15:18Z

If you are here only to check for changes to the defaults, just read the "Multistreams" and "Default selector for stdout" sections.

Supersedes #389

"maybe merge" operator

A new operator +? will be added. It is similar to +, but will always work as if multistreams were disabled; i.e. the RHS will be added to the format only if the LHS doesn't already have the same kind (audio/video) of stream. This removes the need for --(audio/video)-multistreams.
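To make the intended semantics concrete, here is a minimal sketch of how +? could behave. It is not the actual implementation: `kinds` and `maybe_merge` are hypothetical helpers, and the sketch assumes format dicts mark an absent stream with a codec value of 'none'. It also assumes that any overlap in stream kind is enough to skip the RHS.

```python
def kinds(fmt):
    """Return the stream kinds ('video', 'audio') present in a format dict."""
    present = set()
    if fmt.get('vcodec', 'none') != 'none':
        present.add('video')
    if fmt.get('acodec', 'none') != 'none':
        present.add('audio')
    return present


def maybe_merge(lhs, rhs):
    """Sketch of `lhs +? rhs` (hypothetical helper, not yt-dlp code).

    lhs is the list of formats selected so far; rhs is a single format.
    rhs is appended only if lhs has no stream of the same kind yet.
    """
    have = set().union(*(kinds(f) for f in lhs))
    if kinds(rhs) & have:
        return list(lhs)  # same kind already present: leave lhs unchanged
    return [*lhs, rhs]
```

With a video-only LHS, an audio-only RHS is merged in; with an LHS that already carries audio, the RHS is dropped.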

"of each" operator

A new operator <field> will be added that is used to select the best format for each value of field; e.g. bv<height>, ba<language> will select the best video format of each height and best audio of each language.
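As a rough illustration of the "of each" idea, the sketch below groups formats by a field and keeps the best one per group. `best_of_each` is a hypothetical helper, and ranking by `tbr` is only a stand-in assumption for the real format sorting.

```python
def best_of_each(formats, field, quality=lambda f: f.get('tbr') or 0):
    """Sketch of `bv<field>`: keep the best format for each value of `field`.

    `quality` is an assumed stand-in for the real sort order.
    """
    best = {}
    for fmt in formats:
        key = fmt.get(field)
        if key not in best or quality(fmt) > quality(best[key]):
            best[key] = fmt
    return list(best.values())
```

Calling it with field='height' corresponds to bv<height>, and with field='language' to ba<language>.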

"type" field

To supplement the "of each" operator, a new field, type, will be added to the format dict. Extractors can use it to separate the different types of formats.

Filterable groups

Currently, filters [...] applied to groups (...) are simply distributed over their components. So it is not possible to, say, download the "best format under 1G". You may expect (bv*+ba)[filesize<1G] to work, but it translates to bv*[filesize<1G]+ba[filesize<1G], which is not what we want.

While I can simply fix this behavior, that would break compatibility and I am afraid some users may actually be making use of the above "feature". So instead, a new type of grouping {} will be added. Then {bv*+ba}[filesize<1G] will work as expected, while () will retain its old behavior.
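The difference between the two groupings can be sketched as follows. This is only an illustration under assumptions: `distributed` and `grouped` are hypothetical helpers, every format is assumed to carry a known `filesize` in bytes, and 1G is taken to be 1000³ bytes.

```python
from itertools import product

MB = 1000 ** 2
GB = 1000 ** 3  # assumption: decimal gigabyte


def distributed(videos, audios, limit):
    """(bv*+ba)[filesize<limit]: the filter is applied to each component."""
    v = [f for f in videos if f['filesize'] < limit]
    a = [f for f in audios if f['filesize'] < limit]
    return list(product(v, a))


def grouped(videos, audios, limit):
    """{bv*+ba}[filesize<limit]: the filter is applied to the merged pair."""
    return [
        (v, a) for v, a in product(videos, audios)
        if v['filesize'] + a['filesize'] < limit
    ]
```

For a 900 MB video and a 200 MB audio, `distributed` accepts the pair because each part is under the limit, while `grouped` rejects it because the total is 1100 MB.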

Multistreams

youtube-dlc had added the ability for us to merge multiple audio (and video) formats into one file. So far, yt-dlp has kept this option behind the multistreams switches so that the default selector of bv*+ba can work correctly. However, the way this works is confusing and so these options will be enabled by default. The default selector can now handle this using +? - see below.

Default selector

The default selector will be changed to bv*+?ba/b. This means that the best video format will be selected, and if it doesn't have an audio stream, the best audio format will be merged with it. This is effectively the same as the current default, but works irrespective of whether multistreams are enabled or not.

Default selector for stdout

Currently, if you use -o -, the default selector is b/bv+ba. However, yt-dlp has had the ability to stream multiple formats to stdout (using ffmpeg) for a while. So this default will also now be changed to bv*+?ba/b. Note that this will cause the download to happen through ffmpeg. If you don't want this, you will need to give -fb.

PS: This change may be postponed due to #4478

Documentation

A lot of the format selection documentation is a mess and it would be good to take this opportunity to rewrite it. But I suspect any documentation I write will end up even less user-friendly than the current one. So any help with this would be appreciated (make a PR). If no one wants to do this, I will abandon the idea and just add the new changes to the current documentation.

I have an initial implementation of most of these, but they need a lot of cleanup. I will open a PR when ready. In the meantime, I would appreciate any feedback on the above changes, and am also open to suggestions for better syntax for the new operators. I am not too happy with the current ones, but couldn't think of anything better.

Related #4553


pukkandan · 2022-09-04T16:31:18Z

{bv*+ba}[filesize<1G] will work as expected

That filter is very ambiguous though. For example, in the below scenario:

Audio 1: 1MB
Audio 2: 5MB
Video 3: 500MB
Video 4: 999MB

will 4+1 or 3+2 be selected (assuming that 1GB=1000MB and not 1024MB)?

Even though 4+1 and 3+2 have same filesize, one of them will always be "better" than the other due to other metadata (according to any -S given). So that will be picked

Okay, but what if 4+1 has better quality video, but 3+2 has better quality audio?

It will be handled the same way pre-merged formats are; i.e. better video is preferred.
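The "better video is preferred" rule amounts to a lexicographic comparison: video quality is compared first, and audio quality only breaks ties. A minimal sketch, where `pick_pair` is a hypothetical helper and `rank` is a hypothetical numeric score standing in for whatever ordering -S produces:

```python
def pick_pair(pairs):
    """Pick the best (video, audio) pair, comparing video rank first.

    `rank` is a hypothetical score: higher means better.
    """
    return max(pairs, key=lambda p: (p[0]['rank'], p[1]['rank']))
```

So a pair with the better video wins even when the competing pair has the better audio.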

pukkandan · 2022-09-12T16:49:49Z

More changes:

Custom tokenizer

The current format selector code abuses tokenize.tokenize, causing issues like #4901. I will instead be implementing a custom tokenizer for the new code

Standardization of filtering operators

While --match-filter and --format filtering use most of the same operators, they are currently implemented separately, leading to minor inconsistencies. We will standardize them into a single implementation, adding quoting and ! (not) to format filtering, but not the && operator (use the current [filter1][filter2] syntax instead).

Kenshin9977 · 2022-10-15T23:49:00Z

I understand that {} is introduced so that it doesn't break the distributive behaviour of (), which may already be in use.
For the same reason I think that having +? "replacing" + because of multistreams isn't the best way to proceed. While some people may use ()'s side effect, I think a lot more will be impacted by the change brought to +.
It would be better if the default stays + and the behaviour of + stays the same. +? (or why not ++?) would be the operator used for multistreams selection.

pukkandan · 2023-02-09T17:05:15Z

I think a lot more will be impacted by the change brought to +.

I don't think so, but I like your proposal anyway

pukkandan · 2023-02-09T17:32:00Z

Another potential change that has been proposed multiple times is #5629

Prioritize AV1

One of the other weird quirks of yt-dlp's default format selection is the de-prioritization of av1 and DV

yt-dlp/README.md

Line 1540 in f14c233

    
           Note that the default has `vcodec:vp9.2`; i.e. `av1` is not preferred. Similarly, the default for hdr is `hdr:12`; i.e. dolby vision is not preferred. These choices are made since DV and AV1 formats are not yet fully compatible with most devices. This may be changed in the future as more devices become capable of smoothly playing back these formats.

Since playing DV is pretty niche and playing it on anything that's not compatible will significantly degrade quality, I think it makes sense to leave it de-prioritized permanently.

On the other hand, the AV1 exception was always intended to be removed eventually. It still doesn't have out-of-the-box support on many systems¹, but most sensible players already support it. Hardware support is still lacking, but that is not a huge concern if software decoders are good enough.

So, perhaps it's time for us to make the change?

¹ As always, it's mostly Apple's fault!

Jules-A · 2023-02-09T18:31:18Z

@pukkandan I think prioritizing AV1 below 4K is definitely needed. AV1 at 1440p and below is becoming rather common and can be decoded on CPU fine for many people; it's 4K that really needs hardware decoding, which most people don't have.

gamer191 · 2023-05-06T11:15:18Z

I think a lot more will be impacted by the change brought to +.

I don't think so, but I like your proposal anyway

I'm confused. Is yt-dlp going to use @Kenshin9977's proposal? If yes, the initial post should be updated imo

pukkandan · 2023-05-06T16:52:28Z

I think a lot more will be impacted by the change brought to +.

I don't think so, but I like your proposal anyway

I'm confused. Is yt-dlp going to use Kenshin9977's proposal? If yes, the initial post should be updated imo

Currently undecided.

pukkandan · 2023-06-17T22:44:41Z

@kasper93 There is a separate issue open for it already (and I'm working on it rn). This is not the place to discuss site-specific format issues.

krackers · 2023-07-30T23:14:00Z

@pukkandan With regard to "Standardization of filtering operators," is there a plan to add support for logical OR of conditions in format filtering? Logical OR for match-filter was added in #3144, but currently there is no mechanism to pick the best video out of those that match either condition A or B.

It was implemented for match-filter by allowing repeated args, but this wouldn't work for format-filtering so it seems the cleanest option is to introduce a new explicit || operator so we can do [fmt1 || fmt2].

pukkandan · 2023-07-31T08:32:49Z

bv[A]/bv[B] already works. The new parser will also support bv([A]/[B]). Adding a new operator is unnecessary.

krackers · 2023-07-31T18:58:45Z

I thought / specifies formats in order of priority though. So bv[A]/bv[B] takes the best video satisfying A, and if nothing matches then it falls back to the best video satisfying B.

This is not the same as bv[A || B], which considers all videos matching A together with all those matching B, and takes the best out of the combined set.

Concrete examples:

format 1: matches A, 720p
format 2: matches B, 1080p

bv[A]/bv[B] would select format 1 (720p). bv[A || B] would select format 2 (1080p)
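The distinction can be checked with a small sketch. It assumes "best" simply means greatest height, and `fallback` and `union` are hypothetical helpers modelling bv[A]/bv[B] and the proposed bv[A || B] respectively; neither is yt-dlp code.

```python
def best(formats):
    """Assumed ranking: the tallest format wins. Returns None if empty."""
    return max(formats, key=lambda f: f['height'], default=None)


def fallback(formats, pred_a, pred_b):
    """bv[A]/bv[B]: best of A, and only if A matches nothing, best of B."""
    return best([f for f in formats if pred_a(f)]) or best(
        [f for f in formats if pred_b(f)])


def union(formats, pred_a, pred_b):
    """Proposed bv[A || B]: best among everything matching A or B."""
    return best([f for f in formats if pred_a(f) or pred_b(f)])


formats = [
    {'format_id': '1', 'height': 720, 'matches': 'A'},
    {'format_id': '2', 'height': 1080, 'matches': 'B'},
]
is_a = lambda f: f['matches'] == 'A'
is_b = lambda f: f['matches'] == 'B'
```

Here `fallback(formats, is_a, is_b)` returns format 1 and `union(formats, is_a, is_b)` returns format 2, matching the example above.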

MrRawes · 2023-08-12T09:43:51Z

Note that on YouTube, vp9.2 has been found to be better than HDR AV1:
TheFrenchGhosty/TheFrenchGhostys-Ultimate-YouTube-DL-Scripts-Collection#14

lvqcl · 2023-08-12T12:24:49Z

That was 3 years ago though. Maybe AV1 HDR encoding quality has improved since then.

P.S. but imho SDR should be preferred over HDR by default.


DHandspikerWade · 2023-11-18T16:32:36Z

The most recent MacOS release (Sonoma) adds AV1 support and the new M3 processors have hardware decoders for the codec. This would likely remove the Apple blockers mentioned in #4846 (comment) going forward.

kepstin · 2023-12-10T21:12:51Z

The particular use case I have is that I'd like to download youtube videos with the best audio formats in all of the available languages out of en, ja, and id merged - but if none of those languages are available then merge in the video's default language instead.

i.e. I want to build a format string that expresses:

Merge in bestaudio[language=en], or skip if nothing matched
Merge in bestaudio[language=ja], or skip if nothing matched
Merge in bestaudio[language=id], or skip if nothing matched
If the output doesn't already have an audio track, then merge in bestaudio

Right now, as far as I've been able to figure out, the only way to do this is to generate all the possible combinations of languages in a chain of fallbacks, so this is what I have listed in my yt-dlp config: --format bestvideo+((bestaudio[language=en]+bestaudio[language=ja]+bestaudio[language=id])/(bestaudio[language=en]+bestaudio[language=ja])/(bestaudio[language=ja]+bestaudio[language=id])/(bestaudio[language=en]+bestaudio[language=id])/bestaudio[language=en]/bestaudio[language=ja]/bestaudio[language=id]/bestaudio)

It's a good thing I only have 3, because this is a combinatorial explosion situation :/
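Until a better operator exists, a chain like the one above can be generated rather than written by hand. The sketch below is a hypothetical helper, not part of yt-dlp; it emits every non-empty combination of languages from largest to smallest, so the alternatives within each size may appear in a different order than in the hand-written string.

```python
from itertools import combinations


def audio_fallback_chain(langs, video='bestvideo', default='bestaudio'):
    """Build a --format string that merges every available language in
    `langs`, falling back to `default` if none of them is available."""
    alternatives = []
    # Try the largest combinations first so that as many languages as
    # possible are merged before falling back to smaller sets.
    for size in range(len(langs), 0, -1):
        for combo in combinations(langs, size):
            merged = '+'.join(f'bestaudio[language={lang}]' for lang in combo)
            alternatives.append(f'({merged})' if size > 1 else merged)
    alternatives.append(default)
    return f"{video}+({'/'.join(alternatives)})"


print(audio_fallback_chain(['en', 'ja', 'id']))
```

The number of alternatives is 2^n for n languages (8 for the three here), which is the combinatorial growth in question.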

Alternatively, or perhaps additionally, a user-friendly --audio-langs option where someone can provide a list of audio languages to attempt to download (plus some way to specify whether to fall back or error out if none of the specified languages are available) would be really nice to have.

pukkandan · 2023-12-11T00:29:12Z

@kepstin Your use-case will be handled by the "of each" operator. However, this project is currently on hold due to a lack of time.

