[ie/common] Support ranges in MPD #8711

kinolaev · 2023-12-03T22:56:53Z

Description of your pull request and other information

Support for Initialization@range and SegmentURL@mediaRange attributes in MPD files. When these attributes are present, yt-dlp should download only the specified portion of the associated media file.

Related PR: #5661. As suggested there, segment_urls was renamed to segments.

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

iFUCKINGHATEcomputers · 2024-01-23T12:57:29Z

This MPD issue affects Vbox7 and needs to be fixed urgently: on February 22nd Vbox7 will make nearly all of its videos private. Archivists and regular users need an easy way to back up videos and channels, whether individually or en masse.

https://www.reddit.com/r/Archiveteam/comments/19d6ou7/video_platform_vbox7_is_about_to_restrict_access/

dirkf · 2024-01-24T04:31:53Z

yt_dlp/extractor/common.py

+            def parse_range(byte_range):
+                if isinstance(byte_range, str):
+                    splitted_byte_range = byte_range.split('-')
+                    if len(splitted_byte_range) == 2:
+                        return {
+                            'start': int(splitted_byte_range[0]),
+                            'end': int(splitted_byte_range[1]) + 1,
+                        }


The PR parses the byte range in order to fit what is expected by the fragment downloader (where the 1 added to the range end is subtracted). However, the DASH spec (ISO 23009) specifically says that the range values are strings that conform to the HTTP specification for use in the Range header, so the processing here is not strictly necessary. Instead, the range could be passed into the HTTP header in the DASH downloader, as in ytdl-org/youtube-dl#30279.
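A minimal sketch of the round trip described above, assuming fragments store an end-exclusive 'end' as in the quoted hunk. The helper names parse_range, range_header and passthrough_header are illustrative only, not yt-dlp's actual API. It shows that parsing the MPD string and converting it back for the Range header yields the same header as passing the string through unchanged.

```python
def parse_range(byte_range):
    """Parse an MPD byte range like '36-800' into an end-exclusive dict.

    Mirrors the quoted hunk: the inclusive last byte is stored as end + 1.
    Returns None unless the value is exactly 'first-last' with both bounds
    given (open-ended forms such as '36-' are not handled in this sketch).
    """
    if isinstance(byte_range, str):
        parts = byte_range.split('-')
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            return {'start': int(parts[0]), 'end': int(parts[1]) + 1}
    return None


def range_header(fragment_range):
    """Hypothetical helper: turn the end-exclusive dict back into a Range header.

    The 1 has to be subtracted again because HTTP byte ranges are inclusive
    at both ends.
    """
    return {'Range': f"bytes={fragment_range['start']}-{fragment_range['end'] - 1}"}


def passthrough_header(byte_range):
    """The alternative from the review: MPD range values already use HTTP syntax."""
    return {'Range': f'bytes={byte_range}'}


if __name__ == '__main__':
    r = parse_range('36-800')
    print(r)                             # {'start': 36, 'end': 801}
    print(range_header(r))               # {'Range': 'bytes=36-800'}
    print(passthrough_header('36-800'))  # {'Range': 'bytes=36-800'}
```

The parsed form only pays off if something else needs numeric bounds, for example merging adjacent ranges; for a plain request the passthrough is simpler.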

On a style note, "splitted" isn't a normal English formation. The past participle of split is also split, and so is the past tense: "the range was split in two", "a colon never split the start and end of the range".

kinolaev · 2024-01-24

Hello @dirkf! Thank you for your feedback!

Although it's possible to pass a range string directly into the HTTP headers, I decided to parse it, and not only because the fragment downloader doesn't support range strings. I found this in some MPD files:

<SegmentList timescale='1000' duration='4000'>
  <Initialization sourceURL='https://example.com/2160p.mp4' range='36-800'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='801-379516'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='379517-656206'/>
</SegmentList>
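To make the repeated-URL pattern concrete, here is a sketch that reads the SegmentList above with the standard library's xml.etree.ElementTree and lists each (URL, byte range) pair. It assumes the snippet exactly as shown, without the urn:mpeg:dash:schema:mpd:2011 namespace a complete MPD would declare, and it does not use yt-dlp's own MPD parser; list_segments is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The SegmentList from the comment above (no DASH namespace, as in the snippet).
SEGMENT_LIST = """
<SegmentList timescale='1000' duration='4000'>
  <Initialization sourceURL='https://example.com/2160p.mp4' range='36-800'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='801-379516'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='379517-656206'/>
</SegmentList>
"""


def list_segments(xml_text):
    """Return (url, range) pairs: the Initialization entry first, then each SegmentURL."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    init = root.find('Initialization')
    if init is not None:
        pairs.append((init.get('sourceURL'), init.get('range')))
    for seg in root.findall('SegmentURL'):
        pairs.append((seg.get('media'), seg.get('mediaRange')))
    return pairs


for url, byte_range in list_segments(SEGMENT_LIST):
    print(url, byte_range)
```

Every pair points at the same file, and each range begins one byte after the previous one ends.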

So the same file is referenced several times with different ranges. In this case we can optimize the extractor by merging consecutive fragments whose URLs are the same and whose range starts where the previous one ends (with the parsed, end-exclusive ranges, the start equals the previous end). In my case one file was referenced about 100 times, and I believe replacing 100 small requests with one large request could significantly reduce download time. I hope to submit a PR with this optimization at some point, which is why I would like to keep the current byte_range format. What do you think?
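A sketch of the merge described above, assuming fragments are dicts with a url and an end-exclusive range of {'start', 'end'} (the shape the quoted parse_range produces). The merge_fragments name and the dict layout are illustrative and not necessarily what the updated PR does. Two fragments are joined only when they share a URL and the second starts exactly where the first ends, so no bytes are skipped or fetched twice.

```python
def merge_fragments(fragments):
    """Coalesce adjacent byte-range fragments of the same URL into one request.

    Each fragment is {'url': str, 'range': {'start': int, 'end': int}} with an
    end-exclusive 'end'. Fragments without a 'range' are passed through as-is.
    """
    merged = []
    for frag in fragments:
        prev = merged[-1] if merged else None
        if (prev is not None
                and 'range' in prev and 'range' in frag
                and prev['url'] == frag['url']
                and prev['range']['end'] == frag['range']['start']):
            # Contiguous: extend the previous fragment instead of adding a new one.
            prev['range'] = {'start': prev['range']['start'], 'end': frag['range']['end']}
        else:
            # Shallow copy: merges reassign 'range' on the copy and never
            # mutate the caller's dicts.
            merged.append(dict(frag))
    return merged


url = 'https://example.com/2160p.mp4'
frags = [
    {'url': url, 'range': {'start': 36, 'end': 801}},
    {'url': url, 'range': {'start': 801, 'end': 379517}},
    {'url': url, 'range': {'start': 379517, 'end': 656207}},
]
print(merge_fragments(frags))
# [{'url': 'https://example.com/2160p.mp4', 'range': {'start': 36, 'end': 656207}}]
```

One trade-off worth weighing: a single large request means a failure partway through has to be retried over a larger span, so capping the merged size may be sensible.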

Thank you for your note on the language! I really appreciate it, because I'm actively learning English now. I copied that variable name from downloader/hls.py, and it seems there are several more variables with splitted_ in their names. Maybe we can merge this as is and then correct all such spelling mistakes? I can do it a bit later.

kinolaev · 2024-02-01

Hello @dirkf! I added the optimization described above and fixed the spelling mistake. Could you please review the updated PR?

dirkf · 2024-02-01T22:44:01Z

Have a look at the new yt-dl PR if you like. Maybe the test cases will be useful, even if nothing else.

kinolaev force-pushed the feat-dash-byte-range branch from a922e03 to 2f71dd3 on December 3, 2023 22:58

seproDev added the enhancement New feature or request label Dec 4, 2023

bashonly self-requested a review December 6, 2023 18:52

iFUCKINGHATEcomputers mentioned this pull request Jan 23, 2024

Urgent - Vbox7 support needs to be fixed, so that archivists can save nearly all its' videos before they're permanently lost on Feb 22 ytdl-org/youtube-dl#32701 (closed)

dirkf reviewed Jan 24, 2024


kinolaev force-pushed the feat-dash-byte-range branch 3 times, most recently from b92c56c to 0cbeb90 on February 1, 2024 15:02

[ie/common] Support ranges in MPD

484b391

kinolaev force-pushed the feat-dash-byte-range branch from 0cbeb90 to 484b391 on February 1, 2024 20:26

Grub4K mentioned this pull request Mar 24, 2024

Backports from youtube-dl #9523 (closed)

pukkandan added the solved-upstream The issue has been solved in youtube-dl. Do not make PR for this label Mar 24, 2024

bashonly mentioned this pull request May 26, 2024

[Help needed] Merging pending changes from youtube-dl #21 (open)


kinolaev commented Dec 3, 2023

iFUCKINGHATEcomputers commented Jan 23, 2024

dirkf Jan 24, 2024

kinolaev Jan 24, 2024

kinolaev Feb 1, 2024

dirkf commented Feb 1, 2024
