Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ie/common] Support ranges in MPD #8711

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kinolaev
Copy link

@kinolaev kinolaev commented Dec 3, 2023

Description of your pull request and other information

Support for Initialization@range and SegmentURL@mediaRange attributes in MPD files. When these attributes are present, yt-dlp should download only the specified portion of the associated media file.

Related PR #5661. segment_urls were renamed to segments as suggested there.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

@seproDev seproDev added the enhancement New feature or request label Dec 4, 2023
@bashonly bashonly self-requested a review December 6, 2023 18:52
@iFUCKINGHATEcomputers
Copy link

This MPD issue affects Vbox7 and needs to be fixed urgently - on February 22nd Vbox7 will be privating nearly all their videos. Archivists and regular users need an easy way to back up videos/channels, whether individually or en masse.

https://www.reddit.com/r/Archiveteam/comments/19d6ou7/video_platform_vbox7_is_about_to_restrict_access/

Comment on lines 2616 to 2624
def parse_range(byte_range):
if isinstance(byte_range, str):
splitted_byte_range = byte_range.split('-')
if len(splitted_byte_range) == 2:
return {
'start': int(splitted_byte_range[0]),
'end': int(splitted_byte_range[1]) + 1,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR parses the byte range in order to fit what is expected by the fragment downloader (where the 1 added to the range end is subtracted). However, the DASH spec (ISO 23009) specifically says that the range values are strings that conform to the HTTP specification for use in the Range header, so the processing here is not strictly necessary. Instead, the range could be passed into the HTTP header in the DASH downloader, as in ytdl-org/youtube-dl#30279.

On a style note, splitted isn't a normal English formation. The past participle of split is also split, and so is the past tense: "the range was split in two", "a colon never split the start and end of the range",

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @dirkf! Thank you for your feedback!

Although it's possible to pass a range string directly to http headers, I decided to parse it not only because the fragment downloader doesn't support range strings. I found this in some MPD files:

<SegmentList timescale='1000' duration='4000'>
  <Initialization sourceURL='https://example.com/2160p.mp4' range='36-800'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='801-379516'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='379517-656206'/>
  <!-- SegmentURLs -->
</SegmentList>

So the same file is referenced several times with different ranges. In this case we can optimize the extractor by merging consecutive fragments where the URLs are the same and the start of the range is equal to the end of the previous range. In my case one file was referenced about 100 times and I believe by replacing 100 small requests with 1 large we could significantly reduce the downloading time. I hope to submit a PR with this optimization sometime, that's why I would like to stay with current byte_range format. What do you think?

Thank you for your note on language! I'm really appreciate it because I'm actively learning English now) I copied that variable name from downloader/hls.py and it seems there are several more variables with splitted_ in their names. Maybe we can merge it as is and then correct all such spelling mistakes?) I can do it a bit later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @dirkf! I added the optimization described above and fixed the spelling mistake. Could you please review the updated PR?

@kinolaev kinolaev force-pushed the feat-dash-byte-range branch 3 times, most recently from b92c56c to 0cbeb90 Compare February 1, 2024 15:02
@dirkf
Copy link
Contributor

dirkf commented Feb 1, 2024

Have a look at the new yt-dl PR if you like. Maybe the test cases will be useful, even if nothing else.

@Grub4K Grub4K mentioned this pull request Mar 24, 2024
6 tasks
@pukkandan pukkandan added the solved-upstream The issue has been solved in youtube-dl. Do not make PR for this label Mar 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request solved-upstream The issue has been solved in youtube-dl. Do not make PR for this
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants