[ie/common] Support ranges in MPD #8711
This MPD issue affects Vbox7 and needs to be fixed urgently: on February 22nd Vbox7 will be privating nearly all their videos. Archivists and regular users need an easy way to back up videos/channels, whether individually or en masse.
yt_dlp/extractor/common.py
Outdated
def parse_range(byte_range):
    if isinstance(byte_range, str):
        splitted_byte_range = byte_range.split('-')
        if len(splitted_byte_range) == 2:
            return {
                'start': int(splitted_byte_range[0]),
                'end': int(splitted_byte_range[1]) + 1,
            }
The PR parses the byte range in order to fit what is expected by the fragment downloader (where the 1 added to the range end is subtracted again). However, the DASH spec (ISO 23009) specifically says that the range values are strings that conform to the HTTP specification for use in the Range header, so the processing here is not strictly necessary. Instead, the range could be passed into the HTTP header in the DASH downloader, as in ytdl-org/youtube-dl#30279.
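To illustrate the two options being compared, here is a minimal sketch rather than the code of either PR. The helper names `range_to_header` and `parsed_range_to_header` are hypothetical. The first passes the MPD range string through unchanged; the second converts the parsed form (exclusive `end`, as in the `parse_range` hunk above) back into an inclusive HTTP byte range.

```python
def range_to_header(byte_range):
    """Hypothetical helper: use the MPD range string as-is.

    An MPD value such as '36-800' already has the first-last form that
    the HTTP Range header expects after the 'bytes=' unit prefix.
    """
    return {'Range': f'bytes={byte_range}'}


def parsed_range_to_header(parsed):
    """Hypothetical helper: rebuild the header from the parsed form.

    Assumes parsed['end'] is exclusive (last byte + 1), so 1 is
    subtracted to get the inclusive last byte that HTTP expects.
    """
    return {'Range': f"bytes={parsed['start']}-{parsed['end'] - 1}"}


header_direct = range_to_header('36-800')
header_parsed = parsed_range_to_header({'start': 36, 'end': 801})
```

Both calls produce the same header for the same range, which is the point of the review comment: parsing and un-parsing cancel out unless something else needs the numeric form.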
On a style note, "splitted" isn't a normal English formation. The past participle of "split" is also "split", and so is the past tense: "the range was split in two", "a colon never split the start and end of the range".
Hello @dirkf! Thank you for your feedback!
Although it's possible to pass a range string directly to http headers, I decided to parse it not only because the fragment downloader doesn't support range strings. I found this in some MPD files:
<SegmentList timescale='1000' duration='4000'>
<Initialization sourceURL='https://example.com/2160p.mp4' range='36-800'/>
<SegmentURL media='https://example.com/2160p.mp4' mediaRange='801-379516'/>
<SegmentURL media='https://example.com/2160p.mp4' mediaRange='379517-656206'/>
<!-- SegmentURLs -->
</SegmentList>
So the same file is referenced several times with different ranges. In this case we can optimize the extractor by merging consecutive fragments when the URLs are the same and the start of a range is equal to the end of the previous range (in the parsed format, where the end is exclusive). In my case one file was referenced about 100 times, and I believe that replacing 100 small requests with one large request could significantly reduce the download time. I hope to submit a PR with this optimization sometime, which is why I would like to keep the current byte_range format. What do you think?
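A rough sketch of the merge described above, not the code later added to the PR. It assumes each fragment is a dict with a 'url' key and an optional 'byte_range' dict in the parsed format (exclusive 'end'); the function name `merge_consecutive` and the fragment layout are assumptions made for illustration.

```python
def merge_consecutive(fragments):
    """Merge neighbouring fragments of the same URL with adjacent ranges.

    Assumes byte_range['end'] is exclusive, so two ranges are adjacent
    when the previous 'end' equals the next 'start'.
    """
    merged = []
    for frag in fragments:
        rng = frag.get('byte_range')
        prev = merged[-1] if merged else None
        if (prev is not None and rng and prev.get('byte_range')
                and prev['url'] == frag['url']
                and prev['byte_range']['end'] == rng['start']):
            # Extend the previous fragment instead of adding a new one
            prev['byte_range']['end'] = rng['end']
            continue
        new = dict(frag)
        if rng:
            new['byte_range'] = dict(rng)  # copy so the input is not mutated
        merged.append(new)
    return merged


# The two SegmentURL entries from the MPD excerpt, in parsed form
fragments = [
    {'url': 'https://example.com/2160p.mp4',
     'byte_range': {'start': 801, 'end': 379517}},
    {'url': 'https://example.com/2160p.mp4',
     'byte_range': {'start': 379517, 'end': 656207}},
]
merged = merge_consecutive(fragments)
```

With the exclusive-end format the adjacency check is a plain equality, which is one argument for keeping the parsed representation; with raw range strings the same check would need to parse them anyway.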
Thank you for your note on language! I really appreciate it because I'm actively learning English now) I copied that variable name from downloader/hls.py, and it seems there are several more variables with splitted_ in their names. Maybe we can merge it as is and then correct all such spelling mistakes?) I can do it a bit later.
Hello @dirkf! I added the optimization described above and fixed the spelling mistake. Could you please review the updated PR?
Have a look at the new yt-dl PR if you like. Maybe the test cases will be useful, even if nothing else.
Description of your pull request and other information
Support for Initialization@range and SegmentURL@mediaRange attributes in MPD files. When these attributes are present, yt-dlp should download only the specified portion of the associated media file.
Related PR #5661. segment_urls were renamed to segments as suggested there.
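As an illustration of what reading these attributes involves, the following is a simplified sketch using the standard library, not the extractor code in yt_dlp/extractor/common.py. The function name `extract_segments` and the returned dict layout are assumptions, and the snippet omits the XML namespace that real MPD files normally declare.

```python
import xml.etree.ElementTree as ET

# SegmentList excerpt quoted in the conversation (no namespace, for brevity)
MPD_SNIPPET = """
<SegmentList timescale='1000' duration='4000'>
  <Initialization sourceURL='https://example.com/2160p.mp4' range='36-800'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='801-379516'/>
  <SegmentURL media='https://example.com/2160p.mp4' mediaRange='379517-656206'/>
</SegmentList>
""".strip()


def extract_segments(segment_list):
    """Hypothetical helper: collect URL and raw range for each entry.

    The range is None when the attribute is absent, in which case the
    whole file would be downloaded.
    """
    segments = []
    init = segment_list.find('Initialization')
    if init is not None:
        segments.append({'url': init.get('sourceURL'),
                         'range': init.get('range')})
    for seg in segment_list.findall('SegmentURL'):
        segments.append({'url': seg.get('media'),
                         'range': seg.get('mediaRange')})
    return segments


segments = extract_segments(ET.fromstring(MPD_SNIPPET))
```

Here every entry points at the same file and only the range differs, which is the case the description says should result in partial downloads.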