[ie/niconico] Directly download live timeshift videos; WebSocket fixes #9411
base: master
Major changes:
- Make a downloader for live timeshift videos. Time-based download rate limit applies. RetryManager-based error recovery applies.
- Fix the incorrect url for WebSocket reconnection.
- Correctly close the WebSocket connection.
- [!] Apply "FFmpegFixupM3u8PP" for both non-timeshift and timeshift MPEG-TS files by adding "m3u8_*" prefixes and inheriting from "HlsFD".
- [!] Change the protocol from "hls+fmp4" to "hls" in "startWatching" WebSocket requests because I didn't see it in my test.
Minor changes:
- Support metadata extraction when no formats.
- Set "live_status" instead of "is_live".
- Clean up "info_dict": Change WebSocket configs to private to hide them from users; extract common fields and remove unused ones.
- Update a download test.
yt_dlp/downloader/niconico.py
Outdated
 def communicate_ws(reconnect):
     if reconnect:
-        ws = self.ydl.urlopen(Request(ws_url, headers={'Origin': f'https://{ws_origin_host}'}))
+        self.ws = self.ydl.urlopen(Request(
+            self.ws.url, headers={'Origin': self.ws.wsw.request.headers['Origin']}))
wsw is internal only (part of the websockets library handler) and not part of the websocket response interface, so this will break when we introduce a new library.
(Sorry I've been needing to rename it)
Added to that, if it is needed we can prob add the original Request object to WebSocket responses
Hi, @coletdjnz! Thanks for your comment.
I agree that a high-level class should not use internal things. In the original code, the hostname comes from the IE, so I can change it back to that:
# Info Extractor
def _real_extract(self, url):
    return {
        "__ws": {
            "ws": ws,
            "origin": f'https://{hostname}',
        },
    }

# Downloader
self.ws.url, headers={'Origin': self.ws['origin']}))
Your opinion?
Perhaps in the http_headers infodict property?
Otherwise I'd say that is probably fine too, since it's internal
> Perhaps in http_headers infodict property?
Done. Please see 41c6125 .
It seems that I can't catch a part of
Can anyone point it out to me? Thanks.
yt_dlp/downloader/niconico.py
Outdated
class NiconicoLiveFD(NiconicoLiveBaseFD):
    """ Downloads niconico live without being stopped """

    def real_download(self, filename, info_dict):
        with self._ws_context(info_dict):
            new_info_dict = info_dict.copy()
            new_info_dict.update({
                'protocol': 'm3u8',
            })

            return FFmpegFD(self.ydl, self.params or {}).download(filename, new_info_dict)


class NiconicoLiveTimeshiftFD(NiconicoLiveBaseFD, HlsFD):
    """ Downloads niconico live timeshift VOD """

    _PER_FRAGMENT_DOWNLOAD_RATIO = 0.1

    def real_download(self, filename, info_dict):
        with self._ws_context(info_dict) as ws_context:
I want to avoid adding more "protocols". Can't we keep the single protocol and do something like:
-class NiconicoLiveFD(NiconicoLiveBaseFD):
-    """ Downloads niconico live without being stopped """
-    def real_download(self, filename, info_dict):
-        with self._ws_context(info_dict):
-            new_info_dict = info_dict.copy()
-            new_info_dict.update({
-                'protocol': 'm3u8',
-            })
-            return FFmpegFD(self.ydl, self.params or {}).download(filename, new_info_dict)
-class NiconicoLiveTimeshiftFD(NiconicoLiveBaseFD, HlsFD):
-    """ Downloads niconico live timeshift VOD """
-    _PER_FRAGMENT_DOWNLOAD_RATIO = 0.1
-    def real_download(self, filename, info_dict):
-        with self._ws_context(info_dict) as ws_context:
+class NiconicoLiveFD(NiconicoLiveBaseFD):
+    """Downloads niconico live/timeshift VOD"""
+    _PER_FRAGMENT_DOWNLOAD_RATIO = 0.1
+    def real_download(self, filename, info_dict):
+        with self._ws_context(info_dict) as ws_context:
+            if info_dict.get('is_live'):
+                info_dict = info_dict.copy()
+                info_dict['protocol'] = 'm3u8'
+                return FFmpegFD(self.ydl, self.params or {}).download(filename, info_dict)
Since the live videos are being downloaded by ffmpeg, they won't need fixup, no?
This is not a good solution. Just add the new conditions to the fixup. Something like
I'm ambivalent about the protocol name change, but definitely don't inherit from
class DurationLimiter():
    def __init__(self, target):
        self.target = target

    def __enter__(self):
        self.start = time.time()

    def __exit__(self, *exc):
        remaining = self.target - (time.time() - self.start)
        if remaining > 0:
            time.sleep(remaining)
Imo, this is cleaner inline than as a context manager. But that's just personal preference. I won't force you.
Thanks. I tried inlining the logic, but the code became more complicated and needed additional comments, so I gave up.
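For comparison, here is a minimal sketch of the two styles under discussion: the context manager and an inline equivalent. It is not the PR's actual code. The `download_fragment` callable and both helper functions are hypothetical, and `time.monotonic()` is swapped in for `time.time()` so that wall-clock adjustments cannot affect the measured interval.

```python
import time


class DurationLimiter:
    """Context manager whose body takes at least `target` seconds."""

    def __init__(self, target):
        self.target = target

    def __enter__(self):
        self.start = time.monotonic()
        return self

    def __exit__(self, *exc):
        # Sleep for whatever is left of the target duration.
        remaining = self.target - (time.monotonic() - self.start)
        if remaining > 0:
            time.sleep(remaining)
        return False  # never swallow exceptions from the body


def fetch_with_context_manager(download_fragment, target):
    # `download_fragment` is a hypothetical zero-argument callable.
    with DurationLimiter(target):
        return download_fragment()


def fetch_inline(download_fragment, target):
    # Same behavior written inline; `finally` mirrors __exit__,
    # which also runs when the body raises.
    start = time.monotonic()
    try:
        return download_fragment()
    finally:
        remaining = target - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
```

The inline form needs the explicit `try`/`finally` to match what `__exit__` does for free, which is roughly the extra complexity mentioned above.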
- Use "downloader_options" to pass options used by the downloader.
- Combine the two downloaders into one.
- Don't inherit from "HlsFD".
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
aka "--load-info". Don't save a Response object to info JSON. Just create a new WebSocket connection during the download. Due to Niconico's logic, the manifest m3u8 url will be unusable soon if there is no active WebSocket connection, so the reconnection will give us a valid manifest m3u8, unless the WebSocket url has already expired.
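A minimal sketch of the idea in this commit message, not the code in the PR: the info dict keeps only JSON-serializable values, and the connection is opened again at download time. The `ws_url` key, the `make_info` and `connect_for_download` helpers, and the `open_ws` callable are all hypothetical names used for illustration.

```python
import json


def make_info(ws_url, origin):
    # Only plain values go in, so the dict survives a JSON round trip.
    # No live Response/connection object is stored.
    return {
        'protocol': 'niconico_live',
        'http_headers': {'Origin': origin},
        'downloader_options': {'ws_url': ws_url},  # hypothetical key
    }


def connect_for_download(info, open_ws):
    """Open a fresh connection from the stored URL and headers.

    `open_ws` is a hypothetical callable taking (url, headers).
    """
    return open_ws(info['downloader_options']['ws_url'], info['http_headers'])
```

As the message notes, this only helps while the stored WebSocket url itself is still valid.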
Not really. Lines 3498 to 3501 in 263a4b5
yt-dlp/yt_dlp/postprocessor/ffmpeg.py Lines 888 to 891 in 263a4b5
If we are downloading a normal livestream (e.g., an m3u8) with "FFmpegFD" (redirected from "HlsFD"), the video file will be post-processed by "FFmpegFixupM3u8PP". This is because
I agree, but that's incomplete. We need this:
// YoutubeDL.py
- ffmpeg_fixup(downloader == 'hlsnative' and not self.params.get('hls_use_mpegts')
-              or info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None,
+ ffmpeg_fixup(not self.params.get('hls_use_mpegts')
+              and (downloader in ('hlsnative', 'niconico_live') or info_dict.get('is_live')),
               'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
               FFmpegFixupM3u8PP)
// postprocessor/ffmpeg.py
  class FFmpegFixupM3u8PP(FFmpegFixupPostProcessor):
      def _needs_fixup(self, info):
          yield info['ext'] in ('mp4', 'm4a')
-         yield info['protocol'].startswith('m3u8')
+         yield info['protocol'].startswith('m3u8') or info['protocol'] == 'niconico_live'
Your opinion?
Another question: Lines 3498 to 3501 in 263a4b5
I don't understand the difference between Lines 1002 to 1015 in 263a4b5
Description of your pull request and other information
This PR should not block the incoming release (see Kanban).
Summary
Major changes:
Minor changes:
Related PR:
Test
To test this PR:
About the new downloader
Design
For live and timeshift videos on Niconico Live (ニコニコ生放送), the media playlists are always dynamic. Our FFmpeg downloader works well with them. However, for timeshift videos, the MPEG-TS fragments are actually VOD, so we can download them via HTTP instead of FFmpeg.
The Niconico server expects a "start" field in the manifest playlist request. The value of that field is the playback position (in seconds) in the video. That is, requesting with different values gives us fragments at different time points. I guess this key might be used by the resume mechanism of the Niconico player [1].
Downloading many fragments without delay results in HTTP 403, apparently because a rate limit is exceeded. In this PR, the download speed is limited based on fragment length and download time.
Downloading fragments without an active WebSocket connection also causes HTTP 403; that is how Niconico authorizes requests. Because of network jitter and other errors, the WebSocket connection sometimes needs to be re-established. If the server refreshes the manifest playlist url, all subsequent requests with previous urls get HTTP 403. That's why I protect the playlist url with a lock.
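A minimal sketch of the lock-protected playlist url described here, assuming one thread handles the WebSocket and another downloads fragments. The `PlaylistURL` class is a hypothetical illustration, not the structure used in the PR.

```python
import threading


class PlaylistURL:
    """Thread-safe holder for the current manifest playlist url."""

    def __init__(self, url):
        self._lock = threading.Lock()
        self._url = url

    def get(self):
        # Called by the download loop before each playlist request.
        with self._lock:
            return self._url

    def set(self, url):
        # Called by the WebSocket side when the server sends a new url.
        with self._lock:
            self._url = url
```

With this arrangement the download loop reads the url through `get()` on every request, so it picks up a refreshed url as soon as a reconnect has stored it.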
[1]: In browser's DevTools, search "beginning_timestamp" in the "stream_sync.json" file.
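The "start" field and the time-based limit can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: the example url and its `token` parameter are made up, and reading `_PER_FRAGMENT_DOWNLOAD_RATIO` as "spend at least this fraction of a fragment's duration per fragment" is my assumption about how the constant is used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_start(manifest_url, start_seconds):
    """Return manifest_url with its `start` query field set to start_seconds."""
    parts = urlsplit(manifest_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query['start'] = [str(start_seconds)]  # playback position in seconds
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


def min_time_per_fragment(fragment_duration, ratio=0.1):
    """Lower bound on wall-clock seconds to spend on one fragment.

    Assumed policy: duration times ratio, so a 5 s fragment with the
    default ratio is never fetched in less than about 0.5 s.
    """
    return fragment_duration * ratio
```

Requesting the playlist again with a larger `start` value would then yield fragments from a later point in the video, as described in the design notes.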
For "FFmpegFixupM3u8PP"
This is totally a hack. I think there could be a better way to do so.
yt-dlp/yt_dlp/YoutubeDL.py
Lines 3498 to 3501 in 263a4b5
yt-dlp/yt_dlp/postprocessor/ffmpeg.py
Lines 888 to 891 in 263a4b5
Copy-Paste-oriented programming
yt-dlp/yt_dlp/downloader/ism.py
Lines 259 to 260 in 263a4b5
yt-dlp/yt_dlp/downloader/fragment.py
Lines 432 to 435 in 263a4b5