Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ie/niconico] Directly download live timeshift videos; WebSocket fixes #9411

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

pzhlkj6612
Copy link
Contributor

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This PR should not block the incoming release (see Kanban).

Summary

Major changes:

  • Make a downloader for live timeshift videos. Time-based download rate limit applies. RetryManager-based error recovery applies.
  • Fix the incorrect url for WebSocket reconnection.
  • Correctly close the WebSocket connection.
  • [!] Apply "FFmpegFixupM3u8PP" for both non-timeshift and timeshift MPEG-TS files by adding "m3u8_*" prefixes and inheriting from "HlsFD".
  • [!] Change the protocol from "hls+fmp4" to "hls" in "startWatching" WebSocket requests because I didn't see it in my test.

Minor changes:

  • Support metadata extraction when no formats.
  • Set "live_status" instead of "is_live".
  • Clean up "info_dict": Change WebSocket configs to private to hide them from users; extract common fields and remove unused ones.
  • Update a download test.

Related PR:

Test

To test this PR:

About the new downloader

Design

For live and timeshift videos on Niconico Live (ニコニコ生放送), the media playlists are always dynamic. Our FFmpeg downloader works well with them. However, for timeshift ones, the MPEG-TS fragments are actually VOD, so we can download it via HTTP instead of FFmpeg.

Niconico server expects a "start" field in the manifest playlist request. The value of that field is the playback position (in seconds) of a video. That is, requesting with different values gives us fragments at different time points. I guess this key might be used by the resume mechanism of Niconico player [1].

Downloading many fragments without delay will result in HTTP 403. That's apparently rate limit exceeded. In this PR, the download speed is limited by fragment length and download time.

Downloading fragments without an active WebSocket connection will also cause HTTP 403. That's the authorization way of Niconico. Due to network jitters and other exceptions, the WebSocket connection needs to be re-established. If the server refreshes the manifest playlist url, all subsequent requests with previous urls will be HTTP 403. That's why I protect the playlist with a lock.

[1]: In browser's DevTools, search "beginning_timestamp" in the "stream_sync.json" file.

For "FFmpegFixupM3u8PP"

This is totally a hack. I think there could be a better way to do so.

  • Derive the class from "HlsFD"

yt-dlp/yt_dlp/YoutubeDL.py

Lines 3498 to 3501 in 263a4b5

ffmpeg_fixup(downloader == 'hlsnative' and not self.params.get('hls_use_mpegts')
or info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None,
'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
FFmpegFixupM3u8PP)

  • Add the "m3u8_*" prefixes

class FFmpegFixupM3u8PP(FFmpegFixupPostProcessor):
def _needs_fixup(self, info):
yield info['ext'] in ('mp4', 'm4a')
yield info['protocol'].startswith('m3u8')

Copy-Paste-oriented programming

  • RetryManager

retry_manager = RetryManager(self.params.get('fragment_retries'), self.report_retry,
frag_index=frag_index, fatal=not skip_unavailable_fragments)

  • Fragment processing

def download_and_append_fragments(
self, ctx, fragments, info_dict, *, is_fatal=(lambda idx: False),
pack_func=(lambda content, idx: content), finish_func=None,
tpe=None, interrupt_trigger=(True, )):

.


Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Major changes:

- Make a downloader for live timeshift videos. Time-based download rate
  limit applies. RetryManager-based error recovery applies.
- Fix the incorrect url for WebSocket reconnection.
- Correctly close the WebSocket connection.
- [!] Apply "FFmpegFixupM3u8PP" for both non-timeshift and timeshift
  MPEG-TS files by adding "m3u8_*" prefixes and inheriting from "HlsFD".
- [!] Change the protocol from "hls+fmp4" to "hls" in "startWatching"
  WebSocket requests because I didn't see it in my test.

Minor changes:

- Support metadata extraction when no formats.
- Set "live_status" instead of "is_live".
- Clean up "info_dict": Change WebSocket configs to private to hide them
  from users; extract common fields and remove unused ones.
- Update a download test.

def communicate_ws(reconnect):
if reconnect:
ws = self.ydl.urlopen(Request(ws_url, headers={'Origin': f'https://{ws_origin_host}'}))
self.ws = self.ydl.urlopen(Request(
self.ws.url, headers={'Origin': self.ws.wsw.request.headers['Origin']}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wsw is internal only (part of the websockets library handler) and not part of the websocket response interface, so this will break when we introduce a new library.

(Sorry I've been needing to rename it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to that, if it is needed we can prob add the original Request object to WebSocket responses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @coletdjnz ! Thanks for your comment.

As a high-level class, using internal things is not OK. In the original code, the hostname comes from IE. I can change it to that.

# Info Extractor
def _real_extract(self, url):
  return {
    "__ws": {
      "ws": ws,
      "origin": f'https://{hostname}',
    },
  }

# Downloader
self.ws.url, headers={'Origin': self.ws['origin']}))

Your opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in http_headers infodict property?

Otherwise I'd say that is probably fine too, since it's internal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in http_headers infodict property?

Done. Please see 41c6125 .

@pukkandan pukkandan added enhancement New feature or request site-enhancement Feature request for some website core-triage triage requested from a core dev labels Mar 10, 2024
@pzhlkj6612
Copy link
Contributor Author

It seems that I can't catch a part of TransportError exceptions in the loop of RetryManager ? That error handling code works well most of the times, though. I'm at 7398a7c .

[debug] File locking is not supported. Proceeding without locking
[download]  ...
[download]  ...
[download]  ...
ERROR: Unable to download media m3u8: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) (caused by TransportError("('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"))
  File "...\.venv\Lib\site-packages\urllib3\connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\urllib3\connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\urllib3\connection.py", line 466, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "...\Python312\Lib\http\client.py", line 1419, in getresponse
    response.begin()
  File "...\Python312\Lib\http\client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "...\Python312\Lib\http\client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "...\.venv\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\urllib3\connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\urllib3\util\retry.py", line 445, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\urllib3\util\util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "...\.venv\Lib\site-packages\urllib3\connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\urllib3\connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\urllib3\connection.py", line 466, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "...\Python312\Lib\http\client.py", line 1419, in getresponse
    response.begin()
  File "...\Python312\Lib\http\client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "...\Python312\Lib\http\client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "...\yt_dlp\networking\_requests.py", line 314, in _send
    requests_res = session.request(
                   ^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\.venv\Lib\site-packages\requests\adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "...\yt_dlp\extractor\common.py", line 864, in _request_webpage
    return self._downloader.urlopen(self._create_request(url_or_request, data, headers, query))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\yt_dlp\YoutubeDL.py", line 4101, in urlopen
    return self._request_director.send(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\yt_dlp\networking\common.py", line 115, in send
    response = handler.send(request)
               ^^^^^^^^^^^^^^^^^^^^^
  File "...\yt_dlp\networking\_helper.py", line 204, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\yt_dlp\networking\common.py", line 326, in send
    return self._send(request)
           ^^^^^^^^^^^^^^^^^^^
  File "...\yt_dlp\networking\_requests.py", line 338, in _send
    raise TransportError(cause=e) from e
yt_dlp.networking.exceptions.TransportError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Can anyone point me out? Thanks.

Comment on lines 171 to 190
class NiconicoLiveFD(NiconicoLiveBaseFD):
""" Downloads niconico live without being stopped """

def real_download(self, filename, info_dict):
with self._ws_context(info_dict):
new_info_dict = info_dict.copy()
new_info_dict.update({
'protocol': 'm3u8',
})

return FFmpegFD(self.ydl, self.params or {}).download(filename, new_info_dict)


class NiconicoLiveTimeshiftFD(NiconicoLiveBaseFD, HlsFD):
""" Downloads niconico live timeshift VOD """

_PER_FRAGMENT_DOWNLOAD_RATIO = 0.1

def real_download(self, filename, info_dict):
with self._ws_context(info_dict) as ws_context:
Copy link
Member

@pukkandan pukkandan Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid adding more "protocols". Can't we keep the single protocol and do something like:

Suggested change
class NiconicoLiveFD(NiconicoLiveBaseFD):
""" Downloads niconico live without being stopped """
def real_download(self, filename, info_dict):
with self._ws_context(info_dict):
new_info_dict = info_dict.copy()
new_info_dict.update({
'protocol': 'm3u8',
})
return FFmpegFD(self.ydl, self.params or {}).download(filename, new_info_dict)
class NiconicoLiveTimeshiftFD(NiconicoLiveBaseFD, HlsFD):
""" Downloads niconico live timeshift VOD """
_PER_FRAGMENT_DOWNLOAD_RATIO = 0.1
def real_download(self, filename, info_dict):
with self._ws_context(info_dict) as ws_context:
class NiconicoLiveFD(NiconicoLiveBaseFD):
"""Downloads niconico live/timeshift VOD"""
_PER_FRAGMENT_DOWNLOAD_RATIO = 0.1
def real_download(self, filename, info_dict):
with self._ws_context(info_dict) as ws_context:
if info_dict.get('is_live'):
info_dict = info_dict.copy()
info_dict['protocol'] = 'm3u8'
return FFmpegFD(self.ydl, self.params or {}).download(filename, info_dict)

@pukkandan
Copy link
Member

pukkandan commented Mar 11, 2024

  • [!] Apply "FFmpegFixupM3u8PP" for both non-timeshift and timeshift MPEG-TS files by adding "m3u8_*" prefixes and inheriting from "HlsFD".

Since the live videos are being downloaded by ffmpeg, they won't need fixup, no?

For "FFmpegFixupM3u8PP"

This is totally a hack. I think there could be a better way to do so.

This is not a good solution. Just add the new conditions to the fixup. Something like or downloader=='niconicolive' and is_live.

I'm ambivalent about the protocol name change, but definitely don't inherit from HlsFD. Inheriting from concrete classes tends to leave a mess. Inherit from FragmentFD if you need to.

@pukkandan pukkandan removed the core-triage triage requested from a core dev label Mar 11, 2024
Comment on lines +258 to +268
class DurationLimiter():
def __init__(self, target):
self.target = target

def __enter__(self):
self.start = time.time()

def __exit__(self, *exc):
remaining = self.target - (time.time() - self.start)
if remaining > 0:
time.sleep(remaining)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo, this is cleaner inline than as a context manager. But just personal preference. I wont force you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I tried to inline the logic but got a bit more complicated code lines with additional comments, so gave up.

yt_dlp/downloader/niconico.py Show resolved Hide resolved
pzhlkj6612 and others added 3 commits March 12, 2024 15:06
- Use "downloader_options" to pass options used by the downloader.
- Combine the two downloaders into one.
- Don't inherit from "HlsFD".

Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
aka "--load-info".

Don't save a Response object to info JSON. Just create a new WebSocket
  connection during the download. Due to Niconico's logic, the manifest
  m3u8 url will be unusable soon if there is no active WebSocket
  connection, so the reconnection will give us a valid manifest m3u8,
  unless the WebSocket url has already expired.
@pzhlkj6612
Copy link
Contributor Author

@pukkandan ,

Since the live videos are being downloaded by ffmpeg, they won't need fixup, no?

Not really.

yt-dlp/yt_dlp/YoutubeDL.py

Lines 3498 to 3501 in 263a4b5

ffmpeg_fixup(downloader == 'hlsnative' and not self.params.get('hls_use_mpegts')
or info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None,
'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
FFmpegFixupM3u8PP)

class FFmpegFixupM3u8PP(FFmpegFixupPostProcessor):
def _needs_fixup(self, info):
yield info['ext'] in ('mp4', 'm4a')
yield info['protocol'].startswith('m3u8')

If we are downloading a normal livestream (e.g., an m3u8) with "FFmpegFD" (redirected from "HlsFD"), the video file will be post-processed by "FFmpegFixupM3u8PP". This is because info_dict.get('is_live') == True and info['protocol'] == 'm3u8_native' . For Niconico live, the protocol name check will not pass.


This is not a good solution. Just add the new conditions to the fixup. Something like or downloader=='niconicolive' and is_live.

I agree, but that's incomplete. We need this:

// YoutubeDL.py

- ffmpeg_fixup(downloader == 'hlsnative' and not self.params.get('hls_use_mpegts')
-              or info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None,
+ ffmpeg_fixup(not self.params.get('hls_use_mpegts')
+              and (downloader in ('hlsnative', 'niconico_live') or info_dict.get('is_live')),
               'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
               FFmpegFixupM3u8PP)

// postprocessor/ffmpeg.py

  class FFmpegFixupM3u8PP(FFmpegFixupPostProcessor): 
      def _needs_fixup(self, info): 
          yield info['ext'] in ('mp4', 'm4a') 
-         yield info['protocol'].startswith('m3u8') 
+         yield info['protocol'].startswith('m3u8') or info['protocol'] == 'niconico_live'

Your opinion?

@pzhlkj6612
Copy link
Contributor Author

pzhlkj6612 commented Mar 13, 2024

Another question:

yt-dlp/yt_dlp/YoutubeDL.py

Lines 3498 to 3501 in 263a4b5

ffmpeg_fixup(downloader == 'hlsnative' and not self.params.get('hls_use_mpegts')
or info_dict.get('is_live') and self.params.get('hls_use_mpegts') is None,
'Possible MPEG-TS in MP4 container or malformed AAC timestamps',
FFmpegFixupM3u8PP)

I don't understand the difference between not self.params.get('hls_use_mpegts') and self.params.get('hls_use_mpegts') is None ? The None and False are both falsy, why they are different here?

yt-dlp/yt_dlp/options.py

Lines 1002 to 1015 in 263a4b5

downloader.add_option(
'--hls-use-mpegts',
dest='hls_use_mpegts', action='store_true', default=None,
help=(
'Use the mpegts container for HLS videos; '
'allowing some players to play the video while downloading, '
'and reducing the chance of file corruption if download is interrupted. '
'This is enabled by default for live streams'))
downloader.add_option(
'--no-hls-use-mpegts',
dest='hls_use_mpegts', action='store_false',
help=(
'Do not use the mpegts container for HLS videos. '
'This is default when not downloading live streams'))

hls_use_mpegts could be True, False and None .

@pzhlkj6612 pzhlkj6612 marked this pull request as draft March 27, 2024 14:16
@pzhlkj6612 pzhlkj6612 marked this pull request as ready for review March 27, 2024 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request site-enhancement Feature request for some website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants