[extractor/nebula] Support public videos without an account and channel extractor improvements #6334
I ran a quick test (downloading a channel) with a subscription token and everything's still A-OK. ✅
That said, I find this approach slightly problematic. By (potentially repeatedly) exploiting their "first video's on us, then you need a subscription" policy, we're basically circumventing the requirement for a subscription entirely. Or am I missing something that allows you to do this only once? (But once per... what?)
You mention that "some Nebula videos can be watched without an account", but is it really video-specific? Could you link to a video that cannot be watched without an account, or even better, create a test case, so we can see what the UI message is and how it will be handled?
Admittedly, they didn't put effective measures in place to stop this. So I guess they'll do that once they hear about this, and then this feature will be gone again.
Once per "browser session", according to their implementation. It is fair game for us to use that, imo. As it stands, anyone with basic knowledge of cookies can easily circumvent it. If they wanted better restrictions, they should implement them per IP, etc.
Yep, it's video-specific. For example, this channel (https://nebula.tv/johnnyharris) publishes its next video (Unemployment, Explained) one week early, exclusively on Nebula. Some other premium-only videos are:
In general, if the video has a blue cross or a flash icon, a subscription is required. On a side note,
I think that's a known issue and pukkandan has already started working on it, see #5979 (comment). You might wanna contribute to that, as opposed to starting over.
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Log of updated NebulaChannelIE
[debug] Command-line config: ['https://nebula.tv/johnnyharris', '--verbose', '-F']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.02.17 [a0a7c0154] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: 0b85236b4
[debug] Python 3.10.7 (CPython x86_64 64bit) - Linux-5.19.0-31-generic-x86_64-with-glibc2.36 (OpenSSL 3.0.5 5 Jul 2022, glibc 2.36)
[debug] exe versions: ffmpeg N-108931-g4dda3b1653-20221104 (setts), ffprobe N-108931-g4dda3b1653-20221104, phantomjs broken, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, certifi-2022.09.24, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1782 extractors
[nebula:channel] Authorizing to Nebula
[nebula:channel] Extracting URL: https://nebula.tv/johnnyharris
[nebula:channel] johnnyharris: Retrieving channel
[download] Downloading playlist: Johnny Harris
[nebula:channel] johnnyharris: Downloading page 1
[nebula:channel] johnnyharris: Downloading page 2
[nebula:channel] johnnyharris: Downloading page 3
[nebula:channel] johnnyharris: Downloading page 4
[nebula:channel] Playlist Johnny Harris: Downloading 90 items of 90
[download] Downloading item 1 of 90
[Nebula] Authorizing to Nebula
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-unemployment-explained/
[Nebula] johnnyharris-unemployment-explained: Fetching video meta data
[Nebula] johnnyharris-unemployment-explained: Fetching video stream info
[Nebula] Reauthenticating to Nebula and retrying, because last bearer call resulted in error 401
ERROR: [Nebula] johnnyharris-unemployment-explained: This video is only available for registered users. Use --username and --password, or --netrc (watchnebula) to provide account credentials
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/common.py", line 693, in extract
ie_result = self._real_extract(url)
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 238, in _real_extract
return self._build_video_info(video)
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 111, in _build_video_info
fmts, subs = self._fetch_video_formats(episode['slug'])
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 77, in _fetch_video_formats
stream_info = self._call_nebula_api(f'https://content.watchnebula.com/video/{slug}/stream/',
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 62, in _call_nebula_api
self._perform_login()
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 120, in _perform_login
self._nebula_api_token = self._perform_nebula_auth(username, password)
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 27, in _perform_nebula_auth
self.raise_login_required(method='password')
File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/common.py", line 1153, in raise_login_required
raise ExtractorError(msg, expected=True)
[download] Downloading item 2 of 90
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-milk-is-a-lie/
[Nebula] johnnyharris-milk-is-a-lie: Fetching video meta data
[Nebula] johnnyharris-milk-is-a-lie: Fetching video stream info
[Nebula] johnnyharris-milk-is-a-lie: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 63d91620a645210001b587e5:
ID EXT RESOLUTION FPS │ FILESIZE TBR PROTO │ VCODEC VBR ACODEC MORE INFO
──────────────────────────────────────────────────────────────────────────────────────────────────────────
audio-English m3u8 audio only │ m3u8 │ audio only unknown [en] English
606 mp4 640x360 24 │ ~124.51MiB 606k m3u8 │ avc1.42C01E 606k video only
1359 mp4 960x540 24 │ ~279.16MiB 1360k m3u8 │ avc1.42C028 1360k video only
3072 mp4 1280x720 24 │ ~630.80MiB 3072k m3u8 │ avc1.640029 3072k video only
3197 mp4 1920x1080 24 │ ~656.44MiB 3197k m3u8 │ hvc1.2.4.L123 3197k video only
5193 mp4 2560x1440 24 │ ~ 1.04GiB 5194k m3u8 │ hvc1.2.4.L150 5194k video only
12033 mp4 3840x2160 24 │ ~ 2.41GiB 12033k m3u8 │ hvc1.2.4.L153 12033k video only
[download] Downloading item 3 of 90
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-how-youtubers-are-escaping-putins-censorship-machine/
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Fetching video meta data
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Fetching video stream info
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Downloading m3u8 information
yt_dlp/extractor/nebula.py
Outdated
first_api_response = self._call_nebula_api(
    f'https://content.watchnebula.com/video/channels/{collection_id}/', collection_id,
    auth_type='bearer', note='Retrieving channel')

page_api_response = first_api_response

def page_func(page_num):
    nonlocal page_api_response

    if page_num > 0:
        next_url = page_api_response['episodes'].get('next')
        if not next_url:
            return []

        page_api_response = self._call_nebula_api(
            next_url, collection_id, auth_type='bearer', note=f'Downloading page {page_num}')

    return [self.url_result(episode['share_url'], ie=NebulaIE, **self._extract_video_metadata(episode))
            for episode in page_api_response['episodes']['results']]
This is completely wrong. The point of `PagedList` is that the pages can be requested out of order (e.g. `-I 100`). You cannot assume `page_api_response` is the response from the previous page. Since the pagination happens using the `next` key, this seems unsuitable for a `PagedList` and should just be a generator like the original code.
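To illustrate the reviewer's point, here is a minimal sketch of a generator that follows `next` links. It is not the extractor's code: `iter_episodes`, `fetch_page`, `FAKE_PAGES` and `FIRST_PAGE` are hypothetical names, and only the `episodes`/`results`/`next` response shape is taken from the snippet under review.

```python
from typing import Callable, Iterator, Optional


def iter_episodes(first_page: dict, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield episodes page by page, following each response's `next` URL."""
    page: Optional[dict] = first_page
    while page:
        episodes = page['episodes']
        yield from episodes['results']
        next_url = episodes.get('next')
        # Each page is only reachable through the previous response,
        # so pages are necessarily requested strictly in order.
        page = fetch_page(next_url) if next_url else None


# Hypothetical in-memory stand-in for the API, for illustration only.
FAKE_PAGES = {
    'page2': {'episodes': {'results': [{'slug': 'c'}], 'next': None}},
}
FIRST_PAGE = {'episodes': {'results': [{'slug': 'a'}, {'slug': 'b'}], 'next': 'page2'}}

slugs = [episode['slug'] for episode in iter_episodes(FIRST_PAGE, FAKE_PAGES.__getitem__)]
print(slugs)
```

Because the generator keeps its own cursor, there is no shared state that a caller could invalidate by asking for a later page first.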
> The point of PagedList is that the pages can be requested out of order

It would be great to add that to the class documentation; it's not obvious, at least to me.
https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/utils.py#L2917-L2921
Any specific suggestion?
I imagine InAdvancePagedList is similar, so better in the base class. Something like this:
class PagedList:
    """
    Entries generator for offset-based pagination APIs,
    e.g. https://api.example.com/v2/videos?offset=200&limit=50

    NOTE: This should not be used with token-based pagination APIs,
    because it is essential to be able to retrieve a specific page
    without having to request any previous one.
    """
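The property the proposed docstring describes can be sketched as follows. This is a toy example rather than yt-dlp code: `ITEMS`, `PAGE_SIZE` and `fetch_page` are hypothetical, and the list slice stands in for an `offset`/`limit` request.

```python
from typing import List

ITEMS = [f'video-{i}' for i in range(7)]  # hypothetical data set
PAGE_SIZE = 3


def fetch_page(page_num: int) -> List[str]:
    """Return page `page_num` (0-based) using only offset arithmetic."""
    offset = page_num * PAGE_SIZE
    return ITEMS[offset:offset + PAGE_SIZE]


# Pages can be requested in any order: page 2 does not depend on pages 0 or 1.
last_page = fetch_page(2)
first_page = fetch_page(0)
print(last_page, first_page)
```

A page function like this is safe to call out of order because its result depends only on `page_num`, which is exactly what a token-based API cannot offer.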
I reverted all changes unrelated to the original PR. I will open a new PR fixing
ee280c7 to 7aeda6c
Closes yt-dlp#4300, Closes yt-dlp#5814, Closes yt-dlp#7588, Closes yt-dlp#6334, Closes yt-dlp#6538 Authored by: elyse0, pukkandan, seproDev Co-authored-by: Elyse <26639800+elyse0@users.noreply.github.com> Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Description of your pull request and other information
As the linked issue describes, some Nebula videos can be watched without an account. The extractor has been updated to reflect what the app does: it always requests a guest Bearer token, and obtaining this token does not require authentication.
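The flow could look roughly like the sketch below. It is an illustration only, not the code in this PR: `Unauthorized`, `fetch_stream_info` and the three callables are hypothetical, and no real Nebula endpoint is modelled.

```python
from typing import Callable, Optional


class Unauthorized(Exception):
    """Hypothetical stand-in for an HTTP 401 response."""


def fetch_stream_info(
    fetch_with_token: Callable[[str], dict],
    get_guest_token: Callable[[], str],
    get_account_token: Callable[[], Optional[str]],
) -> dict:
    """Try a guest bearer token first; fall back to account credentials on 401."""
    try:
        return fetch_with_token(get_guest_token())
    except Unauthorized:
        account_token = get_account_token()
        if account_token is None:
            raise  # no credentials available: the video needs a login
        return fetch_with_token(account_token)


def fake_public_video(token: str) -> dict:
    # Hypothetical public video: any bearer token, including a guest one, works.
    return {'token_used': token}


result = fetch_stream_info(fake_public_video, lambda: 'guest', lambda: None)
print(result)
```

Trying the guest token first means public videos need no credentials at all, while pay-walled ones still surface the login requirement.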
I don't have a premium account, but maybe @hheimbuerger or someone else can test it with pay-walled videos just to make sure it works.
Fixes #4300
Downloading public video