[extractor/nebula] Support public videos without an account and channel extractor improvements #6334

elyse0 · 2023-02-23T23:20:50Z

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

As the linked issue describes, some Nebula videos can be watched without an account. The extractor has been updated to reflect that the app always requests a guest Bearer token, and that getting this token does not require authentication.
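
A minimal sketch of the idea, not the extractor's actual code: the bearer request sends an `Authorization` header only when an account token exists, so anonymous users still receive a guest token. The endpoint URL, the `post_json` callable, and the `token` response key are assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical endpoint; the real authorization URL is not shown in this description.
AUTH_URL = 'https://api.example.invalid/authorization/'


def fetch_bearer_token(post_json: Callable[[str, dict], dict],
                       api_token: Optional[str] = None) -> str:
    """Request a bearer token, as a guest when no account token is available."""
    # Only identify ourselves if we actually logged in; guests send no header.
    headers = {'Authorization': f'Token {api_token}'} if api_token else {}
    response = post_json(AUTH_URL, headers)
    return response['token']  # assumed response shape


def fake_post_json(url: str, headers: dict) -> dict:
    """Stand-in for the HTTP call so the sketch runs offline."""
    kind = 'subscriber' if 'Authorization' in headers else 'guest'
    return {'token': f'{kind}-bearer'}


guest = fetch_bearer_token(fake_post_json)
subscriber = fetch_bearer_token(fake_post_json, 'abc123')
print(guest, subscriber)
```

Passing the HTTP call in as a parameter keeps the sketch independent of any real network client.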

I don't have a premium account, but maybe @hheimbuerger or someone else can test it with pay-walled videos just to make sure it works.

Fixes #4300

Downloading public video

[debug] Command-line config: ['https://nebula.tv/videos/tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines', '--verbose', '-F']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.02.17 [a0a7c0154] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: 41432863b
[debug] Python 3.10.7 (CPython x86_64 64bit) - Linux-5.19.0-31-generic-x86_64-with-glibc2.36 (OpenSSL 3.0.5 5 Jul 2022, glibc 2.36)
[debug] exe versions: ffmpeg N-108931-g4dda3b1653-20221104 (setts), ffprobe N-108931-g4dda3b1653-20221104, phantomjs broken, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, certifi-2022.09.24, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1782 extractors
[Nebula] Authorizing to Nebula
[Nebula] Extracting URL: https://nebula.tv/videos/tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines
[Nebula] tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines: Fetching video meta data
[Nebula] tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines: Fetching video stream info
[Nebula] tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 63f64c74366fcd00017c1513:
ID            EXT  RESOLUTION FPS │   FILESIZE   TBR PROTO │ VCODEC          VBR ACODEC     MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────
audio-Unknown m3u8 audio only     │                  m3u8  │ audio only          unknown    [und] Unknown
456           mp4  640x360     30 │ ~ 29.19MiB  456k m3u8  │ avc1.42C01E    456k video only
815           mp4  960x540     30 │ ~ 52.16MiB  815k m3u8  │ avc1.42C028    815k video only
1361          mp4  1280x720    30 │ ~ 87.11MiB 1362k m3u8  │ avc1.640029   1362k video only
1273          mp4  1920x1080   30 │ ~ 81.45MiB 1273k m3u8  │ hvc1.2.4.L123 1273k video only
2008          mp4  2560x1440   30 │ ~128.48MiB 2009k m3u8  │ hvc1.2.4.L150 2009k video only
4419          mp4  3840x2160   30 │ ~282.66MiB 4419k m3u8  │ hvc1.2.4.L153 4419k video only

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

hheimbuerger · 2023-02-25T12:18:26Z

I ran a quick test (downloading a channel) with a subscription token and everything's still A-OK. ✅

That said, I find this approach slightly problematic. By (potentially repeatedly) exploiting their "first video's on us, then you need a subscription" policy, we're basically circumventing the requirement for a subscription entirely. Or am I missing something, that allows you to do this only once? (But once per... what?)

You mention that "some Nebula videos can be watched without an account", but is it really video-specific? Could you link to a video that can not be watched without an account — or even better: create a test case, so we can see what the UI message is and how it will be handled?

Admittedly, they didn't put effective measures in place to stop this. So I guess they'll do that once they hear about this, and then this feature will be gone again.

pukkandan · 2023-02-25T14:19:04Z

> That said, I find this approach slightly problematic. By (potentially repeatedly) exploiting their "first video's on us, then you need a subscription" policy, we're basically circumventing the requirement for a subscription entirely. Or am I missing something, that allows you to do this only once? (But once per... what?)

Once per "browser session" according to their implementation. It is fair game for us to use that imo. As it stands, anyone with basic knowledge of cookies can easily circumvent it. If they wanted better restrictions, they should implement it per-IP etc

elyse0 · 2023-02-25T19:16:00Z

> By (potentially repeatedly) exploiting their "first video's on us, then you need a subscription" policy, we're basically circumventing the requirement for a subscription entirely.

Yep, it's video-specific. For example, this channel (https://nebula.tv/johnnyharris) publishes his next video (Unemployment, Explained) one week early, exclusively on Nebula.

Some other premium-only videos are:

In general, if a video has a blue cross or a flash icon, a subscription is required.

On a side note, yt-dlp "https://nebula.tv/johnnyharris" --flat-playlist --skip-download --print "url,title" is not working; I'll probably look into it.

hheimbuerger · 2023-02-25T19:58:41Z

> On a side note, yt-dlp "https://nebula.tv/johnnyharris" --flat-playlist --skip-download --print "url,title" is not working, probably I'll look into it

I think that's a known issue and pukkandan has already started working on it, see #5979 (comment). You might wanna contribute to that, as opposed to starting over.


elyse0 · 2023-02-25T20:49:07Z

Log of updated NebulaChannelIE

[debug] Command-line config: ['https://nebula.tv/johnnyharris', '--verbose', '-F']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.02.17 [a0a7c0154] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: 0b85236b4
[debug] Python 3.10.7 (CPython x86_64 64bit) - Linux-5.19.0-31-generic-x86_64-with-glibc2.36 (OpenSSL 3.0.5 5 Jul 2022, glibc 2.36)
[debug] exe versions: ffmpeg N-108931-g4dda3b1653-20221104 (setts), ffprobe N-108931-g4dda3b1653-20221104, phantomjs broken, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, certifi-2022.09.24, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1782 extractors
[nebula:channel] Authorizing to Nebula
[nebula:channel] Extracting URL: https://nebula.tv/johnnyharris
[nebula:channel] johnnyharris: Retrieving channel
[download] Downloading playlist: Johnny Harris
[nebula:channel] johnnyharris: Downloading page 1
[nebula:channel] johnnyharris: Downloading page 2
[nebula:channel] johnnyharris: Downloading page 3
[nebula:channel] johnnyharris: Downloading page 4
[nebula:channel] Playlist Johnny Harris: Downloading 90 items of 90
[download] Downloading item 1 of 90
[Nebula] Authorizing to Nebula
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-unemployment-explained/
[Nebula] johnnyharris-unemployment-explained: Fetching video meta data
[Nebula] johnnyharris-unemployment-explained: Fetching video stream info
[Nebula] Reauthenticating to Nebula and retrying, because last bearer call resulted in error 401
ERROR: [Nebula] johnnyharris-unemployment-explained: This video is only available for registered users. Use --username and --password, or --netrc (watchnebula) to provide account credentials
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/common.py", line 693, in extract
    ie_result = self._real_extract(url)
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 238, in _real_extract
    return self._build_video_info(video)
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 111, in _build_video_info
    fmts, subs = self._fetch_video_formats(episode['slug'])
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 77, in _fetch_video_formats
    stream_info = self._call_nebula_api(f'https://content.watchnebula.com/video/{slug}/stream/',
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 62, in _call_nebula_api
    self._perform_login()
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 120, in _perform_login
    self._nebula_api_token = self._perform_nebula_auth(username, password)
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 27, in _perform_nebula_auth
    self.raise_login_required(method='password')
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/common.py", line 1153, in raise_login_required
    raise ExtractorError(msg, expected=True)

[download] Downloading item 2 of 90
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-milk-is-a-lie/
[Nebula] johnnyharris-milk-is-a-lie: Fetching video meta data
[Nebula] johnnyharris-milk-is-a-lie: Fetching video stream info
[Nebula] johnnyharris-milk-is-a-lie: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 63d91620a645210001b587e5:
ID            EXT  RESOLUTION FPS │   FILESIZE    TBR PROTO │ VCODEC           VBR ACODEC     MORE INFO
──────────────────────────────────────────────────────────────────────────────────────────────────────────
audio-English m3u8 audio only     │                   m3u8  │ audio only           unknown    [en] English
606           mp4  640x360     24 │ ~124.51MiB   606k m3u8  │ avc1.42C01E     606k video only
1359          mp4  960x540     24 │ ~279.16MiB  1360k m3u8  │ avc1.42C028    1360k video only
3072          mp4  1280x720    24 │ ~630.80MiB  3072k m3u8  │ avc1.640029    3072k video only
3197          mp4  1920x1080   24 │ ~656.44MiB  3197k m3u8  │ hvc1.2.4.L123  3197k video only
5193          mp4  2560x1440   24 │ ~  1.04GiB  5194k m3u8  │ hvc1.2.4.L150  5194k video only
12033         mp4  3840x2160   24 │ ~  2.41GiB 12033k m3u8  │ hvc1.2.4.L153 12033k video only
[download] Downloading item 3 of 90
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-how-youtubers-are-escaping-putins-censorship-machine/
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Fetching video meta data
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Fetching video stream info
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Downloading m3u8 information
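
The 401 handling visible in the log above can be sketched roughly as follows. This is an illustration under assumptions, not the code in nebula.py: `ApiError`, `call_with_reauth`, and the callables passed to it are hypothetical names.

```python
class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f'HTTP {status}')
        self.status = status


def call_with_reauth(request, refresh_token):
    """Run request(); on a 401, refresh the token once and retry."""
    try:
        return request()
    except ApiError as err:
        if err.status != 401:
            raise
    # Retry exactly once; a second 401 propagates to the caller.
    refresh_token()
    return request()


# Offline demo: the first call fails with 401, the retry succeeds.
state = {'token': 'stale', 'refreshes': 0}


def request():
    if state['token'] == 'stale':
        raise ApiError(401)
    return {'ok': True}


def refresh_token():
    state['token'] = 'fresh'
    state['refreshes'] += 1


result = call_with_reauth(request, refresh_token)
print(result, state['refreshes'])
```

In the log, the refresh step itself fails for the premium video because no credentials were supplied, which is what surfaces as the login-required error.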

pukkandan · 2023-03-14T13:04:02Z

yt_dlp/extractor/nebula.py

+        first_api_response = self._call_nebula_api(
+            f'https://content.watchnebula.com/video/channels/{collection_id}/', collection_id,
+            auth_type='bearer', note='Retrieving channel')
+
+        page_api_response = first_api_response
+
+        def page_func(page_num):
+            nonlocal page_api_response
+
+            if page_num > 0:
+                next_url = page_api_response['episodes'].get('next')
+                if not next_url:
+                    return []
+
+                page_api_response = self._call_nebula_api(
+                    next_url, collection_id, auth_type='bearer', note=f'Downloading page {page_num}')
+
+            return [self.url_result(episode['share_url'], ie=NebulaIE, **self._extract_video_metadata(episode))
+                    for episode in page_api_response['episodes']['results']]


This is completely wrong. The point of PagedList is that pages can be requested out of order (e.g. -I 100), so you cannot assume page_api_response is the response from the previous page. Since pagination here relies on the next key, it is unsuitable for a PagedList and should just be a generator, like the original code.
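
A minimal sketch of the generator approach for `next`-style pagination, using a fake in-memory API instead of real requests. `FAKE_API`, `fetch`, and the page URLs are made up for illustration; only the `episodes`, `results`, and `next` keys come from the diff above.

```python
from typing import Dict, Iterator

# Fake API: each response links to the following page via 'next'.
FAKE_API: Dict[str, dict] = {
    'page-1': {'episodes': {'results': ['a', 'b'], 'next': 'page-2'}},
    'page-2': {'episodes': {'results': ['c', 'd'], 'next': 'page-3'}},
    'page-3': {'episodes': {'results': ['e'], 'next': None}},
}


def fetch(url: str) -> dict:
    """Stand-in for an API call."""
    return FAKE_API[url]


def iter_episodes(first_url: str) -> Iterator[str]:
    """Yield episodes lazily, walking the pages strictly in order."""
    url = first_url
    while url:
        page = fetch(url)['episodes']
        yield from page['results']
        url = page.get('next')  # None on the last page ends the loop


episodes = list(iter_episodes('page-1'))
print(episodes)
```

Because a generator is consumed sequentially, each request can safely depend on the previous response, which a page function called with an arbitrary page number cannot.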

elyse0 · 2023-03-14

> The point of PagedList is that the pages can be requested out of order

It would be great to add this to the class documentation; it's not obvious, at least to me.

https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/utils.py#L2917-L2921

pukkandan · 2023-03-14

Any specific suggestion?

elyse0 · 2023-03-14

I imagine InAdvancePagedList is similar, so it would be better in the base class. Something like this:

class PagedList:
    """
    Entries generator for offset-based pagination APIs,
    e.g. https://api.example.com/v2/videos?offset=200&limit=50

    NOTE: This should not be used with token-based pagination APIs,
    because it is essential to be able to retrieve a specific page
    without having to request any previous one.
    """
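
To illustrate the distinction drawn in the proposed docstring, here is a hedged sketch of offset-based paging, where any page can be computed from its number alone. `VIDEOS`, `PAGE_SIZE`, and `get_page` are hypothetical and do not use yt-dlp's PagedList.

```python
from typing import List

VIDEOS = [f'video-{i}' for i in range(7)]  # pretend server-side collection
PAGE_SIZE = 3


def get_page(page_num: int) -> List[str]:
    """Fetch one page using only its number, like ?offset=...&limit=..."""
    offset = page_num * PAGE_SIZE
    return VIDEOS[offset:offset + PAGE_SIZE]


# Pages can be requested in any order; no page depends on another.
last = get_page(2)
first = get_page(0)
print(last, first)
```

A token-based API offers no equivalent of `get_page(2)` without first fetching pages 0 and 1, which is why it does not fit this model.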

pukkandan · 2023-03-14T14:16:03Z

I reverted all changes unrelated to the original PR. I will open a new PR fixing --flat-playlist. This can be merged once @hheimbuerger confirms login is not broken

Closes yt-dlp#4300, Closes yt-dlp#5814, Closes yt-dlp#7588, Closes yt-dlp#6334, Closes yt-dlp#6538 Authored by: elyse0, pukkandan, seproDev Co-authored-by: Elyse <26639800+elyse0@users.noreply.github.com> Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>

elyse0 added 2 commits February 23, 2023 17:01

Make metadata extraction less fatal

22fc799

Support downloading public videos

4143286

pukkandan added the site-enhancement Feature request for some website label Feb 24, 2023

elyse0 and others added 4 commits February 25, 2023 14:22

Remove duplicate key

8b578b7

Copy _extract_video_metadata from pukkandan/yt-dlp-dev/nebula-flat

b623963

Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>

Improve channel extraction

40d900f

Add new NebulaChannel test

0b85236

elyse0 changed the title ~~[extractor/nebula] Support downloading public videos without an account~~ [extractor/nebula] Support public videos without an account and channel extractor improvements Feb 25, 2023

pukkandan reviewed Mar 14, 2023


pukkandan added 2 commits March 14, 2023 19:27

Revert unrelated changes

9e9d22f

cleanup

598490a

pukkandan added the needs-testing Patch needs testing label Mar 14, 2023

pukkandan mentioned this pull request Mar 14, 2023

[extractor/nebula] Refactor #6538

Closed

pukkandan force-pushed the master branch 2 times, most recently from ee280c7 to 7aeda6c Compare May 24, 2023 18:09

seproDev mentioned this pull request Nov 11, 2023

[ie/nebula] Finish Refactor #8566

Merged

9 tasks

bashonly closed this in 45d82be Nov 20, 2023
