[extractor/nebula] Support public videos without an account and channel extractor improvements #6334

Closed
elyse0 wants to merge 8 commits

Conversation

elyse0
Contributor

@elyse0 elyse0 commented Feb 23, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

As the linked issue describes, some Nebula videos can be watched without an account. The extractor now mirrors what the app does: it always requests a guest Bearer token, and obtaining that token does not require authentication.

I don't have a premium account, but maybe @hheimbuerger or someone else can test it with paywalled videos to make sure it still works.

Fixes #4300

Downloading public video

[debug] Command-line config: ['https://nebula.tv/videos/tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines', '--verbose', '-F']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.02.17 [a0a7c0154] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: 41432863b
[debug] Python 3.10.7 (CPython x86_64 64bit) - Linux-5.19.0-31-generic-x86_64-with-glibc2.36 (OpenSSL 3.0.5 5 Jul 2022, glibc 2.36)
[debug] exe versions: ffmpeg N-108931-g4dda3b1653-20221104 (setts), ffprobe N-108931-g4dda3b1653-20221104, phantomjs broken, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, certifi-2022.09.24, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1782 extractors
[Nebula] Authorizing to Nebula
[Nebula] Extracting URL: https://nebula.tv/videos/tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines
[Nebula] tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines: Fetching video meta data
[Nebula] tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines: Fetching video stream info
[Nebula] tldrnewseu-did-the-us-really-blow-up-the-nordstream-pipelines: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 63f64c74366fcd00017c1513:
ID            EXT  RESOLUTION FPS │   FILESIZE   TBR PROTO │ VCODEC          VBR ACODEC     MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────
audio-Unknown m3u8 audio only     │                  m3u8  │ audio only          unknown    [und] Unknown
456           mp4  640x360     30 │ ~ 29.19MiB  456k m3u8  │ avc1.42C01E    456k video only
815           mp4  960x540     30 │ ~ 52.16MiB  815k m3u8  │ avc1.42C028    815k video only
1361          mp4  1280x720    30 │ ~ 87.11MiB 1362k m3u8  │ avc1.640029   1362k video only
1273          mp4  1920x1080   30 │ ~ 81.45MiB 1273k m3u8  │ hvc1.2.4.L123 1273k video only
2008          mp4  2560x1440   30 │ ~128.48MiB 2009k m3u8  │ hvc1.2.4.L150 2009k video only
4419          mp4  3840x2160   30 │ ~282.66MiB 4419k m3u8  │ hvc1.2.4.L153 4419k video only
Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

@pukkandan pukkandan added the site-enhancement Feature request for some website label Feb 24, 2023
@hheimbuerger
Contributor

I ran a quick test (downloading a channel) with a subscription token and everything's still A-OK. ✅

That said, I find this approach slightly problematic. By (potentially repeatedly) exploiting their "first video's on us, then you need a subscription" policy, we're basically circumventing the requirement for a subscription entirely. Or am I missing something, that allows you to do this only once? (But once per... what?)

You mention that "some Nebula videos can be watched without an account", but is it really video-specific? Could you link to a video that cannot be watched without an account, or even better, create a test case so we can see what the UI message is and how it will be handled?

Admittedly, they didn't put effective measures in place to stop this. So I guess they'll do that once they hear about this, and then this feature will be gone again.

@pukkandan
Member

That said, I find this approach slightly problematic. By (potentially repeatedly) exploiting their "first video's on us, then you need a subscription" policy, we're basically circumventing the requirement for a subscription entirely. Or am I missing something, that allows you to do this only once? (But once per... what?)

Once per "browser session" according to their implementation. It is fair game for us to use that, imo. As it stands, anyone with basic knowledge of cookies can easily circumvent it. If they wanted stronger restrictions, they should enforce them per IP or similar.

@elyse0
Contributor Author

elyse0 commented Feb 25, 2023

By (potentially repeatedly) exploiting their "first video's on us, then you need a subscription" policy, we're basically circumventing the requirement for a subscription entirely.

Yep, it's video-specific. For example, this channel (https://nebula.tv/johnnyharris) publishes its next video (Unemployment, Explained) one week early, exclusively on Nebula.

Some other premium-only videos are:

In general, if the video has a blue cross or a flash icon, a subscription is required.

On a side note, yt-dlp "https://nebula.tv/johnnyharris" --flat-playlist --skip-download --print "url,title" is not working, probably I'll look into it

@hheimbuerger
Contributor

hheimbuerger commented Feb 25, 2023

On a side note, yt-dlp "https://nebula.tv/johnnyharris" --flat-playlist --skip-download --print "url,title" is not working, probably I'll look into it

I think that's a known issue and pukkandan has already started working on it, see #5979 (comment). You might wanna contribute to that, as opposed to starting over.

@elyse0 elyse0 changed the title [extractor/nebula] Support downloading public videos without an account [extractor/nebula] Support public videos without an account and channel extractor improvements Feb 25, 2023
@elyse0
Contributor Author

elyse0 commented Feb 25, 2023

Log of updated NebulaChannelIE

[debug] Command-line config: ['https://nebula.tv/johnnyharris', '--verbose', '-F']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.02.17 [a0a7c0154] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: 0b85236b4
[debug] Python 3.10.7 (CPython x86_64 64bit) - Linux-5.19.0-31-generic-x86_64-with-glibc2.36 (OpenSSL 3.0.5 5 Jul 2022, glibc 2.36)
[debug] exe versions: ffmpeg N-108931-g4dda3b1653-20221104 (setts), ffprobe N-108931-g4dda3b1653-20221104, phantomjs broken, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, certifi-2022.09.24, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1782 extractors
[nebula:channel] Authorizing to Nebula
[nebula:channel] Extracting URL: https://nebula.tv/johnnyharris
[nebula:channel] johnnyharris: Retrieving channel
[download] Downloading playlist: Johnny Harris
[nebula:channel] johnnyharris: Downloading page 1
[nebula:channel] johnnyharris: Downloading page 2
[nebula:channel] johnnyharris: Downloading page 3
[nebula:channel] johnnyharris: Downloading page 4
[nebula:channel] Playlist Johnny Harris: Downloading 90 items of 90
[download] Downloading item 1 of 90
[Nebula] Authorizing to Nebula
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-unemployment-explained/
[Nebula] johnnyharris-unemployment-explained: Fetching video meta data
[Nebula] johnnyharris-unemployment-explained: Fetching video stream info
[Nebula] Reauthenticating to Nebula and retrying, because last bearer call resulted in error 401
ERROR: [Nebula] johnnyharris-unemployment-explained: This video is only available for registered users. Use --username and --password, or --netrc (watchnebula) to provide account credentials
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/common.py", line 693, in extract
    ie_result = self._real_extract(url)
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 238, in _real_extract
    return self._build_video_info(video)
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 111, in _build_video_info
    fmts, subs = self._fetch_video_formats(episode['slug'])
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 77, in _fetch_video_formats
    stream_info = self._call_nebula_api(f'https://content.watchnebula.com/video/{slug}/stream/',
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 62, in _call_nebula_api
    self._perform_login()
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 120, in _perform_login
    self._nebula_api_token = self._perform_nebula_auth(username, password)
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/nebula.py", line 27, in _perform_nebula_auth
    self.raise_login_required(method='password')
  File "/home/amish/Documents/Programming/yt-dlp/yt_dlp/extractor/common.py", line 1153, in raise_login_required
    raise ExtractorError(msg, expected=True)

[download] Downloading item 2 of 90
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-milk-is-a-lie/
[Nebula] johnnyharris-milk-is-a-lie: Fetching video meta data
[Nebula] johnnyharris-milk-is-a-lie: Fetching video stream info
[Nebula] johnnyharris-milk-is-a-lie: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 63d91620a645210001b587e5:
ID            EXT  RESOLUTION FPS │   FILESIZE    TBR PROTO │ VCODEC           VBR ACODEC     MORE INFO
──────────────────────────────────────────────────────────────────────────────────────────────────────────
audio-English m3u8 audio only     │                   m3u8  │ audio only           unknown    [en] English
606           mp4  640x360     24 │ ~124.51MiB   606k m3u8  │ avc1.42C01E     606k video only
1359          mp4  960x540     24 │ ~279.16MiB  1360k m3u8  │ avc1.42C028    1360k video only
3072          mp4  1280x720    24 │ ~630.80MiB  3072k m3u8  │ avc1.640029    3072k video only
3197          mp4  1920x1080   24 │ ~656.44MiB  3197k m3u8  │ hvc1.2.4.L123  3197k video only
5193          mp4  2560x1440   24 │ ~  1.04GiB  5194k m3u8  │ hvc1.2.4.L150  5194k video only
12033         mp4  3840x2160   24 │ ~  2.41GiB 12033k m3u8  │ hvc1.2.4.L153 12033k video only
[download] Downloading item 3 of 90
[Nebula] Extracting URL: https://nebula.tv/videos/johnnyharris-how-youtubers-are-escaping-putins-censorship-machine/
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Fetching video meta data
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Fetching video stream info
[Nebula] johnnyharris-how-youtubers-are-escaping-putins-censorship-machine: Downloading m3u8 information

Comment on lines 302 to 320
first_api_response = self._call_nebula_api(
    f'https://content.watchnebula.com/video/channels/{collection_id}/', collection_id,
    auth_type='bearer', note='Retrieving channel')

page_api_response = first_api_response

def page_func(page_num):
    nonlocal page_api_response

    if page_num > 0:
        next_url = page_api_response['episodes'].get('next')
        if not next_url:
            return []

        page_api_response = self._call_nebula_api(
            next_url, collection_id, auth_type='bearer', note=f'Downloading page {page_num}')

    return [self.url_result(episode['share_url'], ie=NebulaIE, **self._extract_video_metadata(episode))
            for episode in page_api_response['episodes']['results']]
Member

@pukkandan pukkandan Mar 14, 2023

This is completely wrong. The point of PagedList is that the pages can be requested out of order (e.g. -I 100). You cannot assume page_api_response is the response from the previous page. Since the pagination happens using the next key, this seems unsuitable for a PagedList and should just be a generator, like the original code.
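
To illustrate the generator approach, here is a minimal sketch, assuming only the response shape visible in the snippet above (an 'episodes' object with 'results' and 'next'). The names iter_entries, fetch_page and FAKE_API are hypothetical stand-ins for illustration, not yt-dlp APIs. Each page is fetched strictly after the previous one, which is the only order a next-link API supports.

```python
from typing import Callable, Iterator, Optional


def iter_entries(fetch_page: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield every episode, following each response's 'next' URL in order."""
    next_url: Optional[str] = first_url
    while next_url:
        page = fetch_page(next_url)
        episodes = page.get('episodes') or {}
        # Emit this page's entries before asking for the next page
        yield from episodes.get('results') or []
        # The URL of the following page is only known from the current response
        next_url = episodes.get('next')


# Hypothetical in-memory stand-in for the API, keyed by "URL"
FAKE_API = {
    'page1': {'episodes': {'results': [{'slug': 'a'}, {'slug': 'b'}], 'next': 'page2'}},
    'page2': {'episodes': {'results': [{'slug': 'c'}], 'next': None}},
}

slugs = [entry['slug'] for entry in iter_entries(FAKE_API.__getitem__, 'page1')]
```

Because the generator is lazy, a consumer that only needs the first few entries never triggers requests for later pages.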

Contributor Author

@elyse0 elyse0 Mar 14, 2023

The point of PagedList is that the pages can be requested out of order

It would be great to add that to the class documentation; it's not obvious, at least to me.

https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/utils.py#L2917-L2921

Member

Any specific suggestion?

Contributor Author

I imagine InAdvancePagedList is similar, so it would be better placed in the base class. Something like this:

class PagedList:
    """
    Entries generator for offset-based pagination APIs,
    e.g. https://api.example.com/v2/videos?offset=200&limit=50

    NOTE: This should not be used with token-based pagination APIs,
    because it must be possible to retrieve a specific page
    without having to request any previous one.
    """

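For contrast, a rough sketch of a page function that suits out-of-order access, assuming an offset-based API like the example URL in the docstring. PAGE_SIZE, make_page_func, fake_fetch and ALL_VIDEOS are hypothetical names used only for illustration. The result depends solely on page_num, so no earlier response is needed.

```python
from typing import Callable, List

PAGE_SIZE = 50  # assumed page size, matching limit=50 in the example URL


def make_page_func(fetch: Callable[[int, int], List[str]]) -> Callable[[int], List[str]]:
    """Build a page function whose output depends only on the page number."""
    def page_func(page_num: int) -> List[str]:
        # The offset is computed from page_num alone, with no shared state
        return fetch(page_num * PAGE_SIZE, PAGE_SIZE)
    return page_func


# Hypothetical stand-in for an endpoint accepting ?offset=...&limit=...
ALL_VIDEOS = [f'video-{i}' for i in range(120)]


def fake_fetch(offset: int, limit: int) -> List[str]:
    return ALL_VIDEOS[offset:offset + limit]


page_func = make_page_func(fake_fetch)
third_page = page_func(2)  # requested first, without touching pages 0 and 1
first_page = page_func(0)
```

A page function that keeps the previous response in a nonlocal variable cannot offer this, since the result of page N would depend on which pages were requested before it.
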
@pukkandan
Member

pukkandan commented Mar 14, 2023

I reverted all changes unrelated to the original PR. I will open a new PR fixing --flat-playlist. This can be merged once @hheimbuerger confirms login is not broken

@pukkandan pukkandan added the needs-testing Patch needs testing label Mar 14, 2023
@pukkandan pukkandan force-pushed the master branch 2 times, most recently from ee280c7 to 7aeda6c Compare May 24, 2023 18:09
@seproDev seproDev mentioned this pull request Nov 11, 2023
9 tasks
@bashonly bashonly closed this in 45d82be Nov 20, 2023
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Closes yt-dlp#4300, Closes yt-dlp#5814, Closes yt-dlp#7588, Closes yt-dlp#6334, Closes yt-dlp#6538
Authored by: elyse0, pukkandan, seproDev

Co-authored-by: Elyse <26639800+elyse0@users.noreply.github.com>
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Labels
needs-testing Patch needs testing site-enhancement Feature request for some website
Development

Successfully merging this pull request may close these issues.

[nebula] support downloaded without account (using bearer token)
3 participants