[youtube] Redefine uploader metadata #6384

Merged
merged 19 commits into from Apr 14, 2023

Conversation

coletdjnz
Member

@coletdjnz coletdjnz commented Mar 1, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Uploader metadata for videos, tabs, etc. was inconsistent and not well defined. Now that handles and the channel UCID URL are the two primary URLs for a channel, this PR updates the metadata to reflect that.

New mapping will be strictly:

channel -> channel name
channel_id -> Only UCID
channel_url -> Only UCID /channel url

uploader -> channel name (same as channel field)
uploader_id -> Only @handle
uploader_url -> Only @handle url
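
As a rough illustration of the mapping above, here is a minimal sketch in Python. The helper name `build_channel_metadata`, the placeholder channel name, and the choice to leave the uploader fields as `None` when no handle is known are all assumptions made for this example; this is not the extractor's actual code.

```python
def build_channel_metadata(channel_name, channel_id, handle=None):
    """Hypothetical helper: fill the six fields per the mapping above."""
    # Assumption: a UCID always starts with 'UC'.
    if not channel_id.startswith('UC'):
        raise ValueError(f'not a UCID: {channel_id!r}')
    info = {
        'channel': channel_name,
        'channel_id': channel_id,
        'channel_url': f'https://www.youtube.com/channel/{channel_id}',
        # uploader mirrors the channel name
        'uploader': channel_name,
        # Assumption: stay empty rather than fall back to the UCID
        'uploader_id': None,
        'uploader_url': None,
    }
    if handle is not None:
        if not handle.startswith('@'):
            raise ValueError(f'not a handle: {handle!r}')
        info['uploader_id'] = handle
        info['uploader_url'] = f'https://www.youtube.com/{handle}'
    return info


# 'Example Channel' is a placeholder name, not real test data.
example = build_channel_metadata(
    'Example Channel', 'UCKSpbfbl5kRQpTdL7kMc-1Q', '@WickmanVT')
for key, value in example.items():
    print(f'{key}: {value}')
```

The point of the sketch is that the UCID only ever feeds the `channel_*` fields and the handle only ever feeds the `uploader_*` fields.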

Note that this differs slightly from ytdl-org/youtube-dl@f7ce98a

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

@coletdjnz coletdjnz added the site-enhancement Feature request for some website label Mar 1, 2023
@pukkandan
Member

cc @dirkf

@dirkf
Contributor

dirkf commented Mar 1, 2023

I compared the extractor test-cases. Where yt-dlp specifies uploader metadata here, the yt-dl test-cases in the latest git master generally match if present. Here are the non-matching ones.

YoutubeIE:

--- yt-dl
+++ yt-dlp
 ...
             'url': 'OtqTfy26tG0',
 ...
-                'uploader': 'The Cinematic Orchestra - Topic',
+                'uploader': 'The Cinematic Orchestra',
 ...

I didn't change this result, which is Sergey's from 0cf09c2, though the current yt-dlp result looks more plausible.

I think these remaining differences are all results that I changed in ytdl-org/youtube-dl@f7ce98a.

YoutubePlaylistIE:

--- yt-dl
+++ yt-dlp
 ...
        'url': 'PLBB231211A4F62143',
 ...
-                'uploader_id': '@WickmanVT',
+                'uploader_id': 'UCKSpbfbl5kRQpTdL7kMc-1Q',
 ...
         'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
 ...
             'id': 'PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
 ...
-                'uploader_id': '@milan5503',
+                'uploader_id': 'UCEI1-PVPcYXjB73Hfelbmaw',
 ...
         'url': 'http://www.youtube.com/embed/_xDOZElKyNU?list=PLsyOSbh5bs16vubvKePAQ1x3PhKavfBIl',
 ...
-                'uploader_id': '@music_king',
+                'uploader_id': 'UC21nz3_MesPLqtDqwdvnoxA',
 ...

YoutubeYtBeIE:

--- yt-dl
+++ yt-dlp
 ...
         'url': 'https://youtu.be/yeWKywCrFtk?list=PL2qgrgXsNUG5ig9cat4ohreBjYLAPC0J5',
 ...
-                'uploader_id': '@backuspagemuseum',
+                'uploader_id': 'backuspagemuseum',
 ...

Also, ytdl-org/youtube-dl#31675 (comment).
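
The `uploader_id` differences listed above come down to which kind of identifier each extractor emits. A small sketch of how such values could be told apart follows; the function name `classify_uploader_id` is hypothetical, and the UCID pattern (`UC` plus 22 URL-safe characters) is an assumption that happens to fit the IDs quoted in this comment.

```python
import re

# Assumption: a UCID is 'UC' followed by 22 characters from [0-9A-Za-z_-].
UCID_RE = re.compile(r'UC[0-9A-Za-z_-]{22}')


def classify_uploader_id(value):
    """Hypothetical helper: label an uploader_id as handle, UCID or other."""
    if value.startswith('@'):
        return 'handle'
    if UCID_RE.fullmatch(value):
        return 'ucid'
    return 'other'


# Values taken from the test-case differences quoted above.
observed = [
    '@WickmanVT', 'UCKSpbfbl5kRQpTdL7kMc-1Q',
    '@milan5503', 'UCEI1-PVPcYXjB73Hfelbmaw',
    '@music_king', 'UC21nz3_MesPLqtDqwdvnoxA',
    '@backuspagemuseum', 'backuspagemuseum',
]
for value in observed:
    print(f'{value}: {classify_uploader_id(value)}')
```

Under the mapping proposed in this PR, only the values classified as `handle` would be valid for `uploader_id`.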

coletdjnz and others added 9 commits March 11, 2023 05:24
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
@coletdjnz coletdjnz marked this pull request as ready for review March 11, 2023 07:43
@coletdjnz
Member Author

coletdjnz commented Mar 11, 2023

TODO:

  • investigate channel/uploader name extraction (might be getting wrong one for topic channels)
  • handle extraction fallback for channel pages (ownerUrls)
  • notificationsIE (no handle info given)

@coletdjnz coletdjnz merged commit 7666b93 into yt-dlp:master Apr 14, 2023
11 checks passed
@coletdjnz coletdjnz deleted the feat/yt-uploader-channel-map branch April 14, 2023 07:58
@dirkf
Contributor

dirkf commented Nov 14, 2023

Reviewing these tests for a new yt-dl extractor version, I find a contradiction between the channel ("Full name of the channel the video is uploaded on.") values extracted for test videos IB3lcPjvWLA and MgNrAu2pzNs (eg).

IB3lcPjvWLA: Afrojack (yt-dlp test value) vs AfrojackVEVO
MgNrAu2pzNs: Stephen (yt-dlp test value) vs Stephen - Topic

Currently the yt-dl extractor doesn't test these values because it just sets channel from uploader, if set.

The second test is commented TODO: should be "Stephen - Topic". If so, I believe that the corresponding value AfrojackVEVO must be correct for IB3lcPjvWLA, which would (FWIW) match yt-dl's current values and the header of the corresponding channel page in Invidious in each case.

[Separately, what are these - Topic suffixes?]

The first value is here (similar to where yt-dlp gets it)

traverse_obj(initial_data, (
    'contents', 'twoColumnWatchNextResults', 'results', 'results',
    'contents', ..., lambda k, _: k.endswith('SecondaryInfoRenderer'),
    ('owner', 'videoOwner'), 'videoOwnerRenderer', 'title'),
    get_all=False)

and here

traverse_obj(initial_data, (
    'engagementPanels', ..., 'engagementPanelSectionListRenderer',
    'content', 'structuredDescriptionContentRenderer', 'items', ...,
    'videoDescriptionHeaderRenderer', 'channel'),
    get_all=False)

Everywhere else, eg traverse_obj(microformats, (..., 'ownerChannelName')) and as in the linked post, matches the second value.
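
To make the discrepancy concrete, here is a minimal sketch that compares the two candidate names for each test video. The function name `compare_channel_names` and its three labels are made up for illustration; the sketch only describes how the values relate and does not decide which one is correct.

```python
TOPIC_SUFFIX = ' - Topic'


def compare_channel_names(owner_title, microformat_name):
    """Hypothetical helper: describe how two candidate names relate."""
    if owner_title == microformat_name:
        return 'match'
    if microformat_name == owner_title + TOPIC_SUFFIX:
        return 'topic-suffix'
    return 'mismatch'


# (value from the owner renderer, value from ownerChannelName), as reported above
cases = {
    'IB3lcPjvWLA': ('Afrojack', 'AfrojackVEVO'),
    'MgNrAu2pzNs': ('Stephen', 'Stephen - Topic'),
}
for video_id, (owner_title, microformat_name) in cases.items():
    print(video_id, compare_channel_names(owner_title, microformat_name))
```

The two test videos fall into different categories here, so a rule that only strips or appends ` - Topic` would not cover the `AfrojackVEVO` case.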

What should be the correct resolution of this TODO?

I expect a similar issue applies to tab extraction of video data for playlists. The channel value extracted there should be consistent.

@coletdjnz
Member Author

An issue should be raised for this.

The topic suffix is correct unless YouTube has changed the channel. It's a topic channel, and those are quite messy to deal with.

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

New mapping:
```
channel -> channel name
channel_id -> UCID
channel_url -> UCID channel url

uploader -> channel name (same as channel field)
uploader_id -> @handle
uploader_url -> @handle channel url 
```

Authored by: coletdjnz