[YouTube] comments are not downloading #9358

Open
11 tasks done
alekskor1063 opened this issue Mar 4, 2024 · 12 comments · May be fixed by #9775
Labels
site-bug Issue with a specific website

Comments

@alekskor1063

alekskor1063 commented Mar 4, 2024

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

RU

Provide a description that is worded well enough to be understood

  • The bug goes away when cookies are disabled or when cookies from another channel are used.
  • This bug can be reproduced locally.
  • The comments section in the browser looks the same as before.
  • The first occurrence was between 23 and 27 Feb.

My thoughts:

  • This may be related to the ReVanced shadowban.
  • ...or to some kind of A/B testing.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below
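For the API route mentioned in the list above, a minimal sketch of the params could look like this. The helper names `build_debug_params` and `extract_comments` are hypothetical; the `getcomments`, `cookiefile` and `verbose` keys are the ones visible in the params dump in this report.

```python
def build_debug_params(cookiefile=None):
    """Return YoutubeDL params that request comments and verbose logging."""
    params = {
        'getcomments': True,  # same key as in the params dump in this report
        'verbose': True,      # API counterpart of the -v flag
    }
    if cookiefile:
        params['cookiefile'] = cookiefile
    return params


def extract_comments(url, params):
    """Fetch metadata only and return the extracted comments (possibly empty)."""
    # Imported lazily so build_debug_params() works without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(params) as ydl:
        info = ydl.extract_info(url, download=False)
    return info.get('comments') or []
```

Calling `extract_comments(url, build_debug_params('./cookies_old.txt'))` should print the same kind of `[debug]` lines that the template asks for.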

Complete Verbose Output

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out UTF-8 (No ANSI), error UTF-8 (No ANSI), screen UTF-8 (No ANSI)
[debug] yt-dlp version stable@2023.12.30 from yt-dlp/yt-dlp [f10589e34] (pip) API
[debug] params: {'getcomments': True, 'cookiefile': './cookies_old.txt', 'windowsfilenames': True, 'concurrent_fragment_downloads': 4, 'extract_flat': 'discard_in_playlist', 'fragment_retries': 10, 'retries': 10, 'verbose': True, 'ignoreerrors': 'only_download', 'postprocessors': [{'key': 'FFmpegConcat', 'only_multi_video': True, 'when': 'playlist'}], 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.41 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.10.12 (CPython x86_64 64bit) - Linux-6.1.58+-x86_64-with-glibc2.35 (OpenSSL 3.0.2 15 Mar 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2
[debug] Optional libraries: Cryptodome-3.20.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, secretstorage-3.3.1, sqlite3-3.37.2, urllib3-2.0.7, websockets-12.0
[debug] Proxy map: {'colab_language_server': '/usr/colab/bin/language_service'}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1798 extractors
[youtube] Extracting URL: https://www.youtube.com/watch?v=WO7pMcKvC8A
[youtube] WO7pMcKvC8A: Downloading webpage
[debug] [youtube] Extracted SAPISID cookie
[youtube] WO7pMcKvC8A: Downloading ios player API JSON
[youtube] WO7pMcKvC8A: Downloading android player API JSON
[debug] [youtube] Extracting signature function js_95cde7ed_109
[debug] Loading youtube-sigfuncs.js_95cde7ed_109 from cache
[debug] Loading youtube-nsig.95cde7ed from cache
[debug] [youtube] Decrypted nsig qpzHlXS-9qKluqYa => f4q2n6Tx1UdeCA
[youtube] WO7pMcKvC8A: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[youtube] Downloading comment section API JSON
[youtube] Downloading ~53 comments
[youtube] Sorting comments by newest first
[youtube] Downloading comment API JSON page 1 (0/~53)
[youtube] Downloading comment API JSON page 2 (0/~53)
[youtube] Downloading comment API JSON page 3 (0/~53)
[youtube] Extracted 0 comments
[debug] Default format spec: bestvideo*+bestaudio/best
[info] WO7pMcKvC8A: Downloading 1 format(s): 303+251
[download] ATTLAS & Mango - Over The Water [Monstercat Release] [WO7pMcKvC8A].webm has already been downloaded
@alekskor1063 added the site-bug (Issue with a specific website) and triage (Untriaged issue) labels Mar 4, 2024
@bashonly

This comment was marked as outdated.

@bashonly added the incomplete (Further information is needed) and account-needed (Account details are needed to test/fix this) labels Mar 4, 2024
@absidue

absidue commented Mar 4, 2024

This is probably caused by an A/B test that YouTube is running on comments: they are switching from commentRenderers to commentViewModels. The commentViewModels contain barely any usable information; instead they hold a set of keys that reference mutations in the frameworkUpdates.entityBatchUpdate.mutations part of the response.

Also @bashonly, if the A/B test is the cause of the error, then this can definitely be reproduced without an account; you just need visitor data with a visitor ID that is enrolled in the A/B test.
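To illustrate the indirection described above, here is a rough sketch that resolves a commentViewModel against the mutations list. The response is synthetic, the function name `index_comment_entities` is hypothetical, and the field names (`commentEntityPayload`, `properties`, `commentId`) are taken from the patch posted later in this thread, so treat them as assumptions about the payload rather than a documented API.

```python
def index_comment_entities(response):
    """Map commentId -> commentEntityPayload for one API response."""
    mutations = (
        response.get('frameworkUpdates', {})
        .get('entityBatchUpdate', {})
        .get('mutations', [])
    )
    index = {}
    for mutation in mutations:
        payload = mutation.get('payload', {}).get('commentEntityPayload')
        if not payload:
            continue  # some other kind of mutation, e.g. toolbar state
        comment_id = payload.get('properties', {}).get('commentId')
        if comment_id:
            index[comment_id] = payload
    return index


# Synthetic response shaped after the description above (not real data).
response = {
    'frameworkUpdates': {'entityBatchUpdate': {'mutations': [
        {'payload': {'commentEntityPayload': {
            'properties': {'commentId': 'abc', 'content': {'content': 'Nice track'}},
            'author': {'displayName': '@someone'},
        }}},
        {'payload': {'engagementToolbarStateEntityPayload': {
            'heartState': 'TOOLBAR_HEART_STATE_HEARTED',
        }}},
    ]}},
}
view_model = {'commentId': 'abc'}  # the view model itself carries little more than keys

entities = index_comment_entities(response)
payload = entities.get(view_model['commentId'])
text = payload['properties']['content']['content']
```

Building the index once per response keeps each lookup cheap, and `entities.get(...)` returns `None` for a view model that has no matching mutation instead of silently picking an unrelated one.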

@bashonly removed the account-needed (Account details are needed to test/fix this) and incomplete (Further information is needed) labels Mar 4, 2024
@alekskor1063
Author

alekskor1063 commented Mar 4, 2024

@bashonly
Entware:

[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.12.30 from yt-dlp/yt-dlp [f10589e34] (pip) API
[debug] params: {'getcomments': True, 'writeinfojson': True, 'cookiefile': './cookies_old.txt', 'windowsfilenames': True, 'concurrent_fragment_downloads': 4, 'extract_flat': 'discard_in_playlist', 'fragment_retries': 10, 'retries': 10, 'verbose': True, 'ignoreerrors': 'only_download', 'postprocessors': [{'key': 'FFmpegConcat', 'only_multi_video': True, 'when': 'playlist'}], 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.15 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.11.4 (CPython mips 32bit) - Linux-4.9-ndm-5-mips-with-glibc2.27 (OpenSSL 3.0.10 1 Aug 2023, glibc 2.27)
[debug] exe versions: ffmpeg 5.1.2 (setts), ffprobe 5.1.2
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2023.07.22, mutagen-1.47.0, requests-2.31.0, sqlite3-3.41.2, urllib3-2.0.7, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1798 extractors
[youtube] Extracting URL: https://www.youtube.com/watch?v=WO7pMcKvC8A
[youtube] WO7pMcKvC8A: Downloading webpage
[debug] [youtube] Extracted SAPISID cookie
[youtube] WO7pMcKvC8A: Downloading ios player API JSON
[youtube] WO7pMcKvC8A: Downloading android player API JSON
[debug] [youtube] Extracting signature function js_31eb286a_101
[debug] Loading youtube-sigfuncs.js_31eb286a_101 from cache
[debug] Loading youtube-nsig.31eb286a from cache
[debug] [youtube] Decrypted nsig NBLEs1t0svsZQy57V => 1tbxFdEOnSzCIw
[debug] [youtube] Extracting signature function js_31eb286a_105
[debug] Loading youtube-sigfuncs.js_31eb286a_105 from cache
[youtube] WO7pMcKvC8A: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[youtube] Downloading comment section API JSON
[youtube] Downloading ~53 comments
[youtube] Sorting comments by newest first
[youtube] Downloading comment API JSON page 1 (0/~53)
[youtube] Downloading comment API JSON page 2 (0/~53)
[youtube] Downloading comment API JSON page 3 (0/~53)
[youtube] Extracted 0 comments
[debug] Default format spec: bestvideo*+bestaudio/best
[info] WO7pMcKvC8A: Downloading 1 format(s): 303+251
[info] Writing video metadata as JSON to: ATTLAS & Mango - Over The Water [Monstercat Release] [WO7pMcKvC8A].info.json
[download] ATTLAS & Mango - Over The Water [Monstercat Release] [WO7pMcKvC8A].webm has already been downloaded

Windows (PC where comments were correct):

[debug] Encodings: locale cp1251, fs utf-8, pref cp1251, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.12.30 from yt-dlp/yt-dlp [f10589e34] (pip) API
[debug] params: {'getcomments': True, 'writeinfojson': True, 'cookiefile': './cookies_old.txt', 'windowsfilenames': True, 'concurrent_fragment_downloads': 4, 'extract_flat': 'discard_in_playlist', 'fragment_retries': 10, 'retries': 10, 'verbose': True, 'ignoreerrors': 'only_download', 'postprocessors': [{'key': 'FFmpegConcat', 'only_multi_video': True, 'when': 'playlist'}], 'compat_opts': set(), 'http_headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.70 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 'Accept-Language': 'en-us,en;q=0.5', 'Sec-Fetch-Mode': 'navigate'}}
[debug] Python 3.11.7 (CPython AMD64 64bit) - Windows-10-10.0.22631-SP0 (OpenSSL 3.0.13 30 Jan 2024)
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, sqlite3-3.41.2, urllib3-2.0.7, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1798 extractors
[youtube] Extracting URL: https://www.youtube.com/watch?v=WO7pMcKvC8A
[youtube] WO7pMcKvC8A: Downloading webpage
[debug] [youtube] Extracted SAPISID cookie
[youtube] WO7pMcKvC8A: Downloading ios player API JSON
[youtube] WO7pMcKvC8A: Downloading android player API JSON
[debug] [youtube] Extracting signature function js_31eb286a_105
[debug] Loading youtube-sigfuncs.js_31eb286a_105 from cache
[debug] Loading youtube-nsig.31eb286a from cache
[debug] [youtube] Decrypted nsig d6RktdRv3wwXTqEEE => 2PbMJnbQMpluTQ
[youtube] WO7pMcKvC8A: Downloading m3u8 information
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id
[youtube] Downloading comment section API JSON
[youtube] Downloading ~53 comments
[youtube] Sorting comments by newest first
[youtube] Downloading comment API JSON page 1 (0/~53)
[youtube] Downloading comment API JSON page 2 (0/~53)
[youtube] Downloading comment API JSON page 3 (0/~53)
[youtube] Extracted 0 comments
[debug] Default format spec: bestvideo*+bestaudio/best
[info] WO7pMcKvC8A: Downloading 1 format(s): 303+251
[info] Writing video metadata as JSON to: ATTLAS & Mango - Over The Water [Monstercat Release] [WO7pMcKvC8A].info.json
[download] ATTLAS & Mango - Over The Water [Monstercat Release] [WO7pMcKvC8A].webm has already been downloaded

@pukkandan
Member

pukkandan commented Mar 4, 2024

@alekskor1063 Can you add --write-pages to the yt-dlp command and post the resulting dump files from a failed run? The files may contain sensitive info if you are passing cookies, so feel free to send them privately.

Contact: discord (pukkandan#4207) / email (pukkandan.ytdlp@gmail.com)

cc @coletdjnz

@pukkandan removed the triage (Untriaged issue) label Mar 4, 2024
@alekskor1063
Author

@pukkandan I sent them through Discord.

@bashonly changed the title from "YouTube: comments are not downloading if cookies from a certain channel are in use" to "[YouTube] comments are not downloading" Mar 30, 2024
@themodfather360

themodfather360 commented Apr 8, 2024

This is probably caused by the A/B test that YouTube is doing in the comments, they are switching from commentRenderers to commentViewModels. The commentViewModels barely contain any usable information, instead they have a bunch of keys that reference mutations in the frameworkUpdates.entityBatchUpdate.mutations part of the response.

Also @bashonly if the A/B test is the cause of the error then this can definitely be reproduced without an account, you just need to get visitor data with a visitor ID that has the A/B test.

It looks like I'm getting nothing but failures now, so maybe the A/B testing is over for me.
Is there any way to proceed with the commentViewModel? How can I use the keys from frameworkUpdates.entityBatchUpdate.mutations?

@shoxie007

Looks like I'm getting nothing but the failures now, so maybe the A/B testing is over for me. Is there any way to proceed with the commentViewModel? How can I use the keys? From frameworkUpdates.entityBatchUpdate.mutations

There's nothing you can do when using yt-dlp.

The functions _extract_comment and _comment_entries in youtube.py have to be updated or re-written:
https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/extractor/youtube.py#L3310
https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/extractor/youtube.py#L3359

This is because, as you pointed out, YouTube has moved the comments payload to the path frameworkUpdates.entityBatchUpdate.mutations in the JSON response.

I'm working on it. Hopefully, within the next 2-3 days, I'll post a pull request or maybe give someone else the code I write up. This issue is affecting me as well.

@shoxie007

I'm working on it. Hopefully, within the next 2-3 days, I'll post a pull request or maybe give someone else the code I write up. This issue is affecting me as well.

Update: Still working on it. Hopefully I will be done in another 2-3 days. A complication on YouTube's side means making temporary accommodations in the code, which will be discarded later (once YouTube has finally made up its mind about which model to use in its JSON responses). @themodfather360 reported above that the commentRenderer model is no longer used in the JSON responses from YouTube, but I've seen otherwise. YouTube still uses the commentRenderer model for some videos, sometimes even for the same video, alternating between commentViewModel and commentRenderer from one moment to the next.
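Since both models can show up, one way to cope is to classify each continuation item before extracting it. The following is only a sketch under that assumption; `classify_thread_item` is a hypothetical helper, the key paths are the ones that appear in the patch posted further down this thread, and the sample items are synthetic.

```python
def classify_thread_item(content):
    """Return ('renderer' | 'view_model' | None, node) for one continuation item."""
    thread = content.get('commentThreadRenderer') or {}

    # Old layout: a commentRenderer, either nested under the thread or bare (replies).
    renderer = (
        thread.get('commentRenderer')
        or (thread.get('comment') or {}).get('commentRenderer')
        or content.get('commentRenderer')
    )
    if renderer:
        return 'renderer', renderer

    # New layout: a commentViewModel, nested twice under the thread or bare (replies).
    view_model = (
        (thread.get('commentViewModel') or {}).get('commentViewModel')
        or content.get('commentViewModel')
    )
    if view_model:
        return 'view_model', view_model

    return None, None


# Synthetic items, one per layout (not real data).
old_item = {'commentThreadRenderer': {'comment': {'commentRenderer': {'commentId': 'old1'}}}}
new_item = {'commentThreadRenderer': {'commentViewModel': {'commentViewModel': {'commentId': 'new1'}}}}
new_reply = {'commentViewModel': {'commentId': 'new2'}}

kinds = [classify_thread_item(item)[0] for item in (old_item, new_item, new_reply)]
```

A caller could then route `'renderer'` items to the existing extraction code and `'view_model'` items to the new one, which keeps the temporary dual support in a single place that is easy to delete later.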

@Saiapatsu

There's a (so-far) unmerged pull request for Invidious that adds support for the new comment format:
iv-org/invidious#4566
iv-org/invidious#4576
This issue also links a similar patch for youtube-comment-downloader

@minamotorin

# This patch is public domain (CC0).
diff --git a/yt_dlp/extractor/youtube.py b/yt_dlp/extractor/youtube.py
--- a/yt_dlp/extractor/youtube.py
+++ b/yt_dlp/extractor/youtube.py
@@ -3307,23 +3307,22 @@ def _extract_heatmap(self, data):
                 'value': ('intensityScoreNormalized', {float_or_none}),
             })) or None
 
-    def _extract_comment(self, comment_renderer, parent=None):
-        comment_id = comment_renderer.get('commentId')
-        if not comment_id:
-            return
+    def _extract_comment(self, view_model, entity, parent=None):
+        entity_payload = entity['payload']['commentEntityPayload']
+        comment_id = entity_payload.get('properties').get('commentId')
 
         info = {
             'id': comment_id,
-            'text': self._get_text(comment_renderer, 'contentText'),
-            'like_count': self._get_count(comment_renderer, 'voteCount'),
-            'author_id': traverse_obj(comment_renderer, ('authorEndpoint', 'browseEndpoint', 'browseId', {self.ucid_or_none})),
-            'author': self._get_text(comment_renderer, 'authorText'),
-            'author_thumbnail': traverse_obj(comment_renderer, ('authorThumbnail', 'thumbnails', -1, 'url', {url_or_none})),
+            'text': self._get_text(entity_payload, ('properties', 'content', 'contetn')),
+            'like_count': self._get_count(entity_payload, ('toolbar', 'likeCountNotliked')),
+            'author_id': traverse_obj(entity_payload, ('author', 'channelId', {self.ucid_or_none})),
+            'author': self._get_text(entity_payload, ('author', 'displayName')),
+            'author_thumbnail': traverse_obj(entity_payload, ('author', 'avatarThumbnailUrl', {url_or_none})),
             'parent': parent or 'root',
         }
 
         # Timestamp is an estimate calculated from the current time and time_text
-        time_text = self._get_text(comment_renderer, 'publishedTimeText') or ''
+        time_text = self._get_text(entity_payload, ('properties', 'publishedTime')) or ''
         timestamp = self._parse_time_text(time_text)
 
         info.update({
@@ -3333,25 +3332,23 @@ def _extract_comment(self, comment_renderer, parent=None):
         })
 
         info['author_url'] = urljoin(
-            'https://www.youtube.com', traverse_obj(comment_renderer, ('authorEndpoint', (
-                ('browseEndpoint', 'canonicalBaseUrl'), ('commandMetadata', 'webCommandMetadata', 'url'))),
+            'https://www.youtube.com', traverse_obj(entity_payload,
+                ('author', 'channelCommand', 'innertubeCommand', 'browseEndpoint', 'canonicalBaseUrl'),
                 expected_type=str, get_all=False))
 
-        author_is_uploader = traverse_obj(comment_renderer, 'authorIsChannelOwner')
+        author_is_uploader = traverse_obj(entity_payload, ('author', 'isCreator'))
         if author_is_uploader is not None:
             info['author_is_uploader'] = author_is_uploader
 
         comment_abr = traverse_obj(
-            comment_renderer, ('actionButtons', 'commentActionButtonsRenderer'), expected_type=dict)
+            entity, ('payload', 'engagementToolbarStateEntityPayload', 'heartState'), expected_type=str)
         if comment_abr is not None:
-            info['is_favorited'] = 'creatorHeart' in comment_abr
+            info['is_favorited'] = comment_abr == 'TOOLBAR_HEART_STATE_HEARTED'
 
-        badges = self._extract_badges([traverse_obj(comment_renderer, 'authorCommentBadge')])
-        if self._has_badge(badges, BadgeType.VERIFIED):
-            info['author_is_verified'] = True
+        info['author_is_verified'] = traverse_obj(entity_payload, ('author', 'isVerified')) == 'true'
 
-        is_pinned = traverse_obj(comment_renderer, 'pinnedCommentBadge')
-        if is_pinned:
+        pinned_text = traverse_obj(view_model, 'pinnedText')
+        if pinned_text:
             info['is_pinned'] = True
 
         return info
@@ -3388,21 +3385,25 @@ def extract_header(contents):
                 break
             return _continuation
 
-        def extract_thread(contents):
+        def extract_thread(contents, entity_payloads):
             if not parent:
                 tracker['current_page_thread'] = 0
             for content in contents:
                 if not parent and tracker['total_parent_comments'] >= max_parents:
                     yield
                 comment_thread_renderer = try_get(content, lambda x: x['commentThreadRenderer'])
-                comment_renderer = get_first(
-                    (comment_thread_renderer, content), [['commentRenderer', ('comment', 'commentRenderer')]],
-                    expected_type=dict, default={})
-
-                comment = self._extract_comment(comment_renderer, parent)
-                if not comment:
+                view_model = traverse_obj(comment_thread_renderer, ('commentViewModel', 'commentViewModel'))
+                if not view_model:
+                  view_model = content.get('commentViewModel')
+                if not view_model:
                     continue
-                comment_id = comment['id']
+                comment_id = view_model['commentId']
+                for entity in entity_payloads:
+                    if traverse_obj(entity, ('payload', 'commentEntityPayload', 'properties', 'commentId')) == comment_id:
+                        entity = entity
+                        break
+
+                comment = self._extract_comment(view_model, entity, parent)
                 if comment.get('is_pinned'):
                     tracker['pinned_comment_ids'].add(comment_id)
                 # Sometimes YouTube may break and give us infinite looping comments.
@@ -3495,7 +3496,7 @@ def extract_thread(contents):
             check_get_keys = None
             if not is_forced_continuation and not (tracker['est_total'] == 0 and tracker['running_total'] == 0):
                 check_get_keys = [[*continuation_items_path, ..., (
-                    'commentsHeaderRenderer' if is_first_continuation else ('commentThreadRenderer', 'commentRenderer'))]]
+                    'commentsHeaderRenderer' if is_first_continuation else ('commentThreadRenderer', 'commentViewModel'))]]
             try:
                 response = self._extract_response(
                     item_id=None, query=continuation,
@@ -3527,7 +3528,7 @@ def extract_thread(contents):
                         break
                     continue
 
-                for entry in extract_thread(continuation_items):
+                for entry in extract_thread(continuation_items, response['frameworkUpdates']['entityBatchUpdate']['mutations']):
                     if not entry:
                         return
                     yield entry

This patch may work, but has not been tested enough.

@jakeogh
Contributor

jakeogh commented Apr 23, 2024

Here is a branch with the patch from @minamotorin plus additional fixes: https://github.com/jakeogh/yt-dlp/tree/youtube_comments_ab. It works for me, but needs more testing. It attempts to handle both the new and the old comment format.

The patch has a typo:
'text': self._get_text(entity_payload, ('properties', 'content', 'contetn')),
should be:
'text': self._get_text(entity_payload, ('properties', 'content', 'content')),

The path is correct: entity_payload['properties']['content']['content'] returns the comment text. But _get_text() still returns nothing, because it passes a getter to try_get() that looks for the key 'simpleText'. The 'author' lookup via _get_text() is broken in the same way. These issues are fixed in my branch, though there may be reasons to modify _get_text() instead of accessing the dict directly. Other text fields, 'title' for example, still use the 'simpleText' key.
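As a rough illustration of the mismatch, a tolerant text helper could accept both shapes: the wrapped form with a 'simpleText' key and the plain string found in the new payload. This is a standalone sketch, not yt-dlp's _get_text(); the name `text_of` is hypothetical, and the 'runs' branch is an assumption about another wrapped form that this thread does not mention.

```python
def text_of(node):
    """Return the text of a plain string or a wrapped text node, else None."""
    if isinstance(node, str):
        return node  # new payload: the value already is the text
    if isinstance(node, dict):
        simple = node.get('simpleText')
        if isinstance(simple, str):
            return simple  # wrapped form, as still used by fields such as 'title'
        runs = node.get('runs')  # assumed wrapped form: a list of {'text': ...} pieces
        if isinstance(runs, list):
            return ''.join(run.get('text', '') for run in runs if isinstance(run, dict))
    return None


# Synthetic payload following the path quoted above (not real data).
entity_payload = {
    'properties': {'content': {'content': 'Nice track'}},
    'author': {'displayName': '@someone'},
}
comment_text = text_of(entity_payload['properties']['content']['content'])
author = text_of(entity_payload['author']['displayName'])
```

Whether this belongs inside _get_text() or next to the comment code is the open design question raised above; the sketch only shows that one helper can serve both shapes.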

@minamotorin

minamotorin commented Apr 24, 2024

@jakeogh
Thanks! It seems that my patch contained some bugs.

By the way, 'likeCountNotLiked' is NOT a typo. likeCountLiked is used when the user has clicked the like button; otherwise likeCountNotLiked is used. See #9775 (comment)
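The distinction between the two keys can be sketched as follows. `pick_like_count` and its `viewer_has_liked` parameter are hypothetical, and how to detect the viewer's like state is not covered in this thread. Because the patch above spells the key 'likeCountNotliked' while this comment writes 'likeCountNotLiked', the sketch tries both spellings rather than assuming one.

```python
def pick_like_count(toolbar, viewer_has_liked=False):
    """Return the raw like-count value that matches the viewer's like state."""
    if viewer_has_liked:
        keys = ('likeCountLiked',)
    else:
        # Both spellings seen in this thread; which one the API sends is an assumption.
        keys = ('likeCountNotliked', 'likeCountNotLiked')
    for key in keys:
        value = toolbar.get(key)
        if value is not None:
            return value
    return None


# Synthetic toolbar entry (not real data).
toolbar = {'likeCountNotliked': '12', 'likeCountLiked': '13'}
not_liked = pick_like_count(toolbar)
liked = pick_like_count(toolbar, viewer_has_liked=True)
```

The returned value is left as the raw string; converting it to a number is a separate step.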

@jakeogh linked a pull request that will close this issue Apr 24, 2024