weibo broken #8445

Closed
11 tasks done
snwefly opened this issue Oct 27, 2023 · 2 comments · Fixed by #8463
Labels
  • patch-available: There is patch available that should fix this issue. Someone needs to make a PR with it
  • site-bug: Issue with a specific website

Comments

snwefly commented Oct 27, 2023

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

Albania

Provide a description that is worded well enough to be understood

weibo broken

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['https://weibo.com/u/7573838980', '-v']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out cp1252 (No VT), error cp1252 (No VT), screen cp1252 (No VT)
[debug] yt-dlp version stable@2023.10.13 [b634ba742] (pip)
[debug] Python 3.12.0 (CPython AMD64 64bit) - Windows-10-10.0.17763-SP0 (OpenSSL 3.0.11 19 Sep 2023)
[debug] exe versions: ffmpeg 2023-06-21-git-1bcb8a7338-full_build-www.gyan.dev (setts), ffprobe 2023-06-21-git-1bcb8a7338-full_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2023.07.22, mutagen-1.47.0, sqlite3-3.42.0, websockets-12.0
[debug] Proxy map: {}
[debug] Loaded 1890 extractors
[WeiboUser] Extracting URL: https://weibo.com/u/7573838980
[WeiboUser] 7573838980: Downloading videos page 1
[WeiboUser] 7573838980: Generating first-visit guest request
ERROR: 7573838980: An extractor error has occurred. (caused by KeyError('confidence')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "C:\Python\Lib\site-packages\yt_dlp\extractor\common.py", line 715, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Lib\site-packages\yt_dlp\extractor\weibo.py", line 233, in _real_extract
    first_page = self._fetch_page(uid)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Lib\site-packages\yt_dlp\extractor\weibo.py", line 216, in _fetch_page
    return self._weibo_download_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\Lib\site-packages\yt_dlp\extractor\weibo.py", line 47, in _weibo_download_json
    self._update_visitor_cookies(video_id)
  File "C:\Python\Lib\site-packages\yt_dlp\extractor\weibo.py", line 38, in _update_visitor_cookies
    'c': '%03d' % visitor_data['data']['confidence'],
                  ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'confidence'
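
For readers following the traceback, here is a minimal sketch of the failure mode. The payload below is hypothetical (the real genvisitor response is not shown in this issue); it only assumes that the `data` object came back without a `confidence` key, which is what the `KeyError` indicates.

```python
# Hypothetical payload: assumed shape, based only on the traceback above.
visitor_data = {'data': {'tid': 'example-tid'}}  # note: no 'confidence' key


def confidence_param(payload):
    """Format the 'c' query value the same way the failing line does."""
    return '%03d' % payload['data']['confidence']


try:
    confidence_param(visitor_data)
    outcome = 'ok'
except KeyError as exc:
    # Plain dict indexing raises KeyError with the missing key as its argument.
    outcome = f'KeyError: {exc.args[0]!r}'

print(outcome)
```

The direct subscript is what makes the extractor fatal here: any response that omits the key aborts cookie generation before the follow-up request is made.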
snwefly added the site-bug and triage labels on Oct 27, 2023
@bashonly
Member

I am not sure of the importance of the confidence param or what its value is supposed to be, but it looks like doing this works:

diff --git a/yt_dlp/extractor/weibo.py b/yt_dlp/extractor/weibo.py
index b0c3052b6..425e679be 100644
--- a/yt_dlp/extractor/weibo.py
+++ b/yt_dlp/extractor/weibo.py
@@ -26,16 +26,16 @@ def _update_visitor_cookies(self, video_id):
             data=urlencode_postdata({
                 'cb': 'gen_callback',
                 'fp': '{"os":"2","browser":"Gecko57,0,0,0","fonts":"undefined","screenInfo":"1440*900*24","plugins":""}',
-            }))
+            }))['data']
 
         self._download_webpage(
             'https://passport.weibo.com/visitor/visitor', video_id,
             note='Running first-visit callback to get guest cookies',
             query={
                 'a': 'incarnate',
-                't': visitor_data['data']['tid'],
+                't': visitor_data['tid'],
                 'w': 2,
-                'c': '%03d' % visitor_data['data']['confidence'],
+                'c': '%03d' % visitor_data.get('confidence', 1),
                 'cb': 'cross_domain',
                 'from': 'weibo',
                 '_rand': random.random(),
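
The key change in the patch above is swapping the subscript for `dict.get` with a fallback. A small standalone sketch of that formatting step follows; `confidence_param` is a hypothetical helper used only for illustration and is not part of yt-dlp, and `data` stands for the already-unwrapped `['data']` object.

```python
def confidence_param(data, default=1):
    """Zero-pad the confidence value to three digits, using `default`
    when the server response does not include the key."""
    return '%03d' % data.get('confidence', default)


print(confidence_param({'tid': 'example-tid'}))                    # key missing -> '001'
print(confidence_param({'tid': 'example-tid', 'confidence': 95}))  # key present -> '095'
```

Whether `1` is the right fallback is an open question in this thread; the sketch only shows that the lookup no longer raises.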

cc @c-basalt

bashonly added the needs-testing label and removed the triage label on Oct 28, 2023
@c-basalt
Contributor

@bashonly That also works on my end. Weibo seems to be lenient about every query param other than tid/t.
Still, I think it could be worth doing a slightly better emulation in case weibo tightens validation in the future.

diff --git a/yt_dlp/extractor/weibo.py b/yt_dlp/extractor/weibo.py
index b0c3052b6..9a7e3b0d9 100644
--- a/yt_dlp/extractor/weibo.py
+++ b/yt_dlp/extractor/weibo.py
@@ -18,24 +18,30 @@


 class WeiboBaseIE(InfoExtractor):
-    def _update_visitor_cookies(self, video_id):
+    def _update_visitor_cookies(self, visitor_url, video_id):
+        chrome_ver = self._search_regex(
+            r'Chrome/(\d+)', traverse_obj(self._downloader.params, ('http_headers', 'User-Agent', {str})),
+            'user agent version', default='90')
         visitor_data = self._download_json(
             'https://passport.weibo.com/visitor/genvisitor', video_id,
             note='Generating first-visit guest request',
+            headers={'Referer': visitor_url},
             transform_source=strip_jsonp,
             data=urlencode_postdata({
                 'cb': 'gen_callback',
-                'fp': '{"os":"2","browser":"Gecko57,0,0,0","fonts":"undefined","screenInfo":"1440*900*24","plugins":""}',
-            }))
+                'fp': f'{{"os":"1","browser":"Chrome{chrome_ver},0,0,0","fonts":"undefined","screenInfo":"1920*1080*24","plugins":""}}',
+            }))['data']

         self._download_webpage(
             'https://passport.weibo.com/visitor/visitor', video_id,
             note='Running first-visit callback to get guest cookies',
+            headers={'Referer': visitor_url},
             query={
                 'a': 'incarnate',
-                't': visitor_data['data']['tid'],
-                'w': 2,
-                'c': '%03d' % visitor_data['data']['confidence'],
+                't': visitor_data['tid'],
+                'w': 3 if visitor_data.get('new_tid') else 2,
+                'c': '%03d' % visitor_data.get('confidence', 100),
+                'gc': '',
                 'cb': 'cross_domain',
                 'from': 'weibo',
                 '_rand': random.random(),
@@ -44,7 +50,7 @@ def _update_visitor_cookies(self, video_id):
     def _weibo_download_json(self, url, video_id, *args, fatal=True, note='Downloading JSON metadata', **kwargs):
         webpage, urlh = self._download_webpage_handle(url, video_id, *args, fatal=fatal, note=note, **kwargs)
         if urllib.parse.urlparse(urlh.url).netloc == 'passport.weibo.com':
-            self._update_visitor_cookies(video_id)
+            self._update_visitor_cookies(urlh.url, video_id)
             webpage = self._download_webpage(url, video_id, *args, fatal=fatal, note=note, **kwargs)
         return self._parse_json(webpage, video_id, fatal=fatal)
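
To make the emulation in the patch above easier to check outside the extractor, here is a rough standalone sketch of its two pieces: building the `fp` fingerprint from a User-Agent string, and assembling the `incarnate` query. `build_fingerprint` and `incarnate_query` are hypothetical names for illustration only; the patch itself uses yt-dlp's `_search_regex` and `traverse_obj`, which are replaced here by plain `re` so the snippet runs without yt-dlp. The `_rand` value is omitted to keep the output deterministic.

```python
import json
import re


def build_fingerprint(user_agent, default_ver='90'):
    """Return the JSON string sent as 'fp', with the Chrome major version
    taken from the User-Agent (falls back to default_ver if absent)."""
    match = re.search(r'Chrome/(\d+)', user_agent or '')
    chrome_ver = match.group(1) if match else default_ver
    # Doubled braces are literal braces inside an f-string.
    return (f'{{"os":"1","browser":"Chrome{chrome_ver},0,0,0","fonts":"undefined",'
            f'"screenInfo":"1920*1080*24","plugins":""}}')


def incarnate_query(visitor_data):
    """Build the query for the first-visit callback from the 'data' object."""
    return {
        'a': 'incarnate',
        't': visitor_data['tid'],  # still required, so keep the subscript
        'w': 3 if visitor_data.get('new_tid') else 2,
        'c': '%03d' % visitor_data.get('confidence', 100),
        'gc': '',
        'cb': 'cross_domain',
        'from': 'weibo',
    }


ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/118.0.0.0 Safari/537.36'
fp = build_fingerprint(ua)
print(json.loads(fp)['browser'])  # the fingerprint parses as valid JSON
print(incarnate_query({'tid': 'example-tid', 'new_tid': True}))
```

Only `tid` is accessed with a subscript, matching the observation that it is the one field weibo appears to validate; everything else degrades to a default.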

bashonly added the patch-available label and removed the needs-testing label on Oct 28, 2023
bashonly pushed a commit that referenced this issue Nov 11, 2023
Closes #8445
Authored by: c-basalt
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this issue Apr 21, 2024
3 participants