[bilibili] Unable to download JSON metadata: HTTP Error 502: Bad Gateway (caused by HTTPError()) #32722

Open
5 tasks done
Mia-Dan opened this issue Feb 12, 2024 · 11 comments
Labels
broken-IE problem with existing site extraction

Comments


Mia-Dan commented Feb 12, 2024

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.12.17 2023.08.07, 2024.02.03
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

youtube-dl --verbose https://www.bilibili.com/video/BV1Ac411v7a8
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.bilibili.com/video/BV1Ac411v7a8']
[debug] Encodings: locale cp936, fs mbcs, out cp936, pref cp936
[debug] youtube-dl version 2024.02.03 [4416f82c8] (single file build)
[debug] ** This version was built from the latest master code at https://github.com/ytdl-org/youtube-dl.
[debug] ** For support, visit the main site.
[debug] Python 3.4.4 (CPython AMD64 32bit) - Windows-10-10.0.22000 - OpenSSL 1.0.2d 9 Jul 2015
[debug] exe versions: ffmpeg 2022-12-19-git-48d5aecfc4-full_build-www.gyan.dev, ffprobe 2022-12-19-git-48d5aecfc4-full_build-www.gyan.dev
[debug] Proxy map: {}
[BiliBili] 1Ac411v7a8: Downloading webpage
[BiliBili] 1Ac411v7a8: Downloading video info page
WARNING: Unable to download JSON metadata: HTTP Error 502: Bad Gateway
[BiliBili] 1Ac411v7a8: Downloading video info page
ERROR: Unable to download JSON metadata: HTTP Error 502: Bad Gateway (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "D:\a\ytdl-nightly\ytdl-nightly\youtube_dl\extractor\common.py", line 678, in _request_webpage
  File "D:\a\ytdl-nightly\ytdl-nightly\youtube_dl\YoutubeDL.py", line 2465, in urlopen
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 470, in open
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 580, in http_response
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 502, in error
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 442, in _call_chain
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 685, in http_error_302
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 470, in open
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 580, in http_response
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 508, in error
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 442, in _call_chain
  File "C:\hostedtoolcache\windows\Python\3.4.4\x86\lib\urllib\request.py", line 588, in http_error_default

Description

The error is the same with versions 2023.08.07 and 2024.02.03, both with and without a proxy. I tried several videos, all with the same error.

Contributor

dirkf commented Feb 12, 2024

Working in yt-dlp 2023.06.22, but I get HTTP 502 or 429 with yt-dl master. A back-port is needed.

Author

Mia-Dan commented Feb 12, 2024

> Working in yt-dlp 2023.06.22 but 502 or 429 for me with yt-dl master. Back-port needed.

Thanks, dirkf. yt-dlp works. :)

Contributor

dirkf commented Feb 13, 2024

The existing yt-dl extractor is completely obsolete. After some work:

$ python -m youtube_dl -vF 'https://www.bilibili.com/video/BV1Ac411v7a8'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-vF', u'https://www.bilibili.com/video/BV1Ac411v7a8']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 4416f82c8
[debug] Python 2.7.15 (CPython i686 32bit) - Linux-6.1.0-17-686-pae-i686-with-debian-12.5 - OpenSSL 1.1.1a  20 Nov 2018 - glibc 2.1.3
[debug] exe versions: ffmpeg 5.1.4-0, ffprobe 5.1.4-0
[debug] Proxy map: {}
[BiliBili] 1Ac411v7a8: Downloading webpage
[BiliBili] BV1Ac411v7a8: Extracting videos in anthology
[BiliBili] 284254516: Extracting chapters
[info] Available formats for BV1Ac411v7a8:
format code  extension  resolution note
30216        m4a        audio only   33k , mp4a.40.5
30232        m4a        audio only   67k , mp4a.40.2
100022       mp4        640x360      15k , av01.0.08m.08.0.110.01.01.01.0, 30.019fps, video only
30011        mp4        640x360      25k , hev1.1.6.l120.90, 30.303fps, video only
30016        mp4        640x360      39k , avc1.64001e, 30.303fps, video only
100023       mp4        854x480      19k , av01.0.08m.08.0.110.01.01.01.0, 30.019fps, video only
30033        mp4        854x480      32k , hev1.1.6.l120.90, 30.303fps, video only
30032        mp4        854x480      50k , avc1.64001f, 30.303fps, video only (best)
$

Is this just some sort of text presentation? I'm just seeing a logo at top right and various text bottom centre (with yt-dlp too).


aboutqx commented Feb 15, 2024

The same.

Contributor

dirkf commented Feb 15, 2024

This test URL https://www.bilibili.com/video/BV1jL41167ZG/ from the yt-dlp extractor is described as "supporter-only" but not detected as such, by either the original extractor or my back-port.

Is the short looping video that I see telling me that? I don't see any relevant text in a Google-translated page or in the metadata.

Author

Mia-Dan commented Feb 16, 2024

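Yes, the video at https://www.bilibili.com/video/BV1jL41167ZG/ indicates that it is supporter-only by showing the message "该视频为「高级充电回馈」专属视频 开通「18元档包月充电」即可观看" (roughly: "This video is exclusive to 'Premium Charging Rewards'; subscribe to the ¥18/month charging tier to watch").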

Contributor

dirkf commented Feb 16, 2024

Is this a standard video (that might be identified by its size, say)?

Otherwise, as I commented, it's not obvious how to detect "supporter-only" videos, which would be useful to do.

Author

Mia-Dan commented Feb 16, 2024

I'm not sure... Videos of this kind are very rare on bilibili. I'm kind of surprised to see one picked as a test URL, honestly.

Contributor

dirkf commented Feb 16, 2024

This is just one of many test URLs. The point of tests is to use examples that cover all the various cases that users may encounter.

Perhaps this metadata item is_upower_exclusive indicates the status. Unfortunately, there isn't any relevant text in the page, or at least not in the static HTML that yt-dl sees. Another characteristic is that the video duration is ~10s while the full video should be ~695s, but that isn't very specific. More investigation needed.
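
The duration mismatch mentioned above could be turned into a rough heuristic. The following is only a minimal sketch, not code from yt-dl or yt-dlp: the function name `looks_like_preview` and the `ratio` threshold are hypothetical, and since the signal is not very specific it would at most justify a warning rather than a hard error.

```python
def looks_like_preview(stream_duration, listed_duration, ratio=0.5):
    """Return True if the playable stream is much shorter than the listed duration.

    Both durations are in seconds. `ratio` is an arbitrary, hypothetical
    threshold: a stream shorter than ratio * listed_duration is treated
    as a probable preview clip.
    """
    if not stream_duration or not listed_duration:
        # Missing or zero durations give no evidence either way
        return False
    return stream_duration < ratio * listed_duration


# Figures from the comment above: ~10s served, ~695s expected
print(looks_like_preview(10, 695))   # True
print(looks_like_preview(690, 695))  # False
```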

Author

Mia-Dan commented Feb 17, 2024

Okay, thanks for the detailed explanation :)

Contributor

dirkf commented Feb 17, 2024

Indeed, that is the only test video that has is_upower_exclusive: true, so I'm assuming that it means "supporter-only".
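
A minimal sketch of how such a check might look, assuming the video metadata has already been parsed into nested Python dicts and lists. Where exactly `is_upower_exclusive` sits in bilibili's metadata is not stated in this thread, so the hypothetical helper `find_key` simply searches the whole structure; `is_supporter_only` is likewise an illustrative name, not an existing yt-dl function, and the sample metadata shape is made up.

```python
def find_key(data, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                yield v
            # Keep descending: the key may also appear deeper in the value
            for found in find_key(v, key):
                yield found
    elif isinstance(data, (list, tuple)):
        for item in data:
            for found in find_key(item, key):
                yield found


def is_supporter_only(metadata):
    """Assume a true `is_upower_exclusive` anywhere means "supporter-only"."""
    return any(v is True for v in find_key(metadata, 'is_upower_exclusive'))


# Made-up metadata shape, for illustration only
sample = {'videoData': {'bvid': 'BV1jL41167ZG', 'is_upower_exclusive': True}}
print(is_supporter_only(sample))
print(is_supporter_only({'videoData': {'is_upower_exclusive': False}}))
```

The helper avoids `yield from` so that it also runs on Python 2, which yt-dl still supports.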

A WIP extractor based on yt-dlp's (with the new PR yt-dlp/yt-dlp#9117) looks good but is prone to hitting a captcha page, especially with Python 2.

dirkf added the broken-IE (problem with existing site extraction) label on Feb 22, 2024