cannot correctly resolve bilibili.com video URLs contained in a festival #31661

Open
5 tasks done
szdytom opened this issue Feb 23, 2023 · 6 comments
Labels
broken-IE problem with existing site extraction

Comments

@szdytom

szdytom commented Feb 23, 2023

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.bilibili.com/video/BV1dZ4y1Y7bt', '-v']
[debug] Encodings: locale cp936, fs mbcs, out cp936, pref cp936
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.19041
[debug] exe versions: none
[debug] Proxy map: {}
[BiliBili] 1dZ4y1Y7bt: Downloading webpage
[BiliBili] 1dZ4y1Y7bt: Downloading video info page
ERROR: Unable to extract title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 815, in wrapper
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\YoutubeDL.py", line 836, in __extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 534, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\bilibili.py", line 213, in _real_extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 1021, in _html_search_regex
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpupik7c6w\build\youtube_dl\extractor\common.py", line 1012, in _search_regex
youtube_dl.utils.RegexNotFoundError: Unable to extract title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

youtube-dl cannot correctly resolve bilibili.com video URLs that are contained in a festival, for example:

https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt

A normal video URL (not contained in a festival) looks like:

https://www.bilibili.com/video/BVxxxxxxxx

However, https://www.bilibili.com/video/BV1dZ4y1Y7bt still does not work for this video, because it automatically redirects back to the festival URL.

Video links on bilibili.com that are contained in a festival cannot be parsed correctly.

@dirkf
Contributor

dirkf commented Feb 24, 2023

  1. The _VALID_URL can be updated to match URLs like https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt. Is this the only such format (i.e. .../festival/slug?bvid=...), or should other top-level path components and/or more path components be matched?

  2. The error occurs because title extraction fails. The problem page contains <title>洛天依十周年官方演唱会</title> ("Luo Tianyi 10th anniversary official concert"). If that should be the fallback title, that's fine, but I'm not familiar with the content. With that fallback, the result is:

$ python3.9 -m youtube_dl -v -F 'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt'
[debug] System config: ['--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-F', 'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: a5464aca1
[debug] Python version 3.9.16 (CPython) - Linux-4.4.0-210-generic-i686-with-glibc2.23
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[BiliBili] 1dZ4y1Y7bt: Downloading webpage
[BiliBili] 1dZ4y1Y7bt: Downloading video info page
WARNING: unable to extract description; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
WARNING: unable to extract og:image; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
[info] Available formats for 1dZ4y1Y7bt:
format code  extension  resolution note
0            flv        unknown    3.53GiB
$
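Point 1 is about widening the extractor's _VALID_URL. A minimal sketch of such a pattern, assuming the only festival form is /festival/<slug>?bvid=<BV id> as in the reported URL. VALID_URL and match_video_id are hypothetical names; this is not the pattern used in youtube-dl's bilibili.py.

```python
import re

# Hypothetical pattern: accepts the normal /video/<BV id> form and the
# /festival/<slug>?bvid=<BV id> form (bvid may follow other query parameters).
VALID_URL = re.compile(r'''(?x)
    https?://(?:www\.)?bilibili\.com/
    (?:
        video/(?P<id>BV[0-9A-Za-z]+)
        |
        festival/[^/?#]+\?(?:[^#]*&)?bvid=(?P<fid>BV[0-9A-Za-z]+)
    )
''')


def match_video_id(url):
    """Return the BV id from a supported URL, or None if it does not match."""
    m = VALID_URL.match(url)
    if not m:
        return None
    return m.group('id') or m.group('fid')


for url in (
    'https://www.bilibili.com/video/BV1dZ4y1Y7bt',
    'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt',
    'https://www.bilibili.com/festival/lty10th',
):
    print(url, '->', match_video_id(url))
```

The sketch keeps the BV prefix, whereas the 2021.12.17 log shows the id as 1dZ4y1Y7bt; which form the real extractor should report is a separate decision.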

@dirkf dirkf added the broken-IE problem with existing site extraction label Feb 24, 2023
@li6in9muyou

  1. URLs of the form .../festival/<slug>?bvid=<bvid> are used only on rare occasions.
  2. The content of the <title> tag should not be the fallback title: it is the title of the "festival", and the requested video is only one of many videos published in that "festival".

@dirkf
Contributor

dirkf commented Mar 2, 2023

What should be the title of the test video https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt?

If there isn't an obvious candidate, the title could be f'{festival_title}: {video_id}' or similar.
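A minimal sketch of that fallback, assuming the festival title is read from the page's <title> element (the 洛天依十周年官方演唱会 text mentioned earlier). fallback_title is a hypothetical helper. It uses %-formatting rather than an f-string because the reporter's build runs on Python 3.4, which predates f-strings.

```python
import re
from html import unescape


def fallback_title(webpage, video_id):
    """Return '<festival title>: <video id>' built from the page's <title>,
    or None if the page has no usable <title>."""
    m = re.search(r'<title[^>]*>([^<]+)</title>', webpage)
    if not m:
        return None
    festival_title = unescape(m.group(1)).strip()
    return '%s: %s' % (festival_title, video_id)


page = '<html><head><title>洛天依十周年官方演唱会</title></head></html>'
print(fallback_title(page, 'BV1dZ4y1Y7bt'))
# -> 洛天依十周年官方演唱会: BV1dZ4y1Y7bt
```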

@li6in9muyou

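The title element can be located with the selector .video-toobar_title, whose innerText is 【洛天依原创曲】光与影的对白【2022官方生贺曲】 ("[Luo Tianyi original song] Dialogue of Light and Shadow [2022 official birthday song]"). This is very different from other video pages.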
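A standalone sketch of locating that element by class with the standard-library HTMLParser, assuming the title text sits inside the first element whose class list contains video-toobar_title. ClassTextExtractor and text_by_class are hypothetical names, and the sample markup is invented for illustration. youtube-dl's own youtube_dl.utils also has a get_element_by_class helper that may fit better inside the extractor.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element whose class list contains
    class_name (roughly a browser's innerText, ignoring CSS)."""

    # Void elements have no end tag, so they must not change the depth.
    VOID = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
            'link', 'meta', 'source', 'track', 'wbr'}

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # set once the target element has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif self.class_name in (dict(attrs).get('class') or '').split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1
            if not self.depth:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_class(html, class_name):
    """Return the stripped text of the first element with class_name, or None."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip() or None


page = ('<h1 class="video-toobar_title">'
        '【洛天依原创曲】光与影的对白【2022官方生贺曲】</h1>'
        '<p>other text</p>')
print(text_by_class(page, 'video-toobar_title'))
# -> 【洛天依原创曲】光与影的对白【2022官方生贺曲】
```

The 【】 brackets are left in place, since they are part of the title.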

@dirkf
Contributor

dirkf commented Mar 4, 2023

That's fine. There are other fields not being extracted but I don't think they should cause warnings. Obviously, suggestions for alternative sources in the page are welcome.
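The description and og:image warnings in the log are the kind youtube-dl's InfoExtractor._search_regex reports when called with fatal=False and no default; when a default is passed, it returns that default without warning. The sketch below is a simplified, standalone model of that three-way behaviour, not youtube-dl's code; search_regex and _NO_DEFAULT are hypothetical names.

```python
import re
import warnings

_NO_DEFAULT = object()  # sentinel meaning "the caller passed no default"


def search_regex(pattern, string, name, default=_NO_DEFAULT, fatal=True):
    """Hypothetical, simplified regex field lookup.

    - match found            -> return group 1
    - no match, default set  -> return the default silently
    - no match, fatal        -> raise
    - no match, not fatal    -> warn and return None
    """
    m = re.search(pattern, string)
    if m:
        return m.group(1)
    if default is not _NO_DEFAULT:
        return default
    if fatal:
        raise ValueError('Unable to extract %s' % name)
    warnings.warn('unable to extract %s' % name)
    return None


page = '<html><head><title>lty10th</title></head></html>'
# Optional field: this page has no og:description, so it quietly yields None.
description = search_regex(
    r'<meta[^>]+property="og:description"[^>]+content="([^"]*)"',
    page, 'description', default=None)
```

In this model, passing default=None for genuinely optional fields, rather than fatal=False, is what keeps them from producing warnings.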

$ python3.9 -m youtube_dl --get-title 'https://www.bilibili.com/festival/lty10th?bvid=BV1dZ4y1Y7bt'
WARNING: unable to extract description; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
WARNING: unable to extract og:image; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
【洛天依原创曲】光与影的对白【2022官方生贺曲】
$

Are the 【】 part of the title or should they be stripped?

@szdytom
Author

szdytom commented Mar 25, 2023

No, they shouldn't be stripped; the 【】 are part of the title.

P.S. The video description can be read with document.querySelector('.video-desc').innerHTML.
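Because innerHTML still contains markup, the description needs some cleaning. A rough standalone sketch, assuming the description lives in the first element whose class list contains video-desc and that this element has no nested element with the same tag name (the non-greedy match stops at the first matching close tag). description_from_page is a hypothetical name; the clean-up step (<br> to newline, strip remaining tags, unescape entities) is similar in spirit to youtube_dl.utils.clean_html.

```python
import re
from html import unescape

# Opening tag whose class attribute contains the exact token "video-desc",
# then its inner HTML up to the first closing tag with the same name.
_DESC_RE = re.compile(
    r'<(?P<tag>[a-zA-Z][a-zA-Z0-9]*)(?:\s[^>]*)?\sclass="(?:[^"]*\s)?video-desc(?:\s[^"]*)?"[^>]*>'
    r'(?P<inner>.*?)</(?P=tag)\s*>',
    re.DOTALL)


def description_from_page(webpage):
    """Return the .video-desc content as plain text, or None if absent/empty."""
    m = _DESC_RE.search(webpage)
    if not m:
        return None
    inner = m.group('inner')
    inner = re.sub(r'<br\s*/?>', '\n', inner, flags=re.IGNORECASE)  # <br> -> newline
    inner = re.sub(r'<[^>]+>', '', inner)  # drop any remaining tags
    return unescape(inner).strip() or None


sample = '<div class="video-desc"><span>line1<br/>line2 &amp; more</span></div>'
print(repr(description_from_page(sample)))
# -> 'line1\nline2 & more'
```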

Development

No branches or pull requests

3 participants