bbc cannot extract playlist #28115

Open
6 tasks
johnnytornado3 opened this issue Feb 8, 2021 · 11 comments

Comments

@johnnytornado3

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2021.02.04.1
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

PASTE VERBOSE LOG HERE

Description

WRITE DESCRIPTION HERE
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

N:\Movies>youtube-dl --version
2021.02.04.1

N:\Movies>youtube-dl --verbose https://www.bbc.com/reel/playlist/mind-matters?vpid=p0962h5x
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.bbc.com/reel/playlist/mind-matters?vpid=p0962h5x']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2021.02.04.1
[debug] Python version 3.4.4 (CPython) - Windows-XP-5.1.2600-SP3
[debug] exe versions: ffmpeg N-77883-gd7c75a5, ffprobe N-77883-gd7c75a5, phantomjs 1.9.7
[debug] Proxy map: {}
[bbc] mind-matters: Downloading webpage
ERROR: Unable to extract playlist data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpgi7ngq0n\build\youtube_dl\YoutubeDL.py", line 806, in wrapper
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpgi7ngq0n\build\youtube_dl\YoutubeDL.py", line 827, in __extract_info
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpgi7ngq0n\build\youtube_dl\extractor\common.py", line 532, in extract
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpgi7ngq0n\build\youtube_dl\extractor\bbc.py", line 1176, in _real_extract
File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpgi7ngq0n\build\youtube_dl\extractor\common.py", line 1010, in _search_regex
youtube_dl.utils.RegexNotFoundError: Unable to extract playlist data; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

N:\Movies>

@Vangelis66

This is yet another duplicate of
#27125
#23660
#21870
#18308
and, possibly, many others...

TL;DR (or DS=didn't search):

*bbc.com/reel/* URIs are not supported by current bbcIE; or, if support was meant, it's currently broken...

Workarounds:

  1. For single clip, like the one in OP's log:

Reformat
https://www.bbc.com/reel/playlist/mind-matters?vpid=p0962h5x
to
https://www.bbc.co.uk/programmes/p0962h5x
and feed that to yt-dl
(auto-redirection to
https://www.bbc.co.uk/programmes/p095rkvg
will take place)

  2. For all 10 clips of the "mind-matters" playlist:

Inspect Page Source and search for the clipPID":" string; you'll find ten instances, like below:

clipPID":"p095rkvg
clipPID":"p07rr51d
clipPID":"p08d15ny
clipPID":"p07jmww3
clipPID":"p06l4bv9
clipPID":"p05vt4yl
clipPID":"p06qhcmy
clipPID":"p06s9whb
clipPID":"p06rw723
clipPID":"p084qhnf

What's important is the pid values, e.g. p07rr51d for the second clip; create the following list of bbc.co.uk URIs:

https://www.bbc.co.uk/programmes/p095rkvg
https://www.bbc.co.uk/programmes/p07rr51d
https://www.bbc.co.uk/programmes/p08d15ny
https://www.bbc.co.uk/programmes/p07jmww3
https://www.bbc.co.uk/programmes/p06l4bv9
https://www.bbc.co.uk/programmes/p05vt4yl
https://www.bbc.co.uk/programmes/p06qhcmy
https://www.bbc.co.uk/programmes/p06s9whb
https://www.bbc.co.uk/programmes/p06rw723
https://www.bbc.co.uk/programmes/p084qhnf

save it as a text file named mind-matters-pl.txt, put it adjacent to youtube-dl.exe and then issue:
youtube-dl -a "mind-matters-pl.txt"
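
The manual steps above could also be scripted. This is a minimal sketch, assuming the page source embeds each clip as a clipPID":"<pid>" pair as described; the function names are hypothetical, and the pid pattern (lowercase letters and digits) is an assumption based on the ten values listed.

```python
import re


def extract_clip_pids(page_source):
    """Return clipPID values in page order, dropping repeats."""
    pids = []
    # Assumption: pids consist only of lowercase letters and digits.
    for pid in re.findall(r'clipPID":"([0-9a-z]+)', page_source):
        if pid not in pids:
            pids.append(pid)
    return pids


def programme_urls(pids):
    """Map each pid to the bbc.co.uk/programmes URL form used in the workaround."""
    return ['https://www.bbc.co.uk/programmes/' + pid for pid in pids]


def write_batch_file(urls, path='mind-matters-pl.txt'):
    """Write one URL per line, suitable for youtube-dl -a."""
    with open(path, 'w') as f:
        f.write('\n'.join(urls) + '\n')


# Tiny stand-in for the real page source.
sample = '{"clipPID":"p095rkvg"},{"clipPID":"p07rr51d"}'
print(programme_urls(extract_clip_pids(sample)))
```

Saving the real page source to a file, passing its contents to extract_clip_pids and then calling write_batch_file would produce the same text file as the manual procedure.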

@dirkf
Contributor

dirkf commented Mar 24, 2021

Fixed in a400024.

@remitamine
Collaborator

BBC reel playlist URLs are not handled properly (the extractor does not handle the --no-playlist/--yes-playlist options), so this issue will be kept open until this is fixed.

@dirkf
Contributor

dirkf commented Mar 27, 2021

It would be easy to fix if there was a way of distinguishing when the noplaylist option is False by default and when it is set by --yes-playlist. Apparently self._downloader.params.get('noplaylist') is False instead of None when neither --xx-playlist option was set. Surely these are boolean options that should set params['noplaylist'] only if given:

--- a/youtube_dl/options.py
+++ b/youtube_dl/options.py
@@ -330,11 +330,11 @@
         ))
     selection.add_option(
         '--no-playlist',
-        action='store_true', dest='noplaylist', default=False,
+        action='store_true', dest='noplaylist', default=None,
         help='Download only the video, if the URL refers to a video and a playlist.')
     selection.add_option(
         '--yes-playlist',
-        action='store_false', dest='noplaylist', default=False,
+        action='store_false', dest='noplaylist', default=None,
         help='Download the playlist, if the URL refers to a video and a playlist.')
     selection.add_option(
         '--age-limit',
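
The effect of the proposed default can be checked in isolation. The following is a minimal sketch using only the standard optparse module, not youtube-dl's actual option parser; build_parser and parse_noplaylist are hypothetical helper names.

```python
import optparse


def build_parser(default):
    """Two flags sharing one dest, as in the diff above, with a configurable default."""
    parser = optparse.OptionParser()
    parser.add_option(
        '--no-playlist',
        action='store_true', dest='noplaylist', default=default)
    parser.add_option(
        '--yes-playlist',
        action='store_false', dest='noplaylist', default=default)
    return parser


def parse_noplaylist(default, argv):
    """Return the parsed noplaylist value for the given default and arguments."""
    opts, _ = build_parser(default).parse_args(argv)
    return opts.noplaylist


for default in (False, None):
    for argv in ([], ['--no-playlist'], ['--yes-playlist']):
        print(default, argv, parse_noplaylist(default, argv))
```

With default=False, passing no option and passing --yes-playlist both yield False; with default=None the three situations yield three distinct values.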

@remitamine
Collaborator

the extractor doesn't need to know whether the parameter has been set explicitly by the user or it's the default value (the default value has to be treated in the same way as if the user did pass the option); you can look at other extractors that handle the noplaylist option.

@dirkf
Contributor

dirkf commented Mar 27, 2021

The default is wrong: it should be None. A simple boolean value can't represent the three cases. Otherwise there needs to be a boolean params['yesplaylist'] set by --yes-playlist and params['noplaylist'] should only be set by --no-playlist.

Apparently the other extractors can't handle the case where basically the same page is fetched using different URLs that imply distinct default playlist handling. For instance:

The correct logic for a URL that has both a single video and a playlist is:

  • URL implies a playlist and --no-playlist => single video (params['noplaylist'] == True)
  • URL implies a playlist and --no-playlist not used => playlist (params['noplaylist'] == False)~
  • URL implies a playlist and --yes-playlist => playlist (params['noplaylist'] == False)~
  • URL implies single video and --yes-playlist => playlist (params['noplaylist'] == False)*
  • URL implies single video and --yes-playlist not used => single video (params['noplaylist'] == False)*
  • URL implies single video and --no-playlist => single video (params['noplaylist'] == True)

If params['noplaylist'] defaults to False, the cases marked ~ can't be told apart from each other, nor can those marked *, because params['noplaylist'] has the same value in each pair. For the pair marked * the desired outcome is different, so the extractor cannot choose correctly.

Even if the different URL formats were handled by separate extractors, it wouldn't help to disambiguate the params['noplaylist'] value.
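
The logic above can be sketched as a small decision function. It assumes the tri-state value that the proposed change would produce (True for --no-playlist, False for --yes-playlist, None when neither is given); want_playlist and url_implies_playlist are hypothetical names, not part of youtube-dl.

```python
def want_playlist(url_implies_playlist, noplaylist):
    """Decide whether to extract the playlist rather than the single video.

    noplaylist: True (--no-playlist), False (--yes-playlist) or
    None (neither option given; assumption: default changed to None).
    """
    if noplaylist is None:
        # No explicit choice: follow what the URL form implies.
        return url_implies_playlist
    # An explicit option always overrides the URL form.
    return not noplaylist


for url_implies_playlist in (True, False):
    for noplaylist in (True, None, False):
        result = 'playlist' if want_playlist(url_implies_playlist, noplaylist) else 'single video'
        print(url_implies_playlist, noplaylist, result)
```

If None is collapsed to False, as with the current default, a URL that implies a single video always resolves to the playlist unless --no-playlist is passed.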

@remitamine
Collaborator

https://www.bbc.com/reel/playlist/mind-matters?vpid=p0962h5x => video

this URL would not be considered to imply a video; instead, that will be determined after checking the noplaylist value.

@dirkf
Contributor

dirkf commented Mar 27, 2021

Actually, like the other URL formats of this type, the vpid= type has a focused video; unlike the others it's not the first in the list under the video. So it does imply a video, especially as its vpid is mentioned.

There is also the third case https://www.bbc.com/reel/video/p099tghy/is-phrenology-the-weirdest-pseudoscience-of-them-all- which is apparently identical to https://www.bbc.com/reel/playlist/mind-matters.

I reviewed the results of find youtube_dl -name '*.py' -exec grep -HE "'noplaylist'" "{}" \; again. Some extractors report that a single video is being processed because of --no-playlist. None mention using --yes-playlist. It's clear that --yes-playlist is being used as just a way to turn off --no-playlist (as if it were --no-no-playlist). If params['noplaylist'] defaults to False there's no other reason for it.

Suppose that a user goes to a page with a video and wants to archive the video. The page (say, https://www.bbc.com/reel/video/p099tghy/is-phrenology-the-weirdest-pseudoscience-of-them-all-) happens to have a playlist that can be extracted, so the user ends up with 57 (12, in this case) other unexpected videos. The unhappy user can make --no-playlist the configuration default to avoid such a surprise. Then the same user goes to a playlist page (say, https://www.bbc.com/reel/playlist/mind-matters) that happens to have an active video and finds that only that video is fetched. The user is unhappy again.

Whereas, if --no-playlist and --yes-playlist operate independently (equivalently, params['noplaylist'] defaults to None), with the first page the user gets the one video expected, and could have used --yes-playlist to get the playlist; there is no need to set any non-default configuration; with the second page, the user gets the playlist expected, and could have used --no-playlist to get just the video. Surely that's what was intended?

@remitamine
Collaborator

Suppose that a user goes to a page with a video and wants to archive the video. The page (say, https://www.bbc.com/reel/video/p099tghy/is-phrenology-the-weirdest-pseudoscience-of-them-all-) happens to have a playlist that can be extracted, so the user ends up with 57 (12, in this case) other unexpected videos. The unhappy user can make --no-playlist the configuration default to avoid such a surprise. Then the same user goes to a playlist page (say, https://www.bbc.com/reel/playlist/mind-matters) that happens to have an active video and finds that only that video is fetched. The user is unhappy again.

for https://www.bbc.com/reel/playlist/mind-matters, it's a playlist URL and it will be treated this way regardless of noplaylist value.

@dirkf
Contributor

dirkf commented Mar 27, 2021

The same page can have both a video and a playlist and the interpretation of which is to be processed depends only on the URL.

https://www.bbc.com/reel/playlist/mind-matters is a URL "referring to a video and a playlist", to quote the manual, so --no-playlist ought to be respected.

But the other two URL styles that I quoted, which are plainly video and not playlist URLs, refer to an essentially identical page. They should yield the video by default but then it's impossible to override that with --yes-playlist because the option processing doesn't record that --yes-playlist was used.

In summary, the change from False to None in youtube_dl/options.py as suggested

  • would have no effect on existing extractors, except to allow them to respond to --yes-playlist meaningfully,
  • would match the description of the --no/yes-playlist options in the manual, and
  • would allow potentially confusing URLs to be treated more flexibly and as described in the manual.

See https://github.com/dirkf/youtube-dl/tree/df-bbcreel-playlist-patch.

@remitamine
Collaborator

would have no effect on existing extractors, except to allow them to respond to --yes-playlist meaningfully

it's a breaking change, it would change the default behaviour.

would match the description of the --no/yes-playlist options in the manual, and
would allow potentially confusing URLs to be treated more flexibly and as described in the manual.

the description of the option states this:

if the URL refers to a video and a playlist

so the option would apply only to https://www.bbc.com/reel/playlist/mind-matters?vpid=p0962h5x, because the URL refers to mind-matters playlist and p0962h5x video id.
the https://www.bbc.com/reel/playlist/mind-matters URL links to a playlist because the URL refers only to mind-matters playlist.
the https://www.bbc.com/reel/video/p099tghy/is-phrenology-the-weirdest-pseudoscience-of-them-all- URL links to a video because the URL refers only to p099tghy video id.
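
The reading in this last comment could be sketched as follows. This is only an illustration using the standard urllib.parse module, not the bbc extractor's actual code; classify_reel_url and extract_target are hypothetical names, and treating noplaylist=False as "take the playlist" for a URL that carries both ids is an assumption.

```python
from urllib.parse import urlparse, parse_qs


def classify_reel_url(url):
    """Return (kind, playlist_id, video_id) for a bbc.com/reel URL."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split('/') if s]
    if segments[:2] == ['reel', 'playlist'] and len(segments) >= 3:
        vpid = parse_qs(parts.query).get('vpid', [None])[0]
        kind = 'video+playlist' if vpid else 'playlist'
        return kind, segments[2], vpid
    if segments[:2] == ['reel', 'video'] and len(segments) >= 3:
        return 'video', None, segments[2]
    return None, None, None


def extract_target(url, noplaylist=False):
    """Consult noplaylist only when the URL names both a playlist and a video."""
    kind, playlist_id, video_id = classify_reel_url(url)
    if kind == 'video+playlist':
        return ('video', video_id) if noplaylist else ('playlist', playlist_id)
    if kind == 'playlist':
        return ('playlist', playlist_id)
    if kind == 'video':
        return ('video', video_id)
    return None


for url in (
        'https://www.bbc.com/reel/playlist/mind-matters?vpid=p0962h5x',
        'https://www.bbc.com/reel/playlist/mind-matters',
        'https://www.bbc.com/reel/video/p099tghy/is-phrenology-the-weirdest-pseudoscience-of-them-all-'):
    print(extract_target(url), extract_target(url, noplaylist=True))
```

Under this reading only the first URL form changes its result when --no-playlist is given; the other two ignore the option.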
