PBS changed site #13801

Closed
jimbolaya opened this issue Aug 1, 2017 · 5 comments

Comments

@jimbolaya jimbolaya commented Aug 1, 2017

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.07.30.1. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2017.07.30.1

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other
    Not sure how to classify this. PBS changed their format and now needs an update.
    Is this a "Site support request" since the site parser needs to be changed? Or is it a "Bug report" since an existing site no longer works?

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

[debug] System config: ['--all-subs', '--prefer-ffmpeg', '--ffmpeg-location', '/opt/ffmpeg-git-20170417-64bit-static/ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.07.30.1
[debug] Python version 3.4.3 - Linux-4.4.0-72-generic-x86_64-with-LinuxMint-17-qiana
[debug] exe versions: ffmpeg N-85581-ge22d495538-static, ffprobe N-85581-ge22d495538-static, rtmpdump 2.4
[debug] Proxy map: {}
[generic] pbs-newshour-full-episode-july-31-2017-1501539057: Requesting header
[redirect] Following redirect to http://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/
[generic] pbs-newshour-full-episode-july-31-2017-1501539057: Requesting header
WARNING: Falling back on generic information extractor.
[generic] pbs-newshour-full-episode-july-31-2017-1501539057: Downloading webpage
[generic] pbs-newshour-full-episode-july-31-2017-1501539057: Extracting information
ERROR: Unsupported URL: http://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 776, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2944, in _real_extract
    raise UnsupportedError(url)
youtube_dl.utils.UnsupportedError: Unsupported URL: http://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/
...
<end of log>

If the purpose of this issue is a site support request please provide all kinds of example URLs support for which should be included (replace following example URLs by yours):

Note that youtube-dl does not support sites dedicated to copyright infringement. In order for site support request to be accepted all provided example URLs should not violate any copyrights.


Description of your issue, suggested solution and other information

PBS has changed the URL format of (at least) their NewsHour videos. Yesterday, I retrieved videos using:
youtube-dl http://www.pbs.org/video/3003276748/

Now, the format of the URL is as follows:
http://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/

This fails with "ERROR: Unsupported URL".

The old format URL still works for the older videos, but I am unable to determine if there is an equivalent URL for newer videos.
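
For illustration, the two URL shapes described above can be told apart with a simple pattern match. This is only a minimal sketch: the regular expressions and the function name `classify_pbs_url` are hypothetical and are not taken from youtube-dl's PBS extractor.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the
# regexes used by youtube-dl's PBS extractor.
OLD_STYLE = re.compile(r'^https?://(?:www\.)?pbs\.org/video/(?P<id>\d+)/?$')
NEW_STYLE = re.compile(
    r'^https?://(?:www\.)?pbs\.org/video/(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*)/?$')


def classify_pbs_url(url):
    """Return ('numeric', id), ('slug', slug) or ('unknown', None)."""
    # Check the numeric form first, because an all-digit path segment
    # would also satisfy the looser slug pattern.
    match = OLD_STYLE.match(url)
    if match:
        return ('numeric', match.group('id'))
    match = NEW_STYLE.match(url)
    if match:
        return ('slug', match.group('slug'))
    return ('unknown', None)


if __name__ == '__main__':
    for url in (
        'http://www.pbs.org/video/3003276748/',
        'http://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/',
    ):
        print(classify_pbs_url(url))
```

The numeric form is the one that still downloads; the slug form is the one that currently falls through to the generic extractor.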

Collaborator

@remitamine remitamine commented Aug 1, 2017

The old format URL still works for the older videos, but I am unable to determine if there is an equivalent URL for newer videos.

http://www.pbs.org/video/pbs-newshour-full-episode-july-31-2017-1501539057/ can be downloaded using http://www.pbs.org/video/3003333873/

Author

@jimbolaya jimbolaya commented Aug 1, 2017

Collaborator

@remitamine remitamine commented Aug 1, 2017

Until it's fixed, you can view the source of the webpage and use the URL in the og:url meta property.
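
As a rough sketch of that workaround, the og:url value can be pulled out of a saved page source with Python's standard `html.parser` module. The class name `OgUrlParser`, the helper `find_og_url`, and the sample markup are assumptions made for illustration; the real PBS page may be structured differently.

```python
from html.parser import HTMLParser


class OgUrlParser(HTMLParser):
    """Remember the content of the first <meta property="og:url"> tag."""

    def __init__(self):
        super().__init__()
        self.og_url = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != 'meta' or self.og_url is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get('property') == 'og:url':
            self.og_url = attr_map.get('content')


def find_og_url(page_source):
    """Return the og:url value found in page_source, or None if absent."""
    parser = OgUrlParser()
    parser.feed(page_source)
    return parser.og_url


# Hypothetical page source, shown only to illustrate the lookup.
SAMPLE_PAGE = """<html><head>
<title>PBS NewsHour</title>
<meta property="og:url" content="http://www.pbs.org/video/3003333873/" />
</head><body></body></html>"""

if __name__ == '__main__':
    print(find_og_url(SAMPLE_PAGE))
```

The URL it prints can then be passed to youtube-dl in place of the new-style address.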

Author

@jimbolaya jimbolaya commented Aug 1, 2017

Thank you for the workaround.

@ManuelUrrutia ManuelUrrutia commented Aug 3, 2017

If the listing provided above is a faithful reproduction of the response at that time (and I have no reason to doubt it), the response has changed. This would not surprise me, as their techs are probably tinkering with how their server should respond. The following is the response today (Aug 03) with my older version (2017.05.26):

..:~ youtube-dl -v http://www.pbs.org/video/episode-3-n2qkrh/
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://www.pbs.org/video/episode-3-n2qkrh/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.05.26
[debug] Python version 2.7.10 - Darwin-15.6.0-x86_64-i386-64bit
[debug] exe versions: avconv 12, avprobe 12, ffmpeg 3.3, ffprobe 3.3, rtmpdump 2.4
[debug] Proxy map: {}
[generic] episode-3-n2qkrh: Requesting header
WARNING: Falling back on generic information extractor.
[generic] episode-3-n2qkrh: Downloading webpage
[generic] episode-3-n2qkrh: Extracting information
ERROR: Unsupported URL: http://www.pbs.org/video/episode-3-n2qkrh/
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 1970, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2526, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2515, in _XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: syntax error: line 1, column 0
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 760, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2795, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://www.pbs.org/video/episode-3-n2qkrh/

The response under the current version (2017.07.30.1) is essentially the same but it has an extra warning and it took a while before it could soldier on:

...:~ youtube-dl -v http://www.pbs.org/video/episode-3-n2qkrh/
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://www.pbs.org/video/episode-3-n2qkrh/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.07.30.1
[debug] Python version 2.7.10 - Darwin-15.6.0-x86_64-i386-64bit
[debug] exe versions: avconv 12, avprobe 12, ffmpeg 3.3, ffprobe 3.3, rtmpdump 2.4
[debug] Proxy map: {}
[generic] episode-3-n2qkrh: Requesting header
WARNING: Could not send HEAD request to http://www.pbs.org/video/episode-3-n2qkrh/: <urlopen error [Errno 8] nodename nor servname provided, or not known>
[generic] episode-3-n2qkrh: Downloading webpage
WARNING: Falling back on generic information extractor.
[generic] episode-3-n2qkrh: Extracting information
ERROR: Unsupported URL: http://www.pbs.org/video/episode-3-n2qkrh/
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2077, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2539, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2528, in _XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: syntax error: line 1, column 0
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 776, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 433, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2944, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://www.pbs.org/video/episode-3-n2qkrh/

From what I see, the redirect is no longer taking place and, consequently, no information about the video can be extracted. Hence, I don't see how to apply the suggested workaround. I am also uncertain which flag to use to extract the meta property of a video for which there is no URL.

One more thing: "Video Downloadhelper" can still get the playlist. In the past, that playlist did not include the higher resolutions (IIRC), while youtube-dl did find them and one could simply enter the desired "number" for it. That's no longer possible. The link to the highest resolution found this way for the above video is

https://ga.video.cdn.pbs.org/videos/rare/5ee7ac8e-a916-4123-9498-c383bbf7b1e3/1000021430/hd-16x9-mezzanine-1080p/vthhakfa_racp0103-16x9-720p-720p-3000k.m3u8

Hope this helps and thanks for reading this far.

@dstftw dstftw closed this in 183062a Aug 3, 2017