Unable to extract OpenGraph title, certain playlist on webofstories.com #8417

Closed
tehlux opened this issue Feb 4, 2016 · 3 comments

@tehlux tehlux commented Feb 4, 2016

Using youtube-dl 2016.02.01.

Here's the full verbose output and the link it fails on.

$ youtube-dl --verbose http://www.webofstories.com/playAll/oliver.sacks
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'http://www.webofstories.com/playAll/oliver.sacks']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.02.01
[debug] Python version 2.7.6 - Linux-3.19.0-32-generic-x86_64-with-LinuxMint-17.3-rosa
[debug] exe versions: ffmpeg N-77455-g4707497, ffprobe N-77455-g4707497
[debug] Proxy map: {}
[WebOfStoriesPlaylist] oliver.sacks: Downloading webpage
[download] Downloading playlist: Oliver Sacks (Scientist)
[WebOfStoriesPlaylist] playlist Oliver Sacks (Scientist): Collected 360 video ids (downloading 360 of them)
[download] Downloading video 1 of 360
[WebOfStories] 54219: Downloading webpage
[debug] Invoking downloader on u'http://eu-mobile.webofstories.com/lives/50035/157.mp4'
[download] Destination: I thought Alexander Luria had done it all-54219.mp4
[download] 100% of 3.62MiB in 00:12
[download] Downloading video 2 of 360
[WebOfStories] 54218: Downloading webpage
[debug] Invoking downloader on u'http://eu-mobile.webofstories.com/lives/50035/156.mp4'
[download] Destination: The death of medical case histories-54218.mp4
[download] 100% of 3.28MiB in 00:06
[download] Downloading video 3 of 360
[WebOfStories] 54215: Downloading webpage
ERROR: Unable to extract OpenGraph title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 666, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 313, in extract
    return self._real_extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/webofstories.py", line 46, in _real_extract
    title = self._og_search_title(webpage)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 702, in _og_search_title
    return self._og_search_property('title', html, **kargs)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 690, in _og_search_property
    escaped = self._search_regex(self._og_regexes(prop), html, name, flags=re.DOTALL, **kargs)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 608, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
RegexNotFoundError: Unable to extract OpenGraph title; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

@tehlux tehlux commented Feb 27, 2016

I finally had some time to look into it myself: the failure happens when extracting the title from the page's OpenGraph meta tags.

One page where this fails:

http://www.webofstories.com/play/oliver.sacks/152

Checking the HTML, I found that it's simply invalid: the content attribute contains unescaped quotes.

<meta property="og:title" content=""A Leg to Stand On"" />

My conclusion is that this isn't a bug in youtube-dl and can't be fixed properly. You can't consistently parse non-HTML as HTML, and anything that accounts for it would be a hack. Whoever paid for the webofstories website should demand their money back; the developer doesn't even know HTML (there's other nastiness too, like class names with spaces, e.g. class="duration text").
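To illustrate why a quote-delimited pattern trips on this tag, here is a minimal sketch. The regex is a simplified stand-in chosen for illustration, not the pattern youtube-dl actually builds in `_og_regexes`, and `strict_title` is a hypothetical helper name.

```python
import re

# Simplified stand-in for an OpenGraph title pattern (an assumption, not
# youtube-dl's real regex): the content value is one run of non-quote
# characters enclosed in double quotes.
STRICT_OG_TITLE = re.compile(
    r'<meta\s+property="og:title"\s+content="([^"]+)"')

# A well-formed tag and the malformed one found on the failing page.
good = '<meta property="og:title" content="A Leg to Stand On" />'
bad = '<meta property="og:title" content=""A Leg to Stand On"" />'


def strict_title(html):
    """Return the og:title content, or None if the strict pattern misses."""
    match = STRICT_OG_TITLE.search(html)
    return match.group(1) if match else None


print(strict_title(good))  # the title is found
print(strict_title(bad))   # no match: the value appears to start with an empty string
```

In the malformed tag the attribute value looks empty to the pattern, because the first inner quote immediately closes it, so a pattern that requires at least one non-quote character finds nothing.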

The only hack I wouldn't be embarrassed to post here is:

    title = "hack"  # self._og_search_title(webpage)

That goes in _real_extract(self, url), at line 46 of youtube_dl/extractor/webofstories.py, and bypasses the title search entirely.

I'll try reporting this to the Web of Stories admins so they can fix their HTML ;) and will apply a hack locally just to get all the videos with meta information.
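For a local hack that still keeps real titles, one option might be a more tolerant pattern that reads up to the quote that closes the tag and then strips stray quotes. This is only a sketch under assumptions: `lenient_og_title` and its regex are hypothetical, it relies on `property` and `content` being adjacent and in that order as in the tag quoted above, and it is not the fix that was later committed to youtube-dl.

```python
import re

# Hypothetical tolerant pattern: lazily capture everything after content="
# up to the double quote that is directly followed by the end of the tag.
LENIENT_OG_TITLE = re.compile(
    r'<meta\s+property="og:title"\s+content="(.*?)"\s*/?>', re.DOTALL)


def lenient_og_title(html, default=None):
    """Return og:title even if the value contains unescaped double quotes."""
    match = LENIENT_OG_TITLE.search(html)
    if not match:
        return default
    # Drop the stray quotes that wrap the value on the malformed pages.
    title = match.group(1).strip('"').strip()
    return title or default


page = '<head><meta property="og:title" content=""A Leg to Stand On"" /></head>'
print(lenient_og_title(page, default="untitled"))
```

The `default` argument plays the role of the hardcoded "hack" string: it is only used when no og:title tag can be found at all.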


@tehlux tehlux commented Feb 27, 2016

I think this should be closed.

@tehlux tehlux closed this Feb 27, 2016
@dstftw dstftw reopened this Feb 27, 2016

@dstftw dstftw commented Feb 27, 2016

This issue has been fixed, and the fix will be incorporated in the next version of youtube-dl.

@dstftw dstftw closed this Feb 27, 2016
dstftw added a commit that referenced this issue Feb 27, 2016