Generic extractor doesn't recognise iframe with src //www.youtube-nocookie.com/embed #3713

Closed
lofidevops opened this issue Sep 10, 2014 · 1 comment

lofidevops commented Sep 10, 2014

~> youtube-dl --verbose "http://www.thehelloworldprogram.com/python/what-is-python/"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: ['--verbose', 'http://www.thehelloworldprogram.com/python/what-is-python/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2014.09.06
[debug] Python version 2.7.6 - Linux-3.13.0-35-generic-x86_64-with-Ubuntu-14.04-trusty
[debug] Proxy map: {}
[generic] what-is-python: Requesting header
WARNING: Falling back on generic information extractor.
[generic] what-is-python: Downloading webpage
[generic] what-is-python: Extracting information
ERROR: Unsupported URL: http://www.thehelloworldprogram.com/python/what-is-python/; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 545, in _real_extract
    doc = parse_xml(webpage)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/utils.py", line 1466, in parse_xml
    tree = xml.etree.ElementTree.XML(s.encode('utf-8'), **kwargs)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 1, column 1270
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 523, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 178, in extract
    return self._real_extract(url)
  File "/usr/local/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 890, in _real_extract
    raise ExtractorError('Unsupported URL: %s' % url)
ExtractorError: Unsupported URL: http://www.thehelloworldprogram.com/python/what-is-python/; please report this issue on https://yt-dl.org/bug . Be sure to call youtube-dl with the --verbose flag and include its complete output. Make sure you are using the latest version; type  youtube-dl -U  to update.

Hmm, I've just noticed that the underlying failure is a ParseError. Is it possible to tidy up the malformed HTML before parsing it?
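
For illustration, here is a minimal sketch of one way to pick up such an embed without needing well-formed markup: a regular expression that accepts a scheme-relative `src` (`//www.youtube-nocookie.com/embed/...`) as well as `http:` and `https:`. This is not youtube-dl's actual code. The names `EMBED_RE` and `find_youtube_embeds`, the `page_scheme` parameter, and the video ID `abc123` in the usage note are all hypothetical.

```python
import re

# Hypothetical pattern, not taken from youtube-dl: matches an <iframe> whose
# src points at a YouTube embed, with or without an explicit scheme.
EMBED_RE = re.compile(
    r'''<iframe[^>]+?src=(["'])(?P<url>(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/embed/[^"']+)\1''',
    re.IGNORECASE,
)


def find_youtube_embeds(html, page_scheme='http'):
    """Return the embed URLs found in html.

    Scheme-relative URLs ("//host/path") are completed with page_scheme,
    which is assumed to be the scheme of the page that was downloaded.
    """
    urls = []
    for match in EMBED_RE.finditer(html):
        url = match.group('url')
        if url.startswith('//'):
            url = page_scheme + ':' + url
        urls.append(url)
    return urls
```

Because the search works on the raw text, it should not depend on the page being valid XML. For example, `find_youtube_embeds('<p>a & b <iframe width="560" src="//www.youtube-nocookie.com/embed/abc123?rel=0"></iframe>')` yields a single `http://www.youtube-nocookie.com/embed/abc123?rel=0` entry even though the bare `&` and the unclosed `<p>` would trip a strict XML parser.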
