[appletrailers] xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 39, column 138 #7953

Closed
remitamine opened this issue Dec 23, 2015 · 4 comments

Collaborator

@remitamine remitamine commented Dec 23, 2015

youtube-dl -F http://trailers.apple.com/trailers/independent/thelook/
[appletrailers] thelook: Downloading XML
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
    youtube_dl.main()
  File "/home/amine/youtube-dl/youtube_dl/__init__.py", line 410, in main
    _real_main(argv)
  File "/home/amine/youtube-dl/youtube_dl/__init__.py", line 400, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/amine/youtube-dl/youtube_dl/YoutubeDL.py", line 1677, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/home/amine/youtube-dl/youtube_dl/YoutubeDL.py", line 665, in extract_info
    ie_result = ie.extract(url)
  File "/home/amine/youtube-dl/youtube_dl/extractor/common.py", line 291, in extract
    return self._real_extract(url)
  File "/home/amine/youtube-dl/youtube_dl/extractor/appletrailers.py", line 92, in _real_extract
    doc = self._download_xml(playlist_url, movie, transform_source=fix_html)
  File "/home/amine/youtube-dl/youtube_dl/extractor/common.py", line 466, in _download_xml
    return compat_etree_fromstring(xml_string.encode('utf-8'))
  File "/home/amine/youtube-dl/youtube_dl/compat.py", line 248, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
  File "/home/amine/youtube-dl/youtube_dl/compat.py", line 237, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 39, column 138
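For readers unfamiliar with this error, here is a minimal sketch of how a "not well-formed (invalid token)" `ParseError` can arise and how a pre-parse transform might work around it. It assumes the offending token is an unescaped `&`, which this thread does not confirm. The sample input and the helper names `escape_bare_ampersands` and `parse_with_fix` are hypothetical and are not taken from youtube-dl.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical malformed input: a bare "&" is not allowed in XML.
BROKEN = '<root>\n<a href="x?a=1&b=2">t</a>\n</root>'


def escape_bare_ampersands(source):
    """Escape '&' characters that do not start an entity or character reference."""
    return re.sub(r'&(?!(?:#\d+|#x[0-9a-fA-F]+|\w+);)', '&amp;', source)


def parse_with_fix(source):
    """Parse XML; on failure, report the error position and retry after repair."""
    try:
        return ET.fromstring(source)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        print('parse failed at line %d, column %d' % (line, column))
        return ET.fromstring(escape_bare_ampersands(source))


doc = parse_with_fix(BROKEN)
print(doc.find('a').get('href'))
```

The line and column in the exception point into the text handed to the parser, so they are the quickest way to locate the bad token in the downloaded document.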
jaimeMF added a commit that referenced this issue Dec 23, 2015
Collaborator

@jaimeMF jaimeMF commented Dec 23, 2015

I've fixed the XML error, but these files use a different structure. Do you have more examples?

I don't have much time; if you want to fix it yourself, just say so.

Collaborator Author

@remitamine remitamine commented Dec 23, 2015

I think these custom pages are not widely used.
I tested the extractor with more than 300 URLs and it fails on only 4 of them.
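A batch check like this could be sketched roughly as follows. This is not the script that was actually used; `tally_failures` and `fake_extract` are hypothetical names, and the URLs are placeholders. In practice `extract` might be a small wrapper around `youtube_dl.YoutubeDL().extract_info(url, download=False)`.

```python
def tally_failures(urls, extract):
    """Call extract(url) for each URL and collect the (url, error) pairs that failed."""
    failures = []
    for url in urls:
        try:
            extract(url)
        except Exception as err:  # keep going so one bad page does not stop the run
            failures.append((url, err))
    return failures


def fake_extract(url):
    """Stand-in extractor (hypothetical): fails on URLs ending in '/broken/'."""
    if url.endswith('/broken/'):
        raise ValueError('not well-formed')
    return {'id': url}


urls = ['http://example.invalid/ok/', 'http://example.invalid/broken/']
failures = tally_failures(urls, fake_extract)
print('%d of %d URLs failed' % (len(failures), len(urls)))
```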

@remitamine remitamine closed this Dec 23, 2015
Collaborator

@jaimeMF jaimeMF commented Dec 23, 2015

Note that the three movies are from 2011, so maybe that's the old layout. If someone wants to download them, we can add support for them.
