TVP extractor fails to parse new URL format for vod.tvp.pl #14368

Closed
rafalborczuch opened this issue Sep 30, 2017 · 0 comments
  • I've verified and I assure that I'm running youtube-dl 2017.09.24

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://vod.tvp.pl/video/czas-honoru,i-seria-odc-13,194536']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.09.24
[debug] Python version 2.7.10 - Darwin-17.0.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.3.2, ffprobe 3.3.2
[debug] Proxy map: {}
[generic] czas-honoru,i-seria-odc-13,194536: Requesting header
WARNING: Falling back on generic information extractor.
[generic] czas-honoru,i-seria-odc-13,194536: Downloading webpage
[generic] czas-honoru,i-seria-odc-13,194536: Extracting information
ERROR: Unsupported URL: https://vod.tvp.pl/video/czas-honoru,i-seria-odc-13,194536
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2125, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2539, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2528, in _XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 10, column 42
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 777, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 434, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2980, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: https://vod.tvp.pl/video/czas-honoru,i-seria-odc-13,194536

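The TVP extractor no longer works for vod.tvp.pl because the service changed its video URL format. New-style URLs are not recognized, so youtube-dl falls back to the generic extractor, which fails with the error shown in the log above.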

Old URL format: http://vod.tvp.pl/194536/i-seria-odc-13
New URL format: https://vod.tvp.pl/video/czas-honoru,i-seria-odc-13,194536

The fix is a simple URL regex change in extractor/tvp.py.

I have started working on the PR for this issue and will submit it soon.
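
For illustration, here is a minimal sketch of a pattern that accepts both URL formats listed above. It is not the actual `_VALID_URL` from extractor/tvp.py. The names `VOD_TVP_URL` and `extract_video_id` are hypothetical, and the pattern assumes the numeric video ID is always the last comma-separated field in the new format, which is inferred only from the single example URL in this report.

```python
import re

# Hypothetical pattern for illustration only; the real extractor's regex may differ.
VOD_TVP_URL = re.compile(
    r'https?://vod\.tvp\.pl/'
    r'(?:'
    r'video/(?:[^/?#,]+,)*(?P<new_id>\d+)'   # new: /video/<slug>,<slug>,<id>
    r'|'
    r'(?P<old_id>\d+)/[^/?#]+'               # old: /<id>/<slug>
    r')'
    r'/?(?:[?#].*)?$'
)


def extract_video_id(url):
    """Return the numeric video ID as a string, or None if the URL does not match."""
    match = VOD_TVP_URL.match(url)
    if match is None:
        return None
    # Exactly one of the two named groups is set when the pattern matches.
    return match.group('new_id') or match.group('old_id')


if __name__ == '__main__':
    for url in (
        'http://vod.tvp.pl/194536/i-seria-odc-13',
        'https://vod.tvp.pl/video/czas-honoru,i-seria-odc-13,194536',
    ):
        print(url, '->', extract_video_id(url))
```

Both example URLs from this report should yield the same ID, 194536, so an extractor built on such a pattern could keep handling old links while also accepting the new ones.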
