[parliamentlive.tv] Not work anymore #9137

Closed
ghost opened this issue Apr 10, 2016 · 5 comments

@ghost ghost commented Apr 10, 2016

@dstftw dstftw mentioned this issue May 15, 2016
4 of 8 tasks complete

@pljones pljones commented Sep 8, 2016

Checking I've got parliamentlive.tv included:

$ youtube-dl --list-extractors | grep parliamentlive.tv
parliamentlive.tv

Trying it out:

$ youtube-dl --verbose http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.02.22
[debug] Python version 2.7.12 - Linux-4.4.0-36-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.6-1ubuntu2, ffprobe 2.8.6-1ubuntu2, rtmpdump 2.4
[debug] Proxy map: {}
[generic] cb2f33f6-f9fe-463e-a6d5-40eca4b614c0: Requesting header
WARNING: Falling back on generic information extractor.
[generic] cb2f33f6-f9fe-463e-a6d5-40eca4b614c0: Downloading webpage
[generic] cb2f33f6-f9fe-463e-a6d5-40eca4b614c0: Extracting information
ERROR: Unsupported URL: http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 1308, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/lib/python2.7/dist-packages/youtube_dl/compat.py", line 248, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=etree.TreeBuilder(element_factory=_element_factory)))
  File "/usr/lib/python2.7/dist-packages/youtube_dl/compat.py", line 237, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1653, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1517, in _raiseerror
    raise err
ParseError: mismatched tag: line 45, column 2
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/youtube_dl/YoutubeDL.py", line 666, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/common.py", line 316, in extract
    return self._real_extract(url)
  File "/usr/lib/python2.7/dist-packages/youtube_dl/extractor/generic.py", line 1950, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0

Finding the source:

$ grep -r parliamentlive.tv
youtube_dl/extractor/parliamentliveuk.py:    IE_NAME = 'parliamentlive.tv'
youtube_dl/extractor/parliamentliveuk.py:        'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
docs/supportedsites.md: - **parliamentlive.tv**: UK parliament videos

Finding the valid url pattern:

class ParliamentLiveUKIE(InfoExtractor):
...
    _VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
...
"youtube_dl/extractor/parliamentliveuk.py"
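
As a quick check of why the generic extractor took over, the pattern can be tried against both URL shapes with plain `re`. This is only a sketch: the pattern is copied from the listing above, the old-style URL is the one grep found in the extractor's test, and the variable names are made up for illustration.

```python
import re

# Pattern copied from the _VALID_URL line shown above.
VALID_URL = (r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?'
             r'(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)')

# Old-style URL (from the extractor's own test) and the new-style URL from this report.
old_url = 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia'
new_url = 'http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0'

old_match = re.match(VALID_URL, old_url)
new_match = re.match(VALID_URL, new_url)

print(old_match.group('id'))  # the numeric meeting ID
print(new_match)              # None: no "www." and no Main/Player.aspx path
```

The new URL has neither the `www.` host nor the `Main/Player.aspx` path, so the dedicated extractor never claims it and youtube-dl falls back to `[generic]`.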

@pljones pljones commented Sep 8, 2016

Parsing the URL, at least...

$ diff -du parliamentliveuk.py /usr/lib/python2.7/dist-packages/youtube_dl/extractor/parliamentliveuk.py
--- parliamentliveuk.py 2016-09-08 18:52:53.369378002 +0100
+++ /usr/lib/python2.7/dist-packages/youtube_dl/extractor/parliamentliveuk.py   2016-09-08 19:48:49.818258986 +0100
@@ -8,12 +8,15 @@
 class ParliamentLiveUKIE(InfoExtractor):
     IE_NAME = 'parliamentlive.tv'
     IE_DESC = 'UK parliament videos'
-    _VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
+#    _VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
+    _VALID_URL = r'.*parliamentlive\.tv/(Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<meeting>[0-9]+)|Event/Index/(?P<event>[0-9a-f]{6,6}-*))'

     _TEST = {
-        'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
+#        'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
+        'url': 'http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0',
         'info_dict': {
-            'id': '15121',
+#            'id': '15121',
+            'id': 'cb2f33f6-f9fe-463e-a6d5-40eca4b614c0',
             'ext': 'asf',
             'title': 'hoc home affairs committee, 18 mar 2014.pm',
             'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
@@ -23,9 +26,7 @@
         }
     }

-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+    def _extract_meeting(self, url, video_id):
         webpage = self._download_webpage(url, video_id)

         asx_url = self._html_search_regex(
@@ -51,3 +52,22 @@
             'title': title,
             'description': description,
         }
+
+    def _extract_event(self, url, video_id):
+        webpage = self._download_webpage(url, video_id)
+        return {
+            'id': '',
+            'ext': '',
+            'url': '',
+            'title': '',
+            'description': '',
+        }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('meeting')
+        if video_id:
+            return self._extract_meeting(url, video_id)
+        else:
+            video_id = mobj.group('event')
+            return self._extract_event(url, video_id)
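
One thing worth checking in isolation is what the new `event` group actually captures. The sketch below runs the pattern from the diff with plain `re`, then tries a stricter alternative. The stricter pattern and the `classify` helper are hypothetical; the UUID-style ID format is assumed from the single example URL, and none of this is meant to describe the fix that eventually landed.

```python
import re

# Pattern exactly as written in the diff above.
DIFF_PATTERN = (r'.*parliamentlive\.tv/(Main/Player\.aspx\?(?:[^&]+&)*?'
                r'meetingId=(?P<meeting>[0-9]+)|Event/Index/(?P<event>[0-9a-f]{6,6}-*))')

# Hypothetical stricter pattern: optional "www." and the whole UUID-style ID.
STRICT_PATTERN = re.compile(
    r'https?://(?:www\.)?parliamentlive\.tv/(?:'
    r'Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<meeting>[0-9]+)'
    r'|Event/Index/(?P<event>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})'
    r')')


def classify(url):
    """Return (kind, video_id) for a parliamentlive.tv URL, or None if unmatched."""
    mobj = STRICT_PATTERN.match(url)
    if mobj is None:
        return None
    if mobj.group('meeting'):
        return ('meeting', mobj.group('meeting'))
    return ('event', mobj.group('event'))


event_url = 'http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0'
meeting_url = 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia'

# {6,6} stops after six hex digits, so only a prefix of the ID is captured.
prefix_only = re.match(DIFF_PATTERN, event_url).group('event')
print(prefix_only)
print(classify(event_url))
print(classify(meeting_url))
```

With the pattern from the diff, `video_id` for an event URL would be just the first six hex digits, which matters as soon as the ID is used for anything beyond log messages.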

From that point on, I'm stuck -- I can trail through the various expanded URLs, but they all lead to places I don't want to dig into further right now...
(things like this
http://vodplayer.parliamentlive.tv?mid=CB2F33F6-F9FE-463E-A6D5-40ECA4B614C0&amp;rsp=1473088500&amp;msp=1473089400&amp;audioOnly=False&amp;autoplay=False&amp;dln=20160905_westminster_hall&amp;thumbOverride=False&amp;allowCookies=True
...)
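
For anyone picking this up, the player URL above is easier to read once the HTML entities are decoded and the query string is split out. A minimal sketch with the Python 3 standard library (the thread itself is on Python 2.7, so this is for inspection only); what each parameter means is not known from this thread, and the variable names are just for illustration.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# URL as it appears in the page source, with "&amp;" entities intact.
raw = ('http://vodplayer.parliamentlive.tv?mid=CB2F33F6-F9FE-463E-A6D5-40ECA4B614C0'
       '&amp;rsp=1473088500&amp;msp=1473089400&amp;audioOnly=False&amp;autoplay=False'
       '&amp;dln=20160905_westminster_hall&amp;thumbOverride=False&amp;allowCookies=True')

# Decode the entities first, then split the query string into a flat dict.
url = unescape(raw)
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key, value in params.items():
    print(key, '=', value)
```

The `mid` value is the same ID as in the `Event/Index/` URL, only upper-cased, so the event page and the player appear to be keyed on the same identifier.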

@remitamine remitamine closed this in 4614ad7 Sep 8, 2016

@pljones pljones commented Sep 8, 2016

Open source at its best - thanks! I'll give it a run tomorrow.


@pljones pljones commented Sep 9, 2016

Just to confirm everything seems to be fine with this. :)

$ youtube-dl --verbose http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'http://parliamentlive.tv/Event/Index/cb2f33f6-f9fe-463e-a6d5-40eca4b614c0']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2016.02.22
[debug] Python version 2.7.12 - Linux-4.4.0-36-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.6-1ubuntu2, ffprobe 2.8.6-1ubuntu2, rtmpdump 2.4
[debug] Proxy map: {}
[parliamentlive.tv] cb2f33f6-f9fe-463e-a6d5-40eca4b614c0: Downloading webpage
[parliamentlive.tv] cb2f33f6-f9fe-463e-a6d5-40eca4b614c0: Downloading JSON metadata
[Kaltura] 1_m1s2pbew: Downloading Kaltura signature
[Kaltura] 1_m1s2pbew: Downloading video info JSON
[Kaltura] 1_m1s2pbew: Downloading m3u8 information
[Kaltura] 1_m1s2pbew: Checking mp4-800 video format URL
[Kaltura] 1_m1s2pbew: Checking mp4-48 video format URL
[Kaltura] 1_m1s2pbew: mp4-48 video format URL is invalid, skipping
[Kaltura] 1_m1s2pbew: Checking mp4-1200 video format URL
[Kaltura] 1_m1s2pbew: Checking mp4-250 video format URL
[Kaltura] 1_m1s2pbew: Checking hls-meta video format URL
[Kaltura] 1_m1s2pbew: Checking hls-48 video format URL
[Kaltura] 1_m1s2pbew: Checking hls-210 video format URL
[Kaltura] 1_m1s2pbew: Checking hls-680 video format URL
[Kaltura] 1_m1s2pbew: Checking hls-1143 video format URL
[debug] Invoking downloader on u'http://cdnapi.kaltura.com/p/1756741/sp/175674100/playManifest/entryId/1_m1s2pbew/format/url/protocol/http/flavorId/1_otei3lay'
[download] Destination: Westminster Hall-cb2f33f6-f9fe-463e-a6d5-40eca4b614c0.mp4
[download] 100% of 1.57GiB in 04:47