suffolk.edu/sjc: Massachusetts Supreme Judicial Court livestream jwplayer in Generic #11993

Closed
johnhawkinson opened this issue Feb 6, 2017 · 4 comments

Contributor

@johnhawkinson johnhawkinson commented Feb 6, 2017

This is a bit challenging to test and debug because the website looks different during the occasional, intermittent livestream than it does later, once the videos are archived. I could use some API guidance.

I don't quite understand how the jwplayer checking code in extractor/generic.py is supposed to work. It properly identifies an rtmp:// URL associated with this video, but then throws it away because it does not pass YoutubeIE.suitable(vurl). I'm not sure why the YouTube InfoExtractor is involved with RTMP. And passing the raw rtmp:// URL on the command line works just fine.
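To make the question concrete, here is a rough standalone sketch of the kind of filtering check_video() appears to do. Only the "is it a YouTube URL" test and the urlparse(...).path step are visible in the diff in this report. The extension filter, the NON_VIDEO_EXTS set, and the looks_like_youtube() helper are assumptions made up for illustration, not youtube-dl's actual code.

```python
from urllib.parse import urlparse

# Hypothetical list of extensions that are clearly not video (assumption).
NON_VIDEO_EXTS = {'swf', 'png', 'jpg', 'srt', 'vtt'}


def looks_like_youtube(vurl):
    """Hypothetical stand-in for YoutubeIE.suitable(); checks the host only."""
    host = urlparse(vurl).netloc.lower()
    return host == 'youtu.be' or host == 'youtube.com' or host.endswith('.youtube.com')


def check_video_sketch(vurl):
    """Return True if the candidate URL would be kept by this sketch."""
    if looks_like_youtube(vurl):
        return True
    vpath = urlparse(vurl).path
    # Assumed rule: keep only URLs whose path has a plausible media extension.
    if '.' not in vpath:
        return False
    ext = vpath.rpartition('.')[2].lower()
    return ext not in NON_VIDEO_EXTS


if __name__ == '__main__':
    for candidate in ('rtmp://192.138.214.154/live/sjclive',
                      'http://example.com/media/clip.mp4'):
        print(candidate, check_video_sketch(candidate))
```

Under these assumptions, the RTMP URL is dropped simply because its path (/live/sjclive) has no file extension and it is not a YouTube URL, which would explain the behavior without YoutubeIE being specially tied to jwplayer.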


  • I've verified and I assure that I'm running youtube-dl 2017.02.04.1
  • At least skimmed through README and most notably FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Site support request (request for adding support for a new site)

After adding a quick patch for some diagnostics in check_video():

--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -2485,6 +2485,8 @@ class GenericIE(InfoExtractor):
             return self.playlist_result(entries)
 
         def check_video(vurl):
+            # debug
+            self.to_screen('Checking vurl %s' % vurl)
             if YoutubeIE.suitable(vurl):
                 return True
             vpath = compat_urlparse.urlparse(vurl).path

Then I get this output:

PYTHONPATH=~/src/youtube-dl python -m youtube_dl -v 'http://www.suffolk.edu/sjc/'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://www.suffolk.edu/sjc/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.02.04.1
[debug] Git HEAD: 2aec725
[debug] Python version 2.7.10 - Darwin-14.5.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.1.5, ffprobe 3.1.5, rtmpdump 2.4
[debug] Proxy map: {}
[generic] sjc: Requesting header
WARNING: Falling back on generic information extractor.
[generic] sjc: Downloading webpage
[generic] sjc: Extracting information
[generic] Checking vurl rtmp://192.138.214.154/live/sjclive
ERROR: Unsupported URL: http://www.suffolk.edu/sjc/
Traceback (most recent call last):
  File "/Users/jhawk/src/youtube-dl/youtube_dl/extractor/generic.py", line 1727, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/Users/jhawk/src/youtube-dl/youtube_dl/compat.py", line 2526, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/Users/jhawk/src/youtube-dl/youtube_dl/compat.py", line 2515, in _XML
    parser.feed(text)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: undefined entity: line 29, column 66
Traceback (most recent call last):
  File "/Users/jhawk/src/youtube-dl/youtube_dl/YoutubeDL.py", line 696, in extract_info
    ie_result = ie.extract(url)
  File "/Users/jhawk/src/youtube-dl/youtube_dl/extractor/common.py", line 369, in extract
    return self._real_extract(url)
  File "/Users/jhawk/src/youtube-dl/youtube_dl/extractor/generic.py", line 2575, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://www.suffolk.edu/sjc/

Note it properly finds rtmp://192.138.214.154/live/sjclive. And feeding that to youtube-dl a second time works just fine:

pb3:extractor jhawk$ PYTHONPATH=~/src/youtube-dl python -m youtube_dl -v 'rtmp://192.138.214.154/live/sjclive'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'rtmp://192.138.214.154/live/sjclive']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.02.04.1
[debug] Git HEAD: 2aec725
[debug] Python version 2.7.10 - Darwin-14.5.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.1.5, ffprobe 3.1.5, rtmpdump 2.4
[debug] Proxy map: {}
[debug] Invoking downloader on u'rtmp://192.138.214.154/live/sjclive'
[download] Destination: sjclive-sjclive.flv
[debug] rtmpdump command line: rtmpdump --verbose -r rtmp://192.138.214.154/live/sjclive -o sjclive-sjclive.flv.part --resume --skip 1
[rtmpdump] RTMPDump v2.4
[rtmpdump] (c) 2010 Andrej Stepanchuk, Howard Chu, The Flvstreamer Team; license: GPL
[rtmpdump] DEBUG: Parsing...
[rtmpdump] DEBUG: Parsed protocol: 0
...

So, I think there is probably a trivial fix here, but I just don't understand how this is supposed to work, and why YoutubeIE is involved in jwplayer, etc.

Here's the HTML from the site at this time:

<!--COMMENT THIS DIV PLACEHOLDER BELOW AND REMOVE COMMENT FROM EMBEDDED JWPLAYER CODE BEFORE SEPT 8TH LIVE FEED-->
<!--<div style="width:500px; height:275px; background-color:black; text-align:center; vertical-align:middle; font-size:26px; font-weight:bold; color:white; font-family:Verdana, Geneva, sans-serif"><div style="padding-top:90px">Next Live Webcast<br>Nov. 7th at 9:00am</div></div>
-->

<div id='my-video'></div>
<script type='text/javascript'>
    jwplayer('my-video').setup({
        file: 'rtmp://192.138.214.154/live/sjclive',
        fallback: 'true',
        width: '95%',
          aspectratio: '16:9',
          primary: 'flash',
          mediaid:'XEgvuql4'
    });
</script>
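
For anyone reproducing this without the live page, here is a minimal sketch of pulling the file URL out of a jwplayer(...).setup({...}) call like the one above. The regex and the find_jwplayer_file() name are my own assumptions for illustration. This is not how generic.py does it, and a regex this simple will miss playlists, sources arrays, and JSON-quoted keys.

```python
import re

# Assumed pattern: the first `file: '<url>'` entry inside a jwplayer setup call.
JWPLAYER_FILE_RE = re.compile(
    r'''jwplayer\([^)]*\)\.setup\(\s*\{.*?\bfile\s*:\s*(["'])(?P<url>.+?)\1''',
    re.DOTALL,
)


def find_jwplayer_file(html):
    """Return the first jwplayer `file` URL found in html, or None."""
    match = JWPLAYER_FILE_RE.search(html)
    return match.group('url') if match else None


SAMPLE = """
<div id='my-video'></div>
<script type='text/javascript'>
    jwplayer('my-video').setup({
        file: 'rtmp://192.138.214.154/live/sjclive',
        primary: 'flash'
    });
</script>
"""

if __name__ == '__main__':
    print(find_jwplayer_file(SAMPLE))
```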

Thanks!


p.s.: It would be great to be able to get the kind of diagnostic information I got from my patch through a command-line debug or verbose flag, so that it isn't necessary to use the Python debugger or modify the source code to see what is going on in this kind of case. I'm not sure what kind of diagnostic patches you're willing to accept, though?
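
To illustrate what I mean, here is a tiny standalone sketch of a debug helper that only prints when a verbose option is set. The SketchExtractor class, its debug() method, and the params dictionary are hypothetical names invented for this example. I'm not claiming youtube-dl has or should have exactly this interface.

```python
class SketchExtractor:
    """Hypothetical extractor base with a verbose-gated debug helper."""

    def __init__(self, params=None, out=print):
        self.params = params or {}   # assumed options dict, e.g. {'verbose': True}
        self._out = out              # output function, injectable for testing

    def debug(self, message):
        # Emit the message only when the (assumed) verbose option is enabled.
        if self.params.get('verbose'):
            self._out('[debug] %s' % message)


if __name__ == '__main__':
    ie = SketchExtractor({'verbose': True})
    ie.debug('Checking vurl %s' % 'rtmp://192.138.214.154/live/sjclive')
```

With something like this, the diagnostic line in my patch could stay in the code permanently and stay silent unless -v is given.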

@dstftw dstftw closed this in b7a8c1b Feb 6, 2017
Contributor Author

@johnhawkinson johnhawkinson commented Feb 6, 2017

Great, that fixes it. What about the postscript (p.s.) regarding diagnostics?

dstftw added a commit that referenced this issue Feb 6, 2017
Collaborator

@yan12125 yan12125 commented Feb 6, 2017

You can post your (new) ideas to #10894. Actually the logging facility in youtube-dl is not pythonic at all ;-) It's more C-like.

Collaborator

@dstftw dstftw commented Feb 6, 2017

There are lots of such places. I don't see much point in putting debug output everywhere.

Contributor Author

@johnhawkinson johnhawkinson commented Feb 6, 2017

Thanks for the superfast response, by the way!

Ha ha, @yan12125, you're funny. But perhaps I should go ahead with that. Although it sounds like there is not a single mind about all this. (Also, it'd be nice if you guys could tell me what to do on IQM2...)

@dstftw: I'm happy to have it anywhere. While doing it comprehensively may be hard, doing it little by little can still improve things incrementally. Every little bit helps.

Also dumb question on your b7a8c1b: is that really the appropriate fix? It doesn't seem right that check_video() should have a list of common format infoextractors and just run them. Surely there are more possible extractors in this case than just YoutubeIE and RtmpIE? It feels like there's some abstraction missing there, but maybe I'm just wrong? Clearly I don't understand it!
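
To show the kind of abstraction I have in mind, here is a hedged sketch where the check iterates over a list of extractor classes instead of naming each one inline. The Sketch* classes, their simplified _VALID_URL patterns, and the DIRECT_EXTRACTORS tuple are all invented for illustration and do not reflect the real YoutubeIE or RtmpIE definitions.

```python
import re


class SketchIE:
    """Hypothetical minimal extractor: suitable() matches a URL pattern."""
    _VALID_URL = None

    @classmethod
    def suitable(cls, url):
        return re.match(cls._VALID_URL, url) is not None


class SketchYoutubeIE(SketchIE):
    _VALID_URL = r'https?://(?:www\.)?youtube\.com/watch\?v='  # simplified


class SketchRtmpIE(SketchIE):
    _VALID_URL = r'(?i)rtmp[est]?://.+'  # simplified


# Assumed registry of extractors whose URLs should always be accepted.
DIRECT_EXTRACTORS = (SketchYoutubeIE, SketchRtmpIE)


def claimed_by_known_extractor(vurl, extractors=DIRECT_EXTRACTORS):
    """Return True if any registered extractor says it can handle vurl."""
    return any(ie.suitable(vurl) for ie in extractors)


if __name__ == '__main__':
    print(claimed_by_known_extractor('rtmp://192.138.214.154/live/sjclive'))
```

Adding another protocol would then mean appending one class to the tuple rather than adding another if-branch to check_video(), though I may well be missing a reason it is not done that way.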
