C-SPAN audio-only programs fail #14995

Closed
johnhawkinson opened this issue Dec 15, 2017 · 8 comments

@johnhawkinson
Contributor

@johnhawkinson johnhawkinson commented Dec 15, 2017

  • I've verified and I assure that I'm running youtube-dl 2017.12.14

I don't know why there are audio-only hearings recorded on C-SPAN (there seems to be video of this hearing available elsewhere, at least in snippet form...), but youtube-dl doesn't seem to work:

pb3:Downloads jhawk$ youtube-dl -v https://www.c-span.org/video/?438311-1/judiciary
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.c-span.org/video/?438311-1/judiciary']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.12.14
[debug] Python version 2.7.10 - Darwin-14.5.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg git-2017-02-28-7f62368, ffprobe git-2017-02-28-7f62368, rtmpdump 2.4
[debug] Proxy map: {}
[CSpan] 438311: Downloading webpage
ERROR: unable to find video id and type; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 784, in extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 437, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/cspan.py", line 115, in _real_extract
    raise ExtractorError('unable to find video id and type')
ExtractorError: unable to find video id and type; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

There is a jwsetup block in the page source:

                                var jwsetup = {
                                        file: 'https://media.c-spanvideo.org/trimmed/program/049/492999/program.492999.AAC-STD.aac?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9tZWRpYS5jLXNwYW52aWRlby5vcmcvdHJpbW1lZC9wcm9ncmFtLzA0OS80OTI5OTkvcHJvZ3JhbS40OTI5OTkuQUFDLVNURC5hYWMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MTMzMjUwNDV9LCJJUEFkZHJlc3MiOnsiQVdTOlNvdXJjZUlwIjoiNzEuMjMyLjE5Ljc0In19fV19&Signature=S1TVHl2jeTxiir05RZRpnGDLaKq9yN6ih-oqr2FbxWzpFop9~oNJvJnhD9SUIZTObjYkXugKAdpr8-kldWIy-vE9kq5FKHAr~2ZcMO-T2~Nn3qN6-mW4PZ5tq0vhrD3LHdUEX4c9qk84mIpwhPaDF4AUgdV1QyULazTRHrDDxNQ_&Key-Pair-Id=APKAIHKVWBEAXX562G7Q',

and that URL downloads just fine.

@johnhawkinson
Contributor Author

@johnhawkinson johnhawkinson commented Dec 17, 2017

Bah humbug. By the time I got to looking at this, the web page has changed and now has no audio link either... And the https://media.c-spanvideo.org/trimmed/program/ link fails now.

I think I don't understand how c-span manages their content at a fundamental level...

@remitamine
Collaborator

@remitamine remitamine commented Dec 17, 2017

> And the https://media.c-spanvideo.org/trimmed/program/ link fails now.

Because the URL has expired.
From the Amazon CloudFront policy embedded in the audio URL:

{
    "Statement": [{
        "Resource": "https://media.c-spanvideo.org/trimmed/program/049/492999/program.492999.AAC-STD.aac",
        "Condition": {
            "DateLessThan": {
                "AWS:EpochTime": 1513325045
            },
            "IPAddress": {
                "AWS:SourceIp": "CLIENT_IP"
            }
        }
    }]
}
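
A policy like the one above can be recovered from the Policy query parameter of the signed URL. The following is a minimal sketch, assuming Python 3 and assuming the parameter uses CloudFront's URL-safe base64 substitutions (+ becomes -, = becomes _, / becomes ~). The function names decode_cloudfront_policy, policy_expiry and is_expired are hypothetical helpers for illustration and are not part of youtube-dl.

```python
import base64
import json
import time
from urllib.parse import parse_qs, urlparse


def decode_cloudfront_policy(url):
    """Return the decoded policy dict from a signed URL, or None if absent."""
    query = parse_qs(urlparse(url).query)
    encoded = query.get('Policy', [None])[0]
    if encoded is None:
        return None
    # Assumption: undo CloudFront's substitutions (+ -> -, = -> _, / -> ~).
    standard = encoded.replace('-', '+').replace('_', '=').replace('~', '/')
    # Re-pad in case the padding was stripped.
    standard += '=' * (-len(standard) % 4)
    return json.loads(base64.b64decode(standard).decode('utf-8'))


def policy_expiry(policy):
    """Return the DateLessThan epoch time of the first policy statement."""
    return policy['Statement'][0]['Condition']['DateLessThan']['AWS:EpochTime']


def is_expired(policy, now=None):
    """True once the current time has reached the DateLessThan limit."""
    if now is None:
        now = time.time()
    return now >= policy_expiry(policy)
```

For the policy quoted above, AWS:EpochTime 1513325045 corresponds to 2017-12-15 08:04:05 UTC, which is consistent with the link no longer working on Dec 17. The IPAddress condition is not checked by this sketch.
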
@johnhawkinson
Contributor Author

@johnhawkinson johnhawkinson commented Dec 17, 2017

I need to find another example, but looking at my dump file (from --write-pages) from last week, the C-SPAN web page had:

<script type='text/javascript'>
  var jwsetup = {
  file: 'https://media.c-spanvideo.org/trimmed/program/049/492999/program.492999.AAC-STD.aac?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9tZWRpYS5jLXNwYW52aWRlby5vcmcvdHJpbW1lZC9wcm9ncmFtLzA0OS80OTI5OTkvcHJvZ3JhbS40OTI5OTkuQUFDLVNURC5hYWMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MTMzMjUwNDV9LCJJUEFkZHJlc3MiOnsiQVdTOlNvdXJjZUlwIjoiNzEuMjMyLjE5Ljc0In19fV19&Signature=S1TVHl2jeTxiir05RZRpnGDLaKq9yN6ih-oqr2FbxWzpFop9~oNJvJnhD9SUIZTObjYkXugKAdpr8-kldWIy-vE9kq5FKHAr~2ZcMO-T2~Nn3qN6-mW4PZ5tq0vhrD3LHdUEX4c9qk84mIpwhPaDF4AUgdV1QyULazTRHrDDxNQ_&Key-Pair-Id=APKAIHKVWBEAXX562G7Q',
  tracks: [{
    file: '//www.c-span.org/fragments/programTimeline.php?progid=492999',
    kind: 'thumbnails'
  },{
    file: '//www.c-span.org/fragments/convertCap.php?progid=492999',
    kind: 'captions',
    label: 'CC'
  }],
  image: 'https://images.c-span.org/defaults/capitol.jpg/Thumbs/height.576.no_border.width.1024.jpg'
  };
  
  $(document).ready(function() {
  $.cookie('cspanvl_testcookie', 'true', {path: '/'});
  var cookieEnabled = ($.cookie('cspanvl_testcookie') !== undefined) ? true : false;
  if (cookieEnabled == false) {
  var cookieMsg = "<div class='unsupported'>"+
    "<div class='message'>Your browser has cookies disabled. You must have all cookies enabled for video playback to work.</div>";
    $('div#video-embed').html(cookieMsg);
    return;
    }
    });
</script>

And that fails to be matched by _find_jwplayer_data():
https://github.com/rg3/youtube-dl/blob/549bb416f5a9d15c03749a98abd582d0e40418ac/youtube_dl/extractor/common.py#L2299-L2302

because that is expecting to see something like:

<script>
  jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB",
  ...

(I had to chase back quite a ways to find that test case, added in a4a554a. I wish there were a better way to find out what the regexp is looking for, or at least a practice of putting an example of the regexp's target in a comment above it.)
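
To make the mismatch concrete, here is a small sketch contrasting the two page shapes. The pattern INLINE_SETUP_RE is a simplified stand-in written for this illustration; it is not the actual regex from common.py, and the example.com URLs and sample pages are made up.

```python
import re

# Simplified stand-in (NOT the real youtube-dl regex): it only recognises an
# inline jwplayer("<id>").setup({...}) call.
INLINE_SETUP_RE = re.compile(
    r'''(?s)jwplayer\((?P<quote>['"])[^'"]+(?P=quote)\)\.setup\s*\((?P<options>\{.*?\})\s*\)''')

# Shape the generic support expects: setup() called inline with its options.
inline_page = (
    '<script>jwplayer("mediaplayer").setup('
    '{"file": "https://example.com/a.mp4"});</script>')

# Shape C-SPAN serves: options stored in a variable, setup() called elsewhere.
cspan_style_page = (
    "<script>var jwsetup = {file: 'https://example.com/a.aac'};</script>")


def has_inline_setup(webpage):
    """True if the page contains an inline jwplayer(...).setup({...}) call."""
    return INLINE_SETUP_RE.search(webpage) is not None
```

With these inputs, has_inline_setup(inline_page) is true and has_inline_setup(cspan_style_page) is false, since the second page never mentions jwplayer( at all.
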

In the C-SPAN case, it doesn't directly call jwplayer() from a script tag in the HTML. Instead, it has

<script type='text/javascript' src='//www.c-span.org/assets/javascript/libs/jwplayer/8.0.4/jwplayer.js'></script>
<script type='text/javascript' src='//static.c-span.org/assets/javascript/script-video-ck.1513282520.js'></script>
<script type='text/javascript' src='//imasdk.googleapis.com/js/sdkloader/ima3.js'></script>

and script-video-ck, which is minified, eventually calls:

    var d = jwplayer("video-embed");
    d.setup(jwsetup);

I imagine this is too complicated for the common.py JWPlayer support to handle, so cspan.py needs to search out the var jwsetup and call _parse_jwplayer_data() on its own, rather than relying on _find_jwplayer_data()?
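
One way that search could look, as a standalone sketch rather than actual extractor code: a regex that pulls the file URL out of the var jwsetup assignment. The name find_jwsetup_file is hypothetical, and the pattern assumes file is the first key of the object, as in the dump above.

```python
import re

# Assumption: "file" is the first key inside "var jwsetup = {", as in the dump.
JWSETUP_FILE_RE = re.compile(
    r'''var\s+jwsetup\s*=\s*\{\s*file\s*:\s*(?P<q>['"])(?P<url>.+?)(?P=q)''')


def find_jwsetup_file(webpage):
    """Return the media URL assigned to jwsetup.file, or None if not found."""
    m = JWSETUP_FILE_RE.search(webpage)
    return m.group('url') if m else None
```

Inside the extractor, the same idea would presumably go through self._search_regex() with default=None, and the matched object could then be handed to the JWPlayer parsing helper instead of being read field by field.
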


@remitamine:

> because the URL has expired.
> from the Amazon CloudFront Policy in the audio url:

Yeah, I get that. What I'm puzzled by is why the audio link is gone from the c-span page, not why the audio URL itself is invalid.

@johnhawkinson
Contributor Author

@johnhawkinson johnhawkinson commented Dec 17, 2017

> I need to find another example, but looking at my dump file (from --write-pages) from last week, the C-SPAN web page had:

Here is another example of an audio-only C-SPAN recording which has not disappeared: https://www.c-span.org/video/?438495-1/judiciary

Same jwsetup js pattern and script-video-ck usage.

@remitamine
Collaborator

@remitamine remitamine commented Dec 17, 2017

This works for the URL you posted; however, it would be better to test it on more URLs.

diff --git a/youtube_dl/extractor/cspan.py b/youtube_dl/extractor/cspan.py
index 171820e27..1a6426c4f 100644
--- a/youtube_dl/extractor/cspan.py
+++ b/youtube_dl/extractor/cspan.py
@@ -111,6 +111,11 @@ class CSpanIE(InfoExtractor):
                     title = self._og_search_title(webpage)
                     surl = smuggle_url(senate_isvp_url, {'force_title': title})
                     return self.url_result(surl, 'SenateISVP', video_id, title)
+                video_id = self._search_regex(
+                    r'jwsetup\.clipprog\s*=\s*(\d+);',
+                    webpage, 'jwsetup program id', default=None)
+                if video_id:
+                    video_type = 'program'
         if video_type is None or video_id is None:
             raise ExtractorError('unable to find video id and type')
 
@@ -138,7 +143,7 @@ class CSpanIE(InfoExtractor):
         entries = []
         for partnum, f in enumerate(files):
             formats = []
-            for quality in f['qualities']:
+            for quality in f.get('qualities', []):
                 formats.append({
                     'format_id': '%s-%sp' % (get_text_attr(quality, 'bitrate'), get_text_attr(quality, 'height')),
                     'url': unescapeHTML(get_text_attr(quality, 'file')),
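
The two changes in the patch can be tried in isolation. This is a sketch under stated assumptions: the regex is copied from the diff, but the sample input "jwsetup.clipprog = 492999;" is only inferred from that regex (the thread does not show such a line in the page), and find_program_id and list_qualities are hypothetical wrappers, not youtube-dl functions.

```python
import re


def find_program_id(webpage):
    """Return the digits assigned to jwsetup.clipprog, or None if absent."""
    # Same pattern as in the patch above.
    m = re.search(r'jwsetup\.clipprog\s*=\s*(\d+);', webpage)
    return m.group(1) if m else None


def list_qualities(file_entry):
    """Return the entry's qualities, or an empty list when the key is missing."""
    # Mirrors f.get('qualities', []) from the patch: a missing key yields no
    # formats instead of raising KeyError.
    return file_entry.get('qualities', [])
```

The second change suggests that file entries for audio-only programs may have no 'qualities' key, although the thread does not state this explicitly.
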
@johnhawkinson
Contributor Author

@johnhawkinson johnhawkinson commented Dec 17, 2017

Thanks. I wouldn't have done it that way, but your way gets the C-SPAN metadata so I can see why you chose it.

It also ought to have worked, based on inspection of my saved data, on the original URL.

> however it would be better to test on more urls.

Here are some more test cases:

https://www.c-span.org/video/?437336-1/judiciary-antitrust-competition-policy-consumer-rights
https://www.c-span.org/video/?438047-1/judiciary

I got them from
https://www.c-span.org/search/?sdate=&edate=&audio=1&searchtype=Videos&sort=Most+Recent+Airing&text=0&sponsorid%5B%5D=61188

and picking out the ones marked "AUDIO STREAM ONLY" (except for 438311, the original, which is still listed there but broken).

So looks good, except for a youtube-dl test?

@remitamine
Collaborator

@remitamine remitamine commented Dec 17, 2017

> So looks good, except for a youtube-dl test?

Added a test with only_matching, as most of the pages show this error (so a normal test would most likely expire soon):

Video not available at this time
@johnhawkinson
Contributor Author

@johnhawkinson johnhawkinson commented Dec 17, 2017

Huh. The older pages do all seem to show that.

> I think I don't understand how c-span manages their content at a fundamental level...

Yeah that.

p.s.: seems like a test with skip would make more sense than only_matching but whatever.
