C-SPAN audio-only programs fail #14995
Comments
|
Bah humbug. By the time I got around to looking at this, the web page had changed and now has no audio link either... And I think I don't understand how C-SPAN manages their content at a fundamental level... |
because the URL has expired.
|
|
I need to find another example, but looking at my dump file (from
<script type='text/javascript'>
var jwsetup = {
file: 'https://media.c-spanvideo.org/trimmed/program/049/492999/program.492999.AAC-STD.aac?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9tZWRpYS5jLXNwYW52aWRlby5vcmcvdHJpbW1lZC9wcm9ncmFtLzA0OS80OTI5OTkvcHJvZ3JhbS40OTI5OTkuQUFDLVNURC5hYWMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MTMzMjUwNDV9LCJJUEFkZHJlc3MiOnsiQVdTOlNvdXJjZUlwIjoiNzEuMjMyLjE5Ljc0In19fV19&Signature=S1TVHl2jeTxiir05RZRpnGDLaKq9yN6ih-oqr2FbxWzpFop9~oNJvJnhD9SUIZTObjYkXugKAdpr8-kldWIy-vE9kq5FKHAr~2ZcMO-T2~Nn3qN6-mW4PZ5tq0vhrD3LHdUEX4c9qk84mIpwhPaDF4AUgdV1QyULazTRHrDDxNQ_&Key-Pair-Id=APKAIHKVWBEAXX562G7Q',
tracks: [{
file: '//www.c-span.org/fragments/programTimeline.php?progid=492999',
kind: 'thumbnails'
},{
file: '//www.c-span.org/fragments/convertCap.php?progid=492999',
kind: 'captions',
label: 'CC'
}],
image: 'https://images.c-span.org/defaults/capitol.jpg/Thumbs/height.576.no_border.width.1024.jpg'
};
$(document).ready(function() {
$.cookie('cspanvl_testcookie', 'true', {path: '/'});
var cookieEnabled = ($.cookie('cspanvl_testcookie') !== undefined) ? true : false;
if (cookieEnabled == false) {
var cookieMsg = "<div class='unsupported'>"+
"<div class='message'>Your browser has cookies disabled. You must have all cookies enabled for video playback to work.</div>";
$('div#video-embed').html(cookieMsg);
return;
}
});
</script>
And that fails to be matched by because that is expecting to see something like:
<script>
jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB",
...
(I had to chase back quite a ways to find that test case, added in a4a554a; I wish there were a better way to find out what the regexp was looking for, or at least a practice of providing an example of the regexp target in a comment above.)
In the C-SPAN case, it doesn't directly call
<script type='text/javascript' src='//www.c-span.org/assets/javascript/libs/jwplayer/8.0.4/jwplayer.js'></script>
<script type='text/javascript' src='//static.c-span.org/assets/javascript/script-video-ck.1513282520.js'></script>
<script type='text/javascript' src='//imasdk.googleapis.com/js/sdkloader/ima3.js'></script>
and
var d = jwplayer("video-embed");
d.setup(jwsetup);
So I imagine this is too complicated for the
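To illustrate the mismatch described above, here is a minimal sketch in Python. Both patterns are simplified, hypothetical stand-ins written for this example, not youtube-dl's actual regexps, and the sample page strings (including the `example.invalid` URL) are made up. The first pattern only matches an inline `jwplayer("...").setup({...})` call; the second looks for the `var jwsetup = {...};` form that the C-SPAN page uses.

```python
import re

# Hypothetical pattern: options passed inline to jwplayer("...").setup({...}).
INLINE_SETUP_RE = re.compile(
    r'''jwplayer\((["'])[^"']+\1\)\.setup\s*\((?P<options>\{.*?\})\)''',
    re.DOTALL)

# Hypothetical pattern: options stored first in a variable named jwsetup.
VARIABLE_SETUP_RE = re.compile(
    r'var\s+jwsetup\s*=\s*(?P<options>\{.*?\});',
    re.DOTALL)

# Shape of the page the existing test case expects.
inline_page = 'jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB"});'

# Shape of the C-SPAN page: setup object and setup() call are separate.
cspan_page = """var jwsetup = {
    file: 'https://example.invalid/audio.aac'
};
var d = jwplayer("video-embed");
d.setup(jwsetup);"""

inline_match = INLINE_SETUP_RE.search(inline_page)
cspan_inline_match = INLINE_SETUP_RE.search(cspan_page)
cspan_variable_match = VARIABLE_SETUP_RE.search(cspan_page)

print(inline_match.group('options'))
print(cspan_inline_match)
print(cspan_variable_match.group('options'))
```

The inline pattern finds nothing on the C-SPAN-shaped page because `jwplayer("video-embed")` is followed by `;` rather than `.setup(`, which is the failure mode described in this comment.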
Yeah, I get that. What I'm puzzled by is why the audio link is gone from the c-span page, not why the audio URL itself is invalid. |
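On the expiry point: the `Policy`, `Signature` and `Key-Pair-Id` parameters in the media URL quoted earlier look like an Amazon CloudFront signed URL with a custom policy. Assuming that is what it is, the policy is JSON encoded as base64 with `+`, `=` and `/` replaced by `-`, `_` and `~`, so the expiry time and IP restriction can be read back out. A minimal sketch follows; the helper names and the sample policy values are made up for the demonstration.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse


def encode_policy(policy):
    """Encode a policy dict the way CloudFront signed URLs carry it."""
    raw = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    return raw.translate(str.maketrans("+=/", "-_~"))


def decode_policy(signed_url):
    """Return the policy dict embedded in a signed URL's Policy parameter."""
    policy = parse_qs(urlparse(signed_url).query)["Policy"][0]
    # Undo the URL-safe character substitutions, then restore padding.
    standard = policy.translate(str.maketrans("-_~", "+=/"))
    standard += "=" * (-len(standard) % 4)
    return json.loads(base64.b64decode(standard))


# Made-up policy with the same general shape as a custom policy.
sample_policy = {
    "Statement": [{
        "Resource": "https://example.invalid/audio.aac",
        "Condition": {
            "DateLessThan": {"AWS:EpochTime": 1700000000},
            "IpAddress": {"AWS:SourceIp": "203.0.113.7"},
        },
    }]
}
sample_url = ("https://example.invalid/audio.aac?Policy="
              + encode_policy(sample_policy)
              + "&Signature=placeholder&Key-Pair-Id=placeholder")

decoded = decode_policy(sample_url)
condition = decoded["Statement"][0]["Condition"]
print(condition["DateLessThan"]["AWS:EpochTime"])
print(condition["IpAddress"]["AWS:SourceIp"])
```

If the policy pins both an expiry time and a source IP, a URL saved in a dump file would stop working once the time has passed or when it is fetched from another address.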
Here is another example of an audio-only C-SPAN recording which has not disappeared: https://www.c-span.org/video/?438495-1/judiciary Same |
|
This will work for the URL that you posted; however, it would be better to test it on more URLs.
diff --git a/youtube_dl/extractor/cspan.py b/youtube_dl/extractor/cspan.py
index 171820e27..1a6426c4f 100644
--- a/youtube_dl/extractor/cspan.py
+++ b/youtube_dl/extractor/cspan.py
@@ -111,6 +111,11 @@ class CSpanIE(InfoExtractor):
                     title = self._og_search_title(webpage)
                     surl = smuggle_url(senate_isvp_url, {'force_title': title})
                     return self.url_result(surl, 'SenateISVP', video_id, title)
+                video_id = self._search_regex(
+                    r'jwsetup\.clipprog\s*=\s*(\d+);',
+                    webpage, 'jwsetup program id', default=None)
+                if video_id:
+                    video_type = 'program'
         if video_type is None or video_id is None:
             raise ExtractorError('unable to find video id and type')
@@ -138,7 +143,7 @@ class CSpanIE(InfoExtractor):
         entries = []
         for partnum, f in enumerate(files):
             formats = []
-            for quality in f['qualities']:
+            for quality in f.get('qualities', []):
                 formats.append({
                     'format_id': '%s-%sp' % (get_text_attr(quality, 'bitrate'), get_text_attr(quality, 'height')),
                     'url': unescapeHTML(get_text_attr(quality, 'file')),
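The two changes in the patch can be tried in isolation. The sketch below uses the same regexp as the diff, but the sample page line (`jwsetup.clipprog = 492999;`) and the shape of the `audio_only_file` dict are assumptions made for illustration; the thread does not show the real values.

```python
import re


def find_program_id(webpage):
    """Return the program id from a jwsetup.clipprog assignment, or None."""
    m = re.search(r'jwsetup\.clipprog\s*=\s*(\d+);', webpage)
    return m.group(1) if m else None


# Assumed example of what an audio-only page might contain.
sample_page = "var jwsetup = {};\njwsetup.clipprog = 492999;\n"
program_id = find_program_id(sample_page)
missing_id = find_program_id("no player setup here")

# Assumed shape of an audio-only file entry with no 'qualities' key.
audio_only_file = {'path': 'https://example.invalid/audio.aac'}

# f['qualities'] would raise KeyError here; .get() yields an empty list,
# so the loop over qualities is skipped instead of crashing.
qualities = audio_only_file.get('qualities', [])
formats = [q for q in qualities]

print(program_id, missing_id, formats)
```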
|
Thanks. I wouldn't have done it that way, but your way gets the C-SPAN metadata so I can see why you chose it. It also ought to have worked, based on inspection of my saved data, on the original URL.
Here are some more test cases: https://www.c-span.org/video/?437336-1/judiciary-antitrust-competition-policy-consumer-rights I got them from and picking out the ones marked "AUDIO STREAM ONLY" (except for 438311, the original, which is still listed there but broken). So looks good, except for a youtube-dl test? |
added a test with
|
|
Huh. The older pages do all seem to show that.
Yeah that. p.s.: seems like a test with |
I don't know why there are audio-only hearings recorded on C-SPAN (there seems to be video of this hearing available elsewhere, at least in snippet form...), but youtube-dl doesn't seem to work:
There is a
and that URL downloads just fine.