CBS.com Late Show with Stephen Colbert won't download anymore #8893

Closed
hstracker90 opened this issue Mar 18, 2016 · 11 comments

Comments


@hstracker90 hstracker90 commented Mar 18, 2016

Hello! I couldn't find an open issue for CBS or "Late Show", so I am asking you to look into this. As always, I am really grateful that this tool exists and works so well!

It looks like CBS changed something on their website; I can't download the Late Show anymore.

C:\Users\hstracker90> youtube-dl http://www.cbs.com/shows/the-late-show-with-stephen-colbert/video/U6B47sA7bcL1OliOF9n_i6_llXVZLM6r/the-late-show-3-17-2016-william-h-macy-melissa-rauch-isaac-mizrahi-/ --verbose
[debug] System config: []
[debug] User config: [u'-o', u'C:/Users/hstracker90/Downloads/%(title)s.%(ext)s']
[debug] Command-line args: [u'http://www.cbs.com/shows/the-late-show-with-stephen-colbert/video/U6B47sA7bcL1OliOF9n_i6_llXVZLM6r/the-late-show-3-17-2016-william-h-macy-melissa-rauch-isaac-mizrahi-/', u'--verbose']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2016.03.14
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg 2.8.git, ffprobe 2.8.git, rtmpdump 2.4
[debug] Proxy map: {}
[CBS] the-late-show-3-17-2016-william-h-macy-melissa-rauch-isaac-mizrahi-: Downloading webpage
[ThePlatform] of9zYH1kUpJS: Downloading SMIL data
Traceback (most recent call last):
File "__main__.py", line 19, in <module>
File "youtube_dl\__init__.pyo", line 412, in main
File "youtube_dl\__init__.pyo", line 402, in _real_main
File "youtube_dl\YoutubeDL.pyo", line 1719, in download
File "youtube_dl\YoutubeDL.pyo", line 679, in extract_info
File "youtube_dl\YoutubeDL.pyo", line 736, in process_ie_result
File "youtube_dl\YoutubeDL.pyo", line 668, in extract_info
File "youtube_dl\extractor\common.pyo", line 320, in extract
File "youtube_dl\extractor\theplatform.pyo", line 234, in _real_extract
File "youtube_dl\extractor\theplatform.pyo", line 34, in _extract_theplatform_smil
File "youtube_dl\extractor\common.pyo", line 501, in _download_xml
File "youtube_dl\compat.pyo", line 248, in compat_etree_fromstring
File "youtube_dl\compat.pyo", line 237, in _XML
File "xml\etree\ElementTree.pyo", line 1642, in feed
File "xml\etree\ElementTree.pyo", line 1506, in _raiseerror
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0

Collaborator

@remitamine remitamine commented Mar 18, 2016

It's already fixed (#8892) and will work in the next version.

Collaborator

@remitamine remitamine commented Mar 18, 2016

The problem in this issue has been fixed, but another error occurs: some videos are served using Brightcove Once, but the HTTP formats don't work.
I just pushed another change to check Brightcove Once HTTP formats.

Collaborator

@remitamine remitamine commented Mar 18, 2016

These files are served mainly for mobile devices (I see the same ads in the Android app).
It's possible to distinguish between ads and content (unencrypted ads with encrypted content).
However, it's better to extract the old manifests.
I think it's caused by a change in how the site responds to requests from mobile devices: https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/cbs.py#L56

Collaborator

@remitamine remitamine commented Mar 18, 2016

I have a fix for this, but it involves making a lot of requests to extract all formats (rtmp, m3u8 and once formats).


@syncretic syncretic commented Mar 18, 2016

The problem is that CBS only serves the 720p stream to Android devices. I took this m3u8 URL:

http://once.unicornmedia.com/now/master/playlist/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/77a785f3-5188-4806-b788-0893a61634ed/93677179-2d99-4ef4-9e17-fe70d49abfbf/content.m3u8

and I found the 720p stream:

http://api016-phx.unicornmedia.com/now/media/playlist/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/77a785f3-5188-4806-b788-0893a61634ed/468fb310-a585-11e4-bfdb-005056837bc7/93677179-2d99-4ef4-9e17-fe70d49abfbf/0/0/2542/content.m3u8?visitguid=18c75a73-5452-4995-b5cf-0c0755432915&segmentlength=10&adsegmentlength=10&protocolversion=3

As you said, the commercials are included. However, I searched (Ctrl+F) for #EXT-X-KEY:METHOD=NONE and deleted everything between that and #EXT-X-DISCONTINUITY, then saved the m3u8 file to my local HDD and ripped it with ffmpeg. Working great.


@syncretic syncretic commented Mar 18, 2016

I spoke too soon: I was having a few problems, so I cleaned it up a bit more.

This is the "clean" m3u8 file I ended up with:

http://pastebin.com/y5btexzB

This time it ripped fine with ffmpeg :)

Collaborator

@remitamine remitamine commented Mar 18, 2016

I think there won't be a benefit in trying to extract other streams (for some reason the old m3u8 manifest doesn't work now; I get HTTP Error 403: Forbidden).
m3u8, rtmp and once formats (works only if I comment out _sort_formats in _extract_smil_formats and _extract_theplatform_smil):

python __main__.py -F http://www.cbs.com/shows/the-late-show-with-stephen-colbert/video/U6B47sA7bcL1OliOF9n_i6_llXVZLM6r/the-late-show-3-17-2016-william-h-macy-melissa-rauch-isaac-mizrahi-/
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading XML
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading yWyT1WdMNeVg SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading vvXeBgoToWTS SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading of9zYH1kUpJS SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Checking video URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Downloading m3u8 information
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-60 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-60 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-264 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-264 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-512 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-512 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-764 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-764 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-1200 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-1200 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-2000 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-2000 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-4400 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-4400 video format URL is invalid, skipping
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading mIP7n8e24NtO SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading aqDAQcRkTszq SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading ajOZpuv9Ukas SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading aSLOFXq1adgR SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading ZpBET8wk7n_X SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading ZMtQokCzzxz8 SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading M4y13KHAvRlq SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 403: Forbidden
[info] Available formats for U6B47sA7bcL1OliOF9n_i6_llXVZLM6r:
format code  extension  resolution note
hls-meta     mp4        multiple   Quality selection URL 
rtmp-1-0     flv        384x216    
rtmp-1-1     flv        640x360    
rtmp-1-2     flv        640x360    
rtmp-1-3     flv        848x480    
hls-60       mp4        120x68       60k , mp4a.40.5
hls-264      mp4        256x144     264k , mp4a.40.5, avc1.42001e
hls-512      mp4        384x216     512k , mp4a.40.5, avc1.42001e
hls-764      mp4        480x270     764k , mp4a.40.2, avc1.42001e
hls-1200     mp4        640x360    1200k , mp4a.40.2, avc1.42001f
hls-2000     mp4        960x540    2000k , mp4a.40.2, avc1.4d001f
hls-4400     mp4        1280x720   4400k , mp4a.40.2, avc1.640028 (best)

f4m and once formats:

python __main__.py -F http://www.cbs.com/shows/the-late-show-with-stephen-colbert/video/U6B47sA7bcL1OliOF9n_i6_llXVZLM6r/the-late-show-3-17-2016-william-h-macy-melissa-rauch-isaac-mizrahi-/
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading XML
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading yWyT1WdMNeVg SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Checking video URL
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: video URL is invalid, skipping
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading vvXeBgoToWTS SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading f4m manifest
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading of9zYH1kUpJS SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Checking video URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Downloading m3u8 information
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-60 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-60 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-264 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-264 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-512 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-512 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-764 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-764 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-1200 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-1200 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-2000 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-2000 video format URL is invalid, skipping
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: Checking http-4400 video format URL
[CBS] 93677179-2d99-4ef4-9e17-fe70d49abfbf: http-4400 video format URL is invalid, skipping
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading mIP7n8e24NtO SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading f4m manifest
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading aqDAQcRkTszq SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Checking video URL
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: video URL is invalid, skipping
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading ajOZpuv9Ukas SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading f4m manifest
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading aSLOFXq1adgR SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Checking video URL
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: video URL is invalid, skipping
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading ZpBET8wk7n_X SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading f4m manifest
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading ZMtQokCzzxz8 SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Checking video URL
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: video URL is invalid, skipping
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Downloading M4y13KHAvRlq SMIL data
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: Checking video URL
[CBS] U6B47sA7bcL1OliOF9n_i6_llXVZLM6r: video URL is invalid, skipping
[info] Available formats for U6B47sA7bcL1OliOF9n_i6_llXVZLM6r:
format code  extension  resolution note
hls-meta     mp4        multiple   Quality selection URL 
hls-60       mp4        120x68       60k , mp4a.40.5
hls-264      mp4        256x144     264k , mp4a.40.5, avc1.42001e
hds-382      flv        unknown     382k 
hls-512      mp4        384x216     512k , mp4a.40.5, avc1.42001e
hds-518      flv        unknown     518k 
hls-764      mp4        480x270     764k , mp4a.40.2, avc1.42001e
hds-829      flv        unknown     829k 
hls-1200     mp4        640x360    1200k , mp4a.40.2, avc1.42001f
hls-2000     mp4        960x540    2000k , mp4a.40.2, avc1.4d001f
hds-2011     flv        unknown    2011k 
hls-4400     mp4        1280x720   4400k , mp4a.40.2, avc1.640028 (best)

@syncretic syncretic commented Mar 18, 2016

Would it be possible to script youtube-dl so it downloads the m3u8 url, but before ripping, clean it up just like I did above? The commercial segments stick out like a sore thumb when you know what to look for in the m3u8.

The commercials all have #EXT-X-KEY:METHOD=NONE immediately above them. So start at the line that says #EXT-X-KEY:METHOD=NONE, then delete that line and every subsequent line up to and including the next #EXT-X-DISCONTINUITY line. The next line after that is always the key file:

#EXT-X-KEY:METHOD=AES-128,URI="http://once.unicornmedia.com/key/77a785f3-5188-4806-b788-0893a61634ed/93677179-2d99-4ef4-9e17-fe70d49abfbf/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/once.key?umx=cAy9b5515NitZ2XV9RnbyA==",IV=0xba180bbbf5641b4ba29f0ac252f06b68

which can be deleted as well since you only need it once at the very beginning. Below that line it's all show segments like this:

#EXTINF:9.009,
http://api016-phx.unicornmedia.com/now/media/segment/bb0b18ba-64f5-4b1b-a29f-0ac252f06b68/77a785f3-5188-4806-b788-0893a61634ed/468fb310-a585-11e4-bfdb-005056837bc7/93677179-2d99-4ef4-9e17-fe70d49abfbf/0/0/270/content.ts?visitguid=18c75a73-5452-4995-b5cf-0c0755432915&baseguid=93677179-2d99-4ef4-9e17-fe70d49abfbf&streamDuration=3330&umx=cAy9b5515NitZ2XV9RnbyA==&startsegmentseconds=0&endsegmentseconds=10

The end time of each segment goes up by 10 seconds, so the second segment has "&endsegmentseconds=20", the third has "&endsegmentseconds=30", etc. It continues like that until you see another #EXT-X-DISCONTINUITY.

Delete that and every line until the next key file (that line also gets deleted just like before). Rinse, lather, repeat until the end. Seems like something that could be done by a script in a fraction of a second.
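
A minimal Python sketch of the manual cleanup described above, assuming the playlist marks ad blocks exactly as in this thread (an #EXT-X-KEY:METHOD=NONE line before each ad block and #EXT-X-DISCONTINUITY lines around it). The function name `strip_unencrypted_blocks` is made up for illustration and is not part of youtube-dl. One deliberate difference from the steps above: a repeated AES-128 key line is dropped only when it is identical to the last one kept, so a changed key URI or IV would survive.

```python
def strip_unencrypted_blocks(playlist_text):
    """Drop every segment that follows an #EXT-X-KEY:METHOD=NONE tag.

    Heuristic only: it assumes ads are the unencrypted parts of the playlist.
    """
    kept = []
    in_ad_block = False  # True while the active key line is METHOD=NONE
    last_key = None      # last key line written to the output
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('#EXT-X-KEY:'):
            in_ad_block = 'METHOD=NONE' in line
            # Keep a content key only when it differs from the previous one,
            # so a changed URI or IV is not silently lost.
            if not in_ad_block and line != last_key:
                kept.append(line)
                last_key = line
        elif line == '#EXT-X-DISCONTINUITY':
            # The markers around ad blocks are removed, as in the manual steps.
            continue
        elif line == '#EXT-X-ENDLIST' or not in_ad_block:
            kept.append(line)
    return '\n'.join(kept) + '\n'
```

The cleaned text can then be saved to a local .m3u8 file and passed to ffmpeg, as was done by hand above.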

Collaborator

@remitamine remitamine commented Mar 18, 2016

> The commercials all have #EXT-X-KEY:METHOD=NONE immediately above them.

Not always (they can change it; see Content Security -> AES-128 in http://docs.brightcove.com/en/once/guides/once-vod-2-0.html).

> which can be deleted as well since you only need it once at the very beginning.

It's possible for the key URI or the IV to change.

The way to detect the real content is to check that the mediaItemId of the segment is the same as the mediaItemId of the m3u8 media playlist.
Also, I don't think that changing the m3u8 manifest is something that should be done in youtube-dl.
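
The mediaItemId check could be sketched in Python roughly as follows. This is an illustration only, not youtube-dl code. It assumes, judging purely from the URLs posted in this thread, that the mediaItemId is the fourth UUID after /now/media/playlist/ or /now/media/segment/, and that ad segments carry a different ID in that position. The names `media_item_id` and `keep_content_segments` are hypothetical.

```python
import re

_UUID = r'[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}'
# Assumption: three UUIDs precede the mediaItemId in media playlist/segment URLs.
_MEDIA_URL_RE = re.compile(
    r'/now/media/(?:playlist|segment)/(?:%s/){3}(?P<media_item_id>%s)/'
    % (_UUID, _UUID))


def media_item_id(url):
    """Return the assumed mediaItemId of a playlist or segment URL, or None."""
    match = _MEDIA_URL_RE.search(url)
    return match.group('media_item_id') if match else None


def keep_content_segments(playlist_url, playlist_text):
    """Keep only segments whose mediaItemId matches the media playlist's."""
    wanted = media_item_id(playlist_url)
    if wanted is None:
        raise ValueError('no mediaItemId found in playlist URL')
    kept = []
    pending = []         # per-segment tags waiting for their segment URI
    current_key = None   # most recent #EXT-X-KEY line seen in the input
    emitted_key = None   # most recent #EXT-X-KEY line written to the output
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('#EXT-X-KEY:'):
            current_key = line
        elif line.startswith('#EXTINF:') or line == '#EXT-X-DISCONTINUITY':
            pending.append(line)
        elif line.startswith('#'):
            kept.append(line)  # playlist-level tag such as #EXTM3U
        else:
            if media_item_id(line) == wanted:
                # Re-emit the key whenever it changed, since URI or IV may differ.
                if current_key is not None and current_key != emitted_key:
                    kept.append(current_key)
                    emitted_key = current_key
                kept.extend(pending)
                kept.append(line)
            pending = []
    return '\n'.join(kept) + '\n'
```

Unlike a filter keyed on METHOD=NONE, this does not depend on whether the ads are encrypted, but it does depend on the URL layout staying as assumed.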


@syncretic syncretic commented Mar 19, 2016

Well, you guys are a lot smarter than I am. I look forward to seeing what solution you come up with :)

In the meantime I'll do it the hard way, lol...


@jimbolaya jimbolaya commented Mar 22, 2016

My workaround is to use --write-pages, then edit the *.m3u8.dump file and change "adsegmentlength=10" to "adsegmentlength=0".
Since youtube-dl won't let me specify a "file:///" URL, I put it up on a site I have access to and run youtube-dl on that without any issues.
I really wouldn't mind the ads, except they screw up the audio/video timing of the resulting video. I just skip over them in my viewer anyway.
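
The edit to the dumped playlist can be scripted. A small Python sketch, assuming the dump written by --write-pages is UTF-8 text that contains "adsegmentlength=10" in its URLs as described above; the helper names are made up for illustration, and whether the server honors the changed parameter is only what this workaround reports.

```python
import io
import re

# Matches only the ad parameter, not the plain "segmentlength" one.
_AD_SEGMENT_LENGTH_RE = re.compile(r'\badsegmentlength=\d+')


def zero_ad_segment_length(text):
    """Replace every adsegmentlength=<n> with adsegmentlength=0."""
    return _AD_SEGMENT_LENGTH_RE.sub('adsegmentlength=0', text)


def rewrite_dump(path):
    """Apply the replacement in place to a dumped playlist file."""
    with io.open(path, encoding='utf-8') as handle:
        text = handle.read()
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(zero_ad_segment_length(text))
```

The rewritten file still has to be served over HTTP before youtube-dl will accept it, as noted above.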

5 participants
@jimbolaya @syncretic @hstracker90 @remitamine and others