cbsnews.com site changed / video extraction no longer working... defaulting to generic extractor instead. #15397

Closed
wolferikg opened this issue Jan 23, 2018 · 10 comments

Comments


@wolferikg wolferikg commented Jan 23, 2018

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.01.21. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2018.01.21

bash-3.2$ youtube-dl --version
2018.01.21

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


Download of CBS Evening News falls back to generic extractor and ends up grabbing the CBSN live stream instead:

$ youtube-dl https://www.cbsnews.com/video/122-cbs-evening-news/ -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.cbsnews.com/video/122-cbs-evening-news/', u'-v']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2018.01.21
[debug] Python version 2.7.9 (CPython) - Linux-3.16.0-4-amd64-x86_64-with-debian-8.9
[debug] exe versions: ffmpeg 2.6.9, ffprobe 2.6.9, rtmpdump 2.4
[debug] Proxy map: {}
[generic] 122-cbs-evening-news: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 122-cbs-evening-news: Downloading webpage
[generic] 122-cbs-evening-news: Extracting information
[generic] 122-cbs-evening-news: Downloading m3u8 information
[download] Downloading playlist: 1/22: CBS Evening News
[generic] playlist 1/22: CBS Evening News: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on u'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/master_360.m3u8'
[download] Destination: 1_22 - CBS Evening News-122-cbs-evening-news.mp4
[debug] ffmpeg command line: ffmpeg -y -loglevel verbose -headers 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)
' -i 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/master_360.m3u8' -c copy -f mp4 '-bsf:a' aac_adtstoasc 'file:1_22 - CBS Evening News-122-cbs-evening-news.mp4.part'
ffmpeg version 2.6.9 Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 4.9.2 (Debian 4.9.2-10)
configuration: --prefix=/usr --extra-cflags='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security ' --extra-ldflags='-Wl,-z,relro' --cc='ccache cc' --enable-shared --enable-libmp3lame --enable-gpl --enable-nonfree --enable-libvorbis --enable-pthreads --enable-libfaac --enable-libxvid --enable-postproc --enable-x11grab --enable-libgsm --enable-libtheora --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libx264 --enable-libspeex --enable-nonfree --disable-stripping --enable-libvpx --enable-libschroedinger --disable-encoder=libschroedinger --enable-version3 --enable-libopenjpeg --enable-librtmp --enable-avfilter --enable-libfreetype --enable-libvo-aacenc --disable-decoder=amrnb --enable-libvo-amrwbenc --enable-libaacplus --libdir=/usr/lib/x86_64-linux-gnu --disable-vda --enable-libbluray --enable-libcdio --enable-gnutls --enable-frei0r --enable-openssl --enable-libass --enable-libopus --enable-fontconfig --enable-libpulse --disable-mips32r2 --disable-mipsdspr1 --disable-mipsdspr2 --enable-libvidstab --enable-libzvbi --enable-avresample --disable-htmlpages --disable-podpages --enable-libutvideo --enable-libfdk-aac --enable-libx265 --enable-libiec61883 --enable-vaapi --enable-libdc1394 --disable-altivec --shlibdir=/usr/lib/x86_64-linux-gnu
libavutil 54. 20.100 / 54. 20.100
libavcodec 56. 26.100 / 56. 26.100
libavformat 56. 25.101 / 56. 25.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 11.102 / 5. 11.102
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01525.ts', offset 0, playlist 0
[mpegts @ 0x1f2e160] parser not found for codec none, packets or times may be invalid.
[mpegts @ 0x1f2e160] parser not found for codec timed_id3, packets or times may be invalid.
[h264 @ 0x23d28c0] Current profile doesn't provide more RBSP data in PPS, skipping
Last message repeated 2 times
[mpegts @ 0x1f2e160] max_analyze_duration 5000000 reached at 5005000 microseconds
[mpegts @ 0x1f2e160] Could not find codec parameters for stream 2 (Unknown: none ([134][0][0][0] / 0x0086)): unknown codec
Consider increasing the value for the 'analyzeduration' and 'probesize' options
[hls,applehttp @ 0x1f277e0] max_analyze_duration 5000000 reached at 5005000 microseconds
[hls,applehttp @ 0x1f277e0] Could not find codec parameters for stream 2 (Unknown: none ([134][0][0][0] / 0x0086)): unknown codec
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Input #0, hls,applehttp, from 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/master_360.m3u8':
Duration: N/A, start: 56320.312000, bitrate: N/A
Program 0
Metadata:
variant_bitrate : 0
Stream #0:0: Video: h264 (Constrained Baseline) ([27][0][0][0] / 0x001B), yuv420p, 640x360 (640x368) [SAR 1:1 DAR 16:9], 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc
Stream #0:1: Audio: aac (LC) ([15][0][0][0] / 0x000F), 32000 Hz, stereo, fltp, 57 kb/s
Stream #0:2: Unknown: none ([134][0][0][0] / 0x0086)
Stream #0:3: Data: timed_id3 (ID3 / 0x20334449)
Output #0, mp4, to 'file:1_22 - CBS Evening News-122-cbs-evening-news.mp4.part':
Metadata:
encoder : Lavf56.25.101
Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 640x360 (0x0) [SAR 1:1 DAR 16:9], q=2-31, 29.97 fps, 29.97 tbr, 90k tbn, 90k tbc
Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 32000 Hz, stereo, 57 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01526.ts', offset 0, playlist 0
[NULL @ 0x23d28c0] Current profile doesn't provide more RBSP data in PPS, skipping
Last message repeated 2 times
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01527.ts', offset 0, playlist 0
[NULL @ 0x23d28c0] Current profile doesn't provide more RBSP data in PPS, skipping
^C Last message repeated 2 times
[hls,applehttp @ 0x1f277e0] HLS request for url 'https://cbsnhls-i.akamaihd.net/hls/live/264710-b/cbsn_hlsprod_2/20180123T032018/master_360/00004/master_360_01528.ts', offset 0, playlist 0
^C
ERROR: Interrupted by user


Description of your issue, suggested solution and other information

Download of CBS Evening News falls back to generic extractor and ends up grabbing the CBSN live stream instead.
Tested from multiple boxes.


@Rick7C2 Rick7C2 commented Jan 27, 2018

CBS News changed their video URLs from

https://www.cbsnews.com/videos/126-cbs-evening-news-2/

to

https://www.cbsnews.com/video/126-cbs-evening-news-2/

Line 14 in https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/cbsnews.py

Needs to be changed from...
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z-]+)'

To...
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|video)/(?P<id>[\da-z-]+)'
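
For anyone who wants to check the pattern change locally before touching the extractor, here is a minimal Python sketch. It is not the extractor's actual code: the group name `id`, the helper `extract_id`, and the use of `videos?` (to accept both the old and the new path) are assumptions made for illustration.

```python
import re

# Illustrative pattern only. "videos?" accepts both the old /videos/ path
# and the new /video/ path; the group name "id" is an assumption.
VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos?)/(?P<id>[\da-z-]+)'


def extract_id(url):
    """Return the slug captured by the pattern, or None if the URL does not match."""
    match = re.match(VALID_URL, url)
    return match.group('id') if match else None


if __name__ == '__main__':
    print(extract_id('https://www.cbsnews.com/video/126-cbs-evening-news-2/'))
    print(extract_id('https://www.cbsnews.com/videos/126-cbs-evening-news-2/'))
```

As the next comment shows, matching the URL is only the first step; the page layout changed as well, so this alone does not restore extraction.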


@slash-proc slash-proc commented Feb 1, 2018

I think the change on their end goes a little deeper than that. I tried exactly what you suggested and unfortunately after I make that change youtube-dl is unable to extract playlist JSON info.

Before change:
youtube-dl $ ./youtube-dl http://www.cbsnews.com/video/131-cbs-evening-news/ -F -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['http://www.cbsnews.com/video/131-cbs-evening-news/', '-F', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2018.01.27
[debug] Python version 3.4.5 (CPython) - Linux-4.9.16-gentoo-x86_64-Intel-R-_Pentium-R-CPU_J2900@_2.41GHz-with-gentoo-2.3
[debug] exe versions: ffmpeg N-86258-g5782e0b, ffprobe N-86258-g5782e0b, rtmpdump 2.4
[debug] Proxy map: {}
[generic] 131-cbs-evening-news: Requesting header
[redirect] Following redirect to https://www.cbsnews.com/video/131-cbs-evening-news/
[generic] 131-cbs-evening-news: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 131-cbs-evening-news: Downloading webpage
[generic] 131-cbs-evening-news: Extracting information
[generic] 131-cbs-evening-news: Downloading m3u8 information
[download] Downloading playlist: 1/31: CBS Evening News
[generic] playlist 1/31: CBS Evening News: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[info] Available formats for 131-cbs-evening-news:
format code extension resolution note
hls-202-0 mp4 320x180 202k , avc1.4d400d, mp4a.40.2
hls-202-1 mp4 320x180 202k , avc1.4d400d, mp4a.40.2
hls-466-0 mp4 640x360 466k , avc1.66.30, mp4a.40.2
hls-466-1 mp4 640x360 466k , avc1.66.30, mp4a.40.2 (best)
[download] Finished downloading playlist: 1/31: CBS Evening News

After:
youtube-dl $ ./youtube-dl http://www.cbsnews.com/video/131-cbs-evening-news/ -F -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['http://www.cbsnews.com/video/131-cbs-evening-news/', '-F', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2018.01.27
[debug] Python version 3.4.5 (CPython) - Linux-4.9.16-gentoo-x86_64-Intel-R-_Pentium-R-CPU_J2900@_2.41GHz-with-gentoo-2.3
[debug] exe versions: ffmpeg N-86258-g5782e0b, ffprobe N-86258-g5782e0b, rtmpdump 2.4
[debug] Proxy map: {}
[cbsnews] 131-cbs-evening-news: Downloading webpage
ERROR: Unable to extract playlist JSON info; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
File "./youtube-dl/youtube_dl/YoutubeDL.py", line 784, in extract_info
ie_result = ie.extract(url)
File "./youtube-dl/youtube_dl/extractor/common.py", line 438, in extract
ie_result = self._real_extract(url)
File "./youtube-dl/youtube_dl/extractor/cbsnews.py", line 91, in _real_extract
'playlist JSON info', group='json'), video_id)['state']
File "./youtube-dl/youtube_dl/extractor/common.py", line 794, in _search_regex
raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract playlist JSON info; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.


@cfxd cfxd commented Mar 11, 2018

I'm running into this too, even with the 2018.03.10 version :-/

I actually found that if you visit the video's page and grab the API URL from CBSNEWS.defaultPayload.items.video and use that URL in your command line then it works and grabs the vid 👍

Author

@wolferikg wolferikg commented Jul 10, 2018

Thanks for the hint @cfxd ... I finally went ahead and whipped up a quick script that seems to work pretty well.
One needs to install 'jq' for this to work (https://stedolan.github.io/jq/).

wolf$ cat cbsnews.sh
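#!/bin/bash
# Download a CBS News video by reading the stream path from the page's
# CBSNEWS.defaultPayload object. Requires curl, jq and youtube-dl.

usage () {
    echo "$(basename "$0") Usage:"
    echo "$(basename "$0") <URL> [-d]"
    echo "    -d // dry run: print video-url and exit."
    echo ""
    exit 2
}

if [ $# -lt 1 ]; then usage; fi

episode=$1
baseurl='https://www.cbsnews.com'
# The fifth "/"-separated field of the URL is the episode slug.
output=$(echo "$episode" | awk -F/ '{print $5".mp4"}')

# Take the first line that mentions CBSNEWS.defaultPayload and keep what follows " = ".
json=$(curl -s "$episode" | grep CBSNEWS.defaultPayload | head -1 | awk -F' = ' '{print $2}')
# jq -r prints the raw string, so the quotes do not need stripping afterwards.
video=$(echo "$json" | jq -r '.items|.[0].video' | sed 's/_phone\.m3u8/_tablet.m3u8/g')

videourl=$baseurl$video

# ${2:-} keeps the test valid when no second argument is given.
if [ "${2:-}" = "-d" ]
  then
    echo "Video-URL: $videourl"
  else
    echo "Attempting to download $videourl to $output ..."
    youtube-dl -o "$output" "$videourl"
fi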

The substitution of _phone.m3u8 with _tablet.m3u8 was a "wild" guess 😆 and will pull the high res version 😉

Maybe someone with more programming skills can use this as a base to submit a patch to fix this issue directly in the yt-dl cbsnews extractor?
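
As a possible starting point for such a patch, the parsing step could be sketched in Python roughly as follows. This is only a sketch under assumptions: the page is assumed to contain an assignment of the form `CBSNEWS.defaultPayload = {...}` whose `items[0].video` field holds the stream path (which is what the script above relies on), and the function name `find_video_path` and the sample HTML are made up for illustration. It is not the extractor's actual code.

```python
import json
import re


def find_video_path(html):
    """Return items[0].video from the first CBSNEWS.defaultPayload assignment, or None."""
    match = re.search(r'CBSNEWS\.defaultPayload\s*=\s*', html)
    if not match:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example a trailing ";").
    payload, _ = json.JSONDecoder().raw_decode(html, match.end())
    items = payload.get('items') or []
    if not items:
        return None
    video = items[0].get('video')
    if not video:
        return None
    # Same guess as in the shell script: the "tablet" playlist is the higher resolution one.
    return video.replace('_phone.m3u8', '_tablet.m3u8')


if __name__ == '__main__':
    # Made-up sample page, only to show the call.
    sample = '<script>CBSNEWS.defaultPayload = {"items": [{"video": "/path/clip_phone.m3u8"}]};</script>'
    print(find_video_path(sample))
```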

Cheers.


@Wowfunhappy Wowfunhappy commented Jul 27, 2018

Six months later, this is still broken as of version 2018.07.21.

Contributor

@vxbinaca vxbinaca commented Aug 3, 2018

I can confirm @wolferikg's clever hack works well. Good job, you just helped me with something. Much appreciated.


@ddurdle ddurdle commented Jan 1, 2019

Great workaround. My previous workaround of opening the page source and looking for the 740.mp4 link doesn't work anymore, but this seems to.


@ddurdle ddurdle commented Mar 24, 2019

dead again


@ddurdle ddurdle commented Mar 24, 2019

Looks like the script is now prepending cbsnews.com to a URL that already contains it, so taking the URL from the error message and passing it to youtube-dl manually works.
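
One way to make the join tolerant of both cases is to resolve the value against the base URL instead of concatenating strings. A minimal Python sketch, assuming the payload field may hold either a site-relative path or a full URL (the helper name `absolute_video_url` and the sample paths are made up for illustration):

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.cbsnews.com'


def absolute_video_url(video):
    """Resolve a relative path against BASE_URL; absolute URLs are returned unchanged."""
    return urljoin(BASE_URL, video)


if __name__ == '__main__':
    print(absolute_video_url('/path/clip_tablet.m3u8'))
    print(absolute_video_url('https://www.cbsnews.com/path/clip_tablet.m3u8'))
```

The same check in the shell script would amount to prepending the base URL only when the value starts with "/".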


@sheerluck sheerluck commented May 22, 2019

Hi all, I tried to download https://www.cbsnews.com/news/how-the-danske-bank-money-laundering-scheme-involving-230-billion-unraveled-60-minutes-2019-05-19 and failed. I opened DevTools and spotted a sequence of "akamaihd" URLs like https://devicecbsnews-a.akamaihd.net/media/mpx/2019/05/19/1524617283782/0519_60Minutes_Segment1_1853572_1200/0519_60Minutes_Segment1_1853572_1200_14.ts
I can see that 7 or 8 extractors already know about "akamaihd" (francetv, lego, brightcove, senateisvp, livestream, nba, nhk, tvnow), so maybe we can fix cbsnews the same way.
