Support CBC Olympics #15535
Comments
|
I'd like to see this as well. You can do it manually for now:
|
|
Better workaround: Streamlink with a user-agent argument (still needs a Canadian IP). Then sniff the playlist file (
|
|
I've had good luck feeding the Manifest URL directly to youtube-dl. The terminal fills with warning messages about invalid DTS/PTS and invalid timestamps as the stream downloads, but the file you get in the end is perfectly good! For example, this worked for me:
$ youtube-dl -o 'CBC - Curling, Feb. 8 - mixed doubles (Draw 4 - CAN vs. FIN).mp4' 'https://dvr-i-cbc.akamaized.net/dvr/701e3ca5-9f3c-4dc4-ac69-f1e2574da38d/701e3ca5-9f3c-4dc4-ac69-f1e2574da38d.ism/QualityLevels(3449984)/Manifest(video,format=m3u8-aapl-v3,audiotrack=english,filter=hls)'
You still have to sniff out the Manifest URL yourself, but at least youtube-dl can automatically and correctly assemble the stream using ffmpeg for you. I'm hoping that might be an indicator that this won't be too difficult a site for which to have someone implement an extractor. |
|
Oh, cool, didn't realize it could take the manifest URL like that. Just wrote a little script to process that, download the parts & invoke ffmpeg to stitch 'em together, but guess that wasn't necessary. :D Awesome, thanks. |
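For anyone curious what the "process the manifest, download the parts" step looks like by hand, here is a minimal sketch of pulling segment URLs out of an HLS media playlist. The script mentioned above was not posted, so this is only an illustration: the function name `segment_urls`, the sample playlist, and the `example.invalid` URL are all made up, and it handles only plain media playlists (no master playlists, encryption, or byte ranges).

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute segment URLs listed in an HLS media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; blank lines are skipped.
        if not line or line.startswith('#'):
            continue
        # Segment URIs may be relative to the playlist's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls


# Hypothetical playlist, not taken from the CBC site.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
seg-0.ts
#EXTINF:6.0,
seg-1.ts
#EXT-X-ENDLIST
"""

for url in segment_urls(SAMPLE, 'https://example.invalid/stream/index.m3u8'):
    print(url)
```

As noted above, youtube-dl already does all of this (plus the ffmpeg step) when given the Manifest URL, so this is only useful for understanding what happens under the hood.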
|
Note that all of these videos have an ID number, which can very directly lead to an XML file defining the streams in question. In the
You get an XML file containing the metadata and stream URLs of the video. For example: https://olympics.cbc.ca/video/vod/hearts-hugs-and-kisses-for-valentine-day/ has the
which, if you then download https://olympics.cbc.ca/videodata/51326.xml, gives you:
<?xml version="1.0" encoding="utf-8"?>
<video>
<videoId>51326</videoId>
<thumbnailUrl><![CDATA[https://olympics.cbc.ca/mm/Photo/Photo/General/05/13/66/51366_DIVAS.jpg]]></thumbnailUrl>
<title><![CDATA[Hearts, hugs and kisses for Valentine's Day]]></title>
<description><![CDATA[The competition is tense, but there's still lots of love at the Winter Olympics. Check out all the hugs, kisses, hearts and emotion on display in Korea.
]]></description>
<SEO>hearts-hugs-and-kisses-for-valentine-day</SEO>
<lang><![CDATA[English]]></lang>
<publicationDate>20180214152200000</publicationDate>
<area><![CDATA[Videos]]></area>
<kind><![CDATA[Video]]></kind>
<section><![CDATA[General]]></section>
<tournament><![CDATA[]]></tournament>
<assetId>a1a16656-9311-e811-80cb-005056990ef8</assetId>
<assetState>3</assetState>
<category1></category1>
<category2></category2>
<category3>hearts-hugs-and-kisses-for-valentine-day</category3>
<category4 />
<category5>very-short</category5>
<category6></category6>
<category7></category7>
<category8></category8>
<category9>Video</category9>
<category10></category10>
<events>
<event id="" start="" end="" />
</events>
<isMultistream>false</isMultistream>
<videoSources>
<videoSource format="IIS" offset="00:00:00">
<uri>https://vod-s-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(filter=iis)</uri>
<discontinuities />
</videoSource>
<videoSource format="HLS" offset="00:00:00">
<uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=hls)</uri>
<discontinuities />
</videoSource>
<videoSource format="Chromecast" offset="00:00:00">
<uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=chromecast)</uri>
<discontinuities />
</videoSource>
<videoSource format="ConnectTV" offset="00:00:00">
<uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=connecttv)</uri>
<discontinuities />
</videoSource>
<videoSource format="HTML5" offset="00:00:00">
<uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=html5)</uri>
<discontinuities />
</videoSource>
</videoSources>
<timeCodeIn>20000101000000000</timeCodeIn>
<duration>00:01:15.604</duration>
<trimIn></trimIn>
<trimOut></trimOut>
<preroll template="3ac9cc9e-cf36-4a8d-9c43-db3493d4f855" />
<postroll template="" />
<audioTracks default="1">
<audioTrack id="1" lang="en-CA" enabled="true">english</audioTrack>
</audioTracks>
</video> |
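A rough sketch of reading that XML with Python's standard library, in case it helps anyone scripting this. The helper names `videodata_url` and `list_sources` are made up for illustration, the sample is a trimmed copy of the document above, and nothing here touches the network. Whether the listed URIs can be fetched as-is is a separate question this sketch does not answer.

```python
import xml.etree.ElementTree as ET


def videodata_url(video_id):
    """Build the metadata URL from a video ID, following the pattern above."""
    return 'https://olympics.cbc.ca/videodata/%s.xml' % video_id


def list_sources(xml_text):
    """Map each videoSource format (IIS, HLS, ...) to its manifest URI."""
    root = ET.fromstring(xml_text)
    sources = {}
    for source in root.findall('videoSources/videoSource'):
        uri = source.findtext('uri')
        if uri:
            sources[source.get('format')] = uri.strip()
    return sources


# Trimmed copy of the XML shown above (only two of the five sources kept).
SAMPLE = """<video>
<videoId>51326</videoId>
<videoSources>
<videoSource format="IIS" offset="00:00:00">
<uri>https://vod-s-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(filter=iis)</uri>
<discontinuities />
</videoSource>
<videoSource format="HLS" offset="00:00:00">
<uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=hls)</uri>
<discontinuities />
</videoSource>
</videoSources>
</video>"""

print(videodata_url(51326))
for fmt, uri in list_sources(SAMPLE).items():
    print(fmt, uri)
```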
|
I no longer have access to a VPN/proxy that can access these videos, so could someone test whether this still works for both replays and live streams? (Better to include the output of the command.)
diff --git a/youtube_dl/extractor/cbc.py b/youtube_dl/extractor/cbc.py
index 9faf40227..3be0c646b 100644
--- a/youtube_dl/extractor/cbc.py
+++ b/youtube_dl/extractor/cbc.py
@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
+import json
import re
from .common import InfoExtractor
@@ -13,6 +14,7 @@ from ..utils import (
xpath_element,
xpath_with_ns,
find_xpath_attr,
+ parse_duration,
parse_iso8601,
parse_age_limit,
int_or_none,
@@ -359,3 +361,63 @@ class CBCWatchIE(CBCWatchBaseIE):
video_id = self._match_id(url)
rss = self._call_api('web/browse/' + video_id, video_id)
return self._parse_rss_feed(rss)
+
+
+class CBCOlympicsIE(InfoExtractor):
+ IE_NAME = 'cbc.ca:olympics'
+ _VALID_URL = r'https?://olympics\.cbc\.ca/video/[^/]+/(?P<id>[^/?#]+)'
+ _TESTS = [{
+ 'url': 'https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ video_id = self._hidden_inputs(webpage)['videoId']
+ video_doc = self._download_xml(
+ 'https://olympics.cbc.ca/videodata/%s.xml' % video_id, video_id)
+ title = xpath_text(video_doc, 'title', fatal=True)
+ is_live = xpath_text(video_doc, 'kind') == 'Live'
+ if is_live:
+ title = self._live_title(title)
+
+ formats = []
+ for video_source in video_doc.findall('videoSources/videoSource'):
+ uri = xpath_text(video_source, 'uri')
+ if not uri:
+ continue
+ tokenize = self._download_json(
+ 'https://olympics.cbc.ca/api/api-akamai/tokenize',
+ video_id, data=json.dumps({
+ 'VideoSource': uri,
+ }).encode(), headers={
+ 'Content-Type': 'application/json',
+ 'Referer': url,
+ # d3.VideoPlayer._init in https://olympics.cbc.ca/components/script/base.js
+ 'Cookie': '_dvp=TK:C0ObxjerU', # AKAMAI CDN cookie
+ }, fatal=False)
+ if not tokenize:
+ continue
+ content_url = tokenize['ContentUrl']
+ video_source_format = video_source.get('format')
+ if video_source_format == 'IIS':
+ formats.extend(self._extract_ism_formats(
+ content_url, video_id, ism_id=video_source_format, fatal=False))
+ else:
+ formats.extend(self._extract_m3u8_formats(
+ content_url, video_id, 'mp4',
+ 'm3u8' if is_live else 'm3u8_native',
+ m3u8_id=video_source_format, fatal=False))
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': title,
+ 'description': xpath_text(video_doc, 'description'),
+ 'thumbnail': xpath_text(video_doc, 'thumbnailUrl'),
+ 'duration': parse_duration(xpath_text(video_doc, 'duration')),
+ 'formats': formats,
+ 'is_live': is_live,
+ }
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
index 666f2ac04..cf66be507 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -162,6 +162,7 @@ from .cbc import (
CBCPlayerIE,
CBCWatchVideoIE,
CBCWatchIE,
+ CBCOlympicsIE,
)
from .cbs import CBSIE
from .cbslocal import CBSLocalIE
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index ef44b99a5..7f24cbb04 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -82,7 +82,7 @@ def register_socks_protocols():
compiled_regex_type = type(re.compile(''))
std_headers = {
- 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)',
+ 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0 (Chrome)',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate', |
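To make the tokenize step in the patch easier to poke at outside youtube-dl, here is a standalone sketch of the same request built with `urllib.request`. The endpoint, the `VideoSource` and `ContentUrl` field names, and the cookie value are copied from the diff above; the helper names `build_tokenize_request` and `content_url` are made up, and the cookie may well be stale by now. The code only builds the request and parses a response body; actually sending it (for example with `urllib.request.urlopen`) presumably still needs a Canadian IP.

```python
import json
import urllib.request

TOKENIZE_URL = 'https://olympics.cbc.ca/api/api-akamai/tokenize'


def build_tokenize_request(source_uri, page_url):
    """Build (but do not send) the POST request the patch makes per source."""
    body = json.dumps({'VideoSource': source_uri}).encode('utf-8')
    return urllib.request.Request(
        TOKENIZE_URL, data=body, headers={
            'Content-Type': 'application/json',
            'Referer': page_url,
            # Cookie value taken from the patch; it may no longer be valid.
            'Cookie': '_dvp=TK:C0ObxjerU',
        })


def content_url(response_text):
    """Pull the tokenized URL out of a JSON response, or None if absent."""
    return json.loads(response_text).get('ContentUrl')


request = build_tokenize_request(
    'https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=hls)',
    'https://olympics.cbc.ca/video/vod/hearts-hugs-and-kisses-for-valentine-day/')
print(request.get_method(), request.full_url)
print(request.data.decode('utf-8'))
```

Supplying `data` is what makes `urllib` issue a POST, which matches what `_download_json(..., data=...)` does in the patch.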
|
Was trying to write an extractor myself but running into trouble when figuring out what to return. I had all the URLs to download, but how do you get it to actually do that and then stitch them together? Oh well. :D I can try your code in a few hours and let you know! |
|
Partially works, @remitamine! So you're essentially passing the fragments (given in the m3u8/manifest) into ffmpeg and it stitches them together one after another? I'm not exactly sure how the code all works; I've honestly struggled to find documentation on how to proceed, and couldn't find support on IRC either. The README pretty much gives you a template and then throws you in blind.
(Could we suppress some of this output?) Note that you get a ton of ffmpeg warnings. Almost a hundred for each fragment; totally fills up the console.
Also, if you run the command with a live video – while it's going on – it starts recording at the current point in the stream, and just records until the stream ends. Is this the intended behavior, or should we have it start from the beginning of the stream and then keep recording as the stream continues? |
|
Still some problems with highlights too:
I'm able to manually combine the two into an mp4 though:
|
|
@frozenpandaman most likely the ffmpeg/avconv version used by youtube-dl is old. |
|
Ah, that makes sense. Thanks so much, @remitamine! |
Description of your issue, suggested solution and other information
Just like the NBC Olympics website, the CBC Olympics website needs to be supported.