Support CBC Olympics #15535

Closed
RyanVoProductions opened this issue Feb 9, 2018 · 11 comments

Comments

@RyanVoProductions RyanVoProductions commented Feb 9, 2018

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.02.08. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2018.02.08

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

youtube-dl https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/ -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/', '-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2018.02.08
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.16299
[debug] exe versions: ffmpeg 3.3.4, ffprobe 3.3.4, rtmpdump 2.4
[debug] Proxy map: {}
[generic] olympic-morning-featuring-the-opening-ceremony: Requesting header
WARNING: Falling back on generic information extractor.
[generic] olympic-morning-featuring-the-opening-ceremony: Downloading webpage
[generic] olympic-morning-featuring-the-opening-ceremony: Extracting information
ERROR: Unsupported URL: https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpn4uo9muj\build\youtube_dl\YoutubeDL.py", line 785, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpn4uo9muj\build\youtube_dl\extractor\common.py", line 440, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpn4uo9muj\build\youtube_dl\extractor\generic.py", line 3111, in _real_extract
youtube_dl.utils.UnsupportedError: Unsupported URL: https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/
...
<end of log>

If the purpose of this issue is a site support request please provide all kinds of example URLs support for which should be included (replace following example URLs by yours):

Note that youtube-dl does not support sites dedicated to copyright infringement. In order for site support request to be accepted all provided example URLs should not violate any copyrights.


Description of your issue, suggested solution and other information

The NBC Olympics website is already supported; the CBC Olympics website should be supported as well.

@frozenpandaman frozenpandaman commented Feb 11, 2018

I'd like to see this as well.

You can do it manually for now:

  • Find the manifest URL (links to all the fragments) – ends in Manifest(video,format=m3u8-aapl-v3,audiotrack=english,filter=hls). You can find this using chrome://net-internals.
  • Download all parts with wget (requires a browser user-agent as a header)
  • Concat with ffmpeg
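
The first two manual steps can be sketched in Python. This is a minimal sketch, not what youtube-dl does internally: `segment_urls` is a hypothetical helper that takes the text of an HLS media playlist plus the URL it was fetched from and returns the absolute fragment URLs in order. It assumes a plain media playlist (not a master playlist) and ignores encryption tags; the sample playlist and host names are made up for illustration.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute fragment URLs from an HLS media playlist, in order."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' are tags or comments, not URIs.
        if not line or line.startswith('#'):
            continue
        # Relative URIs are resolved against the playlist's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls


if __name__ == '__main__':
    # Made-up playlist; a real one would come from the sniffed Manifest(...) URL.
    sample = (
        '#EXTM3U\n'
        '#EXT-X-TARGETDURATION:6\n'
        '#EXTINF:6.0,\n'
        'seg-0.ts\n'
        '#EXTINF:6.0,\n'
        'seg-1.ts\n'
        '#EXT-X-ENDLIST\n'
    )
    for url in segment_urls(sample, 'https://example.invalid/dvr/index.m3u8'):
        print(url)
```

Each URL would then be fetched with a browser user-agent header, as noted above, before concatenating with ffmpeg.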

@tp0 tp0 commented Feb 11, 2018

Better workaround: Streamlink with a user-agent argument (still needs a Canadian IP). Then sniff the playlist file (Manifest(video,format=m3u8-aapl-v3,audiotrack=english,filter=hls)) and:

streamlink -o output.ts "hls://https://dvr-i-cbc.akamaized.net/dvr/nnnnnnnnnnnn.ism/QualityLevels(3449984)/Manifest(video,format=m3u8-aapl-v3,audiotrack=english,filter=hls)" best --http-header "User-Agent=Mozilla/5.0 (Linux; U; Android 4.0.3; us-en; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
[cli][info] Found matching plugin hls for URL hls://https://dvr-i-cbc.akamaized.net/dvr/nnnnnnnnnnnn.ism//QualityLevels(3449984)/Manifest(video,format=m3u8-aapl-v3,audiotrack=english,filter=hls)
[cli][info] Available streams: live (worst, best)
[cli][info] Opening stream: live (hls)
[download][output.ts] Written 18.5 MB (8s @ 2.1 MB/s)

@blkeller blkeller commented Feb 14, 2018

I've had good luck feeding the Manifest URL directly to youtube-dl. The terminal fills with warning messages about invalid DTS/PTS and invalid timestamps as the stream downloads, but the file you get in the end is perfectly good! For example, this worked for me:

$ youtube-dl -o 'CBC - Curling, Feb. 8 - mixed doubles (Draw 4 - CAN vs. FIN).mp4' 'https://dvr-i-cbc.akamaized.net/dvr/701e3ca5-9f3c-4dc4-ac69-f1e2574da38d/701e3ca5-9f3c-4dc4-ac69-f1e2574da38d.ism/QualityLevels(3449984)/Manifest(video,format=m3u8-aapl-v3,audiotrack=english,filter=hls)'

You still have to sniff out the Manifest URL yourself, but at least youtube-dl can automatically and correctly assemble the stream using ffmpeg for you. I'm hoping that's a sign that an extractor for this site won't be too difficult for someone to implement.

@frozenpandaman frozenpandaman commented Feb 14, 2018

Oh, cool, didn't realize it could take the manifest URL like that. Just wrote a little script to process that, download the parts & invoke ffmpeg to stitch 'em together, but guess that wasn't necessary. :D Awesome, thanks.

@eternaleye eternaleye commented Feb 15, 2018

Note that all of these videos have an ID number, which can very directly lead to an XML file defining the streams in question.

In the <head> element of the page, there's a <meta name="rc.idMedia" content="..."> element. By feeding the value of the content attribute into this template:

https://olympics.cbc.ca/videodata/${ID}.xml

You get an XML file containing the metadata and stream URLs of the video.
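
The lookup described above can be sketched as follows. This is only an illustration using the Python standard library, not the extractor's actual code: `videodata_url` and `_MetaFinder` are hypothetical names, and the sketch assumes the page carries exactly the `rc.idMedia` meta tag described here.

```python
from html.parser import HTMLParser


class _MetaFinder(HTMLParser):
    """Remember the content of the first <meta name="rc.idMedia" ...> tag."""

    def __init__(self):
        super().__init__()
        self.media_id = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != 'meta' or self.media_id is not None:
            return
        attrs = dict(attrs)
        if attrs.get('name') == 'rc.idMedia':
            self.media_id = attrs.get('content')


def videodata_url(page_html):
    """Return the videodata XML URL for a page, or None if no ID is found."""
    finder = _MetaFinder()
    finder.feed(page_html)
    if not finder.media_id:
        return None
    return 'https://olympics.cbc.ca/videodata/%s.xml' % finder.media_id


if __name__ == '__main__':
    page = '<html><head><meta name="rc.idMedia" content="51326" /></head></html>'
    print(videodata_url(page))
```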

For example:

https://olympics.cbc.ca/video/vod/hearts-hugs-and-kisses-for-valentine-day/

has the <meta> tag

<meta name="rc.idMedia" content="51326" />

Downloading https://olympics.cbc.ca/videodata/51326.xml then gives you:

<?xml version="1.0" encoding="utf-8"?>
<video>
  <videoId>51326</videoId>
  <thumbnailUrl><![CDATA[https://olympics.cbc.ca/mm/Photo/Photo/General/05/13/66/51366_DIVAS.jpg]]></thumbnailUrl>
  <title><![CDATA[Hearts, hugs and kisses for Valentine's Day]]></title>
  <description><![CDATA[The competition is tense, but there's still lots of love at the Winter Olympics. Check out all the hugs, kisses, hearts and emotion on display in Korea.
 ]]></description>
  <SEO>hearts-hugs-and-kisses-for-valentine-day</SEO>
  <lang><![CDATA[English]]></lang>
  <publicationDate>20180214152200000</publicationDate>
  <area><![CDATA[Videos]]></area>
  <kind><![CDATA[Video]]></kind>
  <section><![CDATA[General]]></section>
  <tournament><![CDATA[]]></tournament>
  <assetId>a1a16656-9311-e811-80cb-005056990ef8</assetId>
  <assetState>3</assetState>
  <category1></category1>
  <category2></category2>
  <category3>hearts-hugs-and-kisses-for-valentine-day</category3>
  <category4 />
  <category5>very-short</category5>
  <category6></category6>
  <category7></category7>
  <category8></category8>
  <category9>Video</category9>
  <category10></category10>
  <events>
    <event id="" start="" end="" />
  </events>
  <isMultistream>false</isMultistream>
  <videoSources>
    <videoSource format="IIS" offset="00:00:00">
      <uri>https://vod-s-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(filter=iis)</uri>
      <discontinuities />
    </videoSource>
    <videoSource format="HLS" offset="00:00:00">
      <uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=hls)</uri>
      <discontinuities />
    </videoSource>
    <videoSource format="Chromecast" offset="00:00:00">
      <uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=chromecast)</uri>
      <discontinuities />
    </videoSource>
    <videoSource format="ConnectTV" offset="00:00:00">
      <uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=connecttv)</uri>
      <discontinuities />
    </videoSource>
    <videoSource format="HTML5" offset="00:00:00">
      <uri>https://vod-i-cbc.akamaized.net/vod/6b52bf42-91a6-4a66-bdc8-55bf402d03b5/LOVE_FEB14_EN.ism/manifest(format=m3u8-aapl-v3,filter=html5)</uri>
      <discontinuities />
    </videoSource>
  </videoSources>
  <timeCodeIn>20000101000000000</timeCodeIn>
  <duration>00:01:15.604</duration>
  <trimIn></trimIn>
  <trimOut></trimOut>
  <preroll template="3ac9cc9e-cf36-4a8d-9c43-db3493d4f855" />
  <postroll template="" />
  <audioTracks default="1">
    <audioTrack id="1" lang="en-CA" enabled="true">english</audioTrack>
  </audioTracks>
</video>
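
A document like the one above can be read with the standard library alone. The following is a rough sketch under the assumption that other videos use the same element names as this sample (`videoId`, `title`, `duration`, `videoSources/videoSource/uri`); `parse_videodata` and `duration_seconds` are hypothetical helper names, and the sketch does not handle the tokenization step that the player may require before the URIs are usable.

```python
import xml.etree.ElementTree as ET


def duration_seconds(text):
    """Convert 'HH:MM:SS.mmm' to seconds as a float; None if text is empty."""
    if not text:
        return None
    hours, minutes, seconds = text.strip().split(':')
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_videodata(xml_text):
    """Pull the ID, title, duration and per-format stream URIs from the XML."""
    root = ET.fromstring(xml_text)
    sources = {}
    for source in root.findall('videoSources/videoSource'):
        uri = (source.findtext('uri') or '').strip()
        if uri:
            # Keyed by the format attribute, e.g. 'IIS', 'HLS', 'HTML5'.
            sources[source.get('format')] = uri
    return {
        'id': root.findtext('videoId'),
        'title': (root.findtext('title') or '').strip(),
        'duration': duration_seconds(root.findtext('duration')),
        'sources': sources,
    }
```

For the sample above, `parse_videodata(...)['sources']` would map each of the five `format` values to its manifest URI.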
@remitamine remitamine (Collaborator) commented Feb 17, 2018

I no longer have access to a VPN/proxy that can reach these videos, so could someone test whether the patch below works for both replays and live streams? (Ideally, include the output of the command.)

diff --git a/youtube_dl/extractor/cbc.py b/youtube_dl/extractor/cbc.py
index 9faf40227..3be0c646b 100644
--- a/youtube_dl/extractor/cbc.py
+++ b/youtube_dl/extractor/cbc.py
@@ -1,6 +1,7 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import json
 import re
 
 from .common import InfoExtractor
@@ -13,6 +14,7 @@ from ..utils import (
     xpath_element,
     xpath_with_ns,
     find_xpath_attr,
+    parse_duration,
     parse_iso8601,
     parse_age_limit,
     int_or_none,
@@ -359,3 +361,63 @@ class CBCWatchIE(CBCWatchBaseIE):
         video_id = self._match_id(url)
         rss = self._call_api('web/browse/' + video_id, video_id)
         return self._parse_rss_feed(rss)
+
+
+class CBCOlympicsIE(InfoExtractor):
+    IE_NAME = 'cbc.ca:olympics'
+    _VALID_URL = r'https?://olympics\.cbc\.ca/video/[^/]+/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        'url': 'https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        video_id = self._hidden_inputs(webpage)['videoId']
+        video_doc = self._download_xml(
+            'https://olympics.cbc.ca/videodata/%s.xml' % video_id, video_id)
+        title = xpath_text(video_doc, 'title', fatal=True)
+        is_live = xpath_text(video_doc, 'kind') == 'Live'
+        if is_live:
+            title = self._live_title(title)
+
+        formats = []
+        for video_source in video_doc.findall('videoSources/videoSource'):
+            uri = xpath_text(video_source, 'uri')
+            if not uri:
+                continue
+            tokenize = self._download_json(
+                'https://olympics.cbc.ca/api/api-akamai/tokenize',
+                video_id, data=json.dumps({
+                    'VideoSource': uri,
+                }).encode(), headers={
+                    'Content-Type': 'application/json',
+                    'Referer': url,
+                    # d3.VideoPlayer._init in https://olympics.cbc.ca/components/script/base.js
+                    'Cookie': '_dvp=TK:C0ObxjerU',  # AKAMAI CDN cookie
+                }, fatal=False)
+            if not tokenize:
+                continue
+            content_url = tokenize['ContentUrl']
+            video_source_format = video_source.get('format')
+            if video_source_format == 'IIS':
+                formats.extend(self._extract_ism_formats(
+                    content_url, video_id, ism_id=video_source_format, fatal=False))
+            else:
+                formats.extend(self._extract_m3u8_formats(
+                    content_url, video_id, 'mp4',
+                    'm3u8' if is_live else 'm3u8_native',
+                    m3u8_id=video_source_format, fatal=False))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'title': title,
+            'description': xpath_text(video_doc, 'description'),
+            'thumbnail': xpath_text(video_doc, 'thumbnailUrl'),
+            'duration': parse_duration(xpath_text(video_doc, 'duration')),
+            'formats': formats,
+            'is_live': is_live,
+        }
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py
index 666f2ac04..cf66be507 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -162,6 +162,7 @@ from .cbc import (
     CBCPlayerIE,
     CBCWatchVideoIE,
     CBCWatchIE,
+    CBCOlympicsIE,
 )
 from .cbs import CBSIE
 from .cbslocal import CBSLocalIE
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index ef44b99a5..7f24cbb04 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -82,7 +82,7 @@ def register_socks_protocols():
 compiled_regex_type = type(re.compile(''))
 
 std_headers = {
-    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)',
+    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0 (Chrome)',
     'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
     'Accept-Encoding': 'gzip, deflate',

@frozenpandaman frozenpandaman commented Feb 17, 2018

Was trying to write an extractor myself but running into trouble when figuring out what to return. I had all the URLs to download, but how do you get it to actually do that and then stitch them together? Oh well. :D

I can try your code in a few hours and let you know!

@frozenpandaman frozenpandaman commented Feb 17, 2018

Partially works, @remitamine! So you're essentially passing the fragments (given in the m3u8/manifest) into ffmpeg and it stitches them together one after another? Not exactly sure how the code all works – I've honestly struggled to find documentation on how to proceed, and couldn't find support in the IRC either. The README pretty much gives you a template then throws you in blind. 🤷

eli:youtube-dl eli$ python3 -m youtube_dl https://olympics.cbc.ca/video/todays-events/cross-country-skiing-feb-women-relay-final/
[cbc.ca:olympics] cross-country-skiing-feb-women-relay-final: Downloading webpage
[cbc.ca:olympics] 45653: Downloading XML
[cbc.ca:olympics] 45653: Downloading JSON metadata
[cbc.ca:olympics] 45653: Downloading ISM manifest
[cbc.ca:olympics] 45653: Downloading JSON metadata
[cbc.ca:olympics] 45653: Downloading m3u8 information
[cbc.ca:olympics] 45653: Downloading JSON metadata
[cbc.ca:olympics] 45653: Downloading m3u8 information
[cbc.ca:olympics] 45653: Downloading JSON metadata
[cbc.ca:olympics] 45653: Downloading m3u8 information
[cbc.ca:olympics] 45653: Downloading JSON metadata
[cbc.ca:olympics] 45653: Downloading m3u8 information
[download] Destination: Cross-country skiing, Feb. 17 - women's 4 x 5 km relay final 2018-02-17 02_41-45653.mp4

(Could we suppress some of this output?)

Note that you get a ton of ffmpeg warnings. Almost a hundred for each fragment; totally fills up the console.

[mpegts @ 0x7fac4b804800] Invalid timestamps stream=0, pts=843973143, dts=843976145, size=10806
[mpegts @ 0x7fac4b804800] Invalid timestamps stream=0, pts=843976146, dts=843979149, size=12629
[hls,applehttp @ 0x7fac4b002800] Invalid timestamps stream=0, pts=843973143, dts=843976145, size=10806
[mpegts @ 0x7fac4b804800] Invalid timestamps stream=0, pts=843979149, dts=843982151, size=10553
[hls,applehttp @ 0x7fac4b002800] Invalid timestamps stream=0, pts=843976146, dts=843979149, size=12629
[hls,applehttp @ 0x7fac4b002800] Invalid timestamps stream=0, pts=843979149, dts=843982151, size=10553
[mpegts @ 0x7fac4b804800] Invalid timestamps stream=0, pts=843985155, dts=843988158, size=10645
...

Also, if you run the command with a live video – while it's going on – it starts recording at the current point in the stream, and just records until the stream ends. Is this the intended behavior, or should we have it start from the beginning of the stream and then keep recording as the stream continues?

@frozenpandaman frozenpandaman commented Feb 18, 2018

Still some problems with highlights too:

eli:youtube-dl eli$ python3 -m youtube_dl https://olympics.cbc.ca/video/highlights/snowboarder-ester-ledecka-wins-shocking-gold-medal-skis-super/index.html
[cbc.ca:olympics] snowboarder-ester-ledecka-wins-shocking-gold-medal-skis-super: Downloading webpage
[cbc.ca:olympics] 53944: Downloading XML
[cbc.ca:olympics] 53944: Downloading JSON metadata
[cbc.ca:olympics] 53944: Downloading ISM manifest
[cbc.ca:olympics] 53944: Downloading JSON metadata
[cbc.ca:olympics] 53944: Downloading m3u8 information
[cbc.ca:olympics] 53944: Downloading JSON metadata
[cbc.ca:olympics] 53944: Downloading m3u8 information
[cbc.ca:olympics] 53944: Downloading JSON metadata
[cbc.ca:olympics] 53944: Downloading m3u8 information
[cbc.ca:olympics] 53944: Downloading JSON metadata
[cbc.ca:olympics] 53944: Downloading m3u8 information
[ism] Total fragments: 68
[download] Destination: Snowboarder Ester Ledecka wins shocking gold medal on skis in super-G-53944.fIIS-3497.ismv
[download] 100% of 56.78MiB in 00:42
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 23
[download] Destination: Snowboarder Ester Ledecka wins shocking gold medal on skis in super-G-53944.fHTML5-138.mp4
[download] 100% of 2.24MiB in 00:07
[ffmpeg] Merging formats into "Snowboarder Ester Ledecka wins shocking gold medal on skis in super-G-53944.ismv"
ERROR: Conversion failed!

I'm able to manually combine the two into an mp4 though:

eli:~ eli$ ffmpeg -i filename.fHTML5-138.mp4 -i filename.fIIS-3497.ismv -c:v copy -c:a copy output.mp4
@remitamine remitamine (Collaborator) commented Feb 19, 2018

@frozenpandaman Most likely the ffmpeg/avconv version used by youtube-dl is too old.

@frozenpandaman frozenpandaman commented Feb 19, 2018

Ah, that makes sense. Thanks so much, @remitamine!
