CBC gem unsupported URL error #8382

Open
10 of 11 tasks
h0m3rth0mps0n opened this issue Oct 19, 2023 · 8 comments · May be fixed by #8790
Labels
site-bug (Issue with a specific website), site-enhancement (Feature request for some website)

Comments

@h0m3rth0mps0n

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

Region

Canada

Provide a description that is worded well enough to be understood

I get an "unsupported url" error when trying to download from a supported site, gem.cbc.ca.
Expected outcome: the video should download.

Note that CBC Gem isn't completely broken: I also tried to download https://gem.cbc.ca/the-passionate-eye/s03e06, and that one worked.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-Uv', '--username', 'PRIVATE', '--password', 'PRIVATE', '-fbv+ba', 'https://gem.cbc.ca/score-a-hockey-musical']
[debug] Portable config "D:\videodownload\yt-dlp.conf": ['-N', '8']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out cp1252 (No VT), error cp1252 (No VT), screen cp1252 (No VT)
[debug] yt-dlp version stable@2023.10.13 [b634ba742] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.19045-SP0 (OpenSSL 1.1.1k  25 Mar 2021)
[debug] exe versions: ffmpeg N-105441-g61cbfdc0a2-20220201 (setts), ffprobe N-105441-g61cbfdc0a2-20220201
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2023.07.22, mutagen-1.47.0, sqlite3-3.35.5, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1890 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Available version: stable@2023.10.13, Current version: stable@2023.10.13
Current Build Hash: 13d6a04ef6678dc61cb6e2d42eb53e69486bd2f52b8e5d778db029ccd4c600b4
yt-dlp is up to date (stable@2023.10.13)
[generic] Extracting URL: https://gem.cbc.ca/score-a-hockey-musical
[generic] score-a-hockey-musical: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] score-a-hockey-musical: Extracting information
[debug] Looking for embeds
ERROR: Unsupported URL: https://gem.cbc.ca/score-a-hockey-musical
Traceback (most recent call last):
  File "yt_dlp\YoutubeDL.py", line 1567, in wrapper
  File "yt_dlp\YoutubeDL.py", line 1702, in __extract_info
  File "yt_dlp\extractor\common.py", line 715, in extract
  File "yt_dlp\extractor\generic.py", line 2575, in _real_extract
yt_dlp.utils.UnsupportedError: Unsupported URL: https://gem.cbc.ca/score-a-hockey-musical

h0m3rth0mps0n added the site-bug and triage labels on Oct 19, 2023
@bashonly
Member

bashonly commented Oct 19, 2023

Workaround: pass the URL as https://gem.cbc.ca/score-a-hockey-musical/s01e01

The CBC Gem extractor and the CBC Gem API always expect a /s??e?? in the URL slug, even for one-off specials.
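
A minimal sketch of why the original URL falls through to the generic extractor, assuming the extractor matches URLs against the pre-patch _VALID_URL pattern quoted in the diff in this comment. The match_id helper is a hypothetical stand-in for illustration, not yt-dlp's own method.

```python
import re

# Pre-patch pattern, copied from the "-" line of the diff in this comment.
CURRENT_VALID_URL = (
    r'https?://gem\.cbc\.ca/(?:media/)?'
    r'(?P<id>[0-9a-z-]+/s[0-9]+[a-z][0-9]+)'
)


def match_id(url):
    """Hypothetical helper: return the captured id, or None if the URL does not match."""
    match = re.match(CURRENT_VALID_URL, url)
    return match.group('id') if match else None


# The one-off special has no /sNNeNN suffix, so the pattern rejects it.
print(match_id('https://gem.cbc.ca/score-a-hockey-musical'))
# With the suffix appended (the workaround), the pattern accepts it.
print(match_id('https://gem.cbc.ca/score-a-hockey-musical/s01e01'))
# A regular episode URL was already accepted.
print(match_id('https://gem.cbc.ca/the-passionate-eye/s03e06'))
```

Only the URLs that carry a season/episode suffix produce an id; the bare special URL yields None, which is consistent with the "Unsupported URL" error in the log.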

We could do something like this:

diff --git a/yt_dlp/extractor/cbc.py b/yt_dlp/extractor/cbc.py
index be2d13e44..c0fd4d888 100644
--- a/yt_dlp/extractor/cbc.py
+++ b/yt_dlp/extractor/cbc.py
@@ -264,7 +264,7 @@ def entries():
 
 class CBCGemIE(InfoExtractor):
     IE_NAME = 'gem.cbc.ca'
-    _VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+/s[0-9]+[a-z][0-9]+)'
+    _VALID_URL = r'https?://gem\.cbc\.ca/(?:media/)?(?P<id>[0-9a-z-]+(?:/s[0-9]+[a-z][0-9]+)?)/?(?:$|[#?])'
     _TESTS = [{
         # This is a normal, public, TV show video
         'url': 'https://gem.cbc.ca/media/schitts-creek/s06e01',
@@ -411,6 +411,8 @@ def _find_secret_formats(self, formats, video_id):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+        if '/' not in video_id:
+            video_id += '/s01e01'
         video_info = self._download_json(
             f'https://services.radio-canada.ca/ott/cbc-api/v2/assets/{video_id}',
             video_id, expected_status=426)

However, this would have the side effect of also matching URLs like https://gem.cbc.ca/browse.
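
The behaviour of the proposed pattern can be checked in isolation. This is a rough sketch that combines the "+" regex from the diff with the '/s01e01' fallback; the resolve_video_id function is a hypothetical name introduced here, and no API request is made.

```python
import re

# Proposed pattern, copied from the "+" line of the diff above.
PROPOSED_VALID_URL = (
    r'https?://gem\.cbc\.ca/(?:media/)?'
    r'(?P<id>[0-9a-z-]+(?:/s[0-9]+[a-z][0-9]+)?)/?(?:$|[#?])'
)


def resolve_video_id(url):
    """Hypothetical helper mirroring the patched logic: match, then default to s01e01."""
    match = re.match(PROPOSED_VALID_URL, url)
    if not match:
        return None
    video_id = match.group('id')
    if '/' not in video_id:
        # Same fallback as in the diff: slugs without a suffix get /s01e01.
        video_id += '/s01e01'
    return video_id


for url in (
    'https://gem.cbc.ca/score-a-hockey-musical',     # one-off special
    'https://gem.cbc.ca/the-passionate-eye/s03e06',  # regular episode
    'https://gem.cbc.ca/browse',                     # non-video page, also matched
):
    print(url, '->', resolve_video_id(url))
```

The special and the episode both resolve to an id with a season/episode suffix, and the /browse page is matched as well, which is the side effect mentioned above.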

Also, while experimenting with the extractor, I noticed that the timestamp value was being extracted in milliseconds instead of seconds, which causes an OverflowError if the output template contains something like %(timestamp>%Y%m%d)s. So we could also fix that, e.g.:

diff --git a/yt_dlp/extractor/cbc.py b/yt_dlp/extractor/cbc.py
index be2d13e44..d6bc7db02 100644
--- a/yt_dlp/extractor/cbc.py
+++ b/yt_dlp/extractor/cbc.py
@@ -461,8 +461,8 @@ def _real_extract(self, url):
             'duration': video_info.get('duration'),
             'categories': [video_info.get('category')],
             'formats': formats,
-            'release_timestamp': video_info.get('airDate'),
-            'timestamp': video_info.get('availableDate'),
+            'release_timestamp': int_or_none(video_info.get('airDate'), scale=1000),
+            'timestamp': int_or_none(video_info.get('availableDate'), scale=1000),
         }
 
 

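To illustrate the overflow described above, here is a minimal sketch. It assumes that date formatting adds the timestamp, interpreted as seconds, to the Unix epoch. int_or_none_sketch is a simplified stand-in for yt-dlp's int_or_none (only the scale argument is modelled), format_timestamp is a hypothetical helper, and the sample value is made up for illustration.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def int_or_none_sketch(value, scale=1):
    """Simplified stand-in for yt-dlp's int_or_none: int(value) // scale, or None."""
    if value is None:
        return None
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return None


def format_timestamp(seconds, fmt='%Y%m%d'):
    """Hypothetical helper: format a Unix timestamp given in seconds."""
    return (EPOCH + datetime.timedelta(seconds=seconds)).strftime(fmt)


# Made-up sample value: 2023-10-19 00:00:00 UTC expressed in milliseconds.
available_date_ms = 1697673600000

try:
    format_timestamp(available_date_ms)  # milliseconds treated as seconds
except OverflowError as exc:
    print('OverflowError:', exc)

# Scaling down by 1000 first gives a timestamp in seconds that formats fine.
print(format_timestamp(int_or_none_sketch(available_date_ms, scale=1000)))
```

Treating the millisecond value as seconds pushes the date past the maximum year that datetime supports, while the scaled value formats as an ordinary 2023 date.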
bashonly added the patch-available label and removed the triage label on Oct 19, 2023
@gamer191
Collaborator

+        if '/' not in video_id:
+            video_id += '/s01e01'

What does the website append to API calls? IMO we should try to emulate the website when possible.

although it would have the side effect of also matching URLs like https://gem.cbc.ca/browse

My guess is that those currently fail and would continue to fail, so it doesn't really matter.

@bashonly
Member

bashonly commented Oct 19, 2023

What does the website append to API calls? Imo we should try to emulate the website when possible

it appends /s01e01

the CBC Gem API always expect[s] a /s??e?? in the URL slug

@h0m3rth0mps0n
Author

Thanks for the workaround. I'm able to download the video this way.

As for the proposed fix, if I may add my two cents: this would also cause URLs like https://gem.cbc.ca/the-passionate-eye/ to implicitly download s01e01, would it not? That may or may not be desirable 🤷‍♂️

@bashonly
Member

bashonly commented Oct 20, 2023

this would also cause urls like https://gem.cbc.ca/the-passionate-eye/ to implicitly go and download s01e01, would it not? May or may not be desirable

Correct, although full-series links are not currently supported either; only individual season links are supported.

Maybe the best solution would be to implement a full-series extractor, which would process these films/specials as one-episode playlists.

@h0m3rth0mps0n
Author

Maybe the best solution would be to implement a full series extractor, which would process these films/specials as 1 episode playlists

Sounds good to me :)

bashonly added the site-enhancement label and removed the patch-available label on Oct 20, 2023
@makew0rld
Contributor

Working on this now. It looks like I can use the Next.js JSON embedded in the HTML to figure out whether the URL is for a standalone video (like the one this issue is about) or a series.
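
A rough sketch of that idea, under stated assumptions: Next.js pages embed their data in a script tag with the id __NEXT_DATA__, but the key path used below (props.pageProps.data.contentType) and its values are hypothetical, since the thread does not show CBC Gem's actual JSON layout. Inside yt-dlp an extractor would more likely rely on the built-in _search_nextjs_data helper than on a hand-written regex.

```python
import json
import re

_NEXT_DATA_RE = re.compile(
    r'<script[^>]+id="__NEXT_DATA__"[^>]*>(?P<json>.*?)</script>', re.DOTALL)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON from a page, or {} if absent."""
    match = _NEXT_DATA_RE.search(html)
    return json.loads(match.group('json')) if match else {}


def classify_page(html):
    """Classify a page as 'video' or 'playlist' using a HYPOTHETICAL key path."""
    page_props = extract_next_data(html).get('props', {}).get('pageProps', {})
    # ASSUMPTION: 'data' and 'contentType' are invented for illustration only.
    content_type = page_props.get('data', {}).get('contentType')
    if content_type == 'standalone':
        return 'video'
    if content_type == 'series':
        return 'playlist'
    return None


def make_sample_page(content_type):
    """Build a made-up page carrying the hypothetical JSON layout."""
    payload = {'props': {'pageProps': {'data': {'contentType': content_type}}}}
    return ('<html><body><script id="__NEXT_DATA__" type="application/json">'
            + json.dumps(payload) + '</script></body></html>')


print(classify_page(make_sample_page('standalone')))
print(classify_page(make_sample_page('series')))
```

With this split, a standalone special could be sent straight to the existing single-video path, while a series page could be expanded into a playlist of its episodes.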

makew0rld added a commit to makew0rld/yt-dlp that referenced this issue Dec 17, 2023
@makew0rld
Contributor

OK, I've opened a PR that should address this: #8790
