Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added CBCPlayerPlaylistIE #7870

Merged
merged 9 commits into from Aug 20, 2023
Merged

Conversation

trainman261
Copy link
Contributor

@trainman261 trainman261 commented Aug 16, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This adds support for CBC Player playlists. There doesn't seem to be any direct connection to ThePlatform here, so we need to extract this information from JSON in the website - with the difficulty being that it's not marked as JSON. Therefore some methods are a bit blunt. Improvements are welcome.
One question is still kind of open for me: what is considered a playlist and what not? The sites I added in the tests are more or less clear playlists, although they could also be seen as a collection of videos by category. However, when on the page for an individual video, it will often list other videos in the same category (see the videos from the playlists in the tests). I haven't treated the pages for individual videos as playlists. However, internally, the webpage treats them the same way, so it would be a minor change. Let me know if I should change this.

Fixes #

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at f692108

Summary

✨📝🔍

Add support for CBC player playlists. Create a new extractor class CBCPlayerPlaylistIE in cbc.py and register it in _extractors.py.

CBCPlayerPlaylistIE
Extracts videos from HTML
A new class for fall

Walkthrough

  • Add support for extracting videos from CBC player playlists (link, link, link)
  • Import get_element_by_id function from utils.py to find playlist JSON data in HTML page (link)
  • Define CBCPlayerPlaylistIE class in cbc.py that inherits from InfoExtractor and overrides _real_extract method (link)

yt_dlp/extractor/cbc.py Outdated Show resolved Hide resolved
yt_dlp/extractor/cbc.py Outdated Show resolved Hide resolved
yt_dlp/extractor/cbc.py Outdated Show resolved Hide resolved
yt_dlp/extractor/cbc.py Outdated Show resolved Hide resolved
yt_dlp/extractor/cbc.py Outdated Show resolved Hide resolved
@bashonly bashonly added site-request Request to support a new website pending-fixes PR has had changes requested labels Aug 16, 2023
trainman261 and others added 5 commits August 18, 2023 00:00
Yes, that should match it better.

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
Ah OK, that's it. I couldn't get _search_json working for the life of me...

Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
Co-authored-by: bashonly <88596187+bashonly@users.noreply.github.com>
@trainman261
Copy link
Contributor Author

Alright, fixed up the mentioned points. Anything I missed or still needs to be done?
Another note: since the fixes, I'm having more problems with pypy. It's intermittent, and I have no clue exactly what's going on. The first test will complete until it prints "Downloading item 21 of 25", then I get the Windows message "pypy 3.7 has stopped working" - no warning from pypy itself. I had the issue before, but it was rare, now it's happening more often, it's kind of a 50/50 thing. It's entirely possible that this is an issue with my system. Does anyone else have this issue?

@trainman261
Copy link
Contributor Author

trainman261 commented Aug 19, 2023

Another issue I've noticed: filtering items in the playlist doesn't really work, because the info fields available in the playlist are different than those available from _extract_theplatform_metadata in ThePlatform.py. The result being: it will filter the playlist entries, start extracting the individual filtered video, then cancel downloading the video because the filter match doesn't match anymore, and apparently the info from the playlist is discarded.
My first instinct is to use _extract_theplatform_metadata in the CBCPlayerPlaylistIE but it's a class method, so I would need to create a ThePlatformBaseIE for that to work, which seems to me like overkill. Or am I missing something? (Not super experienced with Python, as mentioned before)
What would be the right concept to fix this issue? I'm thinking it would make the most sense to extract the information from ThePlatform in order to have a single source of truth and potentially offer further information that can be used for filtering.

@bashonly
Copy link
Member

Anything I missed or still needs to be done?

import urllib.parse instead of import urllib

Another note: since the fixes, I'm having more problems with pypy...

I don't use pypy, but it seems like it might be a bug in pypy's JIT

apparently the info from the playlist is discarded.

This is what happens to url result info. url_transparent results will have their info override the final extractor's info, but since this is a playlist extractor, I don't think that is what we want to do here.

the info fields available in the playlist are different than those available from _extract_theplatform_metadata in ThePlatform.py

Then I think it would be better to not extract any of this extra metadata with the playlist url results, and only use what the video extractor returns, like this:

                yield self.url_result(f'https://www.cbc.ca/player/play/{video["id"]}', CBCPlayerIE)

@trainman261
Copy link
Contributor Author

OK, fixed that up. Anything else?

@pukkandan pukkandan removed the pending-fixes PR has had changes requested label Aug 20, 2023
yt_dlp/extractor/cbc.py Outdated Show resolved Hide resolved
@bashonly bashonly merged commit ed71189 into yt-dlp:master Aug 20, 2023
13 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
site-request Request to support a new website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants