[extractor/cbsnews] Overhaul extractors #6681
Merged
- Get `cbsnews.com` extraction working again:
  - The `CBSNews` extractor was previously subclassed from the `CBSIE` concrete extractor and used an API endpoint that CBS seems to have decommissioned. Furthermore, it seems that almost all cbs.com and paramountplus.com content is now DRM-protected; there are some exceptions, but they seem to be rare. Hence the CBSNews overhaul.
  - Very old cbsnews.com video embeds (such as the videos in the first `CBSNewsIE` test case) will no longer play on the website. They instead show a redirect link to watch the video on ParamountPlus, but in my testing the links have always resulted in a 404.
- Sometime last year, all of the CBS-owned news station sites were moved from cbslocal.com to cbsnews.com. Because of this, and because the CBSNews and CBSLocal extractors now share a lot of common methods, I have removed the `cbslocal` module and moved its extractors into `cbsnews.py`. The old URLs now redirect to the new URL format, e.g. `losangeles.cbslocal.com` now redirects to `www.cbsnews.com/losangeles/`, and so do all of its article and video links, unless the article/video has been removed. The old URL formats are no longer supported, since any old-style links that are still alive will redirect to a supported URL.
- Did some cleanup in the Anvato extractor; the CBSLocal extractors were the only extractors to use the `_extract_anvato_videos` method, and it no longer works.
- Also added support for the national CBS News livestream and the CBS-owned local stations' livestreams.
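The host-to-path change described above can be illustrated with a small sketch. It covers only the station landing-page example given in the description (`losangeles.cbslocal.com` to `www.cbsnews.com/losangeles/`). The helper name `station_landing_url` is hypothetical and not part of the extractor, and article/video links are resolved by the site's own redirects rather than by a rule like this one.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helper for illustration only; not part of yt-dlp.
# Matches hosts such as "losangeles.cbslocal.com", but not "www.cbslocal.com".
_OLD_HOST_RE = re.compile(r'^(?!www\.)(?P<station>[a-z0-9-]+)\.cbslocal\.com$')


def station_landing_url(old_url):
    """Map an old-style station landing page to the new cbsnews.com form.

    Returns None when the host is not a <station>.cbslocal.com host.
    """
    host = (urlsplit(old_url).hostname or '').lower()
    match = _OLD_HOST_RE.match(host)
    if not match:
        return None
    return f'https://www.cbsnews.com/{match.group("station")}/'


print(station_landing_url('https://losangeles.cbslocal.com/'))
# https://www.cbsnews.com/losangeles/
```

Because live old-style links redirect on their own, the extractor does not need a mapping like this; the sketch only makes the new URL shape concrete.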
Closes #6565
🤖 Generated by Copilot at 48cce62
Summary
🔀🧹🛠️
This pull request improves and fixes some CBS-related extractors, and removes an unused and redundant extractor. It simplifies the Anvato extractor by using existing helper functions and arguments. It also reorganizes the imports of the CBS extractors to avoid circular dependencies.
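As a rough illustration of what "using existing ... arguments" can mean for failure handling, the sketch below shows a download helper with a `fatal` switch. yt-dlp's download helpers accept a `fatal` argument, but `fetch_json` and its `fetch` callable are stand-ins invented for this example; they are not the library's implementation, and the return value on failure is an assumption of the sketch.

```python
import json


def fetch_json(fetch, url, fatal=True):
    """Stand-in for a download helper with a ``fatal`` switch.

    ``fetch`` is any callable that returns the response body as text.
    With fatal=True a failure raises; with fatal=False it yields None.
    """
    try:
        return json.loads(fetch(url))
    except (OSError, ValueError) as err:
        if fatal:
            raise RuntimeError(f'Unable to download JSON from {url}: {err}') from err
        return None


def broken_fetch(url):
    # Simulates a network failure.
    raise OSError('connection refused')


# Non-fatal: the caller decides how to proceed when nothing came back.
print(fetch_json(broken_fetch, 'https://example.invalid/api', fatal=False))  # None
```

The design point is that the caller states up front whether a failed request should abort extraction, instead of encoding that decision in an error message.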
Walkthrough
- Reorganizing imports in `yt_dlp/extractor/_extractors.py` and removing unused CBSInteractiveIE
- Using `fatal` instead of `errnote` for indicating download failure in the `_get_anvato_videos` method of `AnvatoIE`
- Removing the `_extract_anvato_videos` method from `AnvatoIE`
- [...] `yt_dlp/extractor/cbsnews.py`
- Adding `skip` key for unavailable video, correcting `ext` value for subtitles, adding missing `id` key, and specifying `skip_download` parameter for m3u8 formats
- Changing `_TEST` to `_TESTS` and adding closing bracket
- [...] `_real_extract` method of CBSNewsLiveVideoIE by using the `traverse_obj` utility function
- Removing the `yt_dlp/extractor/cbslocal.py` file as it is no longer needed
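The test-case changes listed in the walkthrough (`_TESTS`, `skip`, `id`, `ext`, `skip_download`) can be pictured with a minimal sketch. `ExampleNewsIE`, its URL pattern, and every value in it are hypothetical. The class deliberately does not subclass yt-dlp's `InfoExtractor`, so it shows only the shape of the data, not the PR's actual test cases.

```python
import re


class ExampleNewsIE:
    """Hypothetical extractor-like class, used only to show the test layout."""

    _VALID_URL = r'https?://(?:www\.)?example\.com/video/(?P<id>[\w-]+)'

    # Plural `_TESTS`: a list of test-case dicts rather than a single dict.
    _TESTS = [{
        'url': 'https://www.example.com/video/sample-clip',
        'info_dict': {
            'id': 'sample-clip',
            'ext': 'mp4',
            'title': 'Sample clip',
        },
        # Skip the media download; the value notes that the formats are m3u8.
        'params': {'skip_download': 'm3u8'},
    }, {
        'url': 'https://www.example.com/video/removed-clip',
        'info_dict': {'id': 'removed-clip', 'ext': 'mp4'},
        # A `skip` key carries the reason the case cannot run.
        'skip': 'Video no longer available',
    }]


# Sanity check: each test URL matches the pattern and yields the expected id.
for case in ExampleNewsIE._TESTS:
    match = re.match(ExampleNewsIE._VALID_URL, case['url'])
    assert match and match.group('id') == case['info_dict']['id']
```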