[extractor/cbsnews] Overhaul extractors #6681

Merged
merged 3 commits into from May 29, 2023

Conversation

@bashonly (Member) commented Mar 30, 2023

Get cbsnews.com extraction working again:

  • The CBSNews extractor was previously subclassed from the concrete CBSIE extractor and used an API endpoint that CBS seems to have decommissioned. In addition, almost all cbs.com and paramountplus.com content now appears to be DRM-protected; exceptions exist but seem to be rare. Hence the CBSNews overhaul.

  • Very old cbsnews.com video embeds (such as the videos in the first CBSNewsIE test case) no longer play on the website. They instead show a redirect link to watch the video on ParamountPlus, but in my testing those links have always resulted in a 404.

  • Sometime last year, all of the CBS-owned news station sites were moved from cbslocal.com to cbsnews.com. Because of this, and because the CBSNews and CBSLocal extractors now share many common methods, I have removed the cbslocal module and moved its extractors into cbsnews.py. The old URLs now redirect to the new URL format, e.g. losangeles.cbslocal.com redirects to www.cbsnews.com/losangeles/, as do all of its article and video links unless the article/video has been removed. The old URL formats are no longer supported, since any old-style link that is still alive will redirect to a supported URL.

  • Did some cleanup in the Anvato extractor: the CBSLocal extractors were the only ones using the _extract_anvato_videos method, and it no longer works.

  • Also added support for the national CBS News livestream and the livestreams of the CBS-owned local stations.
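To make the station-site move described above concrete, here is a minimal sketch of the hostname mapping. It only illustrates the one example given in this PR (losangeles.cbslocal.com to www.cbsnews.com/losangeles/); the helper name and the regex are hypothetical and are not part of yt-dlp, and the redirect itself is performed by the CBS servers, not by the extractor. How article and video paths map is not covered here.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern for old-style station hostnames such as
# losangeles.cbslocal.com; assumed for illustration only.
_OLD_HOST_RE = re.compile(r'^(?P<station>[a-z0-9-]+)\.cbslocal\.com$')


def station_home_from_old_url(old_url):
    """Return the new-style station home page for an old cbslocal.com URL.

    Returns None when the URL is not an old-style station URL.
    """
    host = (urlsplit(old_url).hostname or '').lower()
    match = _OLD_HOST_RE.match(host)
    if not match:
        return None
    return f'https://www.cbsnews.com/{match.group("station")}/'


print(station_home_from_old_url('https://losangeles.cbslocal.com/'))
```

In practice no such mapping is needed in the extractor, because any old link that is still alive already redirects to a supported URL.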

Closes #6565

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

🤖 Generated by Copilot at 48cce62

Summary

🔀🧹🛠️

This pull request improves and fixes some CBS-related extractors, and removes an unused and redundant extractor. It simplifies the Anvato extractor by using existing helper functions and arguments. It also reorganizes the imports of the CBS extractors to avoid circular dependencies.

We're heaving on the ropes, me hearties, on the count of three
We're moving all the CBS extractors to where they ought to be
We're simplifying Anvato and fixing up the tests
We're making yt_dlp better than the rest

Walkthrough

  • Consolidate CBS-related extractors into one file and avoid circular imports by moving CBSLocalIE and CBSLocalArticleIE to yt_dlp/extractor/_extractors.py and removing unused CBSInteractiveIE
  • Use standard argument name fatal instead of errnote for indicating download failure in _get_anvato_videos method of AnvatoIE
  • Remove duplicate _extract_anvato_videos method from AnvatoIE
  • Enable the use of AnvatoIE and ParamountPlusIE in CBS-related classes and follow relative import convention by updating imports in yt_dlp/extractor/cbsnews.py
  • Fix and improve test cases for CBSNewsIE by adding skip key for unavailable video, correcting ext value for subtitles, adding missing id key, and specifying skip_download parameter for m3u8 formats
  • Follow convention and fix syntax error for test cases of CBSNewsLiveVideoIE by renaming _TEST to _TESTS and adding closing bracket
  • Simplify and improve readability of return value of _real_extract method of CBSNewsLiveVideoIE by using traverse_obj utility function
  • Delete yt_dlp/extractor/cbslocal.py file as it is no longer needed
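For readers unfamiliar with the test-case keys mentioned in the walkthrough (_TESTS, id, ext, skip, and skip_download for m3u8 formats), the following is a rough sketch of their shape. The class is a plain stand-in and does not subclass yt-dlp's InfoExtractor; the URL pattern, the example URL, and the ids_from_tests helper are all made up for illustration and are not taken from cbsnews.py.

```python
import re


class ExampleLiveVideoSketch:
    """Stand-in for an extractor class; real ones subclass InfoExtractor."""

    # Hypothetical pattern, not the one used in cbsnews.py
    _VALID_URL = r'https?://(?:www\.)?example\.com/live/video/(?P<id>[^/?#]+)'

    # A list of test cases, each a dict
    _TESTS = [{
        'url': 'https://www.example.com/live/video/some-event/',
        'info_dict': {
            'id': 'some-event',  # expected video ID
            'ext': 'mp4',        # expected container extension
        },
        # Do not download the HLS (m3u8) media while testing
        'params': {'skip_download': 'm3u8'},
        # Reason shown when a test is skipped, e.g. the video is unavailable
        'skip': 'Video gone',
    }]


def ids_from_tests(ie_class):
    """Match each test URL against _VALID_URL and collect the captured IDs."""
    ids = []
    for test in ie_class._TESTS:
        match = re.match(ie_class._VALID_URL, test['url'])
        ids.append(match.group('id') if match else None)
    return ids


print(ids_from_tests(ExampleLiveVideoSketch))
```

The helper only checks that each test URL is matched by the pattern and that the captured group agrees with the expected id; it does not run any extraction.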

@bashonly added the site-bug (Issue with a specific website) label Mar 30, 2023
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
@bashonly bashonly merged commit f6e43d6 into yt-dlp:master May 29, 2023
11 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Labels
site-bug Issue with a specific website
Development

Successfully merging this pull request may close these issues.

cbsnews.com Most canged something in the last 7 days.
2 participants