[extractor/rheinmaintv] Add extractor #7311
Extractor for rheinmaintv.de.
Add new extractor to _extractors.py.
Fix a potential problem in _html_search_regex: _search_regex may return a tuple when several subpatterns are selected. (Moreover, the result of clean_html is stripped already.)
Fixed the test cases.
Cosmetic changes.
Fixed indentation. Oops!
Cosmetic changes.
Test cases completed.
Changed comments to make linter (flake8) happy. (Although commented-out code should in fact start with ##.)
Cosmetic changes
Linter (flake8) did not approve the line splits...
Improved video_id and JSON-LD extraction.
Improved/fixed fallbacks.
Use a library function instead of a quick-and-dirty solution. (Plus, yet another small change of layout.)
Oops, still overlooked one issue... Use library function instead of new Python method. (Although this part of the code should, hopefully, never be executed.)
Further improved _html_search_regex.
Leave the formats alone. The final extension should be established elsewhere.
If the info_dict contains an extension, save it as the preferred format for merging (unless specified otherwise). In case of a merge, the extension will be overridden by the chosen (best) format without notice (for backwards compatibility, as the comment says). This looks very ugly but is still safer than overriding the extensions of all available formats. It would probably be much better to leave the info_dict alone.
Revert the change.
Reverted the change and sorted the imports.
Fixed a glitch during the merge.
Consider any format (file extension) in the info_dict when merging video/audio formats.
Added another test and improved a comment.
Undo the latest changes.
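The `_html_search_regex` fix listed among the commits above can be sketched as follows. This is a self-contained illustration, not yt-dlp's code: `clean_html_sketch`, `search_regex_sketch` and `html_search_regex_sketch` are hypothetical, simplified stand-ins (the real `_search_regex` also handles defaults, `fatal` and error reporting). The point is that when several subpatterns are requested the search returns a tuple, so each element must be cleaned separately, and no extra `.strip()` is needed because cleaning already strips.

```python
import re


def clean_html_sketch(html):
    # Hypothetical, heavily simplified stand-in for clean_html:
    # drop tags, collapse whitespace, and strip the result.
    if html is None:
        return None
    text = re.sub(r'<[^>]+>', '', html)
    return re.sub(r'\s+', ' ', text).strip()


def search_regex_sketch(pattern, string, group=None):
    # Hypothetical stand-in for _search_regex: returns None on no match
    # instead of raising, and a tuple when several groups are requested.
    mobj = re.search(pattern, string)
    if mobj is None:
        return None
    if group is None:
        return mobj.group(1) if mobj.re.groups else mobj.group(0)
    if isinstance(group, (list, tuple)):
        return mobj.group(*group)  # several subpatterns -> tuple
    return mobj.group(group)


def html_search_regex_sketch(pattern, string, group=None):
    res = search_regex_sketch(pattern, string, group)
    if isinstance(res, tuple):
        # Clean every element instead of handing the tuple to the cleaner.
        return tuple(clean_html_sketch(r) for r in res)
    # The cleaner already strips, so no extra .strip() here.
    return clean_html_sketch(res)


# Made-up sample markup for illustration.
SAMPLE_HTML = '<h1> <b>Bricks4Kids</b> </h1><span>vom 22.06.2022</span>'
```

For example, `html_search_regex_sketch(r'<h1>(?P<title>.+?)</h1><span>(?P<date>.+?)</span>', SAMPLE_HTML, group=('title', 'date'))` yields a tuple of two cleaned strings rather than failing on a tuple argument.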
The old `_VALID_URL` did not match URLs with alphanumeric `display_id`s, such as https://www.rheinmaintv.de/sendungen/beitrag-video/bricks4kids/vom-22.06.2022/
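A minimal illustration of the `display_id` problem, assuming a letters-and-hyphens character class was the culprit. Both patterns below are hypothetical and neither is claimed to be the old or the merged `_VALID_URL`; they only show that `[a-z-]+` rejects `bricks4kids` while `[\w-]+` (letters, digits, underscore, hyphen) accepts it.

```python
import re

# Hypothetical restrictive pattern: display_id may only contain letters and hyphens.
LETTERS_ONLY = re.compile(
    r'https?://(?:www\.)?rheinmaintv\.de/sendungen/(?:[a-z-]+/)*'
    r'(?P<display_id>[a-z-]+)/vom-(?P<date>\d{2}\.\d{2}\.\d{4})/?')

# Hypothetical relaxed pattern: \w also admits digits in the display_id.
ALNUM = re.compile(
    r'https?://(?:www\.)?rheinmaintv\.de/sendungen/(?:[\w-]+/)*'
    r'(?P<display_id>[\w-]+)/vom-(?P<date>\d{2}\.\d{2}\.\d{4})/?')

URL = 'https://www.rheinmaintv.de/sendungen/beitrag-video/bricks4kids/vom-22.06.2022/'

for name, pattern in (('letters only', LETTERS_ONLY), ('alphanumeric', ALNUM)):
    mobj = pattern.match(URL)
    print(name, mobj.group('display_id') if mobj else None)
```

The relaxed pattern captures `bricks4kids` as the `display_id` and `22.06.2022` as the date; the restrictive one finds no match at all.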
yt_dlp/extractor/rheinmaintv.py
```python
**traverse_obj(json_ld, {
    'timestamp': 'timestamp',
    'duration': 'duration',
    'view_count': 'view_count',
}),
```
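What the `**traverse_obj(json_ld, {...})` line does can be approximated with a plain dictionary helper. `pick_fields` is a hypothetical, simplified stand-in for the dict-template form of `traverse_obj` (the real function supports nested paths, callables and much more); it omits keys whose value is missing or `None`, which is how I understand the real dict template to behave as well. The sample values are made up.

```python
def pick_fields(obj, template):
    """Hypothetical, simplified stand-in for traverse_obj's dict template."""
    if not isinstance(obj, dict):
        return {}
    result = {}
    for out_key, path in template.items():
        value = obj.get(path)
        if value is not None:  # missing or None values are left out
            result[out_key] = value
    return result


# Made-up JSON-LD data for illustration only.
json_ld = {'timestamp': 1655884800, 'duration': 95, 'view_count': None}

info = {
    'id': 'bricks4kids',
    # Unpacking spreads only the fields that were actually found.
    **pick_fields(json_ld, {
        'timestamp': 'timestamp',
        'duration': 'duration',
        'view_count': 'view_count',
    }),
}
```

Because `view_count` is `None` in the sample, it does not appear in `info` at all, so it cannot overwrite a value supplied elsewhere.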
Why not use `merge_dicts` on the whole thing?
I think the original PR author wanted to mix and match: they wanted to give JSON-LD priority for certain fields but also wanted non-JSON-LD fallbacks for some of them (`description`, `title`).
But yeah, I guess merging the whole thing with JSON-LD as the secondary dict couldn't hurt.
yt_dlp/utils/_utils.py
```python
sanitize_codec = functools.partial(
    try_get, getter=lambda x: x[0].split('.')[0].replace('0', '').lower())
```
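To see what `sanitize_codec` produces, here is a runnable version with a simplified `try_get` stand-in (an assumption: the real yt-dlp `try_get` accepts several getters, but for a single getter it likewise swallows lookup errors and returns `None`). The helper takes a list of codec strings, keeps only the first entry's fourcc, removes zeros so that `vp09` becomes `vp9` and `av01` becomes `av1`, and lowercases the result.

```python
import functools


def try_get(src, getter, expected_type=None):
    # Simplified stand-in: return None instead of raising on failed lookups.
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


sanitize_codec = functools.partial(
    try_get, getter=lambda x: x[0].split('.')[0].replace('0', '').lower())

for codecs in (['vp09.00.10.08'], ['av01.0.05M.08'], ['AVC1'], ['mp4a.40.2'], [], None):
    print(codecs, '->', sanitize_codec(codecs))
```

Empty lists and `None` fall through to `None` instead of raising, which keeps the caller's compatibility check simple.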
I'll merge this separately
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
Authored by: barthelmannk
Co-authored-by: barthelmannk <81305638+barthelmannk@users.noreply.github.com>
Supersedes #5840
PR commit 08d65a6 reapplies a change made in yt-dlp:master commit a538772 that was somehow reverted in 69bec67. Without this fix, ismv+isma formats will not merge into an mp4 container.
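Why case matters for ismv+isma can be shown with a toy compatibility check. This assumes, as an illustration, that ISM manifests report uppercase fourccs such as `AVC1` and `AACL` while the compatibility table lists lowercase names; `MP4_CODECS`, `normalize` and `can_merge_into_mp4` are hypothetical and the real table in `get_compatible_ext` may differ.

```python
# Hypothetical, illustrative codec table; not copied from yt-dlp.
MP4_CODECS = {'avc1', 'h264', 'hevc', 'av1', 'mp4a', 'aacl', 'ec-3'}


def normalize(codec):
    # Same normalization as sanitize_codec, applied to a single string.
    return codec.split('.')[0].replace('0', '').lower()


def can_merge_into_mp4(vcodec, acodec, sanitize=True):
    if sanitize:
        vcodec, acodec = normalize(vcodec), normalize(acodec)
    # A plain, case-sensitive set lookup.
    return vcodec in MP4_CODECS and acodec in MP4_CODECS


print(can_merge_into_mp4('AVC1', 'AACL'))                  # normalized lookup
print(can_merge_into_mp4('AVC1', 'AACL', sanitize=False))  # raw uppercase lookup
```

Without normalization the uppercase fourccs miss the lowercase table entries, so mp4 would not be chosen as the merge container.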
In order to be accepted and merged into yt-dlp, each piece of code must be in the public domain or released under the Unlicense.
Copilot Summary
🤖 Generated by Copilot at 97e4572
Summary
📺🛠️🐛
This pull request adds a new extractor for Rhein-Main TV videos by creating the `RheinMainTVIE` class in the `rheinmaintv.py` module and importing it in the `_extractors.py` module. It also fixes a bug in the `get_compatible_ext` function in the `utils/_utils.py` module by sanitizing the codec names before checking their compatibility with MP4 containers.
Walkthrough
- `rheinmaintv.py` module that defines `RheinMainTVIE` class
- `_real_extract` method to parse video information and formats from webpage
- `RheinMainTVIE` class in `_extractors.py` and add valid URLs
- `RheinMainTVIE` in `rheinmaintv.py`
- `sanitize_codec` function in `utils/_utils.py` to convert codecs to lowercase