[extractor/rheinmaintv] Add extractor #7311

Merged
merged 39 commits into from Jun 22, 2023

bashonly
Member

@bashonly bashonly commented Jun 14, 2023

Supersedes #5840

PR commit 08d65a6 reapplies a change from yt-dlp:master commit a538772 that was somehow reverted in 69bec67. Without this fix, ismv+isma formats will not merge into an mp4 container.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence): almost all code was authored by barthelmannk in RheinMainTV #5840

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 97e4572

Summary

📺🛠️🐛

This pull request adds a new extractor for Rhein-Main TV videos, by creating the RheinMainTVIE class in the rheinmaintv.py module and importing it in the _extractors.py module. It also fixes a bug in the get_compatible_ext function in the utils/_utils.py module, by sanitizing the codec names before checking their compatibility with MP4 containers.

RheinMainTVIE added
Extract videos from German channel
Winter nights are bright

Walkthrough

  • Implement video extraction for Rhein-Main TV (link,link)
    • Add rheinmaintv.py module that defines RheinMainTVIE class (link)
    • Override _real_extract method to parse video information and formats from webpage (link)
    • Import RheinMainTVIE class in _extractors.py and add valid URLs (link)
    • Add tests for RheinMainTVIE in rheinmaintv.py (link)
  • Fix bug with uppercase codecs in MP4 containers (link)
    • Modify sanitize_codec function in utils/_utils.py to convert codecs to lowercase (link)
    • Avoid incorrect detection of incompatible formats such as ISMV (link)
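The codec fix described above can be illustrated with a minimal, self-contained sketch. The `sanitize_codec` expression mirrors the one quoted in this PR's review; the `try_get` stand-in, the `MP4_VIDEO_CODECS` set, the `is_mp4_compatible` helper, and the sample codec strings are assumptions made for illustration and are not the actual yt-dlp implementation.

```python
import functools


def try_get(src, getter):
    """Simplified stand-in (assumption) for a helper that swallows lookup errors."""
    try:
        return getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None


# Same expression as quoted in the review: take the first codec string,
# drop the profile suffix after the first dot, strip zeros, then lowercase.
sanitize_codec = functools.partial(
    try_get, getter=lambda x: x[0].split('.')[0].replace('0', '').lower())

# Assumed, illustrative set of sanitized video codec names accepted for MP4.
MP4_VIDEO_CODECS = {'avc1', 'hevc', 'av1'}


def is_mp4_compatible(vcodecs):
    """Hypothetical check: compare the sanitized codec name against the set."""
    return sanitize_codec(vcodecs) in MP4_VIDEO_CODECS


# Without .lower(), 'AVC1' would not match the lowercase 'avc1' entry.
print(is_mp4_compatible(['AVC1.64001F']))
print(sanitize_codec(['av01.0.05M.08']))
```

Because the comparison set is lowercase, an uppercase codec name only matches once the sanitizer lowercases it, which is the behavior the walkthrough attributes to this change.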

barthelmannk and others added 30 commits December 19, 2022 17:06
Extractor for rheinmaintv.de.
Add new extractor to _extractors.py.
Fix a potential problem in _html_search_regex: _search_regex may return a tuple when several subpatterns are selected. (Moreover, the result of clean_html is stripped already.)
Fixed the test cases.
Cosmetic changes.
Fixed indentation. Oops!
Cosmetic changes.
Test cases completed.
Changed comments to make linter (flake8) happy. (Although commented-out code should in fact start with ##.)
Cosmetic changes
Linter (flake8) did not approve the line splits...
Improved video_id and JSON-LD extraction.
Improved/fixed fallbacks.
Use library function instead of quick&dirty solution.
(Plus, yet another small change of layout.)
Oops, still overlooked one issue... Use library function instead of new Python method. (Although this part of the code should, hopefully, never be executed.)
Further improved _html_search_regex.
Leave the formats alone. The final extension should be established elsewhere.
If the info_dict contains an extension, save it as the preferred format for merge (if not specified otherwise). In case of a merge, the extension will be overridden by the chosen (best) format without notice (for backwards compatibility, as the comment says).

This looks very ugly but still safer than overriding extensions in all available formats. It would probably be much better to leave the info_dict alone.
Revert the change.
Reverted the change and sorted the imports.
Fixed a glitch during the merge.
Consider any format (file extension) in the info_dict when merging video/audio formats.
Added another test and improved a comment.
Undo the latest changes.
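
One of the commits above notes that a regex search helper may return a tuple when several subpatterns are selected. The following is a rough sketch of that situation using only the standard `re` module; `search_groups` and `clean_result` are hypothetical names and do not reproduce the extractor's actual helpers.

```python
import re


def search_groups(pattern, text, group=1):
    """Hypothetical helper: one group yields a str, several groups yield a tuple."""
    match = re.search(pattern, text)
    if match is None:
        return None
    if isinstance(group, (list, tuple)):
        return tuple(match.group(g) for g in group)
    return match.group(group)


def clean_result(result):
    """Strip whitespace whether the result is a single string or a tuple."""
    if isinstance(result, tuple):
        return tuple(s.strip() if s is not None else None for s in result)
    return result.strip() if result is not None else None


html = '<h1> Title </h1><p> Text </p>'
single = clean_result(search_groups(r'<h1>(.*?)</h1>', html))
pair = clean_result(
    search_groups(r'<h1>(.*?)</h1><p>(.*?)</p>', html, group=(1, 2)))
print(single, pair)
```

Calling a string method directly on the tuple result would raise `AttributeError`, so the post-processing step has to handle both shapes.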
@bashonly bashonly added the site-request Request to support a new website label Jun 14, 2023
Comment on lines 95 to 99
**traverse_obj(json_ld, {
    'timestamp': 'timestamp',
    'duration': 'duration',
    'view_count': 'view_count',
}),
Member

Why not merge_dicts with the whole thing?

Member Author

I think the original PR author wanted to mix and match: they wanted to give priority to JSON-LD for certain fields but also wanted non-JSON-LD fallbacks for those (description, title).

But yeah, I guess merging the whole thing with JSON-LD as the secondary dict couldn't hurt.
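
To make the "JSON-LD as secondary dict" idea concrete, here is a minimal sketch of a priority merge. `merge_dicts_sketch` is a simplified, hypothetical stand-in written for this example, and the sample values in `html_fallback` and `json_ld` are made up; it is not the library's own merge function.

```python
def merge_dicts_sketch(*dicts):
    """Simplified stand-in: earlier dicts win, and None values are skipped."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            if value is None:
                continue  # a missing value never blocks a later fallback
            if key not in merged:
                merged[key] = value
    return merged


# Made-up sample data: fields scraped from the page vs. fields from JSON-LD.
html_fallback = {
    'title': 'Page title',
    'description': 'Page description',
    'duration': 999,
}
json_ld = {'title': None, 'timestamp': 1671469560, 'duration': 120}

# Primary dict first, JSON-LD second: JSON-LD only fills the gaps.
merged = merge_dicts_sketch(html_fallback, json_ld)
print(merged)
```

With this ordering the page-derived fields take precedence and JSON-LD contributes only keys the primary dict lacks, such as `timestamp` here. Swapping the argument order would give JSON-LD priority instead.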

yt_dlp/extractor/rheinmaintv.py (outdated, resolved)
Comment on lines 3505 to 3506
sanitize_codec = functools.partial(
    try_get, getter=lambda x: x[0].split('.')[0].replace('0', '').lower())
Member

I'll merge this separately

bashonly and others added 3 commits June 22, 2023 04:54
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
@bashonly bashonly merged commit 98cb1ed into yt-dlp:master Jun 22, 2023
11 checks passed
@bashonly bashonly deleted the pr/rhein branch July 2, 2023 16:38
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Authored by: barthelmannk

Co-authored-by: barthelmannk <81305638+barthelmannk@users.noreply.github.com>
Labels
site-request Request to support a new website

4 participants