[extractor/rheinmaintv] Add extractor #7311

bashonly · 2023-06-14T18:06:24Z

Supersedes #5840

PR commit 08d65a6 reapplies a change made in yt-dlp:master commit a538772 that was somehow reverted in 69bec67. Without this fix, ismv+isma formats will not merge into an mp4 container.

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence): almost all code was authored by barthelmannk in RheinMainTV #5840

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at 97e4572`

Summary

📺🛠️🐛

This pull request adds a new extractor for Rhein-Main TV videos, by creating the RheinMainTVIE class in the rheinmaintv.py module and importing it in the _extractors.py module. It also fixes a bug in the get_compatible_ext function in the utils/_utils.py module, by sanitizing the codec names before checking their compatibility with MP4 containers.

RheinMainTVIE added
Extract videos from German channel
Winter nights are bright

Walkthrough

Implement video extraction for Rhein-Main TV (link,link)
- Add rheinmaintv.py module that defines RheinMainTVIE class (link)
- Override _real_extract method to parse video information and formats from webpage (link)
- Import RheinMainTVIE class in _extractors.py and add valid URLs (link)
- Add tests for RheinMainTVIE in rheinmaintv.py (link)
Fix bug with uppercase codecs in MP4 containers (link)
- Modify sanitize_codec function in utils/_utils.py to convert codecs to lowercase (link)
- Avoid incorrect detection of incompatible formats such as ISMV (link)
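
To illustrate why the case fix described above matters, here is a minimal, self-contained sketch. It reuses the `sanitize_codec` getter quoted in this PR's diff, but the `try_get` stand-in, the `MP4_CODECS` set, the `fits_mp4` helper, and the sample codec strings are simplified assumptions for illustration, not the actual code or data in `yt_dlp/utils/_utils.py`.

```python
import functools


def try_get(src, getter):
    """Simplified stand-in for yt-dlp's try_get: return None if the getter fails."""
    try:
        return getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None


# Same getter as in the diff: take the first codec, drop the profile suffix,
# strip zeros (so 'av01' becomes 'av1'), then lowercase.
sanitize_codec = functools.partial(
    try_get, getter=lambda x: x[0].split('.')[0].replace('0', '').lower())

# The same getter without the final .lower(), i.e. the case-sensitive variant.
sanitize_codec_case_sensitive = functools.partial(
    try_get, getter=lambda x: x[0].split('.')[0].replace('0', ''))

# Illustrative subset only (assumption); not the real compatibility table.
MP4_CODECS = {'av1', 'hevc', 'avc1', 'mp4a'}


def fits_mp4(vcodec, acodec, sanitize):
    """Hypothetical helper: True if both sanitized codec names are in MP4_CODECS."""
    return {sanitize([vcodec]), sanitize([acodec])} <= MP4_CODECS


# Made-up uppercase codec strings, as a site might report them.
with_fix = fits_mp4('AVC1.640028', 'MP4A.40.2', sanitize_codec)
without_fix = fits_mp4('AVC1.640028', 'MP4A.40.2', sanitize_codec_case_sensitive)
```

Under these assumptions, `with_fix` is true and `without_fix` is false: the only difference is whether the codec name is lowercased before the membership check.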

Extractor for rheinmaintv.de.

Add new extractor to _extractors.py.

Fix a potential problem in _html_search_regex: _search_regex may return a tuple when several subpatterns are selected. (Moreover, the result of clean_html is stripped already.)

Fixed the test cases.

Cosmetic changes.

Fixed indentation. Oops!

Cosmetic changes.

Test cases completed.

Changed comments to make linter (flake8) happy. (Although commented-out code should in fact start with ##.)

Cosmetic changes

Linter (flake8) did not approve the line splits...

Improved video_id and JSON-LD extraction.

Improved/fixed fallbacks.

Use library function instead of quick&dirty solution. (Plus, yet another small change of layout.)

Oops, still overlooked one issue... Use library function instead of new Python method. (Although this part of the code should, hopefully, never be executed.)

Further improved _html_search_regex.

Leave the formats alone. The final extension should be established elsewhere.

If the info_dict contains an extension, save it as the preferred format for merge (if not specified otherwise). In case of a merge, the extension will be overridden by the chosen (best) format without notice (for backwards compatibility, as the comment says). This looks very ugly but still safer than overriding extensions in all available formats. It would probably be much better to leave the info_dict alone.

Revert the change.

Reverted the change and sorted the imports.

Fixed a glitch during the merge.

Consider any format (file extension) in the info_dict when merging video/audio formats.

Added another test and improved a comment.

Undo the latest changes.

The old `_VALID_URL` did not match URLs with alphanumeric `display_id`s, such as https://www.rheinmaintv.de/sendungen/beitrag-video/bricks4kids/vom-22.06.2022/
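
The effect of relaxing the pattern can be sketched as follows. Both regexes are hypothetical and written only for this illustration; they are not copied from `rheinmaintv.py`. `STRICT` assumes a `display_id` limited to lowercase letters and hyphens, and `RELAXED` allows any word character.

```python
import re

# Hypothetical patterns for illustration only (not the extractor's real _VALID_URL).
STRICT = re.compile(
    r'https?://(?:www\.)?rheinmaintv\.de/sendungen/(?:[\w-]+/)*'
    r'(?P<display_id>[a-z-]+)/vom-\d{2}\.\d{2}\.\d{4}/?')
RELAXED = re.compile(
    r'https?://(?:www\.)?rheinmaintv\.de/sendungen/(?:[\w-]+/)*'
    r'(?P<display_id>[\w-]+)/vom-\d{2}\.\d{2}\.\d{4}/?')

# URL quoted in the commit message above; its display_id contains a digit.
URL = 'https://www.rheinmaintv.de/sendungen/beitrag-video/bricks4kids/vom-22.06.2022/'

strict_match = STRICT.match(URL)    # fails: '4' is outside [a-z-]
relaxed_match = RELAXED.match(URL)  # succeeds with display_id 'bricks4kids'
```

A test URL of this shape is what guards against the pattern becoming too strict again.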

pukkandan · 2023-06-22T04:46:22Z

yt_dlp/extractor/rheinmaintv.py

+            **traverse_obj(json_ld, {
+                'timestamp': 'timestamp',
+                'duration': 'duration',
+                'view_count': 'view_count',
+            }),


Why not `merge_dicts` with the whole thing?

I think the original PR author wanted to mix and match: they wanted to give priority to JSON-LD for certain fields, but also wanted non-JSON-LD fallbacks for those (description, title).

But yeah, I guess merging the whole thing with JSON-LD as the secondary dict couldn't hurt.
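
For readers unfamiliar with the idea, the suggestion can be sketched like this. The function below is a simplified stand-in for a `merge_dicts`-style helper, written for this example under the assumption that earlier dicts take priority and that `None` or empty-string values can be filled from later dicts; the field values are made up.

```python
def merge_dicts(*dicts):
    """Simplified stand-in: earlier dicts win; None values never overwrite or block."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            if value is None:
                continue
            # Keep the earlier value, except that an empty string may be
            # replaced by a later string value.
            if key not in merged or (isinstance(value, str) and merged[key] == ''):
                merged[key] = value
    return merged


# Made-up data: values scraped from the page come first, JSON-LD is secondary.
page_info = {'id': 'example-id', 'title': 'Page title', 'description': ''}
json_ld = {
    'title': 'JSON-LD title',
    'description': 'JSON-LD description',
    'timestamp': 1655856000,
    'duration': None,
}

info = merge_dicts(page_info, json_ld)
```

Here `info` keeps the page's `title`, takes `description` and `timestamp` from JSON-LD, and has no `duration` key at all, so every JSON-LD field acts as a fallback without listing fields one by one.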

pukkandan · 2023-06-22T04:47:42Z

yt_dlp/utils/_utils.py

+    sanitize_codec = functools.partial(
+        try_get, getter=lambda x: x[0].split('.')[0].replace('0', '').lower())


I'll merge this separately

Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>

Authored by: barthelmannk Co-authored-by: barthelmannk <81305638+barthelmannk@users.noreply.github.com>

barthelmannk and others added 30 commits December 19, 2022 17:06

Add files via upload

3242430

Extractor for rheinmaintv.de.

Update _extractors.py

fe5a5b5

Add new extractor to _extractors.py.

Update common.py

11a29f8

Fix a potential problem in _html_search_regex: _search_regex may return a tuple when several subpatterns are selected. (Moreover, the result of clean_html is stripped already.)

Update rheinmaintv.py

9b43787

Fixed the test cases.

Update rheinmaintv.py

6508252

Cosmetic changes.

Update common.py

d4eb78a

Fixed indentation. Oops!

Update rheinmaintv.py

b06bf38

Cosmetic changes.

Update rheinmaintv.py

68a9421

Test cases completed.

Update rheinmaintv.py

e5aaa5f

Changed comments to make linter (flake8) happy. (Although commented-out code should in fact start with ##.)

Update rheinmaintv.py

93f9432

Cosmetic changes

Update rheinmaintv.py

a500e20

Linter (flake8) did not approve the line splits...

Update rheinmaintv.py

164827c

Improved video_id and JSON-LD extraction.

Update rheinmaintv.py

7224c30

Improved/fixed fallbacks.

Update rheinmaintv.py

6a04563

Use library function instead of quick&dirty solution. (Plus, yet another small change of layout.)

Update rheinmaintv.py

c91da52

Oops, still overlooked one issue... Use library function instead of new Python method. (Although this part of the code should, hopefully, never be executed.)

Update common.py

542cfa5

Further improved _html_search_regex.

Update rheinmaintv.py

bc9503d

Leave the formats alone. The final extension should be established elsewhere.

Merge branch 'master' into rheinmaintv

b4fe90b

Update YoutubeDL.py

636e869

Revert the change.

Update rheinmaintv.py

599a9b7

Reverted the change and sorted the imports.

Update common.py

bf4fdb7

Fixed a glitch during the merge.

Update YoutubeDL.py

fa85dcf

Consider any format (file extension) in the info_dict when merging video/audio formats.

Update rheinmaintv.py

95a84e0

Added another test and improved a comment.

Apply suggestions from code review

84a959e

Update rheinmaintv.py

64c71f1

Undo the latest changes.

Revert core changes

cf015ad

Revert ext hack

2e72388

Test cleanup part 1

b3f3e40

Merge branch 'yt-dlp:master' into pr/rhein

01a49eb

bashonly added 5 commits June 14, 2023 11:07

Cleanup tests part 2

af3f553

[utils] Fix case bug in sanitize_codec

08d65a6

Add test comment

60d531e

Cleanup

937b761

Relax/cleanup _VALID_URL regex

97e4572

The old `_VALID_URL` did not match URLs with alphanumeric `display_id`s, such as https://www.rheinmaintv.de/sendungen/beitrag-video/bricks4kids/vom-22.06.2022/

bashonly added the site-request label (Request to support a new website) Jun 14, 2023

Add test for updated _VALID_URL

eb98f41

pukkandan approved these changes Jun 22, 2023

pukkandan assigned bashonly Jun 22, 2023

bashonly and others added 3 commits June 22, 2023 04:54

remove superfluous comment

07cb271

Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>

revert change to _utils.py

9b23e79

merge dicts

011d56b

bashonly merged commit 98cb1ed into yt-dlp:master Jun 22, 2023
11 checks passed

bashonly deleted the pr/rhein branch July 2, 2023 16:38

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[extractor/rheinmaintv] Add extractor (yt-dlp#7311)

cf84f94

Authored by: barthelmannk Co-authored-by: barthelmannk <81305638+barthelmannk@users.noreply.github.com>
