[ie/bpb] Support more urls and extract more metadata #8119

Merged
Grub4K merged 2 commits into yt-dlp:master from ie/bpb on Sep 16, 2023


Grub4K (Member) commented Sep 15, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This PR allows all possible url formats and extracts more metadata.

Fixes #8093

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 73f6743

Summary

🎥🌐✨

This pull request improves the BpbIE extractor to support more media types and metadata from the Bundeszentrale für politische Bildung website. It also fixes a bug in the determine_ext function by adding a missing entry for video/ogg in the mimetype2ext dictionary.

Oh we are the coders of the high seas
We scrape and we parse and we download with ease
We map the MIME types to the right ext
And we update the BpbIE on the count of three

Walkthrough

  • Rewrite the _real_extract method of the BpbIE extractor to extract more metadata and formats from the webpage and JSON-LD data
  • Update the imports, the _VALID_URL regex, and the _TESTS list, and add helper functions and constants to the BpbIE extractor
  • Add a new entry to the mimetype2ext dictionary in the utils module to map video/ogg to the ogv extension
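
To illustrate the last walkthrough item, here is a minimal sketch of a MIME-type-to-extension lookup. The table `MIME_TO_EXT` and the helper `guess_ext` are hypothetical names made up for this example; they are not yt-dlp's actual `mimetype2ext` implementation, which is more extensive. The sketch only shows why a missing `video/ogg` entry would leave the extension undetermined.

```python
# Illustrative only: a tiny stand-in for a MIME-type-to-extension table.
# MIME_TO_EXT and guess_ext are hypothetical names, not yt-dlp's code.
MIME_TO_EXT = {
    'video/mp4': 'mp4',
    'video/webm': 'webm',
    'video/ogg': 'ogv',  # the kind of entry this PR adds
    'audio/ogg': 'ogg',
}


def guess_ext(mime_type, default=None):
    """Return a file extension for a MIME type, ignoring parameters and case."""
    if not mime_type:
        return default
    # Drop parameters such as '; codecs="theora, vorbis"'
    bare = mime_type.split(';', 1)[0].strip().lower()
    return MIME_TO_EXT.get(bare, default)
```

Without the `video/ogg` row, the lookup in this sketch would fall through to `default`, which mirrors the gap the walkthrough describes.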

@Grub4K added the site-bug (Issue with a specific website) label Sep 15, 2023
Comment on lines +118 to +125
    def _parse_vue_attributes(self, name, string, video_id):
        attributes = extract_attributes(self._search_regex(rf'(<{name}(?:"[^"]*?"|[^>])*>)', string, name))

        for key, value in attributes.items():
            if key.startswith(':'):
                attributes[key] = self._parse_json(value, video_id, transform_source=js_to_json, fatal=False)

        return attributes
Review comment (Member):

Just a note that it might be useful to eventually generalize this and add it as an InfoExtractor helper method, since it is very similar to ZaikoBaseIE._parse_vue_element_attrs (but not similar enough as-is)
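
For readers unfamiliar with the pattern under discussion, the following is a standalone sketch of the same idea using only the Python standard library. It is not the code from this PR and not a proposed InfoExtractor helper: `parse_vue_attributes` and `_FirstTagAttrs` are hypothetical names, and the sketch assumes the bound values are strict JSON, whereas the PR's method passes them through `js_to_json` first.

```python
import json
from html.parser import HTMLParser


class _FirstTagAttrs(HTMLParser):
    """Collect the attributes of the first start tag with the given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name.lower()  # HTMLParser lowercases tag names
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None and tag == self.name:
            self.attrs = dict(attrs)


def parse_vue_attributes(name, html):
    """Return the element's attributes, decoding Vue-bound (':'-prefixed) ones.

    Assumption of this sketch: bound values are strict JSON. Values that
    fail to decode are set to None.
    """
    parser = _FirstTagAttrs(name)
    parser.feed(html)
    parser.close()

    parsed = {}
    for key, value in (parser.attrs or {}).items():
        if key.startswith(':') and value is not None:
            try:
                parsed[key] = json.loads(value)
            except json.JSONDecodeError:
                parsed[key] = None
        else:
            parsed[key] = value
    return parsed
```

A call such as `parse_vue_attributes('media-player', webpage)` (with a hypothetical tag name) would return plain attributes as strings and `:`-prefixed attributes as decoded Python objects.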

@bashonly added the pending-fixes (PR has had changes requested) label Sep 15, 2023
@bashonly added the pending-review (PR needs a review) label and removed the pending-fixes (PR has had changes requested) label Sep 15, 2023
@coletdjnz removed the pending-review (PR needs a review) label Sep 16, 2023
coletdjnz (Member) left a comment

LGTM

@Grub4K merged commit f659e64 into yt-dlp:master Sep 16, 2023
13 checks passed
@Grub4K deleted the ie/bpb branch September 17, 2023 11:02
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Successfully merging this pull request may close these issues.

[site-request] bpb.de (Bundeszentrale für politische Bildung; Germany)