[massengeschmacktv] Update extractor #7813

Merged: 4 commits, Sep 17, 2023


@sb0stn (Contributor) commented Aug 11, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

The current implementation of the Massengeschmack.tv extractor fails to get the title of the video.

Judging by the existing regex, the title used to be contained in an h3 element. The site now uses the following HTML instead:

<div class="heading-wrapper">
  <h2 class="heading-black small">
    <span id="clip-title">Fernsehkritik-TV #202</span>
  </h2>
</div>

This pull request updates the regex to work with the current HTML. The test case was also updated to reflect the new expected values.
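As a rough illustration of the kind of pattern this change calls for, here is a minimal sketch that uses only Python's standard `re` module instead of yt-dlp's helpers. The pattern and the `extract_title` name are assumptions made for this example, not the regex that was merged; the sample markup is the snippet quoted in the description.

```python
import html
import re

# Sample markup copied from the PR description.
WEBPAGE = '''
<div class="heading-wrapper">
  <h2 class="heading-black small">
    <span id="clip-title">Fernsehkritik-TV #202</span>
  </h2>
</div>
'''

# Hypothetical pattern: anchor on the span's id rather than on the
# heading level, so a future h2/h3 swap would not break the match.
TITLE_RE = re.compile(
    r'<span[^>]+\bid=(["\'])clip-title\1[^>]*>(?P<title>[^<]+)</span>',
    re.IGNORECASE)


def extract_title(webpage):
    """Return the clip title, or None if the markup does not match."""
    match = TITLE_RE.search(webpage)
    if match is None:
        return None
    # Decode HTML entities and trim surrounding whitespace.
    return html.unescape(match.group('title')).strip()


print(extract_title(WEBPAGE))
```

Returning `None` on a miss is a choice made for this sketch; the traceback in the logs shows that the extractor itself raises `RegexNotFoundError` when the title cannot be found.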

Logs

Before

[debug] Command-line config: ['--cookies', 'cookies.txt', '-vU', 'https://massengeschmack.tv/play/fktvplus41']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.07.06 [b532a34] (pip)
[debug] Python 3.11.4 (CPython arm64 64bit) - macOS-14.0-arm64-arm-64bit (OpenSSL 3.1.2 1 Aug 2023)
[debug] exe versions: ffmpeg 6.0 (setts), ffprobe 6.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.07.22, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1855 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Available version: stable@2023.07.06, Current version: stable@2023.07.06
yt-dlp is up to date (stable@2023.07.06)
[massengeschmack.tv] Extracting URL: https://massengeschmack.tv/play/fktvplus41
[massengeschmack.tv] fktvplus41: Downloading webpage
ERROR: [massengeschmack.tv] fktvplus41: Unable to extract title; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
  File "/opt/homebrew/Cellar/yt-dlp/2023.7.6_1/libexec/lib/python3.11/site-packages/yt_dlp/extractor/common.py", line 710, in extract
    ie_result = self._real_extract(url)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/yt-dlp/2023.7.6_1/libexec/lib/python3.11/site-packages/yt_dlp/extractor/massengeschmacktv.py", line 32, in _real_extract
    title = clean_html(self._html_search_regex(
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/yt-dlp/2023.7.6_1/libexec/lib/python3.11/site-packages/yt_dlp/extractor/common.py", line 1294, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/yt-dlp/2023.7.6_1/libexec/lib/python3.11/site-packages/yt_dlp/extractor/common.py", line 1258, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)

After

[debug] Command-line config: ['--cookies', 'cookies.txt', '-vU', 'https://massengeschmack.tv/play/fktvplus41']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.07.06 [b532a34] (pip)
[debug] Python 3.11.4 (CPython arm64 64bit) - macOS-14.0-arm64-arm-64bit (OpenSSL 3.1.2 1 Aug 2023)
[debug] exe versions: ffmpeg 6.0 (setts), ffprobe 6.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.07.22, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1855 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Available version: stable@2023.07.06, Current version: stable@2023.07.06
yt-dlp is up to date (stable@2023.07.06)
[massengeschmack.tv] Extracting URL: https://massengeschmack.tv/play/fktvplus41
[massengeschmack.tv] fktvplus41: Downloading webpage
[massengeschmack.tv] fktvplus41: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] fktvplus41: Downloading 1 format(s): hls-2465
[debug] Invoking hlsnative downloader on "https://dla3.massengeschmack.tv/stream/95876ce2afc8f7145a53fd765cdb2bcb/64d6b477/fktvplus41/5750k/fktvplus41_.m3u8"
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 67
[download] Destination: FKTV PLUS #41 [fktvplus41].mp4
[download] 100% of 195.62MiB in 00:00:20 at 9.57MiB/s
[debug] ffprobe command line: ffprobe -hide_banner -show_format -show_streams -print_format json 'file:FKTV PLUS #41 [fktvplus41].mp4'
[debug] ffmpeg command line: ffprobe -show_streams 'file:FKTV PLUS #41 [fktvplus41].mp4'
[FixupM3u8] Fixing MPEG-TS in MP4 container of "FKTV PLUS #41 [fktvplus41].mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:FKTV PLUS #41 [fktvplus41].mp4' -map 0 -dn -ignore_unknown -c copy -f mp4 -bsf:a aac_adtstoasc -movflags +faststart 'file:FKTV PLUS #41 [fktvplus41].temp.mp4'

Fixes [issue not reported yet]

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at fc04e06

Summary

🛠️🖼️✅

Improved an extractor for a German media website and updated its test case.

Oh, we're the crew of the Massengeschmack
And we scrape the web for videos to watch
We've improved our extractor to handle the changes
And we've added a thumbnail field to our test cases

Walkthrough

  • Update title extraction regex to handle different HTML layouts and capitalizations
  • Update test case for MassengeschmackTVIE extractor to match the new title format, video hash, and thumbnail URL
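One way to read "different HTML layouts and capitalizations" is a pattern that accepts either heading level in any letter case. The sketch below shows that idea with the standard `re` module. The pattern, the `heading_text` name, and both sample strings are hypothetical; in particular, the old h3 markup is a guess, since the description only says the title used to sit in an h3 element.

```python
import re

# Hypothetical pattern: accept an <h2> or <h3> heading in any letter case
# and capture everything up to the matching closing tag.
HEADING_RE = re.compile(
    r'<(?P<tag>h[23])\b[^>]*>(?P<inner>.*?)</(?P=tag)\s*>',
    re.IGNORECASE | re.DOTALL)


def heading_text(webpage):
    """Return the first h2/h3 heading's text with nested tags removed."""
    match = HEADING_RE.search(webpage)
    if match is None:
        return None
    # Drop nested tags such as <span>, then collapse runs of whitespace.
    text = re.sub(r'<[^>]+>', '', match.group('inner'))
    return ' '.join(text.split())


# Both samples are made up for illustration.
old_layout = '<H3 class="title">Fernsehkritik-TV #202</H3>'
new_layout = (
    '<h2 class="heading-black small">\n'
    '  <span id="clip-title">Fernsehkritik-TV #202</span>\n'
    '</h2>'
)

print(heading_text(old_layout))
print(heading_text(new_layout))
```

The named backreference `(?P=tag)` keeps an opening `<h2>` from being closed by `</h3>`, and `re.DOTALL` lets the capture span the line breaks in the newer markup.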

@bashonly added the site-bug label (Issue with a specific website) on Aug 21, 2023
@bashonly added the pending-fixes label (PR has had changes requested) on Sep 16, 2023
@bashonly removed the pending-fixes label (PR has had changes requested) on Sep 17, 2023
@bashonly merged commit 81f46ac into yt-dlp:master on Sep 17, 2023
16 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024