[DW] [Deutsche Welle] An extractor error has occurred. (caused by KeyError('media_title')) #4944

Open
9 of 10 tasks
kiufta opened this issue Sep 16, 2022 · 13 comments
Labels
  • patch-available: There is patch available that should fix this issue. Someone needs to make a PR with it
  • site-bug: Issue with a specific website

Comments

@kiufta

kiufta commented Sep 16, 2022

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I remove or skip any mandatory* field

Checklist

Region

Germany

Provide a description that is worded well enough to be understood

I'm trying to download this Deutsche Welle podcast (audio only): https://www.dw.com/de/wenn-der-strom-aus-der-luft-kommt/av-63153677
using the up-to-date yt-dlp package on Arch Linux. It fails with KeyError('media_title').

This URL doesn't contain characters which need escaping.

The error seems to be the same as in the otherwise unrelated issue #2606.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[papa@main podcast]$ youtube-dl -vU https://www.dw.com/de/wenn-der-strom-aus-der-luft-kommt/av-63153677
[debug] Command-line config: ['-vU', 'https://www.dw.com/de/wenn-der-strom-aus-der-luft-kommt/av-63153677']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.09.01 [5d7c7d6]
[debug] Python 3.10.7 (CPython 64bit) - Linux-5.19.8-arch1-1-x86_64-with-glibc2.36 (glibc 2.36)
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg N-107098-g4d45f5acbd (setts), ffprobe N-107098-g4d45f5acbd, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.12.0, brotlicffi-1.0.9.2, certifi-2022.09.14, mutagen-1.45.1, secretstorage-3.3.3, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Loaded 1670 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.09.01, Current version: 2022.09.01
yt-dlp is up to date (2022.09.01)
[debug] [dw] Extracting URL: https://www.dw.com/de/wenn-der-strom-aus-der-luft-kommt/av-63153677
[dw] 63153677: Downloading webpage
ERROR: 63153677: An extractor error has occurred. (caused by KeyError('media_title')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "/usr/lib/python3.10/site-packages/yt_dlp/extractor/common.py", line 670, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.10/site-packages/yt_dlp/extractor/dw.py", line 53, in _real_extract
    title = hidden_inputs['media_title']
KeyError: 'media_title'
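
The traceback above ends at `title = hidden_inputs['media_title']`, so the mapping built from the page apparently no longer contains a `media_title` entry. The following is a minimal sketch of that failure mode, not yt-dlp's actual code: `hidden_inputs()` and `extract_title()` are hypothetical stand-ins, and the assumption that `hidden_inputs` maps hidden form-field names to values is inferred from the variable name only.

```python
import re


def hidden_inputs(html):
    """Map name -> value for <input type="hidden"> tags.

    Hypothetical, simplified stand-in: it only understands
    double-quoted attributes.
    """
    result = {}
    for tag in re.findall(r'<input\b[^>]*>', html):
        attrs = dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', tag))
        if attrs.get('type') == 'hidden' and 'name' in attrs:
            result[attrs['name']] = attrs.get('value', '')
    return result


def extract_title(html):
    """Look the title up defensively instead of subscripting the dict."""
    title = hidden_inputs(html).get('media_title')
    if title is None:
        # A descriptive error is easier to triage than a bare KeyError.
        raise ValueError('no hidden "media_title" input found; '
                         'the page layout may have changed')
    return title
```

With a page that still carries the hidden field, `extract_title()` returns its value; with a page that lacks it, a plain `hidden_inputs(page)['media_title']` raises `KeyError`, which matches the error reported here.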
@kiufta kiufta added site-bug Issue with a specific website triage Untriaged issue labels Sep 16, 2022
@gamer191

This comment was marked as resolved.

@pukkandan
Member

❯ yt-dlp -v https://www.dw.com/en/under-construction-indonesias-new-capital-nusantara/av-63121733#spark_wn=1
[debug] Command-line config: ['--ignore-config', '-v', 'https://www.dw.com/en/under-construction-indonesias-new-capital-nusantara/av-63121733#spark_wn=1']
[debug] Encodings: locale cp65001, fs utf-8, pref cp65001, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.09.01 [5d7c7d656] (source)
[debug] Lazy loading extractors is disabled
[debug] Plugins: ['SamplePluginIE', 'SamplePluginPP']
[debug] Git HEAD: dab284f80
[debug] Python 3.10.7 (CPython 64bit) - Windows-10-10.0.22000-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg N-107787-gc469c3c3b1-20220814 (fdk,setts), ffprobe N-107787-gc469c3c3b1-20220814, phantomjs 2.1.1
[debug] Optional libraries: Cryptodome-3.14.1, brotli-1.0.9, certifi-2021.10.08, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.1
[debug] Proxy map: {}
[debug] Loaded 1673 extractors
[debug] [dw] Extracting URL: https://www.dw.com/en/under-construction-indonesias-new-capital-nusantara/av-63121733#spark_wn=1
[dw] 63121733: Downloading webpage
ERROR: 63121733: An extractor error has occurred. (caused by KeyError('media_title')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "D:\Programs\Source\yt-dlp\yt-dlp\yt_dlp\extractor\common.py", line 672, in extract
    ie_result = self._real_extract(url)
  File "D:\Programs\Source\yt-dlp\yt-dlp\yt_dlp\extractor\dw.py", line 53, in _real_extract
    title = hidden_inputs['media_title']
KeyError: 'media_title'

@pukkandan pukkandan removed the triage Untriaged issue label Sep 17, 2022
@dirkf
Contributor

dirkf commented Sep 18, 2022

In yt-dlp and yt-dl, --force-generic captures this HTML5 source.

@Vangelis66

This comment was marked as resolved.

@pukkandan

This comment was marked as resolved.

@dirkf

This comment was marked as resolved.

@Vangelis66

This comment was marked as resolved.

@dirkf

This comment was marked as resolved.

@Vangelis66

This comment was marked as resolved.

@Pablohn26

I am hitting the same problem:

yt-dlp_linux -v -F  "https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525"
[debug] Command-line config: ['-v', '-F', 'https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.01.06 [6becd25] (linux_exe)
[debug] Python 3.10.8 (CPython x86_64 64bit) - Linux-6.0.0-6-amd64-x86_64-with-glibc2.36 (OpenSSL 3.0.7 1 Nov 2022, glibc 2.36)
[debug] exe versions: ffmpeg 5.1.2-1 (setts), ffprobe 5.1.2-1
[debug] Optional libraries: Cryptodome-3.16.0, brotli-1.0.9, certifi-2022.12.07, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1760 extractors
[dw] Extracting URL: https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525
[dw] 18529525: Downloading webpage
ERROR: 18529525: An extractor error has occurred. (caused by KeyError('media_title')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "yt_dlp/extractor/common.py", line 680, in extract
  File "yt_dlp/extractor/dw.py", line 53, in _real_extract
KeyError: 'media_title'

@dirkf
Contributor

dirkf commented Jan 11, 2023

This page has what would be HTML5 video except that the tag is <video-js> instead of <video>. The <video-js> tag is used by DW's player, which appears to be based on video.js.
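
To illustrate the point about the tag name, here is a rough sketch using only the Python standard library. It is not the yt-dl or yt-dlp parser: `MediaSourceCollector` and `collect_sources()` are hypothetical names, and the sample markup and `example.invalid` URL are made up for the demonstration.

```python
from html.parser import HTMLParser


class MediaSourceCollector(HTMLParser):
    """Collect src URLs from selected media tags and their <source> children."""

    def __init__(self, media_tags=("video", "audio", "video-js")):
        super().__init__()
        self.media_tags = set(media_tags)
        self._depth = 0    # how many media elements are currently open
        self.sources = []  # collected URLs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.media_tags:
            self._depth += 1
            if attrs.get("src"):
                self.sources.append(attrs["src"])
        elif tag == "source" and self._depth and attrs.get("src"):
            # Only count <source> elements nested inside a media element.
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag in self.media_tags and self._depth:
            self._depth -= 1


def collect_sources(html, media_tags=("video", "audio", "video-js")):
    """Return every media URL found for the given set of media tag names."""
    parser = MediaSourceCollector(media_tags)
    parser.feed(html)
    parser.close()
    return parser.sources


SAMPLE_PAGE = (
    '<div><video-js id="player">'
    '<source src="https://example.invalid/master.m3u8" type="application/x-mpegURL">'
    '</video-js></div>'
)

print(collect_sources(SAMPLE_PAGE))
print(collect_sources(SAMPLE_PAGE, media_tags=("video", "audio")))
```

The second call shows the effect described above: a parser that only knows `video` and `audio` skips the `<source>` element entirely, because its parent is `<video-js>`.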

This structure hasn't been targeted in any existing yt-dl extractor AFAIK. In the BandaiChannel extractor the same tag introduces a Brightcove player that doesn't use <source> elements. Of course, video.js is a Brightcove project. The documentation of its getPlayer() method says:

   * @param {string|Element} id
   *        An HTML element - `<video>`, `<audio>`, or `<video-js>` -
   *        or a string matching the `id` of such an element.

So this patch to the HTML5 parser, which extracts DW's Spanish news, as above, with --force-generic in yt-dl, and I expect in yt-dlp, doesn't seem unreasonable:

-        _MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video|audio)'
+        _MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video(?:-js)?|audio)'
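
The effect of the one-line change can be checked in isolation. In this sketch the two patterns are copied from the diff above, while `find_media_tags()` and the sample markup are hypothetical: the helper wraps the tag-name pattern in a simplified paired-tag regex and is only an approximation of how the real HTML5 parser uses it.

```python
import re

# Tag-name patterns copied from the diff above.
OLD_MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video|audio)'
NEW_MEDIA_TAG_NAME_RE = r'(?:(?:amp|dl8(?:-live)?)-)?(video(?:-js)?|audio)'


def find_media_tags(tag_name_re, html):
    """Return (tag name, inner HTML) pairs for paired media tags.

    Hypothetical helper: a simplified stand-in for the real parser.
    """
    pattern = (r'(?s)<(?P<tag>%s)(?:\s[^>]*)?>(?P<body>.*?)</(?P=tag)>'
               % tag_name_re)
    return [(m.group('tag'), m.group('body'))
            for m in re.finditer(pattern, html)]


# Made-up markup; the URL is a placeholder.
SAMPLE = ('<video-js id="player">'
          '<source src="https://example.invalid/master.m3u8">'
          '</video-js>')

print(find_media_tags(OLD_MEDIA_TAG_NAME_RE, SAMPLE))
print(find_media_tags(NEW_MEDIA_TAG_NAME_RE, SAMPLE))
```

With the old pattern the sample yields no match, because `video` is followed by `-js` rather than whitespace or `>`. With the patched pattern the whole `<video-js>` element is matched, and plain `<video>` and `<audio>` tags still match as before.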

@Vangelis66

So this patch to the HTML5 parser,

... FTR, the code to be patched lives inside ./yt-dlp/yt_dlp/extractor/common.py (and in a similar location in youtube-dl 😉 ) ...

which extracts DW's Spanish news, as above, with --force-generic in yt-dl,

... Using dw.py (with the associated fixes inside common.py) from this df-dw-extractor-ovrhaul youtube-dl branch, it is not necessary to issue --force-generic:

youtube-dl -vF "https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525" => 

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-vF', 'https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.01.11.114514
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg 5.0, ffprobe 5.0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[dw] 18529525: Downloading webpage
[dw] 18529525: Downloading m3u8 information
[info] Available formats for 18529525:
format code  extension  resolution note
305          mp4        480x270     305k , avc1.42c015, 25.0fps, mp4a.40.2
572          mp4        512x288     572k , avc1.4d4015, 25.0fps, mp4a.40.2
1108         mp4        640x360    1108k , avc1.4d401e, 25.0fps, mp4a.40.2
2665         mp4        960x540    2665k , avc1.4d401f, 25.0fps, mp4a.40.2
5193         mp4        1280x720   5193k , avc1.64001f, 25.0fps, mp4a.40.2
6676         mp4        1920x1080  6676k , avc1.640028, 25.0fps, mp4a.40.2 (best)

and I expect in yt-dlp,

Indeed:

yt-dlp -vF --ies generic,html5 "https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525" => 

[debug] Command-line config: ['-vF', '--ies', 'generic,html5', 'https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525']
[debug] Portable config "<redacted>\yt-dlp.conf": ['--ffmpeg-location', '../../..', '--downloader-args', 'ffmpeg:-v 8 -stats']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version 2023.01.07 [87ebab0] (source*)
[debug] Python 3.7.16 (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 1.1.1s  1 Nov 2022)
[debug] exe versions: ffmpeg 5.0 (fdk,setts), ffprobe 5.0
[debug] Optional libraries: sqlite3-2.6.0
[debug] Proxy map: {}
[debug] Extractor Plugins: AGB+NSIG (YoutubeIE)
[debug] Plugin directories: {'<redacted>\\yt-dlp-plugins\\YTNSigProxy.zip\\yt_dlp_plugins', '<redacted>\\yt-dlp-plugins\\YTAgeGateBypass.zip\\yt_dlp_plugins'}
[debug] Loaded 2 extractors
[generic] Extracting URL: https://www.dw.com/es/noticias-%C3%BAltima-hora/av-18529525
[generic] av-18529525: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] av-18529525: Extracting information
[debug] Looking for embeds
[html5] av-18529525: Downloading m3u8 information
[debug] Identified a html5 embed
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for av-18529525-1:
ID       EXT RESOLUTION FPS |   TBR PROTO | VCODEC        VBR ACODEC    ABR
---------------------------------------------------------------------------
hls-305  mp4 480x270     25 |  305k m3u8  | avc1.42c015  305k mp4a.40.2  0k
hls-572  mp4 512x288     25 |  572k m3u8  | avc1.4d4015  572k mp4a.40.2  0k
hls-1108 mp4 640x360     25 | 1108k m3u8  | avc1.4d401e 1108k mp4a.40.2  0k
hls-2665 mp4 960x540     25 | 2665k m3u8  | avc1.4d401f 2665k mp4a.40.2  0k
hls-5193 mp4 1280x720    25 | 5193k m3u8  | avc1.64001f 5193k mp4a.40.2  0k
hls-6676 mp4 1920x1080   25 | 6676k m3u8  | avc1.640028 6676k mp4a.40.2  0k

dirkf added a commit to dirkf/youtube-dl that referenced this issue Jan 11, 2023
@pukkandan pukkandan added the patch-available There is patch available that should fix this issue. Someone needs to make a PR with it label Jan 11, 2023
@bashonly
Member

bashonly commented May 1, 2023

see also the uncannily similar suggested solution here: #6764 (comment)

@bashonly bashonly changed the title [Deutsche Welle] An extractor error has occurred. (caused by KeyError('media_title')) [DW] [Deutsche Welle] An extractor error has occurred. (caused by KeyError('media_title')) May 1, 2023

7 participants