[ArteTV] Wrong subtitles language #9242

Open
11 tasks done
ethicnology opened this issue Feb 18, 2024 · 6 comments
Labels
bug Bug that is not site-specific site-bug Issue with a specific website

Comments

@ethicnology


Region

France, Germany

Provide a description that is worded well enough to be understood

I'm testing the nightly version to try the new feature that collects closed-caption subtitles (aka language-acc.vtt) from arte.tv.

I discovered that some subtitle tracks that do not exist on the website are downloaded.

yt-dlp --list-subs https://www.arte.tv/fr/videos/110970-002-A

Language Formats
fr       vtt, vtt
de       vtt, vtt
en       vtt

The "en" subtitle track is actually in French, and the browser does not offer any English track.

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

[debug] Command-line config: ['--list-subs', 'https://www.arte.tv/fr/videos/110970-002-A', '-vU']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version nightly@2024.02.14.232704 from yt-dlp/yt-dlp-nightly-builds [fb44020fa] (pip)
[debug] Python 3.11.7 (CPython arm64 64bit) - macOS-14.1.1-arm64-arm-64bit (OpenSSL 3.2.1 30 Jan 2024)
[debug] exe versions: ffmpeg 6.1.1 (setts), ffprobe 6.1.1
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, sqlite3-3.45.1, urllib3-1.26.18, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1833 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp-nightly-builds/releases/latest
[debug] Downloading _update_spec from https://github.com/yt-dlp/yt-dlp-nightly-builds/releases/latest/download/_update_spec
Current version: nightly@2024.02.14.232704 from yt-dlp/yt-dlp-nightly-builds
Latest version: nightly@2024.02.17.232706 from yt-dlp/yt-dlp-nightly-builds
ERROR: You installed yt-dlp with pip or using the wheel from PyPi; Use that to update
[ArteTV] Extracting URL: https://www.arte.tv/fr/videos/110970-002-A
[ArteTV] 110970-002-A: Downloading JSON metadata
WARNING: [ArteTV] Video is geo restricted. Retrying extraction with fake IP 70.36.6.219 (PM) as X-Forwarded-For.
[ArteTV] Extracting URL: https://www.arte.tv/fr/videos/110970-002-A
[ArteTV] 110970-002-A: Downloading JSON metadata
[ArteTV] 110970-002-A: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] Available subtitles for 110970-002-A:
Language Formats
fr       vtt, vtt
de       vtt, vtt
en       vtt
@ethicnology ethicnology added site-bug Issue with a specific website triage Untriaged issue labels Feb 18, 2024
@bashonly
Member

seems to be a long-standing issue that also occurs in stable and is unrelated to the closed captions patch

@bashonly bashonly changed the title [ArteTV] Wrong subtitles language on nightly [ArteTV] Wrong subtitles language Feb 18, 2024
@ethicnology
Author

ethicnology commented Feb 18, 2024

You are right, this also happens on stable:

[debug] Command-line config: ['--list-subs', 'https://www.arte.tv/fr/videos/110970-002-A', '-vU']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.12.30 from yt-dlp/yt-dlp [f10589e34] (zip)
[debug] Python 3.11.7 (CPython arm64 64bit) - macOS-14.1.1-arm64-arm-64bit (OpenSSL 3.2.1 30 Jan 2024)
[debug] exe versions: ffmpeg 6.1.1 (setts), ffprobe 6.1.1
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2024.02.02, mutagen-1.47.0, requests-2.31.0, sqlite3-3.45.1, urllib3-1.26.18, websockets-12.0
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests, websockets
[debug] Loaded 1798 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: stable@2023.12.30 from yt-dlp/yt-dlp
yt-dlp is up to date (stable@2023.12.30 from yt-dlp/yt-dlp)
[ArteTV] Extracting URL: https://www.arte.tv/fr/videos/110970-002-A
[ArteTV] 110970-002-A: Downloading JSON metadata
WARNING: [ArteTV] Video is geo restricted. Retrying extraction with fake IP 82.117.29.126 (LI) as X-Forwarded-For.
[ArteTV] Extracting URL: https://www.arte.tv/fr/videos/110970-002-A
[ArteTV] 110970-002-A: Downloading JSON metadata
[ArteTV] 110970-002-A: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] Available subtitles for 110970-002-A:
Language Formats
fr       vtt, vtt
de       vtt, vtt
en       vtt

@seproDev
Member

The m3u8 served by Arte tags those tracks as English:

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subtitle_0",LANGUAGE="en",NAME="VO (Forced)",AUTOSELECT=YES,FORCED=YES,DEFAULT=NO,URI="https://arte-cmafhlssub.akamaized.net/am/cmaf/110000/110900/110970-002-A/230510004704/medias/110970-002-A_st_VO-FRA.m3u8"

As a temporary fix we could add more language overrides based on the file URL, but to solve this properly yt-dlp needs to support subtitle track names and additional flags such as "forced" and "hearing impaired".
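
The URL-based override mentioned above could look roughly like the following. This is only a sketch, not yt-dlp's extractor code: the function names and the `SUFFIX_TO_LANG` table are hypothetical, and the only suffix confirmed in this thread is `FRA` (from `..._st_VO-FRA.m3u8`). It parses the `#EXT-X-MEDIA` attributes and prefers the language implied by the file name over the declared `LANGUAGE`.

```python
import re

# Hypothetical mapping from the three-letter suffix in Arte's subtitle file
# names to two-letter language codes. Only FRA is seen in this thread;
# other suffixes would have to be added after checking real manifests.
SUFFIX_TO_LANG = {"FRA": "fr"}

# KEY=value or KEY="quoted value" pairs of an HLS attribute list.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_ext_x_media(line):
    """Return the attributes of an #EXT-X-MEDIA line as a dict."""
    _, _, attrs = line.partition(":")
    return {key: value.strip('"') for key, value in ATTR_RE.findall(attrs)}


def guess_subtitle_language(line):
    """Prefer the language implied by the URI suffix over LANGUAGE."""
    attrs = parse_ext_x_media(line)
    declared = attrs.get("LANGUAGE")
    match = re.search(r"_st_[A-Z]+-([A-Z]{3})\.m3u8$", attrs.get("URI", ""))
    if match:
        # Fall back to the declared language for unknown suffixes.
        return SUFFIX_TO_LANG.get(match.group(1), declared)
    return declared
```

Applied to the manifest line quoted above, the declared `LANGUAGE="en"` would be replaced by `fr`. This works around the mislabelled track but still loses the "Forced" information carried in `NAME` and `FORCED`.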

@seproDev seproDev removed the triage Untriaged issue label Feb 19, 2024
@bashonly bashonly added the bug Bug that is not site-specific label Feb 19, 2024
@WassimAttar
Contributor

I stopped using https://www.arte.tv/ and now use https://www.youtube.com/@arte/videos instead.
YouTube is more reliable and its audio tracks are better managed.
On Arte's YouTube channel you can find 99% of the videos.

@ethicnology
Author

Unfortunately, most of the geo-restricted content is not available on YouTube.
Usually they publish things they produced or helped produce.

The content missing from YouTube includes a large part of the Cinema catalogue, some documentaries and series.

@Fritz66

Fritz66 commented Mar 30, 2024

Is it possible to preset a language with yt-dlp?
Here is my issue. When I want to download this movie:


https://www.arte.tv/de/videos/047384-000-A/die-kleinen-pariserinnen/
ID                         EXT RESOLUTION FPS |   FILESIZE   TBR PROTO | VCODEC        VBR ACODEC     MORE INFO
-----------------------------------------------------------------------------------------------------------------------
VA-STA-audio_0-Deutsch     mp4 audio only     |                  m3u8  | audio only        unknown    [de] Deutsch [DE]
VA-STA-audio_0-Französisch mp4 audio only     |                  m3u8  | audio only        unknown    [fr] Deutsch [DE]
VA-STA-428                 mp4 384x216     25 | ~301.07MiB  428k m3u8  | avc1.42e00d  428k video only Deutsch [DE]
VA-STA-727                 mp4 640x360     25 | ~511.42MiB  728k m3u8  | avc1.4d401e  728k video only Deutsch [DE]
VA-STA-1126                mp4 768x432     25 | ~791.71MiB 1127k m3u8  | avc1.4d401e 1127k video only Deutsch [DE]
VA-STA-1924                mp4 1280x720    25 | ~  1.32GiB 1925k m3u8  | avc1.4d401f 1925k video only Deutsch [DE]
VA-STA-2167                mp4 1920x1080   25 | ~  1.49GiB 2168k m3u8  | avc1.4d0028 2168k video only Deutsch [DE]

I have to download the audio and video tracks separately and then merge them with ffmpeg. It works, but it's annoying.
In this example, that is
yt-dlp https://www.arte.tv/de/videos/047384-000-A -f VA-STA-audio_0-Deutsch -o audio ; yt-dlp https://www.arte.tv/de/videos/047384-000-A -f VA-STA-2167 -o video
followed by
ffmpeg -i audio -i video -c:a copy -c:v copy movie.mp4

It would be nice if yt-dlp always used "VA-STA-audio_0-Deutsch" and, if this language is not available, printed a message like "language not available" and stopped.

Edit: I'm not a heavy GitHub user and I don't know how to open a new issue.
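
For reference, the "pick this language or stop" behaviour requested above can be sketched in a few lines. This is an illustration, not yt-dlp code: `pick_formats` and the `FORMATS` sample are hypothetical, with format IDs and languages copied from the table above, and the sample assumes audio-only entries carry `vcodec` set to `"none"`. The function returns a `video+audio` selector string; yt-dlp's `-f` option accepts selectors of that shape and merges the two streams itself, provided ffmpeg is installed.

```python
# Sample entries modelled on the format table above (assumed field names).
FORMATS = [
    {"format_id": "VA-STA-audio_0-Deutsch", "vcodec": "none", "language": "de"},
    {"format_id": "VA-STA-audio_0-Französisch", "vcodec": "none", "language": "fr"},
    {"format_id": "VA-STA-2167", "vcodec": "avc1.4d0028"},
]


def pick_formats(formats, audio_language, video_id):
    """Build a 'video+audio' selector, or fail if the language is missing."""
    audio = [
        f for f in formats
        if f.get("vcodec") == "none" and f.get("language") == audio_language
    ]
    if not audio:
        # Stop instead of silently falling back to another audio track.
        raise LookupError(f"language not available: {audio_language}")
    return f"{video_id}+{audio[0]['format_id']}"
```

With the sample data, `pick_formats(FORMATS, "de", "VA-STA-2167")` yields `VA-STA-2167+VA-STA-audio_0-Deutsch`, which could be passed as `yt-dlp -f "VA-STA-2167+VA-STA-audio_0-Deutsch" URL` to replace the two downloads and the manual ffmpeg step.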

vtexier added a commit to vtexier/yt-dlp that referenced this issue May 17, 2024