[arte.tv extractor] Subtitles never found #30816

Open
5 tasks done
raphaelmerx opened this issue Apr 4, 2022 · 3 comments

@raphaelmerx

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

youtube-dl --list-subs https://www.arte.tv/fr/videos/102958-001-A/en-therapie-saison-2-1-35/  -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--list-subs', 'https://www.arte.tv/fr/videos/102958-001-A/en-therapie-saison-2-1-35/', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 33709297c
[debug] Python version 3.10.2 (CPython) - macOS-12.2.1-arm64-arm-64bit
[debug] exe versions: ffmpeg 4.4.1, ffprobe 4.4.1, phantomjs 83104, rtmpdump 2.4
[debug] Proxy map: {}
[ArteTV] 102958-001-A: Downloading JSON metadata
[ArteTV] 102958-001-A: Downloading m3u8 information
[ArteTV] 102958-001-A: Downloading m3u8 information
[ArteTV] 102958-001-A: Downloading m3u8 information
102958-001-A has no subtitles

Description

The arte.tv extractor never finds subtitles, even though its code appears to handle them. For example, for the video linked above (https://www.arte.tv/fr/videos/102958-001-A/en-therapie-saison-2-1-35/), the browser's network log shows a subtitle file at https://arte-cmafhls.akamaized.net/am/cmaf/102000/102900/102958-001-B/220330204635/medias/102958-001-B_st_VF-MAL.vtt, yet youtube-dl reports that the video has no subtitles.

@dirkf
Contributor

dirkf commented Apr 4, 2022

The extractor doesn't support separate subtitles. Instead, you have to select the formats labelled VOx-STMx, so for FR these ones:

HTTPS_MQ_2     mp4        384x216    VOF-STMF, Français (sourds et malentendants)  300k 
HTTPS_HQ_2     mp4        640x360    VOF-STMF, Français (sourds et malentendants)  800k 
HTTPS_EQ_2     mp4        720x406    VOF-STMF, Français (sourds et malentendants) 1500k 
HTTPS_SQ_2     mp4        1280x720   VOF-STMF, Français (sourds et malentendants) 2200k 

The subtitles are built into the video stream, I guess. They were certainly visible in HTTPS_MQ_2 when I viewed it.
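To illustrate the selection described above, here is a minimal sketch that filters a list of format dictionaries for the VOx-STMx label and picks the highest bitrate. It assumes the label is stored in a `format_note` field, which may not match the extractor's actual output; `subtitled_formats` is a hypothetical helper, and the `HTTPS_SQ_1` entry is invented for contrast.

```python
import re

# Entries mirroring the listing above. Storing the label under
# 'format_note' is an assumption, not confirmed from the extractor.
formats = [
    {"format_id": "HTTPS_MQ_2", "format_note": "VOF-STMF, Français (sourds et malentendants)", "tbr": 300},
    {"format_id": "HTTPS_HQ_2", "format_note": "VOF-STMF, Français (sourds et malentendants)", "tbr": 800},
    {"format_id": "HTTPS_EQ_2", "format_note": "VOF-STMF, Français (sourds et malentendants)", "tbr": 1500},
    {"format_id": "HTTPS_SQ_2", "format_note": "VOF-STMF, Français (sourds et malentendants)", "tbr": 2200},
    # Hypothetical entry without burned-in subtitles, for contrast only.
    {"format_id": "HTTPS_SQ_1", "format_note": "VOF, Français", "tbr": 2200},
]


def subtitled_formats(formats, lang_code="F"):
    """Return the formats whose note starts with VO<lang>-STM<lang>."""
    code = re.escape(lang_code)
    pattern = re.compile(r"^VO%s-STM%s\b" % (code, code))
    return [f for f in formats if pattern.search(f.get("format_note") or "")]


subtitled = subtitled_formats(formats)
# Pick the highest-bitrate variant among the subtitled formats.
best = max(subtitled, key=lambda f: f["tbr"])
print("youtube-dl -f %s <URL>" % best["format_id"])
```

The printed command uses the standard `-f` format selector with the chosen format ID.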

There doesn't seem to be any reference to subtitle URLs in the site JSON used by the extractor, nor in the plain webpage, including its hydration JSON. Possibly they're hidden inside an m3u8 manifest. The routine that extracts formats from m3u8 has been enhanced in the yt-dlp fork to extract subtitles as well, but the yt-dlp Arte extractor doesn't yet make use of that.
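For context, an HLS master playlist can declare subtitle renditions with `#EXT-X-MEDIA:TYPE=SUBTITLES` tags. The sketch below shows roughly how such entries could be pulled out of a manifest. It is not the yt-dlp routine; the sample manifest, the example.com URL, and the function names are all hypothetical.

```python
import re
from urllib.parse import urljoin

# Hypothetical master playlist; not taken from Arte.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Français (sourds et malentendants)",LANGUAGE="fr",URI="subs/fr.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,SUBTITLES="subs"
video/360p.m3u8
"""

# Matches KEY=value pairs where the value is either quoted or bare.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(attr_text):
    """Parse an HLS attribute list into a dict, stripping quotes."""
    return {key: value.strip('"') for key, value in ATTR_RE.findall(attr_text)}


def subtitle_tracks(master_text, base_url):
    """Collect subtitle renditions declared in a master playlist."""
    prefix = "#EXT-X-MEDIA:"
    tracks = []
    for line in master_text.splitlines():
        if not line.startswith(prefix):
            continue
        attrs = parse_attributes(line[len(prefix):])
        if attrs.get("TYPE") == "SUBTITLES" and "URI" in attrs:
            tracks.append({
                "lang": attrs.get("LANGUAGE"),
                "name": attrs.get("NAME"),
                # URIs in a playlist are relative to the playlist URL.
                "url": urljoin(base_url, attrs["URI"]),
            })
    return tracks


tracks = subtitle_tracks(SAMPLE_MASTER, "https://example.com/hls/master.m3u8")
for track in tracks:
    print(track["lang"], track["url"])
```

Note that the URI of a subtitle rendition normally points to another playlist listing WebVTT segments, so a further request would be needed to reach the `.vtt` data itself.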

@fstirlitz
Contributor

There is a newer API endpoint for Arte, which neither youtube-dl nor yt-dlp has been updated to use. This newer endpoint contains links to m3u8 manifests that do contain links to subtitle streams. The newer endpoint also enforces geoblocking, but XFF spoofing bypasses that easily.

Just change /v1 to /v2 in _API_BASE and see. (The data structure is different, obviously.)
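A rough sketch of what that change amounts to, without making any network request. The `_API_BASE` value, the `/config/<lang>/<id>` path layout, and the helper names are assumptions to be checked against the extractor source; the IP address is a documentation-range placeholder, not a real French address.

```python
import re

# Assumed value of the extractor's _API_BASE; verify against the source.
API_BASE_V1 = "https://api.arte.tv/api/player/v1"


def bump_api_version(api_base, old="v1", new="v2"):
    """Replace a trailing /v1 path segment with /v2."""
    return re.sub(r"/%s$" % re.escape(old), "/" + new, api_base.rstrip("/"))


def config_url(api_base, lang, video_id):
    """Build a config URL; the path layout here is an assumption."""
    return "%s/config/%s/%s" % (api_base, lang, video_id)


def xff_headers(fake_ip):
    """Headers for X-Forwarded-For spoofing with a caller-supplied IP."""
    return {"X-Forwarded-For": fake_ip}


api_base_v2 = bump_api_version(API_BASE_V1)
url = config_url(api_base_v2, "fr", "102958-001-A")
headers = xff_headers("203.0.113.7")  # placeholder address
print(url)
print(headers)
```

Since the v2 data structure differs, the code that reads the JSON response would also need rewriting; this only covers building the request.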

@dirkf
Contributor

dirkf commented Apr 10, 2022

The above PR may be back-ported when practical to resolve this issue.
