ArteTV : Subtitles can't be embed / separated #3086

Closed
5 of 6 tasks
ValSlender opened this issue Mar 16, 2022 · 8 comments · Fixed by #3302
Labels
site-enhancement Feature request for some website

Comments

@ValSlender

Checklist

Region

France

Example URLs

https://www.arte.tv/fr/videos/104351-002-A/serviteur-du-peuple-1-23/

Description

When I download a video from arte.tv, yt-dlp responds "[EmbedSubtitle] There aren't any subtitles to embed". Some of the videos available on the website have separate subtitles, but others don't; in those, the subtitles are written directly onto the video.

Verbose log

D:\Users\valslender\Videos\yt-dlp>yt-dlp -vU https://www.arte.tv/fr/videos/104351-002-A/serviteur-du-peuple-1-23/
[debug] Command-line config: ['-vU', 'https://www.arte.tv/fr/videos/104351-002-A/serviteur-du-peuple-1-23/']
[debug] User config "C:\Users\valslender\AppData\Roaming\yt-dlp\config": ['-f', 'mp4', '-o', 'D:/Users/valslender/Downloads/%(title)s.%(ext)s', '--no-mtime', '--no-part', '--embed-subs', '--sub-lang', 'fr,en']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, err utf-8, pref cp1252
[debug] yt-dlp version 2022.03.08.1 [c0c2c57] (win_exe)
[debug] Python version 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] exe versions: ffmpeg n5.0-4-g911d7f167c-20220224 (setts), ffprobe n5.0-4-g911d7f167c-20220224
[debug] Optional libraries: brotli, Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
Latest version: 2022.03.08.1, Current version: 2022.03.08.1
yt-dlp is up to date (2022.03.08.1)
[debug] [ArteTV] Extracting URL: https://www.arte.tv/fr/videos/104351-002-A/serviteur-du-peuple-1-23/
[ArteTV] 104351-002-A: Downloading JSON metadata
[ArteTV] 104351-002-A: Downloading m3u8 information
[ArteTV] 104351-002-A: Downloading m3u8 information
[debug] Sort order given by extractor: res, quality
[debug] Formats sorted by: hasvid, ie_pref, res, quality, lang, fps, hdr:12(7), vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] 104351-002-A: Downloading 1 format(s): HTTPS_SQ_1
[debug] Invoking downloader on "https://arteptweb-a.akamaihd.net/am/ptweb/104000/104300/104351-002-A_SQ_0_VO-STF_06263720_MP4-2200_AMM-PTWEB_1fDtn1DapHH.mp4"
[download] Destination: D:\Users\valslender\Downloads\Serviteur du peuple (1_23).mp4
[download] 100% of 720.35MiB in 08:11
[EmbedSubtitle] There aren't any subtitles to embed
@ValSlender ValSlender added site-enhancement Feature request for some website triage Untriaged issue labels Mar 16, 2022
@ValSlender ValSlender changed the title from "ART" to "ArteTV : Subtitles can't be embed / separated" Mar 16, 2022
@fstirlitz
Contributor

fstirlitz commented Mar 16, 2022

Those videos are served with hardcoded/burned-in subtitles; there’s a separate video stream for each subtitle language.

$ yt-dlp -F --list-sub https://www.arte.tv/fr/videos/104351-002-A/serviteur-du-peuple-1-23/
104351-002-A has no subtitles
[info] Available formats for 104351-002-A:
ID            EXT RESOLUTION │   TBR PROTO  │ VCODEC       VBR ACODEC    ABR MORE INFO
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
HLS_XQ_2-61   mp4 audio only │   61k m3u8_n │ audio only       mp4a.40.2 61k
HLS_XQ_1-61   mp4 audio only │   61k m3u8_n │ audio only       mp4a.40.2 61k
HTTPS_MQ_2    mp4 384x216    │  300k https  │ unknown     300k unknown    0k VO-STA, Version originale - ST allemand
HTTPS_MQ_1    mp4 384x216    │  300k https  │ unknown     300k unknown    0k VO-STF, Version originale - ST français
HTTPS_HQ_2    mp4 640x360    │  800k https  │ unknown     800k unknown    0k VO-STA, Version originale - ST allemand
HTTPS_HQ_1    mp4 640x360    │  800k https  │ unknown     800k unknown    0k VO-STF, Version originale - ST français
HTTPS_EQ_2    mp4 720x406    │ 1500k https  │ unknown    1500k unknown    0k VO-STA, Version originale - ST allemand
HTTPS_EQ_1    mp4 720x406    │ 1500k https  │ unknown    1500k unknown    0k VO-STF, Version originale - ST français
HTTPS_SQ_2    mp4 1280x720   │ 2200k https  │ unknown    2200k unknown    0k VO-STA, Version originale - ST allemand
HTTPS_SQ_1    mp4 1280x720   │ 2200k https  │ unknown    2200k unknown    0k VO-STF, Version originale - ST français
HLS_XQ_2-358  mp4 384x216    │  358k m3u8_n │ avc1.66.30  358k mp4a.40.2  0k
HLS_XQ_1-358  mp4 384x216    │  358k m3u8_n │ avc1.66.30  358k mp4a.40.2  0k
HLS_XQ_2-919  mp4 640x360    │  919k m3u8_n │ avc1.77.30  919k mp4a.40.2  0k
HLS_XQ_1-919  mp4 640x360    │  919k m3u8_n │ avc1.77.30  919k mp4a.40.2  0k
HLS_XQ_2-1616 mp4 720x406    │ 1616k m3u8_n │ avc1.77.30 1616k mp4a.40.2  0k
HLS_XQ_1-1616 mp4 720x406    │ 1616k m3u8_n │ avc1.77.30 1616k mp4a.40.2  0k
HLS_XQ_2-2310 mp4 1280x720   │ 2310k m3u8_n │ avc1.77.30 2310k mp4a.40.2  0k
HLS_XQ_1-2310 mp4 1280x720   │ 2310k m3u8_n │ avc1.77.30 2310k mp4a.40.2  0k

Streams labelled VO-STA contain German subtitles, those with VO-STF contain French subtitles. (HLS_XQ_1 and HLS_XQ_2 and their derivatives should also have been labelled as VO-STF and VO-STA respectively, but apparently aren’t.) You can choose the format you prefer using the -f option. There is little we can do otherwise.
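
As a rough illustration of choosing a format by its burned-in subtitle language, here is a minimal Python sketch. It is not part of yt-dlp: `pick_format` and `FORMATS` are hypothetical names, the sample entries are copied from the table above, and the dictionary keys are assumed to mirror yt-dlp's format fields (`format_id`, `height`, `tbr`, `format_note`).

```python
from typing import Dict, List, Optional

# Sample entries copied from the format table above (assumed field layout).
FORMATS: List[Dict] = [
    {"format_id": "HTTPS_MQ_2", "height": 216, "tbr": 300,
     "format_note": "VO-STA, Version originale - ST allemand"},
    {"format_id": "HTTPS_MQ_1", "height": 216, "tbr": 300,
     "format_note": "VO-STF, Version originale - ST français"},
    {"format_id": "HTTPS_SQ_2", "height": 720, "tbr": 2200,
     "format_note": "VO-STA, Version originale - ST allemand"},
    {"format_id": "HTTPS_SQ_1", "height": 720, "tbr": 2200,
     "format_note": "VO-STF, Version originale - ST français"},
    # The HLS variants carry no label in the listing, so they cannot be matched.
    {"format_id": "HLS_XQ_1-2310", "height": 720, "tbr": 2310, "format_note": None},
]


def pick_format(formats: List[Dict], version_code: str) -> Optional[str]:
    """Return the format_id of the largest format whose note starts with version_code."""
    matching = [
        f for f in formats
        if (f.get("format_note") or "").startswith(version_code)
    ]
    if not matching:
        return None
    # Prefer the greatest height, then the greatest total bitrate.
    best = max(matching, key=lambda f: (f.get("height") or 0, f.get("tbr") or 0))
    return best["format_id"]


if __name__ == "__main__":
    print(pick_format(FORMATS, "VO-STF"))  # French burned-in subtitles
    print(pick_format(FORMATS, "VO-STA"))  # German burned-in subtitles
```

The returned ID can then be passed to the -f option, for example yt-dlp -f HTTPS_SQ_1 followed by the video URL.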

@fstirlitz
Contributor

If you really insist on having subtitles in a proper format, you may attempt the following:

  1. Download the same episode from YouTube, where it has no burned-in subtitles
  2. Mask out the logos, then compute the difference between the video streams, blacking out every pixel which has sufficiently similar colour in both
  3. Run OCR on the resulting video stream

I am pretty sure someone sufficiently determined can write an FFmpeg command to do (2) at least, and there are tools available to do (3), since it used to be somewhat commonplace for ripping DVD subtitles. But I doubt it’s something yt-dlp should do.
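
To make the differencing step in the list above concrete, here is a minimal pure-Python sketch on two already-decoded frames. It assumes the frames are time-aligned, have the same resolution, and have had the logos masked out beforehand. The name `subtitle_mask` and the threshold value are hypothetical choices for illustration, not something FFmpeg or yt-dlp provides.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Frame = List[List[Pixel]]      # rows of pixels


def subtitle_mask(clean: Frame, burned: Frame, threshold: int = 40) -> Frame:
    """Keep the pixels of `burned` that differ from `clean`; black out the rest.

    `threshold` is the summed per-channel difference above which a pixel is
    treated as belonging to the subtitle overlay (an arbitrary assumption).
    """
    result: Frame = []
    for clean_row, burned_row in zip(clean, burned):
        out_row: List[Pixel] = []
        for clean_px, burned_px in zip(clean_row, burned_row):
            diff = sum(abs(a - b) for a, b in zip(clean_px, burned_px))
            out_row.append(burned_px if diff > threshold else (0, 0, 0))
        result.append(out_row)
    return result


if __name__ == "__main__":
    clean = [[(10, 10, 10), (200, 50, 50)]]
    burned = [[(12, 9, 10), (255, 255, 255)]]  # second pixel is subtitle text
    print(subtitle_mask(clean, burned))
```

In practice the two sources are encoded differently, so compression noise would probably call for a higher threshold or some smoothing before the result is handed to OCR.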

@pukkandan pukkandan added external issue Issue with an external tool and removed triage Untriaged issue labels Mar 16, 2022
@fstirlitz
Contributor

@pukkandan The fact that the format labels/descriptions are not propagated to the m3u8 sub-streams might be worth fixing, though.

pukkandan added a commit that referenced this issue Mar 16, 2022
@pukkandan
Member

done

@fstirlitz
Contributor

I was wrong. There is a newer API endpoint, and it does allow downloading non-burned-in subtitles.

See my comment: ytdl-org/youtube-dl#30816 (comment).

@pukkandan pukkandan removed the external issue Issue with an external tool label Apr 4, 2022
@pukkandan pukkandan reopened this Apr 4, 2022
@fstirlitz fstirlitz mentioned this issue Apr 4, 2022
12 tasks
@pukkandan pukkandan linked a pull request Apr 14, 2022 that will close this issue
12 tasks
pukkandan pushed a commit that referenced this issue Jul 27, 2022
Closes #3622, #3502, #3086

Authored by: fstirlitz, pukkandan
@Totorrr

Totorrr commented Sep 1, 2022

EDIT: my bad. Subtitles ARE downloadable with yt-dlp (with options --sub-langs all --write-subs of course).
So please IGNORE the following message.

@pukkandan @fstirlitz Maybe this one is worth re-opening?

For this link:

https://www.arte.tv/fr/videos/106220-000-A/la-saison-des-femmes/

I can confirm that the subtitles are not burnt into the video, and that they are downloadable (WebVTT format with a .vtt extension), but yt-dlp doesn't see them (yet?) when I use --list-formats.

I am using yt-dlp version 2022.08.19, not the latest release (out today!), so this may be fixed in 2022-09-01, but I wouldn't be so sure, as the fix above was made before the version I use.

@ValSlender
Author

I have issues with subtitles too. For example, on https://www.arte.tv/fr/videos/098427-001-A/black-panthers-1-2/ with the default French version, subtitles appear at around 50 seconds, when the language spoken is not French (forced subs?). I don't know how to download these with yt-dlp. The subtitles for the deaf and hard of hearing come as .vtt: I can open the file in Notepad when it is separate, but I can't play it in VLC when it is embedded (I don't know whether VLC can read it).

@Totorrr

Totorrr commented Sep 4, 2022

@ValSlender I see the same behaviour: when I download the subtitles like this,

yt-dlp --sub-langs all --write-subs --skip-download 'https://www.arte.tv/fr/videos/098427-001-A/black-panthers-1-2/'

I get 2 files which are the full subtitle files (French & German). But the video on the website also has forced subtitles that yt-dlp doesn't get. I could download those subtitles from this URL:

https://arte-cmafhls.akamaized.net/am/cmaf/098000/098400/098427-001-A/220828065735/medias/098427-001-A_st_VF-FRA.vtt

So @pukkandan and @fstirlitz, maybe this is worth re-opening after all.

A trick for @ValSlender: when you get your .vtt file, VLC may be able to play it (mine does). Otherwise, I suggest converting it to another subtitle format such as .srt:

  • Edit the file with a text editor and remove all the STYLE blocks at the beginning of the file,
  • Use ffmpeg -i input.vtt output.srt to convert the file, then use it with any video player.
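
The first of the two steps above can also be scripted rather than done by hand. This is a minimal sketch, assuming the file follows the usual WebVTT layout where blocks are separated by blank lines and a style block starts with a line reading STYLE. The function name `strip_style_blocks` is hypothetical.

```python
import re


def strip_style_blocks(vtt_text: str) -> str:
    """Remove STYLE blocks from WebVTT text, keeping the header and the cues."""
    # Normalise line endings, then split into blank-line-separated blocks.
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    kept = [
        block for block in blocks
        # A style block is identified by its first line being exactly "STYLE".
        if block.split("\n", 1)[0].strip() != "STYLE"
    ]
    return "\n\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "STYLE\n::cue {\n  color: white;\n}\n\n"
        "00:00:01.000 --> 00:00:02.000\nBonjour\n"
    )
    print(strip_style_blocks(sample))
```

The cleaned text can be saved back to a .vtt file and then converted with the ffmpeg command from the second step.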

4 participants