Support converting multilingual TTML to srt #12303

Open
yan12125 opened this issue Feb 28, 2017 · 1 comment

@yan12125 (Collaborator) commented Feb 28, 2017

  • I've verified and I assure that I'm running youtube-dl 2017.02.27

  • At least skimmed through README and most notably FAQ and BUGS sections

  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

$ youtube-dl -v --write-sub --convert-subs srt test:daisuki
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--write-sub', '--convert-subs', 'srt', 'test:daisuki']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.02.27
[debug] Git HEAD: 7c4aa6fd6
[debug] Python version 3.6.0 - Linux-4.10.1-1-ARCH-x86_64-with-arch
[debug] exe versions: ffmpeg 3.2.4, ffprobe 3.2.4, rtmpdump 2.4
[debug] Proxy map: {}
[TestURL] Test URL: http://www.daisuki.net/tw/en/anime/watch.TheIdolMasterCG.11213.html
[Daisuki] 11213: Downloading webpage
[Daisuki] 11213: Downloading JSON metadata
[Daisuki] 11213: Downloading m3u8 information
[info] Writing video subtitles to: #01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mul.ttml
[debug] Invoking downloader on 'https://bngn-vh.akamaihd.net/i/43383936/35470338/smil/TW/00005/454886408824423.smil/index_6000000_av.m3u8?null=0&id=AgCMcBxnoxzgBe+JtVig6tjALsUYU9c4vLlbWNR%2fIjKLjO3tedogpOqsv80VcutRxOme6T2ME6x0%2fQ%3d%3d'
[hlsnative] Downloading m3u8 manifest
WARNING: hlsnative has detected features it does not support, extraction will be delegated to ffmpeg
[download] Destination: #01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mp4
[debug] ffmpeg command line: ffmpeg -y -headers 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-us,en;q=0.5
Cookie: _alid_=dmV/+lBznv+Bca+is2H0ew==; hdntl=exp=1488378735~acl=%2f*~data=hdntl~hmac=e9d75e91b0278ee8489e785fb97e5f8d2a203dc603fbeae88c22eccbc6be5e63
' -i 'https://bngn-vh.akamaihd.net/i/43383936/35470338/smil/TW/00005/454886408824423.smil/index_6000000_av.m3u8?null=0&id=AgCMcBxnoxzgBe+JtVig6tjALsUYU9c4vLlbWNR%2fIjKLjO3tedogpOqsv80VcutRxOme6T2ME6x0%2fQ%3d%3d' -c copy -f mp4 'file:#01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mp4.part'
ffmpeg version 3.2.4 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 6.3.1 (GCC) 20170109
  configuration: --prefix=/usr --disable-debug --disable-static --disable-stripping --enable-avisynth --enable-avresample --enable-fontconfig --enable-gmp --enable-gnutls --enable-gpl --enable-ladspa --enable-libass --enable-libbluray --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libiec61883 --enable-libmodplug --enable-libmp3lame --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-netcdf --enable-shared --enable-version3 --enable-x11grab
  libavutil      55. 34.101 / 55. 34.101
  libavcodec     57. 64.101 / 57. 64.101
  libavformat    57. 56.101 / 57. 56.101
  libavdevice    57.  1.100 / 57.  1.100
  libavfilter     6. 65.100 /  6. 65.100
  libavresample   3.  1.  0 /  3.  1.  0
  libswscale      4.  2.100 /  4.  2.100
  libswresample   2.  3.100 /  2.  3.100
  libpostproc    54.  1.100 / 54.  1.100
[NULL @ 0x562846175960] non-existing SPS 0 referenced in buffering period
[NULL @ 0x562846175960] SPS unavailable in decode_picture_timing                                                                        
[h264 @ 0x56284624b520] non-existing SPS 0 referenced in buffering period                                                               
[h264 @ 0x56284624b520] SPS unavailable in decode_picture_timing                                                                        
Input #0, hls,applehttp, from 'https://bngn-vh.akamaihd.net/i/43383936/35470338/smil/TW/00005/454886408824423.smil/index_6000000_av.m3u8?null=0&id=AgCMcBxnoxzgBe+JtVig6tjALsUYU9c4vLlbWNR%2fIjKLjO3tedogpOqsv80VcutRxOme6T2ME6x0%2fQ%3d%3d':
  Duration: 00:24:00.00, start: 0.100667, bitrate: 0 kb/s
  Program 0 
    Metadata:
      variant_bitrate : 0
    Stream #0:0: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 90k tbn, 47.95 tbc
    Metadata:
      variant_bitrate : 0
    Stream #0:1: Audio: aac (LC) ([15][0][0][0] / 0x000F), 48000 Hz, stereo, fltp
    Metadata:
      variant_bitrate : 0
Output #0, mp4, to 'file:#01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mp4.part':
  Metadata:
    encoder         : Lavf57.56.101
    Stream #0:0: Video: h264 (High) ([33][0][0][0] / 0x0021), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 23.98 fps, 23.98 tbr, 90k tbn, 90k tbc
    Metadata:
      variant_bitrate : 0
    Stream #0:1: Audio: aac (LC) ([64][0][0][0] / 0x0040), 48000 Hz, stereo
    Metadata:
      variant_bitrate : 0
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame=34525 fps=347 q=-1.0 Lsize=  845745kB time=00:23:59.97 bitrate=4811.4kbits/s speed=14.5x    
video:811849kB audio:33986kB subtitle:0kB other streams:0kB global headers:1kB muxing overhead: unknown
Exception ignored in: <_io.FileIO name=6 mode='wb' closefd=True>
ResourceWarning: unclosed file <_io.BufferedWriter name=6>
[ffmpeg] Downloaded 866043111 bytes
[download] 100% of 825.92MiB
[download] 100% of 825.92MiB
[debug] ffmpeg command line: ffprobe -show_streams 'file:#01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mp4'
[ffmpeg] Fixing malformated aac bitstream in "#01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mp4"
[debug] ffmpeg command line: ffmpeg -y -i 'file:#01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mp4' -c copy -f mp4 -bsf:a aac_adtstoasc 'file:#01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.temp.mp4'
[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
Deleting original file #01 Who is in the pumpkin carriage - THE IDOLM@STER CINDERELLA GIRLS-11213.mul.ttml (pass -k to keep)

  • Single video: test:daisuki

Note that youtube-dl does not support sites dedicated to copyright infringement. In order for site support request to be accepted all provided example URLs should not violate any copyrights.


Description of your issue, suggested solution and other information

test:daisuki has a TTML subtitle http://bngnwww.b-ch.com/caption/35470338/1206/275503087581916/0817102633.xml. It contains multiple languages:

    <div xml:lang="English">
	<p begin="00:00:08.690" end="00:00:12.150" style="1">
	It was just a little while ago...
	</p>
    ...
    </div>
    <div xml:lang="Korean">
	<p begin="00:00:08.519" end="00:00:12.078" style="1">
	얼마 전까지 우리는
	</p>
    ...
    </div>
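To check this structure programmatically, you can group the cues by the xml:lang of their enclosing <div>. The following is a minimal sketch using Python's standard xml.etree.ElementTree, not youtube-dl code. `list_languages` is a hypothetical helper. It compares tags by local name only, because the excerpt above omits the namespace. The sample string assumes the TTAF1 namespace (http://www.w3.org/2006/04/ttaf1) that the xmlstarlet command later in this thread uses.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'


def local_name(tag):
    """Drop a '{namespace}' prefix so TTAF1 and TTML documents both match."""
    return tag.rsplit('}', 1)[-1]


def list_languages(ttml_text):
    """Return (language, cue_count) for every <div> that carries xml:lang."""
    root = ET.fromstring(ttml_text)
    languages = []
    for div in root.iter():
        if local_name(div.tag) == 'div' and XML_LANG in div.attrib:
            cues = [p for p in div.iter() if local_name(p.tag) == 'p']
            languages.append((div.attrib[XML_LANG], len(cues)))
    return languages


# Trimmed-down version of the excerpt above (one cue per language, namespace assumed).
SAMPLE = '''<tt xmlns="http://www.w3.org/2006/04/ttaf1"><body>
<div xml:lang="English"><p begin="00:00:08.690" end="00:00:12.150">It was just a little while ago...</p></div>
<div xml:lang="Korean"><p begin="00:00:08.519" end="00:00:12.078">얼마 전까지 우리는</p></div>
</body></tt>'''

print(list_languages(SAMPLE))  # [('English', 1), ('Korean', 1)]
```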

It seems SRT cannot hold multiple languages in one file. If that is the case, dfxp2srt should return a lang => subtitle dictionary, and FFmpegSubtitlesConvertorPP needs to handle writing multiple files.
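The proposal can be sketched as a converter that returns one SRT string per language instead of a single merged file. This is a hypothetical stand-in for dfxp2srt, not its actual code. `ttml_to_srt_by_lang` and `write_srt_files` are illustrative names. The sketch assumes timestamps are clock times like `00:00:08.690`, as in this file. The `<base>.<lang>.srt` naming is only one possible scheme for the multiple files that FFmpegSubtitlesConvertorPP would have to write.

```python
import xml.etree.ElementTree as ET

XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit('}', 1)[-1]


def parse_clock(value):
    """Parse a clock time such as '00:00:08.690' into seconds (assumed format)."""
    hours, minutes, seconds = value.split(':')
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def srt_time(seconds):
    """Format seconds as an SRT timestamp 'HH:MM:SS,mmm'."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3600000)
    minutes, ms = divmod(ms, 60000)
    secs, ms = divmod(ms, 1000)
    return '%02d:%02d:%02d,%03d' % (hours, minutes, secs, ms)


def cue_text(p):
    """Flatten a <p>, turning direct <br/> children into line breaks."""
    lines = [p.text or '']
    for child in p:
        if local_name(child.tag) == 'br':
            lines.append('')
        else:
            lines[-1] += ''.join(child.itertext())
        lines[-1] += child.tail or ''
    # Collapse the indentation whitespace that surrounds the text in the source.
    lines = [' '.join(line.split()) for line in lines]
    return '\n'.join(line for line in lines if line)


def ttml_to_srt_by_lang(ttml_text):
    """Return {xml:lang value: SRT text}, one entry per language <div>."""
    root = ET.fromstring(ttml_text)
    result = {}
    for div in root.iter():
        if local_name(div.tag) != 'div' or XML_LANG not in div.attrib:
            continue
        cues = [p for p in div.iter() if local_name(p.tag) == 'p']
        blocks = []
        for index, p in enumerate(cues, start=1):
            start = srt_time(parse_clock(p.attrib['begin']))
            end = srt_time(parse_clock(p.attrib['end']))
            blocks.append('%d\n%s --> %s\n%s\n' % (index, start, end, cue_text(p)))
        # Assumes one <div> per language; a repeated language would overwrite.
        result[div.attrib[XML_LANG]] = '\n'.join(blocks)
    return result


def write_srt_files(base_name, ttml_text):
    """Write '<base_name>.<lang>.srt' for each language and return the paths."""
    paths = []
    for lang, srt in ttml_to_srt_by_lang(ttml_text).items():
        path = '%s.%s.srt' % (base_name, lang)
        with open(path, 'w', encoding='utf-8') as f:
            f.write(srt)
        paths.append(path)
    return paths
```

Returning a dictionary leaves the file-naming decision to the post-processor. This matches the split of responsibilities suggested above between dfxp2srt and FFmpegSubtitlesConvertorPP.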

Ref: #4738

@federicorosso1993 commented Jul 15, 2017

I was able to use xmlstarlet and ttml2srt.py (by nomoketo) to extract only my own language from a multi-language TTML subtitle file:

xmlstarlet ed -N ns=http://www.w3.org/2006/04/ttaf1 -d "//ns:div[not(contains(@xml:lang,'Italian'))]" "/path/of/the/original/subtitle.mul.ttml" > "/path/to/save/subtitle.ttml" && python3 ttml2srt.py "/path/to/just/converted/subtitle.ttml" > "/path/to/save/subtitle.srt"

Since TTML is XML, you can use xmlstarlet with the correct namespace to delete every div whose xml:lang does not contain the language you want. This leaves only that language in Daisuki's multi-language TTML files.
ttml2srt.py is just a basic TTML-to-SRT converter, so you may prefer a better one.
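You can also do the same filtering without xmlstarlet. The following is a minimal sketch with Python's xml.etree.ElementTree, assuming the TTAF1 namespace from the command above. ElementTree's limited XPath support has no contains() function, so the hypothetical `keep_language` helper applies that test by hand. Like the XPath expression `//ns:div[not(contains(@xml:lang,'Italian'))]`, it also drops any div that has no xml:lang at all.

```python
import xml.etree.ElementTree as ET

TTAF1_NS = 'http://www.w3.org/2006/04/ttaf1'  # namespace used in the xmlstarlet command
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

# Serialize TTAF1 as the default namespace instead of an 'ns0:' prefix.
ET.register_namespace('', TTAF1_NS)


def keep_language(ttml_text, language):
    """Remove every <div> whose xml:lang does not contain `language`.

    A missing xml:lang counts as '', so such divs are removed too,
    matching not(contains(@xml:lang, ...)) in XPath.
    """
    root = ET.fromstring(ttml_text)
    div_tag = '{%s}div' % TTAF1_NS
    # Snapshot parents and children first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == div_tag and language not in child.get(XML_LANG, ''):
                parent.remove(child)
    return ET.tostring(root, encoding='unicode')
```

You can save the returned string as a .ttml file and pass it to ttml2srt.py, as in the command above.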
