Styled subtitles are converted incompletely #8055

Closed
forthrin opened this issue Dec 30, 2015 · 3 comments

forthrin commented Dec 30, 2015

When converting subtitles from .ttml to .srt, inline spans cause subtitles to be cut off prematurely.

Original (.ttml):

<p begin="00:08:38.920" dur="00:00:07.280" style="left">Ein lever berre éin gong. Ingen gjer<br />noko<span style="italic"> for </span>meg. Eg må gjere det sjølv.</p>

Conversion (.srt):

Ein lever berre éin gong. Ingen gjer
noko for

The following part is missing:

meg. Eg må gjere det sjølv.

Verbose output:

~$ youtube-dl -f mp4 --all-subs --convert-subtitles srt --embed-subs 'https://tv.nrk.no/serie/skeive-jenter/KOID37003614/sesong-1/episode-10'
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-c', u'-i', u'-f', u'mp4', u'--all-subs', u'--convert-subtitles', u'srt', u'--hls-prefer-native', u'--audio-format', u'm4a', u'-o', u'%(title)s.%(ext)s', u'--verbose', u'https://tv.nrk.no/serie/skeive-jenter/KOID37003614/sesong-1/episode-10']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.12.23
[debug] Python version 2.7.11 - Darwin-15.2.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 2.8.3, ffprobe 2.8.3
[debug] Proxy map: {}
[NRKTV] KOID37003614: Downloading webpage
[NRKTV] KOID37003614: Downloading f4m manifest
[NRKTV] KOID37003614: Downloading m3u8 information
[info] Writing video subtitles to: Skeive jenter.no.ttml
[debug] Invoking downloader on u'http://nordond16c-f.akamaihd.net/i/no/open/fb/fbaf6ba0eecbb3d23c7546e9e97371ef2fe4b798/9e7d670e-a5c0-4f61-a87d-1145f7f05315_,141,316,563,1266,2250,.mp4.csmil/index_4_av.m3u8?null='
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 73
[download] Destination: Skeive jenter.mp4
[download]   0.0% of ~216.35MiB at  2.42MiB/s ETA 01:50
frame= 1250 fps= 11 q=-1.0 Lsize=   14201kB time=00:00:50.00 bitrate=2326.4kbits/s    
video:13187kB audio:977kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.258942%
Exiting normally, received signal 2.
[download] 100% of 214.49MiB in 04:11
[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
~$ 

dstftw (Collaborator) commented Dec 30, 2015

Post the full output of youtube-dl when run with the --verbose option.

forthrin (Author) commented Dec 30, 2015

Done

remitamine (Collaborator) commented Jan 27, 2016

I haven't found a good solution yet.
A similar error happens to me.
In the original:

      <p style="s2" begin="00:08:01.24" id="p85" end="00:08:05.00"><span tts:color="yellow">Security guard, sir. </span>Mr Barlow, I<br />don't know what you've heard,</p>

In the .srt file:

Security guard, sir.
don't know what you've heard,

From the Python documentation on the text and tail attributes of xml.etree.ElementTree.Element (https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.text):

The text attribute holds either the text between the element’s start tag and its first child or end tag, or None, and the tail attribute holds either the text between the element’s end tag and the next tag, or None.
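A minimal sketch of those two attributes applied to the subtitle above, using the standard-library xml.etree.ElementTree. The xmlns:tts declaration is an addition so the fragment parses on its own (a real TTML file declares its namespaces on the root element); the rest is copied from the excerpt.

```python
import xml.etree.ElementTree as ET

# The <p> from the .ttml excerpt above. Assumption: the tts: prefix is bound
# here so the fragment is well-formed standalone; the default TTML namespace
# is omitted for brevity.
SNIPPET = (
    '<p xmlns:tts="http://www.w3.org/ns/ttml#styling" style="s2" '
    'begin="00:08:01.24" id="p85" end="00:08:05.00">'
    '<span tts:color="yellow">Security guard, sir. </span>Mr Barlow, I<br />'
    "don't know what you've heard,</p>"
)

p = ET.fromstring(SNIPPET)
print("p.text:", repr(p.text))
for child in p:
    # text: between the child's start tag and its first child (or its end tag)
    # tail: between the child's end tag and the next tag
    print(child.tag, "text:", repr(child.text), "tail:", repr(child.tail))
```

Here p.text is None, the span's text is 'Security guard, sir. ' and its tail is 'Mr Barlow, I', and the br's tail is "don't know what you've heard,". The missing "Mr Barlow, I" is therefore the tail of the span, not the text of any element.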

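So this code can't extract all of the text between tags: it only appends the tail of <br> children, so the tail of a <span> (the text after </span>, such as "Mr Barlow, I" above or "meg. Eg må gjere det sjølv." in the original report) is dropped.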

        out = str_or_empty(node.text)

        for child in node:
            if child.tag in (_x('ttml:br'), _x('ttaf1:br'), 'br'):
                out += '\n' + str_or_empty(child.tail)
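The usual way to avoid losing text is to walk the tree recursively and append every child's tail after handling the child itself. The sketch below illustrates that approach; it is not the code from commit 2b14cb5, which may differ. extract_text is a hypothetical helper name, and namespaces are handled only by stripping the {namespace} prefix from tags.

```python
import xml.etree.ElementTree as ET


def extract_text(node):
    """Hypothetical helper: flatten a TTML <p>/<span> into plain text.

    <br> becomes a newline; any other child contributes its own text
    (recursively). Every child's tail is appended afterwards, so the text
    after a closing tag is never lost.
    """
    out = node.text or ""
    for child in node:
        # Strip a '{namespace}' prefix so 'br' matches with or without one.
        tag = child.tag.rsplit("}", 1)[-1]
        if tag == "br":
            out += "\n"
        else:
            out += extract_text(child)
        # The tail is the text after the child's end tag; the loop quoted
        # above only kept it for <br> children.
        out += child.tail or ""
    return out


P1 = ('<p begin="00:08:38.920" dur="00:00:07.280" style="left">'
      'Ein lever berre éin gong. Ingen gjer<br />noko<span style="italic"> for </span>'
      'meg. Eg må gjere det sjølv.</p>')
print(extract_text(ET.fromstring(P1)))
```

For the first example this prints both lines in full, ending with "noko for meg. Eg må gjere det sjølv.".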

remitamine closed this in 2b14cb5 Feb 2, 2016
remitamine added a commit that referenced this issue Feb 2, 2016
[utils] fix dfxp2srt text extraction(fixes #8055)