ITV Hub .ttml subs fail to convert to .srt #14191

Closed
Vangelis66 opened this issue Sep 13, 2017 · 8 comments

@Vangelis66 Vangelis66 commented Sep 13, 2017

  • I've verified and I assure that I'm running youtube-dl 2017.09.11
  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)

Standalone youtube-dl.exe v2017.09.11 on Windows Vista SP2 32bit, fully patched.
Command issued:

youtube-dl -f worst "https://www.itv.com/hub/royal-stories/2a4353a0004" -o "The People's Princess[2a4353a0004].mp4" --write-sub --convert-subs=srt --add-metadata -v > ITVHub_VLog.txt 2>&1

The verbose log is quite lengthy, so instead of pasting it into the body of this issue, I've attached it:
ITVHub_VLog.txt
FWIW, the failure part reads:

[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\__init__.py", line 465, in main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\__init__.py", line 455, in _real_main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\YoutubeDL.py", line 1966, in download
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\YoutubeDL.py", line 787, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\YoutubeDL.py", line 841, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\YoutubeDL.py", line 1601, in process_video_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\YoutubeDL.py", line 1947, in process_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\YoutubeDL.py", line 2012, in post_process
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\postprocessor\ffmpeg.py", line 589, in run
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\utils.py", line 2677, in dfxp2srt
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\compat.py", line 2510, in compat_etree_fromstring
  File "C:\Python\Python34\lib\xml\etree\ElementTree.py", line 1335, in XML
xml.etree.ElementTree.ParseError: encoding specified in XML declaration is incorrect: line 1, column 30

Hopefully this is something that can be fixed...
Many thanks for your ongoing efforts 👍
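If the declaration is written as `<?xml version="1.0" encoding="..."?>`, then "column 30" (counted from 0) is the first character of the encoding name. That points to a clash between the encoding the file declares and the bytes the parser actually received. Below is one hedged way to reproduce this exact message. It assumes that the .ttml declares UTF-16 but that its bytes reached ElementTree as UTF-8, for example because the text was decoded and then re-encoded somewhere. The thread does not show where that happens. Expat rejects the mismatch, but it accepts the original UTF-16 bytes. `SAMPLE_TTML`, `parse_ttml` and `reproduce_mismatch` are hypothetical names, not youtube-dl code.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal TTML document; the real ITV file is not reproduced here.
SAMPLE_TTML = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    '<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
    '<p begin="00:00:01.000" end="00:00:03.000">Hello</p>'
    '</div></body></tt>'
)


def parse_ttml(raw: bytes) -> ET.Element:
    """Parse TTML from the bytes exactly as downloaded.

    Given bytes (not a decoded str), expat reads the encoding from the
    byte-order mark and the XML declaration of the original file.
    """
    return ET.fromstring(raw)


def reproduce_mismatch() -> str:
    """Return the ParseError message for UTF-8 bytes that still declare UTF-16."""
    mismatched = SAMPLE_TTML.encode('utf-8')  # text re-encoded, declaration unchanged
    try:
        ET.fromstring(mismatched)
    except ET.ParseError as err:
        return str(err)
    return 'no error'


if __name__ == '__main__':
    print(reproduce_mismatch())
    root = parse_ttml(SAMPLE_TTML.encode('utf-16'))  # BOM + UTF-16, as declared
    print(root.tag)
```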

@remitamine remitamine (Collaborator) commented Sep 13, 2017

#12909 should fix this problem.

@yan12125 yan12125 (Collaborator) commented Sep 13, 2017

Thanks to @remitamine, this will be fixed in the next version.

@yan12125 yan12125 closed this Sep 13, 2017

@Vangelis66 Vangelis66 (Author) commented Sep 16, 2017

@yan12125 & @remitamine

Unfortunately, the latest release, 2017.09.15, has not fixed the original issue 😢, though a different error is now printed. youtube-dl command used:

youtube-dl -f worst "https://www.itv.com/hub/royal-stories/2a4353a0004" -o "The People's Princess[2a4353a0004].mp4" --write-sub --convert-subs=srt --add-metadata

Command prompt window excerpt:

[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\__init__.py", line 465, in main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\__init__.py", line 455, in _real_main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\YoutubeDL.py", line 1967, in download
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\YoutubeDL.py", line 787, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\YoutubeDL.py", line 841, in process_ie_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\YoutubeDL.py", line 1601, in process_video_result
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\YoutubeDL.py", line 1948, in process_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\YoutubeDL.py", line 2013, in post_process
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmpl1klaw4y\build\youtube_dl\postprocessor\ffmpeg.py", line 589, in run
  File "C:\Python\Python34\lib\codecs.py", line 319, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Please re-open this
Many thanks in anticipation...

@yan12125 yan12125 reopened this Sep 16, 2017
@yan12125 yan12125 closed this in 3869028 Sep 16, 2017

@yan12125 yan12125 (Collaborator) commented Sep 16, 2017

Sorry, I didn't check carefully. It should really be fixed in the next version (3869028):

$ youtube-dl -f worst "https://www.itv.com/hub/royal-stories/2a4353a0004" -o "The People's Princess[2a4353a0004].mp4" --write-sub --convert-subs=srt
[ITV] 2a4353a0004: Downloading webpage
[ITV] 2a4353a0004: Downloading XML
[info] Writing video subtitles to: The People's Princess[2a4353a0004].en.ttml
[download] Destination: The People's Princess[2a4353a0004].mp4
[download]  99.9% of ~84.89MiB at  1.17MiB/s ETA 00:00
[rtmpdump] 88923610 bytes
[download] 100% of 84.80MiB
[ffmpeg] Converting subtitles
WARNING: You have requested to convert dfxp (TTML) subtitles into another format, which results in style information loss
Deleting original file The People's Princess[2a4353a0004].en.ttml (pass -k to keep)

@Vangelis66 Vangelis66 (Author) commented Sep 24, 2017

@yan12125 wrote:

Sorry I didn't check carefully. It should really be fixed in the next version

Hi 😄 I hope you're well!
No need to apologise; as ever, your constant efforts and user support are greatly appreciated (this goes for the whole team BTW 👍)

So, after a wait of several days, the next (latest) version, 2017.09.24, was released, and I was eager to try it out:

youtube-dl -f worst "https://www.itv.com/hub/the-jonathan-ross-show/2a1166a0135" --write-sub --convert-subs=srt -k

As expected, the subs conversion from .ttml to .srt now completes successfully. For the record, of the major players on the Windows platform, only VLC 3.0.0-git appears to support the .ttml (dfxp) subs format natively; I usually use MPC-BE / MPC-HC / PotPlayer, which all require conversion to .srt, hence the original issue that was filed...

When I tried the generated .srt file with MPC-BE, I almost immediately saw that there must be a flaw somewhere in the .ttml -> .srt conversion, which shows up as follows:

  1. When a sub contains two (or more) lines and the font colour is white, everything looks OK:

white_2lines

  2. However, in every two-line subtitle where the font colour is not white (even when the two lines have different colours), the font size in the second line is noticeably bigger than in the first:

green_2lines
yellow_2lines
blue_2lines
2colours_2lines

Unfortunately, I know nothing about Python, so this is not something I can fix myself and submit as a PR; again, it should be investigated and hopefully remedied by a dev...

A personal comment/suggestion:
I see that the .ttml subs are the first item downloaded by youtube-dl, followed by the video itself, and that the subs conversion is the last step; maybe consider performing the subs conversion right after fetching the .ttml file (this would avoid waiting for the video download to complete before getting the .srt file)?
In a related matter, ITV Hub usually publishes a fresh video first without the accompanying subs, which may (or, in rare cases, may not) be added at a later stage; AFAICS, there's no --subs-only switch in youtube-dl to fetch only the subs once they become available. Reading the docs, it would appear that
--skip-download --write-sub --convert-subs=srt -k
should be able to achieve the same thing, but, sadly, I am left with only the .ttml subs (because subs conversion follows after video fetch... 😞 ).
For your convenience and further testing, I am attaching here both .ttml+.srt subs produced by latest youtube-dl: ITV_Subs_bug.zip
And I'm not sure if it's at all helpful, but here's a link showing how get-flash-videos handles the subs conversion in Perl...
A million thanks for this excellent tool, along with your consideration for disabled people (my sister is hard of hearing, so relies on .srt subs...).
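Until the conversion runs independently of the video download, a .ttml left behind by `--skip-download --write-sub` can be converted by hand. The sketch below is a deliberately small TTML-to-SRT converter, not youtube-dl's own `dfxp2srt`. It makes several assumptions:

* Cues are `<p>` elements.
* `begin`/`end` are either clock times (`HH:MM:SS.fff`) or plain seconds (`4.25s`).
* `<br/>` becomes a line break.
* All styling is dropped, so colours are lost.

`ttml_to_srt` and its helpers are hypothetical names.

```python
import xml.etree.ElementTree as ET

TTML_NS = 'http://www.w3.org/ns/ttml'
# Older DFXP files use this namespace (assumption: no others need handling).
DFXP_NS = 'http://www.w3.org/2006/10/ttaf1'


def parse_time(value: str) -> float:
    """Convert 'HH:MM:SS(.fff)' or 'N(.n)s' to seconds.

    Frame-based ('HH:MM:SS:FF') and tick-based ('123t') values are not handled.
    """
    if ':' not in value and value.endswith('s'):
        return float(value[:-1])
    hours, minutes, seconds = value.split(':')
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT 'HH:MM:SS,mmm' timestamp."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f'{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}'


def _collect_text(elem: ET.Element, parts: list) -> None:
    """Append the text under elem to parts; <br/> becomes a newline."""
    for child in elem:
        if child.tag.rsplit('}', 1)[-1] == 'br':
            parts.append('\n')
        else:
            parts.append(child.text or '')
            _collect_text(child, parts)  # nested <span>s: styling is dropped
        parts.append(child.tail or '')


def paragraph_text(p: ET.Element) -> str:
    """Return the plain text of one cue, one output line per <br/>."""
    parts = [p.text or '']
    _collect_text(p, parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (' '.join(line.split()) for line in ''.join(parts).split('\n'))
    return '\n'.join(line for line in lines if line)


def ttml_to_srt(raw: bytes) -> str:
    """Convert TTML bytes to SRT text (plain text only, no colours)."""
    root = ET.fromstring(raw)  # bytes, so expat honours the file's BOM/declaration
    cues = [p for ns in (TTML_NS, DFXP_NS) for p in root.iter(f'{{{ns}}}p')]
    blocks = []
    for p in cues:
        begin, end = p.get('begin'), p.get('end')
        if begin is None or end is None:
            continue  # 'dur'-only or inherited timing is not handled here
        blocks.append(
            f'{len(blocks) + 1}\n'
            f'{srt_timestamp(parse_time(begin))} --> '
            f'{srt_timestamp(parse_time(end))}\n'
            f'{paragraph_text(p)}\n'
        )
    return '\n'.join(blocks)
```

Typical use: open the .ttml with `open(path, 'rb')`, pass the bytes to `ttml_to_srt`, and write the result with `encoding='utf-8'`. Reading bytes rather than text avoids the UTF-8/UTF-16 mix-ups seen earlier in this thread. Frame-based or tick-based times, `dur` attributes and inherited timing would need extra handling.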

@yan12125 yan12125 (Collaborator) commented Sep 24, 2017

I am left with only the .ttml subs (because subs conversion follows after video fetch... 😞 ).

Yep that's another bug (caused by me!). Please follow #9073 for updates.

For the issue of larger texts, could you try the SRT on more players? At first glance, the transformed srt for this video seems correct.

@dstftw dstftw (Collaborator) commented Sep 24, 2017

Neither VLC nor mpv have such problem. Most likely MPC does not support nested tags properly.

@Vangelis66 Vangelis66 (Author) commented Sep 24, 2017

... Thank you both for your input/suggestions.

yan12125 wrote:

For the issue of larger texts, could you try the SRT on more players?

Well, it being Sunday, I had some more free time to check, and:
MPC-HC (stable release and latest beta) exhibits the same behaviour as described in #14191 (comment); at default video dimensions, in a maximised window and in full screen alike, the second line of a coloured subtitle has a larger font.
MPC-BE (my preferred video player), being a fork of MPC-HC, has the same "issue", as expected. For the record, I used MPC-HC for almost a decade (WinXP/Vista/7) before switching to MPC-BE (on Vista/7) about a year ago, and this is the first time a flavour of .srt subtitles hasn't been rendered properly... 😕
PotPlayer is a mixed bag: the .srt file produced by yt-dl shows the "issue" I reported when the video is viewed at default dimensions, but the "issue" goes away in maximised window/full screen modes; even more bizarrely, it disappears completely, in all window modes, if I make the subtitle font bold (???). So, switching to PotPlayer with bold fonts is one solution...

dstftw wrote:

Neither VLC nor mpv have such problem.

I do have a portable installation of VLC 2.2.6 on this Vista laptop and can confirm this. However, I mostly use VLC as an audio stream recorder rather than a video viewer. Be that as it may, when I switch VLC to full screen with this particular type of .srt file produced by youtube-dl, the subtitles remain extremely small:

fs

Again, to overcome this flaw in VLC and read subs in full-screen I have to go into its advanced options and unselect subtitle formatting (resulting in monochrome subs 👎) ...
To conclude, I do insist the ITV .srt subs produced by yt-dl have a (small) flaw; you, OTOH, state:

At first glance, the transformed srt for this video seems correct.

and

Most likely MPC does not support nested tags properly.

(blaming in essence the player...).
I'm not a coder, so I can't challenge your claims; in fact, I'm very grateful for this app, and your coding time is probably better spent fixing other crucial yt-dl issues than this small subtitle handicap... If others are also OK with the current status, I will personally post-process those .srt subs with SubtitleEdit (or similar) to strip the formatting and give them a look I'm more comfortable with...

Again, many thanks!
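Stripping the formatting is the workaround discussed above, whether through SubtitleEdit or VLC's option to turn off subtitle formatting. A small script can do the same to the converted .srt. This is a sketch under assumptions: it removes only HTML-like `<font>`, `<b>`, `<i>` and `<u>` tags, including nested ones (which dstftw suspects MPC mishandles). Cue numbers, timings and the text itself are left untouched. `strip_srt_formatting` is a hypothetical name, and the sample cue is made up rather than taken from the attached ITV_Subs_bug.zip.

```python
import re

# Opening or closing <font ...>, <b>, <i>, <u> tags, any case. \b stops
# '<b' from also matching other tags that merely start with 'b', such as <br>.
_FORMAT_TAG = re.compile(r'</?(?:font|b|i|u)\b[^>]*>', re.IGNORECASE)


def strip_srt_formatting(srt_text: str) -> str:
    """Remove inline formatting tags from SRT text, keeping everything else."""
    return _FORMAT_TAG.sub('', srt_text)


# Hypothetical cue with nested colour tags, for illustration only.
SAMPLE = (
    '1\n'
    '00:00:01,000 --> 00:00:03,000\n'
    '<font color="#ffff00">Line one\n'
    '<font color="#00ffff">Line two</font></font>\n'
)

if __name__ == '__main__':
    print(strip_srt_formatting(SAMPLE))
```

To apply it to a real file, read the .srt with `encoding='utf-8'`, pass the text through `strip_srt_formatting`, and write it back. The result is the same monochrome output as disabling formatting in the player, but it applies to every player at once.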
