Some SRT files written with fractional times in 10ms units, rather than ms. #16967

Open
Ben-Mann opened this issue Jul 14, 2018 · 0 comments

@Ben-Mann Ben-Mann commented Jul 14, 2018

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.07.10. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2018.07.10

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

$ python -m youtube_dl -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2018.07.10
[debug] Git HEAD: 40a051fa9
[debug] Python version 2.7.13 (CPython) - Windows-10-10.0.16299
[debug] exe versions: ffmpeg N-91282-gc5e6c0b5f6, ffprobe N-91282-gc5e6c0b5f6
[debug] Proxy map: {}

Description of your issue, suggested solution and other information

Some SRT files written by youtube_dl have a two-digit fractional-seconds field where SRT expects three-digit milliseconds. For example, running the following command (the URL is publicly visible):

python -m youtube_dl http://www.crunchyroll.com/time-of-eve/episode-1-akiko-452708 -o akiko --sub-lang enUS --write-sub --sub-format srt --skip-download

The SRT starts out with

1
0:00:03,34 --> 0:00:06,26
STUDIO RIKKA

2
0:00:10,44 --> 0:00:12,44
In the future, probably Japan.

The 34, 26, and 44 should be 340, 260, and 440 ms respectively. Some applications instead read them as 34 ms, 26 ms, and 44 ms, which can make subtitles appear for only a few milliseconds at seemingly random points in the video (at least with some encoders and players, depending on how they handle the malformed SRT).
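To make the ambiguity concrete, here is a small sketch of the two readings of the digits after the comma. Both function names are hypothetical and are not part of youtube_dl or of any player; they only illustrate the arithmetic.

```python
def fraction_as_decimal_ms(fraction):
    """Treat the digits as a decimal fraction of a second ('34' means 0.34 s)."""
    # Pad on the right to three digits, then read the result as milliseconds.
    return int(fraction.ljust(3, '0')[:3])


def fraction_as_integer_ms(fraction):
    """Treat the digits as a literal millisecond count ('34' means 34 ms)."""
    return int(fraction)


for field in ('34', '26', '44'):
    # Print the intended value next to the value a strict parser may read.
    print(field, fraction_as_decimal_ms(field), fraction_as_integer_ms(field))
```

With the first reading the cue "0:00:03,34" starts at 3.340 s; with the second it starts at 3.034 s.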

I hacked together a quick-and-dirty fix that appends the missing zeros when this kind of timestamp is found:

diff --git a/youtube_dl/extractor/crunchyroll.py b/youtube_dl/extractor/crunchyroll.py
index 311da515d..a4859a9e0 100644
--- a/youtube_dl/extractor/crunchyroll.py
+++ b/youtube_dl/extractor/crunchyroll.py
@@ -299,13 +299,24 @@ class CrunchyrollIE(CrunchyrollBaseIE):

         decrypted_data = intlist_to_bytes(aes_cbc_decrypt(data, key, iv))
         return zlib.decompress(decrypted_data)
-
+
+    def _fix_srt_time(self, str):
+        mms = re.search(r'\d*:\d*:\d*,(\d+)', str)
+        if mms:
+            ms = mms.group(1)
+            l = len(ms)
+            if l == 2:
+                return str + "0"
+            if l == 1:
+                return str + "00"
+        return str
+
     def _convert_subtitles_to_srt(self, sub_root):
         output = ''

         for i, event in enumerate(sub_root.findall('./events/event'), 1):
-            start = event.attrib['start'].replace('.', ',')
-            end = event.attrib['end'].replace('.', ',')
+            start = self._fix_srt_time(event.attrib['start'].replace('.', ','))
+            end = self._fix_srt_time(event.attrib['end'].replace('.', ','))
             text = event.attrib['text'].replace('\\N', '\n')
             output += '%d\n%s --> %s\n%s\n\n' % (i, start, end, text)
         return output

Following this, the first few lines are encoded correctly:

1
0:00:03,340 --> 0:00:06,260
STUDIO RIKKA

2
0:00:10,440 --> 0:00:12,440
In the future, probably Japan.

But I'm not yet familiar enough with the codebase or situation to know if this is needed in all other extractors.
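If other extractors turn out to need the same treatment, the padding could live in a shared helper rather than inside the Crunchyroll extractor. The following is only a sketch under that assumption: `pad_srt_fraction` is a hypothetical name, not an existing youtube_dl function, and it assumes the input is a single `H:MM:SS` timestamp followed by `.` or `,` and one to three digits.

```python
import re

# Clock part, a '.' or ',' separator, then one to three fractional digits.
_SRT_FRACTION_RE = re.compile(r'^(\d+:\d{2}:\d{2})[.,](\d{1,3})$')


def pad_srt_fraction(timestamp):
    """Return the timestamp with a comma and a three-digit millisecond field.

    Strings that do not look like a timestamp are returned unchanged.
    """
    match = _SRT_FRACTION_RE.match(timestamp)
    if match is None:
        return timestamp
    clock, fraction = match.groups()
    # Right-pad so that '34' becomes '340' and '5' becomes '500'.
    return '%s,%s' % (clock, fraction.ljust(3, '0'))
```

Padding with `ljust` handles one- and two-digit fractions in the same way and leaves fields that already have three digits untouched.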
