[Question] Formatting of auto generated subtitles from YouTube #17960

Closed
dainiusb opened this issue Oct 24, 2018 · 3 comments

@dainiusb dainiusb commented Oct 24, 2018

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly

  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])

  • Use the Preview tab to see what your issue will actually look like


Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.10.05. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2018.10.05

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


Description of your issue, suggested solution and other information

The formatting of the auto-generated subtitles from YouTube looks kind of strange to me. Why does each word appear separately at first, and then again as part of a sentence fragment containing those words? The subtitle editing programs that I've tried could not parse the file properly. Is it supposed to look like that?

I like that there are separate words with timestamps. I want to parse them later for my project and import them into Audacity as labels.

Here's what the beginning of the file looks like:

WEBVTT
Kind: captions
Language: en
Style:
::cue(c.colorCCCCCC) { color: rgb(204,204,204);
 }
::cue(c.colorE5E5E5) { color: rgb(229,229,229);
 }
##

00:00:00.000 --> 00:00:02.149 align:start position:0%
 
hi<c.colorE5E5E5><00:00:00.359><c> I</c><00:00:00.630><c> have</c></c><c.colorCCCCCC><00:00:00.690><c> a</c><00:00:00.930><c> rather</c><00:00:01.170><c> simple</c><00:00:01.740><c> and</c><00:00:01.829><c> stupid</c></c>

00:00:02.149 --> 00:00:02.159 align:start position:0%
hi<c.colorE5E5E5> I have</c><c.colorCCCCCC> a rather simple and stupid
 </c>

00:00:02.159 --> 00:00:04.340 align:start position:0%
hi<c.colorE5E5E5> I have</c><c.colorCCCCCC> a rather simple and stupid</c>
<c.colorE5E5E5>idea</c><c.colorCCCCCC><00:00:02.639><c> it</c><00:00:02.879><c> can</c><00:00:03.120><c> help</c><00:00:03.330><c> increase</c><00:00:03.689><c> the</c><00:00:03.929><c> travel</c></c>

00:00:04.340 --> 00:00:04.350 align:start position:0%
<c.colorE5E5E5>idea</c><c.colorCCCCCC> it can help increase the travel
 </c>

00:00:04.350 --> 00:00:06.680 align:start position:0%
<c.colorE5E5E5>idea</c><c.colorCCCCCC> it can help increase the travel
range<00:00:04.589><c> of</c><00:00:04.740><c> electric</c><00:00:05.160><c> cars</c><00:00:05.460><c> to</c><00:00:05.609><c> infinity</c><00:00:06.359><c> which</c></c>

00:00:06.680 --> 00:00:06.690 align:start position:0%
range of electric cars to infinity which

@dainiusb dainiusb changed the title from "[Question] Auto generated subtitles from YouTube formatting" to "[Question] Formatting of auto generated subtitles from YouTube" on Oct 24, 2018

Contributor

@VietTPham VietTPham commented Oct 24, 2018

It is using the .vtt (WebVTT) format. Use --convert-subs srt to have youtube-dl convert it to the SRT format.

Author

@dainiusb dainiusb commented Oct 24, 2018

Yeah, that works, but it got rid of the separate words with timestamps and I need that data :(
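
Since the SRT conversion drops the per-word timestamps, one option is to parse the original .vtt file directly. The following is a minimal sketch, not a full WebVTT parser: it assumes the layout shown in the sample above, where only lines containing inline `<HH:MM:SS.mmm>` tags introduce new words and lines without them repeat earlier text. The function names `extract_words` and `to_audacity_labels` are made up for illustration. It also assumes that the first word on a line starts at the cue start and that the last word ends at the cue end, which the file itself does not state. The output uses Audacity's tab-separated label layout (start, end, label text, in seconds).

```python
import re

TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")
INLINE_TS = re.compile(r"<(\d{2}:\d{2}:\d{2}\.\d{3})>")
TAG = re.compile(r"<[^>]*>")

# Shortened excerpt of the first two cues from the sample above.
SAMPLE = """WEBVTT

00:00:00.000 --> 00:00:02.149 align:start position:0%
 
hi<c.colorE5E5E5><00:00:00.359><c> I</c><00:00:00.630><c> have</c></c>

00:00:02.149 --> 00:00:02.159 align:start position:0%
hi<c.colorE5E5E5> I have</c>
"""


def to_ms(stamp):
    """Convert 'HH:MM:SS.mmm' to integer milliseconds."""
    hms, millis = stamp.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def extract_words(vtt_text):
    """Return a list of (start_ms, end_ms, word) tuples."""
    words = []
    cue_start = cue_end = None
    for line in vtt_text.splitlines():
        timing = TIMING.match(line)
        if timing:
            cue_start, cue_end = to_ms(timing.group(1)), to_ms(timing.group(2))
            continue
        # Only lines carrying inline timestamps introduce new words;
        # lines without them repeat text from earlier cues and are skipped.
        if cue_start is None or not INLINE_TS.search(line):
            continue
        pieces = INLINE_TS.split(line)  # [text, stamp, text, stamp, text, ...]
        # Assumption: text before the first inline stamp starts at the cue start.
        starts = [cue_start] + [to_ms(s) for s in pieces[1::2]]
        texts = [TAG.sub("", t).strip() for t in pieces[0::2]]
        timed = [(s, t) for s, t in zip(starts, texts) if t]
        for i, (start, text) in enumerate(timed):
            # Assumption: a word ends where the next one starts,
            # and the last word on the line ends at the cue end.
            end = timed[i + 1][0] if i + 1 < len(timed) else cue_end
            words.append((start, end, text))
    return words


def to_audacity_labels(words):
    """Format words as tab-separated label lines (times in seconds)."""
    return "".join(
        f"{start / 1000:.3f}\t{end / 1000:.3f}\t{word}\n"
        for start, end, word in words
    )


if __name__ == "__main__":
    print(to_audacity_labels(extract_words(SAMPLE)), end="")
```

Saving the printed text to a .txt file should give something that Audacity's label import accepts. The 10 ms cues that only repeat text are ignored here because they contain no inline timestamps.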

Collaborator

@dstftw dstftw commented Oct 24, 2018

This is what YouTube serves for vtt subtitles. You may also try downloading ttml.
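
As a rough sketch of how the ttml suggestion could be tried from a script, the snippet below builds a youtube-dl command line that requests automatic captions in a given format without downloading the video. The helper names `build_command` and `download_subs` and the `VIDEO_URL` placeholder are hypothetical; the flags are standard youtube-dl options. Whether ttml is offered for a particular video, and whether it keeps per-word timing, is not confirmed in this thread; `youtube-dl --list-subs VIDEO_URL` shows what is available.

```python
import subprocess


def build_command(url, sub_format="ttml", lang="en"):
    """Build a youtube-dl argument list that fetches only auto-generated subtitles."""
    return [
        "youtube-dl",
        "--skip-download",      # do not download the video itself
        "--write-auto-sub",     # request automatically generated captions
        "--sub-lang", lang,
        "--sub-format", sub_format,
        url,
    ]


def download_subs(url, sub_format="ttml", lang="en"):
    """Run youtube-dl; requires youtube-dl on PATH and network access."""
    subprocess.run(build_command(url, sub_format, lang), check=True)
```

Calling `download_subs("VIDEO_URL", sub_format="vtt")` with a real URL in place of the placeholder would fetch the same kind of file as shown in the question.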

@dstftw dstftw closed this Oct 24, 2018
@dstftw dstftw added the question label Oct 24, 2018