[Question] Formatting of auto generated subtitles from YouTube #17960
Comments
it is using .vtt format, use
Yeah, that works, but it got rid of the separate words with timestamps and I need that data :(
This is what youtube serves for vtt subtitles. You may also try downloading ttml.
Please follow the guide below
You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
Put an x into all the boxes [ ] relevant to your issue (like this: [x])
Use the Preview tab to see what your issue will actually look like
Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.10.05. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.
Before submitting an issue make sure you have:
What is the purpose of your issue?
The following sections cover particular kinds of issues; you can erase any section (the contents between triple ---) not applicable to your issue
Description of your issue, suggested solution and other information
The formatting of subtitles from YouTube looks kinda strange to me. Why are there separate words at first and then parts of sentences with those words? The subtitle editing programs that I've tried could not parse the file properly. Is it supposed to look like that?
I like that there are separate words with timestamps. I want to parse them later for my project and import into Audacity as labels.
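For anyone wanting to do the same, here is a rough sketch of how that parsing could go. It assumes the auto-generated .vtt carries per-word timing as inline `<hh:mm:ss.mmm><c> word</c>` tags, with the first word of a line untagged and starting at the cue start; that is an assumption about the file, not something youtube-dl guarantees. The function names `parse_word_timings` and `to_audacity_labels` are made up for this example. The output follows Audacity's tab-separated label layout (start, end, text, times in seconds).

```python
import re

# Cue header, e.g. "00:00:00.000 --> 00:00:02.000 align:start position:0%"
CUE_RE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")
# Inline word tag, e.g. "<00:00:00.500><c> there</c>" (assumed format)
WORD_RE = re.compile(r"<(\d{2}:\d{2}:\d{2}\.\d{3})><c>([^<]*)</c>")


def to_seconds(ts):
    """Convert 'hh:mm:ss.mmm' to seconds as a float."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def parse_word_timings(vtt_text):
    """Return a list of (start, end, word) taken from lines with inline word tags."""
    words = []
    cue_start = cue_end = None
    for line in vtt_text.splitlines():
        cue = CUE_RE.match(line)
        if cue:
            cue_start = to_seconds(cue.group(1))
            cue_end = to_seconds(cue.group(2))
            continue
        if cue_start is None or "<c>" not in line:
            continue  # skip the header and the repeated plain-text lines
        # The first word has no tag of its own; it starts with the cue.
        first = line.split("<", 1)[0].strip()
        timed = [(cue_start, first)] if first else []
        timed += [(to_seconds(ts), w.strip()) for ts, w in WORD_RE.findall(line)]
        for i, (start, word) in enumerate(timed):
            # A word ends where the next one starts, or at the cue end.
            end = timed[i + 1][0] if i + 1 < len(timed) else cue_end
            if word:
                words.append((start, end, word))
    return words


def to_audacity_labels(words):
    """Format (start, end, word) tuples as tab-separated Audacity label lines."""
    return "".join(f"{start:.3f}\t{end:.3f}\t{word}\n" for start, end, word in words)
```

Writing the result of `to_audacity_labels(parse_word_timings(text))` to a .txt file should give something Audacity can import as a label track. Lines without inline tags are skipped on purpose, since they only repeat words that were already timed.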
Here's what the beginning of the file looks like: