How do I extract the subtitles in plain text? #17178
Comments
|
There is no feature to "just strip tags". Subtitles are provided as is. You can convert to other formats with |
|
I don't fully understand this. From the YouTube dashboard I can download an srt file, but with this tool I can't. Why is that? So you're saying I need to download the vtt and then use the --convert-subs flag? When I try that
Also, what's confusing is that when I run --list-subs I get
So captions but no subtitles? |
|
Having the same issue. Last night I was able to download srt files with --convert-subs srt, but for whatever reason the same command on the same video does not work today. |
|
Hi @jakecan13 I've been researching this a bunch, and I finally figured it out using another module. The working code sample is as follows...
|
|
Thanks, but how can I download a video with subtitles using this module? |
|
@dacorsa Do you want to download both the video and its subtitles, with the subtitles in a .txt file? |
|
Yes, but I'd like the video and subtitles embedded together as a single file. |
|
@dacorsa Sorry, in that case I'm not sure how to achieve this. |
|
OK, and for separate files? Can you help me? |
|
Thanks, I solved it with your link: |
I can see how to extract the automatically generated subtitles for a video, e.g.
youtube-dl --write-auto-sub --skip-download https://youtu.be/bQLkDomt59A
This creates a file, in this instance called
React Router v4-bQLkDomt59A.en.vtt
The first part of this WEBVTT file looks like this:
What is the best way to simply extract the plain text from these subtitles? Notice how the text repeats, so I can't just strip out the tags.
From the YouTube dashboard, I can download the srt and sbv formats. These look far easier to post-process.
However, when I try to grab the srt format using this tool
youtube-dl --write-auto-sub --sub-format=srt --skip-download https://youtu.be/bQLkDomt59A
I get
What am I missing here?
Otherwise for the vtt file that does get downloaded, any suggestions for a library to post-process this file?
Thanks.
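On the post-processing question above, here is a minimal sketch of one way to get plain text out of the downloaded .vtt file using only the Python standard library. It is not a youtube-dl feature. The function name `vtt_to_plain_text` and the `SAMPLE` text are hypothetical, made up for illustration. It assumes the repeated text shows up as consecutive identical lines once the inline tags are stripped, and that the file has no cue identifiers or NOTE blocks. Real files may need more handling.

```python
import html
import re

# Inline WebVTT tags such as <c>, </c> or timestamps like <00:00:01.000>.
TAG_RE = re.compile(r"<[^>]+>")
# Cue timing lines, e.g. "00:00:00.030 --> 00:00:02.510 align:start".
TIMING_RE = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")


def vtt_to_plain_text(vtt_text):
    """Return the caption text with tags, timings and consecutive repeats removed."""
    kept = []
    seen_first_cue = False
    for raw_line in vtt_text.splitlines():
        if TIMING_RE.match(raw_line):
            seen_first_cue = True
            continue
        if not seen_first_cue:
            # Skip the "WEBVTT" header and any metadata lines before the first cue.
            continue
        text = html.unescape(TAG_RE.sub("", raw_line)).strip()
        if not text:
            continue
        if kept and kept[-1] == text:
            # Assumption: a repeat is an exact copy of the previously kept line.
            continue
        kept.append(text)
    return "\n".join(kept)


# Hypothetical sample shaped like an auto-generated caption file.
SAMPLE = """WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:02.510 align:start position:0%
 
hey<00:00:00.320><c> there</c>

00:00:02.510 --> 00:00:02.520 align:start position:0%
hey there
 

00:00:02.520 --> 00:00:05.000 align:start position:0%
hey there
welcome<00:00:03.000><c> back</c>
"""
```

For the sample, `vtt_to_plain_text(SAMPLE)` returns two lines, "hey there" and "welcome back". For a real file, read it with `open(path, encoding="utf-8").read()` and pass the result in. Comparing only against the previously kept line keeps the sketch simple, but it would also drop a line that the speaker genuinely says twice in a row.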