Youtube-Playlist-OCR

Script that extracts the text from all videos in a YouTube playlist. (Used to extract the questions from the 50-question videos in PietSmiet's YouTube playlist.)

Technical:

The IDs and lengths of all videos that have not yet been processed are queried via the YouTube Data API. puppeteer-cluster then takes a screenshot every X seconds of each video, using multiple Chromium instances in parallel. The screenshots are preprocessed with jimp, and tesseract finally extracts the text from them.
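To illustrate the "every X seconds" step, here is a minimal sketch of how screenshot timestamps could be derived from a video's length. The YouTube Data API reports durations in ISO 8601 form (for example PT4M13S). The function names parseIsoDuration and screenshotTimestamps are hypothetical and not taken from parse.js, which may compute this differently.

```javascript
// Hypothetical helper: convert an ISO 8601 duration such as "PT1H2M3S"
// into seconds. Only hour/minute/second parts are handled; durations
// with a day component (e.g. "P1DT2H") are rejected in this sketch.
function parseIsoDuration(iso) {
  const match = /^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$/.exec(iso);
  if (!match) {
    throw new Error(`Unsupported duration: ${iso}`);
  }
  // Parts missing from the string are undefined and default to 0.
  const [, hours = 0, minutes = 0, seconds = 0] = match;
  return Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
}

// Hypothetical helper: list the offsets (in seconds) at which a
// screenshot would be taken, starting at 0 and stepping by the interval.
function screenshotTimestamps(durationSeconds, intervalSeconds) {
  if (!(intervalSeconds > 0)) {
    throw new RangeError("intervalSeconds must be greater than 0");
  }
  const timestamps = [];
  for (let t = 0; t < durationSeconds; t += intervalSeconds) {
    timestamps.push(t);
  }
  return timestamps;
}

const duration = parseIsoDuration("PT4M13S"); // 253
const offsets = screenshotTimestamps(duration, 60); // [0, 60, 120, 180, 240]
```

Each offset could then be handed to one puppeteer-cluster task, so that several Chromium instances work through the list in parallel.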

Use

Install dependencies with npm install.
Store your YouTube API key in an environment variable called YT_API_KEY.
Adjust the file path, file name, screenshot interval, language, text frame position/size, etc. in the constants at the top of parse.js.
Run with npm run start.
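As a rough sketch of how the YT_API_KEY variable might be consumed, the snippet below reads the key, fails early when it is missing, and builds a request URL for the YouTube Data API playlistItems endpoint. The helper names readApiKey and buildPlaylistItemsUrl are hypothetical, and the choice of endpoint and parameters is an assumption; parse.js may query the API differently.

```javascript
// Hypothetical helper: read the API key from the environment and fail
// early with a clear message if it is missing or empty.
function readApiKey(env = process.env) {
  const key = (env.YT_API_KEY || "").trim();
  if (key === "") {
    throw new Error("YT_API_KEY is not set; define it before running npm run start");
  }
  return key;
}

// Hypothetical helper: build a playlistItems request URL. Assumes the
// contentDetails part (which carries the video ID) and the API's
// maximum page size of 50 items.
function buildPlaylistItemsUrl(playlistId, apiKey, pageToken) {
  const params = new URLSearchParams({
    part: "contentDetails",
    playlistId,
    maxResults: "50",
    key: apiKey,
  });
  // pageToken is only needed when fetching pages after the first one.
  if (pageToken) {
    params.set("pageToken", pageToken);
  }
  return `https://www.googleapis.com/youtube/v3/playlistItems?${params}`;
}
```

A call such as buildPlaylistItemsUrl(playlistId, readApiKey()) would then yield the URL for the first page of a playlist.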
