Improve YouTube scraping speed #21146
Comments
If the video is in a download archive, that's not the case. As for scraping every video to check --dateafter, that's necessary to get the published date.

Playlist is not necessarily sorted by date.
Checklist
There are too many YouTube-related issues to check (even if I filter with the "request" label, there are still 22 pages); it would take me hours or days.
Description
I observed youtube-dl's behavior from the terminal output. It looks like it scrapes every video title/URL of a YouTube channel before it starts to check other conditions (for example, whether a video is in the --download-archive file or whether it meets --dateafter).
(Sorry if that's not how it really works.)
My proposal: since youtube-dl gets up to 100 videos' information in every request (from what I saw in the terminal output), why not check these conditions every time a batch of video information comes back from YouTube? Then, if the earliest video in that batch is already archived or doesn't meet the --dateafter condition, just stop scraping the channel/playlist.
Benefit: say a YouTube channel with 2000+ videos uploads a new video every day, and I run a youtube-dl script to archive the channel every week. It would then request only 100 videos' information instead of 2000+ videos' information.
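
To make the idea concrete, here is a rough Python sketch of what I mean. It is not youtube-dl's actual code: `fetch_pages`, `collect_new_videos`, and the fake video IDs are all hypothetical names made up for illustration, and the page size of 100 is just what I saw in the terminal output. It also assumes the listing comes back newest first, which (as noted in the comments) is not guaranteed for playlists.

```python
from typing import Iterable, Iterator, List, Set, Tuple

PAGE_SIZE = 100  # assumption: batch size observed in terminal output


def fetch_pages(video_ids: List[str], page_size: int = PAGE_SIZE) -> Iterator[List[str]]:
    """Hypothetical stand-in for the paged listing requests.

    Being a generator, it only "requests" the next page when asked for it.
    """
    for start in range(0, len(video_ids), page_size):
        yield video_ids[start:start + page_size]


def collect_new_videos(pages: Iterable[List[str]], archive: Set[str]) -> Tuple[List[str], int]:
    """Check each page as it arrives; stop once a page reaches archived videos.

    Assumes each page is ordered newest first, so the last entry of a page
    is the earliest video in that page.
    """
    new_videos: List[str] = []
    requests_made = 0
    for page in pages:
        requests_made += 1
        new_videos.extend(v for v in page if v not in archive)
        if page and page[-1] in archive:
            break  # earliest video on this page is already archived
    return new_videos, requests_made


# A made-up channel with 2007 videos, newest first; the 7 newest are not archived yet.
channel = [f"v{i}" for i in range(2006, -1, -1)]
archive = {f"v{i}" for i in range(2000)}

new_videos, requests_made = collect_new_videos(fetch_pages(channel), archive)
print(len(new_videos), "new videos found in", requests_made, "request(s)")
```

With an empty archive the same loop walks every page, so nothing changes for a first-time download. The early stop would only be safe where the ordering assumption holds.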