Improve YouTube scraping speed #21146

Closed
axzxc1236 opened this issue May 19, 2019 · 2 comments

@axzxc1236 axzxc1236 commented May 19, 2019

Checklist

  • I'm reporting a feature request
  • I've verified that I'm running youtube-dl version 2019.05.11
  • I've searched the bugtracker for similar feature requests including closed ones

There are too many YouTube-related issues to check (even filtering by the "request" label still leaves 22 pages); it would take me hours or days.

Description

Judging from youtube-dl's terminal output, it looks like it scrapes the title and URL of every video on a YouTube channel before it starts checking other conditions (for example, whether a video is in the --download-archive file or whether it meets --dateafter).

(Sorry if that's not how it really works.)

My proposal: since youtube-dl gets information for up to 100 videos per request (from what I saw in the terminal output), why not check these conditions each time a batch of video information arrives from YouTube? Then, if the earliest video in that batch is already archived or doesn't meet the --dateafter condition, just stop scraping the rest of the channel/playlist.

Benefit: say a YouTube channel with 2000+ videos uploads a new video every day, and I run a youtube-dl script to archive the channel every week. It would then only request information for 100 videos instead of 2000+.
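
A minimal sketch of the idea, to make the proposal concrete. This is not youtube-dl's actual code: `fetch_page` and `new_videos` are hypothetical names, the channel is simulated as an in-memory list, and the sketch assumes the listing is ordered newest first, which the real site may not guarantee.

```python
from typing import List, Set, Tuple

PAGE_SIZE = 100  # batch size reported in the issue, taken from terminal output


def fetch_page(video_ids: List[str], page: int) -> List[str]:
    """Hypothetical stand-in for one listing request (assumed newest first)."""
    start = page * PAGE_SIZE
    return video_ids[start:start + PAGE_SIZE]


def new_videos(video_ids: List[str], archive: Set[str]) -> Tuple[List[str], int]:
    """Return (ids not yet archived, number of listing requests made).

    Paging stops as soon as the last (oldest) entry of a page is already
    in the archive, instead of walking the whole channel first.
    """
    to_download: List[str] = []
    request_count = 0
    page = 0
    while True:
        entries = fetch_page(video_ids, page)
        request_count += 1
        if not entries:
            break  # ran off the end of the listing
        to_download.extend(v for v in entries if v not in archive)
        if entries[-1] in archive:
            break  # oldest entry on this page is archived: stop early
        page += 1
    return to_download, request_count


# Simulated channel: 2000 archived videos plus 7 uploaded since the last run.
channel = [f"v{n}" for n in range(2007, 0, -1)]
archive = {f"v{n}" for n in range(1, 2001)}

to_download, requests_made = new_videos(channel, archive)
print(len(to_download), requests_made)
```

Under these assumptions the weekly run needs a single listing request. With an empty archive the same loop walks every page, so the early stop only helps on repeat runs.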

@axzxc1236 axzxc1236 added the request label May 19, 2019
@ealgase ealgase (Contributor) commented May 19, 2019

> Judging from youtube-dl's terminal output, it looks like it scrapes the title and URL of every video on a YouTube channel before it starts checking other conditions (for example, whether a video is in the --download-archive file or whether it meets --dateafter).

If the video is in a download archive, that's not the case. As for scraping every video to check --dateafter, that's necessary because the published date is only known once the video itself has been scraped.

@dstftw dstftw (Collaborator) commented May 19, 2019

A playlist is not necessarily sorted by date.
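
A small illustration of why the ordering matters, as a sketch only. The function names, the playlist data, and the tiny page size are all made up for the example, and it assumes each listing entry already carries its upload date, which (per the comment above) is not the case in practice.

```python
from datetime import date
from typing import List, Tuple

PAGE_SIZE = 3  # tiny page size, for illustration only

Entry = Tuple[str, date]  # (video id, upload date)


def early_stop_filter(entries: List[Entry], date_after: date) -> List[str]:
    """Keep videos uploaded on or after date_after, but stop paging as
    soon as the last entry of a page is older than the cutoff."""
    kept: List[str] = []
    for start in range(0, len(entries), PAGE_SIZE):
        page = entries[start:start + PAGE_SIZE]
        kept.extend(vid for vid, uploaded in page if uploaded >= date_after)
        if page[-1][1] < date_after:
            break
    return kept


def full_scan_filter(entries: List[Entry], date_after: date) -> List[str]:
    """Check every entry, regardless of order."""
    return [vid for vid, uploaded in entries if uploaded >= date_after]


# Hypothetical playlist in curator order, not upload order.
playlist = [
    ("a", date(2019, 5, 18)),
    ("b", date(2019, 1, 2)),
    ("c", date(2018, 7, 9)),   # first page ends on an old video
    ("d", date(2019, 5, 17)),  # recent video sitting on the second page
    ("e", date(2017, 3, 1)),
]
cutoff = date(2019, 5, 12)

early = early_stop_filter(playlist, cutoff)
full = full_scan_filter(playlist, cutoff)
print(early, full)
```

Here the early stop returns only "a" and silently misses "d", while the full scan finds both. The shortcut is only safe when the listing is known to be sorted newest first.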

@dstftw dstftw closed this May 19, 2019