How does it work? I need minimum network and CPU activity for daily incremental scraping #6227

Closed
lukasmrtvy opened this issue Jul 14, 2015 · 1 comment


lukasmrtvy commented Jul 14, 2015

Hi, I have a question.
For example, how does youtube-dl work with these arguments?

./youtube-dl "https://www.youtube.com/playlist?list=UUXIyz409s7bNWVcM-vjfdVA" -e --get-id --get-duration --playlist-reverse --no-warnings --download-archive ./blacklist

Every day I want to scrape only the newest videos from specific YouTube playlists (I don't actually download the videos; I just need three attributes: name, length, and ID of each video).
The first run would take a while, I know, but on the following days, doing an incremental scrape with a blacklist (the IDs of already saved videos), would it run faster? And can I accelerate the task further?
I want to do this with ~50 YouTube channels/playlists every day, so it would be great if it used minimal network and CPU activity.
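
Concretely, I'm thinking of something like this loop (a rough sketch; playlists.txt is just a placeholder name for a file with one playlist/channel URL per line):

    # playlists.txt: one YouTube playlist/channel URL per line (~50 of them)
    # --download-archive makes youtube-dl skip IDs already listed in ./blacklist
    while read -r url; do
        ./youtube-dl -e --get-id --get-duration --playlist-reverse --no-warnings \
            --download-archive ./blacklist "$url"
    done < playlists.txt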

Thanks

jaimeMF (Collaborator) commented Jul 14, 2015

The info for videos in the archive won't be extracted, but currently it still gets all the video IDs from the playlist, which may take some time depending on its size (it used to need to download one webpage for every 50 videos or so; I don't know if that has changed much).
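
If you only care about the newest uploads, one way to bound the work per run (an untested sketch on my part; YouTube uploads playlists list the newest entries first) is to limit how many playlist entries are even considered with --playlist-end:

    # Only look at the first 100 entries of the playlist (the newest ones);
    # IDs already in ./blacklist are skipped without extracting their pages.
    ./youtube-dl --playlist-end 100 --download-archive ./blacklist \
        -e --get-id --get-duration --no-warnings \
        "https://www.youtube.com/playlist?list=UUXIyz409s7bNWVcM-vjfdVA"

Combined with the archive, that should keep a daily run down to a couple of playlist pages per list (at roughly 50 entries per page).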

Note: I'd recommend using --dump-json for getting the info. For YouTube videos it may not be a problem, but if the duration is unknown, --get-duration won't print any line (potentially breaking your script).
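
For example, something like this (a sketch assuming you have jq installed; title, id and duration are the field names in the JSON that youtube-dl emits, and duration may be null):

    # --dump-json prints one JSON object per video; pull out the three
    # attributes as tab-separated fields (a null duration becomes an empty column)
    ./youtube-dl --dump-json --no-warnings --download-archive ./blacklist \
        "https://www.youtube.com/playlist?list=UUXIyz409s7bNWVcM-vjfdVA" \
        | jq -r '[.title, .id, .duration] | @tsv'

One caveat worth checking on your version: --dump-json only simulates, so youtube-dl may not append the new IDs to the archive file itself; if it doesn't, you'd have to append them from the JSON output yourself.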

jaimeMF closed this Jul 14, 2015