Optimize --download-archive #1745
Comments
That would easily work for YouTube videos and other sites where the URL contains the ID, but this is not true of all sites (for example, TED talks), so it will require some additional work.
Seems to me this could be implemented as InfoExtractor.should_fetch_pages() (the default implementation returns true), to be called near here: https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L340 . Another option is to create a 'skip information fetching' exception and use it (for now) only in the YouTube extractor. In both cases you maintain the current behavior (always fetch pages) for other websites, while YouTube can easily skip playlist items already present in the --download-archive file.
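A minimal sketch of the first option, to make the proposal concrete. The classes below are stand-ins, not youtube-dl's real code: `should_fetch_pages()` is the hook name proposed above, while `YoutubeLikeIE`, `video_id()`, `extract()`, and `process()` are hypothetical names invented for illustration, and the archive is modelled as a plain set of IDs.

```python
from urllib.parse import urlparse, parse_qs


class InfoExtractor:
    """Stand-in base class (NOT the real youtube-dl InfoExtractor)."""

    def should_fetch_pages(self, url, archive):
        # Default: always fetch, so other sites keep their current behavior.
        return True

    def extract(self, url):
        # Placeholder for the expensive step (downloading video/info pages).
        return {"url": url, "fetched": True}


class YoutubeLikeIE(InfoExtractor):
    """Hypothetical extractor for a site whose URL carries the video ID."""

    def video_id(self, url):
        # Read the ID from the ?v=... query parameter without any network access.
        return parse_qs(urlparse(url).query).get("v", [None])[0]

    def should_fetch_pages(self, url, archive):
        # Skip fetching when the ID is already recorded in the archive.
        return self.video_id(url) not in archive


def process(ie, url, archive):
    """Hypothetical call site: consult the hook before fetching anything."""
    if not ie.should_fetch_pages(url, archive):
        return None  # already archived, nothing downloaded
    return ie.extract(url)
```

With this shape, an extractor that cannot derive the ID from the URL simply inherits the default and keeps fetching pages as before.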
Fixed - at least for YouTube playlists/user profiles/channels/... - as of youtube-dl v2013.11.22.2 . Type
I'm noting that this issue is present in 2014.01.23.4: youtube-dl still goes out and fetches info for IDs that are in the download archive. I can replicate this by running youtube-dl ytuser:therealgiantbomb with --download-archive more than once in a row.
@Hajitorus Sorry, we introduced a bug there. Fixed in youtube-dl 2014.01.29.
When a video is already in the archive created by --download-archive, youtube-dl still fetches the video and info pages (for YouTube, at least). Since I am downloading from a playlist, the code already has the IDs beforehand and could safely skip those steps.
I'm not sure whether there is any reason to check the page again, but it seems unnecessary to download two pages and only then decide to skip the download, when the video ID needed to make that decision is already known beforehand.
Thanks again!
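The request above amounts to filtering the playlist's IDs against the archive before any page is fetched. A rough sketch, assuming the archive file holds one `<extractor> <id>` entry per line; `load_archive()` and `pending_ids()` are hypothetical helpers written for illustration, not youtube-dl functions.

```python
def load_archive(lines):
    """Parse archive lines (assumed form: '<extractor> <id>') into a set of tuples."""
    entries = set()
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:  # ignore blank or malformed lines
            entries.add((parts[0], parts[1]))
    return entries


def pending_ids(extractor, playlist_ids, archive):
    """Return the playlist IDs that are not yet archived, keeping playlist order."""
    return [vid for vid in playlist_ids if (extractor, vid) not in archive]


# Usage: only the IDs returned here would need their pages fetched.
archive = load_archive(["youtube aaa\n", "youtube bbb\n", "\n"])
todo = pending_ids("youtube", ["aaa", "ccc", "bbb", "ddd"], archive)
```

Because the check needs only the ID, archived entries cost a set lookup rather than two page downloads.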