Optimize --download-archive #10733

Closed
alecmev opened this issue Sep 23, 2016 · 3 comments

Comments

@alecmev alecmev commented Sep 23, 2016

I'm not sure how well --download-archive plays with other sources, but with Soundcloud (my use case) there's a lot of unnecessary overhead. Instead of just taking the track IDs found in a playlist and checking them against the archive, youtube-dl also fetches each track's metadata from the Soundcloud API, wasting time on redundant requests. In addition, youtube-dl doesn't seem to cache (let alone index) the archive file, which makes the process even slower.
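A minimal sketch of the ordering being requested: filter playlist entries against the archive using IDs alone, and only make per-track API calls for what is left. It assumes archive lines take the form `<extractor> <id>` (e.g. `soundcloud 123456`) and that the playlist listing already yields track IDs, which is the premise of this issue. `fetch_track_metadata` is a hypothetical stand-in for the per-track requests, not a youtube-dl function.

```python
from typing import Callable, Iterable, List, Set


def archive_key(extractor: str, track_id: str) -> str:
    # Assumed archive line format: "<extractor> <id>", e.g. "soundcloud 123456".
    return f"{extractor.lower()} {track_id}"


def tracks_to_fetch(
    playlist_ids: Iterable[str],
    archived: Set[str],
    extractor: str = "soundcloud",
) -> List[str]:
    # Filter on IDs alone, before any per-track API request is made.
    return [tid for tid in playlist_ids if archive_key(extractor, tid) not in archived]


def sync(
    playlist_ids: Iterable[str],
    archived: Set[str],
    fetch_track_metadata: Callable[[str], dict],  # hypothetical per-track API call
) -> List[dict]:
    # Only tracks missing from the archive cost a metadata round trip.
    return [fetch_track_metadata(tid) for tid in tracks_to_fetch(playlist_ids, archived)]
```

With 265 archived likes and one new one, `sync` would call `fetch_track_metadata` once instead of 265 times.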

Why does this matter to me? I have a growing list of liked tracks (265 as of now), which I like to back up in case an artist decides to monetize or delete a track, or Soundcloud goes down. I add maybe one track a week, but a simple sync with youtube-dl takes almost as long as re-downloading everything (file download time is negligible compared to API query latency).

I know nothing about the architecture of this project, but I'd suggest reading, caching and indexing the archive file on launch, and letting extractors do lookups, so that they can skip files at the earliest possible opportunity (or leave it up to the generic checker if no such opportunity exists).
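The "read, cache and index on launch" part could look roughly like the sketch below: the archive file is read once into a set, lookups are answered from memory, and new entries are appended to both the file and the set. `DownloadArchive` and its methods are hypothetical names for illustration, not youtube-dl's actual API, and the one-entry-per-line format is an assumption.

```python
from typing import Set


class DownloadArchive:
    """Hypothetical helper: read the archive file once and answer lookups from memory."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._entries: Set[str] = set()
        try:
            with open(path, encoding="utf-8") as f:
                # Index every non-empty line once, at launch.
                self._entries = {line.strip() for line in f if line.strip()}
        except FileNotFoundError:
            pass  # No archive yet: every track is new.

    def __contains__(self, entry: str) -> bool:
        # Average O(1) set lookup instead of scanning the whole file per track.
        return entry in self._entries

    def record(self, entry: str) -> None:
        # Append to the file and keep the in-memory index in sync.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(entry + "\n")
        self._entries.add(entry)
```

An extractor holding a reference to such an object could test `"soundcloud " + track_id in archive` as soon as it knows the ID, before requesting anything else.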

Something similar has been suggested in #8757, as a part of a larger refactor.

@dstftw dstftw closed this in 24628cf Sep 23, 2016
@alecmev alecmev (author) commented Sep 24, 2016

@dstftw Thanks. However, this only mitigates the issue rather than fixing it. Also, I've updated to 2016.09.24, and youtube-dl still performs "Downloading info JSON" for every item:

[soundcloud:user] jeremejevs: Downloading user info
[soundcloud:user] jeremejevs: Downloading track page 1
[soundcloud:user] jeremejevs: Downloading track page 2
[soundcloud:user] jeremejevs: Downloading track page 3
[soundcloud:user] jeremejevs: Downloading track page 4
[soundcloud:user] jeremejevs: Downloading track page 5
[soundcloud:user] jeremejevs: Downloading track page 6
[download] Downloading playlist: jeremejevs (Likes)
[soundcloud:user] playlist jeremejevs (Likes): Collected 265 video ids (downloading 265 of them)
[download] Downloading video 1 of 265
[soundcloud] *snip*: Resolving id
[soundcloud] *snip*: Downloading info JSON
[soundcloud] *snip*: Downloading track url
[soundcloud] *snip*: Checking fallback video format URL
[soundcloud] *snip*: Checking http_mp3_128_url video format URL
[download] *snip* has already been recorded in archive
[download] Downloading video 2 of 265
[soundcloud] *snip*: Resolving id
[soundcloud] *snip*: Downloading info JSON
[soundcloud] *snip*: Downloading track url
[soundcloud] *snip*: Checking fallback video format URL
[soundcloud] *snip*: Checking http_mp3_128_url video format URL
[download] *snip* has already been recorded in archive
...
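A rough count of the overhead in the log above, assuming each log line corresponds to one network round trip (an assumption; the log does not say so explicitly): listing the likes costs 7 requests (user info plus 6 track pages), and each of the 265 items then costs 5 more before the archive check skips it.

```python
# Rough request count for the log above; assumes one HTTP round trip per log line.
LISTING_REQUESTS = 1 + 6   # "Downloading user info" + 6 "Downloading track page" lines
PER_TRACK_REQUESTS = 5     # Resolving id, info JSON, track url, two format-URL checks
TRACKS = 265
NEW_TRACKS = 1             # roughly one new like per week, per the issue


def total_requests(tracks: int, per_track: int = PER_TRACK_REQUESTS) -> int:
    # Listing cost is paid once; per-track cost is paid for every extracted track.
    return LISTING_REQUESTS + tracks * per_track


current = total_requests(TRACKS)            # every track is extracted, then skipped
archive_first = total_requests(NEW_TRACKS)  # only unarchived tracks are extracted
print(current, archive_first)  # 1332 12
```

Under these assumptions, checking the archive before extraction would cut a weekly sync from about 1332 requests to about 12.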
dstftw added a commit that referenced this issue Sep 24, 2016

@leonklingele leonklingele commented Oct 9, 2018

@dstftw can this please be reopened? --download-archive is still quite inefficient in combination with [soundcloud:user]:

[soundcloud] ARTIST/TRACK: Resolving id
[soundcloud] ARTIST/TRACK: Downloading info JSON
[soundcloud] ID: Downloading track url
[soundcloud] ID: Downloading m3u8 information
[soundcloud] ID: Checking hls_mp3_128_url video format URL
[soundcloud] ID: Checking http_mp3_128_url video format URL
[download] TRACK_TITLE has already been recorded in archive

@leonklingele leonklingele commented Nov 18, 2018

Same issue with Vimeo.

2 participants