
--download-archive not efficient (at least with SoundCloud, Vimeo) #19022

Closed
5 of 9 tasks
leonklingele opened this issue Jan 27, 2019 · 6 comments

@leonklingele

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2019.01.27. If it's not, read this FAQ entry and update. Issues with an outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2019.01.27

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

If the purpose of this issue is a bug report or site support request, or if you are not completely sure, provide the full verbose output as follows:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-i', u'--download-archive', u'ytdl-archive.txt', u'-v', u'https://soundcloud.com/soft-cell-official']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.27
[debug] Python version 2.7.15 (CPython) - Darwin-15.6.0-x86_64-i386-64bit
[debug] exe versions: avconv 12.3, avprobe 12.3, ffmpeg 4.1, ffprobe 4.1, rtmpdump 2.4
[debug] Proxy map: {}

Description of your issue, suggested solution and other information

The --download-archive option can be used to record the IDs of downloaded files in an archive so youtube-dl does not attempt to download them again. This works efficiently with YouTube itself, but not with SoundCloud and Vimeo.
When the option is used with [soundcloud:user], youtube-dl still fetches the following per-track information from the network, even for tracks already recorded in the archive:

  • info JSON
  • track url
  • m3u8 information
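For reference, the archive mechanism described above can be sketched in a few lines of Python 3. This is a minimal illustration, not youtube-dl's code: it assumes the archive is a plain-text file with one "<extractor> <id>" entry per line and a lowercase extractor name, which matches the "soundcloud 502908177" entry quoted in the actual result below. The helper names load_archive, archive_id and record are hypothetical.

```python
"""Sketch of a --download-archive style lookup (hypothetical helpers).

Assumption: one "<extractor> <id>" entry per line, extractor lowercased,
as in the "soundcloud 502908177" entry quoted in this issue.
"""
import os
import tempfile


def load_archive(path):
    """Return the set of archive entries stored in the file at `path`."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def archive_id(extractor, video_id):
    """Build an entry such as 'soundcloud 502908177'."""
    return "%s %s" % (extractor.lower(), video_id)


def record(path, extractor, video_id):
    """Append one entry after a successful download."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(archive_id(extractor, video_id) + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ytdl-archive.txt")
    record(path, "SoundCloud", "502908177")
    archive = load_archive(path)
    print(archive_id("soundcloud", "502908177") in archive)  # True
```

The lookup itself is cheap; the problem reported here is that it only happens after the per-track requests listed above.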

Steps to reproduce

$ cd $(mktemp -d) # clean environment
$ youtube-dl -i --download-archive ytdl-archive.txt -v "https://soundcloud.com/soft-cell-official"
# Abort command after a song was downloaded, then run it again

Expected result

On the second execution, youtube-dl should skip the first song immediately without downloading any information about it.

Actual result

# Second execution of youtube-dl ("soundcloud 502908177" is already in the archive)
...
[download] Downloading video 1 of 252
[soundcloud] soft-cell-official/northern-lights-dub-mix: Resolving id
[soundcloud] soft-cell-official/northern-lights-dub-mix: Downloading info JSON # <--
[soundcloud] 502908177: Downloading track url                                  # <--
[soundcloud] 502908177: Downloading m3u8 information                           # <--
[soundcloud] 502908177: Checking hls_mp3_128_url video format URL              # <--
[soundcloud] 502908177: Checking http_mp3_128_url video format URL             # <--
[debug] Default format spec: bestvideo+bestaudio/best                          # <--
[download] Northern Lights (Dub Mix) has already been recorded in archive
...

The lines above marked with # <-- should not happen.
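The difference between the expected and actual result comes down to when the archive is consulted. The toy model below is a Python 3 sketch, not youtube-dl's real control flow: it assumes each playlist entry's extractor and ID are already known from the listing, and FakeNetwork.fetch_full_info is a hypothetical stand-in for the per-track requests marked # <-- above.

```python
"""Toy model: consult the download archive before vs. after full extraction.

All names are hypothetical. FakeNetwork.fetch_full_info stands in for the
per-track requests (info JSON, track url, m3u8 information) flagged above.
"""


class FakeNetwork:
    """Counts how many per-track extractions were performed."""

    def __init__(self):
        self.requests = 0

    def fetch_full_info(self, extractor, video_id):
        self.requests += 1  # one simulated round of per-track requests
        return {"extractor_key": extractor, "id": video_id}


def archive_id(extractor, video_id):
    return "%s %s" % (extractor.lower(), video_id)


def check_after_extraction(entries, archive, net):
    """Observed behavior: extract every entry, then consult the archive."""
    pending = []
    for extractor, video_id in entries:
        info = net.fetch_full_info(extractor, video_id)
        if archive_id(info["extractor_key"], info["id"]) not in archive:
            pending.append(info)
    return pending


def check_before_extraction(entries, archive, net):
    """Expected behavior: skip archived entries without any extra request."""
    pending = []
    for extractor, video_id in entries:
        if archive_id(extractor, video_id) in archive:
            continue
        pending.append(net.fetch_full_info(extractor, video_id))
    return pending
```

With the entries [("SoundCloud", "502908177"), ("SoundCloud", "2")] and an archive containing only "soundcloud 502908177", both functions return one pending entry, but check_after_extraction performs two simulated extractions while check_before_extraction performs one.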

Use https://vimeo.com/stargate to reproduce with Vimeo.

Issue related to #10733.

@ealgase
Contributor

ealgase commented Jan 28, 2019

Unfortunately, that can't be changed, as the ID is returned at the same time as the other extracted data. This would require a large rewrite and wouldn't be much more efficient.

@CoryDHall
Contributor

This is not only inefficient; with SoundCloud it also leads to IP blocking.

Is there some way to have a flag that records SoundCloud downloads in the archive by their permalink, i.e. this part of the URL: soft-cell-official/northern-lights-dub-mix?
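One way such a flag could derive an archive key from the URL alone is sketched below in Python 3. Nothing like this exists in youtube-dl as far as this thread shows; permalink_key and the "site owner/slug" key format are hypothetical, and the sketch assumes the host is soundcloud.com or www.soundcloud.com.

```python
from urllib.parse import urlparse


def permalink_key(url):
    """Build a hypothetical 'site owner/slug' archive key from a track URL.

    Assumes a host like soundcloud.com or www.soundcloud.com.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    site = host.split(".")[0]  # 'soundcloud.com' -> 'soundcloud'
    path = parsed.path.strip("/")  # query string and fragment are dropped
    return "%s %s" % (site, path)
```

For https://soundcloud.com/soft-cell-official/northern-lights-dub-mix this yields "soundcloud soft-cell-official/northern-lights-dub-mix". A likely drawback of the design is that permalinks change when an uploader renames a track or account, so a slug-based entry could go stale where a numeric ID would not.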

@leonklingele
Author

> as the ID is returned at the same time as the other extracted data

Debugging the requests with --print-traffic shows that the track-collections API does return track IDs:

https://api-v2.soundcloud.com/profile/soundcloud:users:207965082?linked_partitioning=1&limit=50&client_id=LvWovRaJZlWCHql0bISuum8Bd2KX79mb&offset=0
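If a listing page like the one above does carry IDs, archived tracks could be filtered out before any per-track request. The Python 3 sketch below assumes a response shape of {"collection": [{"track": {"id": ...}}], "next_href": ...}; that shape is an assumption for illustration, not verified against SoundCloud's API, and the sample page is abbreviated and made up apart from the track ID and permalink taken from this issue.

```python
import json

# Abbreviated sample page. ASSUMPTION: the field names ("collection", "track",
# "id", "next_href") are illustrative, not a verified API contract.
SAMPLE_PAGE = json.dumps({
    "collection": [
        {"type": "track",
         "track": {"id": 502908177,
                   "permalink_url": "https://soundcloud.com/soft-cell-official/northern-lights-dub-mix"}},
    ],
    "next_href": None,
})


def track_ids(page_text):
    """Return (track IDs as strings, URL of the next page or None)."""
    data = json.loads(page_text)
    ids = []
    for item in data.get("collection", []):
        track = item.get("track") or {}
        if "id" in track:
            ids.append(str(track["id"]))
    return ids, data.get("next_href")


def not_yet_archived(ids, archive, extractor="soundcloud"):
    """Keep only IDs whose 'extractor id' entry is missing from the archive."""
    return [i for i in ids if "%s %s" % (extractor, i) not in archive]
```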

@ealgase
Contributor

ealgase commented Jan 28, 2019

> Debugging the requests with --print-traffic shows that the track-collections API does return track IDs:

I was referring to the way youtube-dl implements extractions.

@CoryDHall
Contributor

CoryDHall commented Feb 1, 2019

@ealgase @leonklingele I have a PR that fixes this specifically (just for SoundCloud). While piecing together how this works, I found that the extractor could be overhauled to reduce redundant API calls, but I felt that was beyond the scope of this issue (and I am admittedly not very familiar with the codebase).

dstftw added a commit that referenced this issue Feb 1, 2019
…nload archive id when no explicit ie_key is provided (#19022)
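The commit message above is truncated, but its visible part says the archive ID is now computed even when no explicit ie_key is provided. A plausible way to do that, assumed here rather than taken from the commit, is to fall back to the first extractor whose URL pattern matches the entry. In the Python 3 sketch below, the two-entry EXTRACTORS table and its regexes are hypothetical stand-ins for youtube-dl's real extractor list.

```python
import re

# Hypothetical extractor table: (ie_key, URL pattern). The regexes are
# illustrative only and do not reproduce youtube-dl's real _VALID_URLs.
EXTRACTORS = [
    ("SoundCloud", re.compile(r"https?://(?:www\.)?soundcloud\.com/[^/]+/[^/?#]+")),
    ("Vimeo", re.compile(r"https?://(?:www\.)?vimeo\.com/\d+")),
]


def make_archive_id(entry):
    """Return 'extractor id' for a playlist entry, or None if undeterminable."""
    video_id = entry.get("id")
    if not video_id:
        return None  # without an ID, full extraction is unavoidable
    ie_key = entry.get("ie_key") or entry.get("extractor_key")
    if ie_key is None:
        # Fallback: take the key of the first extractor matching the URL.
        url = entry.get("url") or ""
        for key, pattern in EXTRACTORS:
            if pattern.match(url):
                ie_key = key
                break
        else:
            return None
    return "%s %s" % (ie_key.lower(), video_id)
```

With this fallback, a lightweight entry that carries only an id and a url can still be checked against the archive before any per-track request is made.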
dstftw closed this as completed Feb 3, 2019
@leonklingele
Author

Thanks for merging the fix, @dstftw. This issue also covers the archive inefficiency with Vimeo, which I think has not been fixed yet, so could you please reopen it? :-)

4 participants