
[Feature Request] parallel downloads feature using a batch file #3746

Open
Rush2088 opened this issue Sep 14, 2014 · 22 comments

@Rush2088 Rush2088 commented Sep 14, 2014

Is there a way to enable parallel download sessions fed in by a --batch-file?
Some sites have very slow bandwidth per video, so instead of waiting for the previous file to finish, it is much more efficient to run parallel downloads.
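For reference, a rough sketch of what I run today (urls.txt is just a placeholder name for my batch file, one URL per line); everything is strictly sequential:

youtube-dl --batch-file urls.txt

Each URL only starts downloading once the previous one has finished.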

@Rush2088 Rush2088 changed the title parallel downloads feature using a batch file [Feature Request] parallel downloads feature using a batch file Sep 14, 2014
@optikfluffel optikfluffel commented Dec 19, 2014

This would be nice for playlists too, for example on YouTube :)

@megapctr megapctr commented Feb 21, 2016

👍 It's doable with external downloaders, but that's an annoying hurdle

@yan12125 yan12125 added the request label Feb 11, 2017
@schmod schmod commented Apr 21, 2017

Is there any reason why xargs would not be sufficient for this?

@yan12125 yan12125 (Collaborator) commented Apr 22, 2017

Is there any reason why xargs would not be sufficient for this?

Some cases: custom output templates, postprocessing...

@Roman2K Roman2K (Contributor) commented Nov 8, 2017

I would rather have the complexity of parallel downloads kept out of youtube-dl so that its code base remains more maintainable.

Parallelism can already be delegated to xargs -P x -n y where x is the number of parallel downloads and y is the number of URLs/IDs to pass to youtube-dl, most likely 1. I find this a very elegant solution: composing various tools, each doing a single job well. I've used it successfully many times, saving me hours or days.

@yan12125

Some cases: custom output templates, postprocessing...

Maybe I'm missing something, but why can't this be done with xargs youtube-dl, passing the same arguments (for templates, postprocessing, ...) as you would pass to youtube-dl directly?
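For illustration, a rough sketch of what I have in mind (files.txt, the output template and the audio flags are just placeholder choices, assuming one URL per line in the batch file):

# 4 youtube-dl processes in parallel, each handed one URL plus the usual template/postprocessing flags
xargs -n 1 -P 4 youtube-dl -o '%(title)s.%(ext)s' --extract-audio --audio-format mp3 < files.txt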

@yan12125 yan12125 (Collaborator) commented Nov 8, 2017

From the viewpoint of youtube-dl internals, a series of URLs is no different from a playlist with several videos. I was considering the latter case when I wrote the previous comment. If you want to download the videos of a playlist in parallel and also postprocess them, you'll need to run the parallel download step - either via internal code or external tools - from within youtube-dl, instead of running youtube-dl in parallel; the postprocessing step can then continue inside youtube-dl.

Or move the postprocessing code to standalone commands. That would be interesting.

@siddht4 siddht4 commented Nov 28, 2017

After a bit of research, I found that https://github.com/MrS0m30n3/youtube-dl-gui already supports this in serial form; with some tinkering it could support the parallel form.

https://pypi.python.org/pypi/twodict may help.

https://github.com/MrS0m30n3/youtube-dl-gui/blob/master/youtube_dl_gui/downloadmanager.py#L106-L122 manages the current state.

https://github.com/MrS0m30n3/youtube-dl-gui/blob/master/youtube_dl_gui/downloadmanager.py#L239-L385 manages the queue which can be modified to work in parallel.

So the machinery already exists; a little tinkering could bring it into the main project.

@ytdl-org ytdl-org deleted a comment from eladkarako Feb 10, 2018
@ytdl-org ytdl-org locked and limited conversation to collaborators Feb 10, 2018
@ytdl-org ytdl-org unlocked this conversation Apr 5, 2018
@epitron epitron (Contributor) commented Dec 12, 2018

Is there any reason why xargs would not be sufficient for this?

If you use xargs, it's nontrivial to include the numerical playlist index in the filename (for playlists where order matters).
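A rough illustration of the problem, assuming a made-up VIDEO_ID: when xargs hands youtube-dl a bare video URL, there is no playlist context, so the index field comes out as "NA":

youtube-dl -o '%(playlist_index)s - %(title)s.%(ext)s' https://youtu.be/VIDEO_ID
# => "NA - <title>.<ext>", since playlist_index is only known while the playlist itself is being traversed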

@eladkarako eladkarako commented Dec 12, 2018

@siddht4 get the download data using youtube-dl (--skip-download --dump-json), write an aria2-compatible list, and rely on aria2 for both the queue management and the download, to get both multipart and multi-file downloads.

For example, aria2 with --split=3 --min-split-size=1M --max-concurrent-downloads=5 --max-connection-per-server=16 will give you an active download queue of five files, each with three parallel segment downloads at a time. It works well on YouTube. With --continue=true --allow-overwrite=false --auto-file-renaming=false you can re-run the aria2 command in case of an error, skipping completed files.
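For the curious, a rough sketch of that pipeline (files.txt, list.aria2 and the jq filter are my own placeholders; note that .url is only present when the selected format is a single stream, so merged DASH formats would need .requested_formats instead):

# 1. resolve each entry to a direct media URL plus a filename, in aria2's input-file format
youtube-dl --skip-download --dump-json -f best --batch-file files.txt | jq -r '"\(.url)\n out=\(.title).\(.ext)"' > list.aria2
# 2. let aria2 handle both the queue and the segmented downloads
aria2c -i list.aria2 --split=3 --min-split-size=1M --max-concurrent-downloads=5 --max-connection-per-server=16 --continue=true --allow-overwrite=false --auto-file-renaming=false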

@alex-hofmann alex-hofmann commented Feb 20, 2019

Just wanted to thank @Roman2K and provide the full one-liner, since many youtube-dl users may not be familiar with xargs or the shell.

Parallelism can already be delegated to xargs -P x -n y where x is the number of parallel downloads and y is the number of URLs/IDs to pass to youtube-dl, most likely 1. I find this a very elegant solution: composing various tools, each doing a single job well. I've used it successfully many times, saving me hours or days.

cat files.txt | xargs -n 1 -P 4 youtube-dl
Here, files.txt is my batch file; -n 1 passes one URL to each youtube-dl call; -P 4 runs 4 youtube-dl calls in parallel.

@aryehbeitz aryehbeitz commented Feb 22, 2019

@alex-hofmann can this be used to download in parallel when downloading a channel or playlist on youtube?

@PiotrDabrowskey PiotrDabrowskey commented Feb 22, 2019

@aryehbeitz No, I guess not. It only allows you to download multiple links in parallel, but you need to specify those links in the file. A playlist or channel is just a single link that is passed to youtube-dl as you normally would, so it does not solve the problem of this issue.

If there's a way to gather all the links from one playlist and place them into the file, this would do the trick.

@alex-hofmann alex-hofmann commented Feb 22, 2019

@aryehbeitz @PiotrDabrowskey Yeah, I think you're right; this is only useful when you can feed in the known links. For example, if someone wanted to download the Jeopardy! episodes that a reddit user graciously uploads daily, they could use something like

curl -A 'random' https://www.reddit.com/user/jibjabjrjr/.rss 2>/dev/null | grep -Po 'https:\/\/drive.*?(?=\;|\&)' | xargs -n 1 -P 4 youtube-dl

My guess is that something similar is possible for pulling the URLs from a YouTube playlist with a little curl+grep.

@Roman2K Roman2K (Contributor) commented Feb 22, 2019

@aryehbeitz You can get the individual URLs from a playlist with:

$ youtube-dl -j --flat-playlist 'https://www.youtube.com/watch?v=CcNo07Xp8aQ&list=PLA671B7E7BFB780B7' | jq -r '"https://youtu.be/"+ .url'

That can be fed directly to xargs for parallel downloads as in @alex-hofmann's example.
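Putting the two together (PLAYLIST_URL is a placeholder; -P 4 picked arbitrarily, as in the example above):

youtube-dl -j --flat-playlist 'PLAYLIST_URL' | jq -r '"https://youtu.be/" + .url' | xargs -n 1 -P 4 youtube-dl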

@Roman2K Roman2K (Contributor) commented Mar 7, 2019

If anyone's interested, I made a cronjob for backing up my YouTube playlists to Google Drive (using rclone). It's idempotent/incremental. The scripting is kind of rough, but I've been running it with several playlists of 500+ videos over the last few days and it works well.

https://github.com/Roman2K/ytdump

  • Basic usage: ruby dl.rb PLAYLIST_URL
    • downloads to ./out
    • keeps metadata/cache in ./meta
  • Export to Drive/other: see dl_to_rclone
    • usage: see favorites/dl.sh
  • Cronjob: see crontab and cronjob
  • Concurrency: see Downloader::NTHREADS in dl.rb

Clone and customize to your needs.

@zer0def zer0def commented Apr 3, 2019

Based on @alex-hofmann's and @Roman2K's hints, here's a trashman's one-liner that should work with any extractor (requires jq alongside typical coreutils binaries):

TMP=`mktemp`; TMPI=0; while ((TMPI++)); read -r; do TMPF="${TMP}.`printf "%05d" "${TMPI}"`"; cat <<<${REPLY} >${TMPF}; echo ${TMPF}; done <<<`youtube-dl -J -u <username> -p <password> <playlist_url> | jq -Mc 'if ._type=="playlist" then .entries[] else . end'` | xargs -I'{}' -n1 -P`nproc` -- /bin/sh -c 'youtube-dl -f best -o "`echo {} | awk -F. '\''{print $NF}'\''`-%(title)s-%(ext)s" --load-info {} &>/dev/null; rm {}'; unset TMP TMPI TMPF

Adjust to your heart's content.

@epitron epitron (Contributor) commented Apr 3, 2019

@zer0def Nice!

Small issue: the number of parallel downloaders is set to the number of CPU cores (-P`nproc`). That parameter should be tuned to match the network capacity, not the CPU capacity.

Anyone who wants to try that should start it around -P5, and increase it until it saturates their connection.

@zer0def zer0def commented Apr 4, 2019

@epitron As long as your extractions don't fall back to ffmpeg, it's safe to ramp up the process count; but when they do, you might end up unnecessarily extending your downloads due to threads fighting for CPU time. So I picked nproc simply because it's fairly idiot-proof, while remaining faster than a sequential run on a large playlist.

As per my previous comment: adjust to your heart's content, since everyone's case might be different.

@epitron epitron (Contributor) commented Apr 4, 2019

@zer0def Hyperthreading can also artificially increase your CPU count. What if you have 32 cores, and a regular network connection?

I thought YTDL only used ffmpeg to copy streams into different containers (unless you pass --recode-video or a specific format that needs to be transcoded). Are there any services that transcode video by default, and which support playlists?

@zer0def zer0def commented Apr 5, 2019

Then you have threads stalling for data, which is hardly a concern compared to saturating a low-end and/or low-core-count CPU with a pipe big enough to overwhelm it.

@epitron epitron (Contributor) commented Apr 8, 2019

@zer0def What if YouTube bans or throttles you for spamming their servers with connections?

@zer0def zer0def commented Apr 9, 2019

Do you have an actual case to provide as a basis, or are we just talking hypotheticals? Because if you're looking for a golden mean, then yes, you could cobble together a proper solution, write a wrapper script, or use parallel --limit. All of these go beyond a minimal-dependencies "trashman's" case (ideally, I would have liked to omit jq, but I haven't yet figured out a way to output the necessary information on one line without running into issues with whitespace-delimited paths).

@ytdl-org ytdl-org deleted a comment from siddht4 Sep 11, 2019
@ytdl-org ytdl-org deleted a comment from siddht4 Sep 11, 2019
@ytdl-org ytdl-org locked and limited conversation to collaborators Sep 11, 2019