Multi-Threaded downloading and processing #1918
Comments
To get around this issue I've been using this command: conf: Basically, I trick yt-dlp into skipping PP on the first pass and call another yt-dlp instance to do the PP of the previous file while the first instance continues on to the next. If PP is done externally with ffmpeg, I don't see why multithreading is even needed, outside of getting the console output. I'd be fine with an official solution that did just that, even if it meant we couldn't see the final console output of the files. Having PP done asynchronously is a major speedup for me (around 30-40%) since I'm downloading to an HDD. There's more discussion about it here: https://discord.com/channels/807245652072857610/892817964716392500
youtube-dl was not written with any kind of multithreading in mind. Implementing this would need a rewrite of the overall architecture. Since it would be difficult to keep track of multiple issues dealing with multithreading, we can use this issue to track all these: (Adapted from: ytdl-org/youtube-dl#350 (comment))
PS: if someone is trying to implement threading, imo they should put their efforts into (5)
Except (5) is not what I want; I only ever want to download one video at a time. I think (5) already exists in another fork, afaik.
That is not a fork, but a wrapper. @Jules-A As I mentioned in the above comment, if (5) is implemented natively, (4) can be added on trivially. Say the implementation of (5) is like this: There is a Then, we can add another CLI arg This is why I personally think it is better to spend effort on implementing (5) rather than trying to do (4) directly. PS: With this, you also get (6) for free by modifying
First off, multithreading is not synonymous with pipelining, and I’d rather you reverted the renaming. Second, I am not proposing that postprocessing start before downloading is finished. The goal here is to treat each step as a blackbox, thereby avoiding any breaking changes to existing functionality. If this can’t be done without rewriting the entire workflow, so be it. But I think it can, and I’d personally like to try. |
You need multithreading to be able to run a PP and a download at the same time (which is what you are requesting, right?)
Yes, that was clear. As I understand, you want the download of the second video to start while PP of first is ongoing
Sure, you can try. I will help wherever I can, and PRs are ofc welcome. In the OP, you mention that the UI is where you see difficulty with this. For that, you can simply have the PPs write to the screen normally (overwriting any progress bar above). This effectively makes it so all messages get to the console, with the progress bar always staying at the bottom. This will be a bit problematic if/when #1720 is implemented, but we can cross that bridge when we get to it. PS: After thinking a bit more, this is not an ideal solution. It won't be clear from the logs which video is being downloaded and which is being postprocessed. Perhaps it can be improved by prefixing all PP messages with the video ID? Anyway, this should not be too big of a concern.
That's not exactly true since I'm already doing it with my current commands/config.
Sure, but rewriting everything seems like a large undertaking that might never get done, whereas it probably wouldn't take anywhere near as long to just call ffmpeg (similar to how it's done with exec) for those who want faster downloads now and don't care about the jumbled console output.
OK, this is the point where I’d like it to start downloading the next item. Line 3004 in ddd24c9
So, right now a list of URLs are just iterated over with a for-loop. Lines 3042 to 3054 in ddd24c9
What if that was instead treated as a stack? Each URL could be fully resolved before downloading commences (already possible using the right parameters), then the resulting ie_result stack could be passed through to the downloader, which would pop off the top item, download it, then recursively call itself with the remainder of the stack. That would have to be spun off into a child process or something similar, of course.
After that point, the function would need to block until nothing else is doing post-processing. That could be accomplished with a mutex. After everything is finished, you simply wait for the child process to complete, and then you’re done.
Anything obviously wrong with that plan? There’s a bit of a race condition where a child process could lock the mutex before the parent, admittedly.
If you extract links long before you download them, they may expire
Why? This makes no sense to me and has many issues, one being that you'll hit the recursion limit quickly. It makes more sense to instead keep all downloads in the main thread and do the postprocessing in a separate thread. First, see how playlists are handled in the code; your plan doesn't seem to account for them at all. Also, keep in mind that
Good point. In that case, extraction needs to be lumped together with downloading.
You’re right, I’m clearly too used to Swift’s reentrant concurrency. Python’s processes correspond to actual threads, so that’d result in a thread explosion. I think it’d be better to keep the main thread clear if we aren’t bothering with recursion, though, if only to make potential UI improvements easier in the future. Python’s multiprocessing queues are thread-safe and process-safe, right? What if there was an extracting/downloading queue and a postprocessing queue? The main process could populate the downloading queue with URLs, and the downloading process could add to the postprocessing queue after each download. Sentinel values could be used to mark the end of a queue, upon which the relevant process would close. When every process is closed, the main process can follow suit.
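The two-queue scheme described here can be sketched roughly as below. This is a minimal illustration, not yt-dlp code: it uses threads and `queue.Queue` for brevity, but `multiprocessing.Queue` exposes the same `put()`/`get()` interface across processes (and yes, it is both thread- and process-safe). All function names and the `SENTINEL` marker are made up for this sketch.

```python
import queue
import threading

SENTINEL = None  # marks the end of a queue


def run_pipeline(urls):
    """Main thread feeds the download queue; the downloader feeds the
    postprocessing queue; sentinels propagate shutdown down the chain."""
    download_q = queue.Queue()
    pp_q = queue.Queue()
    results = []

    def downloader():
        while (url := download_q.get()) is not SENTINEL:
            # stand-in for extraction + download
            pp_q.put(f'{url} [downloaded]')
        pp_q.put(SENTINEL)  # downloading done; let the postprocessor drain

    def postprocessor():
        while (info := pp_q.get()) is not SENTINEL:
            results.append(f'{info} [postprocessed]')

    threads = [threading.Thread(target=downloader),
               threading.Thread(target=postprocessor)]
    for t in threads:
        t.start()
    for url in urls:
        download_q.put(url)
    download_q.put(SENTINEL)
    for t in threads:
        t.join()
    return results
```

With one worker per stage, ordering stays deterministic while a download and a postprocessing job can run at the same time.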
It’s a bit hard for me to follow, honestly. I know
If I can chime in on the discussion, I propose the following:
@mhogomchungu the "yt-dlp process manager" sounds like it is better as a separate program. And in fact, programs like this already exist. See the above-mentioned https://github.com/tuxlovesyou/squid-dl. If we are doing this in yt-dlp itself, it must be done properly with support in the core program, not as some wrapper spawning processes.
My proposal implements concurrency through multiple processes and needs a process manager. The discussed proposals implement concurrency through multiple threads and need a thread manager, which seems harder to do since "yt-dlp was not built with multithreading in mind". One way or the other, a concurrency manager of some sort cannot be avoided here, and both approaches can be implemented in the core program. The mentioned tool is doing exactly what I do in Media Downloader[1] when downloading playlists, and it will have competition soon because I plan to offer the functionality from the CLI as well.
First off, this isn’t a competition. Second, yt-dlp’s heavy use of dependency injection means it doesn’t actually share state that much, which should make it easier to retrofit for this form of concurrency. Furthermore, Python (like many modern languages) has built-in concurrency systems that can do a lot of the heavy lifting. Third, I don’t think anyone has argued that this should be default behavior. Many existing options are likely incompatible with this form of processing, and having that many intermediate files at once may require more storage than users expect. Besides, default postprocessing is unlikely to take very long. Finally, for the sake of avoiding complications with bandwidth throttling and CPU thrashing, I think it is best to assume that downloading saturates network bandwidth and postprocessing uses all available CPU resources. That is, neither concurrent downloading nor concurrent postprocessing is beneficial.
Python’s multiprocessing library doesn’t seem to provide a deque, so playlists pose a bit of an issue. Until they are extracted, they can’t be identified as playlists. Once they are, their entries have to be added to the end of the download queue, unfortunately meaning non-playlists get priority over playlists. For the sake of not complicating things considerably, it would probably be best to make the options that write playlist data to a file incompatible with pipelining. That could be revisited in the future. @pukkandan Is it safe to assume that playlist entries don’t become stale?
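For illustration, the FIFO limitation looks like this. The `playlist:`/`video:` tags are invented for the example; the point is simply that `queue.Queue` (and likewise `multiprocessing.Queue`) has no `appendleft()`, so entries discovered inside a playlist can only go to the tail, behind plain URLs queued later.

```python
import queue

q = queue.Queue()
q.put('playlist:PL123')   # not yet known to be a playlist
q.put('video:abc')        # a plain video queued afterwards

order = []
while not q.empty():
    item = q.get()
    if item.startswith('playlist:'):
        # "extraction" reveals the entries; FIFO means they go to the back
        q.put('video:entry1')
        q.put('video:entry2')
    else:
        order.append(item)

print(order)  # ['video:abc', 'video:entry1', 'video:entry2']
```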
I did say it was going to be difficult to implement 🤷 If you don't want to write full integration with all features of yt-dlp, you are better off writing a wrapper. See https://github.com/yt-dlp/yt-dlp#embedding-yt-dlp. That way, you can neatly separate the download and PPs without worrying about the features that you never use. For just a single video, the code could be as simple as this:

```python
import yt_dlp

def download(url):
    with yt_dlp.YoutubeDL({
        'format': 'bv,ba',
        'outtmpl': '%(title)s [%(id)s].%(format_id)s.%(ext)s',
        'fixup': 'never',
    }) as ydl:
        return ydl.sanitize_info(ydl.extract_info(url))

def postprocess(info):
    with yt_dlp.YoutubeDL({
        'format': 'bv+ba',
        'outtmpl': '%(title)s [%(id)s].%(ext)s',
        'fixup': 'force',
        'postprocessors': [{'key': 'FFmpegMetadata'}],
    }) as ydl:
        return ydl.process_ie_result(info)

info = download(URL)
postprocess(info)  # Run this in a separate thread/process
```

For batch files/playlists, you can enumerate through the data similar to squid-dl
I don't understand what you mean
It was meant to show experience with doing process-based concurrent downloading with yt-dlp. When using my app, I typically run multiple instances of yt-dlp, and each instance is configured to use aria2c to get multiple connections per instance for faster downloads. Again, I am mentioning this to show experience with concurrent downloading with yt-dlp, not to brag or compete with anybody.
It should not be the default, and when/if it is implemented, then as a developer of a GUI frontend to yt-dlp, my biggest interest is in how the output will be produced and how I can tell how many instances are running and which output belongs to which instance.
While I'd love having sequential downloads and async postprocessing (happening off the main downloader thread/process) I see the following difficulties:
IMO the way to have this today is to parallelize / embed I pondered scripting this in bash / zsh by having a global download limit value in an environment variable, then dividing that by the CPU cores and spawning N independent So while I would love this, I can see how many complexities it can unleash and how the maintainers wouldn't think it's worth it.
I’m inclined to agree, particularly if it is unacceptable to lack compatibility with particularly thorny configurations. If it were possible to cleanly separate post-processing from downloading (such that the Python interface could call the latter using the output of the former without bespoke configuration tweaking), this would be quite doable. It’d still screw up the UI, though. Since that doesn’t seem to be the case, I think this issue should be closed at the maintainer’s discretion.
As I said above, I intend to leave this issue open to hold all future discussions on multi-threading |
One of the reasons I originally filed #267 was that I hoped that moving to Python 3 might allow us to take advantage of the language’s native asynchronous programming support to implement simultaneous downloading of multiple streams (maybe even livestreams) in a performant manner. But there are some obstacles to going down that route:
In my experience using ytdl with fully async programs, the actual downloading is what blocks the whole thread, because it depends on the remote server's upload speed (as in g00g13, for example). So if aiohttp could optionally be used instead of urllib3/requests, that would already solve the major bottleneck for batch processing.
I don’t think it’s even that hard to do readable progress reporting of concurrent tasks… assuming you manage the concurrency part well enough in the first place.

Code sample:

```python
#!/usr/bin/env python3
import sys


class TTYProgressMsg:
    def __init__(self, printer, items):
        self.__data = items
        self.__i = None
        self.__printer = printer

    def _print(self):
        self.__printer._print(self.__data)

    def __enter__(self):
        self.__printer._append(self)
        return self

    def __exit__(self, *exc):
        if exc != (None, None, None):
            return
        self.__printer._remove(self)

    def update(self, *items):
        self.__data = items
        self.__printer._refresh()


class TTYPrinter:
    def __init__(self):
        self.__progress_msgs = []

    def __kill_line(self):
        sys.stdout.write('\r\x1B[2K')

    def _print(self, items):
        self.__kill_line()
        for item in items:
            sys.stdout.write(item)
        sys.stdout.write('\n')

    def __back_out(self):
        l = len(self.__progress_msgs)
        if l == 0:
            return
        sys.stdout.write(f'\x1B[{l}F')

    def _append(self, progress):
        self.__progress_msgs.append(progress)
        progress._print()
        return progress

    def _remove(self, progress):
        self.__back_out()
        self.__progress_msgs.remove(progress)
        progress._print()
        for prog in self.__progress_msgs:
            prog._print()
        sys.stdout.flush()

    def _refresh(self):
        self.__back_out()
        for progress in self.__progress_msgs:
            progress._print()
        sys.stdout.flush()

    def event_msg(self, *items, interrupt=True):
        if not interrupt:
            self.__back_out()
        self._print(items)
        for prog in self.__progress_msgs:
            prog._print()
        sys.stdout.flush()

    def progress_msg(self, *items):
        return TTYProgressMsg(self, items)


if __name__ == '__main__':
    import random
    import asyncio

    prt = TTYPrinter()
    running = 0

    async def task(id, d):
        global running
        perc = 0
        running += 1
        with prt.progress_msg(f'[{id}] ?') as progress:
            while perc < 100:
                progress.update(f'[{id}] {perc}%')
                await asyncio.sleep(d)
                perc += random.randint(0, min(10, 100 - perc))
            progress.update(f'done {id}')
        running -= 1

    async def random_events():
        evt = 0
        await asyncio.sleep(0)
        while running:
            await asyncio.sleep(random.uniform(0.2, 5))
            prt.event_msg(f'random event {evt}', interrupt=True)
            evt += 1

    async def main():
        await asyncio.gather(
            random_events(),
            task('task1', 0.2),
            task('task2', 0.1),
            task('task3', 0.3),
            task('task4', 0.15),
        )

    asyncio.run(main())
    prt.event_msg('all done')
```
#5680 now supports parallel progress output automatically.

Implementation idea for parallel postprocessing
Implementation

If we were to create a That means most of the public wrappers would need either a flag This might need some refactoring to pass down which sort of processing needs to be applied (could be calculated from the info dict again). It would then be handled inside

Downsides
Hey, do you still have that command working? I'm just trying to get it working on my end currently.
Yeah, but I think I had to change a few things around --parse-metadata; that has nothing to do with parallel muxing/downloading and is rather just a personal change for the title. I also moved to using aliases, so it looks a bit different now, and I changed my batch files to fix the archive breaking when closing the download window while a video was muxing. So my full config looks like:
generic.conf
and called with Generic Download.bat
I don't have a CR sub atm, so I didn't post my CR configs; I think there are some ongoing issues with the CR extractor right now anyway that might require changing them. EDIT: Might as well post it, so here's my crunchyroll.conf
Hmm, I'm trying to modify my command to download music with parallel processing, but I'm kinda stuck.
Converting audio, embedding the thumbnail, converting the thumbnail, cropping, and I think adding metadata all require ffmpeg (maybe embedding thumbnails uses another library, not sure). So if you were to add it based on my config, that would be in the common-code alias, in the parameters of the 2nd process call.
Checklist
Description
Problem
When instructed to download multiple URLs, yt-dlp works completely serially: it retrieves, extracts, and downloads the desired content (presumably saturating the network connection), then performs postprocessing (potentially saturating the CPU, among other things).
These tasks have broadly independent constraints: there is minimal CPU demand for downloading a video, and no network traffic while transcoding it. This leaves a lot of room for improvement, which is particularly noticeable with common tasks like downloading playlists.
Fix
I propose a form of pipelining: when network-constrained tasks finish (that is, when downloading is complete), yt-dlp should move onto the next item’s network-constrained tasks. This means that the first item can be postprocessed while the second item is downloading.
If the second item finishes downloading before the first item is done with postprocessing, yt-dlp can add it to the postprocessing queue and move onto downloading the third item.
The result is that a downloading operation and a postprocessing operation can occur at the same time, substantially improving speed without incurring thrashing.
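As a rough model of that gain: if each of n items takes D seconds to download and P seconds to postprocess (with D ≥ P), serial processing costs n·(D + P) while the pipeline costs roughly n·D + P, since only the final item's postprocessing is not hidden behind a download. The numbers below are illustrative only, not measurements.

```python
def serial_time(n, d, p):
    # every item downloads, then postprocesses, before the next starts
    return n * (d + p)

def pipelined_time(n, d, p):
    # postprocessing overlaps the next download; only the last PP shows
    return n * d + p  # valid when d >= p

print(serial_time(10, 30, 20), pipelined_time(10, 30, 20))  # 500 320
```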
Implementation
yt-dlp already draws a concrete distinction between downloading and postprocessing, which is used for some of the options. That should make it relatively seamless to decouple them for this purpose without breaking any custom plugins.
Moreover, this scheme would never run more than one instance of a task at the same time (no task runs both before and after downloading is finished), so shared state is unlikely to be a concern.
The only thing that may be complicated by this proposal, then, is the UI. At worst, this could be dealt with by simply not displaying progress when run in this manner.