processes bidding for the same partially downloaded file ... #485
Comments
A five-second delay would be wrong, and we can't really check whether another process is writing to the file. What we can do is use cooperative locking on the .part files.
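A minimal sketch of what that cooperative locking could look like, assuming POSIX fcntl (not portable to Windows as-is; the helper name is illustrative, not youtube-dl API):

```python
import fcntl

def try_lock_part(path):
    """Return a file object holding an exclusive lock on `path`,
    or None if another cooperating process already holds one."""
    f = open(path, 'ab')  # create the .part file if it does not exist yet
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f  # keep this object alive for as long as the download runs
```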
Yeah, already hating! ;) Possibly, there is a simple, obvious way to feed a single youtube-dl with the whole list instead. +1 to «you may hate me, too». ;D
Yes, a five-second delay is quite specific and won't match everyone's bandwidth. Wouldn't it be possible to, say, support double-threading, for instance?
I think it could be a good idea to have a parameter that tells it how many threads to launch. If it's downloading a single file, the work would be split evenly; if it's downloading something like a playlist, it could download multiple videos at once. Of course, that would imply checking whether that's possible given the video's host, working out how to report progress for multiple videos at once, and deciding whether it would be wise to put in a hard limit (so someone doesn't end up opening 100 connections to YouTube), that sort of thing, but it could be interesting.
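A rough sketch of that idea, with a hard cap on workers (the names and the limit are illustrative, not actual youtube-dl code):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # arbitrary hard cap for the example

def download_playlist(urls, download_one, requested_workers):
    """Download playlist entries in parallel, capping the worker count."""
    workers = max(1, min(requested_workers, MAX_WORKERS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order; list() forces completion and
        # re-raises any download error here.
        list(pool.map(download_one, urls))
```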
I say, I am ready for testing, thinking and proposals on this if there is interest.
Let's restrict this discussion to running multiple youtube-dl instances in parallel, OK? In-youtube-dl parallelism is a whole different idea. And I don't see any significant downside to simply locking the .part files.
Looking around, it seems there's no easy and portable way to lock a file out of the box. I've seen multiple solutions, but they all use OS-specific calls or need additional packages. Would that be okay to use, or would it bloat the code too much?

I've also seen people use the approach of creating another file next to the .part one, something like (filename).lock. In that case the file wouldn't be truly locked, but youtube-dl could be made to check whether such a file exists; if it does, it would assume the file is already in use and skip it. It's the kind of system Firefox uses, for example: it creates a .parentlock file when a profile is in use so it can't be opened by another instance. Of course there's the question of what happens on a crash, since it risks leaving an orphan .lock file, and there's the risk of littering the filesystem with useless temporary files.

If anyone has another technique that is both portable and ships with Python as standard, that would be perfect. If not, either making OS-specific calls or using lockfile would be the best bet, I think, but I'd like to know whether that would be acceptable before trying to implement it.
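A sketch of the sidecar-file approach just described, using only the standard library; os.open with O_CREAT | O_EXCL makes the creation atomic, so only one process can win (function names are illustrative, and the stale-lock problem mentioned above remains):

```python
import os

def acquire_sidecar_lock(part_path):
    """Try to create "<part_path>.lock" atomically; None means taken."""
    lock_path = part_path + '.lock'
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError:
        return None  # another process is (or was) downloading this file
    os.write(fd, str(os.getpid()).encode('ascii'))  # helps spot stale locks
    os.close(fd)
    return lock_path

def release_sidecar_lock(lock_path):
    try:
        os.remove(lock_path)
    except OSError:
        pass  # already gone; nothing to do
```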
I'd like to hijack this thread. Since #1562, there is a class locked_file in youtube_dl/utils.py for cross-platform file locking.
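Assuming the class meant here is indeed youtube_dl.utils.locked_file (fcntl-based on POSIX, Win32 locking on Windows), usage would be roughly the following; treat this as one reading of the source, not documented API:

```python
from youtube_dl.utils import locked_file

# locked_file is used as a context manager; the lock is taken on
# __enter__ and released on __exit__.
with locked_file('downloads.txt', 'a', encoding='utf-8') as f:
    f.write('finished one video\n')  # only one process holds the lock at a time
```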
~
at the risk of giving you enough reasons to start hating me ;-) ...
~
[use case]
~
the only way I have seen to speed up downloads of many media files is to run a number of processes at the same time. So I have a script that gets a lot of youtube URIs, which I then sort, split into smaller files and reorder (one process works through the list from a-z, the other from z-a ...) in order to make sure that if a feed is partially downloaded, chances are the other process will pick it up and finish the download
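For what it's worth, the two worker lists could be produced with something like this (a sketch, not the author's actual script; the file names are made up):

```python
# Read, dedupe and sort the URIs, then emit one list a-z and one z-a.
with open('uris.txt') as f:
    uris = sorted({line.strip() for line in f if line.strip()})

with open('worker_a.txt', 'w') as f:
    f.write('\n'.join(uris) + '\n')            # worked through a-z
with open('worker_b.txt', 'w') as f:
    f.write('\n'.join(reversed(uris)) + '\n')  # worked through z-a
```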
~
[/use case]
~
Now, if a process is downloading a file (so it temporarily suffixes it with .part) and another process gets a chance to download the same file, it will 'think' (sorry for the anthropomorphizing) that the file was left partially downloaded by some previous process (and not that it is being downloaded by a currently running one)
~
Short of checking with the OS to see whether any process is currently writing to that file, I think a delay of, say, 5 seconds without any increase in data would be a good bet that the file is not currently being worked on; otherwise you "say something" and keep going
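That heuristic, as a sketch; note it is inherently racy, since a stalled-but-alive download looks the same as an abandoned one, which is presumably why the lock-based answers above prefer locking:

```python
import os
import time

def looks_abandoned(part_path, wait=5):
    """True if the .part file did not grow during `wait` seconds."""
    before = os.path.getsize(part_path)
    time.sleep(wait)
    return os.path.getsize(part_path) == before
```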
~
thanks
lbrtchx