Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve argument handling for aria2c #8332

Merged
merged 1 commit into from Nov 7, 2023
Merged

Conversation

CrendKing
Copy link
Contributor

@CrendKing CrendKing commented Oct 12, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Currently the argument handling of aria2c has several issues:

  1. It uses --no-conf, thus ignoring user's aria2.conf config file. There is no inherit reason yt-dlp requires this to work, because if an option is specified both in config file and in argument, the argument (which yt-dlp uses) takes priority.

  2. It forces --file-allocation=none, twice, while the aria2c offers subjectively better choices. I suggest we don't force if it's not necessary.

  3. It hard codes concurrent download segments/splits to 16, instead of respecting yt-dlp's own --concurrent-fragments. In the commit introducing these hardcodes, the author didn't mention any reason why he put in such numbers.

Note that the current --concurrent-fragments default value is 1. Therefore, after this change, user might observe huge slowdown for site supporting multiple concurrent connections from single IP. It could also give user tool to fix cases when a site would ban IP that makes multi-connections.

  1. The existing respected non-boolean options are missing the '=' separator, making them being ignored by aria2c altogether. For example, cmd += self._option('--max-overall-download-limit', 'ratelimit') generates something like --max-overall-download-limit 1M, while aria2c requires --max-overall-download-limit=1M.
Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 94480ae

Summary

🛠️🚀🆕

Improved aria2c integration and option handling in yt_dlp/downloader/external.py. Added concurrency and segmentation options for aria2c and a separator argument for external downloaders.

Sing, O Muse, of the skillful coder who refined
The Aria2cFD class, a swift and powerful tool
To download from the web with options well-defined
And avoid the errors of the ignorant and the fool.

Walkthrough

  • Modify _option method of ExternalFD class to accept separator argument for different command option formats (link)
  • Simplify and improve _make_cmd method of Aria2cFD class by removing redundant, unnecessary, or conflicting options and adding options for concurrent downloads, connections, and segments (link, link, link)

@Grub4K
Copy link
Member

Grub4K commented Oct 12, 2023

  1. It uses --no-conf, thus ignoring user's aria2.conf config file

This is by design. Downloaders are suppose to be neutral, if you want to use your predefined config you should be able to use --no-conf=false through --downloader-args.

  1. It forces --file-allocation=none, twice, while the aria2c offers subjectively better choices. I suggest we don't force if it's not necessary.

This is also by design. This keeps it similar to other downloaders behavior, and pre-allocation options are better explored on a case by case basis. Removing the duplication would make sense though.

  1. It hard codes concurrent download segments/splits to 16, instead of respecting yt-dlp's own --concurrent-fragments. In the commit introducing these hardcodes, the author didn't mention any reason why he put in such numbers.

Agreed, but tying it to --concurrent-fragments is a bit problematic imo, for the same reasons you point out. Currently you have to use --downloader-args which feels clumsy.

Maybe we can think of something better.

  1. The existing respected non-boolean options are missing the '=' separator, making them being ignored by aria2c altogether. For example, cmd += self._option('--max-overall-download-limit', 'ratelimit') generates something like --max-overall-download-limit 1M, while aria2c requires --max-overall-download-limit=1M.

Contrary to what you are saying aria2c --max-overall-download-limit 1M https://speed.hetzner.de/100MB.bin works. I think the docs are showing the options that way as a stylistic choice only.

@CrendKing
Copy link
Contributor Author

This is by design. Downloaders are suppose to be neutral, if you want to use your predefined config you should be able to use --no-conf=false through --downloader-args.

You are correct we can. It just I feel it's counterintuitive, because who could figure out to do that without reading yt-dlp source code and finding your comment? I guess you could argue that whoever uses a aria2.conf must be geeky enough to do that.

This is also by design. This keeps it similar to other downloaders behavior, and pre-allocation options are better explored on a case by case basis. Removing the duplication would make sense though.

Fair enough. Consistency is important.

Maybe we can think of something better.

Please help. I know we can, again, ask users to pass those 3 arguments through --downloader-args, but the same issue (need to read source code) exists. Maybe at least reduce the default from 16 to 4 or 5, since aria2c's default value is 5? 16 puts too much pressure on both the server and HDD.

@bashonly
Copy link
Member

bashonly commented Oct 12, 2023

It's not necessary to read the source code. --downloader-args is documented in the README. aria2c's options are documented in its manual. (And -N is documented as working with native dash and hls fragment downloads)

@CrendKing
Copy link
Contributor Author

CrendKing commented Oct 12, 2023

What I meant is that without reading the code, a naive user would not easily notice that their aria2 config file is not applied, or how many concurrent downloads are used, all of which are very specific to this aria2c external downloader. The reasoning "Downloaders are suppose to be neutral, thus should not use user config" is fine, but it's no where to be found in yt-dlp's README. Many yt-dlp users are not developers. It's no easy task to deduce the downloader's "misbehavior".

Of course, if somehow there is a section in the README that lists such design philosophies of the downloaders, and provide workarounds with detailed explanations, all would be good. Since there is not yet, I think maybe it's better to both utilizing existing yt-dlp options, and make better unconfigurable defaults.

@Grub4K
Copy link
Member

Grub4K commented Oct 12, 2023

Of course, if somehow there is a section in the README that lists such design philosophies of the downloaders, and provide workarounds with detailed explanations, all would be good. Since there is not yet...

It would be very welcome if it could be created by you then

@CrendKing
Copy link
Contributor Author

Oh, I know I do not have the expertise on yt-dlp downloaders to do that. I don't even know how half the downloaders work (such as axel, httpie). And this is the first time I look at yt-dlp's source. I don't know why you guys design like this.

@Grub4K
Copy link
Member

Grub4K commented Oct 13, 2023

I don't know or use them either, but it would be good to have you work on the aria2c part of the downloader info section.

@CrendKing
Copy link
Contributor Author

CrendKing commented Oct 20, 2023

I reverted most of the changes and only kept the remove-duplicate-argument part. It is bad because it's added AFTER self._configuration_args(), so even user wants to override it with --downloader-args they can't.

Hope the PR is more agreeable now.

@bashonly bashonly self-requested a review October 20, 2023 12:18
Copy link
Member

@Grub4K Grub4K left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only want to deal with the duplicate argument here this seems fine by me. We can track the README improvements in an issue if you feel like it should be addressed in the future

@CrendKing
Copy link
Contributor Author

Sure. Let's do that in a separate issue and I'll try to improve it.

@Grub4K Grub4K self-assigned this Oct 27, 2023
@Grub4K Grub4K merged commit 21b2528 into yt-dlp:master Nov 7, 2023
16 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants