Add fetchJobs option to parallelize submodule updates #323
base: main
Can confirm. In a project with 65 submodules, the checkout execution time drops from 1m30s to 48s with 5 concurrent jobs (on the first try, at least).
@@ -56,6 +56,9 @@ inputs:
  fetch-depth:
    description: 'Number of commits to fetch. 0 indicates all history for all branches and tags.'
    default: 1
  fetch-jobs:
Any reason to keep the default at 0?
Are there any downsides to running this in parallel by default?
Not that I'm aware of. I just wanted to have the option, and thought it was prudent not to create new defaults overriding normal git behavior. However, considering that the default fetch-depth is already non-standard, there's probably no such expectation anyway.
If parallel jobs were to be the default, that raises the question of what the default should be. Having checked the git source code, I can see that there is actually a 0 jobs option, which implies that it will run as many jobs as there are processors (referred to as "some reasonable default" in the documentation). To my eye, this seems a bit excessive. At any rate, the existence of that slightly hidden option means that the fetch-jobs parameter should probably use -1 instead of 0 to omit the jobs argument.
I've tested a bit with --jobs=0, and this probably isn't a good option for GitHub (but for the opposite reason I was worried about), since the runners are specced with only 2 cores. Since git is bottlenecked waiting for IO, you would want more jobs than that.
I think the ideal number is going to depend on a lot of factors, so it's probably better if this remains a manually enabled optimization option.
The default is now fetch-jobs: -1, which is the new value for reverting to normal git behavior (provide no --jobs argument).
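A minimal sketch of how the three cases described in this thread (-1, 0, and a positive count) could map onto git arguments. The function name `submoduleUpdateArgs` and the exact argument list are assumptions for illustration, not code taken from this PR:

```typescript
// Hypothetical helper, not taken from the PR: maps a fetch-jobs value
// onto the arguments for `git submodule update`.
function submoduleUpdateArgs(fetchJobs: number, recursive: boolean): string[] {
  const args = ['submodule', 'update', '--init', '--force']
  if (recursive) {
    args.push('--recursive')
  }
  // -1 means "provide no --jobs argument", so git keeps its normal behavior.
  // 0 and positive values are forwarded to git unchanged.
  if (fetchJobs >= 0) {
    args.push(`--jobs=${fetchJobs}`)
  }
  return args
}

// Default: no --jobs argument at all.
console.log(submoduleUpdateArgs(-1, false).join(' '))
// Explicit opt-in to four parallel jobs.
console.log(submoduleUpdateArgs(4, true).join(' '))
```

Keeping -1 as the "omit" value leaves 0 free to carry whatever meaning git itself assigns to it.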
Is there anything preventing this from being merged? We have a large number of submodules in our project, and parallel fetching would significantly reduce the time the checkout takes.
Would it help if I'd redo the PR from scratch?
Hi, I'm still around. Not sure why there's been no movement on this PR. If there are any concerns, I'm willing to try to address them.
@Felixoid I'm not using github actions anymore, sorry!
I am not someone who can push this PR either.
Dear @ericsciple and @thboop, I see you've merged PRs recently. Can you review it, please?
Dear @TingluoHuang and @ethomson, can you, maybe, review this extremely useful PR?
@johnsudol Would you mind taking a look at this PR?
Hi @trylle, thanks for the PR! I have one small comment.
Based on the docs you linked, the fetch-jobs option will only work as an optimization when the checkout action is used on a repo that includes submodules, right?
If that is the case, we only need to get the input for fetch-jobs if result.submodules is enabled.
Aside from the above, looks good to me!
src/input-helper.ts
Outdated
@@ -89,6 +89,13 @@ export async function getInputs(): Promise<IGitSourceSettings> {
  }
  core.debug(`fetch depth = ${result.fetchDepth}`)

  // Fetch jobs
I think this logic should be moved below line 114; then we can do something like:
if (result.submodules) {
  result.fetchJobs = Math.floor(Number(core.getInput('fetch-jobs') || '-1'))
  if (isNaN(result.fetchJobs) || result.fetchJobs < -1) {
    // etc...
  }
}
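For reference, the suggestion above can be sketched as a standalone function. This is an illustration under stated assumptions, not the code that was merged: `parseFetchJobs` is a hypothetical name, the `raw` parameter stands in for `core.getInput('fetch-jobs')`, and throwing on invalid input is a guess at what the elided branch does.

```typescript
// Hypothetical helper: `raw` stands in for core.getInput('fetch-jobs').
function parseFetchJobs(raw: string, submodules: boolean): number {
  // Without submodules the option has no effect, so the input is not read.
  if (!submodules) {
    return -1
  }
  // An empty input falls back to '-1', meaning "omit the --jobs argument".
  const fetchJobs = Math.floor(Number(raw || '-1'))
  if (isNaN(fetchJobs) || fetchJobs < -1) {
    // Assumption: invalid values are rejected. The review comment leaves
    // this branch unspecified.
    throw new Error(`Invalid fetch-jobs value: '${raw}'`)
  }
  return fetchJobs
}

console.log(parseFetchJobs('', true)) // input not set
console.log(parseFetchJobs('4', true)) // explicit job count
console.log(parseFetchJobs('abc', false)) // ignored without submodules
```

Taking the raw string as a parameter keeps the parsing logic testable without the Actions runtime.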
Applied in d46e61f
As implemented, this is correct - it has no effect otherwise. It is possible to specify the
It seems everyone agrees on merging this PR, could we just have someone with write access take a look into approving / merging this? Ping @johnsudol. @cory-miller, I also see you have merged PRs into this repo recently, would you mind taking a look?
This implements an option to configure the git submodule update --jobs parameter.
From my testing, on a checkout with many submodules (e.g. boost) that normally takes 4.5 minutes, using this PR and declaring fetch-jobs: 4 in the checkout step shaves off at least a minute on the checkout.