Add fetchJobs option to parallelize submodule updates #323

Open
wants to merge 5 commits into
base: main
from

Conversation

@trylle trylle commented Aug 9, 2020

This implements an option to configure the git submodule update --jobs parameter.

From my testing, on a checkout with many submodules (e.g. boost) that normally takes 4.5 minutes, using this PR and declaring fetch-jobs: 4 in the checkout step shaves off at least a minute on the checkout.

@quisse quisse left a comment

Can confirm. In a project with 65 submodules, the checkout execution time drops from 1m30s to 48s with 5 concurrent jobs (on the first try, at least).

@@ -56,6 +56,9 @@ inputs:
  fetch-depth:
    description: 'Number of commits to fetch. 0 indicates all history for all branches and tags.'
    default: 1
  fetch-jobs:

Any reason to keep the default to 0?

Any downsides to running this in parallel by default?

Author

Not that I'm aware. I just wanted to have the option, and thought it was prudent not to create new defaults overriding normal git behavior. However, considering that the default fetch-depth is already non-standard, there's probably no such expectation anyway.

If parallel jobs were to be the default, then that raises the question as to what this default should be. Having checked the git source code, I can see that there is actually a 0 jobs option which implies that it will run as many jobs as there are processors (referred to as some reasonable default in the documentation). To my eye, this seems a bit excessive. At any rate, the existence of that slightly hidden option means that the fetch-jobs parameter should probably use -1 instead of 0 to omit the jobs argument.

Author

I've tested a bit with --jobs=0, and this probably isn't a good option for GitHub (but for the opposite reason I was worried about), since the runners are specced with only 2 cores. Since git is bottlenecked waiting for IO, you would want more jobs than that.

I think the ideal number is going to depend on a lot of factors, so it's probably better if this remains a manually enabled optimization option.
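
To illustrate why more jobs than cores can still help when the work is mostly waiting on IO, here is a rough TypeScript sketch of a scheduling model. The function name `estimateWallTime` and the durations are hypothetical and are not measurements from this PR. The model also ignores bandwidth limits and server-side throttling, which is part of why the ideal number depends on many factors.

```typescript
// Hypothetical model: each submodule fetch is an IO-bound task with a known
// duration, and up to `jobs` tasks may be in flight at once.
// Returns the total wall time under a simple greedy schedule.
function estimateWallTime(durations: number[], jobs: number): number {
  if (!Number.isInteger(jobs) || jobs < 1) {
    throw new Error('jobs must be a positive integer')
  }
  // Each slot records the time at which it next becomes free.
  const slots: number[] = new Array(jobs).fill(0)
  for (const d of durations) {
    // Hand the next task to the slot that frees up first.
    let earliest = 0
    for (let i = 1; i < slots.length; i++) {
      if (slots[i] < slots[earliest]) earliest = i
    }
    slots[earliest] += d
  }
  return Math.max(...slots)
}

// Illustrative numbers only: 8 fetches that each take 3 seconds.
const sampleDurations: number[] = new Array(8).fill(3)
console.log(estimateWallTime(sampleDurations, 1)) // 24
console.log(estimateWallTime(sampleDurations, 2)) // 12
console.log(estimateWallTime(sampleDurations, 4)) // 6
```

In this idealized model the wall time keeps shrinking as jobs are added, regardless of core count, because a waiting fetch does not occupy a CPU.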

The default is now fetch-jobs: -1, which is the new value for
reverting to normal git behavior (provide no --jobs argument).
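
The sentinel values discussed above could map onto the git command line roughly as follows. This is a minimal sketch, not the code from this PR: the helper name `submoduleUpdateArgs` and the choice of `--init` as the only other flag are assumptions made for illustration.

```typescript
// Hypothetical helper: the name and the surrounding flags are illustrative,
// not taken from the PR.
function submoduleUpdateArgs(fetchJobs: number): string[] {
  const args = ['submodule', 'update', '--init']
  if (fetchJobs >= 0) {
    // 0 lets git pick its own default; a positive value sets the number
    // of parallel jobs explicitly.
    args.push(`--jobs=${fetchJobs}`)
  }
  // fetchJobs === -1: no --jobs argument, so git behaves as it normally would.
  return args
}

console.log(submoduleUpdateArgs(-1).join(' ')) // submodule update --init
console.log(submoduleUpdateArgs(4).join(' ')) // submodule update --init --jobs=4
```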
@Felixoid

Is there anything preventing this from being merged? We have a large number of submodules in our project, and parallel fetching would reduce the checkout time significantly.

@Felixoid

Would it help if I'd redo the PR from scratch?

@trylle
Author

trylle commented Mar 22, 2022

Hi, I'm still around. Not sure why there's been no movement on this PR. If there are any concerns, I'm willing to try to address them.

@Felixoid

Dear @quisse and @staabm, can you kindly take a look?

@quisse

quisse commented Mar 23, 2022

@Felixoid I'm not using github actions anymore, sorry!

@staabm

staabm commented Mar 23, 2022

I am not someone who can push this PR either.

@Felixoid

Dear @ericsciple and @thboop, I see you've merged PRs recently. Can you review it, please?

@Felixoid

Felixoid commented Apr 8, 2022

Dear @TingluoHuang and @ethomson, can you, maybe, review this extremely useful PR?

@Skylion007

@johnsudol Would you mind taking a look at this PR?

@johnsudol johnsudol left a comment

Hi @trylle, thanks for the PR! I have one small comment

Based on the docs you linked, the fetch-jobs option will only work as an optimization when the checkout action is used on a repo that includes submodules, right?

If that is the case, we only need to get the input for fetch-jobs if result.submodules is enabled.

Aside from the above, looks good to me!

@@ -89,6 +89,13 @@ export async function getInputs(): Promise<IGitSourceSettings> {
  }
  core.debug(`fetch depth = ${result.fetchDepth}`)

  // Fetch jobs

I think this logic should be moved below line 114

then we can do something like:

if (result.submodules) {
  result.fetchJobs = Math.floor(Number(core.getInput('fetch-jobs') || '-1'))
  if (isNaN(result.fetchJobs) || result.fetchJobs < -1) {
    // etc...
  }
}
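
The suggestion above can be sketched as a standalone function so the parsing rules are easy to check in isolation. This is only a sketch under stated assumptions: the name `parseFetchJobs` is hypothetical, the `raw` parameter stands in for the result of `core.getInput('fetch-jobs')`, and falling back to -1 on invalid input is an assumption, since the snippet above leaves that branch elided.

```typescript
// Sentinel meaning "pass no --jobs argument to git".
const NO_JOBS_ARGUMENT = -1

// `raw` stands in for core.getInput('fetch-jobs'), which returns an empty
// string when the input is not set. `submodules` mirrors result.submodules.
function parseFetchJobs(raw: string, submodules: boolean): number {
  if (!submodules) {
    // The option only affects submodule updates, so skip parsing entirely.
    return NO_JOBS_ARGUMENT
  }
  const parsed = Math.floor(Number(raw || '-1'))
  if (isNaN(parsed) || parsed < -1) {
    // Assumption: invalid input falls back to the sentinel rather than failing.
    return NO_JOBS_ARGUMENT
  }
  return parsed
}

console.log(parseFetchJobs('4', true)) // 4
console.log(parseFetchJobs('', true)) // -1
console.log(parseFetchJobs('4', false)) // -1
```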

Author

Applied in d46e61f

@trylle trylle requested a review from a team as a code owner December 31, 2022 10:41
@trylle
Author

trylle commented Dec 31, 2022

> Based on the docs you linked, the fetch-jobs option will only work as an optimization when the checkout action is used on a repo that includes submodules, right?

As implemented, this is correct - it has no effect otherwise. It is possible to specify the --jobs argument to git-fetch, as well, but, as I understand it, its usefulness there is limited unless fetching from (potentially) multiple remotes with submodules in one go (as opposed to the fetch-then-submodule-update behavior of the checkout action), and would require a larger change.

@Skylion007

It seems everyone agrees on merging this PR. Could someone with write access take a look at approving and merging it? Ping @johnsudol. @cory-miller, I also see you have merged PRs into this repo recently; would you mind taking a look?
