Skip to content

Github and Gitlab Authentication via http.extraHeader for cloning Repositories #4139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
May 27, 2025

Conversation

kashifkhan0771
Copy link
Contributor

@kashifkhan0771 kashifkhan0771 commented May 14, 2025

Description:

We previously included Git credentials directly in the repository URL during cloning. However, for one user, these credentials were being logged in Splunk, which posed a security concern.
To resolve this, I’ve updated the implementation to remove user credentials from the URL and instead pass them via the Authorization header using Basic Auth. This new approach is now the default behavior. For backward compatibility and in case any issues arise I’ve added a --auth-in-url flag. When this flag is used, credentials will be included in the URL as they were before.

I’ve tested this change with both GitHub and GitLab sources, with and without the --auth-in-url flag, and everything is working as expected on my end.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@kashifkhan0771 kashifkhan0771 requested review from a team as code owners May 14, 2025 08:52
@kashifkhan0771
Copy link
Contributor Author

One thing that's confusing me is this test case in the Git source. It uses a username and password for authentication when cloning repository. The test case still works with the old approach, even though GitHub removed password-based authentication back in 2021. When I updated the authentication logic to use the Authorization header, the test case no longer works and reports an error about password-based authentication being removed which is the correct behavior for the current code as well.

@kashifkhan0771
Copy link
Contributor Author

This is my first PR for the Git source, and I’m not fully familiar with its complete implementation. If I missed anything or there’s room for improvement, please feel free to point it out. Thank you!

@kashifkhan0771 kashifkhan0771 self-assigned this May 14, 2025
@rosecodym
Copy link
Collaborator

Did you confirm that this works for non-GitHub git sources? (E.g. GitLab?)

@kashifkhan0771
Copy link
Contributor Author

Did you confirm that this works for non-GitHub git sources? (E.g. GitLab?)

Initially, I tested with a private GitHub repository, but as you suggested, I also tried using a GitLab repository with an access token, and it works with both. For GitLab, it's important to ensure that the access token has the appropriate scope to read repositories. I added the testing screenshots in the OSS Ticket on JIRA.

@kashifkhan0771 kashifkhan0771 marked this pull request as draft May 16, 2025 08:01
@kashifkhan0771
Copy link
Contributor Author

I'm currently discussing this change with Cody. The changes looks good but we're planning to make it backward compatible to ensure it doesn't break any existing setups.

@kashifkhan0771
Copy link
Contributor Author

I'm currently discussing this change with Cody. The changes looks good but we're planning to make it backward compatible to ensure it doesn't break any existing setups.

Added flag --auth-in-url for both Github and Gitlab source.

@kashifkhan0771 kashifkhan0771 marked this pull request as ready for review May 19, 2025 08:26
@kashifkhan0771 kashifkhan0771 requested a review from a team as a code owner May 19, 2025 08:26
@kashifkhan0771 kashifkhan0771 changed the title Git Authentication via http.extraHeader Github and Gitlab Authentication via http.extraHeader for cloning Repositories May 19, 2025
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool! Unfortunately, I do have a blocking concern :(

main.go Outdated
@@ -119,6 +119,7 @@ var (
githubScanPRComments = githubScan.Flag("pr-comments", "Include pull request descriptions and comments in scan.").Bool()
githubScanGistComments = githubScan.Flag("gist-comments", "Include gist comments in scan.").Bool()
githubCommentsTimeframeDays = githubScan.Flag("comments-timeframe", "Number of days in the past to review when scanning issue, PR, and gist comments.").Uint32()
githubAuthInUrl = githubScan.Flag("auth-in-url", "Embed authentication credentials in repository URLs instead of using secure HTTP headers").Default("false").Bool()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that Default("false") is the default and doesn't need to be explicitly specified.

return CloneRepo(ctx, userInfo, gitUrl, true, args...)
}

// CloneRepoUsingTokenInHeader clones a repo using a provided token as basic auth in the header.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you create a separate function for this instead adding a new bool parameter to CloneRepoUsingToken?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CloneRepoUsingToken function is used in multiple places, including outside the GitHub and GitLab sources. To avoid making widespread changes especially where we would need to hardcode the authInUrl parameter as false. I created a separate function that internally calls the same CloneRepo function with authInUrl as false.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take your point, but I want to make the case that expanding the git module's surface area by adding another function like this has its own maintainability costs, which I think are greater in the long term than adding another parameter to CloneRepoUsingToken. Adding the parameter is a one-time cost of modifying ten call sites (three of which you're already touching); adding a new function is an ongoing cost to all maintainers because now we have to keep track of the differences between the two, and it is near-inevitable that some future maintainer will miss the existence of one of them and consequentially make a mistake.

I don't think hard-coding authInUrl to be false is a bad thing; you're effectively doing that when you use two separate functions, except it's not obvious at each call site that it's what you're doing. (The hardcoding is still present - it's just further down the call stack.) And if you're worried about enterprise (which I appreciate!) don't be - it only calls CloneRepoUsingToken in one spot :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the CloneRepoUsingTokenInHeader function and now we are using CloneRepoUsingToken with authInUrl parameter.

@@ -226,6 +228,9 @@ func (s *Source) Init(aCtx context.Context, name string, jobID sources.JobID, so
}
s.conn = &conn

// set useAuthInUrl before new connector
s.useAuthInUrl = s.conn.AuthInUrl
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed what the default OSS CLI behavior should be, but we haven't yet discussed what the enterprise behavior should be - which is, unfortunately, the opposite. Enterprise, until we change it, won't be writing s.conn.AuthInUrl, so it will retain its default value of false, meaning it will use your new functionality. Unfortunately, in enterprise, we typically want to always maintain the status quo until we explicitly change things, so the default value should be the current behavior, which is to keep the auth information in the URL.

You can resolve this by changing the protobuf field to something like remove_auth_from_url, so that the default value of false maintains the existing behavior. The OSS CLI would then set this field to be the negation of the CLI flag. (This will also apply to GitLab.)

I'm happy to discuss this synchronously on Slack if that's easier!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✅

@kashifkhan0771 kashifkhan0771 requested a review from rosecodym May 20, 2025 15:02
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've hardcoded authInUrl: false for a few sources that you didn't make configurable on the command line. That's fine, but we should retain the legacy behavior of authInUrl: true in those cases. I've marked them inline.

@@ -28,7 +28,7 @@ func (e *Engine) ScanGitHub(ctx context.Context, c sources.GithubConfig) (source
IncludeWikis: c.IncludeWikis,
SkipBinaries: c.SkipBinaries,
CommentsTimeframeDays: c.CommentsTimeframeDays,
AuthInUrl: c.AuthInUrl,
RemoveAuthInUrl: !c.AuthInUrl, // Enterprise uses the opposite field in proto to keep credentials in the URL by default.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be AuthInHeader - some people prefer "positive" boolean flags. This isn't a blocking comment, though!

@@ -276,7 +279,7 @@ func (s *Source) scanRepo(ctx context.Context, repoURI string, reporter sources.
cloneFunc = func() (string, *git.Repository, error) {
user := cred.BasicAuth.Username
token := cred.BasicAuth.Password
return CloneRepoUsingToken(ctx, token, repoURI, user)
return CloneRepoUsingToken(ctx, token, repoURI, user, false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't add any new flag configuration for the git source, so we should retain the legacy behavior, which is to put the auth in the URL. (Right?)

@@ -1180,7 +1204,7 @@ func PrepareRepo(ctx context.Context, uriString string) (string, bool, error) {
if !ok {
return "", remote, fmt.Errorf("password must be included in Git repo URL when username is provided")
}
path, _, err = CloneRepoUsingToken(ctx, password, remotePath, uri.User.Username())
path, _, err = CloneRepoUsingToken(ctx, password, remotePath, uri.User.Username(), false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PrepareRepo is used only by the git source, so should also retain the legacy behavior of putting the auth in the URL.

@@ -595,7 +595,7 @@ func (s *Source) EnumerateAndScanAllObjects(ctx context.Context, chunksChan chan
}

// download the repo
path, repo, err := git.CloneRepoUsingToken(ctx, ghToken, repoURL, ghUser)
path, repo, err := git.CloneRepoUsingToken(ctx, ghToken, repoURL, ghUser, false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This source also doesn't have the setting flag-controlled, so should retain the legacy behavior.

@@ -61,7 +61,7 @@ func (s *Source) cloneRepo(
return "", nil, err
}
case *sourcespb.Huggingface_Token:
path, repo, err = git.CloneRepoUsingToken(ctx, s.huggingfaceToken, repoURL, "")
path, repo, err = git.CloneRepoUsingToken(ctx, s.huggingfaceToken, repoURL, "", false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unconfigurable source should retain the legacy behavior.

@kashifkhan0771
Copy link
Contributor Author

You've hardcoded authInUrl: false for a few sources that you didn't make configurable on the command line. That's fine, but we should retain the legacy behavior of authInUrl: true in those cases. I've marked them inline.

Sorry, That was a mistake :) - I'll change those back to true to retain legacy behavior

Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for diving into this!

@kashifkhan0771 kashifkhan0771 merged commit 92e9157 into trufflesecurity:main May 27, 2025
13 checks passed
@kashifkhan0771 kashifkhan0771 deleted the fix/oss-188 branch May 27, 2025 06:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants