-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Github and Gitlab Authentication via http.extraHeader
for cloning Repositories
#4139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Github and Gitlab Authentication via http.extraHeader
for cloning Repositories
#4139
Conversation
One thing that's confusing me is this test case in the Git source. It uses a username and password for authentication when cloning repository. The test case still works with the old approach, even though GitHub removed password-based authentication back in 2021. When I updated the authentication logic to use the Authorization header, the test case no longer works and reports an error about password-based authentication being removed which is the correct behavior for the current code as well. |
This is my first PR for the Git source, and I’m not fully familiar with its complete implementation. If I missed anything or there’s room for improvement, please feel free to point it out. Thank you! |
Did you confirm that this works for non-GitHub |
Initially, I tested with a private GitHub repository, but as you suggested, I also tried using a GitLab repository with an access token, and it works with both. For GitLab, it's important to ensure that the access token has the appropriate scope to read repositories. I added the testing screenshots in the OSS Ticket on JIRA. |
82612f9
to
28a32f2
Compare
I'm currently discussing this change with Cody. The changes looks good but we're planning to make it backward compatible to ensure it doesn't break any existing setups. |
Added flag |
http.extraHeader
http.extraHeader
for cloning Repositories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cool! Unfortunately, I do have a blocking concern :(
main.go
Outdated
@@ -119,6 +119,7 @@ var ( | |||
githubScanPRComments = githubScan.Flag("pr-comments", "Include pull request descriptions and comments in scan.").Bool() | |||
githubScanGistComments = githubScan.Flag("gist-comments", "Include gist comments in scan.").Bool() | |||
githubCommentsTimeframeDays = githubScan.Flag("comments-timeframe", "Number of days in the past to review when scanning issue, PR, and gist comments.").Uint32() | |||
githubAuthInUrl = githubScan.Flag("auth-in-url", "Embed authentication credentials in repository URLs instead of using secure HTTP headers").Default("false").Bool() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that Default("false")
is the default and doesn't need to be explicitly specified.
pkg/sources/git/git.go
Outdated
return CloneRepo(ctx, userInfo, gitUrl, true, args...) | ||
} | ||
|
||
// CloneRepoUsingTokenInHeader clones a repo using a provided token as basic auth in the header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you create a separate function for this instead adding a new bool
parameter to CloneRepoUsingToken
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CloneRepoUsingToken
function is used in multiple places, including outside the GitHub and GitLab sources. To avoid making widespread changes especially where we would need to hardcode the authInUrl
parameter as false. I created a separate function that internally calls the same CloneRepo
function with authInUrl
as false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take your point, but I want to make the case that expanding the git
module's surface area by adding another function like this has its own maintainability costs, which I think are greater in the long term than adding another parameter to CloneRepoUsingToken
. Adding the parameter is a one-time cost of modifying ten call sites (three of which you're already touching); adding a new function is an ongoing cost to all maintainers because now we have to keep track of the differences between the two, and it is near-inevitable that some future maintainer will miss the existence of one of them and consequentially make a mistake.
I don't think hard-coding authInUrl
to be false
is a bad thing; you're effectively doing that when you use two separate functions, except it's not obvious at each call site that it's what you're doing. (The hardcoding is still present - it's just further down the call stack.) And if you're worried about enterprise (which I appreciate!) don't be - it only calls CloneRepoUsingToken
in one spot :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the CloneRepoUsingTokenInHeader
function and now we are using CloneRepoUsingToken
with authInUrl
parameter.
pkg/sources/github/github.go
Outdated
@@ -226,6 +228,9 @@ func (s *Source) Init(aCtx context.Context, name string, jobID sources.JobID, so | |||
} | |||
s.conn = &conn | |||
|
|||
// set useAuthInUrl before new connector | |||
s.useAuthInUrl = s.conn.AuthInUrl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed what the default OSS CLI behavior should be, but we haven't yet discussed what the enterprise behavior should be - which is, unfortunately, the opposite. Enterprise, until we change it, won't be writing s.conn.AuthInUrl
, so it will retain its default value of false
, meaning it will use your new functionality. Unfortunately, in enterprise, we typically want to always maintain the status quo until we explicitly change things, so the default value should be the current behavior, which is to keep the auth information in the URL.
You can resolve this by changing the protobuf field to something like remove_auth_from_url
, so that the default value of false
maintains the existing behavior. The OSS CLI would then set this field to be the negation of the CLI flag. (This will also apply to GitLab.)
I'm happy to discuss this synchronously on Slack if that's easier!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ✅
c472925
to
a001c15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've hardcoded authInUrl: false
for a few sources that you didn't make configurable on the command line. That's fine, but we should retain the legacy behavior of authInUrl: true
in those cases. I've marked them inline.
pkg/engine/github.go
Outdated
@@ -28,7 +28,7 @@ func (e *Engine) ScanGitHub(ctx context.Context, c sources.GithubConfig) (source | |||
IncludeWikis: c.IncludeWikis, | |||
SkipBinaries: c.SkipBinaries, | |||
CommentsTimeframeDays: c.CommentsTimeframeDays, | |||
AuthInUrl: c.AuthInUrl, | |||
RemoveAuthInUrl: !c.AuthInUrl, // Enterprise uses the opposite field in proto to keep credentials in the URL by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be AuthInHeader
- some people prefer "positive" boolean flags. This isn't a blocking comment, though!
pkg/sources/git/git.go
Outdated
@@ -276,7 +279,7 @@ func (s *Source) scanRepo(ctx context.Context, repoURI string, reporter sources. | |||
cloneFunc = func() (string, *git.Repository, error) { | |||
user := cred.BasicAuth.Username | |||
token := cred.BasicAuth.Password | |||
return CloneRepoUsingToken(ctx, token, repoURI, user) | |||
return CloneRepoUsingToken(ctx, token, repoURI, user, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't add any new flag configuration for the git
source, so we should retain the legacy behavior, which is to put the auth in the URL. (Right?)
pkg/sources/git/git.go
Outdated
@@ -1180,7 +1204,7 @@ func PrepareRepo(ctx context.Context, uriString string) (string, bool, error) { | |||
if !ok { | |||
return "", remote, fmt.Errorf("password must be included in Git repo URL when username is provided") | |||
} | |||
path, _, err = CloneRepoUsingToken(ctx, password, remotePath, uri.User.Username()) | |||
path, _, err = CloneRepoUsingToken(ctx, password, remotePath, uri.User.Username(), false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PrepareRepo
is used only by the git
source, so should also retain the legacy behavior of putting the auth in the URL.
@@ -595,7 +595,7 @@ func (s *Source) EnumerateAndScanAllObjects(ctx context.Context, chunksChan chan | |||
} | |||
|
|||
// download the repo | |||
path, repo, err := git.CloneRepoUsingToken(ctx, ghToken, repoURL, ghUser) | |||
path, repo, err := git.CloneRepoUsingToken(ctx, ghToken, repoURL, ghUser, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This source also doesn't have the setting flag-controlled, so should retain the legacy behavior.
pkg/sources/huggingface/repo.go
Outdated
@@ -61,7 +61,7 @@ func (s *Source) cloneRepo( | |||
return "", nil, err | |||
} | |||
case *sourcespb.Huggingface_Token: | |||
path, repo, err = git.CloneRepoUsingToken(ctx, s.huggingfaceToken, repoURL, "") | |||
path, repo, err = git.CloneRepoUsingToken(ctx, s.huggingfaceToken, repoURL, "", false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unconfigurable source should retain the legacy behavior.
Sorry, That was a mistake :) - I'll change those back to true to retain legacy behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for diving into this!
933ad17
to
afa543a
Compare
Description:
We previously included Git credentials directly in the repository URL during cloning. However, for one user, these credentials were being logged in Splunk, which posed a security concern.
To resolve this, I’ve updated the implementation to remove user credentials from the URL and instead pass them via the Authorization header using Basic Auth. This new approach is now the default behavior. For backward compatibility and in case any issues arise I’ve added a
--auth-in-url
flag. When this flag is used, credentials will be included in the URL as they were before.I’ve tested this change with both GitHub and GitLab sources, with and without the
--auth-in-url
flag, and everything is working as expected on my end.Checklist:
make test-community
)?make lint
this requires golangci-lint)?