Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter scans by GitHub repository visibility #1777

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

0x736E
Copy link

@0x736E 0x736E commented Sep 14, 2023

Description:

Implementation of Filter scans to only public repositories #1483.

Adds --visibility commandline argument which enables filtering scanning based on GitHub repository visibility status by providing a comma separated list. Acceptable visibility configuration includes "public", "private" and "shared" . For example, to only scan by repositories which are either private or shared: --visibility="private,shared"

Default behaviour is not changed as no filtering is applied in the absense of user defined visibility filter configuration.

Internally, a new ScanOptions object is created to represent GitHub specific scan options.

No new tests were created because there appears to be an absence of unit tests for credential verification, and the manner in which this feature works requires checking the user's credentials against a repository.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@0x736E 0x736E requested review from a team as code owners September 14, 2023 22:36
@zricethezav zricethezav added pkg/engine PRs and Issues related to the `engine` package pkg/sources PRs and Issues related to the `sources` package labels Sep 20, 2023
@zricethezav
Copy link
Collaborator

Thanks for opening this PR @0x736E, would you mind fixing the merge conflict? We'll get a review on this soon.

@zricethezav
Copy link
Collaborator

Overall, this PR LGTM and I think this is a useful feature.

@ahrav @0x1 @mcastorina would any of you care to give this a review too?

@0x736E
Copy link
Author

0x736E commented Sep 21, 2023

Merged with main and removed whitespace

@@ -87,6 +87,7 @@ var (
githubScanIssueComments = githubScan.Flag("issue-comments", "Include issue comments in scan.").Bool()
githubScanPRComments = githubScan.Flag("pr-comments", "Include pull request comments in scan.").Bool()
githubScanGistComments = githubScan.Flag("gist-comments", "Include gist comments in scan.").Bool()
githubVisibility = githubScan.Flag("visibility", `Repositories to include based on their visibility state. Example: "public", "private", "shared", "unknown"`).String()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: If the only valid values are the ones listed in the help text, we could use Enum() instead of String().

Suggested change
githubVisibility = githubScan.Flag("visibility", `Repositories to include based on their visibility state. Example: "public", "private", "shared", "unknown"`).String()
githubVisibility = githubScan.Flag("visibility", `Repositories to include based on their visibility state. Example: "public", "private", "shared", "unknown"`).Enum("public", "private", "shared", "unknown")

// AND the visibility of this repo is not in the allowed list
if len(s.scanOptions.Visibility) > 0 {
vis := s.visibilityOf(ctx, repoURL)
if !slices.Contains(s.scanOptions.Visibility, vis) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Converting the slice to a set will have better performance as the number of repos scanned increases.

visibilitySet := make(map[source_metadatapb.Visibility]struct{}, len(s.scanOptions.Visibility))
for _, vis := range s.scanOptions.Visibility {
    visibilitySet[vis] = struct{}{}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where using a PR as an excuse to learn a new language meets with reality. Thanks for the suggestion, I'll look into it and make the changes.

if len(s.scanOptions.Visibility) > 0 {
vis := s.visibilityOf(ctx, repoURL)
if !slices.Contains(s.scanOptions.Visibility, vis) {
s.log.V(2).Info("Skipping repository due to visibility requirements " + repoURL + " (" + vis.String() + ")")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (optional): Logs are more useful when the message is static

Suggested change
s.log.V(2).Info("Skipping repository due to visibility requirements " + repoURL + " (" + vis.String() + ")")
s.log.V(2).Info("Skipping repository due to visibility requirements", "repo", repoURL, "visibility", vis.String())

func NewScanOptions(options ...ScanOption) *ScanOptions {
scanOptions := &ScanOptions{
GitScanOptions: nil,
Visibility: []source_metadatapb.Visibility{},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (optional): nil is a valid and safe default slice value.

Comment on lines +23 to +32
func getValidVisibilityScanOption(optVis string) source_metadatapb.Visibility {
optVis = strings.TrimSpace(optVis)
switch optVis {
case "public", "private", "shared":
return source_metadatapb.Visibility(source_metadatapb.Visibility_value[optVis])

default:
return source_metadatapb.Visibility_invalid
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why allow unknown in the flag if it's going to result in an invalid visibility?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point, it shouldn't.

Unknown is more or less the result of an error in determining the visibility of a repo, for this use case it seems unwise to allow users to target only repos which cause errors.

@@ -159,6 +159,8 @@ enum Visibility {
private = 1;
shared = 2;
unknown = 3;

_invalid = -1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is something we should add to the proto. Instead, the validation should be performed by the CLI parsing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't particularly happy with this in the first instance, I will review it.

@mcastorina
Copy link
Collaborator

Overall this looks like a great addition! I'd prefer to leave the proto untouched though, which is why I requested changes.

@0x736E
Copy link
Author

0x736E commented Sep 21, 2023

I've reviewed your suggestions, I agree with them, so I'll make the appropriate changes and update this PR when ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg/engine PRs and Issues related to the `engine` package pkg/sources PRs and Issues related to the `sources` package
Development

Successfully merging this pull request may close these issues.

3 participants