Skip to content

[GitHub] Add a GraphQL client to the connector #3837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rgmz
Copy link
Contributor

@rgmz rgmz commented Jan 11, 2025

Description:

This introduces a latent GraphQL client without making any functional changes. The GraphQL API is required for certain features (e.g., #1906), and it is already being manually called in at least once place:

req, err := http.NewRequest("POST", "https://api.github.com/graphql", bytes.NewBuffer(requestBody))
if err != nil {
return nil, fmt.Errorf("failed to create request: %w", err)
}

shurcooL/githubv4 is the library recommended by the authors of google/go-github.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@rgmz rgmz requested review from a team as code owners January 11, 2025 15:10
@rgmz rgmz marked this pull request as draft January 11, 2025 15:11
@rgmz rgmz force-pushed the feat/github-graphql branch from b3fcb92 to c2bc286 Compare January 11, 2025 15:34
@rgmz rgmz marked this pull request as ready for review January 11, 2025 15:36
@rgmz rgmz force-pushed the feat/github-graphql branch 3 times, most recently from 37ec6bc to c067504 Compare January 11, 2025 18:38
Copy link
Collaborator

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty cool! It looks like these clients are all unauthenticated - out of curiosity, do you foresee them ever being authenticated?

default:
return nil, fmt.Errorf("unknown connection type")
}
}

func createAPIClient(ctx context.Context, httpClient *http.Client, apiEndpoint string) (*github.Client, error) {
ctx.Logger().WithName("github").V(2).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this logger name be added way higher than the call stack, at entry to the overall github code? I presume you didn't do that because github.com/trufflesecurity/trufflehog/v3/pkg/context doesn't support WithName, but I don't think kludging around that by adding contextual information in the "wrong" spot like this is a good solution overall because we've seen it contribute to bifurcation of our loggers (where a chunk of code is using two loggers at once, each with different contextual information), which in turn has caused concrete debugging problems.

To be clear, I'm not worried that this specific code change will cause an issue - I'm worried that in nine months somebody else is going to copy the pattern somewhere else without thinking too hard about it and create a maintainability problem. If you want to add this piece of context, I think it's safer in the long term to add it at the entry to the github code either using WithValue or with a new WithName that you add to our logging package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the current implementation doesn't make it easy to have a named logger for a package. You either have to call .WithName() repeatedly or pass a logger separately -- neither are ideal.

I think it's safer in the long term to add ... a new WithName that you add to our logging package.

I can try to tackle that in a different PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithName functionality can't be added to the logger Context. Calls to WithName are appended to the logger, meaning that a context passed between packages could become trufflehog.github.git.detector.etc.

The best approach is probably a simple helper function in the package.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is what you're going for, but a logger in a context can be modified via:

ctx = context.WithLogger(ctx, ctx.Logger().WithName("github"))

Not the most ergonomic, but imo not completely foreign (might be just me since I wrote trufflehog/pkg/context and trufflehog/pkg/log though).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is what you're going for, but a logger in a context can be modified via:

ctx = context.WithLogger(ctx, ctx.Logger().WithName("github"))

Not the most ergonomic, but imo not completely foreign (might be just me since I wrote trufflehog/pkg/context and trufflehog/pkg/log though).

I think this would still have the issue of 'unbounded' logger names:

Calls to WithName are appended to the logger, meaning that a context passed between packages could become trufflehog.github.git.detector.etc.

There's no way to unset or overwrite the logger name that I'm aware of.

@rgmz
Copy link
Contributor Author

rgmz commented Jan 14, 2025

Pretty cool! It looks like these clients are all unauthenticated - out of curiosity, do you foresee them ever being authenticated?

It should inherit authentication from the HTTP client's transport:

httpClient := common.RetryableHTTPClientTimeout(int64(httpTimeoutSeconds))
tokenSource := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: token})
httpClient.Transport = &oauth2.Transport{
Base: httpClient.Transport,
Source: tokenSource,
}

@rosecodym
Copy link
Collaborator

🤦 no idea how i missed that

@rgmz rgmz force-pushed the feat/github-graphql branch 3 times, most recently from d0bd942 to 6c7f34e Compare January 20, 2025 14:55
@rgmz rgmz force-pushed the feat/github-graphql branch 4 times, most recently from b3ede0c to bec5710 Compare February 2, 2025 23:02
@rgmz rgmz force-pushed the feat/github-graphql branch 4 times, most recently from 4fd5aa6 to 9c1c695 Compare February 17, 2025 16:24
@rgmz rgmz force-pushed the feat/github-graphql branch from 9c1c695 to fa55481 Compare March 8, 2025 18:22
@rgmz rgmz force-pushed the feat/github-graphql branch 2 times, most recently from 64467bb to 0d733d9 Compare March 23, 2025 22:08
@rgmz rgmz force-pushed the feat/github-graphql branch 2 times, most recently from 6627b2c to 7e7fcdb Compare April 6, 2025 16:37
@rgmz rgmz force-pushed the feat/github-graphql branch from 7e7fcdb to 7208c39 Compare April 12, 2025 18:04
@rgmz rgmz requested a review from rosecodym April 12, 2025 18:12
@rgmz rgmz force-pushed the feat/github-graphql branch from 7208c39 to 97aa98c Compare April 18, 2025 02:24
@rgmz
Copy link
Contributor Author

rgmz commented Apr 18, 2025

Hey @rosecodym, do you have any other concerns with this PR? If not, can it be merged?

@rgmz rgmz force-pushed the feat/github-graphql branch 2 times, most recently from b6d92ed to 21fbed3 Compare May 1, 2025 14:10
@rgmz rgmz force-pushed the feat/github-graphql branch 2 times, most recently from b8ececf to 72a02fb Compare May 20, 2025 23:08
@rgmz rgmz force-pushed the feat/github-graphql branch from 72a02fb to 21d7c41 Compare May 24, 2025 14:23
@kashifkhan0771
Copy link
Contributor

Hi @rgmz, could you please resolve the conflicts when you have a moment? We're planning an enhancement to the GitHub source, and this looks like a great starting point for that work.

@rosecodym
Copy link
Collaborator

@rgmz Sorry, I totally missed your re-review request! My sincere apologies. I'll do it later today.

if err != nil {
return nil, fmt.Errorf("could not create API client: %w", err)
}

graphqlClient, err := createGraphqlClient(ctx, httpClient, apiEndpoint)
if err != nil {
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wrap the API client creation error; we should either wrap this one as well, or wrap neither.

if err != nil {
return nil, fmt.Errorf("could not create API client: %w", err)
}

graphqlClient, err := createGraphqlClient(ctx, httpClient, apiEndpoint)
if err != nil {
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same error-wrapping comment as above


graphqlClient, err := createGraphqlClient(ctx, httpClient, apiEndpoint)
if err != nil {
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same error-wrapping comment as above

parsedURL.Path = before + "/api/graphql"
graphqlEndpoint = parsedURL.String()
}
getLogger(ctx).V(2).Info("Creating GraphQL client", "url", graphqlEndpoint)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the original reason you used a named logger, but I don't see anything in the current iteration of the PR that requires it. Am I missing something or can you replace this with normal ctx.Logger() stuff?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the original reason you used a named logger, but I don't see anything in the current iteration of the PR that requires it. Am I missing something or can you replace this with normal ctx.Logger() stuff?

Seems like good practice to use a named logger, no? Especially with #4172 coming soon, otherwise logs won't have source attribution (unless I'm missing something obvious here)

Copy link
Collaborator

@rosecodym rosecodym May 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other places we use logger names in the codebase is in a handful of detectors and at the very top of the program where we set the logger name to trufflehog. I don't think it's necessarily a bad pattern, but if we're going to introduce it I'd like to do so with a plan - having our sources log things using an inconsistent set of techniques has historically caused problems when people expect things to work one way but they work a different way. (The new usage here also suggests a caller should persist the generated logger - even though this particular PR immediately throws it away - which has also historically caused logging information to be dropped.)

The fact that this source's logging is coming from a GitHub source should already be indicated by keys that are attached to the injected context higher up the call stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants