Githubreceiver fails with ERROR 404 trying to login api.github.com #38393

Closed
bayer-alvaro opened this issue Mar 5, 2025 · 9 comments

bayer-alvaro commented Mar 5, 2025

Component(s)

receiver/github

Describe the issue you're reporting

I've tried several different configurations, switching endpoint between "https://api.github.com" and "https://api.github.com/graphql". All fail with:

error scraperhelper@v0.120.1-0.20250303102058-a9bca17f1a4c/obs_metrics.go:61 Error scraping metrics {"otelcol.component.id": "github", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics", "error": "returned error 404: {\"data\":null}"}

My config file

extensions:
    bearertokenauth/github:
        token: ${env:GH_TOKEN}

receivers:
    github:
        initial_delay: 1s
        collection_interval: 120s
        scrapers:
            scraper:
#                metrics:
#                    vcs.repository.contributor.count:
#                        enabled: true
                github_org: "myorg"
                search_query: "org:myorg topic:beat12345678"
                endpoint: "https://api.github.com"
                auth:
                    authenticator: bearertokenauth/github

exporters:
  awsemf/application:
    namespace: 'BAY_CDP_SERVICES/Otel/Application'
    region: 'eu-central-1'
    log_group_name: '/bay-cdp-custom/github'
    dimension_rollup_option: NoDimensionRollup
    resource_to_telemetry_conversion:
      enabled: true

service:
    extensions: [bearertokenauth/github]
    pipelines:
        metrics:
            receivers: [github]
            processors: []
            exporters: [awsemf/application]

@bayer-alvaro bayer-alvaro added the needs triage New item requiring triage label Mar 5, 2025
Contributor

github-actions bot commented Mar 8, 2025

Pinging code owners for receiver/github: @adrielp @andrzej-stencel @crobert-1 @TylerHelmuth. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

Contributor

adrielp commented Mar 8, 2025

@bayer-alvaro - what permissions are assigned to your authentication token? That specific message is a response returned by GitHub. Usually a 404 with null data simply means the token's permissions don't allow the search query. It's security through obscurity: the practice is to not let anyone know a repo (or rather, a resource) exists if they don't have the permissions required to view it. This is how the endpoint value is used: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/githubreceiver/internal/scraper/githubscraper/helpers.go#L126-L150. You don't need to set the API URL or anything.

Contributor

adrielp commented Mar 8, 2025

/label -needs-triage waiting-for-author

@github-actions github-actions bot added waiting for author and removed needs triage New item requiring triage labels Mar 8, 2025
@bayer-alvaro
Author

@bayer-alvaro - what permissions are assigned to your authentication token? That specific message is a response returned by GitHub. Usually a 404 with null data simply means the token's permissions don't allow the search query. It's security through obscurity: the practice is to not let anyone know a repo (or rather, a resource) exists if they don't have the permissions required to view it. This is how the endpoint value is used: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/githubreceiver/internal/scraper/githubscraper/helpers.go#L126-L150. You don't need to set the API URL or anything.

Hi @adrielp, thanks for your reply, but the token has org-level read permissions. I confirmed this by making a direct request with the same token:

curl -H "Authorization: bearer $TOKEN" "https://api.github.com/search/repositories?q=org:myorg+topic:beat12345678&per_page=3"

and I get the expected JSON response with no issues at all.

Contributor

adrielp commented Mar 10, 2025

@bayer-alvaro - In the original message, I'm not seeing any mention of login. That message is a GraphQL response saying that the query made didn't return any data on a scrape of metrics.

Login errors would show up as failed to login through GraphQL... and I don't see that in the snippet of the log presented.

Also, GraphQL and REST have different auth flows behind the scenes (on GitHub's side). Login is only through GraphQL.
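Because the curl check earlier in the thread exercises only the REST search API, a comparable check against GraphQL may be more telling. The following is a minimal sketch, not the receiver's actual query: it builds (without sending) a GraphQL repository search for the same query string. The function name `build_graphql_search` and the selected fields are illustrative assumptions; the URL is GitHub's public GraphQL endpoint.

```python
import json
import urllib.request

# Public GitHub GraphQL endpoint (github.com, not a self-managed server).
GRAPHQL_URL = "https://api.github.com/graphql"

# Illustrative query only; the receiver's real queries are not shown in this thread.
SEARCH_QUERY = """
query($q: String!) {
  search(query: $q, type: REPOSITORY, first: 3) {
    repositoryCount
    nodes { ... on Repository { nameWithOwner } }
  }
}
"""


def build_graphql_search(token: str, search: str) -> urllib.request.Request:
    """Build a POST request that runs a repository search through GraphQL."""
    body = json.dumps({"query": SEARCH_QUERY, "variables": {"q": search}})
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. If the REST call succeeds but this one comes back with null data or an error, that would point at the token's GraphQL-side access rather than at the receiver.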

The read-only repository permissions the token needs are:

  • administration
  • commit statuses
  • contents
  • custom properties
  • issues
  • metadata
  • pull requests

From there, ensure your token is given access to the repositories within your org (GitHub used to require that a request be approved to do that for orgs).

Please remove the endpoint change before retesting with the updated auth token. Because of rate limiting, I'd also encourage you to use the search query feature to target a smaller subset of repos, combined with a high collection interval.
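To make the rate-limiting trade-off concrete, here is a rough back-of-the-envelope sketch. The per-repository call count is a hypothetical placeholder (the thread doesn't say how many requests the receiver makes per repo), and the 5,000-per-hour default is GitHub's documented primary limit for token-authenticated REST requests; GraphQL is metered in points, so treat the result as an order-of-magnitude estimate only.

```python
import math


def estimated_requests_per_hour(
    repo_count: int, calls_per_repo: int, collection_interval_s: int
) -> int:
    """Rough upper bound on API calls per hour for a periodic scrape."""
    scrapes_per_hour = math.ceil(3600 / collection_interval_s)
    return repo_count * calls_per_repo * scrapes_per_hour


def fits_budget(
    repo_count: int,
    calls_per_repo: int,
    collection_interval_s: int,
    hourly_budget: int = 5000,  # assumed budget; adjust for your token type
) -> bool:
    """True if the estimate stays within the assumed hourly budget."""
    estimate = estimated_requests_per_hour(
        repo_count, calls_per_repo, collection_interval_s
    )
    return estimate <= hourly_budget
```

For example, with a hypothetical 100 repositories at 5 calls each, a 120s interval gives 30 scrapes per hour and an estimate of 15,000 calls, while a 600s interval brings it down to 3,000.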

I'm going to leave this issue open for a little bit. If you'd like a quick response and a small working session, I'm on the CNCF Slack, so feel free to reach out. This appears to be a config issue, not an issue with the receiver itself.

Author

bayer-alvaro commented Mar 11, 2025

Hi again @adrielp,

Following your advice, I am getting different results. It seems to be scraping now, and I'm facing some other issues that I'll have to figure out separately from this one.

Something I do not understand: as suggested by you, I removed endpoint.

❯ cat /etc/otelcol/config.yaml
extensions:
    bearertokenauth/github:
        token: ${env:GH_TOKEN}

receivers:
    github:
        initial_delay: 1s
        collection_interval: 120s
        scrapers:
            scraper:
              #                metrics:
              #      vcs.repository.contributor.count:
              #          enabled: true
                github_org: "myorg"
                search_query: "org:myorg topic:beat12345678"
                  #endpoint: "https://api.github.com"
                auth:
                    authenticator: bearertokenauth/github

Is it implicit? Or is that config field meant only for self-managed GitHub?

Author

bayer-alvaro commented Mar 11, 2025

The scraper now returns only this same error every time:

error getting branch count: {error 26 0 returned error 502: {"data":null,"errors":[{"message":"Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug. Please include FA61:12B6:19700D:1A3797:67D0661E when reporting this issue."}]}} {"otelcol.component.id": "github", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics"}

Note the request to include the code FA61:12B6:19700D:1A3797:67D0661E when reporting this. I'm not sure whether that message comes from GitHub or from the otelcol-contrib githubreceiver.

I guess this may be related to limiting the query, but beyond org and topic I'm not sure what else to filter on. How many repositories would be considered "too many"?

...
                github_org: "myorg"
                search_query: "org:myorg topic:beat12345678"
                  #endpoint: "https://api.github.com"
                auth:
...

@bayer-alvaro
Author

Closing, as the login issue is resolved. Thanks @adrielp

Contributor

adrielp commented Mar 12, 2025

Something I do not understand: as suggested by you, I removed endpoint.
.....
Is it implicit? Or is that config field meant only for self-managed GitHub?

Yes, it's implicit. Here's the code that builds the clients. If endpoint is set for self-managed GitHub instances then it'll rebuild the proper REST and GraphQL API endpoints in the code. This receiver, for scraping, uses both REST and GraphQL where it makes sense.

So, endpoint is only needed for a self-managed server, and it should be just the server's base URL, like https://mygithubserver.com.
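As a rough illustration of that behaviour, the sketch below derives REST and GraphQL URLs from an optional endpoint. It is written in Python for brevity and is not the receiver's code (which is Go); the function name `derive_api_urls` is hypothetical, and the `/api/v3/` and `/api/graphql` suffixes are the paths GitHub documents for self-managed servers, assumed here to match what the receiver appends.

```python
from typing import Optional, Tuple

# Defaults for github.com, used when no endpoint is configured.
DEFAULT_REST_URL = "https://api.github.com/"
DEFAULT_GRAPHQL_URL = "https://api.github.com/graphql"


def derive_api_urls(endpoint: Optional[str]) -> Tuple[str, str]:
    """Return (rest_url, graphql_url) for an optional self-managed endpoint."""
    if not endpoint:
        return DEFAULT_REST_URL, DEFAULT_GRAPHQL_URL
    base = endpoint.rstrip("/")
    # Assumption: API paths are appended to the server's base URL.
    return f"{base}/api/v3/", f"{base}/api/graphql"
```

Under these assumptions, `derive_api_urls("https://api.github.com")` would produce `https://api.github.com/api/graphql`, which is not the public GraphQL URL. That may be why setting endpoint to the github.com API address led to a 404, though the thread doesn't confirm it.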

For the 502 error, I haven't seen this error in a long time. It's a server-side error (502 Bad Gateway) on the GitHub side. Here's an issue (and another) from other communities who have similarly experienced this.

I'm working on adding exponential backoff retries for rate limiting. I might be able to figure something out here, but the GraphQL library we're using doesn't actually return the response headers. I'm really surprised to see a 502 though. This code was specifically set to 50 to address this on branches, and I haven't seen a 502 pop up since.
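For readers unfamiliar with the idea, exponential backoff can be sketched as follows. This is a generic illustration in Python, not the receiver's planned implementation; `call_with_backoff`, its parameters, and the choice of retryable status codes are all assumptions made for the example.

```python
import random
import time
from typing import Callable, Tuple

# Assumed set of transient gateway errors worth retrying.
RETRYABLE_STATUSES = {502, 503, 504}


def call_with_backoff(
    do_request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call do_request, retrying transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = do_request()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Delay doubles on each attempt (1s, 2s, 4s, ...), capped at max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Add up to 50% random jitter so concurrent clients don't retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    # All attempts returned a retryable status; hand back the last response.
    return status, body
```

Here `do_request` is any callable returning a `(status, body)` pair, so the retry policy stays independent of the HTTP or GraphQL client in use. The `sleep` parameter is injectable to keep the function easy to test.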
