
overcome the 5000 requests/hr limit #118

Closed
judell opened this issue Dec 19, 2021 · 17 comments
Labels: enhancement, stale

Comments


judell commented Dec 19, 2021

If you hit that limit you're stuck for about half an hour, and the built-in retry logic won't get you past that.

Here's what we have now.

func shouldRetryError(err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf("[WARN] Received Rate Limit Error")
		return true
	}
	return false
}

Since you're going to have to wait a while before you can continue anyway, I've tried this with success:

func shouldRetryError(err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf("[WARN] Received Rate Limit Error")
		time.Sleep(1 * time.Minute)
		return true
	}
	return false
}
@judell judell added the enhancement label Dec 19, 2021

judell commented Dec 20, 2021

Actually @bigdatasourav I've wound up here:

func shouldRetryError(err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf(`rate limit %+v`, err)
		time.Sleep(1 * time.Minute)
		return true
	} 
	log.Printf(`error %+v`, err)
	return false
}

I'm not sure there are any others that should be retryable. So far, with respect to the github_user table, the false returns are 404s, which I thought would be excluded by ShouldIgnoreError: isNotFoundError(...)?


judell commented Dec 20, 2021

@bigdatasourav:

The motivation for this, btw, was the TensorFlow script in https://github.com/turbot/steampipe-samples/tree/main/github-external-contributor-analysis. Previous iterations were able to complete within the limit, but that one needed an escape hatch. https://github.com/turbot/steampipe-plugin-github/tree/overcome-5k-calls-per-hr has a solution that worked. It also has extra logging for the github_user table, which is how I saw the 404s happening.

This is the second time I've used this strategy, the first being in a branch of the Slack plugin, https://github.com/turbot/steampipe-plugin-slack/tree/use-retry-hydrate. I'm getting the impression that for any API that imposes fixed-length waits, an approach like this may be warranted.

It has occurred to me that the plugin SDK could expose all the backoff strategies available in https://github.com/sethvargo/go-retry, but I'm not sure whether that would be helpful or advisable.
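
For example, a rough sketch of composing those strategies with go-retry (fetchWithBackoff and its call parameter are invented for illustration, not anything in the SDK; the NewExponential signature here is the one in recent go-retry versions):

import (
	"context"
	"time"

	retry "github.com/sethvargo/go-retry"
)

// fetchWithBackoff retries the supplied call with capped exponential backoff,
// purely to illustrate go-retry's composable strategies
func fetchWithBackoff(ctx context.Context, call func(context.Context) error) error {
	b := retry.NewExponential(1 * time.Second)     // start at 1s
	b = retry.WithCappedDuration(5*time.Minute, b) // cap each wait at 5 minutes
	b = retry.WithMaxRetries(10, b)                // give up after 10 retries

	return retry.Do(ctx, b, func(ctx context.Context) error {
		if err := call(ctx); err != nil {
			// wrapping marks the error as retryable; returning it bare stops retrying
			return retry.RetryableError(err)
		}
		return nil
	})
}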


judell commented Dec 21, 2021

Thanks for checking in about this, @bigdatasourav.

I'm not able to reproduce what I was seeing with respect to the 404s, sorry to have muddied the waters there.

I see what you mean about the max 10 retries. Since I've seen the rate reset window be as much as 40 minutes, a 1-minute backoff would still fail in such a case. We could back off in 5-minute increments, but ultimately we don't know whether GitHub will impose an even longer wait.

Since the error tells us exactly when the reset will occur, and I've observed that GitHub always honors that, might it make sense to parse out the reset time from the error and wait until then?
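
Something like this, perhaps (untested sketch; go-github's RateLimitError carries the reset timestamp in its Rate field, so there's no need to parse the string):

func shouldRetryError(err error) bool {
	if rateErr, ok := err.(*github.RateLimitError); ok {
		// sleep until the reset time reported in the rate limit error
		wait := time.Until(rateErr.Rate.Reset.Time)
		log.Printf(`rate limit %+v, waiting %v for reset`, err, wait)
		if wait > 0 {
			time.Sleep(wait)
		}
		return true
	}
	return false
}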


judell commented Jan 15, 2022

Update: A solution here should also account for the secondary rate limit, see: https://steampipe.slack.com/archives/C01UECB59A7/p1642267519007100?thread_ts=1639685923.062100&cid=C01UECB59A7


judell commented Jan 15, 2022

Thinking some more about this, for the core_limit and search_limit, if the signature for an ErrorPredicate included a context, then a shouldRetryError function could use it to find when the reset will occur, and wait accordingly. That would be better than parsing the error string, which also contains that info.

// sketch of idea -- assumes the plugin's QueryData ("d") is also available
// to the predicate, so it can build a client
func shouldRetryError(ctx context.Context, err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf(`rate limit %+v`, err)
		client := connect(ctx, d)
		limits, _, rlErr := client.RateLimits(ctx)
		if rlErr == nil && limits != nil {
			// use limits.Core.Reset (and limits.Search.Reset) to find when
			// the reset will occur, then sleep until that time
			time.Sleep(time.Until(limits.Core.Reset.Time))
		}
		return true
	}
	return false
}

For the secondary rate limit, since it's not reflected in that API response, it might still be necessary to look for "exceeded a secondary rate limit" in the error message, and wait a minute if that's the message.
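
If we did wire that in, it might look something like this (untested; shouldRetrySecondaryLimit is just an illustrative name, and go-github also surfaces this case as AbuseRateLimitError with an optional RetryAfter hint):

func shouldRetrySecondaryLimit(err error) bool {
	// go-github reports the secondary limit as AbuseRateLimitError,
	// sometimes with a Retry-After duration from the response headers
	if abuseErr, ok := err.(*github.AbuseRateLimitError); ok {
		wait := 1 * time.Minute
		if abuseErr.RetryAfter != nil {
			wait = *abuseErr.RetryAfter
		}
		time.Sleep(wait)
		return true
	}
	// fall back to matching on the message text
	if strings.Contains(err.Error(), "exceeded a secondary rate limit") {
		time.Sleep(1 * time.Minute)
		return true
	}
	return false
}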

Details of the GitHub plugin notwithstanding, the general question here for @kaidaguerre: how best could the plugin SDK make it possible for all plugins to interact with the retry logic in this way? And should this issue be raised in steampipe-plugin-sdk?

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Apr 21, 2022
@github-actions

This issue was closed because it has been stalled for 90 days with no activity.

@judell judell removed the stale label May 23, 2022
@judell judell reopened this Aug 24, 2022
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Sep 23, 2022
@cbruno10 cbruno10 removed the stale label Sep 23, 2022
@cbruno10

We're still looking at approaches to solve this issue, removing stale label.


gs-rickchristy commented Sep 29, 2022

Thanks, it's a bummer that it's happening. The link above was a 404, but I think I found the correct one, or something related. Is the strategy to use a materialized view as a cache, then query the view? That's not a horrible strategy and might work! Here is the link to the sample where I read about this idea:
https://github.com/turbot/steampipe-samples/tree/main/all/github-activity


judell commented Oct 3, 2022

@gs-rickchristy it's certainly possible to capture historical data in matviews (or just plain tables, actually), and to do so in batches small enough to fly under the rate-limit radar. But we would still like to obviate the need for that strategy, and are looking for ways to be more resilient to those limits.

Funnily enough, the above link is 404 because we're on the free plan for our community Slack and it now only keeps 90 days of history. So I'm using the Slack plugin to capture daily snapshots and I wish I had started doing so sooner!

If you do save stuff this way, in addition to (or instead of) saving to Steampipe tables (or matviews) directly, you might want to save CSV files, and a great place to save them is GitHub. I wrote about that method here: https://www.infoworld.com/article/3668032/visualizing-the-hacker-news-api-with-hcl-and-sql.html.


github-actions bot commented Dec 2, 2022

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Dec 2, 2022

github-actions bot commented Jan 2, 2023

This issue was closed because it has been stalled for 90 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 2, 2023
@judell judell reopened this Jan 4, 2023

cbruno10 commented Jan 4, 2023

Hey @judell , for this issue, do you have any thoughts on where we should go next? Since the reset time for the 5000 requests is often long (30/40+ minutes), I don't think it's feasible for us to add wait times in the GitHub plugin, like we do for the abuse limit (which often has a reset time of a few minutes).

@github-actions github-actions bot removed the stale label Jan 4, 2023

github-actions bot commented Mar 6, 2023

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Mar 6, 2023

github-actions bot commented Apr 6, 2023

This issue was closed because it has been stalled for 90 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 6, 2023

cbruno10 commented Apr 7, 2023

@judell We have some planned upcoming work around using GitHub's GraphQL API where possible. Our initial tests show that we can be more efficient with API calls for the tables we looked at, so we're hoping that will help with the hourly rate limit.

I'll keep this issue closed for now, as we'll create another issue to track the GraphQL updates, but if you have any other comments, please add them.
