
overcome the 5000 requests/hr limit #118

Closed
judell opened this issue Dec 19, 2021 · 17 comments
Labels: enhancement, stale

Comments


judell commented Dec 19, 2021

If you hit that limit you're stuck for about half an hour, and the built-in retry logic won't get you past that.

Here's what we have now.

func shouldRetryError(err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf("[WARN] Received Rate Limit Error")
		return true
	}
	return false
}

Since you're going to have to wait a while before you can continue anyway, I've tried this with success:

func shouldRetryError(err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf("[WARN] Received Rate Limit Error")
		time.Sleep(1 * time.Minute)
		return true
	}
	return false
}
@judell judell added the enhancement label Dec 19, 2021

judell commented Dec 20, 2021

Actually @bigdatasourav I've wound up here:

func shouldRetryError(err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf(`rate limit %+v`, err)
		time.Sleep(1 * time.Minute)
		return true
	} 
	log.Printf(`error %+v`, err)
	return false
}

I'm not sure there are any others that should be retryable. So far, with respect to the github_user table, the false returns are 404s, which I thought would be excluded by ShouldIgnoreError: isNotFoundError(...)?


judell commented Dec 20, 2021

@bigdatasourav:

The motivation for this, btw, was the TensorFlow script in https://github.com/turbot/steampipe-samples/tree/main/github-external-contributor-analysis. Previous iterations were able to complete within the limit, but that one needed an escape hatch. https://github.com/turbot/steampipe-plugin-github/tree/overcome-5k-calls-per-hr has a solution that worked. It also has extra logging for the github_user table, which is how I saw the 404s happening.

This is the second time I've used this strategy, the first being in a branch of the Slack plugin, https://github.com/turbot/steampipe-plugin-slack/tree/use-retry-hydrate. I'm getting the impression that for any API that imposes fixed-length waits, an approach like this may be warranted.

It has occurred to me that the plugin SDK could expose all the backoff strategies available in https://github.com/sethvargo/go-retry, but I'm not sure whether that would be helpful or advisable.
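
For example, a rough sketch of composing those strategies with go-retry (fetchWithBackoff and its call parameter are invented for illustration, not anything in the SDK; the NewExponential signature here is the one in recent go-retry versions):

import (
	"context"
	"time"

	retry "github.com/sethvargo/go-retry"
)

// fetchWithBackoff retries the supplied call with capped exponential backoff,
// purely to illustrate go-retry's composable strategies
func fetchWithBackoff(ctx context.Context, call func(context.Context) error) error {
	b := retry.NewExponential(1 * time.Second)     // start at 1s
	b = retry.WithCappedDuration(5*time.Minute, b) // cap each wait at 5 minutes
	b = retry.WithMaxRetries(10, b)                // give up after 10 retries

	return retry.Do(ctx, b, func(ctx context.Context) error {
		if err := call(ctx); err != nil {
			// wrapping marks the error as retryable; returning it bare stops retrying
			return retry.RetryableError(err)
		}
		return nil
	})
}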


judell commented Dec 21, 2021

Thanks for checking in about this, @bigdatasourav.

I'm not able to reproduce what I was seeing with respect to the 404s, sorry to have muddied the waters there.

I see what you mean about the max 10 retries. Since I've seen the rate reset window be as much as 40 minutes, a 1-minute backoff would still fail in such a case. We could back off in 5-minute increments, but ultimately we don't know whether GitHub will impose an even longer wait.

Since the error tells us exactly when the reset will occur, and I've observed that GitHub always honors that, might it make sense to parse out the reset time from the error and wait until then?
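
Something like this, perhaps (untested sketch; go-github's RateLimitError carries the reset timestamp in its Rate field, so there's no need to parse the string):

func shouldRetryError(err error) bool {
	if rateErr, ok := err.(*github.RateLimitError); ok {
		// sleep until the reset time reported in the rate limit error
		wait := time.Until(rateErr.Rate.Reset.Time)
		log.Printf(`rate limit %+v, waiting %v for reset`, err, wait)
		if wait > 0 {
			time.Sleep(wait)
		}
		return true
	}
	return false
}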


judell commented Jan 15, 2022

Update: A solution here should also account for the secondary rate limit, see: https://steampipe.slack.com/archives/C01UECB59A7/p1642267519007100?thread_ts=1639685923.062100&cid=C01UECB59A7


judell commented Jan 15, 2022

Thinking some more about this, for the core_limit and search_limit, if the signature for an ErrorPredicate included a context, then a shouldRetryError function could use it to find when the reset will occur, and wait accordingly. That would be better than parsing the error string, which also contains that info.

// sketch of idea -- assumes the plugin's QueryData ("d") is also available
// to the predicate, so it can build a client
func shouldRetryError(ctx context.Context, err error) bool {
	if _, ok := err.(*github.RateLimitError); ok {
		log.Printf(`rate limit %+v`, err)
		client := connect(ctx, d)
		limits, _, rlErr := client.RateLimits(ctx)
		if rlErr == nil && limits != nil {
			// use limits.Core.Reset (and limits.Search.Reset) to find when
			// the reset will occur, then sleep until that time
			time.Sleep(time.Until(limits.Core.Reset.Time))
		}
		return true
	}
	return false
}

For the secondary rate limit, since it's not reflected in that API response, it might still be necessary to look for "exceeded a secondary rate limit" in the error message, and wait a minute if that's the message.
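
If we did wire that in, it might look something like this (untested; shouldRetrySecondaryLimit is just an illustrative name, and go-github also surfaces this case as AbuseRateLimitError with an optional RetryAfter hint):

func shouldRetrySecondaryLimit(err error) bool {
	// go-github reports the secondary limit as AbuseRateLimitError,
	// sometimes with a Retry-After duration from the response headers
	if abuseErr, ok := err.(*github.AbuseRateLimitError); ok {
		wait := 1 * time.Minute
		if abuseErr.RetryAfter != nil {
			wait = *abuseErr.RetryAfter
		}
		time.Sleep(wait)
		return true
	}
	// fall back to matching on the message text
	if strings.Contains(err.Error(), "exceeded a secondary rate limit") {
		time.Sleep(1 * time.Minute)
		return true
	}
	return false
}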

Details of the GitHub plugin notwithstanding, the general question here for @kaidaguerre: how best could the plugin SDK make it possible for all plugins to interact with the retry logic in this way? And should this issue be raised in steampipe-plugin-sdk?

@github-actions

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Apr 21, 2022
@github-actions

This issue was closed because it has been stalled for 90 days with no activity.

@judell judell removed the stale label May 23, 2022
@judell judell reopened this Aug 24, 2022
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Sep 23, 2022
@cbruno10 cbruno10 removed the stale label Sep 23, 2022
@cbruno10

We're still looking at approaches to solve this issue, removing stale label.


gs-rickchristy commented Sep 29, 2022

Thanks, it's a bummer that it's happening. The link above was a 404, but I think I found the correct one, or something related. Is the strategy to use a materialized view as a cache, then query the view? That's not a horrible strategy and might work! Here is the link to the sample where I read about this idea:
https://github.com/turbot/steampipe-samples/tree/main/all/github-activity


judell commented Oct 3, 2022

@gs-rickchristy it's certainly possible to capture historical data in matviews (or just plain tables, actually), and to do so in batches small enough to fly under the rate-limit radar. But we would still like to obviate the need for that strategy, and are looking for ways to be more resilient to those limits.

Funnily enough, the above link is 404 because we're on the free plan for our community Slack and it now only keeps 90 days of history. So I'm using the Slack plugin to capture daily snapshots and I wish I had started doing so sooner!

If you do save stuff this way, in addition to (or instead of) saving to Steampipe tables (or matviews) directly, you might want to save CSV files, and a great place to save them is GitHub. I wrote about that method here: https://www.infoworld.com/article/3668032/visualizing-the-hacker-news-api-with-hcl-and-sql.html.


github-actions bot commented Dec 2, 2022

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Dec 2, 2022

github-actions bot commented Jan 2, 2023

This issue was closed because it has been stalled for 90 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 2, 2023
@judell judell reopened this Jan 4, 2023

cbruno10 commented Jan 4, 2023

Hey @judell , for this issue, do you have any thoughts on where we should go next? Since the reset time for the 5000 requests is often long (30/40+ minutes), I don't think it's feasible for us to add wait times in the GitHub plugin, like we do for the abuse limit (which often has a reset time of a few minutes).

@github-actions github-actions bot removed the stale label Jan 4, 2023

github-actions bot commented Mar 6, 2023

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the stale label Mar 6, 2023

github-actions bot commented Apr 6, 2023

This issue was closed because it has been stalled for 90 days with no activity.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 6, 2023

cbruno10 commented Apr 7, 2023

@judell We have some planned upcoming work around using GitHub's GraphQL API where possible. Our initial tests show that we can be more efficient with API calls for the tables we looked at, so we're hoping that will help with the hourly rate limit.

I'll keep this issue closed for now, as we'll create another issue to track the GraphQL updates, but if you have any other comments, please add them.
