overcome the 5000 requests/hr limit #118
Comments
Actually @bigdatasourav I've wound up here:
I'm not sure there are any others that should be retryable. So far, w/respect to the
The motivation for this, btw, was the tensorflow script in https://github.com/turbot/steampipe-samples/tree/main/github-external-contributor-analysis. Previous iterations were able to complete within the limit, but that one needed an escape hatch. https://github.com/turbot/steampipe-plugin-github/tree/overcome-5k-calls-per-hr has a solution that worked. It also has extra logging for the github_user table, which is how I saw the 404s happening. This is the second time I've used this strategy, the first being in a branch of the Slack plugin, https://github.com/turbot/steampipe-plugin-slack/tree/use-retry-hydrate. I'm getting the impression that for any API that imposes fixed-length waits, an approach like this may be warranted. It has occurred to me that the plugin SDK could expose all the backoff strategies available in https://github.com/sethvargo/go-retry, but I'm not sure if that would be helpful or advisable.
Thanks for checking in about this, @bigdatasourav. I'm not able to reproduce what I was seeing w/respect to 404s, sorry to have muddied the waters there. I see what you mean about the max 10 retries. Since I've seen the rate reset window be as much as 40 minutes, a 1-minute backoff would still fail in such a case. We could back off in 5-minute increments, but ultimately we don't know whether GitHub will impose an even longer wait. Since the error tells us exactly when the reset will occur, and I've observed that GitHub always honors that, might it make sense to parse out the reset time from the error and wait until then?
Update: A solution here should also account for the secondary rate limit, see: https://steampipe.slack.com/archives/C01UECB59A7/p1642267519007100?thread_ts=1639685923.062100&cid=C01UECB59A7
Thinking some more about this, for the
For the secondary rate limit, since it's not reflected in that API response, it might still be necessary to look for "exceeded a secondary rate limit" in the error message, and wait a minute if that's the message. Details of the GitHub plugin notwithstanding, the general question here for @kaidaguerre: how best could the plugin SDK make it possible for all plugins to interact with the retry logic in this way? And, should this issue be raised in
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been stalled for 90 days with no activity. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.
We're still looking at approaches to solve this issue, removing the stale label.
Thanks, it's a bummer that it's happening. The link above was a 404, but I think I found the correct one, or something related. Is the strategy to use a materialized view, as a cache, then query the view? That's not a horrible strategy and would maybe work! Here is the link to the sample I read about this idea from:
@gs-rickchristy it's certainly possible to capture historical data in matviews (or just plain tables actually), and to do so in batches small enough to fly under the rate-limit radar. But we would still like to obviate the need for that strategy, and are looking at ways to be more resilient to those limits. Funnily enough, the above link is 404 because we're on the free plan for our community Slack and it now only keeps 90 days of history. So I'm using the Slack plugin to capture daily snapshots, and I wish I had started doing so sooner! If you do save stuff this way, in addition to (or instead of) saving to Steampipe tables (or matviews) directly, you might want to save CSV files, and a great place to save them is GitHub. I wrote about that method here: https://www.infoworld.com/article/3668032/visualizing-the-hacker-news-api-with-hcl-and-sql.html.
Hey @judell , for this issue, do you have any thoughts on where we should go next? Since the reset time for the 5000 requests is often long (30/40+ minutes), I don't think it's feasible for us to add wait times in the GitHub plugin, like we do for the abuse limit (which often has a reset time of a few minutes). |
@judell We have some planned upcoming work around using GitHub's GraphQL API where possible. Our initial tests show that we can be more efficient with API calls for the tables we looked at, so hoping that will help with the hourly rate limit. I'll keep this issue closed for now, as we'll create another issue to track the GraphQL updates, but if you have any other comments, please add them. |
If you hit that limit you're stuck for about half an hour, and the built-in retry logic won't let you get past that.
Here's what we have now.
Since you're going to have to wait a long while before you can continue, I've tried this with success.