[Feature Request]: Rate Limiting on LLM API calls #913
Replies: 2 comments
-
@roablep We use LiteLLM under the hood. They provide a proxy implementation in which you can do all rate limiting, budget planning, etc. You then pass this proxy URL as the base URL, and LiteLLM (underneath Crawl4AI) will route your queries through the proxy, which throttles and manages them before handing the results back. This way Crawl4AI stays agnostic of your LLM provider choice and setup.
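For concreteness, here is a rough sketch of what such a proxy config might look like, with per-model throttling. The model names, rpm/tpm values, and port are illustrative only; check LiteLLM's proxy docs for the authoritative schema (budgets and retries are configured there as well).

```yaml
# config.yaml -- start the proxy with:  litellm --config config.yaml --port 4000
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
      rpm: 60          # requests per minute allowed for this deployment (example value)
      tpm: 100000      # tokens per minute (example value)

litellm_settings:
  num_retries: 3       # retry throttled calls before surfacing an error
```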
-
Great! Perhaps we add this to the documentation at https://docs.crawl4ai.com/extraction/llm-strategies/ Also, a typo in the docs: s/LightLLM/LiteLLM/g. Here's a proposed addition to the documentation:

Advanced: Rate Limiting & Budgeting with LiteLLM Proxy

If you're working with LLM APIs that have rate limits, quota caps, or cost controls, you can get full control over throttling, retries, and budgets by using LiteLLM's proxy. Crawl4AI uses LiteLLM under the hood to make its LLM calls. By default, it talks directly to the model provider (e.g., OpenAI, Ollama, etc.), but you can optionally route all LLM requests through your own LiteLLM proxy.

How to Use It in Crawl4AI
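As a starting point for that section, here is a minimal sketch. It assumes a LiteLLM proxy is already running at http://localhost:4000 and that the installed Crawl4AI version exposes LLMConfig with a base_url parameter; parameter names may differ between versions, and the proxy key and URL below are placeholders.

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy


async def main():
    # Point the LLM at the LiteLLM proxy instead of the provider directly.
    # base_url and the api_token are placeholders for illustration; check the
    # Crawl4AI LLM-strategies docs for the exact parameter names in your version.
    llm_config = LLMConfig(
        provider="openai/gpt-4o-mini",      # model the proxy will route
        api_token="sk-litellm-proxy-key",   # key issued by the proxy, not by OpenAI
        base_url="http://localhost:4000",   # the LiteLLM proxy endpoint
    )
    strategy = LLMExtractionStrategy(
        llm_config=llm_config,
        instruction="Extract the page title and a one-sentence summary.",
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://example.com",
            config=CrawlerRunConfig(extraction_strategy=strategy),
        )
        print(result.extracted_content)


if __name__ == "__main__":
    asyncio.run(main())
```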
-
What needs to be done?
Right now, I believe Crawl4AI's rate limiting applies to the web crawl and not to the LLM calls, right?
I'd like to implement rate limits per LLM vendor and per LLM model so that I can avoid hitting LLM API rate caps (see the illustrative sketch at the end of this post).
What problem does this solve?
Rate-limit errors from LLM APIs during extraction.
Target users/beneficiaries
No response
Current alternatives/workarounds
No response
Proposed approach
No response
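For illustration only (this is not part of the original request and not Crawl4AI's API): a minimal client-side throttle keyed by vendor/model strings, of the kind the request describes, could look like the following sketch.

```python
import asyncio
import time
from collections import defaultdict


class PerModelRateLimiter:
    """Sliding-window limiter keyed by "vendor/model" strings.

    Hypothetical helper for illustration; Crawl4AI does not ship this class.
    """

    def __init__(self, max_requests_per_window: int, window_seconds: float = 60.0):
        self.max_requests = max_requests_per_window
        self.window = window_seconds
        self._timestamps = defaultdict(list)      # key -> recent call times
        self._locks = defaultdict(asyncio.Lock)   # key -> serialization lock

    async def acquire(self, key: str) -> None:
        """Block until a call for `key` (e.g. "openai/gpt-4o-mini") is allowed."""
        async with self._locks[key]:
            while True:
                now = time.monotonic()
                # Keep only timestamps still inside the window.
                self._timestamps[key] = [
                    t for t in self._timestamps[key] if now - t < self.window
                ]
                if len(self._timestamps[key]) < self.max_requests:
                    self._timestamps[key].append(now)
                    return
                # Wait until the oldest call falls out of the window.
                await asyncio.sleep(self.window - (now - self._timestamps[key][0]))


# Usage: call `await limiter.acquire("openai/gpt-4o-mini")` before each LLM request.
```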