[Feature Request]: Improve LLMExtractionStrategy support for custom OpenAI-compatible endpoints #2002
samueltian666
started this conversation in
Feature requests
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
What needs to be done?
improve LLMExtractionStrategy compatibility with custom OpenAI-compatible /v1/chat/completions endpoints, not just the official OpenAI API.
In my testing, the built-in Crawl4AI LLM extraction flow works correctly with the official OpenAI endpoint, but fails with some custom OpenAI-compatible providers even when those providers accept valid raw /chat/completions requests.
It would be useful if Crawl4AI could either:
fully support more OpenAI-compatible custom endpoints through the built-in LLMExtractionStrategy, or
document / expose a clearer fallback mode for custom providers that are not fully compatible with the current litellm request path.
What problem does this solve?
At the moment, users may assume that any service exposing an OpenAI-compatible /v1/chat/completions API will work with LLMExtractionStrategy.
However, in practice:
direct manual HTTP requests to the provider succeed
the same provider fails when used through crawl4ai -> litellm -> LLMExtractionStrategy
This makes it difficult to use Crawl4AI with self-hosted or third-party OpenAI-compatible inference services.
Target users/beneficiaries
Developers using self-hosted OpenAI-compatible inference services
Teams using private gateways or custom LLM proxy endpoints
Users who want to combine Crawl4AI extraction with non-OpenAI official backends
Current alternatives/workarounds
Current workaround is:
use Crawl4AI only for crawling and content extraction
manually send the crawled content to the custom /chat/completions endpoint outside LLMExtractionStrategy
This works, but it bypasses Crawl4AI’s built-in LLM extraction pipeline and makes integration less clean.
Proposed approach
verify whether LLMExtractionStrategy should route some custom endpoints through a more flexible OpenAI-compatible mode instead of the default provider path
improve compatibility with custom OpenAI-compatible providers when base_url is provided
document the exact compatibility expectations for litellm-based LLM extraction versus raw OpenAI-compatible HTTP APIs
In my testing:
official OpenAI works correctly through LLMExtractionStrategy
some custom OpenAI-compatible providers accept raw HTTP requests but fail through the litellm path
So this seems less like Crawl4AI ignoring base_url, and more like a compatibility gap between certain OpenAI-compatible providers and the current litellm request flow.
Beta Was this translation helpful? Give feedback.
All reactions