Integration: Bytez Chat Model Provider #1175
base: main
Conversation
…Params in the context's requestOptions.
Description

🔄 What Changed

This pull request introduces significant enhancements across the Portkey AI Gateway, focusing on new model provider integrations, improved robustness, and expanded feature support. Key changes include:

🔍 Impact of the Change

These changes significantly expand the gateway's capabilities by integrating new AI models and enhancing existing provider interactions. The improvements in error handling, circuit breaking, and logging contribute to a more reliable, observable, and maintainable system. The refined plugin architecture offers greater flexibility and control over request/response flows, while multi-modal embedding support broadens the range of applications.

📁 Total Files Changed

🧪 Test Added

Given the nature and scope of changes, the following tests are likely to have been added or updated:
🔒 Security Vulnerabilities

No direct security vulnerabilities were detected in the provided patch. The changes, particularly the improved error logging (…

Motivation

This PR aims to expand the Portkey AI Gateway's compatibility with a wider range of LLM providers, enhance its core routing and plugin functionalities for improved reliability and observability, and provide more flexible and standardized interactions with various AI models.

Type of Change
How Has This Been Tested?
Screenshots (if applicable)

N/A

Checklist

Related Issues

N/A

Tip

Quality Recommendations
Sequence Diagram

sequenceDiagram
participant U as User
participant PG as Portkey Gateway
participant HM as Hooks Manager
participant PP as Portkey Plugins
participant PIA as Portkey Internal API
participant RM as Router Module
participant CB as Circuit Breaker Module
participant SH as Stream Handler
participant LLM as LLM Provider
participant FAI as Featherless AI
participant K as Krutrim
participant AAI as Azure AI Inference
participant B as Bedrock
participant SA as Sutra API
Note over U,PG: User initiates API request
U->>+PG: API Request (e.g., POST /v1/chat/completions)
PG->>+HM: executeHooks(eventType='beforeRequestHook', context, ...)
Note over HM: New: `fail_on_error` logic for plugin verdicts
HM->>+PP: handler(context, parameters, eventType)
Note over PP: New plugin: `requiredMetadataKeys` added
Note over PP: `webhook` plugin: improved error data (response body, timeout error)
Note over PP: `gibberish`, `language`, `moderateContent`, `pii` plugins:
PP->>PP: getCurrentContentPart(context, eventType) for robust content extraction
PP->>+PIA: fetchPortkey(endpoint, credentials, data, timeout)
Note over PIA: New: Returns `LogObject` on success/failure for observability
PIA-->>-PP: Response & LogObject
PP-->>-HM: PluginHandlerResponse(verdict, transformedData, log, fail_on_error)
HM-->>-PG: Hook Results
PG->>+RM: tryTargetsRecursively(c, target, request, ...)
Note over RM: New: `id` for inherited config, `cb_config` handling for circuit breaker
RM->>RM: Filter healthy targets for circuit breaker
RM->>+LLM: API Call (transformed request)
Note over LLM: New Providers: Featherless AI, Krutrim added
Note right of LLM: Anthropic: Enhanced tool content, `transformFinishReason`
Note right of LLM: Azure AI Inference: Batch output, file uploads, new endpoints
Note right of LLM: Bedrock: Multi-modal embeddings, improved tool handling, cache usage in response, `transformFinishReason`
Note right of LLM: Cohere: Multi-modal embeddings
LLM-->>-RM: LLM Response (raw)
alt LLM Response is streaming
RM->>+SH: handleStreamingMode(reader, transformer, ...)
Note over SH: New: `try...catch...finally` for stream processing to ensure resource cleanup
SH->>U: Stream Chunks (transformed)
SH-->>-U: Stream End (writer.close() on error/completion)
else LLM Response is complete
RM->>RM: Transform Response (e.g., `transformFinishReason` for Anthropic/Bedrock)
RM-->>-PG: Transformed Response
end
alt Circuit Breaker active
RM->>CB: recordCircuitBreakerFailure(env, id, cbConfig, jsonPath, status)
RM->>CB: handleCircuitBreakerResponse(response, id, cbConfig, jsonPath, c)
end
PG->>+HM: executeHooks(eventType='afterResponseHook', context, ...)
HM-->>-PG: Hook Results
PG-->>-U: Final API Response
Note over U,PG: Example: Sutra Integration (from notebook)
U->>PG: POST /v1/chat/completions (model="sutra-v2")
PG->>SA: POST https://api.two.ai/v2/chat/completions (Authorization: Sutra API Key)
SA-->>PG: Sutra Response
PG-->>U: Transformed Response
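For illustration, a client call matching the Sutra example at the end of the diagram might look like the sketch below. The local gateway URL and the API key placeholder are assumptions; only the /v1/chat/completions path, the sutra-v2 model name, and the upstream Authorization usage come from the diagram.

// Illustrative sketch only; the gateway URL and key placeholder are assumptions.
const GATEWAY_URL = 'http://localhost:8787/v1/chat/completions'; // hypothetical local gateway
const SUTRA_API_KEY = '<your-sutra-api-key>'; // forwarded upstream as the Sutra Authorization

async function chatWithSutra(): Promise<unknown> {
  const response = await fetch(GATEWAY_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${SUTRA_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'sutra-v2',
      messages: [{ role: 'user', content: 'Hello!' }],
    }),
  });
  return response.json();
}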
This PR adds Bytez as a new chat model provider with a well-structured implementation. I've identified a few improvements that could enhance error handling and performance.
Important: PR Review Skipped

PR review skipped as per the configuration setting. Run a manual review by commenting /matter review.
We do not want to bloat the code base with packages that aren't needed. It should be sufficient.
It already does this where necessary; otherwise, whatever is passed from the server is passed to the client.
Overkill: either fetch is going to fail or there is an upstream error; either way, it will get reported to the client.
Perhaps in the future...
We currently stream character by character; we do not return JSON chunks. In the future we may update this.
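Regarding the character-by-character streaming above: under that assumption, it would be a gateway-side transformer that wraps each raw text fragment into an OpenAI-style chunk. A rough sketch, with an illustrative function name and placeholder id rather than the actual implementation in this PR:

// Illustrative only: wrap a raw text fragment from the provider stream
// into an OpenAI-style chat.completion.chunk object.
function toOpenAIChunk(text: string, model: string) {
  return {
    id: 'chatcmpl-placeholder', // placeholder id
    object: 'chat.completion.chunk',
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      {
        index: 0,
        delta: { content: text }, // the streamed character(s)
        finish_reason: null,
      },
    ],
  };
}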
YAGNI and overkill; Bytez manages its own integration code.
@narengogi @VisargD Requested changes have been made, and I dropped the LRU. Pls lmk if you need any further changes.
Looks good to go! Adding two minor comments.
src/providers/bytez/index.ts
Outdated
finish_reason: 'stop',
},
],
usage: {
Please make the usage object OpenAI-compliant:
https://portkey.ai/docs/api-reference/inference-api/chat#response-usage
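For reference, the OpenAI-compliant usage shape the linked docs describe looks roughly like the sketch below; the numeric values are made up, and only the field names and the total = prompt + completion relationship matter.

// OpenAI-style usage object: integer token counts.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

const exampleUsage: Usage = {
  prompt_tokens: 12, // tokens in the request
  completion_tokens: 34, // tokens generated by the model
  total_tokens: 46, // prompt_tokens + completion_tokens
};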
Also, please move this function into chatComplete.ts.
@narengogi Alright, changes have been made!
We currently do not provide token usage metrics as part of our API (support for it will be added soon, although our billing model is a bit unique in that it's based around request concurrency).
Following other examples where the counts are not returned, I did this. Does this work for you guys?
If it's a hard requirement lmk, and I'll update our backend ASAP.
usage: {
  completion_tokens: -1,
  prompt_tokens: -1,
  total_tokens: -1,
},
…rm to openai compliant spec.
Description
Hi all! 👋
We're Bytez, the largest model provider on the internet! We may also be one of the cheapest, if not the cheapest.
We'd love to integrate with Portkey. Please see the changed files and let me know if anything needs to change.
I'd like to point out that we do a check against our API to see if a model is a "chat" model. The result is stored in a simple cache that is just an object. If that's going to be a problem due to having an unbounded ceiling in terms of memory utilization, pls lmk and I will convert it to an LRU with 100 entries.
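For context, the bounded alternative mentioned here (an LRU capped at 100 entries) can be as small as the sketch below, built on Map insertion order. The class and method names are illustrative, and per a later comment in this thread the LRU was ultimately dropped from the PR.

// Sketch of a 100-entry LRU for model-type lookups; names are illustrative.
class ModelTypeCache {
  private cache = new Map<string, boolean>(); // modelId -> isChatModel

  constructor(private maxEntries = 100) {}

  get(modelId: string): boolean | undefined {
    const value = this.cache.get(modelId);
    if (value !== undefined) {
      // Re-insert to mark this entry as most recently used.
      this.cache.delete(modelId);
      this.cache.set(modelId, value);
    }
    return value;
  }

  set(modelId: string, isChatModel: boolean): void {
    if (this.cache.has(modelId)) {
      this.cache.delete(modelId);
    } else if (this.cache.size >= this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order).
      this.cache.delete(this.cache.keys().next().value as string);
    }
    this.cache.set(modelId, isChatModel);
  }
}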
Our API's input signature is a bit more bespoke than other model providers', so please lmk if the custom requestHandler I have is sufficient, or if there's an easier way to do what I've done.
Bonus feedback: Y'all need an integration guide! It would be immensely useful 😄
Motivation
We'd love to be integrated into Portkey!
Type of Change
How Has This Been Tested?
Screenshots (if applicable)
Checklist
Related Issues