Warning: Rate limit exceeded

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered again. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
Walkthrough

This pull request adds comprehensive support for LMDeploy as a new LLM backend integration to Olla. The change includes a configuration profile defining API endpoints and metrics extraction, converter and parser implementations for model response handling, factory and route registration, provider constants, and extensive documentation covering integration, API reference, and usage examples.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 5
🧹 Nitpick comments (5)
docs/content/api-reference/lmdeploy.md (1)
257-257: Sample response is not valid JSON.
`[0.0123, -0.0456, 0.0789, ...]` will fail any JSON validator readers paste it into. Either drop the trailing `...` or annotate it as a non-parseable excerpt (e.g., `// truncated`). The same pattern is used elsewhere in the docs (LM Studio embeddings) so it's not a hard requirement, but worth tightening for new docs.

📝 Suggested edit
- "embedding": [0.0123, -0.0456, 0.0789, ...] + "embedding": [0.0123, -0.0456, 0.0789]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/api-reference/lmdeploy.md` at line 257, The sample JSON under the "embedding" field is invalid because it includes a trailing ellipsis; update the example so it's valid JSON by either removing the "..." (e.g., "embedding": [0.0123, -0.0456, 0.0789]) or marking it as a truncated excerpt using a non-JSON comment/annotation (e.g., add a note like "// truncated" outside the JSON block or replace array with a clearly labeled excerpt), and apply the same fix to other LM Studio embedding examples to ensure all samples validate.

internal/adapter/registry/profile/lmdeploy_parser.go (1)
32-46: Optional: replace hard-coded `"lmdeploy"` literals with `constants.ProviderTypeLMDeploy`.

The provider name appears twice as a string literal (lines 32 and 46). Using the existing `constants.ProviderTypeLMDeploy` would prevent silent drift if the constant is ever renamed. Note that the sibling `vllm_parser.go` uses literals as well, so this is a consistency-vs-DRY trade-off — feel free to defer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/registry/profile/lmdeploy_parser.go` around lines 32 - 46, Replace the hard-coded provider string "lmdeploy" with the shared constant constants.ProviderTypeLMDeploy in the lmdeploy parser: set Type to constants.ProviderTypeLMDeploy where currently Type: "lmdeploy" (in the created profile object) and change the ownership check to compare model.OwnedBy != constants.ProviderTypeLMDeploy instead of the literal; this keeps the symbol names centralized and prevents drift (mirror this change only in lmdeploy_parser.go; vllm_parser.go can remain as-is if you prefer consistency choice).
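To make the suggestion concrete, here is a small, self-contained Go sketch of the literal-to-constant swap. The struct and function names below are invented for illustration and are not the actual parser's API; only `constants.ProviderTypeLMDeploy`, the `"lmdeploy"` literal, and the owned_by suppression behaviour are taken from the comment above, and the import path is assumed from the repository layout.

```go
// Illustrative sketch only: types and functions here are hypothetical stand-ins
// for the real parser; the point is swapping the "lmdeploy" literal for the
// shared constant so the two can never drift apart.
package profile

import "github.com/thushan/olla/internal/core/constants" // assumed import path

type lmdeployWireModel struct {
	ID      string `json:"id"`
	OwnedBy string `json:"owned_by"`
}

type parsedModel struct {
	Name      string
	Type      string
	Publisher string
}

func toParsedModel(m lmdeployWireModel) parsedModel {
	p := parsedModel{
		Name: m.ID,
		Type: constants.ProviderTypeLMDeploy, // previously the literal "lmdeploy"
	}
	// Only keep owned_by when it says something beyond the provider default.
	if m.OwnedBy != "" && m.OwnedBy != constants.ProviderTypeLMDeploy {
		p.Publisher = m.OwnedBy
	}
	return p
}
```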
internal/adapter/registry/profile/lmdeploy_parser_test.go (1)

168-194: Optional: this test does not actually exercise the "if it somehow appears" case.

The comment on line 172 promises tolerance "if it somehow appears", but the JSON payload omits `max_model_len` entirely. To actually validate the claim, include the field in the payload (e.g. `"max_model_len": 32768`) and assert that parsing still succeeds and `MaxContextLength` remains nil/zero. As-is, this case overlaps with the earlier "details nil when no metadata beyond default owned_by" test.

🧪 Suggested payload tweak
response := `{ "object": "list", "data": [ { "id": "qwen/Qwen2-7B-Instruct", "object": "model", "created": 1754535990, - "owned_by": "lmdeploy" + "owned_by": "lmdeploy", + "max_model_len": 32768 } ] }`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/registry/profile/lmdeploy_parser_test.go` around lines 168 - 194, Update the test "no max_model_len field — LMDeploy differs from vLLM here" to actually exercise the "if it somehow appears" case by adding a "max_model_len" field (e.g. "max_model_len": 32768) to the JSON response string, then call parser.Parse as before and assert parsing still succeeds (require.NoError) and that the parsed model's Details.MaxContextLength remains nil (or zero/unpopulated) — reference the existing parser.Parse call and model.Details.MaxContextLength assertion to locate where to add the new payload and additional assertion.
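As a runnable illustration of why the extra field is harmless, the sketch below decodes the tweaked payload (now containing `max_model_len`) into a struct that has no such field; Go's `encoding/json` simply ignores it. This stands in for the real assertion: the actual test should keep calling `parser.Parse` and checking `Details.MaxContextLength` as the comment above describes, and the `wireModelList` type here is purely hypothetical.

```go
// Hypothetical sketch, not the project's real test: demonstrates that an
// unexpected max_model_len field in the payload does not break JSON decoding.
package profile

import (
	"encoding/json"
	"testing"

	"github.com/stretchr/testify/require"
)

type wireModelList struct { // invented stand-in for the parser's wire type
	Object string `json:"object"`
	Data   []struct {
		ID      string `json:"id"`
		OwnedBy string `json:"owned_by"`
	} `json:"data"`
}

func TestToleratesUnexpectedMaxModelLen(t *testing.T) {
	response := `{
		"object": "list",
		"data": [
			{
				"id": "qwen/Qwen2-7B-Instruct",
				"object": "model",
				"created": 1754535990,
				"owned_by": "lmdeploy",
				"max_model_len": 32768
			}
		]
	}`

	var parsed wireModelList
	// The unknown max_model_len field is silently ignored by encoding/json.
	require.NoError(t, json.Unmarshal([]byte(response), &parsed))
	require.Len(t, parsed.Data, 1)
	require.Equal(t, "qwen/Qwen2-7B-Instruct", parsed.Data[0].ID)
}
```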
internal/adapter/converter/lmdeploy_converter.go (2)

52-91: `OwnedBy` derivation uses the resolved `modelID`, which can be an upstream alias.

When `findLMDeployNativeName` returns "" and `model.Aliases[0].Name` is e.g. `"alias-from-ollama"` (no slash), `OwnedBy` becomes `"lmdeploy"` — which then gets suppressed by the parser on the way back in (line 46 of `lmdeploy_parser.go`). That's coherent. However, if the unified `model.ID` carries a more accurate `org/...` prefix than the alias, that information is lost. Consider deriving `OwnedBy` from `model.ID` (or a publisher field on the unified model) rather than from the chosen wire `ID`.

♻️ Possible refactor
- m := &profile.LMDeployModel{ - ID: modelID, - Object: "model", - Created: now, - OwnedBy: c.determineOwner(modelID), - } + owner := c.determineOwner(modelID) + if owner == constants.ProviderTypeLMDeploy { + // Fall back to the unified ID's org segment when the wire ID lacks one. + owner = c.determineOwner(model.ID) + } + m := &profile.LMDeployModel{ + ID: modelID, + Object: "model", + Created: now, + OwnedBy: owner, + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/converter/lmdeploy_converter.go` around lines 52 - 91, In convertModel (LMDeployConverter.convertModel) the OwnedBy is computed from the resolved wire modelID (after findLMDeployNativeName / alias fallback) which can strip org prefix; instead derive OwnedBy from the unified model's original identifier or publisher field (e.g., use model.ID or model.Publisher if available) and pass that into determineOwner while still using the resolved modelID for the LMDeploy wire ID; update the code so you capture originalID := model.ID (or model.Publisher) before you replace modelID and call c.determineOwner(originalID) for the OwnedBy field, leaving modelID assignment and Permission generation unchanged.
13-15: Remove unused type aliases.

The type aliases `LMDeployModelResponse` and `LMDeployModelData` are not referenced anywhere in the codebase outside their definitions. The comment stating they exist "for backward compatibility with tests" is inaccurate — the test file directly uses `profile.LMDeployResponse` and `profile.LMDeployModel`. These aliases unnecessarily broaden the public surface and can be safely removed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/converter/lmdeploy_converter.go` around lines 13 - 15, Remove the two unused backward-compatibility type aliases LMDeployModelResponse and LMDeployModelData from the converter file: delete the alias declarations "type LMDeployModelResponse = profile.LMDeployResponse" and "type LMDeployModelData = profile.LMDeployModel" and any accompanying misleading comment, ensuring no other code references these symbols (tests already use profile.LMDeployResponse/profile.LMDeployModel); run tests and remove any now-unused imports if the file becomes free of profile usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/profiles/lmdeploy.yaml`:
- Around line 167-174: The calculations block uses generation_time_ms in
tokens_per_second but lmdeploy.yaml's metrics.extraction.paths lacks extraction
for generation_time_ms; update the metrics.extraction.paths section to include
generation_time_ms with the same extraction key used in other profiles (e.g.,
generation_time_ms: "$.metrics.generation_time_ms") so the tokens_per_second
calculation in calculations.tokens_per_second can compute correctly.
- Around line 27-35: The config lists an LMDeploy-only probe endpoint
`/is_sleeping` which upstream does not expose; remove `/is_sleeping` from
api.paths and any detection.path_indicators, or verify and replace it with a
real LMDeploy probe if you confirm one exists. Specifically, update the entries
that mention `/is_sleeping` in the profile (look for api.paths and
detection.path_indicators) so detection relies only on verified endpoints like
`/generate` and `/pooling`, avoiding a probe that will 404.
In `@docs/content/api-reference/lmdeploy.md`:
- Around line 123-131: The streaming SSE fenced code block showing the "data:
{...}" chunks should include a language identifier to satisfy markdownlint
MD040; update the triple-backtick fence around the SSE example in lmdeploy.md so
the opening fence is ```text (rather than just ```), keeping the block contents
unchanged; locate the fenced block that contains the sample lines starting with
"data: {\"id\":\"chatcmpl-lmdeploy-abc123\"" and add the language tag to the
opening fence.
In `@readme.md`:
- Line 13: Update the companion badge in docs/content/index.md by replacing the
old image/alt text "LM Deploy-openai-lightblue" and alt="LM Deploy: OpenAI
Compatible" with the new badge used in the README: image name "LM
Deploy-openai-lightgreen" and alt="LM Deploy: Native Support", and ensure the
anchor/img markup and href match the README's badge markup so the docs and
README display the same badge.
- Line 13: Update the shields.io image URL so the badge message shows "native"
instead of "openai": in the README anchor/img element where the src is
"https://img.shields.io/badge/LM Deploy-openai-lightgreen.svg" (and the
corresponding alt currently "LM Deploy: Native Support"), change the "openai"
segment to "native" (i.e., use "LM Deploy-native-lightgreen.svg") so the
rendered badge text matches the alt text and color convention used by other
native badges.
---
Nitpick comments:
In `@docs/content/api-reference/lmdeploy.md`:
- Line 257: The sample JSON under the "embedding" field is invalid because it
includes a trailing ellipsis; update the example so it's valid JSON by either
removing the "..." (e.g., "embedding": [0.0123, -0.0456, 0.0789]) or marking it
as a truncated excerpt using a non-JSON comment/annotation (e.g., add a note
like "// truncated" outside the JSON block or replace array with a clearly
labeled excerpt), and apply the same fix to other LM Studio embedding examples
to ensure all samples validate.
In `@internal/adapter/converter/lmdeploy_converter.go`:
- Around line 52-91: In convertModel (LMDeployConverter.convertModel) the
OwnedBy is computed from the resolved wire modelID (after findLMDeployNativeName
/ alias fallback) which can strip org prefix; instead derive OwnedBy from the
unified model's original identifier or publisher field (e.g., use model.ID or
model.Publisher if available) and pass that into determineOwner while still
using the resolved modelID for the LMDeploy wire ID; update the code so you
capture originalID := model.ID (or model.Publisher) before you replace modelID
and call c.determineOwner(originalID) for the OwnedBy field, leaving modelID
assignment and Permission generation unchanged.
- Around line 13-15: Remove the two unused backward-compatibility type aliases
LMDeployModelResponse and LMDeployModelData from the converter file: delete the
alias declarations "type LMDeployModelResponse = profile.LMDeployResponse" and
"type LMDeployModelData = profile.LMDeployModel" and any accompanying misleading
comment, ensuring no other code references these symbols (tests already use
profile.LMDeployResponse/profile.LMDeployModel); run tests and remove any
now-unused imports if the file becomes free of profile usage.
In `@internal/adapter/registry/profile/lmdeploy_parser_test.go`:
- Around line 168-194: Update the test "no max_model_len field — LMDeploy
differs from vLLM here" to actually exercise the "if it somehow appears" case by
adding a "max_model_len" field (e.g. "max_model_len": 32768) to the JSON
response string, then call parser.Parse as before and assert parsing still
succeeds (require.NoError) and that the parsed model's Details.MaxContextLength
remains nil (or zero/unpopulated) — reference the existing parser.Parse call and
model.Details.MaxContextLength assertion to locate where to add the new payload
and additional assertion.
In `@internal/adapter/registry/profile/lmdeploy_parser.go`:
- Around line 32-46: Replace the hard-coded provider string "lmdeploy" with the
shared constant constants.ProviderTypeLMDeploy in the lmdeploy parser: set Type
to constants.ProviderTypeLMDeploy where currently Type: "lmdeploy" (in the
created profile object) and change the ownership check to compare model.OwnedBy
!= constants.ProviderTypeLMDeploy instead of the literal; this keeps the symbol
names centralized and prevents drift (mirror this change only in
lmdeploy_parser.go; vllm_parser.go can remain as-is if you prefer consistency
choice).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 28bdd97d-e339-47cd-b972-1c1ba3e1a41e
📒 Files selected for processing (23)
- config/profiles/lmdeploy.yaml
- docs/content/api-reference/lmdeploy.md
- docs/content/api-reference/overview.md
- docs/content/concepts/overview.md
- docs/content/faq.md
- docs/content/index.md
- docs/content/integrations/backend/lmdeploy.md
- docs/content/integrations/overview.md
- docs/mkdocs.yml
- internal/adapter/converter/factory.go
- internal/adapter/converter/factory_test.go
- internal/adapter/converter/lmdeploy_converter.go
- internal/adapter/converter/lmdeploy_converter_test.go
- internal/adapter/registry/profile/lmdeploy.go
- internal/adapter/registry/profile/lmdeploy_parser.go
- internal/adapter/registry/profile/lmdeploy_parser_test.go
- internal/adapter/registry/profile/parsers.go
- internal/app/handlers/handler_common.go
- internal/app/handlers/server_routes.go
- internal/core/constants/providers.go
- internal/core/domain/profile.go
- internal/version/version.go
- readme.md
```yaml
      # LMDeploy-specific endpoints
      - /v1/encode    # 4: token encoding (LMDeploy-specific)
      - /generate     # 5: native generation endpoint
      - /pooling      # 6: reward/score pooling (the real embeddings path)
      # NOTE: /v1/embeddings is intentionally excluded — LMDeploy returns HTTP 400
      # unconditionally for all backends on that path.

      # Sleep state probe (LMDeploy-specific, used for auto-detection)
      - /is_sleeping  # 7: sleep state probe
```
🧩 Analysis chain
🌐 Web query:
What are the exact URL paths exposed by lmdeploy serve api_server for /generate, /pooling, and /is_sleeping endpoints? Are they prefixed with /v1?
💡 Result:
The lmdeploy serve api_server exposes the following endpoints for the specified paths, based on source code and documentation:

- /generate: This is an LMDeploy-specific endpoint for raw generation requests (e.g., POST /generate with prompt, input_ids, etc.). It is NOT prefixed with /v1.
- /pooling: This endpoint is for reward model pooling (e.g., POST /pooling with model and input). It is NOT prefixed with /v1. Available when using the PyTorch backend with reward models like internlm2-1_8b-reward.
- /is_sleeping: No evidence of this exact endpoint in api_server. There is a "sleep engine" feature in the PyTorch backend for resource management (e.g., a PR fixing sleep for dp=1), but no exposed HTTP endpoint named /is_sleeping. Likely internal or not publicly exposed. Not prefixed with /v1 if it exists.

Primary endpoints are OpenAI-compatible under /v1 (e.g., /v1/chat/completions, /v1/models, /v1/completions), plus custom ones like /generate and /pooling at root level. The full list is available via /openapi.json after starting the server.
Citations:
- 1: https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/serve/openai/api_server.py
- 2: https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/serve/openai/api_client.py
- 3: https://lmdeploy.readthedocs.io/en/latest/supported_models/reward_models.html
- 4: add reward model api InternLM/lmdeploy#3665
Remove or verify the /is_sleeping endpoint with LMDeploy upstream.
The web search against LMDeploy's source code confirms that /generate and /pooling are correctly listed without the /v1 prefix—they are custom root-level endpoints, not OpenAI-compatible ones. However, there is no evidence that /is_sleeping is an actual exposed HTTP endpoint in lmdeploy serve api_server. LMDeploy does have an internal sleep engine feature, but it does not appear to expose an HTTP endpoint for it. Including /is_sleeping in the config's api.paths and detection.path_indicators may cause detection to fail or requests to 404.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/profiles/lmdeploy.yaml` around lines 27 - 35, The config lists an
LMDeploy-only probe endpoint `/is_sleeping` which upstream does not expose;
remove `/is_sleeping` from api.paths and any detection.path_indicators, or
verify and replace it with a real LMDeploy probe if you confirm one exists.
Specifically, update the entries that mention `/is_sleeping` in the profile
(look for api.paths and detection.path_indicators) so detection relies only on
verified endpoints like `/generate` and `/pooling`, avoiding a probe that will
404.
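If the probe is dropped, the affected portion of the profile would look roughly like the sketch below. The `api.paths` and `detection.path_indicators` key names are taken from the comment above; the exact nesting and the surrounding entries are assumed rather than copied from the real `lmdeploy.yaml`.

```yaml
# Sketch only: nesting and neighbouring entries assumed, not the full lmdeploy.yaml.
api:
  paths:
    # LMDeploy-specific endpoints
    - /v1/encode   # token encoding (LMDeploy-specific)
    - /generate    # native generation endpoint
    - /pooling     # reward/score pooling (the real embeddings path)
    # /is_sleeping dropped: upstream exposes no such HTTP endpoint, so probing it would 404

detection:
  path_indicators:
    - /generate
    - /pooling
```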
| model: "$.model" | ||
| finish_reason: "$.choices[0].finish_reason" | ||
| input_tokens: "$.usage.prompt_tokens" | ||
| output_tokens: "$.usage.completion_tokens" | ||
| total_tokens: "$.usage.total_tokens" | ||
| calculations: | ||
| is_complete: 'len(finish_reason) > 0' | ||
| tokens_per_second: "generation_time_ms > 0 ? (output_tokens * 1000.0) / generation_time_ms : 0" |
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check whether generation_time_ms is supplied by Olla's metrics layer
# (rather than being parsed from the response body).
rg -nP --type=go -C3 '\bgeneration_time_ms\b'
rg -nP --type=yaml -C2 '\bgeneration_time_ms\b' config/profiles
```

Repository: thushan/olla
Length of output: 2560
🏁 Script executed:
```bash
cat -n config/profiles/lmdeploy.yaml | sed -n '160,180p'
```

Repository: thushan/olla
Length of output: 673
generation_time_ms is missing from paths but used in calculations.
The tokens_per_second calculation references generation_time_ms, which is not extracted in the metrics.extraction.paths section of this profile. Other profiles (vllm, vllm-mlx, sglang) define generation_time_ms: "$.metrics.generation_time_ms" in their paths sections. Add the missing extraction to lmdeploy.yaml, otherwise the calculation will always evaluate to 0.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/profiles/lmdeploy.yaml` around lines 167 - 174, The calculations block
uses generation_time_ms in tokens_per_second but lmdeploy.yaml's
metrics.extraction.paths lacks extraction for generation_time_ms; update the
metrics.extraction.paths section to include generation_time_ms with the same
extraction key used in other profiles (e.g., generation_time_ms:
"$.metrics.generation_time_ms") so the tokens_per_second calculation in
calculations.tokens_per_second can compute correctly.
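A rough sketch of the suggested fix follows. The `generation_time_ms: "$.metrics.generation_time_ms"` entry mirrors what the other profiles reportedly use; the nesting under `metrics.extraction` is assumed from the quoted excerpt rather than verified against the full file.

```yaml
# Sketch only: surrounding keys assumed from the quoted excerpt, not the full lmdeploy.yaml.
metrics:
  extraction:
    paths:
      input_tokens: "$.usage.prompt_tokens"
      output_tokens: "$.usage.completion_tokens"
      total_tokens: "$.usage.total_tokens"
      generation_time_ms: "$.metrics.generation_time_ms"   # the missing extraction entry
    calculations:
      is_complete: 'len(finish_reason) > 0'
      tokens_per_second: "generation_time_ms > 0 ? (output_tokens * 1000.0) / generation_time_ms : 0"
```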
Adds native support for LMDeploy finally.
Summary by CodeRabbit
Release Notes
New Features
Documentation