Conversation

@CodeRabbit review

✅ Actions performed: Review triggered.
Walkthrough

This pull request introduces comprehensive support for vLLM-MLX, an Apple Silicon-optimized OpenAI-compatible inference server using the MLX framework. It adds a new backend profile configuration, converters, parsers, response models, and constants across the system.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🧹 Nitpick comments (4)
internal/adapter/converter/factory_test.go (1)
59-60: `assert.Contains(t, qpErr.Reason, "vllm")` doesn't exclusively verify the standalone `"vllm"` format.

`"vllm"` is a substring of `"vllm-mlx"`, so this assertion passes even if the `"vllm"` converter were removed from the factory (as long as `"vllm-mlx"` remains in the error string). Consider using a word-boundary or format-separator pattern, or checking `formatMap` similarly to how lines 80–81 do it for the supported-formats test.

♻️ Suggested approach

```diff
-assert.Contains(t, qpErr.Reason, "vllm")
+assert.Contains(t, qpErr.Reason, "vllm,") // comma-separated; or
+// assert.True(t, strings.Contains(qpErr.Reason, "vllm") && !strings.HasPrefix(qpErr.Reason[strings.Index(qpErr.Reason, "vllm"):], "vllm-mlx"))
```

Alternatively, update `getSupportedFormats` to use a stable delimiter and assert on the delimited token.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/converter/factory_test.go` around lines 59-60, the assertion on qpErr.Reason is too loose because "vllm" is a substring of "vllm-mlx"; update the test in factory_test.go to assert the standalone presence of the "vllm" converter by either (A) parsing qpErr.Reason into tokens using the same delimiter used by getSupportedFormats and asserting one token equals "vllm", or (B) checking the factory's formatMap (or equivalent map used in the test) directly for the "vllm" key (similar to how the supported-formats assertions on lines ~80–81 check specific entries); reference qpErr.Reason, getSupportedFormats, and formatMap to locate and change the assertion.

internal/adapter/converter/vllm_mlx_converter.go (1)
13-15: Misleading comment: these are convenience aliases, not backward-compatibility shims.

Nothing pre-existing is being maintained here. Consider rewording to clarify intent, e.g. "Package-level aliases so converter tests and logic can reference the profile types without a qualifier."

✏️ Suggested wording

```diff
-// Type aliases for backward compatibility with tests
+// Type aliases so converter code can reference profile types without a package qualifier.
 type VLLMMLXResponse = profile.VLLMMLXResponse
 type VLLMMLXModel = profile.VLLMMLXModel
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/converter/vllm_mlx_converter.go` around lines 13-15, update the misleading comment above the type aliases VLLMMLXResponse and VLLMMLXModel: replace the "Type aliases for backward compatibility with tests" wording with a clearer description that these are package-level convenience aliases allowing converter code and tests to reference profile.VLLMMLXResponse and profile.VLLMMLXModel without a qualifier (e.g., "Package-level convenience aliases so converter logic and tests can reference profile types without a qualifier"). Ensure the new comment sits immediately above the two alias declarations VLLMMLXResponse and VLLMMLXModel.

internal/adapter/registry/profile/vllm_mlx_parser.go (1)
46-49: Hardcoded `"vllm-mlx"` string: use the constant for consistency.

Line 47 compares against the string literal `"vllm-mlx"` while also referencing `constants.ProviderTypeVLLM` in the same condition. Since `constants.ProviderTypeVLLMMLX` exists (value `"vllm-mlx"`), use it here to avoid silent drift.

♻️ Proposed fix

```diff
-	if model.OwnedBy != "" && model.OwnedBy != "vllm-mlx" && model.OwnedBy != constants.ProviderTypeVLLM {
+	if model.OwnedBy != "" && model.OwnedBy != constants.ProviderTypeVLLMMLX && model.OwnedBy != constants.ProviderTypeVLLM {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/registry/profile/vllm_mlx_parser.go` around lines 46-49, the condition in the parser is comparing model.OwnedBy against a hardcoded "vllm-mlx" string; replace that literal with the existing constant constants.ProviderTypeVLLMMLX so the check is consistent with the rest of the code; update the if condition that references model.OwnedBy (and constants.ProviderTypeVLLM) to use constants.ProviderTypeVLLMMLX instead of the literal, keeping the rest of the logic intact.

internal/adapter/registry/profile/vllm_mlx_parser_test.go (1)
128-158: Use `constants.ProviderTypeVLLM` in the test's `defaultOwners` slice for consistency.

The parser (line 47 of `vllm_mlx_parser.go`) already uses `constants.ProviderTypeVLLM` when checking the default owned-by value, so the test should reference the same constant rather than hardcoding `"vllm"`. This ensures the test and parser stay in lockstep if the constant's value ever changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/adapter/registry/profile/vllm_mlx_parser_test.go` around lines 128 - 158, Update the test's defaultOwners slice to use the shared constant instead of the hardcoded string: replace the literal "vllm" with constants.ProviderTypeVLLM in the t.Run block (the test function "does not set publisher when owned_by is default") so it matches the parser's check in vllm_mlx_parser.go; ensure you import the package that defines constants.ProviderTypeVLLM if not already imported and run the test to verify no regressions.
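The lockstep both comments ask for can be sketched as follows. The constant values and the helper mirror what the review describes; `isDefaultOwner` is an illustrative stand-in, not the repository's actual parser code:

```go
package main

import "fmt"

// Illustrative stand-ins for internal/core/constants — values assumed
// from the review ("vllm", "vllm-mlx").
const (
	ProviderTypeVLLM    = "vllm"
	ProviderTypeVLLMMLX = "vllm-mlx"
)

// isDefaultOwner sketches the parser's owned_by check: default owners
// carry no publisher information, so no publisher field is set for them.
func isDefaultOwner(ownedBy string) bool {
	return ownedBy == "" ||
		ownedBy == ProviderTypeVLLMMLX ||
		ownedBy == ProviderTypeVLLM
}

func main() {
	fmt.Println(isDefaultOwner("vllm"))      // true: default owner, skip publisher
	fmt.Println(isDefaultOwner("mistralai")) // false: real publisher, keep it
}
```

Because both the parser and its test reference the same identifiers, renaming a provider string becomes a one-line change instead of a silent test/parser divergence.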
```yaml
# Anthropic Messages API support
# vLLM-MLX inherits the vLLM API surface, which includes native Anthropic Messages
# API support. Token counting is available via /v1/messages/count_tokens.
anthropic_support:
  enabled: true
  messages_path: /v1/messages
  token_count: true
```
Minor comment inaccuracy — vLLM-MLX implements the Anthropic API independently, not by inheriting upstream vLLM.
vLLM-MLX provides its own "native /v1/messages endpoint" rather than re-using upstream vLLM's Anthropic layer. The comment "vLLM-MLX inherits the vLLM API surface" is misleading — the Anthropic support is natively implemented in this fork. The enabled: true configuration itself is correct and justified; only the comment needs updating.
📝 Suggested comment update

```diff
-# Anthropic Messages API support
-# vLLM-MLX inherits the vLLM API surface, which includes native Anthropic Messages
-# API support. Token counting is available via /v1/messages/count_tokens.
+# Anthropic Messages API support
+# vLLM-MLX provides a native /v1/messages endpoint (independent of upstream vLLM),
+# enabling Claude Code / OpenCode compatibility. Token counting is available via
+# /v1/messages/count_tokens.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/profiles/vllm-mlx.yaml` around lines 20 - 26, Update the misleading
comment above the anthropic_support block to say that vLLM-MLX implements the
Anthropic Messages API natively (provides its own /v1/messages endpoint and
token counting via token_count/messages_path) rather than "inheriting the vLLM
API surface"; leave the configuration keys (anthropic_support, enabled,
messages_path, token_count) unchanged and only revise the explanatory comment to
reflect native implementation in vLLM-MLX.
```yaml
# MLX quantisation levels map loosely to these memory multipliers.
# Naming convention differs from GGUF (e.g. "4bit" rather than "Q4_K_M"),
# but the effective compression ratios are broadly similar.
quantization:
  multipliers:
    2bit: 0.35
    3bit: 0.45
    4bit: 0.50  # Most common MLX distribution format
    6bit: 0.75
    8bit: 0.875
```
bf16 is missing from quantization.multipliers whilst listed as a supported quantisation level.
features.model_format.quantization_levels (line 233) includes "bf16", but resources.quantization.multipliers has no bf16 entry. Any code path that applies a multiplier lookup for a bf16-quantised model will silently get a zero/missing value, producing incorrect memory footprint estimates (e.g. zero or a default) for those models.
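The failure mode is Go's zero-value semantics for map lookups: a miss returns `0.0` with no error. A minimal sketch (the map literal mirrors the profile values above; the helper name is illustrative):

```go
package main

import "fmt"

// multiplierFor looks up the memory multiplier for a quantisation level
// using the comma-ok form, so a missing key is detectable.
func multiplierFor(m map[string]float64, level string) (float64, bool) {
	v, ok := m[level]
	return v, ok
}

func main() {
	// Multipliers as currently defined in the profile (no "bf16" entry).
	multipliers := map[string]float64{
		"2bit": 0.35, "3bit": 0.45, "4bit": 0.50,
		"6bit": 0.75, "8bit": 0.875,
	}

	// A bare lookup for the missing "bf16" key silently yields 0.0,
	// so a downstream memory estimate collapses to zero.
	m := multipliers["bf16"]
	const weightsGB = 16.0
	fmt.Println(m, weightsGB*m) // 0 0

	// The comma-ok form at least surfaces the miss.
	if _, ok := multiplierFor(multipliers, "bf16"); !ok {
		fmt.Println("bf16 multiplier missing")
	}
}
```

Adding the `bf16` key closes the gap at the data level; any lookup sites that use the bare form would still benefit from the comma-ok check as defence in depth.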
🛠️ Proposed fix

```diff
     8bit: 0.875
+    bf16: 1.0  # BFloat16: uncompressed weights, full memory footprint
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@config/profiles/vllm-mlx.yaml` around lines 165 - 174, The
quantization.multipliers map is missing an entry for the "bf16" quantization
level listed in features.model_format.quantization_levels, causing lookups to
return missing/zero values; add a "bf16" key to quantization.multipliers (e.g.,
bf16: 1.0) so memory/multiplier calculations in any code paths using that map
(refer to quantization.multipliers and
features.model_format.quantization_levels) produce correct estimates.
```go
import (
	"fmt"
	"time"

	"github.com/thushan/olla/internal/core/constants"
	"github.com/thushan/olla/internal/core/domain"
)
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
cat -n internal/adapter/registry/profile/vllm_mlx_parser.go
# Check Go version
cat go.mod | head -5
```
Add missing encoding/json import — file will not compile without it.
Line 21 calls json.Unmarshal to deserialise the response, but encoding/json is not imported. This prevents the file from building.
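For context, the parse path can be sketched as follows. The struct fields and helper name here are assumptions modelled on an OpenAI-style /v1/models response, not the repository's actual types; the point is only that the `json.Unmarshal` call cannot resolve without the import:

```go
package main

import (
	"encoding/json" // the import the review flags as missing
	"fmt"
)

// vllmMLXModel sketches one entry of an OpenAI-style /v1/models response.
type vllmMLXModel struct {
	ID      string `json:"id"`
	OwnedBy string `json:"owned_by"`
}

type vllmMLXResponse struct {
	Data []vllmMLXModel `json:"data"`
}

// parseModels deserialises the response body. Without "encoding/json" in
// the import block, this call fails to compile with "undefined: json".
func parseModels(body []byte) (*vllmMLXResponse, error) {
	var resp vllmMLXResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	return &resp, nil
}

func main() {
	body := []byte(`{"data":[{"id":"mlx-community/Llama-3-8B-4bit","owned_by":"vllm-mlx"}]}`)
	resp, err := parseModels(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Data[0].ID, resp.Data[0].OwnedBy)
}
```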
Proposed fix

```diff
 import (
+	"encoding/json"
 	"fmt"
 	"time"

 	"github.com/thushan/olla/internal/core/constants"
 	"github.com/thushan/olla/internal/core/domain"
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/adapter/registry/profile/vllm_mlx_parser.go` around lines 3 - 9, The
file calls json.Unmarshal but does not import the encoding/json package; add
"encoding/json" to the import block in
internal/adapter/registry/profile/vllm_mlx_parser.go so json.Unmarshal resolves
and the file compiles (update the import list where fmt, time, constants, domain
are imported).
This PR enables vllm-mlx natively.
Summary by CodeRabbit
Release Notes
New Features