Refactor api-platform integration #348
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds end-to-end LLM provider and deployment support, internal HTTPS/WebSocket server and routes, new models/repositories/services/controllers for APIs/gateways/LLM, many DB migrations, removes the API Platform client, and updates configuration and startup to seed LLM templates and run an internal server. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant InternalServer
participant GatewayInternalCtrl as GatewayInternalController
participant GatewayInternalSvc as GatewayInternalAPIService
participant DeploymentRepo as DeploymentRepository
participant ArtifactRepo as ArtifactRepository
Client->>InternalServer: GET /api/internal/v1/apis (API key)
InternalServer->>GatewayInternalCtrl: dispatch GetAPIsByOrganization
GatewayInternalCtrl->>GatewayInternalSvc: GetAPIsByOrganization(orgID)
GatewayInternalSvc->>ArtifactRepo: query APIs for org
ArtifactRepo-->>GatewayInternalSvc: list of APIs
GatewayInternalSvc->>DeploymentRepo: generate per-API deployment YAMLs
DeploymentRepo-->>GatewayInternalSvc: YAML content map
GatewayInternalSvc-->>GatewayInternalCtrl: map[string]string (api->yaml)
GatewayInternalCtrl-->>InternalServer: zip and send application/zip response
InternalServer-->>Client: 200 application/zip (Content-Disposition)
sequenceDiagram
participant GatewayClient
participant InternalServer
participant WSController as WebSocketController
participant Manager as WS.Manager
participant GatewaySvc as PlatformGatewayService
GatewayClient->>InternalServer: GET /api/internal/v1/ws/gateways/connect (API key)
InternalServer->>WSController: dispatch Connect
WSController->>GatewaySvc: VerifyToken(apiKey)
GatewaySvc-->>WSController: Gateway info (UUID, status)
WSController->>Manager: Register connection (gatewayID, conn)
Manager-->>WSController: ack registered
WSController-->>GatewayClient: send ConnectionAckDTO
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120+ minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
agent-manager-service/services/organization_sync.go (1)
70-88:⚠️ Potential issue | 🟡 Minor
totalSkippedis always zero — dead variable.
totalSkippedis initialized on line 71 but never incremented anywhere, yet it's included in the summary log on line 86. Either remove it or add logic to actually track skipped organizations.Proposed fix (remove if unused)
totalSynced := 0 - totalSkipped := 0 totalErrors := 0s.logger.Info("Organization sync completed", "totalSynced", totalSynced, - "totalSkipped", totalSkipped, "totalErrors", totalErrors, )agent-manager-service/controllers/gateway_controller.go (1)
352-388:⚠️ Potential issue | 🟠 Major
RemoveGatewayFromEnvironmentskips organization validation, creating a cross-organization authorization gap.Unlike all other mutating handlers in this controller, this method does not resolve or validate the organization. A caller with valid
gatewayIDandenvIDUUIDs can delete gateway-environment mappings across organization boundaries. The middleware only validates parameter presence; handlers must explicitly validate org context viaresolveOrgUUID. Compare withAssignGatewayToEnvironment(lines 321–350), which validates the org before performing the operation.
🤖 Fix all issues with AI agents
In `@agent-manager-service/controllers/gateway_controller.go`:
- Around line 83-90: resolveOrgUUID currently dereferences org.UUID without
checking if org is nil, causing a panic when GetOrganizationByHandle returns
(nil, nil); add the same nil-check used in llm_controller.go after calling
c.orgRepo.GetOrganizationByHandle: if org == nil { return "",
utils.ErrOrganizationNotFound }, then return org.UUID.String(), nil. This
ensures resolveOrgUUID handles a missing organization gracefully and uses
utils.ErrOrganizationNotFound as the error value.
In `@agent-manager-service/controllers/llm_controller.go`:
- Around line 513-516: Update the logging in UpdateLLMProvider to avoid nil
deref when req.Configuration is nil: compute safe strings for configuration
name/version (e.g., cfgName, cfgVersion) by checking if req.Configuration != nil
and using
ptrToStringLog(req.Configuration.Name)/ptrToStringLog(req.Configuration.Version)
only when non-nil, then pass those safe variables into log.Info; alternatively,
wrap the log.Info call in a conditional that uses an empty/default value when
req.Configuration is nil so UpdateLLMProvider never dereferences
req.Configuration.
- Around line 857-864: The code passes orgID into ConvertSpecToModelLLMProxy
causing proxy.ProjectUUID to be set to the organization UUID; resolve the
correct project UUID (as done earlier in CreateLLMProxy) and pass that projectID
into ConvertSpecToModelLLMProxy instead of orgID, ensuring proxy.ProjectUUID is
the resolved project UUID (or explicitly set proxy.ProjectUUID = projectID after
conversion) and remove/confine any leftover orgID usage to avoid confusion;
update references around proxyReq, proxy, ConvertSpecToModelLLMProxy, orgID,
projectID and CreateLLMProxy accordingly.
In `@agent-manager-service/controllers/llm_deployment_controller.go`:
- Around line 58-65: The resolveOrgUUID function currently dereferences org
without checking for nil and maps all errors to utils.ErrOrganizationNotFound;
change it so that after calling c.orgRepo.GetOrganizationByHandle(orgName) you
first check if err != nil and return that err (or wrap it) so infrastructure/DB
errors are propagated, then check if org == nil and return
utils.ErrOrganizationNotFound for the not-found case; update callers of
resolveOrgUUID to treat utils.ErrOrganizationNotFound as a 401/NotFound response
and treat any other returned error from resolveOrgUUID (propagated
DB/infrastructure errors) as a 500/Internal error.
In `@agent-manager-service/db_migrations/005_create_artifacts_and_apis.go`:
- Around line 29-40: The ap_projects table DDL in
005_create_artifacts_and_apis.go is missing the nullable deleted_at TIMESTAMP
column required by the Project model's gorm.DeletedAt; update the CREATE TABLE
for ap_projects to add a nullable deleted_at TIMESTAMP column so GORM
soft-delete queries (which append WHERE deleted_at IS NULL) will work with the
ap_projects table referenced by the Project model in models/project.go.
In `@agent-manager-service/models/project.go`:
- Around line 27-35: The Project model declares DeletedAt gorm.DeletedAt which
requires a deleted_at column, but the ap_projects migration (migration 005)
omits it; either add a nullable deleted_at TIMESTAMP (or equivalent) column to
the ap_projects DDL in that migration so GORM soft-delete filtering works, or
remove the DeletedAt field from the Project struct (and any references to
soft-delete behavior) if you don't intend to use soft deletes; update the
schema/migration and corresponding model (Project.DeletedAt) consistently so
queries no longer fail.
- Around line 27-40: Remove the unused APProject model file entirely and resolve
the schema mismatch for Project: either delete the DeletedAt field from the
Project struct (remove gorm.DeletedAt DeletedAt and its json/db tags) if soft
deletes are not required, or update the DB migration (migration 005) to add a
deleted_at timestamp column to ap_projects and keep the DeletedAt field; ensure
Project.TableName() remains "ap_projects" and run migrations/tests to verify
GORM behavior.
In `@agent-manager-service/repositories/llm_provider_repository.go`:
- Around line 155-165: The query in LLMProviderRepo.Update (and similarly in
Delete) uses tx.Table("artifacts").Select("uuid").Where(...).Scan(&providerUUID)
which never returns gorm.ErrRecordNotFound and leaves providerUUID==uuid.Nil on
no match; replace Scan with a method that reports no rows (e.g.
.Take(&providerUUID) or .First(&providerUUID)) or explicitly check
result.RowsAffected and return gorm.ErrRecordNotFound when zero, so the code
correctly logs and aborts before using providerUUID in subsequent Where("uuid =
?", providerUUID).
In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Around line 92-135: GetByID is querying the wrong column: change the WHERE in
LLMProviderTemplateRepo.GetByID to use the handle column (e.g., "handle = ? AND
organization_uuid = ?") so templateID maps to template.Handle like Delete/Exists
expect; additionally, update LLMProviderTemplateRepo.GetByUUID to mirror the
configuration unmarshaling in GetByID: read template.Configuration,
json.Unmarshal into llmProviderTemplateConfig, and populate template.Metadata,
PromptTokens, CompletionTokens, TotalTokens, RemainingTokens, RequestModel and
ResponseModel before returning.
In `@agent-manager-service/repositories/llm_proxy_repository.go`:
- Around line 219-241: In LLMProxyRepo.Delete fix the invalid SQL alias in the
artifact lookup: update the query in Delete (the tx.Table("artifacts")
Select/Where that populates proxyUUID) to reference the column as "kind" (not
"a.kind") and otherwise match the same predicate used in the Update flow; keep
the Select("uuid") and the Where parameters (proxyID, orgUUID,
models.KindLLMAPI) intact so the tx.Scan(&proxyUUID) works correctly.
- Around line 176-186: The current queries use Scan into proxyUUID which
silently yields uuid.Nil on no match; change both the Update and Delete flows to
detect missing records by either using First/Take into a struct (e.g., an
Artifact struct with UUID field) and checking for gorm.ErrRecordNotFound, or by
using Scan/Row and checking result.RowsAffected == 0 after the query;
specifically replace the
tx.Table("artifacts").Select("uuid").Where(...).Scan(&proxyUUID) pattern with a
call like tx.Table("artifacts").Select("uuid").Where(...).Take(&artifact).Error
(or use result := tx...; if result.RowsAffected == 0 return
gorm.ErrRecordNotFound) so you don't proceed using uuid.Nil in Update/Delete
methods (refer to the proxyUUID variable and those artifact lookup queries in
the Update and Delete methods).
In `@agent-manager-service/services/gateway_internal_service.go`:
- Around line 282-290: The code constructs a models.Deployment using
uuid.MustParse on apiUUID, gatewayID, and orgID which will panic on malformed
input; replace those calls with uuid.Parse, check each parse error, and return
or propagate a proper error (instead of panicking) before building the
Deployment (referencing apiUUID, gatewayID, orgID, and the deployment variable);
ensure any handler/method that calls this (where deployment is created) returns
a clear error response or logs the parsing failure so invalid request parameters
don’t crash the service.
In `@agent-manager-service/services/platform_gateway_service.go`:
- Around line 128-241: RegisterGateway currently generates plainToken and stores
only its hash (GatewayToken) but never returns the plain token to the caller;
modify the response to include the plain token by adding a Token field to the
GatewayResponse (or create a new RegisterGatewayResponse) and populate it with
plainToken before returning; update the RegisterGateway signature return type or
mapping so the returned struct includes Token, ensure you still store only
TokenHash/Salt in models.GatewayToken and do not persist the plainToken anywhere
else, and update any callers/tests that expect the previous GatewayResponse
shape accordingly.
🟠 Major comments (26)
agent-manager-service/db_migrations/003_create_organizations.go-24-25 (1)
24-25:⚠️ Potential issue | 🟠 MajorMigration ID renumbering requires deployment verification.
The migration framework uses gormigrate which tracks applied migrations by ID in the
migration_historytable. If this migration was previously deployed under ID4to any environment, renumbering it to ID3creates a mismatch: the framework will find0004recorded as applied but will attempt to execute0003(not found in history), causing theCREATE TABLE organizationsstatement to fail on an already-existing table.Confirm whether this migration was previously released with ID
4. If yes, document the corrective migration path for existing deployments (e.g., a script to update the migration history table before upgrading).agent-manager-service/models/ap_project.go-26-38 (1)
26-38:⚠️ Potential issue | 🟠 MajorConsolidate or clarify the dual model mapping to
ap_projectstable.Two distinct models map to the same
ap_projectstable with inconsistent implementations:
- APProject uses
uuid.UUIDfor the ID field, lacksDeletedAtsupport, and has noBeforeCreatehook- Project uses
stringfor the ID field, includesDeletedAt(soft delete support), and auto-generates UUIDs on creationThis creates confusion and potential data consistency issues. If writes use
APProject, soft deletes won't work; if they useProject, the UUID type conversion betweenuuid.UUIDandstringmay introduce bugs. Either consolidate to a single model or explicitly document which model is the primary interface and deprecate the other.agent-manager-service/resources/default-llm-provider-templates/awsbedrock-template.yaml-37-42 (1)
37-42:⚠️ Potential issue | 🟠 MajorUpdate regex to include forward slashes for ARN-formatted model identifiers.
The regex
model/([A-Za-z0-9.:-]+)/cannot capture AWS Bedrock model identifiers in ARN format. According to AWS documentation, validmodelIdvalues in the InvokeModel API include:
- Base model ARNs:
arn:aws:bedrock:{region}::foundation-model/{model-name}- Inference profile ARNs:
arn:aws:bedrock:{region}:{account}:inference-profile/{profile-id}Both ARN formats contain forward slashes (
/) that the current character class[A-Za-z0-9.:-]does not match. Update the regex tomodel/([A-Za-z0-9.:\/-]+)/to support all valid Bedrock model identifier patterns, including ARNs.agent-manager-service/db_migrations/004_create_gateways.go-42-46 (1)
42-46:⚠️ Potential issue | 🟠 MajorSoft-delete
deleted_atconflicts with the unique constraint.The unique constraint
uq_gateway_org_nameon(organization_uuid, name)doesn't account for soft-deleted rows. If a gateway is soft-deleted (i.e.,deleted_at IS NOT NULL) and a new gateway with the same org+name is created, the insert will violate the unique constraint. A common fix is a partial unique index:Proposed fix
- CONSTRAINT uq_gateway_org_name UNIQUE(organization_uuid, name),And after the table creation, add:
CREATE UNIQUE INDEX uq_gateway_org_name ON gateways(organization_uuid, name) WHERE deleted_at IS NULL;agent-manager-service/main.go-145-173 (1)
145-173:⚠️ Potential issue | 🟠 MajorWaitGroup usage is incorrect — both
Done()calls are in a single goroutine.
wg.Add(2)suggests two independent shutdown tasks, but bothwg.Done()calls happen sequentially in the same goroutine. This means the shutdown is serial (mainServer must finish before internalServer starts shutting down), and both share the same 60s context — the internal server may get little to no remaining budget.Additionally, if
mainServer.Shutdownpanics or the goroutine exits early,wg.Wait()will deadlock.Consider either:
- Using
wg.Add(1)since it's a single goroutine, or- Shutting down the servers in parallel goroutines if independent shutdown is intended.
Option 2: Parallel shutdown
go func() { <-stopCh slog.Info("Shutting down servers gracefully") shutdownCtx, cancel := context.WithTimeout(context.Background(), 60*time.Second) defer cancel() // Shutdown WebSocket manager first if dependencies.WebSocketManager != nil { slog.Info("Shutting down WebSocket manager") dependencies.WebSocketManager.Shutdown() } - // Shutdown main server - if err := mainServer.Shutdown(shutdownCtx); err != nil { - slog.Error("Main server forced shutdown after timeout", "error", err) - } - wg.Done() - - // Shutdown internal server - if err := internalServer.Shutdown(shutdownCtx); err != nil { - slog.Error("Internal server forced shutdown after timeout", "error", err) - } - wg.Done() + // Shutdown both servers in parallel + go func() { + defer wg.Done() + if err := mainServer.Shutdown(shutdownCtx); err != nil { + slog.Error("Main server forced shutdown after timeout", "error", err) + } + }() + go func() { + defer wg.Done() + if err := internalServer.Shutdown(shutdownCtx); err != nil { + slog.Error("Internal server forced shutdown after timeout", "error", err) + } + }() }()agent-manager-service/services/llm_template_seeder.go-87-87 (1)
87-87:⚠️ Potential issue | 🟠 MajorDebug
fmt.Printlnstatements left in production code.Lines 87 and 133 contain
fmt.Printlncalls that appear to be leftover debugging statements. These should be removed or replaced with structured logging (slog.Debug/slog.Info).🐛 Proposed fix — remove debug prints
for _, tpl := range s.templates { - fmt.Println(tpl) if tpl == nil || tpl.Handle == "" {- fmt.Println("to create", toCreate) if err := s.templateRepo.Create(toCreate); err != nil {Also applies to: 133-133
agent-manager-service/resources/default-llm-provider-templates/anthropic-template.yaml-30-31 (1)
30-31:⚠️ Potential issue | 🟠 MajorExternal URLs reference unofficial/personal GitHub repository across all LLM provider templates.
The
logoUrlandopenapiSpecUrlin the Anthropic template (and also in OpenAI, Mistral, Azure OpenAI, Azure AI Foundry, and AWS Bedrock templates) all point tohttps://raw.githubusercontent.com/nomadxd/openapi-connectors/.... This appears to be a personal or unofficial fork rather than an official source. If this repository is deleted, renamed, made private, or its owner loses access, all six provider templates will break at runtime. Consider migrating these assets to an official repository or bundling them as resources within the application.agent-manager-service/controllers/gateway_internal_controller.go-56-77 (1)
56-77: 🛠️ Refactor suggestion | 🟠 MajorDuplicated API-key authentication boilerplate across all 4 handlers — extract into middleware.
Every handler repeats the same ~15-line block: extract
api-keyheader, check empty, callVerifyToken, handle error. This is a textbook case for HTTP middleware that authenticates and injects the gateway into the request context.Sketch of middleware approach
func (c *gatewayInternalController) requireAPIKey(next http.HandlerFunc) http.HandlerFunc { return func(w http.ResponseWriter, r *http.Request) { apiKey := r.Header.Get("api-key") if apiKey == "" { http.Error(w, "API key is required", http.StatusUnauthorized) return } gateway, err := c.gatewayService.VerifyToken(apiKey) if err != nil { http.Error(w, "Invalid or expired API key", http.StatusUnauthorized) return } ctx := context.WithValue(r.Context(), gatewayContextKey, gateway) next(w, r.WithContext(ctx)) } }Then wrap handlers during route registration instead of duplicating in each handler.
Also applies to: 110-131, 175-196, 267-288
agent-manager-service/repositories/llm_provider_template_repository.go-87-87 (1)
87-87:⚠️ Potential issue | 🟠 MajorRemove or downgrade the configuration log statement — potential sensitive data leak.
slog.Info(t.Configuration)logs the entire raw JSON configuration blob at Info level. This may contain sensitive fields such as authentication details (authinLLMProviderTemplateMetadata) or endpoint URLs. At minimum, downgrade toslog.Debugor remove it entirely.Proposed fix
- slog.Info(t.Configuration) + slog.Debug("LLMProviderTemplateRepo.Create: configuration stored", "handle", t.Handle)agent-manager-service/repositories/llm_provider_template_repository.go-138-146 (1)
138-146:⚠️ Potential issue | 🟠 Major
Listdoes not unmarshalConfigurationJSON for returned templates.Callers of
Listwill receive templates with empty parsed fields (Metadata,PromptTokens, etc.) because theConfigurationJSON string is not deserialized. If these fields are needed by API consumers, you should iterate and unmarshal, or extract the unmarshaling logic into a shared helper.Proposed fix — extract helper and apply to List
Extract a helper method:
func populateConfigFields(t *models.LLMProviderTemplate) error { if t.Configuration == "" { return nil } var cfg llmProviderTemplateConfig if err := json.Unmarshal([]byte(t.Configuration), &cfg); err != nil { return err } t.Metadata = cfg.Metadata t.PromptTokens = cfg.PromptTokens t.CompletionTokens = cfg.CompletionTokens t.TotalTokens = cfg.TotalTokens t.RemainingTokens = cfg.RemainingTokens t.RequestModel = cfg.RequestModel t.ResponseModel = cfg.ResponseModel return nil }Then use it in
List:func (r *LLMProviderTemplateRepo) List(orgUUID string, limit, offset int) ([]*models.LLMProviderTemplate, error) { var templates []*models.LLMProviderTemplate err := r.db.Where("organization_uuid = ?", orgUUID). Order("created_at DESC"). Limit(limit). Offset(offset). Find(&templates).Error + if err != nil { + return nil, err + } + for _, t := range templates { + if err := populateConfigFields(t); err != nil { + return nil, err + } + } return templates, err }agent-manager-service/repositories/llm_provider_template_repository.go-158-176 (1)
158-176:⚠️ Potential issue | 🟠 MajorAdd configuration marshaling to
Updatemethod to matchCreatebehavior.The
Updatemethod does not marshal the parsed config fields (Metadata,PromptTokens, etc.) intoConfigurationbefore persisting, whileCreatedoes (lines 72-85). This causes parsed field updates in the seeder (e.g., line 110 ofllm_template_seeder.go) to be silently lost becausetemplateRepo.Updatebypasses serialization.The service layer masks this issue by calling
serializeConfigurationbefore the repo update, but direct repository calls like in the seeder are vulnerable. Adding marshaling toUpdatemakes the repository self-contained and prevents the bug.Proposed fix
func (r *LLMProviderTemplateRepo) Update(t *models.LLMProviderTemplate) error { t.UpdatedAt = time.Now() + + // Marshal parsed config fields into Configuration JSON + configJSON, err := json.Marshal(&llmProviderTemplateConfig{ + Metadata: t.Metadata, + PromptTokens: t.PromptTokens, + CompletionTokens: t.CompletionTokens, + TotalTokens: t.TotalTokens, + RemainingTokens: t.RemainingTokens, + RequestModel: t.RequestModel, + ResponseModel: t.ResponseModel, + }) + if err != nil { + return err + } + t.Configuration = string(configJSON) + result := r.db.Model(&models.LLMProviderTemplate{}).agent-manager-service/services/gateway_internal_service.go-233-246 (1)
233-246:⚠️ Potential issue | 🟠 MajorHardcoded
CreatedBy: "admin"— should use the actual authenticated identity.Line 241 hardcodes
"admin"as the creator. This loses audit trail integrity. The authenticated user or gateway identity should be propagated from the request context.agent-manager-service/services/gateway_events_service.go-71-355 (1)
71-355: 🛠️ Refactor suggestion | 🟠 MajorAll four broadcast methods are nearly identical — extract a common helper.
BroadcastDeploymentEvent,BroadcastUndeploymentEvent,BroadcastLLMProviderDeploymentEvent, andBroadcastLLMProviderUndeploymentEventeach repeat the same ~70-line pattern: serialize → size-check → wrap in DTO → serialize DTO → get connections → broadcast → tally results. The only differences are the event type string and the payload type. A single generic broadcast method would eliminate ~200 lines of duplication.Sketch of a unified helper
func (s *GatewayEventsService) broadcastEvent(gatewayID, eventType string, payload interface{}) error { correlationID := uuid.New().String() payloadJSON, err := json.Marshal(payload) if err != nil { return fmt.Errorf("failed to serialize %s event: %w", eventType, err) } if len(payloadJSON) > MaxEventPayloadSize { return fmt.Errorf("event payload exceeds maximum size: %d bytes (limit: %d)", len(payloadJSON), MaxEventPayloadSize) } eventDTO := GatewayEventDTO{ Type: eventType, Payload: payload, Timestamp: time.Now().Format(time.RFC3339), CorrelationID: correlationID, } eventJSON, err := json.Marshal(eventDTO) if err != nil { return fmt.Errorf("failed to marshal event: %w", err) } if s.manager == nil { return nil } connections := s.manager.GetConnections(gatewayID) if len(connections) == 0 { return fmt.Errorf("no active connections for gateway: %s", gatewayID) } var successCount, failureCount int var lastError error for _, conn := range connections { if err := conn.Send(eventJSON); err != nil { failureCount++ lastError = err conn.DeliveryStats.IncrementFailed(fmt.Sprintf("send error: %v", err)) } else { successCount++ conn.DeliveryStats.IncrementTotalSent() } } if successCount == 0 { return fmt.Errorf("failed to deliver %s event to any connection: %w", eventType, lastError) } return nil } // Then each method becomes a one-liner: func (s *GatewayEventsService) BroadcastDeploymentEvent(gatewayID string, deployment *DeploymentEvent) error { return s.broadcastEvent(gatewayID, "api.deployed", deployment) }agent-manager-service/controllers/websocket_controller.go-215-264 (1)
215-264:⚠️ Potential issue | 🟠 Major
rateLimitMapgrows unboundedly — memory leak for long-running servers.Old entries are pruned only when the same IP makes a new connection attempt (lines 233-238). IPs that connect once and never return will have their entries retained forever. Over time, this map will grow without bound.
Consider adding a periodic cleanup goroutine, or switching to a time-bucketed approach (e.g.,
golang.org/x/time/rateor a TTL cache).agent-manager-service/controllers/websocket_controller.go-194-199 (1)
194-199: 🛠️ Refactor suggestion | 🟠 Major
readLoopbreaks theTransportabstraction with a concrete type assertion.The
Transportinterface (fromwebsocket/transport.go) abstracts the underlying connection. Asserting to*ws.WebSocketTransport(line 195) couples the controller to the concrete implementation, defeating the purpose of the interface. Consider adding aReadMessagemethod (or equivalent) to theTransportinterface if read functionality is needed here.agent-manager-service/services/gateway_internal_service.go-262-270 (1)
262-270:⚠️ Potential issue | 🟠 MajorBlocking redeployment when status is
UNDEPLOYEDseems incorrect.Line 268 rejects deployment if status is either
DEPLOYEDorUNDEPLOYED. Preventing redeployment when the API was previously undeployed from a gateway likely isn't intended — the user should be able to redeploy after an undeploy. Consider only blocking when status isDEPLOYED.Suggested fix
- if existingDeploymentID != "" && (status == models.DeploymentStatusDeployed || status == models.DeploymentStatusUndeployed) { + if existingDeploymentID != "" && status == models.DeploymentStatusDeployed { return nil, fmt.Errorf("API already deployed to this gateway") }agent-manager-service/services/gateway_internal_service.go-305-329 (1)
305-329:⚠️ Potential issue | 🟠 Major
generateAPIDeploymentYAMLignores the API's actual context and operations.Line 320 hardcodes
Context: "/api/v1"instead of usingapi.Configuration.Context. TheOperationsfield in the YAML spec is also left empty, discarding whatever operations the API defines. This generates an incomplete/inaccurate deployment YAML.Suggested improvement
deployment := APIDeploymentYAML{ ApiVersion: "gateway.api-platform.wso2.com/v1alpha1", Kind: api.Kind, Metadata: DeploymentMetadata{ Name: api.Handle, }, Spec: APIDeploymentSpec{ Name: api.Name, Version: api.Version, - Context: "/api/v1", // Default context + Context: getContextOrDefault(api), }, }Where
getContextOrDefaultreturns the API's configured context or a sensible default if unset.agent-manager-service/controllers/websocket_controller.go-155-171 (1)
155-171:⚠️ Potential issue | 🟠 MajorSetting gateway active status to
falseon any connection close may be incorrect with multiple connections.The
Managersupports multiple connections per gateway (seeconnections sync.Mapcomment: "Supports multiple connections per gateway ID for clustering scenarios"). When one connection closes (line 166-169), the gateway is marked inactive even if other connections remain active. Check remaining connections before toggling status.Suggested approach
// Connection closed - cleanup log.Info("WebSocket connection closed", "gatewayID", gateway.UUID.String(), "connectionID", connection.ConnectionID) c.manager.Unregister(gateway.UUID.String(), connection.ConnectionID) // Update gateway active status to false when connection is disconnected - if err := c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(), false); err != nil { - log.Error("Failed to update gateway active status to false", "gatewayID", gateway.UUID.String(), "error", err) + // Only mark inactive if no other connections remain for this gateway + remaining := c.manager.GetConnections(gateway.UUID.String()) + if len(remaining) == 0 { + if err := c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(), false); err != nil { + log.Error("Failed to update gateway active status to false", "gatewayID", gateway.UUID.String(), "error", err) + } }agent-manager-service/controllers/websocket_controller.go-48-54 (1)
48-54:⚠️ Potential issue | 🟠 MajorRemove duplicate
ConnectionAckDTOdefinition fromwebsocket_controller.go.Two
ConnectionAckDTOstructs are defined:
agent-manager-service/models/gateway_internal_dto.go(lines 78-82): fieldsType,Message,GatewayIDagent-manager-service/controllers/websocket_controller.go(lines 49-53): fieldsType,GatewayID,ConnectionID,TimestampHaving duplicate struct definitions with the same name but different fields creates confusion and maintenance burden. Consolidate into a single DTO in the models package.
agent-manager-service/repositories/deployment_repository.go-167-190 (1)
167-190:⚠️ Potential issue | 🟠 Major
Scan()silently returns a zero-valued struct on miss — same issue as inapi_repository.go.
GetCurrentByGateway,GetWithState(line 273), andGetStatus(line 242) all use.Scan()which never raisesgorm.ErrRecordNotFound. The not-found guards are dead code, and callers receive a non-nil zero-valued struct (or zero-valued result) rather than the documented nil/empty return.For
GetCurrentByGateway, useRowsAffectedto detect the empty-result case (similar to the fix suggested forapi_repository.go). Apply the same pattern toGetWithStateandGetStatus.🐛 Proposed fix for GetCurrentByGateway
- err := r.db.Table("deployments d"). + result := r.db.Table("deployments d"). Select("d.deployment_id, d.name, d.artifact_uuid, d.organization_uuid, d.gateway_uuid, "+ "d.base_deployment_id, d.content, d.metadata, d.created_at, "+ "s.status, s.updated_at AS status_updated_at"). Joins("INNER JOIN deployment_status s ON d.deployment_id = s.deployment_id "+ "AND d.artifact_uuid = s.artifact_uuid "+ "AND d.organization_uuid = s.organization_uuid "+ "AND d.gateway_uuid = s.gateway_uuid"). Where("d.artifact_uuid = ? AND d.gateway_uuid = ? AND d.organization_uuid = ? AND s.status = ?", artifactUUID, gatewayID, orgUUID, string(models.DeploymentStatusDeployed)). Order("d.created_at DESC"). Limit(1). - Scan(&deployment).Error - if err != nil { - if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, nil - } - return nil, err + Scan(&deployment) + if result.Error != nil { + return nil, result.Error + } + if result.RowsAffected == 0 { + return nil, nil } return &deployment, nilagent-manager-service/repositories/api_repository.go-122-139 (1)
122-139:⚠️ Potential issue | 🟠 Major
Scan()never returnsgorm.ErrRecordNotFound— caller receives a zero-valued struct instead ofnilon miss.GORM's
.Scan()populates the target and returnsnilerror even when zero rows match; only.First()/.Take()/.Last()raiseErrRecordNotFound. The guard on line 133 is dead code, so when no API exists the function returns&api(all zero values) rather thannil, which will silently mislead callers.The same issue applies to
GetAPIMetadataByHandle(line 142).🐛 Proposed fix — use `RowsAffected`
func (r *APIRepo) GetAPIByUUID(apiUUID, orgUUID string) (*models.API, error) { var api models.API - err := r.db.Table("rest_apis a"). + result := r.db.Table("rest_apis a"). Select("art.uuid, art.handle, art.name, art.kind, a.description, art.version, a.created_by, "+ "a.project_uuid, art.organization_uuid, a.lifecycle_status, "+ "a.transport, a.configuration, art.created_at, art.updated_at"). Joins("INNER JOIN artifacts art ON a.uuid = art.uuid"). - Where("a.uuid = ? AND art.organization_uuid = ?", apiUUID, orgUUID). - Scan(&api).Error - if err != nil { - if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, nil - } - return nil, err + Where("a.uuid = ? AND art.organization_uuid = ?", apiUUID, orgUUID). + Scan(&api) + if result.Error != nil { + return nil, result.Error + } + if result.RowsAffected == 0 { + return nil, nil } return &api, nil }Apply the same fix to
GetAPIMetadataByHandle.agent-manager-service/services/platform_gateway_service.go-414-441 (1)
414-441:⚠️ Potential issue | 🟠 Major
VerifyTokenfetches ALL gateways and their tokens on every call — O(n) DB round-trips.This performs one
List()call plus oneGetActiveTokensByGatewayUUIDcall per gateway. For every token verification (likely on every incoming request from a gateway), this scans the entire gateway table. This will not scale as the number of gateways grows.Consider a single query that joins
gateway_tokens(wherestatus='active') withgatewaysand loads all active token hashes at once, or use a token-prefix lookup index to narrow the search.agent-manager-service/services/platform_gateway_service.go-215-222 (1)
215-222:⚠️ Potential issue | 🟠 MajorGateway and token creation are not transactional.
If
CreateToken(Line 220) fails afterCreate(Line 216) succeeds, the gateway exists without any authentication token. Consider wrapping both operations in a DB transaction.agent-manager-service/services/llm_deployment_service.go-329-336 (1)
329-336:⚠️ Potential issue | 🟠 MajorRestore silently overwrites a different active deployment without undeploying it.
When
currentDeploymentID != deploymentIDbutstatus == DeploymentStatusDeployed, the code proceeds toSetCurrentwhich implicitly replaces the currently-deployed revision. The previous deployment transitions to an undefined state — no undeployment event is broadcast for it, and its status record is overwritten.If this is intentional (atomic swap), consider broadcasting an undeployment event for the previously active deployment so the gateway reflects the change. If unintentional, add a guard or explicit undeploy step.
agent-manager-service/services/llm_deployment_service.go-179-183 (1)
179-183:⚠️ Potential issue | 🟠 MajorError from
GetWithContentis silently replaced withErrBaseDeploymentNotFound.If the repository call fails for a transient reason (e.g., DB connection error), the actual error is discarded and the caller receives a misleading "base deployment not found" error. Distinguish not-found from infrastructure errors.
🐛 Proposed fix
baseDeployment, err := s.deploymentRepo.GetWithContent(req.Base, provider.UUID.String(), orgID) if err != nil { - slog.Warn("LLMProviderDeploymentService.DeployLLMProvider: base deployment not found", "providerID", providerID, "baseDeploymentID", req.Base, "error", err) - return nil, utils.ErrBaseDeploymentNotFound + slog.Error("LLMProviderDeploymentService.DeployLLMProvider: failed to get base deployment", "providerID", providerID, "baseDeploymentID", req.Base, "error", err) + return nil, fmt.Errorf("failed to get base deployment: %w", err) + } + if baseDeployment == nil { + slog.Warn("LLMProviderDeploymentService.DeployLLMProvider: base deployment not found", "providerID", providerID, "baseDeploymentID", req.Base) + return nil, utils.ErrBaseDeploymentNotFound }agent-manager-service/services/platform_gateway_service.go-624-687 (1)
624-687:⚠️ Potential issue | 🟠 Major
validateGatewayInputtrims local copies but the caller uses the original untrimmed values.Lines 635, 658, 667, 673 call
strings.TrimSpaceon local parameter copies (name,displayName,vhost,functionalityType). The validation passes, but the caller inRegisterGatewaystill uses the original (potentially whitespace-padded) values when constructing theGatewaymodel. For example, a name like" my-gw "passes validation here (after local trim) but is stored with leading/trailing spaces.Also,
regexp.MustCompileon Line 647 recompiles the regex on every call. Move it to a package-levelvar.🐛 Proposed fix: compile regex at package level and trim at the call site
+var namePattern = regexp.MustCompile(`^[a-z0-9-]+$`) + func (s *PlatformGatewayService) validateGatewayInput(orgID, name, displayName, vhost, functionalityType string) error {For the trimming issue, either:
- Accept pointer parameters and write back trimmed values, or
- Trim inputs in
RegisterGatewaybefore callingvalidateGatewayInputand passing them to the model constructor.
🟡 Minor comments (18)
agent-manager-service/resources/default-llm-provider-templates/awsbedrock-template.yaml-26-27 (1)
26-27:⚠️ Potential issue | 🟡 MinorExternal resource URLs reference a personal GitHub repository.
Both
logoUrlandopenapiSpecUrlpoint tohttps://raw.githubusercontent.com/nomadxd/openapi-connectors/.... Personal repos can be deleted, renamed, or made private at any time, which would break these references. Consider hosting these assets in an organization-owned repository or a stable CDN.agent-manager-service/models/artifact.go-26-35 (1)
26-35:⚠️ Potential issue | 🟡 MinorInconsistent JSON tag casing compared to other new models.
Artifactuses snake_case JSON tags ("organization_uuid","created_at","updated_at"), whileAPProject,AssociationMapping, andGatewayCredentialsin this same PR use camelCase ("organizationId","createdAt"). If these models are serialized in the same API surface, this inconsistency will be visible to consumers.Suggested alignment to camelCase
type Artifact struct { - UUID uuid.UUID `gorm:"column:uuid;primaryKey" json:"uuid"` - Handle string `gorm:"column:handle" json:"handle"` - Name string `gorm:"column:name" json:"name"` - Version string `gorm:"column:version" json:"version"` - Kind string `gorm:"column:kind" json:"kind"` - OrganizationUUID uuid.UUID `gorm:"column:organization_uuid" json:"organization_uuid"` - CreatedAt time.Time `gorm:"column:created_at" json:"created_at"` - UpdatedAt time.Time `gorm:"column:updated_at" json:"updated_at"` + UUID uuid.UUID `gorm:"column:uuid;primaryKey" json:"uuid"` + Handle string `gorm:"column:handle" json:"handle"` + Name string `gorm:"column:name" json:"name"` + Version string `gorm:"column:version" json:"version"` + Kind string `gorm:"column:kind" json:"kind"` + OrganizationUUID uuid.UUID `gorm:"column:organization_uuid" json:"organizationId"` + CreatedAt time.Time `gorm:"column:created_at" json:"createdAt"` + UpdatedAt time.Time `gorm:"column:updated_at" json:"updatedAt"` }agent-manager-service/models/gateway_internal_dto.go-46-46 (1)
46-46:⚠️ Potential issue | 🟡 MinorBug: YAML tag
upstream(singular) doesn't match JSON tagupstreams(plural).
Upstreamshasjson:"upstreams"butyaml:"upstream". If this DTO is ever parsed or emitted as YAML, the field key won't match, silently dropping data. This looks like a typo.🐛 Proposed fix
- Upstreams []UpstreamSimple `json:"upstreams" yaml:"upstream" binding:"required"` + Upstreams []UpstreamSimple `json:"upstreams" yaml:"upstreams" binding:"required"`agent-manager-service/resources/default-llm-provider-templates/openai-template.yaml-31-32 (1)
31-32:⚠️ Potential issue | 🟡 Minor
logoUrlandopenapiSpecUrlreference a personal GitHub repository — fragile for production use.Both URLs point to
raw.githubusercontent.com/nomadxd/openapi-connectors/.... If that personal repo is deleted, renamed, or set to private, these resources will break silently. Consider hosting these assets in an organization-controlled location or bundling them within this repository.agent-manager-service/db_migrations/005_create_artifacts_and_apis.go-29-57 (1)
29-57:⚠️ Potential issue | 🟡 MinorInconsistent use of
CREATE TABLE IF NOT EXISTSvsCREATE TABLE.
ap_projects(Line 29) usesCREATE TABLE IF NOT EXISTSwhile all other tables (artifacts,rest_apis,deployments, etc.) use plainCREATE TABLE. If there's a reasonap_projectsmight already exist (e.g., created by another path), the reasoning should be documented. Otherwise, make all statements consistent — either all useIF NOT EXISTSor none do.agent-manager-service/db_migrations/004_create_gateways.go-72-72 (1)
72-72:⚠️ Potential issue | 🟡 MinorPartial index on
deleted_at IS NOT NULLis likely inverted.The index
idx_gateways_deletedcovers rows wheredeleted_at IS NOT NULL(i.e., deleted records). The more common query pattern is filtering for active records (WHERE deleted_at IS NULL). Consider whether this should be inverted, or if both patterns are needed.agent-manager-service/main.go-182-182 (1)
182-182:⚠️ Potential issue | 🟡 MinorUse
errors.Isinstead of!=for error comparison.Per the linter (
errorlint): wrapped errors won't match with!=. The same applies to line 190.Proposed fix
Add
"errors"to imports, then:- if err := internalServer.Start(); err != nil && err != http.ErrServerClosed { + if err := internalServer.Start(); err != nil && !errors.Is(err, http.ErrServerClosed) {- if err := mainServer.ListenAndServe(); err != nil && err != http.ErrServerClosed { + if err := mainServer.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {agent-manager-service/main.go-249-252 (1)
249-252:⚠️ Potential issue | 🟡 Minor
org.UUID.String() == ""is always false — UUID zero value stringifies to"00000000-...", not"".Use
org.UUID == uuid.Nilto check for a zero-value UUID.Proposed fix
- if org == nil || org.UUID.String() == "" { + if org == nil || org.UUID == uuid.Nil {This requires adding
"github.com/google/uuid"to the imports.agent-manager-service/server/internal_server.go-64-64 (1)
64-64:⚠️ Potential issue | 🟡 MinorMalformed
slog.Infocall — positional args instead of key-value pairs.
slog.Infoexpects alternating key-value pairs for structured logging. PassingcertPathandkeyPathas bare positional arguments will produce garbled output (the first string becomes a key with the second as its value, or similar unexpected behavior).🐛 Proposed fix
- slog.Info("initializing with certs", certPath, keyPath) + slog.Info("initializing with certs", "certPath", certPath, "keyPath", keyPath)agent-manager-service/models/gateway_compat.go-38-42 (1)
38-42:⚠️ Potential issue | 🟡 Minor
GatewayEnvironmentMappingshould define aTableName()method for consistency and explicitness with GORM operations.Every model in this codebase (Gateway, Environment, LLMProviderTemplate, etc.) defines a
TableName()method.GatewayEnvironmentMappingis actively used with GORM for queries, inserts, and deletes, yet lacks this method. Without it, GORM relies on inference, which is fragile and inconsistent with the established pattern.agent-manager-service/controllers/gateway_internal_controller.go-106-106 (1)
106-106:⚠️ Potential issue | 🟡 MinorUnchecked return value of
w.Write— flagged by linter.Static analysis flags
w.Write(zipData)on lines 106, 171, and 328 for unchecked error return. While typically benign, it's best practice to handle or explicitly discard.Proposed fix (apply to all three locations)
- w.Write(zipData) + if _, err := w.Write(zipData); err != nil { + log.Error("Failed to write response", "error", err) + }Also applies to: 171-171, 328-328
agent-manager-service/controllers/gateway_internal_controller.go-214-218 (1)
214-218:⚠️ Potential issue | 🟡 MinorRaw JSON decode error exposed to client — potential information leakage.
"Invalid request body: "+err.Error()(line 216) passes internal parse error details to the caller. This could expose structure expectations (field names, types) to an attacker.Proposed fix
- http.Error(w, "Invalid request body: "+err.Error(), http.StatusBadRequest) + http.Error(w, "Invalid request body", http.StatusBadRequest)agent-manager-service/controllers/websocket_controller.go-252-264 (1)
252-264:⚠️ Potential issue | 🟡 Minor
X-Forwarded-Formay contain multiple comma-separated IPs.
r.Header.Get("X-Forwarded-For")returns the full header value (e.g.,"client, proxy1, proxy2"). Using it as-is for rate-limiting keys means different header values map to different entries, allowing trivial bypasses. Parse and use only the first (leftmost) IP.Suggested fix
func getClientIP(r *http.Request) string { // Check X-Forwarded-For header first if xff := r.Header.Get("X-Forwarded-For"); xff != "" { - return xff + // Use the first IP in the comma-separated list + if idx := strings.Index(xff, ","); idx != -1 { + return strings.TrimSpace(xff[:idx]) + } + return strings.TrimSpace(xff) }Note: You'll need to add
"strings"to the imports.agent-manager-service/controllers/websocket_controller.go-122-133 (1)
122-133:⚠️ Potential issue | 🟡 MinorUnchecked error returns and misleading variable name on error path.
Line 129:
jsonErris actually the[]byteresult ofjson.Marshal, not an error — the naming is confusing. Lines 130 and 132:conn.WriteMessageandconn.Closeerror returns are unchecked (flagged by linter). While these are best-effort on a failing connection, the linter errors should be addressed.Proposed fix
- if jsonErr, _ := json.Marshal(errorMsg); jsonErr != nil { - conn.WriteMessage(websocket.TextMessage, jsonErr) - } - conn.Close() + if msgBytes, err := json.Marshal(errorMsg); err == nil { + _ = conn.WriteMessage(websocket.TextMessage, msgBytes) + } + _ = conn.Close()agent-manager-service/repositories/api_repository.go-1-2 (1)
1-2:⚠️ Potential issue | 🟡 MinorCopyright year mismatch flagged by linter.
The header says 2025; other new files in this PR use 2026. Update to match.
agent-manager-service/controllers/llm_controller.go-114-119 (1)
114-119:⚠️ Potential issue | 🟡 MinorHTTP 401 is semantically incorrect for "organization not found".
401 Unauthorizedsignals missing/invalid authentication credentials. When the organization handle doesn't resolve,404 Not Found(or403 Forbiddenif it's an access-control decision) would be the appropriate status code. This pattern is repeated across all handlers in the file.agent-manager-service/repositories/api_repository.go-297-312 (1)
297-312:⚠️ Potential issue | 🟡 Minor
CheckAPIExistsByNameAndVersionInOrganizationdoesn't filter by artifact kind.Unlike
CheckAPIExistsByHandleInOrganization(line 289, filters onkind = KindRestAPI), this method matches all artifact kinds. A WebSubAPI (or any future kind) with the same name+version would produce a false positive.Proposed fix
query := r.db.Model(&models.Artifact{}). - Where("name = ? AND version = ? AND organization_uuid = ?", name, version, orgUUID) + Where("name = ? AND version = ? AND organization_uuid = ? AND kind = ?", name, version, orgUUID, models.KindRestAPI)agent-manager-service/services/platform_gateway_service.go-484-493 (1)
484-493:⚠️ Potential issue | 🟡 Minor
uuid.MustParse(gatewayID)can panic ifgatewayIDis malformed.Although
GetByUUIDwas called earlier (which may validate the format),MustParsewill panic rather than return an error. Preferuuid.Parsewith error handling, or reuse the UUID already validated earlier (e.g., store the parsed UUID from thegatewayobject).🐛 Proposed fix
- GatewayUUID: uuid.MustParse(gatewayID), + GatewayUUID: gateway.UUID,
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@agent-manager-service/controllers/gateway_internal_controller.go`:
- Around line 216-222: The handler currently returns the raw parse error to the
client by appending err.Error() to the HTTP response; instead keep the detailed
error only in logs and return a generic message. Update the json decoding error
branch (around json.NewDecoder(r.Body).Decode(¬ificationReq)) to log the full
err via log.Warn (or similar) but call http.Error(w, "Invalid request body",
http.StatusBadRequest) without including err.Error(); ensure notificationReq,
json.NewDecoder, log.Warn and http.Error are the symbols you modify.
In `@agent-manager-service/main.go`:
- Around line 189-194: The error check after calling mainServer.ListenAndServe()
is using a direct comparison (err != http.ErrServerClosed) which is inconsistent
with the earlier use of errors.Is; update the condition to use errors.Is(err,
http.ErrServerClosed) (i.e., treat the error as non-fatal only if errors.Is
returns true) so the block becomes: if err := mainServer.ListenAndServe(); err
!= nil && !errors.Is(err, http.ErrServerClosed) { ... } and ensure the same
pattern is used when logging via slog.Error and exiting.
- Around line 146-174: The shutdown routine currently uses a single shared
shutdownCtx and calls wg.Add(2) even though one goroutine performs both
shutdowns; change wg.Add(2) to wg.Add(1) (and a single wg.Done()), and create
separate contexts for each server so each gets its own 60s timeout: inside the
goroutine, after <-stopCh create shutdownCtxMain, cancelMain :=
context.WithTimeout(...), defer cancelMain (or cancel immediately after the
Shutdown call) and call mainServer.Shutdown(shutdownCtxMain); then create
shutdownCtxInternal, cancelInternal := context.WithTimeout(...), defer/cancel
and call internalServer.Shutdown(shutdownCtxInternal); keep logging around
dependencies.WebSocketManager.Shutdown() as is and handle errors for each
Shutdown call separately.
- Around line 202-224: The function loadAndSeedLLMTemplates mutates
cfg.LLMTemplateDefinitionsPath in-place when trying a fallback path and that
causes a side-effect on the shared config; instead, introduce a local variable
(e.g., templatePath or attemptedPath) initialized from
cfg.LLMTemplateDefinitionsPath and use that for calls to
utils.LoadLLMProviderTemplatesFromDirectory and for computing fallbackPath,
updating only the local variable when the fallback succeeds (and not cfg); also
simplify the fallback conditions by computing a cleaned path once (using
filepath.Clean and strings.TrimSpace) and then checking the few necessary
predicates before joining "src" to form fallbackPath so the logic in
loadAndSeedLLMTemplates is clearer and has no global config mutation.
- Around line 250-253: The guard checking org.UUID.String() == "" is ineffective
because uuid.UUID's zero value string is not empty; update the nil UUID check to
compare org.UUID against uuid.Nil instead. Locate the loop that iterates orgs
and replace the condition org == nil || org.UUID.String() == "" with a check
that skips when org == nil || org.UUID == uuid.Nil, ensuring you import or
reference github.com/google/uuid where needed and keep behavior consistent with
other usages of uuid.Nil in the codebase.
🧹 Nitpick comments (4)
agent-manager-service/controllers/gateway_internal_controller.go (3)
56-109: Extract the repeated API-key authentication boilerplate into a shared helper or middleware.All four handlers (
GetAPIsByOrganization,GetAPI,CreateGatewayDeployment,GetLLMProvider) repeat the same ~15 lines for extracting the API key, logging the client IP, and callingVerifyToken. This violates DRY and makes it easy to introduce inconsistencies (e.g., different error messages across handlers).Consider extracting this into a small middleware or helper that returns
(gateway, error)so each handler can start from the authenticated gateway directly.♻️ Example helper sketch
// authenticateGateway extracts the API key, verifies it, and returns the gateway. func (c *gatewayInternalController) authenticateGateway(w http.ResponseWriter, r *http.Request) (*models.Gateway, bool) { log := logger.GetLogger(r.Context()) clientIP := getClientIP(r) apiKey := r.Header.Get("api-key") if apiKey == "" { log.Warn("Unauthorized access attempt - Missing API key", "ip", clientIP) http.Error(w, "API key is required. Provide 'api-key' header.", http.StatusUnauthorized) return nil, false } gateway, err := c.gatewayService.VerifyToken(apiKey) if err != nil { log.Warn("Authentication failed", "ip", clientIP, "error", err) http.Error(w, "Invalid or expired API key", http.StatusUnauthorized) return nil, false } return gateway, true }Each handler then becomes:
func (c *gatewayInternalController) GetAPIsByOrganization(w http.ResponseWriter, r *http.Request) { gateway, ok := c.authenticateGateway(w, r) if !ok { return } // ... rest of handler }Also applies to: 112-176, 178-268, 271-335
135-141: Consider validatingapiIdandproviderIdpath parameters as valid UUIDs.These values are user-supplied and used in DB queries and interpolated into
Content-Dispositionheaders. While the DB query would fail for invalid IDs, early UUID validation provides clearer error messages and avoids unnecessary DB round-trips.Also applies to: 294-300
228-240: Request DTO includes unnecessary fields that are silently dropped during conversion.The
models.DeploymentNotificationrequest requiresID,Status,CreatedAt,UpdatedAt,DeployedAt, andDeployedVersionfrom the caller, but the conversion toservices.DeploymentNotification(lines 228-240) only usesProjectIdentifierandConfiguration. Since the service auto-generates timestamps instead of using the caller-provided ones, these fields should be removed from the request DTO to clarify the API contract and avoid confusion.agent-manager-service/api/app.go (1)
63-91: CORS middleware on the internal server may be unnecessary.The internal server is meant for gateway-to-service communication using API-key auth. CORS is a browser-enforced mechanism and typically irrelevant for server-to-server traffic. Applying it here adds minor overhead and could inadvertently allow browser-based cross-origin requests to reach internal endpoints if the allowed origin is permissive.
Consider whether CORS is needed here or if it was copied from the public handler.
b427611 to
f2343e2
Compare
There was a problem hiding this comment.
Actionable comments posted: 18
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@agent-manager-service/controllers/gateway_controller.go`:
- Around line 536-541: The handler sets artifactType :=
r.URL.Query().Get("type") and then defaults it to "all", which causes
PlatformGatewayService.GetGatewayArtifacts to skip all artifacts because it
expects an empty string for "no filter"; remove the default assignment (do not
set artifactType = "all") so the empty string is passed through to
PlatformGatewayService.GetGatewayArtifacts (leave artifactType as returned by
r.URL.Query().Get("type") so GetGatewayArtifacts can treat "" as no filter).
- Around line 83-93: The resolveOrgUUID function currently maps all errors from
orgRepo.GetOrganizationByHandle to utils.ErrOrganizationNotFound, which masks DB
errors and causes callers to return 401; change resolveOrgUUID so it returns the
original error (or a wrapped DB error) when GetOrganizationByHandle returns a
non-nil err, and only return utils.ErrOrganizationNotFound when org == nil;
update callers of resolveOrgUUID (following the pattern in
llm_deployment_controller.go) to translate utils.ErrOrganizationNotFound into
http.StatusNotFound and translate repository/DB errors into a 500-level response
instead of 401.
In `@agent-manager-service/controllers/gateway_internal_controller.go`:
- Around line 229-240: The DTO mapping in CreateGatewayDeployment is wrong: set
Configuration.ApiVersion to the CRD version string used in
generateAPIDeploymentYAML (e.g., "gateway.api-platform.wso2.com/v1alpha1")
instead of notificationReq.Configuration.Version, and populate
services.APIDeploymentSpec.Operations from
notificationReq.Configuration.Spec.Operations so operations are not dropped;
update the construction of notification (services.DeploymentNotification ->
Configuration -> Spec) to include the Operations field and use the CRD
ApiVersion constant or literal from generateAPIDeploymentYAML.
In `@agent-manager-service/controllers/websocket_controller.go`:
- Around line 224-255: The rateLimitMap growth causes an unbounded memory leak
because entries are only touched on reconnects; update the implementation to
evict stale IP entries by adding a background cleanup goroutine (or use a
time-expiring cache) that periodically scans rateLimitMap and removes keys whose
last attempt(s) are older than the 1-minute window: implement a cleanupTicker
started from websocketController initialization that locks rateLimitMu, iterates
entries in rateLimitMap, and deletes entries with no timestamps after now.Add(-1
* time.Minute); alternatively store a lastSeen timestamp per IP and evict when
lastSeen is older than the threshold; ensure proper shutdown of the goroutine
and safe use of rateLimitMu in both checkRateLimit and the cleaner.
- Around line 160-176: The current handler unconditionally calls
c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(), false) after
c.manager.Unregister(...), which incorrectly marks the gateway inactive when
other connections may still exist; modify the flow in the websocket handler to
call c.manager.Unregister(gateway.UUID.String(), connection.ConnectionID) first,
then query the manager for remaining connections (e.g.,
c.manager.GetConnections(gateway.UUID.String()) or a provided count method) and
only call c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(),
false) if the returned list/count is zero; keep logging but include the
conditional check to avoid marking gateways inactive while other connections
remain.
- Around line 258-268: getClientIP currently returns the full X-Forwarded-For
header which may contain multiple comma-separated IPs; change it to parse
X-Forwarded-For, split on ',' and use only the first non-empty entry (trim
spaces) as the client IP, falling back to X-Real-IP and then r.RemoteAddr if
needed; update the getClientIP function and add the "strings" import.
In `@agent-manager-service/repositories/api_repository.go`:
- Around line 121-154: Both GetAPIByUUID and GetAPIMetadataByHandle use Scan
which never yields gorm.ErrRecordNotFound and thus returns zero-value structs
instead of nil; change them to use
r.db.Table(...).Where(...).First(&api)/First(&metadata) (or .Take) so GORM
returns ErrRecordNotFound for missing rows, then check errors.Is(err,
gorm.ErrRecordNotFound) and return (nil, nil) or (nil, err) consistent with
callers (ensure gateway_internal_service.CreateGatewayDeployment's existingAPI
nil-check works); update the implementations of GetAPIByUUID and
GetAPIMetadataByHandle (referencing those function names, the models.API and
models.APIMetadata targets, and the SQL joins/selects) accordingly.
In `@agent-manager-service/repositories/deployment_repository.go`:
- Around line 259-287: The GetWithState function currently uses Scan and always
returns a non-nil deployment pointer even when no rows match, causing the
ARCHIVED fallback to create a fake record; change the DB call to capture the
GORM result (e.g., result :=
r.db.Table(...).Select(...).Joins(...).Where(...).Scan(&deployment)) then test
result.Error first and return it if non-nil, and if result.RowsAffected == 0
return nil, errors.New("deployment not found") before applying the Status nil ->
ARCHIVED fallback so callers don't receive a zero-UUID fake deployment.
- Around line 166-189: GetCurrentByGateway uses Scan which never returns
gorm.ErrRecordNotFound, so it always returns a non-nil &deployment even when no
row exists; change the logic in GetCurrentByGateway to detect zero results (e.g.
use r.db.Model(...).Scan(&deployment).RowsAffected or switch to First/Take) and
return (nil, gorm.ErrRecordNotFound) when no rows were found; update the
function around the Scan call and the deployment variable handling so callers’
nil checks work as intended.
In `@agent-manager-service/repositories/gateway_repository.go`:
- Around line 106-111: The List method on GatewayRepo returns all gateways
unscoped, causing cross-organization exposure and unbounded results; change the
signature of GatewayRepo.List to accept an organization UUID and pagination
(e.g., List(orgUUID string, limit, offset int)) or at minimum an orgUUID and
limit, then scope the DB query with Where("organization_uuid = ?", orgUUID) and
apply Limit(limit).Offset(offset) (and keep Order("created_at DESC")), update
all call sites to pass the org UUID and pagination values, and ensure the
returned error handling remains the same.
In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Line 86: The call slog.Info(t.Configuration) exposes potentially sensitive
config data; change this to a lower-level log (slog.Debug) and redact sensitive
fields before logging (e.g., remove or mask t.Configuration.Metadata.Auth /
metadata.auth and any tokens/secrets) — locate the usage of t.Configuration and
replace the Info call with Debug and/or construct a sanitized copy of the
configuration (omitting or redacting fields like Metadata.Auth, tokens,
credentials) before logging.
In `@agent-manager-service/services/gateway_events_service.go`:
- Around line 71-355: The four near-identical methods (BroadcastDeploymentEvent,
BroadcastUndeploymentEvent, BroadcastLLMProviderDeploymentEvent,
BroadcastLLMProviderUndeploymentEvent) should be collapsed into a single helper
on GatewayEventsService (e.g., broadcastEvent or sendGatewayEvent) that accepts
gatewayID, a generic payload interface{}, the eventType string, and a short log
prefix; move shared logic (JSON marshal + size check, DTO creation using
GatewayEventDTO, manager nil check, fetch connections via
s.manager.GetConnections, per-connection Send handling, DeliveryStats updates,
and summary/return behavior) into that helper, and have each of the four methods
simply call it with the appropriate payload and type; update log messages to use
the provided prefix/eventType and preserve the existing error returns (including
using lastError) so behavior is unchanged.
In `@agent-manager-service/services/gateway_internal_service.go`:
- Around line 144-161: The nil check in GetActiveDeploymentByGateway is
currently ineffective because deploymentRepo.GetCurrentByGateway does not return
nil for no rows; update the repository (GetCurrentByGateway in
deployment_repository.go) to return (nil, nil) when no rows are found (rather
than a zero-valued struct), and keep the nil check in
GetActiveDeploymentByGateway (function GetActiveDeploymentByGateway) to return a
clear error like "deployment not active" when deployment == nil so the service
correctly signals no active deployment once the repo behavior is fixed.
- Around line 222-259: The current check relies on existingAPI being nil but
GetAPIMetadataByHandle may return a non-nil zero-value struct; update the branch
in the gateway handler to treat a zero-value result as "not found" by checking
the API's UUID (or other unique identifier) in addition to nil: i.e., in the
block around existingAPI handling (variables existingAPI, apiUUID, apiCreated
and calls to s.apiRepo.GetAPIMetadataByHandle and s.apiRepo.CreateAPI), change
the condition to treat existingAPI == nil OR existingAPI.UUID == uuid.Nil (or
equivalent zero-value) as missing so the CreateAPI (newAPI) path actually
executes until the repository Scan bug is fixed. Ensure subsequent uses
(existingAPI.OrganizationUUID and existingAPI.UUID) assume a valid UUID only
when not zero.
In `@agent-manager-service/services/llm_deployment_service.go`:
- Around line 178-182: The code masks all errors from
deploymentRepo.GetWithContent by always returning
utils.ErrBaseDeploymentNotFound; change the logic in
LLMProviderDeploymentService.DeployLLMProvider to distinguish not-found from
other DB errors: call s.deploymentRepo.GetWithContent(req.Base,
provider.UUID.String(), orgID), if err != nil then if errors.Is(err,
repository.ErrNotFound) (or the repo's not-found sentinel) return
utils.ErrBaseDeploymentNotFound, else return the original err (or wrap it) so
callers see a 5xx; ensure you reference baseDeployment and GetWithContent and
propagate non-not-found errors instead of always mapping to
ErrBaseDeploymentNotFound.
In `@agent-manager-service/services/llm_template_seeder.go`:
- Around line 91-111: The metadata changes to current.Metadata are only
in-memory because current.Configuration (the JSON string) isn't updated before
persistence; before calling s.templateRepo.Update(current) re-marshal the
template configuration so the updated Metadata and Name are reflected in
current.Configuration (same approach used by Create()), e.g., marshal the struct
back into the Configuration JSON field on the current template after modifying
current.Metadata/current.Name and before s.templateRepo.Update(current) to
ensure Update persists the changes.
In `@agent-manager-service/services/platform_gateway_service.go`:
- Around line 215-222: The gateway creation and token insertion
(s.gatewayRepo.Create(gateway) followed by
s.gatewayRepo.CreateToken(gatewayToken)) must be made atomic to avoid leaving an
orphan gateway on token failure; either wrap both calls in a DB transaction
exposed by the repository (begin/commit/rollback) so both Create and CreateToken
are committed together, or, if transactions are not available, add compensating
cleanup: call s.gatewayRepo.Delete(gateway.ID) when
s.gatewayRepo.CreateToken(...) returns an error, and ensure errors from the
cleanup are logged/returned appropriately; update the service method to choose
one approach and use the repository's transactional API or Delete(...) method
accordingly.
- Around line 415-442: The VerifyToken method currently does gatewayRepo.List
followed by gatewayRepo.GetActiveTokensByGatewayUUID per gateway causing N+1
queries and it also swallows errors from GetActiveTokensByGatewayUUID; replace
this with a single-repo lookup (e.g., add gatewayRepo.GetAllActiveTokens()
returning token hashes with gateway UUIDs, or better
gatewayRepo.GetActiveTokenByHash(hash) indexed lookup) and change VerifyToken to
query that single method (or lookup by token hash) and validate using
verifyToken; also stop swallowing DB errors—propagate and return an error when
the repository call fails so transient DB errors return a retryable error
instead of "invalid token".
🟡 Minor comments (8)
agent-manager-service/repositories/llm_provider_template_repository.go-151-160 (1)
151-160:⚠️ Potential issue | 🟡 Minor
Listdoes not unmarshalConfigurationJSON — callers get incomplete template data.Unlike
GetByIDandGetByUUID, theListmethod returns templates without populating the parsed fields (Metadata,PromptTokens, etc.). Callers that iterate over the list and access these fields will getnilvalues.Proposed fix — unmarshal after fetching
func (r *LLMProviderTemplateRepo) List(orgUUID string, limit, offset int) ([]*models.LLMProviderTemplate, error) { var templates []*models.LLMProviderTemplate err := r.db.Where("organization_uuid = ?", orgUUID). Order("created_at DESC"). Limit(limit). Offset(offset). Find(&templates).Error + if err != nil { + return templates, err + } + for _, t := range templates { + if t.Configuration != "" { + var cfg llmProviderTemplateConfig + if err := json.Unmarshal([]byte(t.Configuration), &cfg); err != nil { + return nil, err + } + t.Metadata = cfg.Metadata + t.PromptTokens = cfg.PromptTokens + t.CompletionTokens = cfg.CompletionTokens + t.TotalTokens = cfg.TotalTokens + t.RemainingTokens = cfg.RemainingTokens + t.RequestModel = cfg.RequestModel + t.ResponseModel = cfg.ResponseModel + } + } return templates, err }agent-manager-service/repositories/gateway_repository.go-191-200 (1)
191-200:⚠️ Potential issue | 🟡 Minor
RevokeTokensilently succeeds for non-existent or already-revoked tokens.No
RowsAffectedcheck, and no status guard to prevent re-revoking an already-revoked token. A caller won't know if the operation was a no-op.agent-manager-service/repositories/deployment_repository.go-229-250 (1)
229-250:⚠️ Potential issue | 🟡 Minor
GetStatususesScan—ErrRecordNotFoundcheck on line 243 is dead code.The
errors.Is(err, gorm.ErrRecordNotFound)guard will never trigger withScan. When no rows match,resultwill contain zero-values anderrwill be nil. The current callers happen to work because they check for an emptyDeploymentIDstring, but this is fragile. Consider usingRowsAffectedto return a clear sentinel.agent-manager-service/services/llm_provider_service.go-156-173 (1)
156-173:⚠️ Potential issue | 🟡 MinorMissing nil guard on the fetched provider after Create.
After the transaction succeeds (line 148), the code fetches the just-created provider via
GetByID. If the repository implementation can return(nil, nil)(e.g., usingScanrather thanFirst), accessingcreated.ModelListon line 164 orcreated.UUIDon line 172 would panic. Add a nil check analogous to whatGetdoes on line 222.Proposed fix
created, err := s.providerRepo.GetByID(handle, orgID) if err != nil { slog.Error("LLMProviderService.Create: failed to fetch created provider", "orgID", orgID, "handle", handle, "error", err) return nil, fmt.Errorf("failed to fetch created provider: %w", err) } + if created == nil { + slog.Error("LLMProviderService.Create: created provider not found", "orgID", orgID, "handle", handle) + return nil, utils.ErrLLMProviderNotFound + }agent-manager-service/repositories/gateway_repository.go-139-161 (1)
139-161:⚠️ Potential issue | 🟡 Minor
UpdateGatewayandUpdateActiveStatussilently succeed when the gateway doesn't exist.Neither method checks
result.RowsAffected. A caller updating a non-existent gateway UUID receivesnilerror, which is misleading. TheDeletemethod correctly checksRowsAffected == 0— apply the same guard here.Example for UpdateGateway
func (r *GatewayRepo) UpdateGateway(gateway *models.Gateway) error { gateway.UpdatedAt = time.Now() - return r.db.Model(&models.Gateway{}). + result := r.db.Model(&models.Gateway{}). Where("uuid = ?", gateway.UUID). Updates(map[string]interface{}{ "display_name": gateway.DisplayName, "description": gateway.Description, "is_critical": gateway.IsCritical, "properties": gateway.Properties, "updated_at": gateway.UpdatedAt, - }).Error + }) + if result.Error != nil { + return result.Error + } + if result.RowsAffected == 0 { + return errors.New("gateway not found") + } + return nil }agent-manager-service/controllers/llm_controller.go-83-105 (1)
83-105:⚠️ Potential issue | 🟡 Minor
resolveOrgUUIDandresolveProjectUUIDswallow the original error.Both helpers discard the underlying error (e.g., database connectivity failure) and unconditionally return a domain-specific "not found" error. This means a DB outage surfaces as HTTP 401 "Organization not found" instead of HTTP 500, which hampers debugging.
Proposed fix — distinguish not-found from internal errors
func (c *llmController) resolveOrgUUID(ctx context.Context, orgName string) (string, error) { org, err := c.orgRepo.GetOrganizationByHandle(orgName) if err != nil { + if errors.Is(err, gorm.ErrRecordNotFound) { + return "", utils.ErrOrganizationNotFound + } - return "", utils.ErrOrganizationNotFound + return "", fmt.Errorf("failed to resolve organization: %w", err) } if org == nil { return "", utils.ErrOrganizationNotFound } return org.UUID.String(), nil }agent-manager-service/services/platform_gateway_service.go-625-688 (1)
625-688:⚠️ Potential issue | 🟡 MinorTrimmed
name/displayNamenot propagated to caller.Lines 636 and 659 trim the local
nameanddisplayNameparameters, but these are value parameters — the trimmed values don't flow back toRegisterGateway. If a user submits" my-gw ", validation passes (after trimming), but the gateway is created with the untrimmed value.Options
Either trim the values before calling
validateGatewayInputinRegisterGateway, or change the validation function to return the sanitized values.agent-manager-service/services/llm_deployment_service.go-305-364 (1)
305-364:⚠️ Potential issue | 🟡 Minor
RestoreLLMProviderDeploymentshould handle missing deployment status as a valid scenario.Line 329–332 calls
GetStatusand fails immediately if an error occurs. However, when no current deployment status exists for a gateway (whichGetStatusreturns asgorm.ErrRecordNotFound), the restore should proceed rather than fail. The function should only error on genuine database failures, not when records are absent. Consider checkingerrors.Is(err, gorm.ErrRecordNotFound)and treating it as a valid case—allowing restoration to proceed when no prior deployment exists.
🧹 Nitpick comments (26)
agent-manager-service/db_migrations/005_create_artifacts_and_apis.go (3)
141-142: Redundant index:idx_deployments_artifact_gatewayis a leftmost prefix ofidx_deployments_created_at.The index on
(artifact_uuid, gateway_uuid)at Line 141 is fully covered by the index on(artifact_uuid, gateway_uuid, created_at)at Line 142, since PostgreSQL can use a composite index for queries on any leftmost prefix of its columns. Keeping both wastes storage and slows writes with no query benefit.Proposed fix
-- Indexes for performance CREATE INDEX idx_ap_projects_organization ON ap_projects(organization_uuid); CREATE INDEX idx_rest_apis_project ON rest_apis(project_uuid); CREATE INDEX idx_artifacts_org ON artifacts(organization_uuid); - CREATE INDEX idx_deployments_artifact_gateway ON deployments(artifact_uuid, gateway_uuid); CREATE INDEX idx_deployments_created_at ON deployments(artifact_uuid, gateway_uuid, created_at);
44-44: Inconsistent use ofIF NOT EXISTSacross table definitions.
ap_projectsusesCREATE TABLE IF NOT EXISTS(Line 29) while all other tables useCREATE TABLE. Since gormigrate tracks applied migrations, this isn't a runtime risk, but the inconsistency is worth normalizing — either all tables should useIF NOT EXISTSor none should.Also applies to: 77-77, 99-99, 120-120
150-153: No rollback (down) migration defined.This migration creates six tables and multiple indexes but provides no mechanism to reverse it. If this is intentional and consistent with the project's migration strategy, that's fine — but worth noting that rolling back migration 005 would require manual intervention.
agent-manager-service/repositories/artifact_repository.go (1)
92-104: Redundant error branching inGetByHandle.Both
ifandelsebranches return the same(nil, err). Theerrors.Is(err, gorm.ErrRecordNotFound)check has no distinct behavior. Either differentiate the handling (e.g., returnnil, nilfor not-found) or simplify.Simplification
func (r *ArtifactRepo) GetByHandle(handle, orgUUID string) (*models.Artifact, error) { var artifact models.Artifact err := r.db.Where("handle = ? AND organization_uuid = ?", handle, orgUUID). First(&artifact).Error - if err != nil { - if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, err - } - return nil, err - } - return &artifact, nil + if err != nil { + return nil, err + } + return &artifact, nil }agent-manager-service/services/llm_template_seeder.go (1)
64-74: Minor: redundantListcall whentotalCountis 0.When
totalCountis 0, theListcall withlimit=0is unnecessary. Consider short-circuiting.Short-circuit when no existing templates
totalCount, err := s.templateRepo.Count(orgUUIDStr) if err != nil { return fmt.Errorf("failed to count existing templates: %w", err) } - // Get all existing templates for this organization - existing, err := s.templateRepo.List(orgUUIDStr, totalCount, 0) - if err != nil { - return fmt.Errorf("failed to list existing templates: %w", err) - } + var existing []*models.LLMProviderTemplate + if totalCount > 0 { + existing, err = s.templateRepo.List(orgUUIDStr, totalCount, 0) + if err != nil { + return fmt.Errorf("failed to list existing templates: %w", err) + } + }agent-manager-service/repositories/project_repository.go (1)
103-106:DeleteProjectsilently succeeds when the project doesn't exist.Unlike
DeleteinArtifactRepo(which checksRowsAffected == 0), this method doesn't verify the record existed. Callers may not realize the delete was a no-op.Add RowsAffected check
func (r *ProjectRepo) DeleteProject(projectId string) error { - return r.db.Where("uuid = ?", projectId).Delete(&models.Project{}).Error + result := r.db.Where("uuid = ?", projectId).Delete(&models.Project{}) + if result.Error != nil { + return result.Error + } + if result.RowsAffected == 0 { + return gorm.ErrRecordNotFound + } + return nil }agent-manager-service/controllers/websocket_controller.go (2)
49-55: DuplicateConnectionAckDTOtype — conflicts withmodels/gateway_internal_dto.go.A
ConnectionAckDTOis already defined inmodels/gateway_internal_dto.go(with fieldsType,Message,GatewayID). This controller defines a different version withConnectionIDandTimestampinstead ofMessage. This creates ambiguity. Consider consolidating into a single type in the models package.
66-70:CheckOriginunconditionally returnstrue— noted as TODO but is a security gap.This allows any origin to establish a WebSocket connection. Even with API key auth, this enables cross-site WebSocket hijacking if a browser is involved.
Do you want me to open an issue to track implementing proper origin checking?
agent-manager-service/repositories/organization_repository.go (1)
57-68:GetOrganizationByIdOrHandle— potential for unintended matches with OR query.If
idis empty orhandleis empty, theWHERE uuid = ? OR handle = ?clause could match records where only one condition is true, which may not be the caller's intent (e.g., passing an emptyidwould match any org whosehandleequals the empty string if such a row exists). Consider requiring at least one non-empty parameter, or using conditional query building.agent-manager-service/repositories/llm_provider_repository.go (1)
56-92: Excessiveslog.Infologging in repository methods.Every method logs multiple Info-level messages for start, intermediate steps, and completion (e.g., lines 58, 63, 68, 84, 90 just in
Create). This pattern repeats across all methods. Repository-layer operations are typically high-frequency; this level of verbosity will create significant log noise in production.Consider using
slog.Debugfor intermediate steps and reservingslog.Infofor significant events, or removing the per-step logging entirely and relying on service-layer logging.agent-manager-service/services/gateway_events_service.go (1)
19-23: Inconsistent logging — uses stdliblogwhile the rest of the codebase usesslogor structured logger.This file imports
"log"and useslog.Printfwith manual format strings (e.g.,[ERROR],[INFO]), while other files in the same PR use"log/slog"or the project'sloggerpackage. This loses structured logging benefits and creates inconsistency.agent-manager-service/services/llm_provider_template_service.go (1)
208-241: Consider whether an empty config map should be serialized.When all optional fields (
Metadata,PromptTokens, etc.) are nil,serializeConfigurationwrites"{}"intoConfiguration. On a subsequentdeserializeConfigurationpass, this yields all-nil fields again, so it round-trips correctly. However, during an Update, if the caller intends to leaveConfigurationunchanged but passes a zero-value template, the existing configuration will be overwritten with"{}".If partial updates should preserve existing configuration, the service should skip serialization (or the caller should merge) when no config fields are provided.
agent-manager-service/repositories/llm_proxy_repository.go (1)
87-101: Redundant error branching in GetByID.Both branches of the
if errors.Is(…)check return the same thing (nil, err). The same pattern appears in GetByUUID/GetByNameAndOrgID/GetTokenByUUID in the gateway repo. If the intent is to mapErrRecordNotFoundtonil, nil(like other repos do), this needs fixing; otherwise, the caller must handlegorm.ErrRecordNotFound, which differs from the template service's nil-check pattern atllm_provider_template_service.goline 131.If the intent is to return nil for not-found (consistent with the service layer's nil check):
func (r *LLMProxyRepo) GetByID(proxyID, orgUUID string) (*models.LLMProxy, error) { var proxy models.LLMProxy err := r.db. Preload("Artifact"). Joins("JOIN artifacts a ON llm_proxies.uuid = a.uuid"). Where("a.handle = ? AND a.organization_uuid = ? AND a.kind = ?", proxyID, orgUUID, models.KindLLMAPI). First(&proxy).Error if err != nil { if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, err + return nil, nil } return nil, err } return &proxy, nil }agent-manager-service/services/llm_provider_service.go (2)
63-173: Excessiveslog.Infoon every step of the happy path.There are 12+
slog.Infocalls inCreatealone, including before/after every sub-operation. In production, this will generate significant log volume for routine CRUD. Consider reducing happy-path logs toslog.Debugand keeping only the entry/exit logs atInfolevel. Error/warning logs are fine as-is.The same pattern is repeated in
List,Get,Update, andDelete.
318-339: Provider deletion doesn't check for dependent proxies.
Deleteremoves a provider without checking whether any LLM proxies reference it (provider_uuidFK). If proxies exist, this will either fail with a database FK constraint error (unhandled) or leave orphaned proxies depending on the DB schema. Consider either checking for dependent proxies before deletion and returning a descriptive error, or cascading the delete.agent-manager-service/repositories/deployment_repository.go (1)
192-210: Non-idiomatic Go variable names inSetCurrent.Variables like
artifactUUID_uuid,orgUUID_uuid,gatewayUUID_uuid,deploymentUUID_uuiduse underscores, which violates Go naming conventions. ConsiderparsedArtifactUUID,parsedOrgUUID, etc.agent-manager-service/repositories/api_repository.go (1)
257-282:DeleteAPIsilently succeeds when the API doesn't exist.The method deletes associations, deployments, rest_api, and artifact records, but none of the intermediate steps check
RowsAffected. If theapiUUIDdoesn't exist, all DELETE statements are no-ops and the method returnsnil, misleading the caller into thinking a deletion occurred.Consider checking
RowsAffectedon the rest_api or artifact deletion step.agent-manager-service/services/gateway_internal_service.go (1)
318-343: Hardcoded context path/api/v1and API version.
generateAPIDeploymentYAMLhardcodesContext: "/api/v1"andApiVersion: "gateway.api-platform.wso2.com/v1alpha1". If the API model carries a context/basepath (e.g., viaConfiguration), it should be used instead. Hardcoding means all generated YAMLs share the same context path regardless of the actual API.agent-manager-service/controllers/llm_controller.go (1)
164-177: Pagination validation is duplicated across four list handlers.The same limit/offset capping logic (default 20, max 100, min offset 0) is repeated in
ListLLMProviderTemplates,ListLLMProviders,ListLLMProxies, andListLLMProxiesByProvider. Extract this into a shared helper.Example helper
func parsePagination(r *http.Request) (limit, offset int) { limit = getIntQueryParam(r, "limit", 20) offset = getIntQueryParam(r, "offset", 0) if limit <= 0 { limit = 20 } if limit > 100 { limit = 100 } if offset < 0 { offset = 0 } return limit, offset }Also applies to: 408-421, 688-701, 742-755
agent-manager-service/controllers/llm_deployment_controller.go (2)
59-69:resolveOrgUUIDproperly fixed — but note inconsistency withgateway_controller.go.This version correctly separates DB errors from "not found" errors. However, the equivalent
resolveOrgUUIDingateway_controller.go(lines 83-93) still maps all errors (including DB failures) toErrOrganizationNotFound, which the caller then returns as401 Unauthorized. Consider aligning both helpers.
177-257:UndeployLLMProviderDeployment— using query params for required parameters on a POST.
deploymentIdandgatewayIdare required for this operation but are read from query parameters rather than the request body or path. This works, but it's an unusual pattern for POST endpoints and diverges from howDeleteLLMProviderDeploymenttakesdeploymentIdas a path parameter. Consider whether consistency across endpoints would improve API ergonomics.agent-manager-service/services/platform_gateway_service.go (2)
156-163: Redundant UUID generate-then-parse.
uuid.New()returns auuid.UUIDdirectly. Stringifying it (line 157) only to parse it back (line 160) is unnecessary and can never fail.Proposed simplification
- gatewayID := uuid.New().String() - - // 5. Parse and create Gateway model - gatewayUUID, err := uuid.Parse(gatewayID) - if err != nil { - return nil, fmt.Errorf("invalid gateway UUID: %w", err) - } + gatewayUUID := uuid.New()Then use
gatewayUUIDdirectly (andgatewayUUID.String()where needed, e.g., foruuid.MustParseon line 207).
646-651: Regex compiled on every validation call.
regexp.MustCompileis called each timevalidateGatewayInputruns. Since the pattern is constant, compile it once at the package level.Proposed fix
+var gatewayNamePattern = regexp.MustCompile(`^[a-z0-9-]+$`) + func (s *PlatformGatewayService) validateGatewayInput(orgID, name, displayName, vhost, functionalityType string) error { // ... - namePattern := regexp.MustCompile(`^[a-z0-9-]+$`) - if !namePattern.MatchString(name) { + if !gatewayNamePattern.MatchString(name) {agent-manager-service/controllers/gateway_internal_controller.go (1)
55-109: Repeated API-key auth boilerplate across all 4 handlers.Each handler duplicates the same ~15-line authentication block (header extraction,
VerifyToken, error handling). Extracting this into an authentication middleware or a shared helper (returning(*models.PlatformGateway, error)) would reduce duplication and make it easier to update auth logic in one place.Also applies to: 112-176, 270-335
agent-manager-service/controllers/gateway_controller.go (1)
483-492:RevokeGatewayTokenis a stub returning 501.This is exposed in the public API interface. If this endpoint is reachable by clients, they'll get an unexpected "not implemented" error.
Would you like me to help implement the revocation logic using the existing
s.gatewayRepo.RevokeToken(tokenId)method, or open an issue to track this?agent-manager-service/main.go (1)
102-103:dbvariable shadows thedbpackage import.After
db := db.GetDB(), the importeddbpackage is no longer accessible by that name. Currently safe since the package isn't referenced after this line, but a future edit addingdb.DB(ctx)calls here would silently use the wrongdb. Consider renaming the local variable todatabaseordbConn.
There was a problem hiding this comment.
Actionable comments posted: 19
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@agent-manager-service/api/app.go`:
- Around line 63-91: The internal handler chain in MakeInternalHTTPHandler lacks
a global API-key auth layer; wrap the composed internalHandler with the API-key
auth middleware (use the existing params.AuthMiddleware function) so all routes
registered by RegisterGatewayInternalRoutes and RegisterWebSocketRoutes are
protected. Specifically, after building internalHandler (the variable named
internalHandler) and before applying logger/CORS/Recoverer or before mux.Handle,
call params.AuthMiddleware()(internalHandler) (or the equivalent API-key
middleware) to ensure a gatekeeper is applied to the internal routes.
In `@agent-manager-service/config/config_loader.go`:
- Around line 175-194: Add validation for the InternalServer block similar to
the existing HTTP server checks: create a new function
validateInternalServerConfigs(cfg *Config, r *configReader) that checks
InternalServer.Port is 1..65535, ReadTimeoutSeconds and WriteTimeoutSeconds are
> 0, and CertDir is non-empty (append errors to r.errors on failures), then call
validateInternalServerConfigs(&config, r) after the InternalServer assignment
and before the existing validateHTTPServerConfigs call so invalid internal
server values are caught during config loading.
In `@agent-manager-service/controllers/llm_controller.go`:
- Around line 96-105: The resolveProjectUUID function currently masks all errors
returned by ocClient.GetProject and always returns utils.ErrProjectNotFound;
modify resolveProjectUUID to distinguish not-found responses from other errors
by inspecting the error from c.ocClient.GetProject (or using a provided
sentinel/type from ocClient), returning utils.ErrProjectNotFound only when the
project truly does not exist and returning or wrapping other errors (e.g.,
fmt.Errorf("GetProject: %w", err)) for network/RPC/backend failures so callers
see 500-level errors; reference resolveProjectUUID and c.ocClient.GetProject and
ensure you preserve the existing nil-project check returning
utils.ErrProjectNotFound.
- Around line 84-93: The resolveOrgUUID function currently maps all errors from
orgRepo.GetOrganizationByName to utils.ErrOrganizationNotFound; change
resolveOrgUUID to return the original DB error (only return
utils.ErrOrganizationNotFound when the returned org is nil or when the repo
explicitly signals not-found), i.e., call orgRepo.GetOrganizationByName and if
err != nil return err, then if org == nil return utils.ErrOrganizationNotFound,
and keep returning org.UUID.String() on success; then update callers that invoke
resolveOrgUUID to distinguish errors using errors.Is(err,
utils.ErrOrganizationNotFound) → respond 404, otherwise log the err and respond
500 (use errors.Is for comparison).
In `@agent-manager-service/repositories/api_repository.go`:
- Around line 188-201: The SELECT in GetDeployedAPIsByGatewayUUID is missing
art.handle so the returned models.API.Handle will be empty; update the query in
APIRepo.GetDeployedAPIsByGatewayUUID to include "art.handle" in the Select list
(alongside a.uuid, art.name, etc.) so the Handle field is populated, keeping the
same Joins/Where/Order and Scan into apis.
- Around line 203-216: The SELECT in GetAPIsByGatewayUUID omits art.handle so
the API models' Handle field is not populated; update the query in
GetAPIsByGatewayUUID to include "art.handle" in the Select list (alongside
art.name, art.version, etc.) so the returned []*models.API contains the Handle
value, ensuring the column alias matches the struct field mapping used by the
scan.
In `@agent-manager-service/repositories/artifact_repository.go`:
- Around line 92-104: GetByHandle currently returns (nil, nil) when no record is
found which triggers the nilnil linter; change it to return a sentinel error
instead of nil on not-found. Define a package-level error like
ErrArtifactNotFound and modify ArtifactRepo.GetByHandle to return (nil,
ErrArtifactNotFound) when errors.Is(err, gorm.ErrRecordNotFound); update callers
to check with errors.Is(err, ErrArtifactNotFound) (or adapt callers if you
prefer the alternative approach mentioned in the comment).
In `@agent-manager-service/repositories/gateway_repository.go`:
- Around line 139-161: Both UpdateGateway and UpdateActiveStatus currently
return nil even when no row matches the UUID; change them to inspect the GORM
result: assign the update call to a variable (e.g., res :=
r.db.Model(&models.Gateway{}).Where("uuid = ?", gateway.UUID)...Updates(...)),
then if res.Error != nil return res.Error; if res.RowsAffected == 0 return
gorm.ErrRecordNotFound (or a custom "gateway not found" error) so callers can
distinguish "not found" from success; apply the same pattern to
UpdateActiveStatus (use gatewayId in the Where and check res.RowsAffected).
In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Around line 149-157: List currently returns templates without decoding the
Configuration JSON into the parsed fields, so callers get nil
Metadata/PromptTokens/etc.; add a private helper (e.g., unmarshalConfig(template
*models.LLMProviderTemplate) error) that performs the JSON unmarshal of
template.Configuration into the struct fields and call it for each template in
LLMProviderTemplateRepo.List after Find, and refactor GetByHandle and GetByUUID
to reuse this helper so all three methods produce fully populated templates.
In `@agent-manager-service/repositories/organization_repository.go`:
- Around line 86-109: GetOrCreateOrganization has a race where two callers can
both see ErrRecordNotFound and call CreateOrganization, causing a duplicate-key
error; update OrganizationRepo.GetOrCreateOrganization to detect a
unique-constraint/duplicate-key error returned from CreateOrganization (inspect
the DB/GORM error code/message), and in that case retry
r.GetOrganizationByName(name) and return its result instead of propagating the
create error; keep the normal behavior for other errors from CreateOrganization.
Use the existing methods GetOrganizationByName and CreateOrganization for the
retry path.
In
`@agent-manager-service/resources/default-llm-provider-templates/azureopenai-template.yaml`:
- Around line 29-30: The template currently hardcodes external asset URLs
(logoUrl and openapiSpecUrl) to a personal GitHub repo for LLM provider
templates (azureopenai, openai, mistral, awsbedrock, anthropic, azureaifoundry);
update each template to point to official/organizational-hosted assets or
internal static hosting instead of raw.githubusercontent.com/nomadxd, replacing
the logoUrl and openapiSpecUrl values for the affected templates (e.g., entries
in azureopenai-template.yaml) with stable organizational URLs or local asset
paths, and verify the assets are accessible and tests/consumers load them
correctly.
In `@agent-manager-service/services/environment_service.go`:
- Around line 73-76: The code currently classifies not-found errors by
string-matching err.Error() (e.g., checking "404" or "not found") in
environment_service.go (occurring around the blocks that return
utils.ErrEnvironmentNotFound at lines shown); replace those fragile checks with
typed error comparison using errors.Is(err, utils.ErrEnvironmentNotFound) so the
function(s) returning environment info return the canonical
utils.ErrEnvironmentNotFound when appropriate (keep the existing return values
and logging, just change the conditional to use errors.Is). Ensure you import
the standard "errors" package if not already imported and update both
occurrences (the blocks around the earlier return and the later one near line
168).
In `@agent-manager-service/services/gateway_internal_service.go`:
- Around line 145-162: The controller expects the sentinel error
utils.ErrDeploymentNotActive, so replace the ad-hoc fmt.Errorf("deployment not
active") returns in GetActiveDeploymentByGateway (and apply the same change to
GetActiveLLMProviderDeploymentByGateway and all deployment-related returns in
CreateGatewayDeployment) with the sentinel error — either return
utils.ErrDeploymentNotActive directly or wrap it using fmt.Errorf("deployment
not active: %w", utils.ErrDeploymentNotActive) so errors.Is will match; also
ensure other repo errors are still wrapped (e.g., fmt.Errorf("failed to get
deployment: %w", err)) so original error context is preserved.
- Around line 328-353: generateAPIDeploymentYAML currently hardcodes
Spec.Context to "/api/v1"; update it to read the actual value from
api.Configuration.Context with a safe nil check and fallback. Specifically,
inside generateAPIDeploymentYAML compute a context string by checking if
api.Configuration != nil and api.Configuration.Context != nil and non-empty, use
*api.Configuration.Context, otherwise use "/api/v1", and assign that to
APIDeploymentSpec.Context so the generated YAML reflects the API's configured
context.
In `@agent-manager-service/services/llm_provider_service.go`:
- Around line 155-161: The call in LLMProviderService.Create uses
s.providerRepo.GetByUUID(handle, orgID) but "handle" is derived from
provider.Configuration.Name (a name/handle string), so either rename the
repository method to reflect a name lookup (e.g., GetByHandle or GetByName) or
change Create to fetch by the actual UUID returned when creating the provider;
update the code paths accordingly (LLMProviderService.Create,
providerRepo.GetByUUID) so the parameter type and method name match (use
provider.Configuration.Name with a GetByHandle/GetByName method, or capture the
created provider's UUID and call GetByUUID(uuid, orgID)).
In `@agent-manager-service/services/llm_proxy_service.go`:
- Around line 70-77: The call to s.providerRepo.GetByUUID in Create (and the
similar calls in Update and ListByProvider) currently wraps any error as "failed
to validate provider", which hides gorm.ErrRecordNotFound and makes the
subsequent nil-check unreachable; change the error handling in the Create path
(and mirror for Update and ListByProvider) to check if errors.Is(err,
gorm.ErrRecordNotFound) and return utils.ErrLLMProviderNotFound, otherwise
return a wrapped error (e.g., fmt.Errorf("failed to validate provider: %w",
err)); update the providerModel == nil guards accordingly so not-found is
handled via the specific ErrLLMProviderNotFound return.
- Around line 148-163: The Get method currently returns a wrapped
gorm.ErrRecordNotFound as a generic error and has an unreachable proxy == nil
check; update LLMProxyService.Get to detect errors.Is(err,
gorm.ErrRecordNotFound) after calling s.proxyRepo.GetByID and return
utils.ErrLLMProxyNotFound in that case (consistent with Delete and Update),
otherwise wrap and return other errors as before, and remove the redundant proxy
== nil branch.
In `@agent-manager-service/services/platform_gateway_service.go`:
- Around line 669-676: The validation is case-sensitive while storage lowercases
the value, causing a mismatch between functionalityType and downstream
expectations; fix by normalizing functionalityType before validation (e.g.,
create a normalized := strings.ToLower(functionalityType)), change validTypes to
use lowercase keys ("regular","ai","event") and validate against normalized,
then store the normalized value consistently (or alternatively remove
strings.ToLower at storage if you prefer preserving original casing); update
references to functionalityType (validation map, the strings.ToLower usage, and
any assignment that persists the value) so validation and storage use the same
canonical casing.
- Around line 586-601: The artifactType check inside the loop is loop-invariant
and wrongly filters out results when artifactType == "all"; move the filter
outside the loop in the function handling apis (reference symbols: artifactType,
apis, GatewayArtifact, allArtifacts) and treat "all" (and empty string) as no
filter: before iterating, if artifactType is non-empty and not "all" and not
"API" then short-circuit/return an empty allArtifacts slice; otherwise iterate
over apis and append every API as you already do (i.e., only allow the loop to
run when artifactType == "" or "all" or "API").
🟡 Minor comments (16)
agent-manager-service/repositories/organization_repository.go-100-100 (1)
100-100:⚠️ Potential issue | 🟡 MinorStale comment: UUID is generated in Go, not by the database.
Line 100 says "UUID is auto-generated by database" but the migration has no
DEFAULTon theuuidcolumn, anduuid.New()on line 97 generates it in the application. Remove or correct this comment.agent-manager-service/db_migrations/004_create_artifacts_and_apis.go-46-57 (1)
46-57:⚠️ Potential issue | 🟡 Minor
project_uuidinrest_apishas no foreign key constraint andlifecycle_statushas no CHECK constraint.
project_uuidisNOT NULLbut lacks a FK reference — if projects are stored in this DB, a FK would enforce referential integrity. If projects are external, a comment explaining this would help.lifecycle_statusdefaults to'CREATED'but unlikedeployment_status.status, there's noCHECKconstraint to restrict valid values.agent-manager-service/db_migrations/004_create_artifacts_and_apis.go-36-37 (1)
36-37:⚠️ Potential issue | 🟡 MinorUse
TIMESTAMPTZinstead ofTIMESTAMPfor time columns.All timestamp columns (
created_at,updated_at) across the five tables useTIMESTAMP(without time zone). PostgreSQL best practice is to useTIMESTAMPTZto avoid ambiguity when the server or application changes time zones. This applies toartifacts(Lines 36-37),deployments(Line 69),deployment_status(Line 88), andassociation_mappings(Lines 109-110).Also applies to: 69-69, 88-88, 109-110
agent-manager-service/services/environment_service.go-88-91 (1)
88-91:⚠️ Potential issue | 🟡 MinorAdd a comment explaining why
UpdatedAtis set toCreatedAt.The OpenChoreo
EnvironmentResponsestruct has noUpdatedAtfield, so the code currently setsUpdatedAttoCreatedAtsilently and without explanation. This occurs at both lines 88–91 and line 139. Since the code already comments missing fields (e.g.,Description), add a similar comment here for consistency and clarity, such as:// OpenChoreo EnvironmentResponse doesn't have UpdatedAt.agent-manager-service/server/internal_server.go-64-64 (1)
64-64:⚠️ Potential issue | 🟡 MinorMalformed structured log call — positional args without keys.
slog.Infoexpects alternating key-value pairs after the message. Here,certPathandkeyPathare passed as bare values, producing garbled output (or runtime warnings in newer slog versions).Proposed fix
- slog.Info("initializing with certs", certPath, keyPath) + slog.Info("Initializing with certs", "certPath", certPath, "keyPath", keyPath)agent-manager-service/controllers/llm_deployment_controller.go-398-401 (1)
398-401:⚠️ Potential issue | 🟡 MinorHTTP 204 No Content should not include a response body.
WriteSuccessResponsewithhttp.StatusNoContentandstruct{}{}will attempt to write a JSON body ({}). Per HTTP semantics, a 204 response must not contain a body. Usew.WriteHeader(http.StatusNoContent)directly instead.Proposed fix
- utils.WriteSuccessResponse(w, http.StatusNoContent, struct{}{}) + w.WriteHeader(http.StatusNoContent)agent-manager-service/main.go-164-175 (1)
164-175:⚠️ Potential issue | 🟡 Minor
os.Exit(1)inside a goroutine can race with main server startup.If the internal server fails to bind (e.g., port conflict),
os.Exit(1)on Line 173 will terminate the process from within a goroutine, bypassing deferred cleanup and potentially racing with the main server'sListenAndServeon Line 179. Consider using a channel orerrgroupto propagate the error and letmainperform a coordinated exit.Sketch using an error channel
+ internalErrCh := make(chan error, 1) go func() { slog.Info("Internal HTTPS server is running", ...) if err := internalServer.Start(); err != nil && !errors.Is(err, http.ErrServerClosed) { - slog.Error("Failed to start internal server", "error", err) - os.Exit(1) + internalErrCh <- err + return } + internalErrCh <- nil }()Then in
main, select on the channel alongsideListenAndServeto handle the error gracefully.agent-manager-service/repositories/llm_provider_repository.go-94-95 (1)
94-95:⚠️ Potential issue | 🟡 MinorStale comment: says "GetByID" but method is
GetByUUID.-// GetByID retrieves an LLM provider by ID (handle) +// GetByUUID retrieves an LLM provider by UUIDagent-manager-service/resources/default-llm-provider-templates/anthropic-template.yaml-30-31 (1)
30-31:⚠️ Potential issue | 🟡 MinorExternal URLs reference a personal GitHub fork — consider using an official or organization-controlled source.
Both
logoUrlandopenapiSpecUrlpoint tohttps://raw.githubusercontent.com/nomadxd/openapi-connectors/.... If this personal repository is deleted, renamed, or restructured, these URLs will break at runtime. Consider hosting these assets under thewso2organization or another stable, organization-controlled location.This same concern applies to all other provider template files (OpenAI, AWS Bedrock, Azure OpenAI, Azure AI Foundry, and Mistral) referencing the same repository.
agent-manager-service/models/rest_api.go-60-65 (1)
60-65:⚠️ Potential issue | 🟡 MinorRemove unused
ChannelandChannelRequesttypes.
RestAPIConfighas anOperationsfield but noChannelsfield. TheChannelandChannelRequesttypes are not referenced anywhere in the codebase and constitute dead code that should be removed.agent-manager-service/models/rest_api.go-89-97 (1)
89-97:⚠️ Potential issue | 🟡 MinorRemove
dbtags or replace withgormtags for consistency.
APIMetadatausesdb:"..."tags, which GORM does not recognize for column mapping. The code currently works becauseGetAPIMetadataByHandle()uses an explicitSelect()that relies on positional mapping rather than tags. This is fragile—any reordering of theSelect()columns or struct fields will break the mapping silently. For consistency withRestAPI(which usesgorm:"column:..."in the same file) and to make column mapping explicit and maintainable, either replacedb:""withgorm:"column:..."or document whydb:""is used if it serves a specific purpose.agent-manager-service/services/gateway_internal_service.go-248-250 (1)
248-250:⚠️ Potential issue | 🟡 Minor
CreatedByis hardcoded to"admin"— should derive from the authenticated gateway or caller context.When creating a new API from a gateway deployment notification, the
CreatedByfield is hardcoded to"admin". This loses provenance information. Consider propagating the gateway identity or a system-level identifier.agent-manager-service/repositories/gateway_repository.go-93-104 (1)
93-104:⚠️ Potential issue | 🟡 MinorStatic analysis:
GetByNameAndOrgIDreturns(nil, nil)— use a sentinel error instead.The
nilnillinter flags this pattern. Returning(nil, nil)forces callers to check both the error and the result for nil, which is error-prone. Consider returning a sentinel error likegorm.ErrRecordNotFound(or a domain-specific one) instead.Proposed fix
func (r *GatewayRepo) GetByNameAndOrgID(name, orgID string) (*models.Gateway, error) { var gateway models.Gateway err := r.db.Where("name = ? AND organization_uuid = ?", name, orgID).First(&gateway).Error if err != nil { if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, nil + return nil, gorm.ErrRecordNotFound } return nil, err } return &gateway, nil }agent-manager-service/controllers/llm_controller.go-319-319 (1)
319-319:⚠️ Potential issue | 🟡 MinorSending a body with HTTP 204 No Content.
WriteSuccessResponse(w, http.StatusNoContent, struct{}{})will attempt to serializestruct{}{}as JSON and write it to the response. Per HTTP semantics, 204 responses should not include a body. The same pattern appears on line 598 and line 930.Proposed fix
- utils.WriteSuccessResponse(w, http.StatusNoContent, struct{}{}) + w.WriteHeader(http.StatusNoContent)agent-manager-service/controllers/gateway_controller.go-97-97 (1)
97-97:⚠️ Potential issue | 🟡 MinorIncorrect
slog.Infousage — will produce malformed structured log entries.
slog.Info("organization", org.UUID.String(), org.Name)usesslogwith positional args whereorg.UUID.String()becomes a log key andorg.Namebecomes the value. This appears to be a debug leftover. It should either use the contextual logger or be removed.Proposed fix
- slog.Info("organization", org.UUID.String(), org.Name)If needed for debugging, use structured logging consistently:
// Use wherever a logger from context is available: // log.Info("resolveOrgUUID: resolved", "orgUUID", org.UUID.String(), "orgName", org.Name)agent-manager-service/controllers/gateway_controller.go-644-656 (1)
644-656:⚠️ Potential issue | 🟡 Minor
UpdatedAtis being incorrectly assigned toCreatedAt— needs alternative source.Line 654:
UpdatedAt: ocEnv.CreatedAtis incorrect. However,ocEnv(OpenChoreoEnvironmentResponse) does not have anUpdatedAtfield to fall back to. SinceGatewayEnvironmentResponserequires this field, determine the appropriate source forUpdatedAt— either track it separately in the mapping, use a default value, or adjust the response model.
🧹 Nitpick comments (34)
agent-manager-service/controllers/websocket_controller.go (3)
196-206:readLoopbreaks theTransportabstraction by type-asserting to*ws.WebSocketTransport.The
Transportinterface was designed to allow swapping protocol implementations, butreadLoophard-casts to the concrete type to callReadMessage(). This defeats the abstraction and will break if a different transport is used (e.g., in tests).Consider adding a
ReadMessage(or equivalent blocking-read) method to theTransportinterface, or restructuring so the read loop doesn't need to know the concrete type.
49-55: DuplicateConnectionAckDTO— another definition exists inmodels/gateway_internal_dto.go.The
modelspackage already defines aConnectionAckDTO(with slightly different fields:Messageinstead ofConnectionID/Timestamp). Having two types with the same name in different packages creates confusion. Consider consolidating into a single definition inmodels.#!/bin/bash # Verify the duplicate ConnectionAckDTO rg -n 'ConnectionAckDTO' --type go
128-134: Misleading variable name and discarded error fromjson.Marshal.Line 130:
jsonErris actually the marshalled JSON bytes, not an error. The marshal error (second return value) is silently discarded with_. If marshalling fails,jsonErrisniland the send is skipped — functionally safe, but the naming is confusing.Proposed fix
- if jsonErr, _ := json.Marshal(errorMsg); jsonErr != nil { - if err := conn.WriteMessage(websocket.TextMessage, jsonErr); err != nil { + if msgBytes, marshalErr := json.Marshal(errorMsg); marshalErr == nil { + if err := conn.WriteMessage(websocket.TextMessage, msgBytes); err != nil { log.Error("Failed to send error message", "gatewayID", gateway.UUID.String(), "error", err) } }agent-manager-service/repositories/organization_repository.go (1)
59-69: Redundantifbranch — both paths return(nil, err).The
errors.Is(err, gorm.ErrRecordNotFound)check doesn't change the return value. Same pattern inGetOrganizationByUUID. This is harmless but unnecessary.Simplified version
func (r *OrganizationRepo) GetOrganizationByName(name string) (*models.Organization, error) { var org models.Organization err := r.db.Where("name = ?", name).First(&org).Error if err != nil { - if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, err - } return nil, err } return &org, nil }agent-manager-service/services/infra_resource_manager.go (1)
67-83: On-demand sync on every list call may add latency.Each
ListOrganizationsinvocation now issues one DB round-trip (GetOrCreateOrganization) per organization returned by OpenChoreo. If the org list grows or calls are frequent, this could noticeably increase response times. Consider caching already-synced org names (e.g., in-memory set with a short TTL) or performing the sync asynchronously/in bulk.agent-manager-service/models/llm_provider.go (1)
156-167: Inconsistentyamlstruct tags onSecurityConfigandAPIKeySecurity.These two types carry both
jsonandyamlstruct tags, while every other type in this file uses onlyjsontags. If YAML serialization isn't needed at the model layer, remove theyamltags for consistency; if it is needed, apply them uniformly.agent-manager-service/services/llm_deployment_service.go (2)
88-232:DeployLLMProvider— consider a more targeted broadcast-failure log level and verify event consistency.The broadcast failure at Lines 222-229 is silently swallowed (only logged). This is acceptable for resilience, but consider whether a retry or at-least-once delivery mechanism is needed. If the gateway never learns about the deployment, the system state becomes inconsistent (deployment recorded in DB but gateway unaware).
This is a design-level observation — the current "fire and forget" approach is fine for MVP but should be revisited for production reliability.
89-103: Input validation could return a single aggregated error instead of short-circuiting.Currently, each missing field returns immediately. Callers making multiple mistakes must fix-and-retry one at a time. Minor UX concern, not blocking.
agent-manager-service/api/app.go (1)
86-86: CORS middleware on internal server may be unnecessary.The internal server is designed for server-to-server communication (WebSocket gateway connections, internal APIs) over TLS with API-key auth. CORS is a browser-enforced mechanism and adds no security value for non-browser clients. Consider whether this middleware is needed here.
agent-manager-service/services/environment_service.go (2)
59-62: No-opCreateEnvironmentreturns a genericfmt.Errorf— consider returning a typed/sentinel error.
UpdateEnvironmentandDeleteEnvironmentfollow the same pattern. Returning plainfmt.Errorfstrings makes it harder for callers (controllers) to distinguish "not supported" from actual failures and return appropriate HTTP status codes (e.g., 405 vs 500).
196-204: Silently skipping missing gateways may hide data integrity issues.When
GetByUUIDfails or returns nil, the code logs a warning andcontinues. This is a reasonable fallback, but if gateway-environment mappings reference non-existent gateways, it could indicate stale data. Consider whether returning a partial result without any signal to the caller is acceptable.agent-manager-service/db_migrations/004_create_artifacts_and_apis.go (2)
121-125:idx_deployments_artifact_gatewayis redundant.The index on
(artifact_uuid, gateway_uuid)at Line 123 is a prefix of the index on(artifact_uuid, gateway_uuid, created_at)at Line 124. PostgreSQL can use the latter index for queries that only filter on(artifact_uuid, gateway_uuid), making the former redundant. Consider removing Line 123 to reduce write overhead.Proposed fix
-- Indexes for performance CREATE INDEX idx_rest_apis_project ON rest_apis(project_uuid); CREATE INDEX idx_artifacts_org ON artifacts(organization_uuid); - CREATE INDEX idx_deployments_artifact_gateway ON deployments(artifact_uuid, gateway_uuid); CREATE INDEX idx_deployments_created_at ON deployments(artifact_uuid, gateway_uuid, created_at);
52-53: Inconsistent JSON storage strategy:JSONBvsTEXT.
configuration(Line 53) usesJSONB, whiletransport(Line 52) andmetadata(Line 68) store JSON asVARCHAR(255)/TEXT. If these are always JSON, usingJSONBconsistently would enable validation and query capabilities. If they're intentionally plain text, consider adding comments explaining why.Also applies to: 67-68
agent-manager-service/config/config.go (1)
205-223: Consider documenting default values as struct tags or constants.The comments reference defaults (port 9243, cert dir
./data/certs, max connections 1000, etc.) but they're only in comments. If defaults are applied in the config loader, this is fine — but co-locating defaults as constants near these types would improve discoverability.agent-manager-service/models/upstream.go (1)
32-37: Defensive measure: Consider maskingUpstreamAuth.Valuefield in API responses.If
UpstreamAuth(or its parent modelsRestAPI/RestAPIConfig) is ever returned in JSON API responses in the future, theValuefield would be exposed as-is since it currently hasjson:"value,omitempty". To prevent accidental secret leakage, consider addingjson:"-"to theValuefield or implementing a custom marshaler that omits this field.Currently, these models are only used internally for YAML generation and gateway deployment, but this change ensures protection if the data model is later exposed through API endpoints.
agent-manager-service/repositories/artifact_repository.go (1)
68-78:Updatesilently succeeds when no rows match.Unlike
Delete(which checksRowsAffected == 0),Updatedoesn't verify that a row was actually modified. If the artifact UUID + org UUID combination doesn't exist, GORM returns no error butRowsAffectedwill be 0. This could mask bugs where callers assume the update took effect.Proposed fix
func (r *ArtifactRepo) Update(tx *gorm.DB, artifact *models.Artifact) error { artifact.UpdatedAt = time.Now() - return tx.Model(&models.Artifact{}). + result := tx.Model(&models.Artifact{}). Where("uuid = ? AND organization_uuid = ?", artifact.UUID, artifact.OrganizationUUID). Updates(map[string]interface{}{ "name": artifact.Name, "version": artifact.Version, "updated_at": artifact.UpdatedAt, - }).Error + }) + if result.Error != nil { + return result.Error + } + if result.RowsAffected == 0 { + return gorm.ErrRecordNotFound + } + return nil }agent-manager-service/server/internal_server.go (2)
68-93: Loaded certificate is not checked for expiry.When reusing an existing certificate from disk (Line 71), there's no validation that it hasn't expired. After the 1-year validity period, the server will silently start with an expired cert, causing TLS handshake failures for clients.
Sketch: add expiry check after loading
loadedCert, err := tls.LoadX509KeyPair(certPath, keyPath) if err != nil { slog.Warn("Failed to load existing certificates", "error", err) } else { - slog.Info("Using existing certificates", "certDir", s.cfg.CertDir) - cert = loadedCert + // Verify the certificate hasn't expired + if len(loadedCert.Certificate) > 0 { + x509Cert, parseErr := x509.ParseCertificate(loadedCert.Certificate[0]) + if parseErr == nil && time.Now().Before(x509Cert.NotAfter) { + slog.Info("Using existing certificates", "certDir", s.cfg.CertDir, "expiresAt", x509Cert.NotAfter) + cert = loadedCert + } else { + slog.Warn("Existing certificate is expired or invalid, will regenerate", "certDir", s.cfg.CertDir) + } + } }
136-152: Static serial number1— consider using a random serial.All generated certificates get
SerialNumber: big.NewInt(1). If a cert is regenerated (e.g., after manual deletion), clients that cached the old cert may get confused by the same serial. RFC 5280 recommends unique serial numbers.Proposed fix
+ serialNumber, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128)) + if err != nil { + return tls.Certificate{}, fmt.Errorf("failed to generate serial number: %w", err) + } + template := x509.Certificate{ - SerialNumber: big.NewInt(1), + SerialNumber: serialNumber,agent-manager-service/controllers/llm_deployment_controller.go (1)
177-257: Consistent handler patterns across all deployment operations.All six handlers follow the same lifecycle: context/logger setup → org resolution → input validation → service delegation → error mapping. The error handling is thorough, covering domain-specific errors with appropriate HTTP status codes.
One minor observation: the verbose
Infologging at each step (e.g., "organization resolved", "calling service layer") could be noisy under load. Consider reducing some of these toDebuglevel in a follow-up.Also applies to: 259-338, 340-401, 403-460, 462-528
agent-manager-service/repositories/llm_provider_template_repository.go (3)
93-98: RedundantErrRecordNotFoundcheck — both branches return the same value.Lines 94–97 check for
gorm.ErrRecordNotFoundbut both theifandelsebranches return(nil, err). This makes the check dead code. If the intent is to returnnil, nilfor not-found (asGetByUUIDimplied in past iterations), update accordingly; otherwise, simplify.Simplify
if err != nil { - if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, err - } return nil, err }
124-129: Same redundant pattern inGetByUUID.The
ErrRecordNotFoundcheck at lines 125–128 is identical dead code as noted inGetByHandle.
88-117: Consider extracting a sharedunmarshalConfighelper to reduce duplication acrossGetByHandle,GetByUUID, and (once fixed)List.The same 8-line configuration-unmarshal block is repeated in
GetByHandle(lines 102–113) andGetByUUID(lines 131–142), and will need to appear inListas well. Extracting it reduces the maintenance surface.Example helper
func unmarshalTemplateConfig(t *models.LLMProviderTemplate) error { if t.Configuration == "" { return nil } var cfg llmProviderTemplateConfig if err := json.Unmarshal([]byte(t.Configuration), &cfg); err != nil { return err } t.Metadata = cfg.Metadata t.PromptTokens = cfg.PromptTokens t.CompletionTokens = cfg.CompletionTokens t.TotalTokens = cfg.TotalTokens t.RemainingTokens = cfg.RemainingTokens t.RequestModel = cfg.RequestModel t.ResponseModel = cfg.ResponseModel return nil }Also applies to: 119-146
agent-manager-service/repositories/llm_provider_repository.go (1)
57-92: Excessiveslog.Infologging throughout — demote toslog.Debugfor routine operations.Nearly every method logs multiple
slog.Infocalls for start, intermediate steps (resolving handles, inserting records), and completion. In production, this will generate a very high volume of logs, especially for frequently called methods likeGetByUUID,List, andExists. Reserveslog.Infofor significant lifecycle events and useslog.Debugfor step-by-step tracing.Also applies to: 94-115, 117-137, 144-206, 208-249, 251-263
agent-manager-service/models/gateway.go (1)
64-74: Use constants for token status values instead of magic strings.
"active"and"revoked"are used as magic strings inIsActive()andRevoke(). Defining package-level constants prevents typos and makes it easier to reference these values consistently across the codebase (e.g., in queries, validation, or tests).Proposed fix
+const ( + TokenStatusActive = "active" + TokenStatusRevoked = "revoked" +) + // IsActive returns true if token status is active func (t *GatewayToken) IsActive() bool { - return t.Status == "active" + return t.Status == TokenStatusActive } // Revoke marks the token as revoked with current timestamp func (t *GatewayToken) Revoke() { now := time.Now() - t.Status = "revoked" + t.Status = TokenStatusRevoked t.RevokedAt = &now }agent-manager-service/models/rest_api.go (1)
82-87: ReconsiderParamsas pointer-to-map—it's non-idiomatic but may be intentional for JSON marshaling.Maps in Go are reference types; a pointer-to-map adds unnecessary indirection. However, this pattern allows
omitemptyto distinguish between nil (omitted) and empty maps. If the JSON API contract requires omitting the field entirely when no params exist, the pointer serves a purpose. If empty{}is acceptable in JSON, this can be simplified tomap[string]interface{}and theomitemptywill work correctly. Verify the intended JSON serialization behavior before refactoring.agent-manager-service/controllers/gateway_internal_controller.go (1)
56-109: Consider extracting the repeated API-key authentication into middleware or a helper.All four handlers repeat the same ~15-line block: extract
api-keyheader →VerifyToken→ error handling. A shared helper (or middleware that injects the authenticatedgatewayinto the request context) would reduce duplication and ensure consistent auth behavior.agent-manager-service/services/llm_provider_template_service.go (1)
208-272: Serialize/deserialize helpers mirror the model struct — consider using the struct directly to stay in sync.If new fields are added to
LLMProviderTemplate, they must be added in three places: the model,serializeConfiguration, anddeserializeConfiguration. You could serialize/deserialize a dedicated config struct type that is shared between both methods, reducing the chance of field drift.agent-manager-service/services/gateway_internal_service.go (1)
315-315: Hard-coded deployment limit of 100 — extract as a named constant or configuration value.The magic number
100is embedded inline. This should be a named constant or, better, a configurable parameter injected into the service.agent-manager-service/services/llm_provider_service.go (1)
63-174: Excessiveslog.Infocalls — most of these should beDebuglevel.There are 12+
slog.Infocalls in theCreatemethod alone, logging every internal step (e.g., "extracted configuration", "validating template", "checking if provider exists", "set default values", "serializing model providers"). In production, this will create significant log noise. ReserveInfofor meaningful business events (creation success/failure) and useDebugfor flow tracing.The same pattern applies to
List,Get,Update, andDelete.agent-manager-service/controllers/llm_controller.go (2)
165-177: Pagination validation logic is duplicated across six handlers.The same limit/offset clamping block (cap to 100, floor at 0/20) is repeated in
ListLLMProviderTemplates,ListLLMProviders,ListLLMProxies,ListLLMProxiesByProvider, etc. Consider extracting a small helper.Example helper
func sanitizePagination(r *http.Request) (limit, offset int) { limit = getIntQueryParam(r, "limit", 20) offset = getIntQueryParam(r, "offset", 0) if limit <= 0 { limit = 20 } if limit > 100 { limit = 100 } if offset < 0 { offset = 0 } return }Also applies to: 409-421, 689-701, 743-755
631-638: PassprojectUUIDdirectly instead oforgIDto match the parameter's semantic meaning.The current code passes
orgID(a valid UUID) toConvertSpecToModelLLMProxy, which expects aprojectIDparameter, then immediately overwritesproxy.ProjectUUIDwith the parsedprojectUUID. This works becauseorgIDis a valid UUID string fromresolveOrgUUID(), but it's semantically misleading—the parameter name doesn't match what's being passed.The
UpdateLLMProxyhandler already uses the correct pattern by passingprojectUUIDdirectly. Apply the same approach here to eliminate the unnecessary overwrite and clarify intent.Proposed change
- proxy := utils.ConvertSpecToModelLLMProxy(&req, orgID) - proxy.ProjectUUID, err = utils.ParseUUID(projectUUID) - if err != nil { - log.Error("CreateLLMProxy: invalid project UUID", "projectUUID", projectUUID, "error", err) - utils.WriteErrorResponse(w, http.StatusInternalServerError, "Invalid project UUID") - return - } + proxy := utils.ConvertSpecToModelLLMProxy(&req, projectUUID)agent-manager-service/controllers/gateway_controller.go (1)
234-250: Client-side pagination after fetching all gateways is inefficient.
ListGatewayscallsc.gatewayService.ListGateways(&orgID)which retrieves all gateways, then applies offset/limit in memory (lines 240–250). Additionally,getGatewayEnvironmentsFromDBis called per-gateway (line 236), making N+1 calls to the DB and OpenChoreo for every list request. As the number of gateways grows, this will degrade significantly.agent-manager-service/services/platform_gateway_service.go (2)
147-154: Redundant UUID round-trip:uuid.New()→.String()→uuid.Parse().
uuid.New()already returns a validuuid.UUID. Converting to string and parsing back is unnecessary overhead.Proposed fix
- // 4. Generate UUID for gateway - gatewayID := uuid.New().String() - - // 5. Parse and create Gateway model - gatewayUUID, err := uuid.Parse(gatewayID) - if err != nil { - return nil, fmt.Errorf("invalid gateway UUID: %w", err) - } + // 4. Generate UUID for gateway + gatewayUUID := uuid.New()Then use
gatewayUUIDdirectly andgatewayUUID.String()where the string form is needed (e.g., line 198).
638-642: Regex compiled on every validation call — compile once at package level.
regexp.MustCompileinsidevalidateGatewayInputruns on every request. This should be a package-level variable.Proposed fix
+var gatewayNamePattern = regexp.MustCompile(`^[a-z0-9-]+$`) + func (s *PlatformGatewayService) validateGatewayInput(orgID, name, displayName, vhost, functionalityType string) error { // ... - namePattern := regexp.MustCompile(`^[a-z0-9-]+$`) - if !namePattern.MatchString(name) { + if !gatewayNamePattern.MatchString(name) {
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@agent-manager-service/repositories/api_repository.go`:
- Around line 296-311: The CheckAPIExistsByNameAndVersionInOrganization function
currently queries models.Artifact by name/version/orgUUID but omits scoping by
kind; update the query in CheckAPIExistsByNameAndVersionInOrganization to
include the same kind filter used by CheckAPIExistsByHandleInOrganization (e.g.,
Where("kind = ?", KindRestAPI) or the appropriate KindRestAPI constant) before
applying the optional excludeHandle filter so only REST API artifacts are
considered and WebSubAPI/other kinds do not cause false-positive collisions.
In `@agent-manager-service/repositories/artifact_repository.go`:
- Around line 70-79: The Update method on ArtifactRepo silently succeeds when no
artifact matches; modify ArtifactRepo.Update to capture the result of the
Updates call (e.g., res :=
tx.Model(&models.Artifact{}).Where(...).Updates(...)), return res.Error if
non-nil, and if res.RowsAffected == 0 return an error (use
gorm.ErrRecordNotFound or a domain-specific NotFound error) so callers like
APIRepo.UpdateAPI can detect the missing artifact and handle the transaction
accordingly.
In `@agent-manager-service/repositories/gateway_repository.go`:
- Around line 206-215: RevokeToken in GatewayRepo updates status but doesn't
check whether any row was affected, so calls for non-existent tokens silently
succeed; modify RevokeToken (in GatewayRepo, operating on models.GatewayToken)
to capture the result of the GORM Updates call, check result.RowsAffected and
return a not-found or explicit error when RowsAffected == 0 (similar to
UpdateGateway/UpdateActiveStatus behavior), propagating any underlying DB error
otherwise.
In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Around line 179-198: The Update method on LLMProviderTemplateRepo currently
writes t.Configuration directly, which can persist empty/stale JSON when callers
set parsed fields (e.g., t.Metadata, t.PromptTokens) — add the same marshaling
step used in Create: call serializeConfiguration (or inline the marshaling of
Metadata, PromptTokens, etc.) to produce a JSON string and assign it to
t.Configuration before running
r.db.Model(&models.LLMProviderTemplate{}).Where(...).Updates(...); ensure you
handle and return any marshal error and still update updated_at, RowsAffected,
and result.Error as before.
In `@agent-manager-service/services/llm_provider_service.go`:
- Around line 94-114: The template and provider existence checks
(s.templateRepo.Exists and s.providerRepo.Exists) are performed outside the
transaction in LLMProviderService.Create, creating a TOCTOU race; move these
validations into the same transactional scope used for creation (or perform the
creation inside the transaction and handle the DB unique constraint error), so
that template validation and provider uniqueness checks execute within the
transaction that calls providerRepo.Create, and map any unique-constraint DB
error to utils.ErrLLMProviderExists to preserve the clean error path.
- Around line 240-247: The controller is calling LLMProviderService.Update and
Delete with arguments swapped (passing orgID then providerID) while the service
signatures are Update(providerID, orgID string, ...) and Delete(providerID,
orgID string); fix the controller call sites that invoke
c.providerService.Update(...) and c.providerService.Delete(...) so they pass
providerID first and orgID second (i.e., providerID, orgID, provider for Update
and providerID, orgID for Delete) to match the service method signatures and
ensure correct lookups.
🧹 Nitpick comments (11)
agent-manager-service/services/gateway_internal_service.go (2)
316-316: Extract magic number100into a named constant.The hard deployment limit is buried in the call. A package-level constant (e.g.,
maxDeploymentsPerArtifact) would be easier to find, tune, and document.Proposed fix
+const maxDeploymentsPerArtifact = 100 + // CreateGatewayDeployment handles the registration of an API deployment from a gateway func (s *GatewayInternalAPIService) CreateGatewayDeployment(- err = s.deploymentRepo.CreateWithLimitEnforcement(deployment, 100) // Hard limit + err = s.deploymentRepo.CreateWithLimitEnforcement(deployment, maxDeploymentsPerArtifact)
335-339: Variablecontextshadows the importedcontextpackage.Line 336 declares a local variable
contextwhich shadows thecontextpackage imported on line 20. While the shadow is scoped to this function and the package is not currently used withingenerateAPIDeploymentYAML, this is poor practice and could cause confusion or latent bugs if the function is extended to usecontext.Contextin the future.Rename the variable to avoid shadowing, e.g.,
apiContext.Proposed fix
- context := "/api/v1" // Default context + apiContext := "/api/v1" // Default context if api.Configuration.Context != nil && *api.Configuration.Context != "" { - context = *api.Configuration.Context + apiContext = *api.Configuration.Context }Then use
apiContexton line 350:- Context: context, + Context: apiContext,(Note:
api.Configurationis a struct value, not a pointer, so nil-checking it is unnecessary.)agent-manager-service/repositories/organization_repository.go (1)
60-83: Redundant error branches inGetOrganizationByNameandGetOrganizationByUUID.Both methods have an
if errors.Is(err, gorm.ErrRecordNotFound)branch that returns the same(nil, err)as the outer branch. TheErrRecordNotFoundcheck is unnecessary unless you plan to map it to a different sentinel error.Simplified example (apply to both methods)
func (r *OrganizationRepo) GetOrganizationByName(name string) (*models.Organization, error) { var org models.Organization err := r.db.Where("name = ?", name).First(&org).Error if err != nil { - if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, err - } return nil, err } return &org, nil }agent-manager-service/repositories/gateway_repository.go (1)
187-188: Token status values are hardcoded magic strings throughout the file.
"active"(lines 187, 221) and"revoked"(line 212) are used as raw strings. If these drift from the database or model expectations, queries will silently return wrong results. Consider extracting constants (e.g.,TokenStatusActive,TokenStatusRevoked) in the models package.Also applies to: 207-212, 221-221
agent-manager-service/services/environment_service.go (2)
181-188: Directdb.DB(ctx)call bypasses the repository layer forGatewayEnvironmentMappingqueries.All other data access in this PR goes through repository interfaces, but this method uses raw GORM via
db.DB(ctx). This creates an inconsistency: the mapping query isn't mockable for unit tests and doesn't benefit from any repository-level encapsulation.Consider adding a method like
GetMappingsByEnvironmentUUIDto theGatewayRepository(or a dedicated mapping repository) and injecting it intoEnvironmentService.
196-204: Thegateway == nilcheck on line 201 is unreachable.
GatewayRepo.GetByUUID(gateway_repository.go, line 73–83) returns(nil, err)when not found — it never returns(nil, nil). Theerr != nilcheck on line 197 already handles this case viacontinueon line 199, making the nil check on line 201 dead code.Not harmful, but worth noting for clarity.
agent-manager-service/repositories/llm_provider_template_repository.go (1)
110-127:GetByHandlereturnsgorm.ErrRecordNotFoundon not-found — inconsistent with other repos returningnil, nil.Lines 116-117 return the raw GORM sentinel error when the record is not found. The previous iteration of
GetByUUID(and many GORM repos in this codebase) returnnil, nilfor not-found, letting the service layer decide the error. Here both branches return the same error, making theif errors.Is(...)check redundant. Consider returningnil, nilfor not-found to matchGetByUUID's pattern, or at least collapse the duplicate branches.Option A: return nil, nil for not-found (preferred, matches GetByUUID)
func (r *LLMProviderTemplateRepo) GetByHandle(templateHandle, orgUUID string) (*models.LLMProviderTemplate, error) { var template models.LLMProviderTemplate err := r.db.Where("handle = ? AND organization_uuid = ?", templateHandle, orgUUID). First(&template).Error if err != nil { if errors.Is(err, gorm.ErrRecordNotFound) { - return nil, err + return nil, nil } return nil, err }agent-manager-service/controllers/llm_controller.go (3)
181-193: Pagination validation is duplicated across all List handlers.The same limit/offset clamping logic is repeated four times. Consider extracting a helper like
parsePagination(r *http.Request) (limit, offset int).Example helper
func parsePagination(r *http.Request) (limit, offset int) { limit = getIntQueryParam(r, "limit", 20) offset = getIntQueryParam(r, "offset", 0) if limit <= 0 { limit = 20 } if limit > 100 { limit = 100 } if offset < 0 { offset = 0 } return limit, offset }Also applies to: 450-462, 765-777, 824-836
59-65: Service fields are concrete types, not interfaces.
templateService,providerService, andproxyServiceare stored as*services.LLMProviderTemplateService,*services.LLMProviderService, and*services.LLMProxyService(concrete pointer types). This couples the controller to concrete implementations and makes unit testing harder. Consider defining service interfaces (or accepting existing ones) for better testability.
360-415: Excessiveslog.Infocalls in Provider handlers.
CreateLLMProvideralone has 7 Info-level log lines for a single successful request flow (lines 360, 373, 381, 388, 415 plus similar in other handlers). This will generate significant log volume in production. Consider downgrading most of these toDebugand keeping only entry/exit points atInfo.agent-manager-service/services/llm_provider_service.go (1)
63-174: Excessiveslog.Infologging — 15+ Info calls for a singleCreateflow.The
Createmethod has ~15slog.Infostatements for normal operation. This pattern is repeated across all methods. In production with moderate traffic, this will generate massive log volume. Consider usingslog.Debugfor intermediate steps and reservingslog.Infofor entry/exit/significant events only.
30a5e88 to
525b5cd
Compare
| -- Foreign key constraints | ||
| CONSTRAINT fk_mapping_provider FOREIGN KEY (llm_provider_uuid) | ||
| REFERENCES llm_providers(uuid) ON DELETE CASCADE, | ||
| CONSTRAINT fk_mapping_organization FOREIGN KEY (organization_uuid) |
a5anka
left a comment
There was a problem hiding this comment.
Bug: .Scan() returns zero-value structs instead of gorm.ErrRecordNotFound
Multiple repository methods use GORM's .Scan() but check for gorm.ErrRecordNotFound, which .Scan() never returns. This causes "ghost" zero-value records to be returned as valid entities — the deployment == nil guard in callers never triggers.
How it breaks
// deployment_repository.go:260-287 (GetWithState)
var deployment models.Deployment
err := r.db.Table("deployments d").
Select(...).Joins(...).Where(...).
Scan(&deployment).Error // Scan() — NOT First()
if err != nil {
if errors.Is(err, gorm.ErrRecordNotFound) { // ❌ DEAD CODE — Scan() never sets this
return nil, errors.New("deployment not found")
}
return nil, err
}
return &deployment, nil // ❌ Returns &Deployment{} with all zero values (uuid.Nil, "", nil, ...)When querying for a non-existent deployment:
.Scan()finds zero rows → setserr = nil- The
ErrRecordNotFoundcheck is skipped (dead code) - Returns a non-nil pointer to a zero-value
Deploymentstruct
Callers all do:
deployment, err := s.deploymentRepo.GetWithState(...)
if err != nil { ... } // err is nil — skipped
if deployment == nil { ... } // &Deployment{} is not nil — skipped
// Continues with ghost deployment where GatewayUUID = 00000000-0000-0000-0000-000000000000Concrete consequences
UndeployLLMProxyDeployment: Comparingdeployment.GatewayUUID.String()against the request'sgatewayIDreturns "gateway ID mismatch" instead of "deployment not found"GetStatus: Returns("", DeploymentStatus(""), &time.Time{0001-01-01}, nil)— a zero-time pointer that looks valid to JSON serializers but is clearly wrongGetCurrentByGateway: Returns a deployment with nilContent, which will panic or produce empty YAML whengateway_internal_service.godoesstring(deployment.Content)
Affected methods (5 total)
| File | Method | Line |
|---|---|---|
deployment_repository.go |
GetWithState |
272 |
deployment_repository.go |
GetCurrentByGateway |
181 |
deployment_repository.go |
GetStatus |
241 |
api_repository.go |
GetAPIByUUID |
130 |
api_repository.go |
GetAPIMetadataByHandle |
147 |
Suggested fix
Since these queries use custom Select and Joins clauses that may conflict with .First() (which adds ORDER BY pk LIMIT 1), the safest fix is to check RowsAffected:
result := r.db.Table("deployments d").
Select(...).Joins(...).Where(...).
Scan(&deployment)
if result.Error != nil {
return nil, result.Error
}
if result.RowsAffected == 0 {
return nil, nil // callers already handle nil as "not found"
}
return &deployment, nilApply this pattern to all 5 affected methods.
Performance Issue: Token Verification Full Table Scan — O(N*M) on Every Auth RequestFile: Problem
With N gateways and M active tokens per gateway (max 2 per
For 100 gateways: 101 DB queries and up to 200 SHA-256 hashes on every single request. Invalid tokens always hit worst case since all gateways must be scanned. Additional Concern: Silent Error Swallowing (line 440)if err != nil {
continue // Skip this gateway on error
}A transient DB error (connection timeout, deadlock) when fetching tokens for the gateway that actually owns the token causes it to be silently skipped, rejecting a valid token as Suggested FixStore a token prefix (e.g., first 8 chars of hex token) in a new indexed column on // During token creation:
tokenPrefix := plainToken[:8]
// Store in gateway_tokens.prefix column (with index)
// During verification — 2 queries instead of 1+N:
func (s *PlatformGatewayService) VerifyToken(plainToken string) (*models.PlatformGateway, error) {
prefix := plainToken[:8]
token, err := s.gatewayRepo.GetActiveTokenByPrefix(prefix) // single indexed query
if err != nil {
return nil, errors.New("invalid token")
}
if !verifyToken(plainToken, token.TokenHash, token.Salt) {
return nil, errors.New("invalid token")
}
return s.gatewayRepo.GetByUUID(token.GatewayUUID.String())
}The prefix is not secret (non-reversible subset), and SHA-256 verification still provides the actual security guarantee. This reduces auth from O(N) queries to O(1) regardless of gateway count. |
a5anka
left a comment
There was a problem hiding this comment.
WebSocket Rate Limiter Memory Leak
File: agent-manager-service/controllers/websocket_controller.go, lines 224-255
Severity: High
Problem
The rateLimitMap (map[string][]time.Time) tracks connection attempts per client IP, but entries are only cleaned up when the same IP makes a new request (lines 238-243). If an IP never connects again, its map entry persists forever. There is no background goroutine, no TTL, and no periodic sweep.
How it manifests
- IP
1.2.3.4connects →rateLimitMap["1.2.3.4"] = [10:00:00] - IP
1.2.3.4connects again within 1 min → old entries pruned, new one added - IP
1.2.3.4never connects again → entry sits in the map indefinitely
Why this matters
- Legitimate traffic: Gateways behind NATs or load balancers rotate IPs over time (pod restarts, IP reassignment). Each retired IP leaves a permanent orphan entry.
- Attack amplification via
X-Forwarded-For:getClientIP()(line 258) uses the rawX-Forwarded-Forheader as-is — not just the first IP. An attacker can generate unique map keys trivially by rotating the header value, simultaneously bypassing rate limits and inflating the map:X-Forwarded-For: random-string-1 X-Forwarded-For: random-string-2 - Memory impact: At ~50 bytes per entry, 1M unique IPs ≈ 50MB. Under sustained attack this grows without bound, eventually causing OOM.
Suggested fix
Add a periodic cleanup goroutine started in NewWebSocketController:
go func() {
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
ctrl.rateLimitMu.Lock()
cutoff := time.Now().Add(-1 * time.Minute)
for ip, attempts := range ctrl.rateLimitMap {
var recent []time.Time
for _, t := range attempts {
if t.After(cutoff) {
recent = append(recent, t)
}
}
if len(recent) == 0 {
delete(ctrl.rateLimitMap, ip)
} else {
ctrl.rateLimitMap[ip] = recent
}
}
ctrl.rateLimitMu.Unlock()
}
}()Also fix getClientIP() to parse only the first IP from X-Forwarded-For:
if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
if i := strings.Index(xff, ","); i != -1 {
return strings.TrimSpace(xff[:i])
}
return strings.TrimSpace(xff)
}Alternatively, consider replacing the custom implementation with golang.org/x/time/rate which handles expiry natively.
Non-Atomic Gateway + Token Creation in
|
| } | ||
|
|
||
| if gateway == nil { | ||
| return nil, errors.New("gateway not found") |
There was a problem hiding this comment.
Inconsistent Error Types Cause Incorrect 500 Responses
All "not found" paths in this service return errors.New("gateway not found") — a new error instance each time — instead of the sentinel utils.ErrGatewayNotFound.
The controller's handleGatewayErrors checks via errors.Is(err, utils.ErrGatewayNotFound), which compares by identity (pointer equality), not by message text. Since a fresh errors.New(...) is a different pointer than the sentinel, the check never matches and the switch falls through to the default — returning 500 Internal Server Error instead of the intended 404 Not Found.
This affects 12 occurrences across GetGateway, UpdateGateway, DeleteGateway, GetGatewayStatus, GetGatewayArtifacts, and RotateToken.
Fix: Replace all inline errors.New("gateway not found") with the sentinel:
// Before (12 occurrences in this file):
return nil, errors.New("gateway not found")
// After:
return nil, utils.ErrGatewayNotFoundLines to update: 316, 320, 354, 357, 409, 412, 462, 465, 539, 543, 592, 595
Review Finding: Shared
|
a5anka
left a comment
There was a problem hiding this comment.
Performance Issue: Client-Side Pagination in ListGateways
File: agent-manager-service/controllers/gateway_controller.go, lines 277-302
Problem
When a client calls GET /orgs/{org}/gateways?limit=10&offset=0, the limit and offset params are parsed but never passed to the service or repository layer. Instead:
- All gateways are loaded from the DB (line 277) —
ListGatewayshas no pagination support - For every gateway (not just the requested page), an N+1 pattern executes:
- 1 DB query per gateway for environment mappings (line 288)
- 1 external RPC to OpenChoreo per gateway via
c.ocClient.ListEnvironments()— which returns the same org-wide list every time
- Only then are results sliced in memory (lines 292-302), discarding the rest
For an org with 100 gateways and limit=10, this means:
- 1 DB query for all 100 gateways
- 100 DB queries for environment mappings
- 100 external RPCs to OpenChoreo (returning the same data each time)
- 90 results discarded
Response time scales with total gateways, not page size.
Additionally, there is no guard against negative offset/limit values, which would panic on the slice operation at line 302:
paginatedGateways := specGateways[start:end] // panic if start < 0Suggested Fix
// 1. Pass limit/offset to the service and repository layers
gatewaysResp, total, err := c.gatewayService.ListGateways(&orgID, filters, limit, offset)
// 2. In the repository, use DB-level pagination
func (r *GatewayRepo) ListWithFilters(opts FilterOptions, limit, offset int) ([]*Gateway, int64, error) {
var total int64
r.db.Model(&Gateway{}).Where(...).Count(&total)
var gateways []*Gateway
r.db.Where(...).Limit(limit).Offset(offset).Find(&gateways)
return gateways, total, nil
}
// 3. Cache the OpenChoreo ListEnvironments call (it's org-wide, not per-gateway)
ocEnvironments, err := c.ocClient.ListEnvironments(ctx, orgName) // call ONCE before loop
for _, gw := range gatewaysResp.List {
environments := matchEnvironments(gw.ID, mappings, ocEnvironments) // no RPC per gateway
}
// 4. Validate pagination params
if limit <= 0 { limit = defaultLimit }
if offset < 0 { offset = 0 }This reduces a 100-gateway org request from 201 queries + 100 RPCs down to 2 queries + 1 RPC.
a5anka
left a comment
There was a problem hiding this comment.
Silent Failure in CreateAndDeploy and UpdateAndSync
File: agent-manager-service/services/llm_provider_service.go
Severity: Medium
Problem 1: Invalid gateway UUIDs silently skipped
CreateAndDeploy (line 507-512):
for _, gatewayID := range gatewayIDs {
gatewayUUID, err := uuid.Parse(gatewayID)
if err != nil {
slog.Error("...invalid gateway UUID...", ...)
continue // silently skips, no error returned
}If a caller passes gatewayIDs: ["valid-uuid", "garbage", "also-garbage"], two of three gateways are silently dropped. The caller receives the created provider with no indication that 2/3 of requested gateways were invalid.
Same pattern in UpdateAndSync (line 367-374).
Problem 2: Deployment failures silently skipped
CreateAndDeploy (line 549-554):
deployment, err := deploymentService.DeployLLMProvider(created.UUID.String(), deployReq, orgID)
if err != nil {
slog.Error("...failed to deploy to gateway...", ...)
continue // silently continues
}If deployment to every single gateway fails, the method still returns (created, nil) -- a success response. The caller has no way to know that zero deployments succeeded.
Line 559 also logs the wrong count:
slog.Info("...completed", "successfulDeployments", len(gatewayIDs)) // WRONG: logs total requested, not actual successesProblem 3: Mapping storage failure also silently swallowed
CreateAndDeploy (line 522-524):
if err := s.mappingRepo.CreateBatch(mappings); err != nil {
slog.Error("...failed to store gateway mappings...", ...)
// Continue with deployment even if mapping storage fails
}The gateway-provider mappings fail to persist but deployment continues, creating a state inconsistency: the provider is deployed to gateways but the system has no record of which ones.
Problem 4: UpdateAndSync compounds the issue
- Line 370-371: Invalid gateway UUID →
continue - Line 420-424: Deploy to new gateway fails → only logged
- Line 437-439: Failed to fetch deployments for undeploy →
continue - Line 445-447: Undeploy fails → only logged
The net effect: UpdateAndSync can return (updated, nil) where some gateways were never deployed to, some old gateways were never undeployed from, and the DB mapping diverges from reality.
Impact
The controller receives (provider, nil) and returns HTTP 200/201. The user sees "success" but the provider may not be deployed anywhere. There's no mechanism to detect or recover from this partial state.
Suggested fix
Option A -- Return partial results to the caller:
type DeploymentResult struct {
GatewayID string
Success bool
Error string
}
type CreateAndDeployResponse struct {
Provider *models.LLMProvider
Deployments []DeploymentResult
}Option B -- At minimum, fail if all deployments fail:
successCount := 0
var lastErr error
for _, gatewayID := range gatewayIDs {
deployment, err := deploymentService.DeployLLMProvider(...)
if err != nil {
lastErr = err
continue
}
successCount++
}
if successCount == 0 && len(gatewayIDs) > 0 {
return nil, fmt.Errorf("all %d deployments failed, last error: %w", len(gatewayIDs), lastErr)
}
Code Review Finding #16: Massive Code Duplication in
|
| Method | Event Type String | Lines |
|---|---|---|
BroadcastDeploymentEvent |
"api.deployed" |
72-145 |
BroadcastUndeploymentEvent |
"api.undeployed" |
148-215 |
BroadcastLLMProviderDeploymentEvent |
"llmprovider.deployed" |
218-285 |
BroadcastLLMProviderUndeploymentEvent |
"llmprovider.undeployed" |
288-355 |
BroadcastLLMProxyDeploymentEvent |
"llmproxy.deployed" |
358-425 |
BroadcastLLMProxyUndeploymentEvent |
"llmproxy.undeployed" |
428-495 |
BroadcastAPIKeyCreatedEvent |
"apikey.created" |
498-565 |
Every method follows the exact same 10-step pattern:
- Generate correlation ID
- Marshal payload to JSON
- Check payload size against
MaxEventPayloadSize - Create
GatewayEventDTOwith event type string - Marshal DTO to JSON
- Check if manager is nil
- Get connections for gateway ID
- Loop connections, send, track success/failure counts
- Log broadcast summary
- Return error if all sends failed
Why It Matters
- Bug propagation risk: A fix applied to one method may be missed in the other 6 copies
- Maintenance cost: Adding retry logic, delivery acknowledgment, or any new feature requires changing 7 places
- Inconsistency: As methods diverge over time, subtle behavioral differences creep in
Suggested Fix
Extract a single internal broadcastEvent method and reduce the public methods to one-liners:
func (s *GatewayEventsService) broadcastEvent(gatewayID string, eventType string, payload interface{}) error {
correlationID := uuid.New().String()
payloadJSON, err := json.Marshal(payload)
if err != nil {
slog.Error("Failed to serialize event", "gatewayID", gatewayID, "type", eventType, "error", err)
return fmt.Errorf("failed to serialize %s event: %w", eventType, err)
}
if len(payloadJSON) > MaxEventPayloadSize {
return fmt.Errorf("event payload exceeds maximum size: %d bytes (limit: %d)", len(payloadJSON), MaxEventPayloadSize)
}
eventDTO := GatewayEventDTO{
Type: eventType,
Payload: payload,
Timestamp: time.Now().Format(time.RFC3339),
CorrelationID: correlationID,
}
eventJSON, err := json.Marshal(eventDTO)
if err != nil {
return fmt.Errorf("failed to marshal event: %w", err)
}
if s.manager == nil {
slog.Warn("WebSocket manager not initialized")
return nil
}
connections := s.manager.GetConnections(gatewayID)
if len(connections) == 0 {
return fmt.Errorf("no active connections for gateway: %s", gatewayID)
}
successCount, failureCount := 0, 0
var lastError error
for _, conn := range connections {
if err := conn.Send(eventJSON); err != nil {
failureCount++
lastError = err
slog.Error("Failed to send event", "gatewayID", gatewayID, "connectionID", conn.ConnectionID,
"correlationID", correlationID, "type", eventType, "error", err)
conn.DeliveryStats.IncrementFailed(fmt.Sprintf("send error: %v", err))
} else {
successCount++
conn.DeliveryStats.IncrementTotalSent()
}
}
slog.Info("Broadcast summary", "gatewayID", gatewayID, "correlationID", correlationID,
"type", eventType, "total", len(connections), "success", successCount, "failed", failureCount)
if successCount == 0 {
return fmt.Errorf("failed to deliver %s event to any connection: %w", eventType, lastError)
}
return nil
}
// Public methods become thin one-liners:
func (s *GatewayEventsService) BroadcastDeploymentEvent(gatewayID string, event *DeploymentEvent) error {
return s.broadcastEvent(gatewayID, "api.deployed", event)
}
func (s *GatewayEventsService) BroadcastUndeploymentEvent(gatewayID string, event *APIUndeploymentEvent) error {
return s.broadcastEvent(gatewayID, "api.undeployed", event)
}
func (s *GatewayEventsService) BroadcastLLMProviderDeploymentEvent(gatewayID string, event *models.LLMProviderDeploymentEvent) error {
return s.broadcastEvent(gatewayID, "llmprovider.deployed", event)
}
func (s *GatewayEventsService) BroadcastLLMProviderUndeploymentEvent(gatewayID string, event *models.LLMProviderUndeploymentEvent) error {
return s.broadcastEvent(gatewayID, "llmprovider.undeployed", event)
}
func (s *GatewayEventsService) BroadcastLLMProxyDeploymentEvent(gatewayID string, event *models.LLMProxyDeploymentEvent) error {
return s.broadcastEvent(gatewayID, "llmproxy.deployed", event)
}
func (s *GatewayEventsService) BroadcastLLMProxyUndeploymentEvent(gatewayID string, event *models.LLMProxyUndeploymentEvent) error {
return s.broadcastEvent(gatewayID, "llmproxy.undeployed", event)
}
func (s *GatewayEventsService) BroadcastAPIKeyCreatedEvent(gatewayID string, event *models.APIKeyCreatedEvent) error {
return s.broadcastEvent(gatewayID, "apikey.created", event)
}This reduces the file from ~565 lines to ~80 lines, eliminates duplication, and as a bonus migrates from log.Printf to structured slog (which is used everywhere else in the codebase).
Code Review Finding #18: Mixed Logging LibrariesSeverity: Medium | Files:
// Current (gateway_events_service.go):
log.Printf("[ERROR] Failed to serialize deployment event: gatewayID=%s error=%v", gatewayID, err)
// Rest of codebase:
slog.Error("Failed to serialize deployment event", "gatewayID", gatewayID, "error", err)Impact:
Fix: Replace |
Code Review Finding #20: Certificate Directory Permission Too PermissiveSeverity: Medium | File: The cert directory is created with if err := os.MkdirAll(s.cfg.CertDir, 0o755); err != nil { // line 85While the private key file itself is correctly written with Fix: Change to if err := os.MkdirAll(s.cfg.CertDir, 0o700); err != nil { |
Code Review Finding #23: Controller Directly Accesses DatabaseSeverity: Low | File: The gateway controller has a // line 371-377 (DeleteGateway)
gwUUID, err := uuid.Parse(gatewayID)
if err == nil {
if err := db.DB(ctx).Where("gateway_uuid = ?", gwUUID).Delete(&models.GatewayEnvironmentMapping{}).Error; err != nil {
log.Warn("DeleteGateway: failed to delete gateway-environment mappings", "error", err)
}
}Same pattern in Impact: Breaks separation of concerns — business logic leaks into the controller, making it harder to test and maintain. If the mapping cleanup logic ever needs to change, you'd need to find it scattered in the controller rather than in a repository. Fix: Move these DB operations into the gateway repository or service layer, and remove the |
Purpose
Fixes #374
This pull request introduces major enhancements to the agent manager service API, focusing on expanding support for LLM (Large Language Model) provider and deployment management, as well as introducing a new internal server for WebSocket and gateway internal APIs. It also reorganizes route registration, adds new configuration options, and cleans up legacy files.
API Enhancements:
RegisterLLMRoutesinllm_routes.go).RegisterLLMDeploymentRoutesinllm_deployment_routes.go).Internal Server & WebSocket Support:
MakeInternalHTTPHandlerto create a dedicated internal HTTPS server for WebSocket connections and gateway internal APIs, using API key authentication instead of JWT.RegisterWebSocketRoutes) and gateway internal routes (RegisterGatewayInternalRoutes), with proper separation from public API routes. [Configuration Improvements:
.env.examplewith new configuration options for the internal server, WebSocket, and LLM provider template paths.File Organization & Cleanup:
gateway_internal_routes.go,websocket_routes.go), removing legacy migration and client code from those locations.README.mdfrom the API platform service client, reflecting a shift away from old documentation and possibly deprecating the previous client structure.These changes collectively modernize the API surface, improve internal/external separation, and lay the groundwork for more robust LLM and gateway management.
Goals
Approach
User stories
Release note
Documentation
Training
Certification
Marketing
Automation tests
Security checks
Samples
Related PRs
Migrations (if applicable)
Test environment
Learning
Summary by CodeRabbit
New Features
Configuration