Skip to content

Refactor api-platform integration #348

Merged
menakaj merged 16 commits intowso2:mainfrom
menakaj:gw-refactor
Feb 17, 2026
Merged

Refactor api-platform integration #348
menakaj merged 16 commits intowso2:mainfrom
menakaj:gw-refactor

Conversation

@menakaj
Copy link
Copy Markdown
Contributor

@menakaj menakaj commented Feb 12, 2026

Purpose

Describe the problems, issues, or needs driving this feature/fix and include links to related issues in the following format: Resolves issue1, issue2, etc.

Fixes #374

This pull request introduces major enhancements to the agent manager service API, focusing on expanding support for LLM (Large Language Model) provider and deployment management, as well as introducing a new internal server for WebSocket and gateway internal APIs. It also reorganizes route registration, adds new configuration options, and cleans up legacy files.

API Enhancements:

  • Added new API routes for managing LLM providers, LLM provider templates, and LLM proxies, including create, list, get, update, and delete operations (RegisterLLMRoutes in llm_routes.go).
  • Introduced routes for LLM provider deployments, supporting deploy, undeploy, restore, list, get, and delete operations (RegisterLLMDeploymentRoutes in llm_deployment_routes.go).
  • Expanded gateway routes to include gateway status and artifact retrieval endpoints.

Internal Server & WebSocket Support:

  • Implemented MakeInternalHTTPHandler to create a dedicated internal HTTPS server for WebSocket connections and gateway internal APIs, using API key authentication instead of JWT.
  • Added registration for WebSocket routes (RegisterWebSocketRoutes) and gateway internal routes (RegisterGatewayInternalRoutes), with proper separation from public API routes. [

Configuration Improvements:

  • Updated .env.example with new configuration options for the internal server, WebSocket, and LLM provider template paths.

File Organization & Cleanup:

  • Renamed and repurposed files for route registration (gateway_internal_routes.go, websocket_routes.go), removing legacy migration and client code from those locations.
  • Removed the outdated README.md from the API platform service client, reflecting a shift away from old documentation and possibly deprecating the previous client structure.

These changes collectively modernize the API surface, improve internal/external separation, and lay the groundwork for more robust LLM and gateway management.

Goals

Describe the solutions that this feature/fix will introduce to resolve the problems described above

Approach

Describe how you are implementing the solutions. Include an animated GIF or screenshot if the change affects the UI (email documentation@wso2.com to review all UI text). Include a link to a Markdown file or Google doc if the feature write-up is too long to paste here.

User stories

Summary of user stories addressed by this change>

Release note

Brief description of the new feature or bug fix as it will appear in the release notes

Documentation

Link(s) to product documentation that addresses the changes of this PR. If no doc impact, enter �N/A� plus brief explanation of why there�s no doc impact

Training

Link to the PR for changes to the training content in https://github.com/wso2/WSO2-Training, if applicable

Certification

Type �Sent� when you have provided new/updated certification questions, plus four answers for each question (correct answer highlighted in bold), based on this change. Certification questions/answers should be sent to certification@wso2.com and NOT pasted in this PR. If there is no impact on certification exams, type �N/A� and explain why.

Marketing

Link to drafts of marketing content that will describe and promote this feature, including product page changes, technical articles, blog posts, videos, etc., if applicable

Automation tests

  • Unit tests

    Code coverage information

  • Integration tests

    Details about the test cases and coverage

Security checks

Samples

Provide high-level details about the samples related to this feature

Related PRs

List any other related PRs

Migrations (if applicable)

Describe migration steps and platforms on which migration has been tested

Test environment

List all JDK versions, operating systems, databases, and browser/versions on which this feature/fix was tested

Learning

Describe the research phase and any blog posts, patterns, libraries, or add-ons you used to solve the problem.

Summary by CodeRabbit

  • New Features

    • Full LLM management: templates, providers, proxies, and deployment lifecycle (deploy/undeploy/restore/list)
    • WebSocket gateway support for real-time connections and event broadcasts
    • Internal gateway APIs to fetch APIs, deployments, and LLM provider artifacts
    • Gateway status and artifacts endpoints; deployment ZIP responses
  • Configuration

    • New internal HTTPS server and WebSocket settings (host/port/certs, timeouts, max connections, rate limits)
    • New LLM template definitions path and .env defaults added

@menakaj menakaj marked this pull request as draft February 12, 2026 19:50
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds end-to-end LLM provider and deployment support, internal HTTPS/WebSocket server and routes, new models/repositories/services/controllers for APIs/gateways/LLM, many DB migrations, removes the API Platform client, and updates configuration and startup to seed LLM templates and run an internal server.

Changes

Cohort / File(s) Summary
Config & Startup
agent-manager-service/.env.example, agent-manager-service/config/config.go, agent-manager-service/config/config_loader.go, agent-manager-service/main.go
Add internal server and WebSocket config/env entries; load/validate new config blocks; seed default LLM templates at startup; start/shutdown internal HTTPS server alongside main server.
API Surface & Routing
agent-manager-service/api/app.go, agent-manager-service/api/.../gateway_routes.go, agent-manager-service/api/.../gateway_internal_routes.go, agent-manager-service/api/llm_routes.go, agent-manager-service/api/llm_deployment_routes.go, agent-manager-service/api/websocket_routes.go
Register public (/api/v1) LLM and gateway routes with JWT middleware and a separate internal handler mounted under /api/internal/v1/ for gateway-internal and WebSocket endpoints (no JWT).
Controllers
agent-manager-service/controllers/gateway_controller.go, agent-manager-service/controllers/gateway_internal_controller.go, agent-manager-service/controllers/llm_controller.go, agent-manager-service/controllers/llm_deployment_controller.go, agent-manager-service/controllers/websocket_controller.go, agent-manager-service/controllers/environment_controller.go
Add LLM, LLM-deployment, WebSocket, and internal gateway controllers; implement handlers for deployments, ZIP artifact responses, WebSocket connect/ack, rate-limiting and org/project resolution; adjust gateway controller to use service layer.
Models & DTOs
agent-manager-service/models/* (new and modified files: gateway*.go, llm_*.go, artifact.go, rest_api.go, deployment.go, association_mapping.go, api_dto.go, upstream.go, gateway_internal_dto.go, aliases.go, gateway_credentials.go, environment.go, organization.go)
Introduce ORM models and DTOs for artifacts, APIs, deployments, association mappings, gateways/tokens, LLM providers/templates/proxies, and upstream/config structures; adjust organizations/environment models and add compatibility DTOs.
Repositories
agent-manager-service/repositories/*.go (api_repository.go, artifact_repository.go, deployment_repository.go, gateway_repository.go, llm_provider_repository.go, llm_provider_template_repository.go, llm_proxy_repository.go, organization_repository.go)
Add GORM-backed repositories for APIs, artifacts, deployments, gateways/tokens, LLM providers/templates/proxies, and organizations with transactions, pagination, and scoped queries.
Services
agent-manager-service/services/*.go (notable: platform_gateway_service.go, gateway_internal_service.go, llm_deployment_service.go, llm_provider_service.go, llm_provider_template_service.go, llm_proxy_service.go, llm_template_seeder.go, gateway_events_service.go, deployment_service.go, environment_service.go, infra_resource_manager.go)
Implement service layer for gateway management, LLM provider lifecycle and deployments, template seeding, gateway internal operations, events broadcasting via WebSocket manager, and infra/org handling (on-demand org sync).
Internal Server & WebSocket infra
agent-manager-service/server/internal_server.go, agent-manager-service/clients/.../, agent-manager-service/ws/*
Add InternalServer (self-signed TLS generation, TLS1.2+, lifecycle) for internal APIs; integrate WebSocket manager usage and per-IP rate limiting in controllers/services.
OpenAPI spec / Client code
agent-manager-service/spec/*.go (new services and ~100+ model files)
Add generated OpenAPI client surface for LLM Providers/Templates/Proxies/Deployments, with request builders, Execute methods, and many spec models (nullable wrappers included).
Database migrations & db helpers
agent-manager-service/db_migrations/*, agent-manager-service/db/db.go, agent-manager-service/db_migrations/migration_list.go
Add migrations for gateways, artifacts/APIs/deployments, LLM provider templates/providers/proxies; adjust organizations/environments migrations and bump latest migration version; add GetDB() helper.
Removed API Platform client & related docs/mocks
agent-manager-service/clients/apiplatformsvc/*, agent-manager-service/clients/clientmocks/apiplatform_client_fake.go, agent-manager-service/clients/apiplatformsvc/README.md, agent-manager-service/services/organization_sync.go, agent-manager-service/services/environment_sync.go, agent-manager-service/db_migrations/002_create_environments.go, agent-manager-service/scripts/gen_client.sh
Delete the API Platform client implementation, its auth/token code, generated mocks, documentation, and previously used sync utilities; generator script tweaks removed explicit UUID mapping options.
Resources
agent-manager-service/resources/default-llm-provider-templates/*.yaml
Add default LLM provider template manifests (OpenAI, Azure, Anthropic, AWS Bedrock, Gemini, Mistral, Azure AI Foundry, etc.) used for seeding templates.
Build dependencies
agent-manager-service/go.mod
Add explicit dependency on github.com/gorilla/websocket and make gopkg.in/yaml.v3 a direct dependency.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant InternalServer
    participant GatewayInternalCtrl as GatewayInternalController
    participant GatewayInternalSvc as GatewayInternalAPIService
    participant DeploymentRepo as DeploymentRepository
    participant ArtifactRepo as ArtifactRepository

    Client->>InternalServer: GET /api/internal/v1/apis (API key)
    InternalServer->>GatewayInternalCtrl: dispatch GetAPIsByOrganization
    GatewayInternalCtrl->>GatewayInternalSvc: GetAPIsByOrganization(orgID)
    GatewayInternalSvc->>ArtifactRepo: query APIs for org
    ArtifactRepo-->>GatewayInternalSvc: list of APIs
    GatewayInternalSvc->>DeploymentRepo: generate per-API deployment YAMLs
    DeploymentRepo-->>GatewayInternalSvc: YAML content map
    GatewayInternalSvc-->>GatewayInternalCtrl: map[string]string (api->yaml)
    GatewayInternalCtrl-->>InternalServer: zip and send application/zip response
    InternalServer-->>Client: 200 application/zip (Content-Disposition)
Loading
sequenceDiagram
    participant GatewayClient
    participant InternalServer
    participant WSController as WebSocketController
    participant Manager as WS.Manager
    participant GatewaySvc as PlatformGatewayService

    GatewayClient->>InternalServer: GET /api/internal/v1/ws/gateways/connect (API key)
    InternalServer->>WSController: dispatch Connect
    WSController->>GatewaySvc: VerifyToken(apiKey)
    GatewaySvc-->>WSController: Gateway info (UUID, status)
    WSController->>Manager: Register connection (gatewayID, conn)
    Manager-->>WSController: ack registered
    WSController-->>GatewayClient: send ConnectionAckDTO
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Poem

🐰 I hopped in code, a carrot in my paw,
New templates sprouted, webs hum with awe,
Gateways whisper, internal doors swing wide,
ZIPs and tokens, services side by side,
A happy rabbit seeds this rolling tide. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Merge Conflict Detection ⚠️ Warning ❌ Merge conflicts detected (31 files):

⚔️ agent-manager-service/.env.example (content)
⚔️ agent-manager-service/api/app.go (content)
⚔️ agent-manager-service/api/gateway_routes.go (content)
⚔️ agent-manager-service/config/config.go (content)
⚔️ agent-manager-service/config/config_loader.go (content)
⚔️ agent-manager-service/controllers/environment_controller.go (content)
⚔️ agent-manager-service/controllers/gateway_controller.go (content)
⚔️ agent-manager-service/db/db.go (content)
⚔️ agent-manager-service/db_migrations/migration_list.go (content)
⚔️ agent-manager-service/docs/api_v1_openapi.yaml (content)
⚔️ agent-manager-service/go.mod (content)
⚔️ agent-manager-service/go.sum (content)
⚔️ agent-manager-service/main.go (content)
⚔️ agent-manager-service/models/environment.go (content)
⚔️ agent-manager-service/models/gateway.go (content)
⚔️ agent-manager-service/models/organization.go (content)
⚔️ agent-manager-service/scripts/gen_client.sh (content)
⚔️ agent-manager-service/services/environment_service.go (content)
⚔️ agent-manager-service/services/infra_resource_manager.go (content)
⚔️ agent-manager-service/spec/client.go (content)
⚔️ agent-manager-service/utils/api.go (content)
⚔️ agent-manager-service/utils/errors.go (content)
⚔️ agent-manager-service/wiring/params.go (content)
⚔️ agent-manager-service/wiring/wire.go (content)
⚔️ agent-manager-service/wiring/wire_gen.go (content)
⚔️ console/apps/webapp/src/Layouts/OxygenLayout/navigationItems.tsx (content)
⚔️ console/workspaces/libs/views/src/component/PageLayout/PageLayout.tsx (content)
⚔️ console/workspaces/pages/add-new-agent/src/components/EnvironmentVariable.tsx (content)
⚔️ console/workspaces/pages/add-new-agent/src/components/InternalAgentFlow.tsx (content)
⚔️ console/workspaces/pages/overview/src/AgentOverview/InternalAgentOverview.tsx (content)
⚔️ deployments/docker-compose.yml (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
Description check ❓ Inconclusive The PR description is mostly complete but only provides the 'Purpose' section with specific details about API enhancements, internal server support, configuration improvements, and file organization. However, most required sections (Goals, Approach, User stories, Release note, Documentation, Training, Certification, Marketing, Automation tests, Security checks, Samples, Related PRs, Migrations, Test environment, Learning) remain as template placeholders without substantive content. Fill in the remaining template sections (Goals, Approach, User stories, Release note, Documentation, Training, Certification, Marketing, Automation tests, Security checks, Samples, Related PRs, Migrations, Test environment, Learning) with specific details relevant to this refactoring.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Refactor api-platform integration' accurately summarizes the main change of the PR: a comprehensive refactoring of the api-platform integration with expanded LLM support.
Docstring Coverage ✅ Passed Docstring coverage is 80.39% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@menakaj menakaj marked this pull request as ready for review February 13, 2026 06:41
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
agent-manager-service/services/organization_sync.go (1)

70-88: ⚠️ Potential issue | 🟡 Minor

totalSkipped is always zero — dead variable.

totalSkipped is initialized on line 71 but never incremented anywhere, yet it's included in the summary log on line 86. Either remove it or add logic to actually track skipped organizations.

Proposed fix (remove if unused)
 	totalSynced := 0
-	totalSkipped := 0
 	totalErrors := 0
 	s.logger.Info("Organization sync completed",
 		"totalSynced", totalSynced,
-		"totalSkipped", totalSkipped,
 		"totalErrors", totalErrors,
 	)
agent-manager-service/controllers/gateway_controller.go (1)

352-388: ⚠️ Potential issue | 🟠 Major

RemoveGatewayFromEnvironment skips organization validation, creating a cross-organization authorization gap.

Unlike all other mutating handlers in this controller, this method does not resolve or validate the organization. A caller with valid gatewayID and envID UUIDs can delete gateway-environment mappings across organization boundaries. The middleware only validates parameter presence; handlers must explicitly validate org context via resolveOrgUUID. Compare with AssignGatewayToEnvironment (lines 321–350), which validates the org before performing the operation.

🤖 Fix all issues with AI agents
In `@agent-manager-service/controllers/gateway_controller.go`:
- Around line 83-90: resolveOrgUUID currently dereferences org.UUID without
checking if org is nil, causing a panic when GetOrganizationByHandle returns
(nil, nil); add the same nil-check used in llm_controller.go after calling
c.orgRepo.GetOrganizationByHandle: if org == nil { return "",
utils.ErrOrganizationNotFound }, then return org.UUID.String(), nil. This
ensures resolveOrgUUID handles a missing organization gracefully and uses
utils.ErrOrganizationNotFound as the error value.

In `@agent-manager-service/controllers/llm_controller.go`:
- Around line 513-516: Update the logging in UpdateLLMProvider to avoid nil
deref when req.Configuration is nil: compute safe strings for configuration
name/version (e.g., cfgName, cfgVersion) by checking if req.Configuration != nil
and using
ptrToStringLog(req.Configuration.Name)/ptrToStringLog(req.Configuration.Version)
only when non-nil, then pass those safe variables into log.Info; alternatively,
wrap the log.Info call in a conditional that uses an empty/default value when
req.Configuration is nil so UpdateLLMProvider never dereferences
req.Configuration.
- Around line 857-864: The code passes orgID into ConvertSpecToModelLLMProxy
causing proxy.ProjectUUID to be set to the organization UUID; resolve the
correct project UUID (as done earlier in CreateLLMProxy) and pass that projectID
into ConvertSpecToModelLLMProxy instead of orgID, ensuring proxy.ProjectUUID is
the resolved project UUID (or explicitly set proxy.ProjectUUID = projectID after
conversion) and remove/confine any leftover orgID usage to avoid confusion;
update references around proxyReq, proxy, ConvertSpecToModelLLMProxy, orgID,
projectID and CreateLLMProxy accordingly.

In `@agent-manager-service/controllers/llm_deployment_controller.go`:
- Around line 58-65: The resolveOrgUUID function currently dereferences org
without checking for nil and maps all errors to utils.ErrOrganizationNotFound;
change it so that after calling c.orgRepo.GetOrganizationByHandle(orgName) you
first check if err != nil and return that err (or wrap it) so infrastructure/DB
errors are propagated, then check if org == nil and return
utils.ErrOrganizationNotFound for the not-found case; update callers of
resolveOrgUUID to treat utils.ErrOrganizationNotFound as a 401/NotFound response
and treat any other returned error from resolveOrgUUID (propagated
DB/infrastructure errors) as a 500/Internal error.

In `@agent-manager-service/db_migrations/005_create_artifacts_and_apis.go`:
- Around line 29-40: The ap_projects table DDL in
005_create_artifacts_and_apis.go is missing the nullable deleted_at TIMESTAMP
column required by the Project model's gorm.DeletedAt; update the CREATE TABLE
for ap_projects to add a nullable deleted_at TIMESTAMP column so GORM
soft-delete queries (which append WHERE deleted_at IS NULL) will work with the
ap_projects table referenced by the Project model in models/project.go.

In `@agent-manager-service/models/project.go`:
- Around line 27-35: The Project model declares DeletedAt gorm.DeletedAt which
requires a deleted_at column, but the ap_projects migration (migration 005)
omits it; either add a nullable deleted_at TIMESTAMP (or equivalent) column to
the ap_projects DDL in that migration so GORM soft-delete filtering works, or
remove the DeletedAt field from the Project struct (and any references to
soft-delete behavior) if you don't intend to use soft deletes; update the
schema/migration and corresponding model (Project.DeletedAt) consistently so
queries no longer fail.
- Around line 27-40: Remove the unused APProject model file entirely and resolve
the schema mismatch for Project: either delete the DeletedAt field from the
Project struct (remove gorm.DeletedAt DeletedAt and its json/db tags) if soft
deletes are not required, or update the DB migration (migration 005) to add a
deleted_at timestamp column to ap_projects and keep the DeletedAt field; ensure
Project.TableName() remains "ap_projects" and run migrations/tests to verify
GORM behavior.

In `@agent-manager-service/repositories/llm_provider_repository.go`:
- Around line 155-165: The query in LLMProviderRepo.Update (and similarly in
Delete) uses tx.Table("artifacts").Select("uuid").Where(...).Scan(&providerUUID)
which never returns gorm.ErrRecordNotFound and leaves providerUUID==uuid.Nil on
no match; replace Scan with a method that reports no rows (e.g.
.Take(&providerUUID) or .First(&providerUUID)) or explicitly check
result.RowsAffected and return gorm.ErrRecordNotFound when zero, so the code
correctly logs and aborts before using providerUUID in subsequent Where("uuid =
?", providerUUID).

In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Around line 92-135: GetByID is querying the wrong column: change the WHERE in
LLMProviderTemplateRepo.GetByID to use the handle column (e.g., "handle = ? AND
organization_uuid = ?") so templateID maps to template.Handle like Delete/Exists
expect; additionally, update LLMProviderTemplateRepo.GetByUUID to mirror the
configuration unmarshaling in GetByID: read template.Configuration,
json.Unmarshal into llmProviderTemplateConfig, and populate template.Metadata,
PromptTokens, CompletionTokens, TotalTokens, RemainingTokens, RequestModel and
ResponseModel before returning.

In `@agent-manager-service/repositories/llm_proxy_repository.go`:
- Around line 219-241: In LLMProxyRepo.Delete fix the invalid SQL alias in the
artifact lookup: update the query in Delete (the tx.Table("artifacts")
Select/Where that populates proxyUUID) to reference the column as "kind" (not
"a.kind") and otherwise match the same predicate used in the Update flow; keep
the Select("uuid") and the Where parameters (proxyID, orgUUID,
models.KindLLMAPI) intact so the tx.Scan(&proxyUUID) works correctly.
- Around line 176-186: The current queries use Scan into proxyUUID which
silently yields uuid.Nil on no match; change both the Update and Delete flows to
detect missing records by either using First/Take into a struct (e.g., an
Artifact struct with UUID field) and checking for gorm.ErrRecordNotFound, or by
using Scan/Row and checking result.RowsAffected == 0 after the query;
specifically replace the
tx.Table("artifacts").Select("uuid").Where(...).Scan(&proxyUUID) pattern with a
call like tx.Table("artifacts").Select("uuid").Where(...).Take(&artifact).Error
(or use result := tx...; if result.RowsAffected == 0 return
gorm.ErrRecordNotFound) so you don't proceed using uuid.Nil in Update/Delete
methods (refer to the proxyUUID variable and those artifact lookup queries in
the Update and Delete methods).

In `@agent-manager-service/services/gateway_internal_service.go`:
- Around line 282-290: The code constructs a models.Deployment using
uuid.MustParse on apiUUID, gatewayID, and orgID which will panic on malformed
input; replace those calls with uuid.Parse, check each parse error, and return
or propagate a proper error (instead of panicking) before building the
Deployment (referencing apiUUID, gatewayID, orgID, and the deployment variable);
ensure any handler/method that calls this (where deployment is created) returns
a clear error response or logs the parsing failure so invalid request parameters
don’t crash the service.

In `@agent-manager-service/services/platform_gateway_service.go`:
- Around line 128-241: RegisterGateway currently generates plainToken and stores
only its hash (GatewayToken) but never returns the plain token to the caller;
modify the response to include the plain token by adding a Token field to the
GatewayResponse (or create a new RegisterGatewayResponse) and populate it with
plainToken before returning; update the RegisterGateway signature return type or
mapping so the returned struct includes Token, ensure you still store only
TokenHash/Salt in models.GatewayToken and do not persist the plainToken anywhere
else, and update any callers/tests that expect the previous GatewayResponse
shape accordingly.
🟠 Major comments (26)
agent-manager-service/db_migrations/003_create_organizations.go-24-25 (1)

24-25: ⚠️ Potential issue | 🟠 Major

Migration ID renumbering requires deployment verification.

The migration framework uses gormigrate which tracks applied migrations by ID in the migration_history table. If this migration was previously deployed under ID 4 to any environment, renumbering it to ID 3 creates a mismatch: the framework will find 0004 recorded as applied but will attempt to execute 0003 (not found in history), causing the CREATE TABLE organizations statement to fail on an already-existing table.

Confirm whether this migration was previously released with ID 4. If yes, document the corrective migration path for existing deployments (e.g., a script to update the migration history table before upgrading).

agent-manager-service/models/ap_project.go-26-38 (1)

26-38: ⚠️ Potential issue | 🟠 Major

Consolidate or clarify the dual model mapping to ap_projects table.

Two distinct models map to the same ap_projects table with inconsistent implementations:

  • APProject uses uuid.UUID for the ID field, lacks DeletedAt support, and has no BeforeCreate hook
  • Project uses string for the ID field, includes DeletedAt (soft delete support), and auto-generates UUIDs on creation

This creates confusion and potential data consistency issues. If writes use APProject, soft deletes won't work; if they use Project, the UUID type conversion between uuid.UUID and string may introduce bugs. Either consolidate to a single model or explicitly document which model is the primary interface and deprecate the other.

agent-manager-service/resources/default-llm-provider-templates/awsbedrock-template.yaml-37-42 (1)

37-42: ⚠️ Potential issue | 🟠 Major

Update regex to include forward slashes for ARN-formatted model identifiers.

The regex model/([A-Za-z0-9.:-]+)/ cannot capture AWS Bedrock model identifiers in ARN format. According to AWS documentation, valid modelId values in the InvokeModel API include:

  • Base model ARNs: arn:aws:bedrock:{region}::foundation-model/{model-name}
  • Inference profile ARNs: arn:aws:bedrock:{region}:{account}:inference-profile/{profile-id}

Both ARN formats contain forward slashes (/) that the current character class [A-Za-z0-9.:-] does not match. Update the regex to model/([A-Za-z0-9.:\/-]+)/ to support all valid Bedrock model identifier patterns, including ARNs.

agent-manager-service/db_migrations/004_create_gateways.go-42-46 (1)

42-46: ⚠️ Potential issue | 🟠 Major

Soft-delete deleted_at conflicts with the unique constraint.

The unique constraint uq_gateway_org_name on (organization_uuid, name) doesn't account for soft-deleted rows. If a gateway is soft-deleted (i.e., deleted_at IS NOT NULL) and a new gateway with the same org+name is created, the insert will violate the unique constraint. A common fix is a partial unique index:

Proposed fix
-			CONSTRAINT uq_gateway_org_name UNIQUE(organization_uuid, name),

And after the table creation, add:

CREATE UNIQUE INDEX uq_gateway_org_name ON gateways(organization_uuid, name) WHERE deleted_at IS NULL;
agent-manager-service/main.go-145-173 (1)

145-173: ⚠️ Potential issue | 🟠 Major

WaitGroup usage is incorrect — both Done() calls are in a single goroutine.

wg.Add(2) suggests two independent shutdown tasks, but both wg.Done() calls happen sequentially in the same goroutine. This means the shutdown is serial (mainServer must finish before internalServer starts shutting down), and both share the same 60s context — the internal server may get little to no remaining budget.

Additionally, if mainServer.Shutdown panics or the goroutine exits early, wg.Wait() will deadlock.

Consider either:

  1. Using wg.Add(1) since it's a single goroutine, or
  2. Shutting down the servers in parallel goroutines if independent shutdown is intended.
Option 2: Parallel shutdown
 	go func() {
 		<-stopCh
 		slog.Info("Shutting down servers gracefully")
 
 		shutdownCtx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
 		defer cancel()
 
 		// Shutdown WebSocket manager first
 		if dependencies.WebSocketManager != nil {
 			slog.Info("Shutting down WebSocket manager")
 			dependencies.WebSocketManager.Shutdown()
 		}
 
-		// Shutdown main server
-		if err := mainServer.Shutdown(shutdownCtx); err != nil {
-			slog.Error("Main server forced shutdown after timeout", "error", err)
-		}
-		wg.Done()
-
-		// Shutdown internal server
-		if err := internalServer.Shutdown(shutdownCtx); err != nil {
-			slog.Error("Internal server forced shutdown after timeout", "error", err)
-		}
-		wg.Done()
+		// Shutdown both servers in parallel
+		go func() {
+			defer wg.Done()
+			if err := mainServer.Shutdown(shutdownCtx); err != nil {
+				slog.Error("Main server forced shutdown after timeout", "error", err)
+			}
+		}()
+		go func() {
+			defer wg.Done()
+			if err := internalServer.Shutdown(shutdownCtx); err != nil {
+				slog.Error("Internal server forced shutdown after timeout", "error", err)
+			}
+		}()
 	}()
agent-manager-service/services/llm_template_seeder.go-87-87 (1)

87-87: ⚠️ Potential issue | 🟠 Major

Debug fmt.Println statements left in production code.

Lines 87 and 133 contain fmt.Println calls that appear to be leftover debugging statements. These should be removed or replaced with structured logging (slog.Debug/slog.Info).

🐛 Proposed fix — remove debug prints
 	for _, tpl := range s.templates {
-		fmt.Println(tpl)
 		if tpl == nil || tpl.Handle == "" {
-		fmt.Println("to create", toCreate)
 		if err := s.templateRepo.Create(toCreate); err != nil {

Also applies to: 133-133

agent-manager-service/resources/default-llm-provider-templates/anthropic-template.yaml-30-31 (1)

30-31: ⚠️ Potential issue | 🟠 Major

External URLs reference unofficial/personal GitHub repository across all LLM provider templates.

The logoUrl and openapiSpecUrl in the Anthropic template (and also in OpenAI, Mistral, Azure OpenAI, Azure AI Foundry, and AWS Bedrock templates) all point to https://raw.githubusercontent.com/nomadxd/openapi-connectors/.... This appears to be a personal or unofficial fork rather than an official source. If this repository is deleted, renamed, made private, or its owner loses access, all six provider templates will break at runtime. Consider migrating these assets to an official repository or bundling them as resources within the application.

agent-manager-service/controllers/gateway_internal_controller.go-56-77 (1)

56-77: 🛠️ Refactor suggestion | 🟠 Major

Duplicated API-key authentication boilerplate across all 4 handlers — extract into middleware.

Every handler repeats the same ~15-line block: extract api-key header, check empty, call VerifyToken, handle error. This is a textbook case for HTTP middleware that authenticates and injects the gateway into the request context.

Sketch of middleware approach
func (c *gatewayInternalController) requireAPIKey(next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        apiKey := r.Header.Get("api-key")
        if apiKey == "" {
            http.Error(w, "API key is required", http.StatusUnauthorized)
            return
        }
        gateway, err := c.gatewayService.VerifyToken(apiKey)
        if err != nil {
            http.Error(w, "Invalid or expired API key", http.StatusUnauthorized)
            return
        }
        ctx := context.WithValue(r.Context(), gatewayContextKey, gateway)
        next(w, r.WithContext(ctx))
    }
}

Then wrap handlers during route registration instead of duplicating in each handler.

Also applies to: 110-131, 175-196, 267-288

agent-manager-service/repositories/llm_provider_template_repository.go-87-87 (1)

87-87: ⚠️ Potential issue | 🟠 Major

Remove or downgrade the configuration log statement — potential sensitive data leak.

slog.Info(t.Configuration) logs the entire raw JSON configuration blob at Info level. This may contain sensitive fields such as authentication details (auth in LLMProviderTemplateMetadata) or endpoint URLs. At minimum, downgrade to slog.Debug or remove it entirely.

Proposed fix
-	slog.Info(t.Configuration)
+	slog.Debug("LLMProviderTemplateRepo.Create: configuration stored", "handle", t.Handle)
agent-manager-service/repositories/llm_provider_template_repository.go-138-146 (1)

138-146: ⚠️ Potential issue | 🟠 Major

List does not unmarshal Configuration JSON for returned templates.

Callers of List will receive templates with empty parsed fields (Metadata, PromptTokens, etc.) because the Configuration JSON string is not deserialized. If these fields are needed by API consumers, you should iterate and unmarshal, or extract the unmarshaling logic into a shared helper.

Proposed fix — extract helper and apply to List

Extract a helper method:

func populateConfigFields(t *models.LLMProviderTemplate) error {
	if t.Configuration == "" {
		return nil
	}
	var cfg llmProviderTemplateConfig
	if err := json.Unmarshal([]byte(t.Configuration), &cfg); err != nil {
		return err
	}
	t.Metadata = cfg.Metadata
	t.PromptTokens = cfg.PromptTokens
	t.CompletionTokens = cfg.CompletionTokens
	t.TotalTokens = cfg.TotalTokens
	t.RemainingTokens = cfg.RemainingTokens
	t.RequestModel = cfg.RequestModel
	t.ResponseModel = cfg.ResponseModel
	return nil
}

Then use it in List:

 func (r *LLMProviderTemplateRepo) List(orgUUID string, limit, offset int) ([]*models.LLMProviderTemplate, error) {
 	var templates []*models.LLMProviderTemplate
 	err := r.db.Where("organization_uuid = ?", orgUUID).
 		Order("created_at DESC").
 		Limit(limit).
 		Offset(offset).
 		Find(&templates).Error
+	if err != nil {
+		return nil, err
+	}
+	for _, t := range templates {
+		if err := populateConfigFields(t); err != nil {
+			return nil, err
+		}
+	}
 	return templates, err
 }
agent-manager-service/repositories/llm_provider_template_repository.go-158-176 (1)

158-176: ⚠️ Potential issue | 🟠 Major

Add configuration marshaling to Update method to match Create behavior.

The Update method does not marshal the parsed config fields (Metadata, PromptTokens, etc.) into Configuration before persisting, while Create does (lines 72-85). This causes parsed field updates in the seeder (e.g., line 110 of llm_template_seeder.go) to be silently lost because templateRepo.Update bypasses serialization.

The service layer masks this issue by calling serializeConfiguration before the repo update, but direct repository calls like in the seeder are vulnerable. Adding marshaling to Update makes the repository self-contained and prevents the bug.

Proposed fix
 func (r *LLMProviderTemplateRepo) Update(t *models.LLMProviderTemplate) error {
 	t.UpdatedAt = time.Now()
+
+	// Marshal parsed config fields into Configuration JSON
+	configJSON, err := json.Marshal(&llmProviderTemplateConfig{
+		Metadata:         t.Metadata,
+		PromptTokens:     t.PromptTokens,
+		CompletionTokens: t.CompletionTokens,
+		TotalTokens:      t.TotalTokens,
+		RemainingTokens:  t.RemainingTokens,
+		RequestModel:     t.RequestModel,
+		ResponseModel:    t.ResponseModel,
+	})
+	if err != nil {
+		return err
+	}
+	t.Configuration = string(configJSON)
+
 	result := r.db.Model(&models.LLMProviderTemplate{}).
agent-manager-service/services/gateway_internal_service.go-233-246 (1)

233-246: ⚠️ Potential issue | 🟠 Major

Hardcoded CreatedBy: "admin" — should use the actual authenticated identity.

Line 241 hardcodes "admin" as the creator. This loses audit trail integrity. The authenticated user or gateway identity should be propagated from the request context.

agent-manager-service/services/gateway_events_service.go-71-355 (1)

71-355: 🛠️ Refactor suggestion | 🟠 Major

All four broadcast methods are nearly identical — extract a common helper.

BroadcastDeploymentEvent, BroadcastUndeploymentEvent, BroadcastLLMProviderDeploymentEvent, and BroadcastLLMProviderUndeploymentEvent each repeat the same ~70-line pattern: serialize → size-check → wrap in DTO → serialize DTO → get connections → broadcast → tally results. The only differences are the event type string and the payload type. A single generic broadcast method would eliminate ~200 lines of duplication.

Sketch of a unified helper
func (s *GatewayEventsService) broadcastEvent(gatewayID, eventType string, payload interface{}) error {
	correlationID := uuid.New().String()

	payloadJSON, err := json.Marshal(payload)
	if err != nil {
		return fmt.Errorf("failed to serialize %s event: %w", eventType, err)
	}
	if len(payloadJSON) > MaxEventPayloadSize {
		return fmt.Errorf("event payload exceeds maximum size: %d bytes (limit: %d)", len(payloadJSON), MaxEventPayloadSize)
	}

	eventDTO := GatewayEventDTO{
		Type:          eventType,
		Payload:       payload,
		Timestamp:     time.Now().Format(time.RFC3339),
		CorrelationID: correlationID,
	}

	eventJSON, err := json.Marshal(eventDTO)
	if err != nil {
		return fmt.Errorf("failed to marshal event: %w", err)
	}

	if s.manager == nil {
		return nil
	}

	connections := s.manager.GetConnections(gatewayID)
	if len(connections) == 0 {
		return fmt.Errorf("no active connections for gateway: %s", gatewayID)
	}

	var successCount, failureCount int
	var lastError error
	for _, conn := range connections {
		if err := conn.Send(eventJSON); err != nil {
			failureCount++
			lastError = err
			conn.DeliveryStats.IncrementFailed(fmt.Sprintf("send error: %v", err))
		} else {
			successCount++
			conn.DeliveryStats.IncrementTotalSent()
		}
	}

	if successCount == 0 {
		return fmt.Errorf("failed to deliver %s event to any connection: %w", eventType, lastError)
	}
	return nil
}

// Then each method becomes a one-liner:
func (s *GatewayEventsService) BroadcastDeploymentEvent(gatewayID string, deployment *DeploymentEvent) error {
	return s.broadcastEvent(gatewayID, "api.deployed", deployment)
}
agent-manager-service/controllers/websocket_controller.go-215-264 (1)

215-264: ⚠️ Potential issue | 🟠 Major

rateLimitMap grows unboundedly — memory leak for long-running servers.

Old entries are pruned only when the same IP makes a new connection attempt (lines 233-238). IPs that connect once and never return will have their entries retained forever. Over time, this map will grow without bound.

Consider adding a periodic cleanup goroutine, or switching to a time-bucketed approach (e.g., golang.org/x/time/rate or a TTL cache).

agent-manager-service/controllers/websocket_controller.go-194-199 (1)

194-199: 🛠️ Refactor suggestion | 🟠 Major

readLoop breaks the Transport abstraction with a concrete type assertion.

The Transport interface (from websocket/transport.go) abstracts the underlying connection. Asserting to *ws.WebSocketTransport (line 195) couples the controller to the concrete implementation, defeating the purpose of the interface. Consider adding a ReadMessage method (or equivalent) to the Transport interface if read functionality is needed here.

agent-manager-service/services/gateway_internal_service.go-262-270 (1)

262-270: ⚠️ Potential issue | 🟠 Major

Blocking redeployment when status is UNDEPLOYED seems incorrect.

Line 268 rejects deployment if status is either DEPLOYED or UNDEPLOYED. Preventing redeployment when the API was previously undeployed from a gateway likely isn't intended — the user should be able to redeploy after an undeploy. Consider only blocking when status is DEPLOYED.

Suggested fix
-	if existingDeploymentID != "" && (status == models.DeploymentStatusDeployed || status == models.DeploymentStatusUndeployed) {
+	if existingDeploymentID != "" && status == models.DeploymentStatusDeployed {
 		return nil, fmt.Errorf("API already deployed to this gateway")
 	}
agent-manager-service/services/gateway_internal_service.go-305-329 (1)

305-329: ⚠️ Potential issue | 🟠 Major

generateAPIDeploymentYAML ignores the API's actual context and operations.

Line 320 hardcodes Context: "/api/v1" instead of using api.Configuration.Context. The Operations field in the YAML spec is also left empty, discarding whatever operations the API defines. This generates an incomplete/inaccurate deployment YAML.

Suggested improvement
 	deployment := APIDeploymentYAML{
 		ApiVersion: "gateway.api-platform.wso2.com/v1alpha1",
 		Kind:       api.Kind,
 		Metadata: DeploymentMetadata{
 			Name: api.Handle,
 		},
 		Spec: APIDeploymentSpec{
 			Name:    api.Name,
 			Version: api.Version,
-			Context: "/api/v1", // Default context
+			Context: getContextOrDefault(api),
 		},
 	}

Where getContextOrDefault returns the API's configured context or a sensible default if unset.

agent-manager-service/controllers/websocket_controller.go-155-171 (1)

155-171: ⚠️ Potential issue | 🟠 Major

Setting gateway active status to false on any connection close may be incorrect with multiple connections.

The Manager supports multiple connections per gateway (see connections sync.Map comment: "Supports multiple connections per gateway ID for clustering scenarios"). When one connection closes (line 166-169), the gateway is marked inactive even if other connections remain active. Check remaining connections before toggling status.

Suggested approach
 	// Connection closed - cleanup
 	log.Info("WebSocket connection closed", "gatewayID", gateway.UUID.String(), "connectionID", connection.ConnectionID)
 	c.manager.Unregister(gateway.UUID.String(), connection.ConnectionID)
 
 	// Update gateway active status to false when connection is disconnected
-	if err := c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(), false); err != nil {
-		log.Error("Failed to update gateway active status to false", "gatewayID", gateway.UUID.String(), "error", err)
+	// Only mark inactive if no other connections remain for this gateway
+	remaining := c.manager.GetConnections(gateway.UUID.String())
+	if len(remaining) == 0 {
+		if err := c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(), false); err != nil {
+			log.Error("Failed to update gateway active status to false", "gatewayID", gateway.UUID.String(), "error", err)
+		}
 	}
agent-manager-service/controllers/websocket_controller.go-48-54 (1)

48-54: ⚠️ Potential issue | 🟠 Major

Remove duplicate ConnectionAckDTO definition from websocket_controller.go.

Two ConnectionAckDTO structs are defined:

  • agent-manager-service/models/gateway_internal_dto.go (lines 78-82): fields Type, Message, GatewayID
  • agent-manager-service/controllers/websocket_controller.go (lines 49-53): fields Type, GatewayID, ConnectionID, Timestamp

Having duplicate struct definitions with the same name but different fields creates confusion and maintenance burden. Consolidate into a single DTO in the models package.

agent-manager-service/repositories/deployment_repository.go-167-190 (1)

167-190: ⚠️ Potential issue | 🟠 Major

Scan() silently returns a zero-valued struct on miss — same issue as in api_repository.go.

GetCurrentByGateway, GetWithState (line 273), and GetStatus (line 242) all use .Scan() which never raises gorm.ErrRecordNotFound. The not-found guards are dead code, and callers receive a non-nil zero-valued struct (or zero-valued result) rather than the documented nil/empty return.

For GetCurrentByGateway, use RowsAffected to detect the empty-result case (similar to the fix suggested for api_repository.go). Apply the same pattern to GetWithState and GetStatus.

🐛 Proposed fix for GetCurrentByGateway
-	err := r.db.Table("deployments d").
+	result := r.db.Table("deployments d").
 		Select("d.deployment_id, d.name, d.artifact_uuid, d.organization_uuid, d.gateway_uuid, "+
 			"d.base_deployment_id, d.content, d.metadata, d.created_at, "+
 			"s.status, s.updated_at AS status_updated_at").
 		Joins("INNER JOIN deployment_status s ON d.deployment_id = s.deployment_id "+
 			"AND d.artifact_uuid = s.artifact_uuid "+
 			"AND d.organization_uuid = s.organization_uuid "+
 			"AND d.gateway_uuid = s.gateway_uuid").
 		Where("d.artifact_uuid = ? AND d.gateway_uuid = ? AND d.organization_uuid = ? AND s.status = ?",
 			artifactUUID, gatewayID, orgUUID, string(models.DeploymentStatusDeployed)).
 		Order("d.created_at DESC").
 		Limit(1).
-		Scan(&deployment).Error
-	if err != nil {
-		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, nil
-		}
-		return nil, err
+		Scan(&deployment)
+	if result.Error != nil {
+		return nil, result.Error
+	}
+	if result.RowsAffected == 0 {
+		return nil, nil
 	}
 	return &deployment, nil
agent-manager-service/repositories/api_repository.go-122-139 (1)

122-139: ⚠️ Potential issue | 🟠 Major

Scan() never returns gorm.ErrRecordNotFound — caller receives a zero-valued struct instead of nil on miss.

GORM's .Scan() populates the target and returns nil error even when zero rows match; only .First() / .Take() / .Last() raise ErrRecordNotFound. The guard on line 133 is dead code, so when no API exists the function returns &api (all zero values) rather than nil, which will silently mislead callers.

The same issue applies to GetAPIMetadataByHandle (line 142).

🐛 Proposed fix — use `RowsAffected`
 func (r *APIRepo) GetAPIByUUID(apiUUID, orgUUID string) (*models.API, error) {
 	var api models.API
-	err := r.db.Table("rest_apis a").
+	result := r.db.Table("rest_apis a").
 		Select("art.uuid, art.handle, art.name, art.kind, a.description, art.version, a.created_by, "+
 			"a.project_uuid, art.organization_uuid, a.lifecycle_status, "+
 			"a.transport, a.configuration, art.created_at, art.updated_at").
 		Joins("INNER JOIN artifacts art ON a.uuid = art.uuid").
-		Where("a.uuid = ? AND art.organization_uuid = ?", apiUUID, orgUUID).
-		Scan(&api).Error
-	if err != nil {
-		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, nil
-		}
-		return nil, err
+		Where("a.uuid = ? AND art.organization_uuid = ?", apiUUID, orgUUID).
+		Scan(&api)
+	if result.Error != nil {
+		return nil, result.Error
+	}
+	if result.RowsAffected == 0 {
+		return nil, nil
 	}
 	return &api, nil
 }

Apply the same fix to GetAPIMetadataByHandle.

agent-manager-service/services/platform_gateway_service.go-414-441 (1)

414-441: ⚠️ Potential issue | 🟠 Major

VerifyToken fetches ALL gateways and their tokens on every call — O(n) DB round-trips.

This performs one List() call plus one GetActiveTokensByGatewayUUID call per gateway. For every token verification (likely on every incoming request from a gateway), this scans the entire gateway table. This will not scale as the number of gateways grows.

Consider a single query that joins gateway_tokens (where status='active') with gateways and loads all active token hashes at once, or use a token-prefix lookup index to narrow the search.

agent-manager-service/services/platform_gateway_service.go-215-222 (1)

215-222: ⚠️ Potential issue | 🟠 Major

Gateway and token creation are not transactional.

If CreateToken (Line 220) fails after Create (Line 216) succeeds, the gateway exists without any authentication token. Consider wrapping both operations in a DB transaction.

agent-manager-service/services/llm_deployment_service.go-329-336 (1)

329-336: ⚠️ Potential issue | 🟠 Major

Restore silently overwrites a different active deployment without undeploying it.

When currentDeploymentID != deploymentID but status == DeploymentStatusDeployed, the code proceeds to SetCurrent which implicitly replaces the currently-deployed revision. The previous deployment transitions to an undefined state — no undeployment event is broadcast for it, and its status record is overwritten.

If this is intentional (atomic swap), consider broadcasting an undeployment event for the previously active deployment so the gateway reflects the change. If unintentional, add a guard or explicit undeploy step.

agent-manager-service/services/llm_deployment_service.go-179-183 (1)

179-183: ⚠️ Potential issue | 🟠 Major

Error from GetWithContent is silently replaced with ErrBaseDeploymentNotFound.

If the repository call fails for a transient reason (e.g., DB connection error), the actual error is discarded and the caller receives a misleading "base deployment not found" error. Distinguish not-found from infrastructure errors.

🐛 Proposed fix
 		baseDeployment, err := s.deploymentRepo.GetWithContent(req.Base, provider.UUID.String(), orgID)
 		if err != nil {
-			slog.Warn("LLMProviderDeploymentService.DeployLLMProvider: base deployment not found", "providerID", providerID, "baseDeploymentID", req.Base, "error", err)
-			return nil, utils.ErrBaseDeploymentNotFound
+			slog.Error("LLMProviderDeploymentService.DeployLLMProvider: failed to get base deployment", "providerID", providerID, "baseDeploymentID", req.Base, "error", err)
+			return nil, fmt.Errorf("failed to get base deployment: %w", err)
+		}
+		if baseDeployment == nil {
+			slog.Warn("LLMProviderDeploymentService.DeployLLMProvider: base deployment not found", "providerID", providerID, "baseDeploymentID", req.Base)
+			return nil, utils.ErrBaseDeploymentNotFound
 		}
agent-manager-service/services/platform_gateway_service.go-624-687 (1)

624-687: ⚠️ Potential issue | 🟠 Major

validateGatewayInput trims local copies but the caller uses the original untrimmed values.

Lines 635, 658, 667, 673 call strings.TrimSpace on local parameter copies (name, displayName, vhost, functionalityType). The validation passes, but the caller in RegisterGateway still uses the original (potentially whitespace-padded) values when constructing the Gateway model. For example, a name like " my-gw " passes validation here (after local trim) but is stored with leading/trailing spaces.

Also, regexp.MustCompile on Line 647 recompiles the regex on every call. Move it to a package-level var.

🐛 Proposed fix: compile regex at package level and trim at the call site
+var namePattern = regexp.MustCompile(`^[a-z0-9-]+$`)
+
 func (s *PlatformGatewayService) validateGatewayInput(orgID, name, displayName, vhost, functionalityType string) error {

For the trimming issue, either:

  • Accept pointer parameters and write back trimmed values, or
  • Trim inputs in RegisterGateway before calling validateGatewayInput and passing them to the model constructor.
🟡 Minor comments (18)
agent-manager-service/resources/default-llm-provider-templates/awsbedrock-template.yaml-26-27 (1)

26-27: ⚠️ Potential issue | 🟡 Minor

External resource URLs reference a personal GitHub repository.

Both logoUrl and openapiSpecUrl point to https://raw.githubusercontent.com/nomadxd/openapi-connectors/.... Personal repos can be deleted, renamed, or made private at any time, which would break these references. Consider hosting these assets in an organization-owned repository or a stable CDN.

agent-manager-service/models/artifact.go-26-35 (1)

26-35: ⚠️ Potential issue | 🟡 Minor

Inconsistent JSON tag casing compared to other new models.

Artifact uses snake_case JSON tags ("organization_uuid", "created_at", "updated_at"), while APProject, AssociationMapping, and GatewayCredentials in this same PR use camelCase ("organizationId", "createdAt"). If these models are serialized in the same API surface, this inconsistency will be visible to consumers.

Suggested alignment to camelCase
 type Artifact struct {
-	UUID             uuid.UUID `gorm:"column:uuid;primaryKey" json:"uuid"`
-	Handle           string    `gorm:"column:handle" json:"handle"`
-	Name             string    `gorm:"column:name" json:"name"`
-	Version          string    `gorm:"column:version" json:"version"`
-	Kind             string    `gorm:"column:kind" json:"kind"`
-	OrganizationUUID uuid.UUID `gorm:"column:organization_uuid" json:"organization_uuid"`
-	CreatedAt        time.Time `gorm:"column:created_at" json:"created_at"`
-	UpdatedAt        time.Time `gorm:"column:updated_at" json:"updated_at"`
+	UUID             uuid.UUID `gorm:"column:uuid;primaryKey" json:"uuid"`
+	Handle           string    `gorm:"column:handle" json:"handle"`
+	Name             string    `gorm:"column:name" json:"name"`
+	Version          string    `gorm:"column:version" json:"version"`
+	Kind             string    `gorm:"column:kind" json:"kind"`
+	OrganizationUUID uuid.UUID `gorm:"column:organization_uuid" json:"organizationId"`
+	CreatedAt        time.Time `gorm:"column:created_at" json:"createdAt"`
+	UpdatedAt        time.Time `gorm:"column:updated_at" json:"updatedAt"`
 }
agent-manager-service/models/gateway_internal_dto.go-46-46 (1)

46-46: ⚠️ Potential issue | 🟡 Minor

Bug: YAML tag upstream (singular) doesn't match JSON tag upstreams (plural).

Upstreams has json:"upstreams" but yaml:"upstream". If this DTO is ever parsed or emitted as YAML, the field key won't match, silently dropping data. This looks like a typo.

🐛 Proposed fix
-	Upstreams   []UpstreamSimple `json:"upstreams" yaml:"upstream" binding:"required"`
+	Upstreams   []UpstreamSimple `json:"upstreams" yaml:"upstreams" binding:"required"`
agent-manager-service/resources/default-llm-provider-templates/openai-template.yaml-31-32 (1)

31-32: ⚠️ Potential issue | 🟡 Minor

logoUrl and openapiSpecUrl reference a personal GitHub repository — fragile for production use.

Both URLs point to raw.githubusercontent.com/nomadxd/openapi-connectors/.... If that personal repo is deleted, renamed, or set to private, these resources will break silently. Consider hosting these assets in an organization-controlled location or bundling them within this repository.

agent-manager-service/db_migrations/005_create_artifacts_and_apis.go-29-57 (1)

29-57: ⚠️ Potential issue | 🟡 Minor

Inconsistent use of CREATE TABLE IF NOT EXISTS vs CREATE TABLE.

ap_projects (Line 29) uses CREATE TABLE IF NOT EXISTS while all other tables (artifacts, rest_apis, deployments, etc.) use plain CREATE TABLE. If there's a reason ap_projects might already exist (e.g., created by another path), the reasoning should be documented. Otherwise, make all statements consistent — either all use IF NOT EXISTS or none do.

agent-manager-service/db_migrations/004_create_gateways.go-72-72 (1)

72-72: ⚠️ Potential issue | 🟡 Minor

Partial index on deleted_at IS NOT NULL is likely inverted.

The index idx_gateways_deleted covers rows where deleted_at IS NOT NULL (i.e., deleted records). The more common query pattern is filtering for active records (WHERE deleted_at IS NULL). Consider whether this should be inverted, or if both patterns are needed.

agent-manager-service/main.go-182-182 (1)

182-182: ⚠️ Potential issue | 🟡 Minor

Use errors.Is instead of != for error comparison.

Per the linter (errorlint): wrapped errors won't match with !=. The same applies to line 190.

Proposed fix

Add "errors" to imports, then:

-		if err := internalServer.Start(); err != nil && err != http.ErrServerClosed {
+		if err := internalServer.Start(); err != nil && !errors.Is(err, http.ErrServerClosed) {
-	if err := mainServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
+	if err := mainServer.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
agent-manager-service/main.go-249-252 (1)

249-252: ⚠️ Potential issue | 🟡 Minor

org.UUID.String() == "" is always false — UUID zero value stringifies to "00000000-...", not "".

Use org.UUID == uuid.Nil to check for a zero-value UUID.

Proposed fix
-		if org == nil || org.UUID.String() == "" {
+		if org == nil || org.UUID == uuid.Nil {

This requires adding "github.com/google/uuid" to the imports.

agent-manager-service/server/internal_server.go-64-64 (1)

64-64: ⚠️ Potential issue | 🟡 Minor

Malformed slog.Info call — positional args instead of key-value pairs.

slog.Info expects alternating key-value pairs for structured logging. Passing certPath and keyPath as bare positional arguments will produce garbled output (the first string becomes a key with the second as its value, or similar unexpected behavior).

🐛 Proposed fix
-	slog.Info("initializing with certs", certPath, keyPath)
+	slog.Info("initializing with certs", "certPath", certPath, "keyPath", keyPath)
agent-manager-service/models/gateway_compat.go-38-42 (1)

38-42: ⚠️ Potential issue | 🟡 Minor

GatewayEnvironmentMapping should define a TableName() method for consistency and explicitness with GORM operations.

Every model in this codebase (Gateway, Environment, LLMProviderTemplate, etc.) defines a TableName() method. GatewayEnvironmentMapping is actively used with GORM for queries, inserts, and deletes, yet lacks this method. Without it, GORM relies on inference, which is fragile and inconsistent with the established pattern.

agent-manager-service/controllers/gateway_internal_controller.go-106-106 (1)

106-106: ⚠️ Potential issue | 🟡 Minor

Unchecked return value of w.Write — flagged by linter.

Static analysis flags w.Write(zipData) on lines 106, 171, and 328 for unchecked error return. While typically benign, it's best practice to handle or explicitly discard.

Proposed fix (apply to all three locations)
-	w.Write(zipData)
+	if _, err := w.Write(zipData); err != nil {
+		log.Error("Failed to write response", "error", err)
+	}

Also applies to: 171-171, 328-328

agent-manager-service/controllers/gateway_internal_controller.go-214-218 (1)

214-218: ⚠️ Potential issue | 🟡 Minor

Raw JSON decode error exposed to client — potential information leakage.

"Invalid request body: "+err.Error() (line 216) passes internal parse error details to the caller. This could expose structure expectations (field names, types) to an attacker.

Proposed fix
-		http.Error(w, "Invalid request body: "+err.Error(), http.StatusBadRequest)
+		http.Error(w, "Invalid request body", http.StatusBadRequest)
agent-manager-service/controllers/websocket_controller.go-252-264 (1)

252-264: ⚠️ Potential issue | 🟡 Minor

X-Forwarded-For may contain multiple comma-separated IPs.

r.Header.Get("X-Forwarded-For") returns the full header value (e.g., "client, proxy1, proxy2"). Using it as-is for rate-limiting keys means different header values map to different entries, allowing trivial bypasses. Parse and use only the first (leftmost) IP.

Suggested fix
 func getClientIP(r *http.Request) string {
 	// Check X-Forwarded-For header first
 	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
-		return xff
+		// Use the first IP in the comma-separated list
+		if idx := strings.Index(xff, ","); idx != -1 {
+			return strings.TrimSpace(xff[:idx])
+		}
+		return strings.TrimSpace(xff)
 	}

Note: You'll need to add "strings" to the imports.

agent-manager-service/controllers/websocket_controller.go-122-133 (1)

122-133: ⚠️ Potential issue | 🟡 Minor

Unchecked error returns and misleading variable name on error path.

Line 129: jsonErr is actually the []byte result of json.Marshal, not an error — the naming is confusing. Lines 130 and 132: conn.WriteMessage and conn.Close error returns are unchecked (flagged by linter). While these are best-effort on a failing connection, the linter errors should be addressed.

Proposed fix
-		if jsonErr, _ := json.Marshal(errorMsg); jsonErr != nil {
-			conn.WriteMessage(websocket.TextMessage, jsonErr)
-		}
-		conn.Close()
+		if msgBytes, err := json.Marshal(errorMsg); err == nil {
+			_ = conn.WriteMessage(websocket.TextMessage, msgBytes)
+		}
+		_ = conn.Close()
agent-manager-service/repositories/api_repository.go-1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Copyright year mismatch flagged by linter.

The header says 2025; other new files in this PR use 2026. Update to match.

agent-manager-service/controllers/llm_controller.go-114-119 (1)

114-119: ⚠️ Potential issue | 🟡 Minor

HTTP 401 is semantically incorrect for "organization not found".

401 Unauthorized signals missing/invalid authentication credentials. When the organization handle doesn't resolve, 404 Not Found (or 403 Forbidden if it's an access-control decision) would be the appropriate status code. This pattern is repeated across all handlers in the file.

agent-manager-service/repositories/api_repository.go-297-312 (1)

297-312: ⚠️ Potential issue | 🟡 Minor

CheckAPIExistsByNameAndVersionInOrganization doesn't filter by artifact kind.

Unlike CheckAPIExistsByHandleInOrganization (line 289, filters on kind = KindRestAPI), this method matches all artifact kinds. A WebSubAPI (or any future kind) with the same name+version would produce a false positive.

Proposed fix
 	query := r.db.Model(&models.Artifact{}).
-		Where("name = ? AND version = ? AND organization_uuid = ?", name, version, orgUUID)
+		Where("name = ? AND version = ? AND organization_uuid = ? AND kind = ?", name, version, orgUUID, models.KindRestAPI)
agent-manager-service/services/platform_gateway_service.go-484-493 (1)

484-493: ⚠️ Potential issue | 🟡 Minor

uuid.MustParse(gatewayID) can panic if gatewayID is malformed.

Although GetByUUID was called earlier (which may validate the format), MustParse will panic rather than return an error. Prefer uuid.Parse with error handling, or reuse the UUID already validated earlier (e.g., store the parsed UUID from the gateway object).

🐛 Proposed fix
-		GatewayUUID: uuid.MustParse(gatewayID),
+		GatewayUUID: gateway.UUID,

Comment thread agent-manager-service/controllers/llm_controller.go Outdated
Comment thread agent-manager-service/controllers/llm_controller.go Outdated
Comment thread agent-manager-service/controllers/llm_deployment_controller.go
Comment thread agent-manager-service/db_migrations/005_create_artifacts_and_apis.go Outdated
Comment thread agent-manager-service/models/project.go Outdated
Comment thread agent-manager-service/repositories/llm_proxy_repository.go
Comment thread agent-manager-service/repositories/llm_proxy_repository.go
Comment thread agent-manager-service/services/gateway_internal_service.go
Comment thread agent-manager-service/services/platform_gateway_service.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@agent-manager-service/controllers/gateway_internal_controller.go`:
- Around line 216-222: The handler currently returns the raw parse error to the
client by appending err.Error() to the HTTP response; instead keep the detailed
error only in logs and return a generic message. Update the json decoding error
branch (around json.NewDecoder(r.Body).Decode(&notificationReq)) to log the full
err via log.Warn (or similar) but call http.Error(w, "Invalid request body",
http.StatusBadRequest) without including err.Error(); ensure notificationReq,
json.NewDecoder, log.Warn and http.Error are the symbols you modify.

In `@agent-manager-service/main.go`:
- Around line 189-194: The error check after calling mainServer.ListenAndServe()
is using a direct comparison (err != http.ErrServerClosed) which is inconsistent
with the earlier use of errors.Is; update the condition to use errors.Is(err,
http.ErrServerClosed) (i.e., treat the error as non-fatal only if errors.Is
returns true) so the block becomes: if err := mainServer.ListenAndServe(); err
!= nil && !errors.Is(err, http.ErrServerClosed) { ... } and ensure the same
pattern is used when logging via slog.Error and exiting.
- Around line 146-174: The shutdown routine currently uses a single shared
shutdownCtx and calls wg.Add(2) even though one goroutine performs both
shutdowns; change wg.Add(2) to wg.Add(1) (and a single wg.Done()), and create
separate contexts for each server so each gets its own 60s timeout: inside the
goroutine, after <-stopCh create shutdownCtxMain, cancelMain :=
context.WithTimeout(...), defer cancelMain (or cancel immediately after the
Shutdown call) and call mainServer.Shutdown(shutdownCtxMain); then create
shutdownCtxInternal, cancelInternal := context.WithTimeout(...), defer/cancel
and call internalServer.Shutdown(shutdownCtxInternal); keep logging around
dependencies.WebSocketManager.Shutdown() as is and handle errors for each
Shutdown call separately.
- Around line 202-224: The function loadAndSeedLLMTemplates mutates
cfg.LLMTemplateDefinitionsPath in-place when trying a fallback path and that
causes a side-effect on the shared config; instead, introduce a local variable
(e.g., templatePath or attemptedPath) initialized from
cfg.LLMTemplateDefinitionsPath and use that for calls to
utils.LoadLLMProviderTemplatesFromDirectory and for computing fallbackPath,
updating only the local variable when the fallback succeeds (and not cfg); also
simplify the fallback conditions by computing a cleaned path once (using
filepath.Clean and strings.TrimSpace) and then checking the few necessary
predicates before joining "src" to form fallbackPath so the logic in
loadAndSeedLLMTemplates is clearer and has no global config mutation.
- Around line 250-253: The guard checking org.UUID.String() == "" is ineffective
because uuid.UUID's zero value string is not empty; update the nil UUID check to
compare org.UUID against uuid.Nil instead. Locate the loop that iterates orgs
and replace the condition org == nil || org.UUID.String() == "" with a check
that skips when org == nil || org.UUID == uuid.Nil, ensuring you import or
reference github.com/google/uuid where needed and keep behavior consistent with
other usages of uuid.Nil in the codebase.
🧹 Nitpick comments (4)
agent-manager-service/controllers/gateway_internal_controller.go (3)

56-109: Extract the repeated API-key authentication boilerplate into a shared helper or middleware.

All four handlers (GetAPIsByOrganization, GetAPI, CreateGatewayDeployment, GetLLMProvider) repeat the same ~15 lines for extracting the API key, logging the client IP, and calling VerifyToken. This violates DRY and makes it easy to introduce inconsistencies (e.g., different error messages across handlers).

Consider extracting this into a small middleware or helper that returns (gateway, error) so each handler can start from the authenticated gateway directly.

♻️ Example helper sketch
// authenticateGateway extracts the API key, verifies it, and returns the gateway.
func (c *gatewayInternalController) authenticateGateway(w http.ResponseWriter, r *http.Request) (*models.Gateway, bool) {
	log := logger.GetLogger(r.Context())
	clientIP := getClientIP(r)

	apiKey := r.Header.Get("api-key")
	if apiKey == "" {
		log.Warn("Unauthorized access attempt - Missing API key", "ip", clientIP)
		http.Error(w, "API key is required. Provide 'api-key' header.", http.StatusUnauthorized)
		return nil, false
	}

	gateway, err := c.gatewayService.VerifyToken(apiKey)
	if err != nil {
		log.Warn("Authentication failed", "ip", clientIP, "error", err)
		http.Error(w, "Invalid or expired API key", http.StatusUnauthorized)
		return nil, false
	}
	return gateway, true
}

Each handler then becomes:

func (c *gatewayInternalController) GetAPIsByOrganization(w http.ResponseWriter, r *http.Request) {
	gateway, ok := c.authenticateGateway(w, r)
	if !ok {
		return
	}
	// ... rest of handler
}

Also applies to: 112-176, 178-268, 271-335


135-141: Consider validating apiId and providerId path parameters as valid UUIDs.

These values are user-supplied and used in DB queries and interpolated into Content-Disposition headers. While the DB query would fail for invalid IDs, early UUID validation provides clearer error messages and avoids unnecessary DB round-trips.

Also applies to: 294-300


228-240: Request DTO includes unnecessary fields that are silently dropped during conversion.

The models.DeploymentNotification request requires ID, Status, CreatedAt, UpdatedAt, DeployedAt, and DeployedVersion from the caller, but the conversion to services.DeploymentNotification (lines 228-240) only uses ProjectIdentifier and Configuration. Since the service auto-generates timestamps instead of using the caller-provided ones, these fields should be removed from the request DTO to clarify the API contract and avoid confusion.

agent-manager-service/api/app.go (1)

63-91: CORS middleware on the internal server may be unnecessary.

The internal server is meant for gateway-to-service communication using API-key auth. CORS is a browser-enforced mechanism and typically irrelevant for server-to-server traffic. Applying it here adds minor overhead and could inadvertently allow browser-based cross-origin requests to reach internal endpoints if the allowed origin is permissive.

Consider whether CORS is needed here or if it was copied from the public handler.

Comment thread agent-manager-service/controllers/gateway_internal_controller.go
Comment thread agent-manager-service/main.go
Comment thread agent-manager-service/main.go
Comment thread agent-manager-service/main.go
Comment thread agent-manager-service/main.go
@menakaj menakaj force-pushed the gw-refactor branch 2 times, most recently from b427611 to f2343e2 Compare February 13, 2026 10:33
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@agent-manager-service/controllers/gateway_controller.go`:
- Around line 536-541: The handler sets artifactType :=
r.URL.Query().Get("type") and then defaults it to "all", which causes
PlatformGatewayService.GetGatewayArtifacts to skip all artifacts because it
expects an empty string for "no filter"; remove the default assignment (do not
set artifactType = "all") so the empty string is passed through to
PlatformGatewayService.GetGatewayArtifacts (leave artifactType as returned by
r.URL.Query().Get("type") so GetGatewayArtifacts can treat "" as no filter).
- Around line 83-93: The resolveOrgUUID function currently maps all errors from
orgRepo.GetOrganizationByHandle to utils.ErrOrganizationNotFound, which masks DB
errors and causes callers to return 401; change resolveOrgUUID so it returns the
original error (or a wrapped DB error) when GetOrganizationByHandle returns a
non-nil err, and only return utils.ErrOrganizationNotFound when org == nil;
update callers of resolveOrgUUID (following the pattern in
llm_deployment_controller.go) to translate utils.ErrOrganizationNotFound into
http.StatusNotFound and translate repository/DB errors into a 500-level response
instead of 401.

In `@agent-manager-service/controllers/gateway_internal_controller.go`:
- Around line 229-240: The DTO mapping in CreateGatewayDeployment is wrong: set
Configuration.ApiVersion to the CRD version string used in
generateAPIDeploymentYAML (e.g., "gateway.api-platform.wso2.com/v1alpha1")
instead of notificationReq.Configuration.Version, and populate
services.APIDeploymentSpec.Operations from
notificationReq.Configuration.Spec.Operations so operations are not dropped;
update the construction of notification (services.DeploymentNotification ->
Configuration -> Spec) to include the Operations field and use the CRD
ApiVersion constant or literal from generateAPIDeploymentYAML.

In `@agent-manager-service/controllers/websocket_controller.go`:
- Around line 224-255: The rateLimitMap growth causes an unbounded memory leak
because entries are only touched on reconnects; update the implementation to
evict stale IP entries by adding a background cleanup goroutine (or use a
time-expiring cache) that periodically scans rateLimitMap and removes keys whose
last attempt(s) are older than the 1-minute window: implement a cleanupTicker
started from websocketController initialization that locks rateLimitMu, iterates
entries in rateLimitMap, and deletes entries with no timestamps after now.Add(-1
* time.Minute); alternatively store a lastSeen timestamp per IP and evict when
lastSeen is older than the threshold; ensure proper shutdown of the goroutine
and safe use of rateLimitMu in both checkRateLimit and the cleaner.
- Around line 160-176: The current handler unconditionally calls
c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(), false) after
c.manager.Unregister(...), which incorrectly marks the gateway inactive when
other connections may still exist; modify the flow in the websocket handler to
call c.manager.Unregister(gateway.UUID.String(), connection.ConnectionID) first,
then query the manager for remaining connections (e.g.,
c.manager.GetConnections(gateway.UUID.String()) or a provided count method) and
only call c.gatewayService.UpdateGatewayActiveStatus(gateway.UUID.String(),
false) if the returned list/count is zero; keep logging but include the
conditional check to avoid marking gateways inactive while other connections
remain.
- Around line 258-268: getClientIP currently returns the full X-Forwarded-For
header which may contain multiple comma-separated IPs; change it to parse
X-Forwarded-For, split on ',' and use only the first non-empty entry (trim
spaces) as the client IP, falling back to X-Real-IP and then r.RemoteAddr if
needed; update the getClientIP function and add the "strings" import.

In `@agent-manager-service/repositories/api_repository.go`:
- Around line 121-154: Both GetAPIByUUID and GetAPIMetadataByHandle use Scan
which never yields gorm.ErrRecordNotFound and thus returns zero-value structs
instead of nil; change them to use
r.db.Table(...).Where(...).First(&api)/First(&metadata) (or .Take) so GORM
returns ErrRecordNotFound for missing rows, then check errors.Is(err,
gorm.ErrRecordNotFound) and return (nil, nil) or (nil, err) consistent with
callers (ensure gateway_internal_service.CreateGatewayDeployment's existingAPI
nil-check works); update the implementations of GetAPIByUUID and
GetAPIMetadataByHandle (referencing those function names, the models.API and
models.APIMetadata targets, and the SQL joins/selects) accordingly.

In `@agent-manager-service/repositories/deployment_repository.go`:
- Around line 259-287: The GetWithState function currently uses Scan and always
returns a non-nil deployment pointer even when no rows match, causing the
ARCHIVED fallback to create a fake record; change the DB call to capture the
GORM result (e.g., result :=
r.db.Table(...).Select(...).Joins(...).Where(...).Scan(&deployment)) then test
result.Error first and return it if non-nil, and if result.RowsAffected == 0
return nil, errors.New("deployment not found") before applying the Status nil ->
ARCHIVED fallback so callers don't receive a zero-UUID fake deployment.
- Around line 166-189: GetCurrentByGateway uses Scan which never returns
gorm.ErrRecordNotFound, so it always returns a non-nil &deployment even when no
row exists; change the logic in GetCurrentByGateway to detect zero results (e.g.
use r.db.Model(...).Scan(&deployment).RowsAffected or switch to First/Take) and
return (nil, gorm.ErrRecordNotFound) when no rows were found; update the
function around the Scan call and the deployment variable handling so callers’
nil checks work as intended.

In `@agent-manager-service/repositories/gateway_repository.go`:
- Around line 106-111: The List method on GatewayRepo returns all gateways
unscoped, causing cross-organization exposure and unbounded results; change the
signature of GatewayRepo.List to accept an organization UUID and pagination
(e.g., List(orgUUID string, limit, offset int)) or at minimum an orgUUID and
limit, then scope the DB query with Where("organization_uuid = ?", orgUUID) and
apply Limit(limit).Offset(offset) (and keep Order("created_at DESC")), update
all call sites to pass the org UUID and pagination values, and ensure the
returned error handling remains the same.

In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Line 86: The call slog.Info(t.Configuration) exposes potentially sensitive
config data; change this to a lower-level log (slog.Debug) and redact sensitive
fields before logging (e.g., remove or mask t.Configuration.Metadata.Auth /
metadata.auth and any tokens/secrets) — locate the usage of t.Configuration and
replace the Info call with Debug and/or construct a sanitized copy of the
configuration (omitting or redacting fields like Metadata.Auth, tokens,
credentials) before logging.

In `@agent-manager-service/services/gateway_events_service.go`:
- Around line 71-355: The four near-identical methods (BroadcastDeploymentEvent,
BroadcastUndeploymentEvent, BroadcastLLMProviderDeploymentEvent,
BroadcastLLMProviderUndeploymentEvent) should be collapsed into a single helper
on GatewayEventsService (e.g., broadcastEvent or sendGatewayEvent) that accepts
gatewayID, a generic payload interface{}, the eventType string, and a short log
prefix; move shared logic (JSON marshal + size check, DTO creation using
GatewayEventDTO, manager nil check, fetch connections via
s.manager.GetConnections, per-connection Send handling, DeliveryStats updates,
and summary/return behavior) into that helper, and have each of the four methods
simply call it with the appropriate payload and type; update log messages to use
the provided prefix/eventType and preserve the existing error returns (including
using lastError) so behavior is unchanged.

In `@agent-manager-service/services/gateway_internal_service.go`:
- Around line 144-161: The nil check in GetActiveDeploymentByGateway is
currently ineffective because deploymentRepo.GetCurrentByGateway does not return
nil for no rows; update the repository (GetCurrentByGateway in
deployment_repository.go) to return (nil, nil) when no rows are found (rather
than a zero-valued struct), and keep the nil check in
GetActiveDeploymentByGateway (function GetActiveDeploymentByGateway) to return a
clear error like "deployment not active" when deployment == nil so the service
correctly signals no active deployment once the repo behavior is fixed.
- Around line 222-259: The current check relies on existingAPI being nil but
GetAPIMetadataByHandle may return a non-nil zero-value struct; update the branch
in the gateway handler to treat a zero-value result as "not found" by checking
the API's UUID (or other unique identifier) in addition to nil: i.e., in the
block around existingAPI handling (variables existingAPI, apiUUID, apiCreated
and calls to s.apiRepo.GetAPIMetadataByHandle and s.apiRepo.CreateAPI), change
the condition to treat existingAPI == nil OR existingAPI.UUID == uuid.Nil (or
equivalent zero-value) as missing so the CreateAPI (newAPI) path actually
executes until the repository Scan bug is fixed. Ensure subsequent uses
(existingAPI.OrganizationUUID and existingAPI.UUID) assume a valid UUID only
when not zero.

In `@agent-manager-service/services/llm_deployment_service.go`:
- Around line 178-182: The code masks all errors from
deploymentRepo.GetWithContent by always returning
utils.ErrBaseDeploymentNotFound; change the logic in
LLMProviderDeploymentService.DeployLLMProvider to distinguish not-found from
other DB errors: call s.deploymentRepo.GetWithContent(req.Base,
provider.UUID.String(), orgID), if err != nil then if errors.Is(err,
repository.ErrNotFound) (or the repo's not-found sentinel) return
utils.ErrBaseDeploymentNotFound, else return the original err (or wrap it) so
callers see a 5xx; ensure you reference baseDeployment and GetWithContent and
propagate non-not-found errors instead of always mapping to
ErrBaseDeploymentNotFound.

In `@agent-manager-service/services/llm_template_seeder.go`:
- Around line 91-111: The metadata changes to current.Metadata are only
in-memory because current.Configuration (the JSON string) isn't updated before
persistence; before calling s.templateRepo.Update(current) re-marshal the
template configuration so the updated Metadata and Name are reflected in
current.Configuration (same approach used by Create()), e.g., marshal the struct
back into the Configuration JSON field on the current template after modifying
current.Metadata/current.Name and before s.templateRepo.Update(current) to
ensure Update persists the changes.

In `@agent-manager-service/services/platform_gateway_service.go`:
- Around line 215-222: The gateway creation and token insertion
(s.gatewayRepo.Create(gateway) followed by
s.gatewayRepo.CreateToken(gatewayToken)) must be made atomic to avoid leaving an
orphan gateway on token failure; either wrap both calls in a DB transaction
exposed by the repository (begin/commit/rollback) so both Create and CreateToken
are committed together, or, if transactions are not available, add compensating
cleanup: call s.gatewayRepo.Delete(gateway.ID) when
s.gatewayRepo.CreateToken(...) returns an error, and ensure errors from the
cleanup are logged/returned appropriately; update the service method to choose
one approach and use the repository's transactional API or Delete(...) method
accordingly.
- Around line 415-442: The VerifyToken method currently does gatewayRepo.List
followed by gatewayRepo.GetActiveTokensByGatewayUUID per gateway causing N+1
queries and it also swallows errors from GetActiveTokensByGatewayUUID; replace
this with a single-repo lookup (e.g., add gatewayRepo.GetAllActiveTokens()
returning token hashes with gateway UUIDs, or better
gatewayRepo.GetActiveTokenByHash(hash) indexed lookup) and change VerifyToken to
query that single method (or lookup by token hash) and validate using
verifyToken; also stop swallowing DB errors—propagate and return an error when
the repository call fails so transient DB errors return a retryable error
instead of "invalid token".
🟡 Minor comments (8)
agent-manager-service/repositories/llm_provider_template_repository.go-151-160 (1)

151-160: ⚠️ Potential issue | 🟡 Minor

List does not unmarshal Configuration JSON — callers get incomplete template data.

Unlike GetByID and GetByUUID, the List method returns templates without populating the parsed fields (Metadata, PromptTokens, etc.). Callers that iterate over the list and access these fields will get nil values.

Proposed fix — unmarshal after fetching
 func (r *LLMProviderTemplateRepo) List(orgUUID string, limit, offset int) ([]*models.LLMProviderTemplate, error) {
 	var templates []*models.LLMProviderTemplate
 	err := r.db.Where("organization_uuid = ?", orgUUID).
 		Order("created_at DESC").
 		Limit(limit).
 		Offset(offset).
 		Find(&templates).Error
+	if err != nil {
+		return templates, err
+	}
+	for _, t := range templates {
+		if t.Configuration != "" {
+			var cfg llmProviderTemplateConfig
+			if err := json.Unmarshal([]byte(t.Configuration), &cfg); err != nil {
+				return nil, err
+			}
+			t.Metadata = cfg.Metadata
+			t.PromptTokens = cfg.PromptTokens
+			t.CompletionTokens = cfg.CompletionTokens
+			t.TotalTokens = cfg.TotalTokens
+			t.RemainingTokens = cfg.RemainingTokens
+			t.RequestModel = cfg.RequestModel
+			t.ResponseModel = cfg.ResponseModel
+		}
+	}
 	return templates, err
 }
agent-manager-service/repositories/gateway_repository.go-191-200 (1)

191-200: ⚠️ Potential issue | 🟡 Minor

RevokeToken silently succeeds for non-existent or already-revoked tokens.

No RowsAffected check, and no status guard to prevent re-revoking an already-revoked token. A caller won't know if the operation was a no-op.

agent-manager-service/repositories/deployment_repository.go-229-250 (1)

229-250: ⚠️ Potential issue | 🟡 Minor

GetStatus uses ScanErrRecordNotFound check on line 243 is dead code.

The errors.Is(err, gorm.ErrRecordNotFound) guard will never trigger with Scan. When no rows match, result will contain zero-values and err will be nil. The current callers happen to work because they check for an empty DeploymentID string, but this is fragile. Consider using RowsAffected to return a clear sentinel.

agent-manager-service/services/llm_provider_service.go-156-173 (1)

156-173: ⚠️ Potential issue | 🟡 Minor

Missing nil guard on the fetched provider after Create.

After the transaction succeeds (line 148), the code fetches the just-created provider via GetByID. If the repository implementation can return (nil, nil) (e.g., using Scan rather than First), accessing created.ModelList on line 164 or created.UUID on line 172 would panic. Add a nil check analogous to what Get does on line 222.

Proposed fix
 	created, err := s.providerRepo.GetByID(handle, orgID)
 	if err != nil {
 		slog.Error("LLMProviderService.Create: failed to fetch created provider", "orgID", orgID, "handle", handle, "error", err)
 		return nil, fmt.Errorf("failed to fetch created provider: %w", err)
 	}
+	if created == nil {
+		slog.Error("LLMProviderService.Create: created provider not found", "orgID", orgID, "handle", handle)
+		return nil, utils.ErrLLMProviderNotFound
+	}
agent-manager-service/repositories/gateway_repository.go-139-161 (1)

139-161: ⚠️ Potential issue | 🟡 Minor

UpdateGateway and UpdateActiveStatus silently succeed when the gateway doesn't exist.

Neither method checks result.RowsAffected. A caller updating a non-existent gateway UUID receives nil error, which is misleading. The Delete method correctly checks RowsAffected == 0 — apply the same guard here.

Example for UpdateGateway
 func (r *GatewayRepo) UpdateGateway(gateway *models.Gateway) error {
 	gateway.UpdatedAt = time.Now()
-	return r.db.Model(&models.Gateway{}).
+	result := r.db.Model(&models.Gateway{}).
 		Where("uuid = ?", gateway.UUID).
 		Updates(map[string]interface{}{
 			"display_name": gateway.DisplayName,
 			"description":  gateway.Description,
 			"is_critical":  gateway.IsCritical,
 			"properties":   gateway.Properties,
 			"updated_at":   gateway.UpdatedAt,
-		}).Error
+		})
+	if result.Error != nil {
+		return result.Error
+	}
+	if result.RowsAffected == 0 {
+		return errors.New("gateway not found")
+	}
+	return nil
 }
agent-manager-service/controllers/llm_controller.go-83-105 (1)

83-105: ⚠️ Potential issue | 🟡 Minor

resolveOrgUUID and resolveProjectUUID swallow the original error.

Both helpers discard the underlying error (e.g., database connectivity failure) and unconditionally return a domain-specific "not found" error. This means a DB outage surfaces as HTTP 401 "Organization not found" instead of HTTP 500, which hampers debugging.

Proposed fix — distinguish not-found from internal errors
 func (c *llmController) resolveOrgUUID(ctx context.Context, orgName string) (string, error) {
 	org, err := c.orgRepo.GetOrganizationByHandle(orgName)
 	if err != nil {
+		if errors.Is(err, gorm.ErrRecordNotFound) {
+			return "", utils.ErrOrganizationNotFound
+		}
-		return "", utils.ErrOrganizationNotFound
+		return "", fmt.Errorf("failed to resolve organization: %w", err)
 	}
 	if org == nil {
 		return "", utils.ErrOrganizationNotFound
 	}
 	return org.UUID.String(), nil
 }
agent-manager-service/services/platform_gateway_service.go-625-688 (1)

625-688: ⚠️ Potential issue | 🟡 Minor

Trimmed name/displayName not propagated to caller.

Lines 636 and 659 trim the local name and displayName parameters, but these are value parameters — the trimmed values don't flow back to RegisterGateway. If a user submits " my-gw ", validation passes (after trimming), but the gateway is created with the untrimmed value.

Options

Either trim the values before calling validateGatewayInput in RegisterGateway, or change the validation function to return the sanitized values.

agent-manager-service/services/llm_deployment_service.go-305-364 (1)

305-364: ⚠️ Potential issue | 🟡 Minor

RestoreLLMProviderDeployment should handle missing deployment status as a valid scenario.

Line 329–332 calls GetStatus and fails immediately if an error occurs. However, when no current deployment status exists for a gateway (which GetStatus returns as gorm.ErrRecordNotFound), the restore should proceed rather than fail. The function should only error on genuine database failures, not when records are absent. Consider checking errors.Is(err, gorm.ErrRecordNotFound) and treating it as a valid case—allowing restoration to proceed when no prior deployment exists.

🧹 Nitpick comments (26)
agent-manager-service/db_migrations/005_create_artifacts_and_apis.go (3)

141-142: Redundant index: idx_deployments_artifact_gateway is a leftmost prefix of idx_deployments_created_at.

The index on (artifact_uuid, gateway_uuid) at Line 141 is fully covered by the index on (artifact_uuid, gateway_uuid, created_at) at Line 142, since PostgreSQL can use a composite index for queries on any leftmost prefix of its columns. Keeping both wastes storage and slows writes with no query benefit.

Proposed fix
 			-- Indexes for performance
 			CREATE INDEX idx_ap_projects_organization ON ap_projects(organization_uuid);
 			CREATE INDEX idx_rest_apis_project ON rest_apis(project_uuid);
 			CREATE INDEX idx_artifacts_org ON artifacts(organization_uuid);
-			CREATE INDEX idx_deployments_artifact_gateway ON deployments(artifact_uuid, gateway_uuid);
 			CREATE INDEX idx_deployments_created_at ON deployments(artifact_uuid, gateway_uuid, created_at);

44-44: Inconsistent use of IF NOT EXISTS across table definitions.

ap_projects uses CREATE TABLE IF NOT EXISTS (Line 29) while all other tables use CREATE TABLE. Since gormigrate tracks applied migrations, this isn't a runtime risk, but the inconsistency is worth normalizing — either all tables should use IF NOT EXISTS or none should.

Also applies to: 77-77, 99-99, 120-120


150-153: No rollback (down) migration defined.

This migration creates six tables and multiple indexes but provides no mechanism to reverse it. If this is intentional and consistent with the project's migration strategy, that's fine — but worth noting that rolling back migration 005 would require manual intervention.

agent-manager-service/repositories/artifact_repository.go (1)

92-104: Redundant error branching in GetByHandle.

Both if and else branches return the same (nil, err). The errors.Is(err, gorm.ErrRecordNotFound) check has no distinct behavior. Either differentiate the handling (e.g., return nil, nil for not-found) or simplify.

Simplification
 func (r *ArtifactRepo) GetByHandle(handle, orgUUID string) (*models.Artifact, error) {
 	var artifact models.Artifact
 	err := r.db.Where("handle = ? AND organization_uuid = ?", handle, orgUUID).
 		First(&artifact).Error
-	if err != nil {
-		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, err
-		}
-		return nil, err
-	}
-	return &artifact, nil
+	if err != nil {
+		return nil, err
+	}
+	return &artifact, nil
 }
agent-manager-service/services/llm_template_seeder.go (1)

64-74: Minor: redundant List call when totalCount is 0.

When totalCount is 0, the List call with limit=0 is unnecessary. Consider short-circuiting.

Short-circuit when no existing templates
 	totalCount, err := s.templateRepo.Count(orgUUIDStr)
 	if err != nil {
 		return fmt.Errorf("failed to count existing templates: %w", err)
 	}

-	// Get all existing templates for this organization
-	existing, err := s.templateRepo.List(orgUUIDStr, totalCount, 0)
-	if err != nil {
-		return fmt.Errorf("failed to list existing templates: %w", err)
-	}
+	var existing []*models.LLMProviderTemplate
+	if totalCount > 0 {
+		existing, err = s.templateRepo.List(orgUUIDStr, totalCount, 0)
+		if err != nil {
+			return fmt.Errorf("failed to list existing templates: %w", err)
+		}
+	}
agent-manager-service/repositories/project_repository.go (1)

103-106: DeleteProject silently succeeds when the project doesn't exist.

Unlike Delete in ArtifactRepo (which checks RowsAffected == 0), this method doesn't verify the record existed. Callers may not realize the delete was a no-op.

Add RowsAffected check
 func (r *ProjectRepo) DeleteProject(projectId string) error {
-	return r.db.Where("uuid = ?", projectId).Delete(&models.Project{}).Error
+	result := r.db.Where("uuid = ?", projectId).Delete(&models.Project{})
+	if result.Error != nil {
+		return result.Error
+	}
+	if result.RowsAffected == 0 {
+		return gorm.ErrRecordNotFound
+	}
+	return nil
 }
agent-manager-service/controllers/websocket_controller.go (2)

49-55: Duplicate ConnectionAckDTO type — conflicts with models/gateway_internal_dto.go.

A ConnectionAckDTO is already defined in models/gateway_internal_dto.go (with fields Type, Message, GatewayID). This controller defines a different version with ConnectionID and Timestamp instead of Message. This creates ambiguity. Consider consolidating into a single type in the models package.


66-70: CheckOrigin unconditionally returns true — noted as TODO but is a security gap.

This allows any origin to establish a WebSocket connection. Even with API key auth, this enables cross-site WebSocket hijacking if a browser is involved.

Do you want me to open an issue to track implementing proper origin checking?

agent-manager-service/repositories/organization_repository.go (1)

57-68: GetOrganizationByIdOrHandle — potential for unintended matches with OR query.

If id is empty or handle is empty, the WHERE uuid = ? OR handle = ? clause could match records where only one condition is true, which may not be the caller's intent (e.g., passing an empty id would match any org whose handle equals the empty string if such a row exists). Consider requiring at least one non-empty parameter, or using conditional query building.

agent-manager-service/repositories/llm_provider_repository.go (1)

56-92: Excessive slog.Info logging in repository methods.

Every method logs multiple Info-level messages for start, intermediate steps, and completion (e.g., lines 58, 63, 68, 84, 90 just in Create). This pattern repeats across all methods. Repository-layer operations are typically high-frequency; this level of verbosity will create significant log noise in production.

Consider using slog.Debug for intermediate steps and reserving slog.Info for significant events, or removing the per-step logging entirely and relying on service-layer logging.

agent-manager-service/services/gateway_events_service.go (1)

19-23: Inconsistent logging — uses stdlib log while the rest of the codebase uses slog or structured logger.

This file imports "log" and uses log.Printf with manual format strings (e.g., [ERROR], [INFO]), while other files in the same PR use "log/slog" or the project's logger package. This loses structured logging benefits and creates inconsistency.

agent-manager-service/services/llm_provider_template_service.go (1)

208-241: Consider whether an empty config map should be serialized.

When all optional fields (Metadata, PromptTokens, etc.) are nil, serializeConfiguration writes "{}" into Configuration. On a subsequent deserializeConfiguration pass, this yields all-nil fields again, so it round-trips correctly. However, during an Update, if the caller intends to leave Configuration unchanged but passes a zero-value template, the existing configuration will be overwritten with "{}".

If partial updates should preserve existing configuration, the service should skip serialization (or the caller should merge) when no config fields are provided.

agent-manager-service/repositories/llm_proxy_repository.go (1)

87-101: Redundant error branching in GetByID.

Both branches of the if errors.Is(…) check return the same thing (nil, err). The same pattern appears in GetByUUID/GetByNameAndOrgID/GetTokenByUUID in the gateway repo. If the intent is to map ErrRecordNotFound to nil, nil (like other repos do), this needs fixing; otherwise, the caller must handle gorm.ErrRecordNotFound, which differs from the template service's nil-check pattern at llm_provider_template_service.go line 131.

If the intent is to return nil for not-found (consistent with the service layer's nil check):
 func (r *LLMProxyRepo) GetByID(proxyID, orgUUID string) (*models.LLMProxy, error) {
 	var proxy models.LLMProxy
 	err := r.db.
 		Preload("Artifact").
 		Joins("JOIN artifacts a ON llm_proxies.uuid = a.uuid").
 		Where("a.handle = ? AND a.organization_uuid = ? AND a.kind = ?", proxyID, orgUUID, models.KindLLMAPI).
 		First(&proxy).Error
 	if err != nil {
 		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, err
+			return nil, nil
 		}
 		return nil, err
 	}
 	return &proxy, nil
 }
agent-manager-service/services/llm_provider_service.go (2)

63-173: Excessive slog.Info on every step of the happy path.

There are 12+ slog.Info calls in Create alone, including before/after every sub-operation. In production, this will generate significant log volume for routine CRUD. Consider reducing happy-path logs to slog.Debug and keeping only the entry/exit logs at Info level. Error/warning logs are fine as-is.

The same pattern is repeated in List, Get, Update, and Delete.


318-339: Provider deletion doesn't check for dependent proxies.

Delete removes a provider without checking whether any LLM proxies reference it (provider_uuid FK). If proxies exist, this will either fail with a database FK constraint error (unhandled) or leave orphaned proxies depending on the DB schema. Consider either checking for dependent proxies before deletion and returning a descriptive error, or cascading the delete.

agent-manager-service/repositories/deployment_repository.go (1)

192-210: Non-idiomatic Go variable names in SetCurrent.

Variables like artifactUUID_uuid, orgUUID_uuid, gatewayUUID_uuid, deploymentUUID_uuid use underscores, which violates Go naming conventions. Consider parsedArtifactUUID, parsedOrgUUID, etc.

agent-manager-service/repositories/api_repository.go (1)

257-282: DeleteAPI silently succeeds when the API doesn't exist.

The method deletes associations, deployments, rest_api, and artifact records, but none of the intermediate steps check RowsAffected. If the apiUUID doesn't exist, all DELETE statements are no-ops and the method returns nil, misleading the caller into thinking a deletion occurred.

Consider checking RowsAffected on the rest_api or artifact deletion step.

agent-manager-service/services/gateway_internal_service.go (1)

318-343: Hardcoded context path /api/v1 and API version.

generateAPIDeploymentYAML hardcodes Context: "/api/v1" and ApiVersion: "gateway.api-platform.wso2.com/v1alpha1". If the API model carries a context/basepath (e.g., via Configuration), it should be used instead. Hardcoding means all generated YAMLs share the same context path regardless of the actual API.

agent-manager-service/controllers/llm_controller.go (1)

164-177: Pagination validation is duplicated across four list handlers.

The same limit/offset capping logic (default 20, max 100, min offset 0) is repeated in ListLLMProviderTemplates, ListLLMProviders, ListLLMProxies, and ListLLMProxiesByProvider. Extract this into a shared helper.

Example helper
func parsePagination(r *http.Request) (limit, offset int) {
	limit = getIntQueryParam(r, "limit", 20)
	offset = getIntQueryParam(r, "offset", 0)
	if limit <= 0 {
		limit = 20
	}
	if limit > 100 {
		limit = 100
	}
	if offset < 0 {
		offset = 0
	}
	return limit, offset
}

Also applies to: 408-421, 688-701, 742-755

agent-manager-service/controllers/llm_deployment_controller.go (2)

59-69: resolveOrgUUID properly fixed — but note inconsistency with gateway_controller.go.

This version correctly separates DB errors from "not found" errors. However, the equivalent resolveOrgUUID in gateway_controller.go (lines 83-93) still maps all errors (including DB failures) to ErrOrganizationNotFound, which the caller then returns as 401 Unauthorized. Consider aligning both helpers.


177-257: UndeployLLMProviderDeployment — using query params for required parameters on a POST.

deploymentId and gatewayId are required for this operation but are read from query parameters rather than the request body or path. This works, but it's an unusual pattern for POST endpoints and diverges from how DeleteLLMProviderDeployment takes deploymentId as a path parameter. Consider whether consistency across endpoints would improve API ergonomics.

agent-manager-service/services/platform_gateway_service.go (2)

156-163: Redundant UUID generate-then-parse.

uuid.New() returns a uuid.UUID directly. Stringifying it (line 157) only to parse it back (line 160) is unnecessary and can never fail.

Proposed simplification
-	gatewayID := uuid.New().String()
-
-	// 5. Parse and create Gateway model
-	gatewayUUID, err := uuid.Parse(gatewayID)
-	if err != nil {
-		return nil, fmt.Errorf("invalid gateway UUID: %w", err)
-	}
+	gatewayUUID := uuid.New()

Then use gatewayUUID directly (and gatewayUUID.String() where needed, e.g., for uuid.MustParse on line 207).


646-651: Regex compiled on every validation call.

regexp.MustCompile is called each time validateGatewayInput runs. Since the pattern is constant, compile it once at the package level.

Proposed fix
+var gatewayNamePattern = regexp.MustCompile(`^[a-z0-9-]+$`)
+
 func (s *PlatformGatewayService) validateGatewayInput(orgID, name, displayName, vhost, functionalityType string) error {
     // ...
-    namePattern := regexp.MustCompile(`^[a-z0-9-]+$`)
-    if !namePattern.MatchString(name) {
+    if !gatewayNamePattern.MatchString(name) {
agent-manager-service/controllers/gateway_internal_controller.go (1)

55-109: Repeated API-key auth boilerplate across all 4 handlers.

Each handler duplicates the same ~15-line authentication block (header extraction, VerifyToken, error handling). Extracting this into an authentication middleware or a shared helper (returning (*models.PlatformGateway, error)) would reduce duplication and make it easier to update auth logic in one place.

Also applies to: 112-176, 270-335

agent-manager-service/controllers/gateway_controller.go (1)

483-492: RevokeGatewayToken is a stub returning 501.

This is exposed in the public API interface. If this endpoint is reachable by clients, they'll get an unexpected "not implemented" error.

Would you like me to help implement the revocation logic using the existing s.gatewayRepo.RevokeToken(tokenId) method, or open an issue to track this?

agent-manager-service/main.go (1)

102-103: db variable shadows the db package import.

After db := db.GetDB(), the imported db package is no longer accessible by that name. Currently safe since the package isn't referenced after this line, but a future edit adding db.DB(ctx) calls here would silently use the wrong db. Consider renaming the local variable to database or dbConn.

Comment thread agent-manager-service/controllers/gateway_controller.go
Comment thread agent-manager-service/controllers/gateway_controller.go
Comment thread agent-manager-service/controllers/gateway_internal_controller.go
Comment thread agent-manager-service/controllers/websocket_controller.go
Comment thread agent-manager-service/controllers/websocket_controller.go
Comment thread agent-manager-service/services/gateway_internal_service.go
Comment thread agent-manager-service/services/llm_deployment_service.go
Comment thread agent-manager-service/services/llm_template_seeder.go
Comment thread agent-manager-service/services/platform_gateway_service.go Outdated
Comment thread agent-manager-service/services/platform_gateway_service.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@agent-manager-service/api/app.go`:
- Around line 63-91: The internal handler chain in MakeInternalHTTPHandler lacks
a global API-key auth layer; wrap the composed internalHandler with the API-key
auth middleware (use the existing params.AuthMiddleware function) so all routes
registered by RegisterGatewayInternalRoutes and RegisterWebSocketRoutes are
protected. Specifically, after building internalHandler (the variable named
internalHandler) and before applying logger/CORS/Recoverer or before mux.Handle,
call params.AuthMiddleware()(internalHandler) (or the equivalent API-key
middleware) to ensure a gatekeeper is applied to the internal routes.

In `@agent-manager-service/config/config_loader.go`:
- Around line 175-194: Add validation for the InternalServer block similar to
the existing HTTP server checks: create a new function
validateInternalServerConfigs(cfg *Config, r *configReader) that checks
InternalServer.Port is 1..65535, ReadTimeoutSeconds and WriteTimeoutSeconds are
> 0, and CertDir is non-empty (append errors to r.errors on failures), then call
validateInternalServerConfigs(&config, r) after the InternalServer assignment
and before the existing validateHTTPServerConfigs call so invalid internal
server values are caught during config loading.

In `@agent-manager-service/controllers/llm_controller.go`:
- Around line 96-105: The resolveProjectUUID function currently masks all errors
returned by ocClient.GetProject and always returns utils.ErrProjectNotFound;
modify resolveProjectUUID to distinguish not-found responses from other errors
by inspecting the error from c.ocClient.GetProject (or using a provided
sentinel/type from ocClient), returning utils.ErrProjectNotFound only when the
project truly does not exist and returning or wrapping other errors (e.g.,
fmt.Errorf("GetProject: %w", err)) for network/RPC/backend failures so callers
see 500-level errors; reference resolveProjectUUID and c.ocClient.GetProject and
ensure you preserve the existing nil-project check returning
utils.ErrProjectNotFound.
- Around line 84-93: The resolveOrgUUID function currently maps all errors from
orgRepo.GetOrganizationByName to utils.ErrOrganizationNotFound; change
resolveOrgUUID to return the original DB error (only return
utils.ErrOrganizationNotFound when the returned org is nil or when the repo
explicitly signals not-found), i.e., call orgRepo.GetOrganizationByName and if
err != nil return err, then if org == nil return utils.ErrOrganizationNotFound,
and keep returning org.UUID.String() on success; then update callers that invoke
resolveOrgUUID to distinguish errors using errors.Is(err,
utils.ErrOrganizationNotFound) → respond 404, otherwise log the err and respond
500 (use errors.Is for comparison).

In `@agent-manager-service/repositories/api_repository.go`:
- Around line 188-201: The SELECT in GetDeployedAPIsByGatewayUUID is missing
art.handle so the returned models.API.Handle will be empty; update the query in
APIRepo.GetDeployedAPIsByGatewayUUID to include "art.handle" in the Select list
(alongside a.uuid, art.name, etc.) so the Handle field is populated, keeping the
same Joins/Where/Order and Scan into apis.
- Around line 203-216: The SELECT in GetAPIsByGatewayUUID omits art.handle so
the API models' Handle field is not populated; update the query in
GetAPIsByGatewayUUID to include "art.handle" in the Select list (alongside
art.name, art.version, etc.) so the returned []*models.API contains the Handle
value, ensuring the column alias matches the struct field mapping used by the
scan.

In `@agent-manager-service/repositories/artifact_repository.go`:
- Around line 92-104: GetByHandle currently returns (nil, nil) when no record is
found which triggers the nilnil linter; change it to return a sentinel error
instead of nil on not-found. Define a package-level error like
ErrArtifactNotFound and modify ArtifactRepo.GetByHandle to return (nil,
ErrArtifactNotFound) when errors.Is(err, gorm.ErrRecordNotFound); update callers
to check with errors.Is(err, ErrArtifactNotFound) (or adapt callers if you
prefer the alternative approach mentioned in the comment).

In `@agent-manager-service/repositories/gateway_repository.go`:
- Around line 139-161: Both UpdateGateway and UpdateActiveStatus currently
return nil even when no row matches the UUID; change them to inspect the GORM
result: assign the update call to a variable (e.g., res :=
r.db.Model(&models.Gateway{}).Where("uuid = ?", gateway.UUID)...Updates(...)),
then if res.Error != nil return res.Error; if res.RowsAffected == 0 return
gorm.ErrRecordNotFound (or a custom "gateway not found" error) so callers can
distinguish "not found" from success; apply the same pattern to
UpdateActiveStatus (use gatewayId in the Where and check res.RowsAffected).

In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Around line 149-157: List currently returns templates without decoding the
Configuration JSON into the parsed fields, so callers get nil
Metadata/PromptTokens/etc.; add a private helper (e.g., unmarshalConfig(template
*models.LLMProviderTemplate) error) that performs the JSON unmarshal of
template.Configuration into the struct fields and call it for each template in
LLMProviderTemplateRepo.List after Find, and refactor GetByHandle and GetByUUID
to reuse this helper so all three methods produce fully populated templates.

In `@agent-manager-service/repositories/organization_repository.go`:
- Around line 86-109: GetOrCreateOrganization has a race where two callers can
both see ErrRecordNotFound and call CreateOrganization, causing a duplicate-key
error; update OrganizationRepo.GetOrCreateOrganization to detect a
unique-constraint/duplicate-key error returned from CreateOrganization (inspect
the DB/GORM error code/message), and in that case retry
r.GetOrganizationByName(name) and return its result instead of propagating the
create error; keep the normal behavior for other errors from CreateOrganization.
Use the existing methods GetOrganizationByName and CreateOrganization for the
retry path.

In
`@agent-manager-service/resources/default-llm-provider-templates/azureopenai-template.yaml`:
- Around line 29-30: The template currently hardcodes external asset URLs
(logoUrl and openapiSpecUrl) to a personal GitHub repo for LLM provider
templates (azureopenai, openai, mistral, awsbedrock, anthropic, azureaifoundry);
update each template to point to official/organizational-hosted assets or
internal static hosting instead of raw.githubusercontent.com/nomadxd, replacing
the logoUrl and openapiSpecUrl values for the affected templates (e.g., entries
in azureopenai-template.yaml) with stable organizational URLs or local asset
paths, and verify the assets are accessible and tests/consumers load them
correctly.

In `@agent-manager-service/services/environment_service.go`:
- Around line 73-76: The code currently classifies not-found errors by
string-matching err.Error() (e.g., checking "404" or "not found") in
environment_service.go (occurring around the blocks that return
utils.ErrEnvironmentNotFound at lines shown); replace those fragile checks with
typed error comparison using errors.Is(err, utils.ErrEnvironmentNotFound) so the
function(s) returning environment info return the canonical
utils.ErrEnvironmentNotFound when appropriate (keep the existing return values
and logging, just change the conditional to use errors.Is). Ensure you import
the standard "errors" package if not already imported and update both
occurrences (the blocks around the earlier return and the later one near line
168).

In `@agent-manager-service/services/gateway_internal_service.go`:
- Around line 145-162: The controller expects the sentinel error
utils.ErrDeploymentNotActive, so replace the ad-hoc fmt.Errorf("deployment not
active") returns in GetActiveDeploymentByGateway (and apply the same change to
GetActiveLLMProviderDeploymentByGateway and all deployment-related returns in
CreateGatewayDeployment) with the sentinel error — either return
utils.ErrDeploymentNotActive directly or wrap it using fmt.Errorf("deployment
not active: %w", utils.ErrDeploymentNotActive) so errors.Is will match; also
ensure other repo errors are still wrapped (e.g., fmt.Errorf("failed to get
deployment: %w", err)) so original error context is preserved.
- Around line 328-353: generateAPIDeploymentYAML currently hardcodes
Spec.Context to "/api/v1"; update it to read the actual value from
api.Configuration.Context with a safe nil check and fallback. Specifically,
inside generateAPIDeploymentYAML compute a context string by checking if
api.Configuration != nil and api.Configuration.Context != nil and non-empty, use
*api.Configuration.Context, otherwise use "/api/v1", and assign that to
APIDeploymentSpec.Context so the generated YAML reflects the API's configured
context.

In `@agent-manager-service/services/llm_provider_service.go`:
- Around line 155-161: The call in LLMProviderService.Create uses
s.providerRepo.GetByUUID(handle, orgID) but "handle" is derived from
provider.Configuration.Name (a name/handle string), so either rename the
repository method to reflect a name lookup (e.g., GetByHandle or GetByName) or
change Create to fetch by the actual UUID returned when creating the provider;
update the code paths accordingly (LLMProviderService.Create,
providerRepo.GetByUUID) so the parameter type and method name match (use
provider.Configuration.Name with a GetByHandle/GetByName method, or capture the
created provider's UUID and call GetByUUID(uuid, orgID)).

In `@agent-manager-service/services/llm_proxy_service.go`:
- Around line 70-77: The call to s.providerRepo.GetByUUID in Create (and the
similar calls in Update and ListByProvider) currently wraps any error as "failed
to validate provider", which hides gorm.ErrRecordNotFound and makes the
subsequent nil-check unreachable; change the error handling in the Create path
(and mirror for Update and ListByProvider) to check if errors.Is(err,
gorm.ErrRecordNotFound) and return utils.ErrLLMProviderNotFound, otherwise
return a wrapped error (e.g., fmt.Errorf("failed to validate provider: %w",
err)); update the providerModel == nil guards accordingly so not-found is
handled via the specific ErrLLMProviderNotFound return.
- Around line 148-163: The Get method currently returns a wrapped
gorm.ErrRecordNotFound as a generic error and has an unreachable proxy == nil
check; update LLMProxyService.Get to detect errors.Is(err,
gorm.ErrRecordNotFound) after calling s.proxyRepo.GetByID and return
utils.ErrLLMProxyNotFound in that case (consistent with Delete and Update),
otherwise wrap and return other errors as before, and remove the redundant proxy
== nil branch.

In `@agent-manager-service/services/platform_gateway_service.go`:
- Around line 669-676: The validation is case-sensitive while storage lowercases
the value, causing a mismatch between functionalityType and downstream
expectations; fix by normalizing functionalityType before validation (e.g.,
create a normalized := strings.ToLower(functionalityType)), change validTypes to
use lowercase keys ("regular","ai","event") and validate against normalized,
then store the normalized value consistently (or alternatively remove
strings.ToLower at storage if you prefer preserving original casing); update
references to functionalityType (validation map, the strings.ToLower usage, and
any assignment that persists the value) so validation and storage use the same
canonical casing.
- Around line 586-601: The artifactType check inside the loop is loop-invariant
and wrongly filters out results when artifactType == "all"; move the filter
outside the loop in the function handling apis (reference symbols: artifactType,
apis, GatewayArtifact, allArtifacts) and treat "all" (and empty string) as no
filter: before iterating, if artifactType is non-empty and not "all" and not
"API" then short-circuit/return an empty allArtifacts slice; otherwise iterate
over apis and append every API as you already do (i.e., only allow the loop to
run when artifactType == "" or "all" or "API").
🟡 Minor comments (16)
agent-manager-service/repositories/organization_repository.go-100-100 (1)

100-100: ⚠️ Potential issue | 🟡 Minor

Stale comment: UUID is generated in Go, not by the database.

Line 100 says "UUID is auto-generated by database" but the migration has no DEFAULT on the uuid column, and uuid.New() on line 97 generates it in the application. Remove or correct this comment.

agent-manager-service/db_migrations/004_create_artifacts_and_apis.go-46-57 (1)

46-57: ⚠️ Potential issue | 🟡 Minor

project_uuid in rest_apis has no foreign key constraint and lifecycle_status has no CHECK constraint.

  1. project_uuid is NOT NULL but lacks a FK reference — if projects are stored in this DB, a FK would enforce referential integrity. If projects are external, a comment explaining this would help.
  2. lifecycle_status defaults to 'CREATED' but unlike deployment_status.status, there's no CHECK constraint to restrict valid values.
agent-manager-service/db_migrations/004_create_artifacts_and_apis.go-36-37 (1)

36-37: ⚠️ Potential issue | 🟡 Minor

Use TIMESTAMPTZ instead of TIMESTAMP for time columns.

All timestamp columns (created_at, updated_at) across the five tables use TIMESTAMP (without time zone). PostgreSQL best practice is to use TIMESTAMPTZ to avoid ambiguity when the server or application changes time zones. This applies to artifacts (Lines 36-37), deployments (Line 69), deployment_status (Line 88), and association_mappings (Lines 109-110).

Also applies to: 69-69, 88-88, 109-110

agent-manager-service/services/environment_service.go-88-91 (1)

88-91: ⚠️ Potential issue | 🟡 Minor

Add a comment explaining why UpdatedAt is set to CreatedAt.

The OpenChoreo EnvironmentResponse struct has no UpdatedAt field, so the code currently sets UpdatedAt to CreatedAt silently and without explanation. This occurs at both lines 88–91 and line 139. Since the code already comments missing fields (e.g., Description), add a similar comment here for consistency and clarity, such as: // OpenChoreo EnvironmentResponse doesn't have UpdatedAt.

agent-manager-service/server/internal_server.go-64-64 (1)

64-64: ⚠️ Potential issue | 🟡 Minor

Malformed structured log call — positional args without keys.

slog.Info expects alternating key-value pairs after the message. Here, certPath and keyPath are passed as bare values, producing garbled output (or runtime warnings in newer slog versions).

Proposed fix
-	slog.Info("initializing with certs", certPath, keyPath)
+	slog.Info("Initializing with certs", "certPath", certPath, "keyPath", keyPath)
agent-manager-service/controllers/llm_deployment_controller.go-398-401 (1)

398-401: ⚠️ Potential issue | 🟡 Minor

HTTP 204 No Content should not include a response body.

WriteSuccessResponse with http.StatusNoContent and struct{}{} will attempt to write a JSON body ({}). Per HTTP semantics, a 204 response must not contain a body. Use w.WriteHeader(http.StatusNoContent) directly instead.

Proposed fix
-	utils.WriteSuccessResponse(w, http.StatusNoContent, struct{}{})
+	w.WriteHeader(http.StatusNoContent)
agent-manager-service/main.go-164-175 (1)

164-175: ⚠️ Potential issue | 🟡 Minor

os.Exit(1) inside a goroutine can race with main server startup.

If the internal server fails to bind (e.g., port conflict), os.Exit(1) on Line 173 will terminate the process from within a goroutine, bypassing deferred cleanup and potentially racing with the main server's ListenAndServe on Line 179. Consider using a channel or errgroup to propagate the error and let main perform a coordinated exit.

Sketch using an error channel
+	internalErrCh := make(chan error, 1)
 	go func() {
 		slog.Info("Internal HTTPS server is running", ...)
 		if err := internalServer.Start(); err != nil && !errors.Is(err, http.ErrServerClosed) {
-			slog.Error("Failed to start internal server", "error", err)
-			os.Exit(1)
+			internalErrCh <- err
+			return
 		}
+		internalErrCh <- nil
 	}()

Then in main, select on the channel alongside ListenAndServe to handle the error gracefully.

agent-manager-service/repositories/llm_provider_repository.go-94-95 (1)

94-95: ⚠️ Potential issue | 🟡 Minor

Stale comment: says "GetByID" but method is GetByUUID.

-// GetByID retrieves an LLM provider by ID (handle)
+// GetByUUID retrieves an LLM provider by UUID
agent-manager-service/resources/default-llm-provider-templates/anthropic-template.yaml-30-31 (1)

30-31: ⚠️ Potential issue | 🟡 Minor

External URLs reference a personal GitHub fork — consider using an official or organization-controlled source.

Both logoUrl and openapiSpecUrl point to https://raw.githubusercontent.com/nomadxd/openapi-connectors/.... If this personal repository is deleted, renamed, or restructured, these URLs will break at runtime. Consider hosting these assets under the wso2 organization or another stable, organization-controlled location.

This same concern applies to all other provider template files (OpenAI, AWS Bedrock, Azure OpenAI, Azure AI Foundry, and Mistral) referencing the same repository.

agent-manager-service/models/rest_api.go-60-65 (1)

60-65: ⚠️ Potential issue | 🟡 Minor

Remove unused Channel and ChannelRequest types.

RestAPIConfig has an Operations field but no Channels field. The Channel and ChannelRequest types are not referenced anywhere in the codebase and constitute dead code that should be removed.

agent-manager-service/models/rest_api.go-89-97 (1)

89-97: ⚠️ Potential issue | 🟡 Minor

Remove db tags or replace with gorm tags for consistency.

APIMetadata uses db:"..." tags, which GORM does not recognize for column mapping. The code currently works because GetAPIMetadataByHandle() uses an explicit Select() that relies on positional mapping rather than tags. This is fragile—any reordering of the Select() columns or struct fields will break the mapping silently. For consistency with RestAPI (which uses gorm:"column:..." in the same file) and to make column mapping explicit and maintainable, either replace db:"" with gorm:"column:..." or document why db:"" is used if it serves a specific purpose.

agent-manager-service/services/gateway_internal_service.go-248-250 (1)

248-250: ⚠️ Potential issue | 🟡 Minor

CreatedBy is hardcoded to "admin" — should derive from the authenticated gateway or caller context.

When creating a new API from a gateway deployment notification, the CreatedBy field is hardcoded to "admin". This loses provenance information. Consider propagating the gateway identity or a system-level identifier.

agent-manager-service/repositories/gateway_repository.go-93-104 (1)

93-104: ⚠️ Potential issue | 🟡 Minor

Static analysis: GetByNameAndOrgID returns (nil, nil) — use a sentinel error instead.

The nilnil linter flags this pattern. Returning (nil, nil) forces callers to check both the error and the result for nil, which is error-prone. Consider returning a sentinel error like gorm.ErrRecordNotFound (or a domain-specific one) instead.

Proposed fix
 func (r *GatewayRepo) GetByNameAndOrgID(name, orgID string) (*models.Gateway, error) {
 	var gateway models.Gateway
 	err := r.db.Where("name = ? AND organization_uuid = ?", name, orgID).First(&gateway).Error
 	if err != nil {
 		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, nil
+			return nil, gorm.ErrRecordNotFound
 		}
 		return nil, err
 	}
 	return &gateway, nil
 }
agent-manager-service/controllers/llm_controller.go-319-319 (1)

319-319: ⚠️ Potential issue | 🟡 Minor

Sending a body with HTTP 204 No Content.

WriteSuccessResponse(w, http.StatusNoContent, struct{}{}) will attempt to serialize struct{}{} as JSON and write it to the response. Per HTTP semantics, 204 responses should not include a body. The same pattern appears on line 598 and line 930.

Proposed fix
-	utils.WriteSuccessResponse(w, http.StatusNoContent, struct{}{})
+	w.WriteHeader(http.StatusNoContent)
agent-manager-service/controllers/gateway_controller.go-97-97 (1)

97-97: ⚠️ Potential issue | 🟡 Minor

Incorrect slog.Info usage — will produce malformed structured log entries.

slog.Info("organization", org.UUID.String(), org.Name) uses slog with positional args where org.UUID.String() becomes a log key and org.Name becomes the value. This appears to be a debug leftover. It should either use the contextual logger or be removed.

Proposed fix
-	slog.Info("organization", org.UUID.String(), org.Name)

If needed for debugging, use structured logging consistently:

// Use wherever a logger from context is available:
// log.Info("resolveOrgUUID: resolved", "orgUUID", org.UUID.String(), "orgName", org.Name)
agent-manager-service/controllers/gateway_controller.go-644-656 (1)

644-656: ⚠️ Potential issue | 🟡 Minor

UpdatedAt is being incorrectly assigned to CreatedAt — needs alternative source.

Line 654: UpdatedAt: ocEnv.CreatedAt is incorrect. However, ocEnv (OpenChoreo EnvironmentResponse) does not have an UpdatedAt field to fall back to. Since GatewayEnvironmentResponse requires this field, determine the appropriate source for UpdatedAt — either track it separately in the mapping, use a default value, or adjust the response model.

🧹 Nitpick comments (34)
agent-manager-service/controllers/websocket_controller.go (3)

196-206: readLoop breaks the Transport abstraction by type-asserting to *ws.WebSocketTransport.

The Transport interface was designed to allow swapping protocol implementations, but readLoop hard-casts to the concrete type to call ReadMessage(). This defeats the abstraction and will break if a different transport is used (e.g., in tests).

Consider adding a ReadMessage (or equivalent blocking-read) method to the Transport interface, or restructuring so the read loop doesn't need to know the concrete type.


49-55: Duplicate ConnectionAckDTO — another definition exists in models/gateway_internal_dto.go.

The models package already defines a ConnectionAckDTO (with slightly different fields: Message instead of ConnectionID/Timestamp). Having two types with the same name in different packages creates confusion. Consider consolidating into a single definition in models.

#!/bin/bash
# Verify the duplicate ConnectionAckDTO
rg -n 'ConnectionAckDTO' --type go

128-134: Misleading variable name and discarded error from json.Marshal.

Line 130: jsonErr is actually the marshalled JSON bytes, not an error. The marshal error (second return value) is silently discarded with _. If marshalling fails, jsonErr is nil and the send is skipped — functionally safe, but the naming is confusing.

Proposed fix
-		if jsonErr, _ := json.Marshal(errorMsg); jsonErr != nil {
-			if err := conn.WriteMessage(websocket.TextMessage, jsonErr); err != nil {
+		if msgBytes, marshalErr := json.Marshal(errorMsg); marshalErr == nil {
+			if err := conn.WriteMessage(websocket.TextMessage, msgBytes); err != nil {
 				log.Error("Failed to send error message", "gatewayID", gateway.UUID.String(), "error", err)
 			}
 		}
agent-manager-service/repositories/organization_repository.go (1)

59-69: Redundant if branch — both paths return (nil, err).

The errors.Is(err, gorm.ErrRecordNotFound) check doesn't change the return value. Same pattern in GetOrganizationByUUID. This is harmless but unnecessary.

Simplified version
 func (r *OrganizationRepo) GetOrganizationByName(name string) (*models.Organization, error) {
 	var org models.Organization
 	err := r.db.Where("name = ?", name).First(&org).Error
 	if err != nil {
-		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, err
-		}
 		return nil, err
 	}
 	return &org, nil
 }
agent-manager-service/services/infra_resource_manager.go (1)

67-83: On-demand sync on every list call may add latency.

Each ListOrganizations invocation now issues one DB round-trip (GetOrCreateOrganization) per organization returned by OpenChoreo. If the org list grows or calls are frequent, this could noticeably increase response times. Consider caching already-synced org names (e.g., in-memory set with a short TTL) or performing the sync asynchronously/in bulk.

agent-manager-service/models/llm_provider.go (1)

156-167: Inconsistent yaml struct tags on SecurityConfig and APIKeySecurity.

These two types carry both json and yaml struct tags, while every other type in this file uses only json tags. If YAML serialization isn't needed at the model layer, remove the yaml tags for consistency; if it is needed, apply them uniformly.

agent-manager-service/services/llm_deployment_service.go (2)

88-232: DeployLLMProvider — consider a more targeted broadcast-failure log level and verify event consistency.

The broadcast failure at Lines 222-229 is silently swallowed (only logged). This is acceptable for resilience, but consider whether a retry or at-least-once delivery mechanism is needed. If the gateway never learns about the deployment, the system state becomes inconsistent (deployment recorded in DB but gateway unaware).

This is a design-level observation — the current "fire and forget" approach is fine for MVP but should be revisited for production reliability.


89-103: Input validation could return a single aggregated error instead of short-circuiting.

Currently, each missing field returns immediately. Callers making multiple mistakes must fix-and-retry one at a time. Minor UX concern, not blocking.

agent-manager-service/api/app.go (1)

86-86: CORS middleware on internal server may be unnecessary.

The internal server is designed for server-to-server communication (WebSocket gateway connections, internal APIs) over TLS with API-key auth. CORS is a browser-enforced mechanism and adds no security value for non-browser clients. Consider whether this middleware is needed here.

agent-manager-service/services/environment_service.go (2)

59-62: No-op CreateEnvironment returns a generic fmt.Errorf — consider returning a typed/sentinel error.

UpdateEnvironment and DeleteEnvironment follow the same pattern. Returning plain fmt.Errorf strings makes it harder for callers (controllers) to distinguish "not supported" from actual failures and return appropriate HTTP status codes (e.g., 405 vs 500).


196-204: Silently skipping missing gateways may hide data integrity issues.

When GetByUUID fails or returns nil, the code logs a warning and continues. This is a reasonable fallback, but if gateway-environment mappings reference non-existent gateways, it could indicate stale data. Consider whether returning a partial result without any signal to the caller is acceptable.

agent-manager-service/db_migrations/004_create_artifacts_and_apis.go (2)

121-125: idx_deployments_artifact_gateway is redundant.

The index on (artifact_uuid, gateway_uuid) at Line 123 is a prefix of the index on (artifact_uuid, gateway_uuid, created_at) at Line 124. PostgreSQL can use the latter index for queries that only filter on (artifact_uuid, gateway_uuid), making the former redundant. Consider removing Line 123 to reduce write overhead.

Proposed fix
 			-- Indexes for performance
 			CREATE INDEX idx_rest_apis_project ON rest_apis(project_uuid);
 			CREATE INDEX idx_artifacts_org ON artifacts(organization_uuid);
-			CREATE INDEX idx_deployments_artifact_gateway ON deployments(artifact_uuid, gateway_uuid);
 			CREATE INDEX idx_deployments_created_at ON deployments(artifact_uuid, gateway_uuid, created_at);

52-53: Inconsistent JSON storage strategy: JSONB vs TEXT.

configuration (Line 53) uses JSONB, while transport (Line 52) and metadata (Line 68) store JSON as VARCHAR(255)/TEXT. If these are always JSON, using JSONB consistently would enable validation and query capabilities. If they're intentionally plain text, consider adding comments explaining why.

Also applies to: 67-68

agent-manager-service/config/config.go (1)

205-223: Consider documenting default values as struct tags or constants.

The comments reference defaults (port 9243, cert dir ./data/certs, max connections 1000, etc.) but they're only in comments. If defaults are applied in the config loader, this is fine — but co-locating defaults as constants near these types would improve discoverability.

agent-manager-service/models/upstream.go (1)

32-37: Defensive measure: Consider masking UpstreamAuth.Value field in API responses.

If UpstreamAuth (or its parent models RestAPI/RestAPIConfig) is ever returned in JSON API responses in the future, the Value field would be exposed as-is since it currently has json:"value,omitempty". To prevent accidental secret leakage, consider adding json:"-" to the Value field or implementing a custom marshaler that omits this field.

Currently, these models are only used internally for YAML generation and gateway deployment, but this change ensures protection if the data model is later exposed through API endpoints.

agent-manager-service/repositories/artifact_repository.go (1)

68-78: Update silently succeeds when no rows match.

Unlike Delete (which checks RowsAffected == 0), Update doesn't verify that a row was actually modified. If the artifact UUID + org UUID combination doesn't exist, GORM returns no error but RowsAffected will be 0. This could mask bugs where callers assume the update took effect.

Proposed fix
 func (r *ArtifactRepo) Update(tx *gorm.DB, artifact *models.Artifact) error {
 	artifact.UpdatedAt = time.Now()
-	return tx.Model(&models.Artifact{}).
+	result := tx.Model(&models.Artifact{}).
 		Where("uuid = ? AND organization_uuid = ?", artifact.UUID, artifact.OrganizationUUID).
 		Updates(map[string]interface{}{
 			"name":       artifact.Name,
 			"version":    artifact.Version,
 			"updated_at": artifact.UpdatedAt,
-		}).Error
+		})
+	if result.Error != nil {
+		return result.Error
+	}
+	if result.RowsAffected == 0 {
+		return gorm.ErrRecordNotFound
+	}
+	return nil
 }
agent-manager-service/server/internal_server.go (2)

68-93: Loaded certificate is not checked for expiry.

When reusing an existing certificate from disk (Line 71), there's no validation that it hasn't expired. After the 1-year validity period, the server will silently start with an expired cert, causing TLS handshake failures for clients.

Sketch: add expiry check after loading
 			loadedCert, err := tls.LoadX509KeyPair(certPath, keyPath)
 			if err != nil {
 				slog.Warn("Failed to load existing certificates", "error", err)
 			} else {
-				slog.Info("Using existing certificates", "certDir", s.cfg.CertDir)
-				cert = loadedCert
+				// Verify the certificate hasn't expired
+				if len(loadedCert.Certificate) > 0 {
+					x509Cert, parseErr := x509.ParseCertificate(loadedCert.Certificate[0])
+					if parseErr == nil && time.Now().Before(x509Cert.NotAfter) {
+						slog.Info("Using existing certificates", "certDir", s.cfg.CertDir, "expiresAt", x509Cert.NotAfter)
+						cert = loadedCert
+					} else {
+						slog.Warn("Existing certificate is expired or invalid, will regenerate", "certDir", s.cfg.CertDir)
+					}
+				}
 			}

136-152: Static serial number 1 — consider using a random serial.

All generated certificates get SerialNumber: big.NewInt(1). If a cert is regenerated (e.g., after manual deletion), clients that cached the old cert may get confused by the same serial. RFC 5280 recommends unique serial numbers.

Proposed fix
+	serialNumber, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
+	if err != nil {
+		return tls.Certificate{}, fmt.Errorf("failed to generate serial number: %w", err)
+	}
+
 	template := x509.Certificate{
-		SerialNumber: big.NewInt(1),
+		SerialNumber: serialNumber,
agent-manager-service/controllers/llm_deployment_controller.go (1)

177-257: Consistent handler patterns across all deployment operations.

All six handlers follow the same lifecycle: context/logger setup → org resolution → input validation → service delegation → error mapping. The error handling is thorough, covering domain-specific errors with appropriate HTTP status codes.

One minor observation: the verbose Info logging at each step (e.g., "organization resolved", "calling service layer") could be noisy under load. Consider reducing some of these to Debug level in a follow-up.

Also applies to: 259-338, 340-401, 403-460, 462-528

agent-manager-service/repositories/llm_provider_template_repository.go (3)

93-98: Redundant ErrRecordNotFound check — both branches return the same value.

Lines 94–97 check for gorm.ErrRecordNotFound but both the if and else branches return (nil, err). This makes the check dead code. If the intent is to return nil, nil for not-found (as GetByUUID implied in past iterations), update accordingly; otherwise, simplify.

Simplify
 	if err != nil {
-		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, err
-		}
 		return nil, err
 	}

124-129: Same redundant pattern in GetByUUID.

The ErrRecordNotFound check at lines 125–128 is identical dead code as noted in GetByHandle.


88-117: Consider extracting a shared unmarshalConfig helper to reduce duplication across GetByHandle, GetByUUID, and (once fixed) List.

The same 8-line configuration-unmarshal block is repeated in GetByHandle (lines 102–113) and GetByUUID (lines 131–142), and will need to appear in List as well. Extracting it reduces the maintenance surface.

Example helper
func unmarshalTemplateConfig(t *models.LLMProviderTemplate) error {
	if t.Configuration == "" {
		return nil
	}
	var cfg llmProviderTemplateConfig
	if err := json.Unmarshal([]byte(t.Configuration), &cfg); err != nil {
		return err
	}
	t.Metadata = cfg.Metadata
	t.PromptTokens = cfg.PromptTokens
	t.CompletionTokens = cfg.CompletionTokens
	t.TotalTokens = cfg.TotalTokens
	t.RemainingTokens = cfg.RemainingTokens
	t.RequestModel = cfg.RequestModel
	t.ResponseModel = cfg.ResponseModel
	return nil
}

Also applies to: 119-146

agent-manager-service/repositories/llm_provider_repository.go (1)

57-92: Excessive slog.Info logging throughout — demote to slog.Debug for routine operations.

Nearly every method logs multiple slog.Info calls for start, intermediate steps (resolving handles, inserting records), and completion. In production, this will generate a very high volume of logs, especially for frequently called methods like GetByUUID, List, and Exists. Reserve slog.Info for significant lifecycle events and use slog.Debug for step-by-step tracing.

Also applies to: 94-115, 117-137, 144-206, 208-249, 251-263

agent-manager-service/models/gateway.go (1)

64-74: Use constants for token status values instead of magic strings.

"active" and "revoked" are used as magic strings in IsActive() and Revoke(). Defining package-level constants prevents typos and makes it easier to reference these values consistently across the codebase (e.g., in queries, validation, or tests).

Proposed fix
+const (
+	TokenStatusActive  = "active"
+	TokenStatusRevoked = "revoked"
+)
+
 // IsActive returns true if token status is active
 func (t *GatewayToken) IsActive() bool {
-	return t.Status == "active"
+	return t.Status == TokenStatusActive
 }

 // Revoke marks the token as revoked with current timestamp
 func (t *GatewayToken) Revoke() {
 	now := time.Now()
-	t.Status = "revoked"
+	t.Status = TokenStatusRevoked
 	t.RevokedAt = &now
 }
agent-manager-service/models/rest_api.go (1)

82-87: Reconsider Params as pointer-to-map—it's non-idiomatic but may be intentional for JSON marshaling.

Maps in Go are reference types; a pointer-to-map adds unnecessary indirection. However, this pattern allows omitempty to distinguish between nil (omitted) and empty maps. If the JSON API contract requires omitting the field entirely when no params exist, the pointer serves a purpose. If empty {} is acceptable in JSON, this can be simplified to map[string]interface{} and the omitempty will work correctly. Verify the intended JSON serialization behavior before refactoring.

agent-manager-service/controllers/gateway_internal_controller.go (1)

56-109: Consider extracting the repeated API-key authentication into middleware or a helper.

All four handlers repeat the same ~15-line block: extract api-key header → VerifyToken → error handling. A shared helper (or middleware that injects the authenticated gateway into the request context) would reduce duplication and ensure consistent auth behavior.

agent-manager-service/services/llm_provider_template_service.go (1)

208-272: Serialize/deserialize helpers mirror the model struct — consider using the struct directly to stay in sync.

If new fields are added to LLMProviderTemplate, they must be added in three places: the model, serializeConfiguration, and deserializeConfiguration. You could serialize/deserialize a dedicated config struct type that is shared between both methods, reducing the chance of field drift.

agent-manager-service/services/gateway_internal_service.go (1)

315-315: Hard-coded deployment limit of 100 — extract as a named constant or configuration value.

The magic number 100 is embedded inline. This should be a named constant or, better, a configurable parameter injected into the service.

agent-manager-service/services/llm_provider_service.go (1)

63-174: Excessive slog.Info calls — most of these should be Debug level.

There are 12+ slog.Info calls in the Create method alone, logging every internal step (e.g., "extracted configuration", "validating template", "checking if provider exists", "set default values", "serializing model providers"). In production, this will create significant log noise. Reserve Info for meaningful business events (creation success/failure) and use Debug for flow tracing.

The same pattern applies to List, Get, Update, and Delete.

agent-manager-service/controllers/llm_controller.go (2)

165-177: Pagination validation logic is duplicated across six handlers.

The same limit/offset clamping block (cap to 100, floor at 0/20) is repeated in ListLLMProviderTemplates, ListLLMProviders, ListLLMProxies, ListLLMProxiesByProvider, etc. Consider extracting a small helper.

Example helper
func sanitizePagination(r *http.Request) (limit, offset int) {
	limit = getIntQueryParam(r, "limit", 20)
	offset = getIntQueryParam(r, "offset", 0)
	if limit <= 0 { limit = 20 }
	if limit > 100 { limit = 100 }
	if offset < 0 { offset = 0 }
	return
}

Also applies to: 409-421, 689-701, 743-755


631-638: Pass projectUUID directly instead of orgID to match the parameter's semantic meaning.

The current code passes orgID (a valid UUID) to ConvertSpecToModelLLMProxy, which expects a projectID parameter, then immediately overwrites proxy.ProjectUUID with the parsed projectUUID. This works because orgID is a valid UUID string from resolveOrgUUID(), but it's semantically misleading—the parameter name doesn't match what's being passed.

The UpdateLLMProxy handler already uses the correct pattern by passing projectUUID directly. Apply the same approach here to eliminate the unnecessary overwrite and clarify intent.

Proposed change
-	proxy := utils.ConvertSpecToModelLLMProxy(&req, orgID)
-	proxy.ProjectUUID, err = utils.ParseUUID(projectUUID)
-	if err != nil {
-		log.Error("CreateLLMProxy: invalid project UUID", "projectUUID", projectUUID, "error", err)
-		utils.WriteErrorResponse(w, http.StatusInternalServerError, "Invalid project UUID")
-		return
-	}
+	proxy := utils.ConvertSpecToModelLLMProxy(&req, projectUUID)
agent-manager-service/controllers/gateway_controller.go (1)

234-250: Client-side pagination after fetching all gateways is inefficient.

ListGateways calls c.gatewayService.ListGateways(&orgID) which retrieves all gateways, then applies offset/limit in memory (lines 240–250). Additionally, getGatewayEnvironmentsFromDB is called per-gateway (line 236), making N+1 calls to the DB and OpenChoreo for every list request. As the number of gateways grows, this will degrade significantly.

agent-manager-service/services/platform_gateway_service.go (2)

147-154: Redundant UUID round-trip: uuid.New().String()uuid.Parse().

uuid.New() already returns a valid uuid.UUID. Converting to string and parsing back is unnecessary overhead.

Proposed fix
-	// 4. Generate UUID for gateway
-	gatewayID := uuid.New().String()
-
-	// 5. Parse and create Gateway model
-	gatewayUUID, err := uuid.Parse(gatewayID)
-	if err != nil {
-		return nil, fmt.Errorf("invalid gateway UUID: %w", err)
-	}
+	// 4. Generate UUID for gateway
+	gatewayUUID := uuid.New()

Then use gatewayUUID directly and gatewayUUID.String() where the string form is needed (e.g., line 198).


638-642: Regex compiled on every validation call — compile once at package level.

regexp.MustCompile inside validateGatewayInput runs on every request. This should be a package-level variable.

Proposed fix
+var gatewayNamePattern = regexp.MustCompile(`^[a-z0-9-]+$`)
+
 func (s *PlatformGatewayService) validateGatewayInput(orgID, name, displayName, vhost, functionalityType string) error {
     // ...
-    namePattern := regexp.MustCompile(`^[a-z0-9-]+$`)
-    if !namePattern.MatchString(name) {
+    if !gatewayNamePattern.MatchString(name) {

Comment thread agent-manager-service/api/app.go
Comment thread agent-manager-service/config/config_loader.go
Comment thread agent-manager-service/controllers/llm_controller.go
Comment thread agent-manager-service/controllers/llm_controller.go
Comment thread agent-manager-service/repositories/api_repository.go
Comment thread agent-manager-service/services/llm_provider_service.go
Comment thread agent-manager-service/services/llm_proxy_service.go
Comment thread agent-manager-service/services/llm_proxy_service.go
Comment thread agent-manager-service/services/platform_gateway_service.go
Comment thread agent-manager-service/services/platform_gateway_service.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@agent-manager-service/repositories/api_repository.go`:
- Around line 296-311: The CheckAPIExistsByNameAndVersionInOrganization function
currently queries models.Artifact by name/version/orgUUID but omits scoping by
kind; update the query in CheckAPIExistsByNameAndVersionInOrganization to
include the same kind filter used by CheckAPIExistsByHandleInOrganization (e.g.,
Where("kind = ?", KindRestAPI) or the appropriate KindRestAPI constant) before
applying the optional excludeHandle filter so only REST API artifacts are
considered and WebSubAPI/other kinds do not cause false-positive collisions.

In `@agent-manager-service/repositories/artifact_repository.go`:
- Around line 70-79: The Update method on ArtifactRepo silently succeeds when no
artifact matches; modify ArtifactRepo.Update to capture the result of the
Updates call (e.g., res :=
tx.Model(&models.Artifact{}).Where(...).Updates(...)), return res.Error if
non-nil, and if res.RowsAffected == 0 return an error (use
gorm.ErrRecordNotFound or a domain-specific NotFound error) so callers like
APIRepo.UpdateAPI can detect the missing artifact and handle the transaction
accordingly.

In `@agent-manager-service/repositories/gateway_repository.go`:
- Around line 206-215: RevokeToken in GatewayRepo updates status but doesn't
check whether any row was affected, so calls for non-existent tokens silently
succeed; modify RevokeToken (in GatewayRepo, operating on models.GatewayToken)
to capture the result of the GORM Updates call, check result.RowsAffected and
return a not-found or explicit error when RowsAffected == 0 (similar to
UpdateGateway/UpdateActiveStatus behavior), propagating any underlying DB error
otherwise.

In `@agent-manager-service/repositories/llm_provider_template_repository.go`:
- Around line 179-198: The Update method on LLMProviderTemplateRepo currently
writes t.Configuration directly, which can persist empty/stale JSON when callers
set parsed fields (e.g., t.Metadata, t.PromptTokens) — add the same marshaling
step used in Create: call serializeConfiguration (or inline the marshaling of
Metadata, PromptTokens, etc.) to produce a JSON string and assign it to
t.Configuration before running
r.db.Model(&models.LLMProviderTemplate{}).Where(...).Updates(...); ensure you
handle and return any marshal error and still update updated_at, RowsAffected,
and result.Error as before.

In `@agent-manager-service/services/llm_provider_service.go`:
- Around line 94-114: The template and provider existence checks
(s.templateRepo.Exists and s.providerRepo.Exists) are performed outside the
transaction in LLMProviderService.Create, creating a TOCTOU race; move these
validations into the same transactional scope used for creation (or perform the
creation inside the transaction and handle the DB unique constraint error), so
that template validation and provider uniqueness checks execute within the
transaction that calls providerRepo.Create, and map any unique-constraint DB
error to utils.ErrLLMProviderExists to preserve the clean error path.
- Around line 240-247: The controller is calling LLMProviderService.Update and
Delete with arguments swapped (passing orgID then providerID) while the service
signatures are Update(providerID, orgID string, ...) and Delete(providerID,
orgID string); fix the controller call sites that invoke
c.providerService.Update(...) and c.providerService.Delete(...) so they pass
providerID first and orgID second (i.e., providerID, orgID, provider for Update
and providerID, orgID for Delete) to match the service method signatures and
ensure correct lookups.
🧹 Nitpick comments (11)
agent-manager-service/services/gateway_internal_service.go (2)

316-316: Extract magic number 100 into a named constant.

The hard deployment limit is buried in the call. A package-level constant (e.g., maxDeploymentsPerArtifact) would be easier to find, tune, and document.

Proposed fix
+const maxDeploymentsPerArtifact = 100
+
 // CreateGatewayDeployment handles the registration of an API deployment from a gateway
 func (s *GatewayInternalAPIService) CreateGatewayDeployment(
-	err = s.deploymentRepo.CreateWithLimitEnforcement(deployment, 100) // Hard limit
+	err = s.deploymentRepo.CreateWithLimitEnforcement(deployment, maxDeploymentsPerArtifact)

335-339: Variable context shadows the imported context package.

Line 336 declares a local variable context which shadows the context package imported on line 20. While the shadow is scoped to this function and the package is not currently used within generateAPIDeploymentYAML, this is poor practice and could cause confusion or latent bugs if the function is extended to use context.Context in the future.

Rename the variable to avoid shadowing, e.g., apiContext.

Proposed fix
-	context := "/api/v1" // Default context
+	apiContext := "/api/v1" // Default context
 	if api.Configuration.Context != nil && *api.Configuration.Context != "" {
-		context = *api.Configuration.Context
+		apiContext = *api.Configuration.Context
 	}

Then use apiContext on line 350:

-			Context: context,
+			Context: apiContext,

(Note: api.Configuration is a struct value, not a pointer, so nil-checking it is unnecessary.)

agent-manager-service/repositories/organization_repository.go (1)

60-83: Redundant error branches in GetOrganizationByName and GetOrganizationByUUID.

Both methods have an if errors.Is(err, gorm.ErrRecordNotFound) branch that returns the same (nil, err) as the outer branch. The ErrRecordNotFound check is unnecessary unless you plan to map it to a different sentinel error.

Simplified example (apply to both methods)
 func (r *OrganizationRepo) GetOrganizationByName(name string) (*models.Organization, error) {
 	var org models.Organization
 	err := r.db.Where("name = ?", name).First(&org).Error
 	if err != nil {
-		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, err
-		}
 		return nil, err
 	}
 	return &org, nil
 }
agent-manager-service/repositories/gateway_repository.go (1)

187-188: Token status values are hardcoded magic strings throughout the file.

"active" (lines 187, 221) and "revoked" (line 212) are used as raw strings. If these drift from the database or model expectations, queries will silently return wrong results. Consider extracting constants (e.g., TokenStatusActive, TokenStatusRevoked) in the models package.

Also applies to: 207-212, 221-221

agent-manager-service/services/environment_service.go (2)

181-188: Direct db.DB(ctx) call bypasses the repository layer for GatewayEnvironmentMapping queries.

All other data access in this PR goes through repository interfaces, but this method uses raw GORM via db.DB(ctx). This creates an inconsistency: the mapping query isn't mockable for unit tests and doesn't benefit from any repository-level encapsulation.

Consider adding a method like GetMappingsByEnvironmentUUID to the GatewayRepository (or a dedicated mapping repository) and injecting it into EnvironmentService.


196-204: The gateway == nil check on line 201 is unreachable.

GatewayRepo.GetByUUID (gateway_repository.go, line 73–83) returns (nil, err) when not found — it never returns (nil, nil). The err != nil check on line 197 already handles this case via continue on line 199, making the nil check on line 201 dead code.

Not harmful, but worth noting for clarity.

agent-manager-service/repositories/llm_provider_template_repository.go (1)

110-127: GetByHandle returns gorm.ErrRecordNotFound on not-found — inconsistent with other repos returning nil, nil.

Lines 116-117 return the raw GORM sentinel error when the record is not found. The previous iteration of GetByUUID (and many GORM repos in this codebase) return nil, nil for not-found, letting the service layer decide the error. Here both branches return the same error, making the if errors.Is(...) check redundant. Consider returning nil, nil for not-found to match GetByUUID's pattern, or at least collapse the duplicate branches.

Option A: return nil, nil for not-found (preferred, matches GetByUUID)
 func (r *LLMProviderTemplateRepo) GetByHandle(templateHandle, orgUUID string) (*models.LLMProviderTemplate, error) {
 	var template models.LLMProviderTemplate
 	err := r.db.Where("handle = ? AND organization_uuid = ?", templateHandle, orgUUID).
 		First(&template).Error
 	if err != nil {
 		if errors.Is(err, gorm.ErrRecordNotFound) {
-			return nil, err
+			return nil, nil
 		}
 		return nil, err
 	}
agent-manager-service/controllers/llm_controller.go (3)

181-193: Pagination validation is duplicated across all List handlers.

The same limit/offset clamping logic is repeated four times. Consider extracting a helper like parsePagination(r *http.Request) (limit, offset int).

Example helper
func parsePagination(r *http.Request) (limit, offset int) {
	limit = getIntQueryParam(r, "limit", 20)
	offset = getIntQueryParam(r, "offset", 0)
	if limit <= 0 {
		limit = 20
	}
	if limit > 100 {
		limit = 100
	}
	if offset < 0 {
		offset = 0
	}
	return limit, offset
}

Also applies to: 450-462, 765-777, 824-836


59-65: Service fields are concrete types, not interfaces.

templateService, providerService, and proxyService are stored as *services.LLMProviderTemplateService, *services.LLMProviderService, and *services.LLMProxyService (concrete pointer types). This couples the controller to concrete implementations and makes unit testing harder. Consider defining service interfaces (or accepting existing ones) for better testability.


360-415: Excessive slog.Info calls in Provider handlers.

CreateLLMProvider alone has 7 Info-level log lines for a single successful request flow (lines 360, 373, 381, 388, 415 plus similar in other handlers). This will generate significant log volume in production. Consider downgrading most of these to Debug and keeping only entry/exit points at Info.

agent-manager-service/services/llm_provider_service.go (1)

63-174: Excessive slog.Info logging — 15+ Info calls for a single Create flow.

The Create method has ~15 slog.Info statements for normal operation. This pattern is repeated across all methods. In production with moderate traffic, this will generate massive log volume. Consider using slog.Debug for intermediate steps and reserving slog.Info for entry/exit/significant events only.

Comment thread agent-manager-service/repositories/api_repository.go
Comment thread agent-manager-service/repositories/artifact_repository.go
Comment thread agent-manager-service/repositories/gateway_repository.go
Comment thread agent-manager-service/services/llm_provider_service.go Outdated
Comment thread agent-manager-service/services/llm_provider_service.go
@menakaj menakaj force-pushed the gw-refactor branch 2 times, most recently from 30a5e88 to 525b5cd Compare February 16, 2026 06:11
-- Foreign key constraints
CONSTRAINT fk_mapping_provider FOREIGN KEY (llm_provider_uuid)
REFERENCES llm_providers(uuid) ON DELETE CASCADE,
CONSTRAINT fk_mapping_organization FOREIGN KEY (organization_uuid)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the FK constraint.

Copy link
Copy Markdown
Member

@a5anka a5anka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: .Scan() returns zero-value structs instead of gorm.ErrRecordNotFound

Multiple repository methods use GORM's .Scan() but check for gorm.ErrRecordNotFound, which .Scan() never returns. This causes "ghost" zero-value records to be returned as valid entities — the deployment == nil guard in callers never triggers.

How it breaks

// deployment_repository.go:260-287 (GetWithState)
var deployment models.Deployment
err := r.db.Table("deployments d").
    Select(...).Joins(...).Where(...).
    Scan(&deployment).Error           // Scan() — NOT First()

if err != nil {
    if errors.Is(err, gorm.ErrRecordNotFound) {  // ❌ DEAD CODE — Scan() never sets this
        return nil, errors.New("deployment not found")
    }
    return nil, err
}
return &deployment, nil  // ❌ Returns &Deployment{} with all zero values (uuid.Nil, "", nil, ...)

When querying for a non-existent deployment:

  1. .Scan() finds zero rows → sets err = nil
  2. The ErrRecordNotFound check is skipped (dead code)
  3. Returns a non-nil pointer to a zero-value Deployment struct

Callers all do:

deployment, err := s.deploymentRepo.GetWithState(...)
if err != nil { ... }          // err is nil — skipped
if deployment == nil { ... }   // &Deployment{} is not nil — skipped
// Continues with ghost deployment where GatewayUUID = 00000000-0000-0000-0000-000000000000

Concrete consequences

  • UndeployLLMProxyDeployment: Comparing deployment.GatewayUUID.String() against the request's gatewayID returns "gateway ID mismatch" instead of "deployment not found"
  • GetStatus: Returns ("", DeploymentStatus(""), &time.Time{0001-01-01}, nil) — a zero-time pointer that looks valid to JSON serializers but is clearly wrong
  • GetCurrentByGateway: Returns a deployment with nil Content, which will panic or produce empty YAML when gateway_internal_service.go does string(deployment.Content)

Affected methods (5 total)

File Method Line
deployment_repository.go GetWithState 272
deployment_repository.go GetCurrentByGateway 181
deployment_repository.go GetStatus 241
api_repository.go GetAPIByUUID 130
api_repository.go GetAPIMetadataByHandle 147

Suggested fix

Since these queries use custom Select and Joins clauses that may conflict with .First() (which adds ORDER BY pk LIMIT 1), the safest fix is to check RowsAffected:

result := r.db.Table("deployments d").
    Select(...).Joins(...).Where(...).
    Scan(&deployment)
if result.Error != nil {
    return nil, result.Error
}
if result.RowsAffected == 0 {
    return nil, nil  // callers already handle nil as "not found"
}
return &deployment, nil

Apply this pattern to all 5 affected methods.

@a5anka
Copy link
Copy Markdown
Member

a5anka commented Feb 16, 2026

Performance Issue: Token Verification Full Table Scan — O(N*M) on Every Auth Request

File: agent-manager-service/services/platform_gateway_service.go, lines 425-452

Problem

VerifyToken() is called on 6 hot paths (every WebSocket connection + every internal gateway API call). The current implementation:

  1. Loads ALL gateways (line 431) — SELECT * FROM gateways with no filter
  2. For EACH gateway, runs a separate query to fetch active tokens (line 438) — SELECT * FROM gateway_tokens WHERE gateway_uuid = ? AND status = 'active'
  3. For EACH token, computes SHA-256 hash and compares (line 444)

With N gateways and M active tokens per gateway (max 2 per RotateToken):

Operation Count
DB queries 1 + N
SHA-256 computations up to N × M (worst case: valid token belongs to last gateway)

For 100 gateways: 101 DB queries and up to 200 SHA-256 hashes on every single request. Invalid tokens always hit worst case since all gateways must be scanned.

Additional Concern: Silent Error Swallowing (line 440)

if err != nil {
    continue // Skip this gateway on error
}

A transient DB error (connection timeout, deadlock) when fetching tokens for the gateway that actually owns the token causes it to be silently skipped, rejecting a valid token as "invalid token" with no logging.

Suggested Fix

Store a token prefix (e.g., first 8 chars of hex token) in a new indexed column on gateway_tokens:

// During token creation:
tokenPrefix := plainToken[:8]
// Store in gateway_tokens.prefix column (with index)

// During verification — 2 queries instead of 1+N:
func (s *PlatformGatewayService) VerifyToken(plainToken string) (*models.PlatformGateway, error) {
    prefix := plainToken[:8]
    token, err := s.gatewayRepo.GetActiveTokenByPrefix(prefix) // single indexed query
    if err != nil {
        return nil, errors.New("invalid token")
    }
    if !verifyToken(plainToken, token.TokenHash, token.Salt) {
        return nil, errors.New("invalid token")
    }
    return s.gatewayRepo.GetByUUID(token.GatewayUUID.String())
}

The prefix is not secret (non-reversible subset), and SHA-256 verification still provides the actual security guarantee. This reduces auth from O(N) queries to O(1) regardless of gateway count.

Copy link
Copy Markdown
Member

@a5anka a5anka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebSocket Rate Limiter Memory Leak

File: agent-manager-service/controllers/websocket_controller.go, lines 224-255

Severity: High

Problem

The rateLimitMap (map[string][]time.Time) tracks connection attempts per client IP, but entries are only cleaned up when the same IP makes a new request (lines 238-243). If an IP never connects again, its map entry persists forever. There is no background goroutine, no TTL, and no periodic sweep.

How it manifests

  1. IP 1.2.3.4 connectsrateLimitMap["1.2.3.4"] = [10:00:00]
  2. IP 1.2.3.4 connects again within 1 min → old entries pruned, new one added
  3. IP 1.2.3.4 never connects again → entry sits in the map indefinitely

Why this matters

  • Legitimate traffic: Gateways behind NATs or load balancers rotate IPs over time (pod restarts, IP reassignment). Each retired IP leaves a permanent orphan entry.
  • Attack amplification via X-Forwarded-For: getClientIP() (line 258) uses the raw X-Forwarded-For header as-is — not just the first IP. An attacker can generate unique map keys trivially by rotating the header value, simultaneously bypassing rate limits and inflating the map:
    X-Forwarded-For: random-string-1
    X-Forwarded-For: random-string-2
    
  • Memory impact: At ~50 bytes per entry, 1M unique IPs ≈ 50MB. Under sustained attack this grows without bound, eventually causing OOM.

Suggested fix

Add a periodic cleanup goroutine started in NewWebSocketController:

go func() {
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        ctrl.rateLimitMu.Lock()
        cutoff := time.Now().Add(-1 * time.Minute)
        for ip, attempts := range ctrl.rateLimitMap {
            var recent []time.Time
            for _, t := range attempts {
                if t.After(cutoff) {
                    recent = append(recent, t)
                }
            }
            if len(recent) == 0 {
                delete(ctrl.rateLimitMap, ip)
            } else {
                ctrl.rateLimitMap[ip] = recent
            }
        }
        ctrl.rateLimitMu.Unlock()
    }
}()

Also fix getClientIP() to parse only the first IP from X-Forwarded-For:

if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
    if i := strings.Index(xff, ","); i != -1 {
        return strings.TrimSpace(xff[:i])
    }
    return strings.TrimSpace(xff)
}

Alternatively, consider replacing the custom implementation with golang.org/x/time/rate which handles expiry natively.

@a5anka
Copy link
Copy Markdown
Member

a5anka commented Feb 16, 2026

Non-Atomic Gateway + Token Creation in RegisterGateway

File: services/platform_gateway_service.go:207-214

The gateway insert and token insert are two independent DB operations with no wrapping transaction:

if err := s.gatewayRepo.Create(gateway); err != nil {
    return nil, fmt.Errorf("failed to create gateway: %w", err)
}

if err := s.gatewayRepo.CreateToken(gatewayToken); err != nil {
    return nil, fmt.Errorf("failed to create token: %w", err)
}

If step 1 succeeds but step 2 fails (connection drop, constraint violation, etc.), the gateway row is committed but has no token — making it inaccessible and requiring manual DB cleanup. Retrying may also fail on a unique name constraint.

Suggestion: Wrap both in a single transaction so the gateway insert rolls back if token creation fails:

err := s.db.Transaction(func(tx *gorm.DB) error {
    if err := tx.Create(gateway).Error; err != nil {
        return fmt.Errorf("failed to create gateway: %w", err)
    }
    if err := tx.Create(gatewayToken).Error; err != nil {
        return fmt.Errorf("failed to create token: %w", err)
    }
    return nil
})

The same pattern exists in RotateToken (lines 455-519), where the token count check + creation is not atomic, allowing concurrent requests to exceed the "max 2 active tokens" limit.

}

if gateway == nil {
return nil, errors.New("gateway not found")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent Error Types Cause Incorrect 500 Responses

All "not found" paths in this service return errors.New("gateway not found") — a new error instance each time — instead of the sentinel utils.ErrGatewayNotFound.

The controller's handleGatewayErrors checks via errors.Is(err, utils.ErrGatewayNotFound), which compares by identity (pointer equality), not by message text. Since a fresh errors.New(...) is a different pointer than the sentinel, the check never matches and the switch falls through to the default — returning 500 Internal Server Error instead of the intended 404 Not Found.

This affects 12 occurrences across GetGateway, UpdateGateway, DeleteGateway, GetGatewayStatus, GetGatewayArtifacts, and RotateToken.

Fix: Replace all inline errors.New("gateway not found") with the sentinel:

// Before (12 occurrences in this file):
return nil, errors.New("gateway not found")

// After:
return nil, utils.ErrGatewayNotFound

Lines to update: 316, 320, 354, 357, 409, 412, 462, 465, 539, 543, 592, 595

@a5anka
Copy link
Copy Markdown
Member

a5anka commented Feb 16, 2026

Review Finding: Shared KindLLMAPI Constant for Providers AND Proxies

Severity: High
Files: repositories/llm_provider_repository.go, repositories/llm_proxy_repository.go, models/artifact.go:46

Problem

Both LLM Providers and LLM Proxies use the same artifact kind constant KindLLMAPI = "LLMAPI":

// llm_provider_repository.go:74 — when creating a provider
Kind: models.KindLLMAPI,  // "LLMAPI"

// llm_proxy_repository.go:73 — when creating a proxy
Kind: models.KindLLMAPI,  // "LLMAPI"  ← same value

Impact

1. Count() returns incorrect numbers.
Both repos delegate to artifactRepo.CountByKindAndOrg(models.KindLLMAPI, orgUUID) which runs:

SELECT count(*) FROM artifacts WHERE kind = 'LLMAPI' AND organization_uuid = ?

If an org has 5 providers and 3 proxies, both Count() methods return 8. API responses (e.g. GET /llm-providers) would report an inflated total.

2. Exists() can't distinguish entity types.
LLMProviderRepo.Exists() calls r.artifactRepo.Exists(models.KindLLMAPI, providerID, orgUUID). This returns true if a proxy has that handle, causing false positives in uniqueness checks. A provider could fail to be created because a proxy already uses the same handle, or vice versa.

3. Listing queries only work by accident.
List methods are safe because they JOIN to their specific table (llm_providers or llm_proxies), which acts as a natural filter. But any query operating only on the artifacts table (Count, Exists, handle-resolution in Update/Delete) cannot distinguish providers from proxies.

Suggested Fix

Define separate kind constants:

const (
    KindRestAPI      = "RestAPI"
    KindLLMProvider  = "LLMProvider"
    KindLLMProxy     = "LLMProxy"
)

Then update all references in both repositories. If any data already exists with kind = "LLMAPI", a data migration will be needed (JOIN to llm_providers / llm_proxies to determine which rows are which).

Copy link
Copy Markdown
Member

@a5anka a5anka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance Issue: Client-Side Pagination in ListGateways

File: agent-manager-service/controllers/gateway_controller.go, lines 277-302

Problem

When a client calls GET /orgs/{org}/gateways?limit=10&offset=0, the limit and offset params are parsed but never passed to the service or repository layer. Instead:

  1. All gateways are loaded from the DB (line 277) — ListGateways has no pagination support
  2. For every gateway (not just the requested page), an N+1 pattern executes:
    • 1 DB query per gateway for environment mappings (line 288)
    • 1 external RPC to OpenChoreo per gateway via c.ocClient.ListEnvironments() — which returns the same org-wide list every time
  3. Only then are results sliced in memory (lines 292-302), discarding the rest

For an org with 100 gateways and limit=10, this means:

  • 1 DB query for all 100 gateways
  • 100 DB queries for environment mappings
  • 100 external RPCs to OpenChoreo (returning the same data each time)
  • 90 results discarded

Response time scales with total gateways, not page size.

Additionally, there is no guard against negative offset/limit values, which would panic on the slice operation at line 302:

paginatedGateways := specGateways[start:end]  // panic if start < 0

Suggested Fix

// 1. Pass limit/offset to the service and repository layers
gatewaysResp, total, err := c.gatewayService.ListGateways(&orgID, filters, limit, offset)

// 2. In the repository, use DB-level pagination
func (r *GatewayRepo) ListWithFilters(opts FilterOptions, limit, offset int) ([]*Gateway, int64, error) {
    var total int64
    r.db.Model(&Gateway{}).Where(...).Count(&total)

    var gateways []*Gateway
    r.db.Where(...).Limit(limit).Offset(offset).Find(&gateways)
    return gateways, total, nil
}

// 3. Cache the OpenChoreo ListEnvironments call (it's org-wide, not per-gateway)
ocEnvironments, err := c.ocClient.ListEnvironments(ctx, orgName) // call ONCE before loop
for _, gw := range gatewaysResp.List {
    environments := matchEnvironments(gw.ID, mappings, ocEnvironments) // no RPC per gateway
}

// 4. Validate pagination params
if limit <= 0 { limit = defaultLimit }
if offset < 0 { offset = 0 }

This reduces a 100-gateway org request from 201 queries + 100 RPCs down to 2 queries + 1 RPC.

Copy link
Copy Markdown
Member

@a5anka a5anka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent Failure in CreateAndDeploy and UpdateAndSync

File: agent-manager-service/services/llm_provider_service.go
Severity: Medium

Problem 1: Invalid gateway UUIDs silently skipped

CreateAndDeploy (line 507-512):

for _, gatewayID := range gatewayIDs {
    gatewayUUID, err := uuid.Parse(gatewayID)
    if err != nil {
        slog.Error("...invalid gateway UUID...", ...)
        continue  // silently skips, no error returned
    }

If a caller passes gatewayIDs: ["valid-uuid", "garbage", "also-garbage"], two of three gateways are silently dropped. The caller receives the created provider with no indication that 2/3 of requested gateways were invalid.

Same pattern in UpdateAndSync (line 367-374).

Problem 2: Deployment failures silently skipped

CreateAndDeploy (line 549-554):

deployment, err := deploymentService.DeployLLMProvider(created.UUID.String(), deployReq, orgID)
if err != nil {
    slog.Error("...failed to deploy to gateway...", ...)
    continue  // silently continues
}

If deployment to every single gateway fails, the method still returns (created, nil) -- a success response. The caller has no way to know that zero deployments succeeded.

Line 559 also logs the wrong count:

slog.Info("...completed", "successfulDeployments", len(gatewayIDs))  // WRONG: logs total requested, not actual successes

Problem 3: Mapping storage failure also silently swallowed

CreateAndDeploy (line 522-524):

if err := s.mappingRepo.CreateBatch(mappings); err != nil {
    slog.Error("...failed to store gateway mappings...", ...)
    // Continue with deployment even if mapping storage fails
}

The gateway-provider mappings fail to persist but deployment continues, creating a state inconsistency: the provider is deployed to gateways but the system has no record of which ones.

Problem 4: UpdateAndSync compounds the issue

  • Line 370-371: Invalid gateway UUID → continue
  • Line 420-424: Deploy to new gateway fails → only logged
  • Line 437-439: Failed to fetch deployments for undeploy → continue
  • Line 445-447: Undeploy fails → only logged

The net effect: UpdateAndSync can return (updated, nil) where some gateways were never deployed to, some old gateways were never undeployed from, and the DB mapping diverges from reality.

Impact

The controller receives (provider, nil) and returns HTTP 200/201. The user sees "success" but the provider may not be deployed anywhere. There's no mechanism to detect or recover from this partial state.

Suggested fix

Option A -- Return partial results to the caller:

type DeploymentResult struct {
    GatewayID string
    Success   bool
    Error     string
}

type CreateAndDeployResponse struct {
    Provider    *models.LLMProvider
    Deployments []DeploymentResult
}

Option B -- At minimum, fail if all deployments fail:

successCount := 0
var lastErr error
for _, gatewayID := range gatewayIDs {
    deployment, err := deploymentService.DeployLLMProvider(...)
    if err != nil {
        lastErr = err
        continue
    }
    successCount++
}

if successCount == 0 && len(gatewayIDs) > 0 {
    return nil, fmt.Errorf("all %d deployments failed, last error: %w", len(gatewayIDs), lastErr)
}

@a5anka
Copy link
Copy Markdown
Member

a5anka commented Feb 16, 2026

Code Review Finding #16: Massive Code Duplication in GatewayEventsService

Severity: Medium | File: agent-manager-service/services/gateway_events_service.go


Problem

This 565-line file contains 7 broadcast methods that are structurally identical. The only difference between them is the Type string in the event DTO and the payload type (which is interface{} in the DTO anyway):

Method Event Type String Lines
BroadcastDeploymentEvent "api.deployed" 72-145
BroadcastUndeploymentEvent "api.undeployed" 148-215
BroadcastLLMProviderDeploymentEvent "llmprovider.deployed" 218-285
BroadcastLLMProviderUndeploymentEvent "llmprovider.undeployed" 288-355
BroadcastLLMProxyDeploymentEvent "llmproxy.deployed" 358-425
BroadcastLLMProxyUndeploymentEvent "llmproxy.undeployed" 428-495
BroadcastAPIKeyCreatedEvent "apikey.created" 498-565

Every method follows the exact same 10-step pattern:

  1. Generate correlation ID
  2. Marshal payload to JSON
  3. Check payload size against MaxEventPayloadSize
  4. Create GatewayEventDTO with event type string
  5. Marshal DTO to JSON
  6. Check if manager is nil
  7. Get connections for gateway ID
  8. Loop connections, send, track success/failure counts
  9. Log broadcast summary
  10. Return error if all sends failed

Why It Matters

  • Bug propagation risk: A fix applied to one method may be missed in the other 6 copies
  • Maintenance cost: Adding retry logic, delivery acknowledgment, or any new feature requires changing 7 places
  • Inconsistency: As methods diverge over time, subtle behavioral differences creep in

Suggested Fix

Extract a single internal broadcastEvent method and reduce the public methods to one-liners:

func (s *GatewayEventsService) broadcastEvent(gatewayID string, eventType string, payload interface{}) error {
    correlationID := uuid.New().String()

    payloadJSON, err := json.Marshal(payload)
    if err != nil {
        slog.Error("Failed to serialize event", "gatewayID", gatewayID, "type", eventType, "error", err)
        return fmt.Errorf("failed to serialize %s event: %w", eventType, err)
    }

    if len(payloadJSON) > MaxEventPayloadSize {
        return fmt.Errorf("event payload exceeds maximum size: %d bytes (limit: %d)", len(payloadJSON), MaxEventPayloadSize)
    }

    eventDTO := GatewayEventDTO{
        Type:          eventType,
        Payload:       payload,
        Timestamp:     time.Now().Format(time.RFC3339),
        CorrelationID: correlationID,
    }

    eventJSON, err := json.Marshal(eventDTO)
    if err != nil {
        return fmt.Errorf("failed to marshal event: %w", err)
    }

    if s.manager == nil {
        slog.Warn("WebSocket manager not initialized")
        return nil
    }

    connections := s.manager.GetConnections(gatewayID)
    if len(connections) == 0 {
        return fmt.Errorf("no active connections for gateway: %s", gatewayID)
    }

    successCount, failureCount := 0, 0
    var lastError error
    for _, conn := range connections {
        if err := conn.Send(eventJSON); err != nil {
            failureCount++
            lastError = err
            slog.Error("Failed to send event", "gatewayID", gatewayID, "connectionID", conn.ConnectionID,
                "correlationID", correlationID, "type", eventType, "error", err)
            conn.DeliveryStats.IncrementFailed(fmt.Sprintf("send error: %v", err))
        } else {
            successCount++
            conn.DeliveryStats.IncrementTotalSent()
        }
    }

    slog.Info("Broadcast summary", "gatewayID", gatewayID, "correlationID", correlationID,
        "type", eventType, "total", len(connections), "success", successCount, "failed", failureCount)

    if successCount == 0 {
        return fmt.Errorf("failed to deliver %s event to any connection: %w", eventType, lastError)
    }
    return nil
}

// Public methods become thin one-liners:
func (s *GatewayEventsService) BroadcastDeploymentEvent(gatewayID string, event *DeploymentEvent) error {
    return s.broadcastEvent(gatewayID, "api.deployed", event)
}

func (s *GatewayEventsService) BroadcastUndeploymentEvent(gatewayID string, event *APIUndeploymentEvent) error {
    return s.broadcastEvent(gatewayID, "api.undeployed", event)
}

func (s *GatewayEventsService) BroadcastLLMProviderDeploymentEvent(gatewayID string, event *models.LLMProviderDeploymentEvent) error {
    return s.broadcastEvent(gatewayID, "llmprovider.deployed", event)
}

func (s *GatewayEventsService) BroadcastLLMProviderUndeploymentEvent(gatewayID string, event *models.LLMProviderUndeploymentEvent) error {
    return s.broadcastEvent(gatewayID, "llmprovider.undeployed", event)
}

func (s *GatewayEventsService) BroadcastLLMProxyDeploymentEvent(gatewayID string, event *models.LLMProxyDeploymentEvent) error {
    return s.broadcastEvent(gatewayID, "llmproxy.deployed", event)
}

func (s *GatewayEventsService) BroadcastLLMProxyUndeploymentEvent(gatewayID string, event *models.LLMProxyUndeploymentEvent) error {
    return s.broadcastEvent(gatewayID, "llmproxy.undeployed", event)
}

func (s *GatewayEventsService) BroadcastAPIKeyCreatedEvent(gatewayID string, event *models.APIKeyCreatedEvent) error {
    return s.broadcastEvent(gatewayID, "apikey.created", event)
}

This reduces the file from ~565 lines to ~80 lines, eliminates duplication, and as a bonus migrates from log.Printf to structured slog (which is used everywhere else in the codebase).

@a5anka
Copy link
Copy Markdown
Member

a5anka commented Feb 16, 2026

Code Review Finding #18: Mixed Logging Libraries

Severity: Medium | Files: gateway_events_service.go, websocket/manager.go


gateway_events_service.go uses log.Printf with manual [ERROR]/[WARN]/[INFO] prefixes (~28 occurrences), while every other service file uses structured log/slog.

// Current (gateway_events_service.go):
log.Printf("[ERROR] Failed to serialize deployment event: gatewayID=%s error=%v", gatewayID, err)

// Rest of codebase:
slog.Error("Failed to serialize deployment event", "gatewayID", gatewayID, "error", err)

Impact:

  • slog log-level filtering is bypassed — these log.Printf calls always print regardless of configured level
  • JSON log aggregation breaks — log.Printf outputs raw text, not structured JSON
  • No correlation ID propagation via logger.GetLogger(ctx)

Fix: Replace log.Printf with slog equivalents and remove the "log" import. Pairs well with the duplication fix in finding #16 — if extracted to a single broadcastEvent, the migration only happens once.

@a5anka
Copy link
Copy Markdown
Member

a5anka commented Feb 16, 2026

Code Review Finding #20: Certificate Directory Permission Too Permissive

Severity: Medium | File: agent-manager-service/server/internal_server.go:85


The cert directory is created with 0755 (world-traversable), but it contains private keys:

if err := os.MkdirAll(s.cfg.CertDir, 0o755); err != nil {  // line 85

While the private key file itself is correctly written with 0600 (line 168), the 0755 directory permission allows any user on the system to list directory contents and attempt to read files within it.

Fix: Change to 0o700:

if err := os.MkdirAll(s.cfg.CertDir, 0o700); err != nil {

@a5anka
Copy link
Copy Markdown
Member

a5anka commented Feb 16, 2026

Code Review Finding #23: Controller Directly Accesses Database

Severity: Low | File: agent-manager-service/controllers/gateway_controller.go


The gateway controller has a *gorm.DB field (line 70) and runs raw DB queries directly, bypassing the service/repository layers:

// line 371-377 (DeleteGateway)
gwUUID, err := uuid.Parse(gatewayID)
if err == nil {
    if err := db.DB(ctx).Where("gateway_uuid = ?", gwUUID).Delete(&models.GatewayEnvironmentMapping{}).Error; err != nil {
        log.Warn("DeleteGateway: failed to delete gateway-environment mappings", "error", err)
    }
}

Same pattern in RemoveGatewayFromEnvironment (line 434) and assignGatewayToEnvironmentInDB (line 646).

Impact: Breaks separation of concerns — business logic leaks into the controller, making it harder to test and maintain. If the mapping cleanup logic ever needs to change, you'd need to find it scattered in the controller rather than in a repository.

Fix: Move these DB operations into the gateway repository or service layer, and remove the *gorm.DB field from the controller struct.

@menakaj menakaj merged commit dc3f098 into wso2:main Feb 17, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Task] LLM Provider management API implementation

2 participants