
Improve gateway event delivery when agent-manager scaled up #949

Merged
menakaj merged 6 commits into wso2:main from menakaj:main
May 26, 2026

Conversation

@menakaj

@menakaj menakaj commented May 22, 2026

Purpose

In a multi-pod deployment, GatewayEventsService was delivering events by calling
manager.GetConnections(gatewayID) directly against the local pod's in-memory WebSocket registry. Since each pod
maintains its own isolated connection state, events were silently dropped whenever the pod handling the API request
was not the pod holding the gateway's WebSocket connection. This affected all gateway configuration change
notifications — LLM provider updates, LLM proxy changes, and API key provisioning.

Closes #948

Goals

  • Ensure gateway event delivery succeeds regardless of which pod handles the originating API request
  • Remove the coupling between event publishing and the local pod's in-memory WebSocket state
  • Avoid introducing a new infrastructure dependency — reuse the existing PostgreSQL instance

Approach

Introduce a SQL-backed EventHub that acts as a shared event bus between pods:

  • Publishing: any pod writes the event to eventhub_events and atomically bumps a version ID in
    eventhub_gateway_states
  • Delivery: each pod runs a poll loop against eventhub_gateway_states; when a version change is detected, it
    fetches the new events and fans them out to local subscribers
  • Forwarding: on WebSocket connect, the pod subscribes to the EventHub for that gateway and starts a
    forwardEvents goroutine that reads from the subscription channel and writes to the WebSocket connection
  • Cleanup: on disconnect or eviction, the subscription is removed and the forwardEvents goroutine exits
    cleanly
API request (any pod)
    └─▶ GatewayEventsService.broadcastEvent
            └─▶ hub.PublishEvent
                    └─▶ INSERT INTO eventhub_events
                        UPDATE eventhub_gateway_states (version bump)

Poll loop (pod holding the WebSocket connection, every 2s)
    └─▶ SELECT from eventhub_gateway_states WHERE version changed
            └─▶ SELECT from eventhub_events WHERE gateway_id = ? AND processed_timestamp >= ?
                    └─▶ fan out to subscriber channels
                            └─▶ forwardEvents goroutine
                                    └─▶ conn.Send(payload)
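
The publish and poll steps above can be sketched without a database. The following is a minimal in-memory illustration, not the PR's `SQLBackend`: the `store`, `poller`, and `Seq` names are hypothetical, a mutex stands in for the SQL transaction, and a sequence number replaces the timestamp comparison shown in the flow.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a simplified, hypothetical stand-in for the PR's eventhub.Event.
type Event struct {
	GatewayID string
	Seq       int64 // assumption: a monotonic sequence instead of a timestamp
	Payload   string
}

// store plays the role of the two PostgreSQL tables.
type store struct {
	mu       sync.Mutex
	versions map[string]int64   // stands in for eventhub_gateway_states
	events   map[string][]Event // stands in for eventhub_events
}

func newStore() *store {
	return &store{versions: map[string]int64{}, events: map[string][]Event{}}
}

// publish appends the event and bumps the gateway version in one critical
// section, mirroring the INSERT plus UPDATE that run in a single transaction.
func (s *store) publish(gatewayID, payload string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.versions[gatewayID]++
	s.events[gatewayID] = append(s.events[gatewayID], Event{
		GatewayID: gatewayID,
		Seq:       s.versions[gatewayID],
		Payload:   payload,
	})
}

// poller is one pod's view: the last version it has seen per gateway.
type poller struct {
	known map[string]int64
}

// pollOnce returns events newer than the known version and advances the
// cursor. A real poll loop would call this on a ticker.
func (p *poller) pollOnce(s *store, gatewayID string) []Event {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.versions[gatewayID] == p.known[gatewayID] {
		return nil // version unchanged: nothing to fetch
	}
	var fresh []Event
	for _, e := range s.events[gatewayID] {
		if e.Seq > p.known[gatewayID] {
			fresh = append(fresh, e)
		}
	}
	p.known[gatewayID] = s.versions[gatewayID]
	return fresh
}

func main() {
	s := newStore()
	podB := &poller{known: map[string]int64{}}

	s.publish("gw-1", "llmprovider.deployed") // written by any pod
	for _, e := range podB.pollOnce(s, "gw-1") {
		fmt.Println(e.Seq, e.Payload)
	}
	fmt.Println(len(podB.pollOnce(s, "gw-1"))) // second poll sees no change
}
```

The point of the version counter is that an idle poll only compares one number per gateway; the event table is read only when that number has moved.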

User stories

Summary of user stories addressed by this change

Release note

Brief description of the new feature or bug fix as it will appear in the release notes

Documentation

Link(s) to product documentation that addresses the changes of this PR. If no doc impact, enter "N/A" plus brief explanation of why there's no doc impact

Training

Link to the PR for changes to the training content in https://github.com/wso2/WSO2-Training, if applicable

Certification

Type "Sent" when you have provided new/updated certification questions, plus four answers for each question (correct answer highlighted in bold), based on this change. Certification questions/answers should be sent to certification@wso2.com and NOT pasted in this PR. If there is no impact on certification exams, type "N/A" and explain why.

Marketing

Link to drafts of marketing content that will describe and promote this feature, including product page changes, technical articles, blog posts, videos, etc., if applicable

Automation tests

  • Unit tests

    Code coverage information

  • Integration tests

    Details about the test cases and coverage

Security checks

Samples

Provide high-level details about the samples related to this feature

Related PRs

List any other related PRs

Migrations (if applicable)

Describe migration steps and platforms on which migration has been tested

Test environment

List all JDK versions, operating systems, databases, and browser/versions on which this feature/fix was tested

Learning

Describe the research phase and any blog posts, patterns, libraries, or add-ons you used to solve the problem.

Summary by CodeRabbit

  • New Features

    • Persistent Event Hub with subscription-based delivery and indexed event storage.
    • Configurable automatic retention and periodic cleanup of old events.
  • Improvements

    • More reliable delivery to connected gateways with subscription forwarding and backpressure handling.
    • Gateway event broadcasting now uses the Event Hub for consistent delivery.
  • Bug Fixes

    • Improved graceful shutdown to close event delivery cleanly.
    • Form initialization: avoid unwanted auto-population and fix add-button disable logic.


@coderabbitai

coderabbitai Bot commented May 22, 2026

Warning

Review limit reached

@menakaj, we couldn't start this review because you've used your available PR reviews for now.

Your plan includes 1 review of capacity. Refill in 45 minutes and 10 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 66793be8-7e8b-482e-8282-6e74c811073c

📥 Commits

Reviewing files that changed from the base of the PR and between 54f4c07 and 1b1c84c.

📒 Files selected for processing (3)
  • agent-manager-service/eventhub/sqlbackend.go
  • agent-manager-service/eventhub/topic.go
  • console/workspaces/pages/llm-providers/src/subComponents/AddLLMProviderForm.tsx

Walkthrough

Adds an EventHub: domain types, in-memory gateway registry, PostgreSQL-backed SQLBackend with polling/delivery and cleanup, DB migration v21, WebSocket manager/controller integration to subscribe/forward events, DI wiring (ProvideEventHub), GatewayEventsService publishing via EventHub, and graceful shutdown closing the hub.

Changes

EventHub Multi-Pod Event Delivery

| Layer | File(s) | Summary |
| --- | --- | --- |
| EventHub domain types and contracts | agent-manager-service/eventhub/types.go | Defines EventType, Event, GatewayState, EventHub interface, Config, and DefaultConfig. |
| EventHub in-memory gateway and subscriber registry | agent-manager-service/eventhub/topic.go | Implements mutex-protected gatewayRegistry supporting registration and per-gateway subscriber channel management and helpers. |
| SQL-backed EventHub core and statements | agent-manager-service/eventhub/sqlbackend.go | Adds SQLBackend, SQLBackendConfig/defaults, helpers, constructor, and prepared-statement lifecycle management. |
| SQLBackend runtime: register/publish/subscribe | agent-manager-service/eventhub/sqlbackend.go | Implements Initialize (start poll/cleanup loops), RegisterGateway, PublishEvent (transactional with duplicate detection), and Subscribe/Unsubscribe/UnsubscribeAll. |
| Polling, delivery, dedupe, backpressure | agent-manager-service/eventhub/sqlbackend.go | Poll loop pages gateway states, polls per-gateway events, trims boundary replay, deduplicates first-catch-up events per entity, sends to subscriber channels under read-lock, and handles blocked delivery. |
| Cleanup and Close lifecycle | agent-manager-service/eventhub/sqlbackend.go | Periodic event cleanup (retention) and Close cancels loops, closes subscriber channels, and releases prepared statements. |
| EventHub DB schema and migrations | agent-manager-service/db_migrations/021_add_eventhub_tables.go, agent-manager-service/db_migrations/migration_list.go | Adds eventhub_gateway_states and eventhub_events tables and updates migration list to v21. |
| WebSocket manager & connection EventHub integration | agent-manager-service/websocket/connection.go, agent-manager-service/websocket/manager.go | Extends Connection with eventSub; Manager accepts EventHub, subscribes on register, launches forwardEvents to send hub events to WebSockets, and unsubscribes on unregister/shutdown. |
| WebSocket controller gateway registration on connect | agent-manager-service/controllers/websocket_controller.go | Controller accepts EventHub and conditionally calls RegisterGateway during Connect; registration errors logged as warnings. |
| GatewayEventsService → EventHub publishing | agent-manager-service/services/gateway_events_service.go | Refactors service to publish gateway events via EventHub.PublishEvent instead of enumerating WebSocket connections. |
| Dependency injection wiring | agent-manager-service/wiring/params.go, agent-manager-service/wiring/wire.go, agent-manager-service/wiring/wire_gen.go | Adds ProvideEventHub, threads EventHub into ProvideWebSocketManager/ProvideWebSocketController, and stores EventHub in AppParams. |
| App graceful shutdown | agent-manager-service/app/app.go | Closes EventHub during graceful shutdown after WebSocket manager shutdown and before HTTP servers stop; logs close errors and continues. |
| Console minor fix | console/.../AddLLMProviderForm.tsx | Only auto-populates gatewayIds when the current array is explicitly empty (length === 0) and tightens disabled-check comparison. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • wso2/agent-manager#897: Overlaps with connection eviction and Register flow changes in websocket/manager.go.

Suggested reviewers

  • hanzjk

Poem

🐰 I hopped through rows and sockets bright,
Wrote events in Postgres through the night,
Pollers hummed, forwarders took flight,
Gateways now listen from pod to pod — delight! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Improve gateway event delivery when agent-manager scaled up' directly summarizes the primary change: enabling event delivery in multi-pod deployments. |
| Description check | ✅ Passed | The description provides clear Purpose, Goals, and Approach sections that explain the multi-pod event delivery problem and SQL-backed EventHub solution, though most template sections (User stories, Release note, Documentation, etc.) contain only placeholder text without substantive content. |
| Linked Issues check | ✅ Passed | The PR implements all core coding requirements from issue #948: a SQL-backed EventHub for event publishing/polling, WebSocket integration with EventHub subscriptions, database migrations, service wiring, and graceful cleanup on disconnect. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to multi-pod event delivery: EventHub implementation, WebSocket manager/controller integration, database migrations, service wiring, and a minor form UI fix for gateway ID validation are all necessary for the solution. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agent-manager-service/services/gateway_events_service.go (1)

71-103: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use operation-specific EventHub Action values instead of hardcoded "CREATE"

All events are currently published with Action: "CREATE", including undeploy/revoke/update paths. That loses semantic correctness for action-driven consumers.

💡 Proposed fix
-func (s *GatewayEventsService) broadcastEvent(gatewayID string, eventType string, payload interface{}) error {
+func (s *GatewayEventsService) broadcastEvent(gatewayID string, eventType string, action string, payload interface{}) error {
@@
 	evt := eventhub.Event{
 		GatewayID:           gatewayID,
 		OriginatedTimestamp: time.Now(),
 		EventType:           eventhub.EventType(eventType),
-		Action:              "CREATE",
+		Action:              action,
 		EntityID:            correlationID,
 		EventData:           string(eventJSON),
 	}
@@
 func (s *GatewayEventsService) BroadcastDeploymentEvent(gatewayID string, event *DeploymentEvent) error {
-	return s.broadcastEvent(gatewayID, "api.deployed", event)
+	return s.broadcastEvent(gatewayID, "api.deployed", "CREATE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastUndeploymentEvent(gatewayID string, event *APIUndeploymentEvent) error {
-	return s.broadcastEvent(gatewayID, "api.undeployed", event)
+	return s.broadcastEvent(gatewayID, "api.undeployed", "DELETE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastLLMProviderDeploymentEvent(gatewayID string, event *models.LLMProviderDeploymentEvent) error {
-	return s.broadcastEvent(gatewayID, "llmprovider.deployed", event)
+	return s.broadcastEvent(gatewayID, "llmprovider.deployed", "CREATE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastLLMProviderUndeploymentEvent(gatewayID string, event *models.LLMProviderUndeploymentEvent) error {
-	return s.broadcastEvent(gatewayID, "llmprovider.undeployed", event)
+	return s.broadcastEvent(gatewayID, "llmprovider.undeployed", "DELETE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastLLMProxyDeploymentEvent(gatewayID string, event *models.LLMProxyDeploymentEvent) error {
-	return s.broadcastEvent(gatewayID, "llmproxy.deployed", event)
+	return s.broadcastEvent(gatewayID, "llmproxy.deployed", "CREATE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastLLMProxyUndeploymentEvent(gatewayID string, event *models.LLMProxyUndeploymentEvent) error {
-	return s.broadcastEvent(gatewayID, "llmproxy.undeployed", event)
+	return s.broadcastEvent(gatewayID, "llmproxy.undeployed", "DELETE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastAPIKeyCreatedEvent(gatewayID string, event *models.APIKeyCreatedEvent) error {
-	return s.broadcastEvent(gatewayID, "apikey.created", event)
+	return s.broadcastEvent(gatewayID, "apikey.created", "CREATE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastAPIKeyRevokedEvent(gatewayID string, event *models.APIKeyRevokedEvent) error {
-	return s.broadcastEvent(gatewayID, "apikey.revoked", event)
+	return s.broadcastEvent(gatewayID, "apikey.revoked", "DELETE", event)
 }
@@
 func (s *GatewayEventsService) BroadcastAPIKeyUpdatedEvent(gatewayID string, event *models.APIKeyUpdatedEvent) error {
-	return s.broadcastEvent(gatewayID, "apikey.updated", event)
+	return s.broadcastEvent(gatewayID, "apikey.updated", "UPDATE", event)
 }

Also applies to: 117-150

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent-manager-service/services/gateway_events_service.go` around lines 71 -
103, The broadcastEvent function always sets EventHub Action to the hardcoded
string "CREATE" which loses semantic intent; update broadcastEvent (and any
callers) to accept an action parameter (e.g., action string) or map eventType to
the correct eventhub action and use that value when constructing evt.Action;
adjust GatewayEventsService.broadcastEvent signature and all invocations to pass
the appropriate action (e.g., "CREATE", "UPDATE", "DELETE"/"REVOKE"/"UNDEPLOY"
as applicable) so events published by eventhub have correct operation-specific
Action values.
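
The alternative mentioned in the prompt above, mapping the event type to an action instead of changing the `broadcastEvent` signature, could look roughly like the sketch below. The helper name `actionForEventType` is hypothetical; the event type strings and the CREATE/UPDATE/DELETE values are the ones used in the proposed fix, and falling back to "CREATE" for unknown types is an assumption rather than something the PR specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// actionForEventType derives the action from the event type suffix.
// Hypothetical helper: it mirrors the per-caller values in the proposed diff.
func actionForEventType(eventType string) string {
	switch {
	case strings.HasSuffix(eventType, ".undeployed"),
		strings.HasSuffix(eventType, ".revoked"):
		return "DELETE"
	case strings.HasSuffix(eventType, ".updated"):
		return "UPDATE"
	default:
		// Assumption: ".deployed", ".created", and unknown types map to CREATE.
		return "CREATE"
	}
}

func main() {
	for _, t := range []string{
		"api.deployed", "api.undeployed", "apikey.revoked", "apikey.updated",
	} {
		fmt.Println(t, "->", actionForEventType(t))
	}
}
```

Keeping the mapping in one function avoids touching every `Broadcast*` caller, at the cost of coupling the action to a naming convention in the event type.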
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent-manager-service/db_migrations/021_add_eventhub_tables.go`:
- Line 37: The migration's events table defines the "action" column with a CHECK
that only allows 'CREATE','UPDATE','DELETE', which will reject API key lifecycle
events; update the CHECK constraint in
agent-manager-service/db_migrations/021_add_eventhub_tables.go (the SQL that
defines the action TEXT NOT NULL CHECK ...) to include 'PROVISION' and 'REVOKE'
so API_KEY provision/revoke events can be inserted; ensure the modified
migration string still matches the surrounding SQL formatting and run or re-run
migration tests to verify inserts succeed for those actions.

In `@agent-manager-service/eventhub/sqlbackend.go`:
- Around line 519-553: The code is advancing gw.knownVersion and gw.lastPolled
even when subscribers is empty; update the logic so updates only occur when
there are active subscribers: detect if len(subscribers) == 0 (or use
subscriberChannelsAvailable(subscribers) appropriately) and in that case do not
set gw.knownVersion or gw.lastPolled (leave the gateway cursor unchanged) and
return/log that delivery was skipped; otherwise proceed with the existing update
path. Ensure the check is applied around the block that assigns gw.knownVersion
and gw.lastPolled so queued events are not silently dropped when subscribers is
empty.
- Around line 354-372: The unsubscribe flow races with pollGatewayWithState
sending to subscribers because removeSubscriber/removeAllSubscribers drop the
registry lock before closing channels; change the coordination so channels are
not closed while pollGatewayWithState may send: either (A) in
pollGatewayWithState hold b.registry.mu.RLock() across the send loop that
iterates gw.subscribers (so sends happen under the read lock and
Unsubscribe/UnsubscribeAll will block until sends complete), or (B) add a
per-subscriber struct with a shutdown/inflight counter or a closed flag (update
removeSubscriber/removeAllSubscribers to mark subscriber as closed under
b.registry.mu.Lock(), wait for inflight sends to drain or set a flag, then close
the channel) to ensure close(ch) cannot race with ch <- evt; update Unsubscribe,
UnsubscribeAll, removeSubscriber, removeAllSubscribers and pollGatewayWithState
accordingly to use the chosen coordination.

In `@agent-manager-service/websocket/manager.go`:
- Around line 149-158: The connection registration currently logs Subscribe
errors but continues as if successful; update the logic in the block using
m.hub.Subscribe(gatewayID) so that when Subscribe returns an error you propagate
that failure back to the caller (do not proceed with setting conn.eventSub,
m.wg.Add(1) or launching m.forwardEvents), close or cleanup the partially
created conn, and return an appropriate error/result indicating registration
failed; specifically modify the branch handling Subscribe(gatewayID) to return
on err (after logging) instead of proceeding to set conn.eventSub and start the
goroutine.

---

Outside diff comments:
In `@agent-manager-service/services/gateway_events_service.go`:
- Around line 71-103: The broadcastEvent function always sets EventHub Action to
the hardcoded string "CREATE" which loses semantic intent; update broadcastEvent
(and any callers) to accept an action parameter (e.g., action string) or map
eventType to the correct eventhub action and use that value when constructing
evt.Action; adjust GatewayEventsService.broadcastEvent signature and all
invocations to pass the appropriate action (e.g., "CREATE", "UPDATE",
"DELETE"/"REVOKE"/"UNDEPLOY" as applicable) so events published by eventhub have
correct operation-specific Action values.
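
For the unsubscribe race described in the inline comments, option (A) can be illustrated with a small standalone registry. This is a sketch under assumptions, not the code in `sqlbackend.go` or `topic.go`: the `registry` type and its methods are hypothetical, and events are plain strings. Sends happen under the read lock, and channels are closed only under the write lock, so a close cannot overlap a send.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical, simplified subscriber registry.
type registry struct {
	mu   sync.RWMutex
	subs map[string][]chan string
}

func (r *registry) subscribe(gatewayID string) chan string {
	ch := make(chan string, 8)
	r.mu.Lock()
	defer r.mu.Unlock()
	r.subs[gatewayID] = append(r.subs[gatewayID], ch)
	return ch
}

// unsubscribe removes and closes the channel while holding the write lock,
// so it waits for any in-flight deliver call to release its read lock.
func (r *registry) unsubscribe(gatewayID string, ch chan string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	list := r.subs[gatewayID]
	for i, c := range list {
		if c == ch {
			r.subs[gatewayID] = append(list[:i], list[i+1:]...)
			close(ch)
			return
		}
	}
}

// deliver sends under the read lock. The send is non-blocking so that a
// slow subscriber cannot hold the lock and stall unsubscribe.
func (r *registry) deliver(gatewayID, evt string) (sent, dropped int) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, ch := range r.subs[gatewayID] {
		select {
		case ch <- evt:
			sent++
		default:
			dropped++
		}
	}
	return sent, dropped
}

func main() {
	r := &registry{subs: map[string][]chan string{}}
	ch := r.subscribe("gw-1")

	sent, dropped := r.deliver("gw-1", "apikey.created")
	fmt.Println(sent, dropped, <-ch)

	r.unsubscribe("gw-1", ch)
	sent, dropped = r.deliver("gw-1", "apikey.revoked")
	fmt.Println(sent, dropped)
}
```

The trade-off is that a full channel drops the event here; how the PR handles blocked delivery is described only at a high level in the walkthrough, so the drop policy in this sketch should not be read as its behavior.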

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 08b89dd8-8026-48cd-90d0-28bc7b0c9562

📥 Commits

Reviewing files that changed from the base of the PR and between ebf2f34 and 5f9795a.

📒 Files selected for processing (13)
  • agent-manager-service/app/app.go
  • agent-manager-service/controllers/websocket_controller.go
  • agent-manager-service/db_migrations/021_add_eventhub_tables.go
  • agent-manager-service/db_migrations/migration_list.go
  • agent-manager-service/eventhub/sqlbackend.go
  • agent-manager-service/eventhub/topic.go
  • agent-manager-service/eventhub/types.go
  • agent-manager-service/services/gateway_events_service.go
  • agent-manager-service/websocket/connection.go
  • agent-manager-service/websocket/manager.go
  • agent-manager-service/wiring/params.go
  • agent-manager-service/wiring/wire.go
  • agent-manager-service/wiring/wire_gen.go

Comment thread agent-manager-service/db_migrations/021_add_eventhub_tables.go Outdated
Comment thread agent-manager-service/eventhub/sqlbackend.go
Comment thread agent-manager-service/eventhub/sqlbackend.go
Comment thread agent-manager-service/websocket/manager.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
agent-manager-service/eventhub/topic.go (1)

100-107: 🏗️ Heavy lift

Avoid returning internal *gateway pointers from gatewayRegistry.getAll.
Current callsites access gw.subscribers, gw.knownVersion, gw.lastPolled, and gw.queuedLoggedAt under gatewayRegistry.mu (so the race concern isn’t evident in current usage), but getAll still leaks mutable pointers outside the lock scope and remains an easy footgun for future changes—return snapshots or provide locked accessor/iteration APIs to enforce the locking contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent-manager-service/eventhub/topic.go` around lines 100 - 107,
gatewayRegistry.getAll currently returns slices of internal *gateway pointers
which leaks mutable state (subscribers, knownVersion, lastPolled,
queuedLoggedAt) outside the registry lock; change getAll to avoid returning
internal pointers by either returning a slice of gateway value copies (copy
fields into new gateway structs) or by replacing getAll with a locked iteration
helper like gatewayRegistry.ranged(fn *gateway) that acquires mu.RLock(), calls
the provided callback for each gateway while still holding the lock, and does
not expose internal pointers after unlock; update callsites to use the new
copying or callback API so no internal gateway pointer escapes the lock.
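
A locked iteration helper of the kind suggested above might be sketched as follows. All names here (`forEach`, `gatewayView`) are hypothetical, and the struct fields are reduced to what the example needs. The callback receives a value copy, so no internal pointer can escape the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// gateway is a reduced stand-in for the registry's internal state.
type gateway struct {
	id           string
	knownVersion int64
	subscribers  []chan string
}

// gatewayView is a read-only copy handed to callbacks (hypothetical type).
type gatewayView struct {
	ID              string
	KnownVersion    int64
	SubscriberCount int
}

type gatewayRegistry struct {
	mu       sync.RWMutex
	gateways map[string]*gateway
}

// forEach calls fn once per gateway while holding the read lock.
// fn must not call registry methods that take the write lock, or it
// will deadlock.
func (r *gatewayRegistry) forEach(fn func(gatewayView)) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	for _, gw := range r.gateways {
		fn(gatewayView{
			ID:              gw.id,
			KnownVersion:    gw.knownVersion,
			SubscriberCount: len(gw.subscribers),
		})
	}
}

func main() {
	r := &gatewayRegistry{gateways: map[string]*gateway{
		"gw-1": {id: "gw-1", knownVersion: 3},
		"gw-2": {id: "gw-2", knownVersion: 5},
	}}
	var total int64
	r.forEach(func(v gatewayView) { total += v.KnownVersion })
	fmt.Println(total)
}
```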
console/workspaces/pages/llm-providers/src/subComponents/AddLLMProviderForm.tsx (1)

606-606: 💤 Low value

Use === for consistency.

Line 606 uses == to compare length, but line 156 uses === for the same check. Prefer === for type-safe equality throughout.

✨ Proposed fix
         disabled={
           isSubmitting ||
           !formData.gatewayIds ||
-          formData.gatewayIds?.length == 0
+          formData.gatewayIds?.length === 0
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@console/workspaces/pages/llm-providers/src/subComponents/AddLLMProviderForm.tsx`
at line 606, The comparison formData.gatewayIds?.length == 0 in
AddLLMProviderForm should use strict equality; replace the loose equality
operator (==) with the strict operator (===) so it reads
formData.gatewayIds?.length === 0 to match the other check and ensure type-safe
comparison for the gatewayIds length check.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent-manager-service/eventhub/sqlbackend.go`:
- Around line 230-245: Create a single helper (e.g., normalizeGatewayID or
normalizeGateway) that trims whitespace and validates non-empty gateway IDs, and
use it everywhere RegisterGateway, PublishEvent, Subscribe, Unsubscribe, and
UnsubscribeAll currently accept gatewayID directly; replace the inline
strings.TrimSpace/empty checks with calls to this helper, return the same
"gateway_id cannot be empty" error when validation fails, and ensure all
DB/registry calls (including b.upsertGatewayStmt.Exec and
registry.register/unregister/lookup) use the normalized value so `" gateway-1 "`
and `"gateway-1"` map to the same key.
- Around line 211-219: Initialize currently starts goroutines (pollLoop,
cleanupLoop) before validating SQLBackendConfig, which can cause panics when
PollInterval or CleanupInterval are <= 0 and incorrect purges when
RetentionPeriod <= 0; update Initialize to validate b.config.PollInterval,
b.config.CleanupInterval and b.config.RetentionPeriod (ensure each is > 0)
immediately after prepareStatements() and before b.wg.Add/starting goroutines,
returning a descriptive error if any value is invalid (or optionally set safe
defaults), so pollLoop and cleanupLoop can assume valid durations.

In
`@console/workspaces/pages/llm-providers/src/subComponents/AddLLMProviderForm.tsx`:
- Around line 155-159: The effect can overwrite user edits due to a stale
closure over formData; update the effect (useEffect) to call setFormData with a
functional updater that examines the previous state (prev) and only sets
gatewayIds when prev.gatewayIds is empty, e.g. compute the new state from prev
and gateways[0].uuid rather than spreading the outer formData; keep the effect
dependency on gateways only. Target the setFormData/formData/gatewayIds logic
inside the existing useEffect.

---

Nitpick comments:
In `@agent-manager-service/eventhub/topic.go`:
- Around line 100-107: gatewayRegistry.getAll currently returns slices of
internal *gateway pointers which leaks mutable state (subscribers, knownVersion,
lastPolled, queuedLoggedAt) outside the registry lock; change getAll to avoid
returning internal pointers by either returning a slice of gateway value copies
(copy fields into new gateway structs) or by replacing getAll with a locked
iteration helper like gatewayRegistry.ranged(fn *gateway) that acquires
mu.RLock(), calls the provided callback for each gateway while still holding the
lock, and does not expose internal pointers after unlock; update callsites to
use the new copying or callback API so no internal gateway pointer escapes the
lock.

In
`@console/workspaces/pages/llm-providers/src/subComponents/AddLLMProviderForm.tsx`:
- Line 606: The comparison formData.gatewayIds?.length == 0 in
AddLLMProviderForm should use strict equality; replace the loose equality
operator (==) with the strict operator (===) so it reads
formData.gatewayIds?.length === 0 to match the other check and ensure type-safe
comparison for the gatewayIds length check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 38fc3004-d729-4e0a-a5ef-818c5308ac7f

📥 Commits

Reviewing files that changed from the base of the PR and between 5f9795a and b4190d8.

📒 Files selected for processing (14)
  • agent-manager-service/app/app.go
  • agent-manager-service/controllers/websocket_controller.go
  • agent-manager-service/db_migrations/021_add_eventhub_tables.go
  • agent-manager-service/db_migrations/migration_list.go
  • agent-manager-service/eventhub/sqlbackend.go
  • agent-manager-service/eventhub/topic.go
  • agent-manager-service/eventhub/types.go
  • agent-manager-service/services/gateway_events_service.go
  • agent-manager-service/websocket/connection.go
  • agent-manager-service/websocket/manager.go
  • agent-manager-service/wiring/params.go
  • agent-manager-service/wiring/wire.go
  • agent-manager-service/wiring/wire_gen.go
  • console/workspaces/pages/llm-providers/src/subComponents/AddLLMProviderForm.tsx
✅ Files skipped from review due to trivial changes (1)
  • agent-manager-service/db_migrations/021_add_eventhub_tables.go

Comment thread agent-manager-service/eventhub/sqlbackend.go Outdated
Comment thread agent-manager-service/eventhub/sqlbackend.go
Comment thread agent-manager-service/eventhub/sqlbackend.go Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent-manager-service/eventhub/sqlbackend.go`:
- Around line 459-470: The callsite uses b.registry.get(state.GatewayID) to
obtain a *gateway and then calls pollGatewayWithState(gw, state) after the
registry lock has been released, creating a race with
addSubscriber/removeSubscriber; fix by ensuring a safe snapshot or lock is held
while using the gateway: either change b.registry.get to return a safe
copy/immutable snapshot of the gateway (so pollGatewayWithState can read without
locking) or acquire the registry's read lock around the call to
pollGatewayWithState and access the gateway while the lock is held. Update this
callsite (b.registry.get, pollGatewayWithState) to match the chosen approach and
ensure addSubscriber/removeSubscriber in topic.go are consistent with the new
get() behavior.

In `@agent-manager-service/eventhub/topic.go`:
- Around line 110-117: The get method currently takes and releases r.mu itself
which contradicts its docstring and returns a pointer that becomes unsafe;
remove the internal locking from gatewayRegistry.get (delete
r.mu.RLock()/RUnlock()) and update its docstring to clearly state callers MUST
hold r.mu (read or write) while using the returned *gateway; then audit
callsites (e.g., addSubscriber, removeSubscriber and any other callers) to
ensure they acquire the appropriate r.mu lock before calling get, or
alternatively change callers to use a new snapshot accessor if you prefer copy
semantics.
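
The "snapshot accessor" option mentioned in both comments above could be sketched like this. The `snapshot` method and `gatewaySnapshot` type are hypothetical, and the gateway struct is reduced to two fields. The accessor copies what the caller needs while holding the read lock and returns a value, so nothing the caller holds afterwards aliases the registry's own slice.

```go
package main

import (
	"fmt"
	"sync"
)

// gateway is a reduced stand-in for the registry's internal state.
type gateway struct {
	knownVersion int64
	subscribers  []chan string
}

// gatewaySnapshot is a point-in-time copy (hypothetical type).
type gatewaySnapshot struct {
	KnownVersion int64
	Subscribers  []chan string // copied slice; the channels themselves are shared
}

type gatewayRegistry struct {
	mu       sync.RWMutex
	gateways map[string]*gateway
}

// snapshot copies the gateway's fields under the read lock and returns
// them by value, so the caller never holds an internal *gateway.
func (r *gatewayRegistry) snapshot(id string) (gatewaySnapshot, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	gw, ok := r.gateways[id]
	if !ok {
		return gatewaySnapshot{}, false
	}
	subs := make([]chan string, len(gw.subscribers))
	copy(subs, gw.subscribers)
	return gatewaySnapshot{KnownVersion: gw.knownVersion, Subscribers: subs}, true
}

func main() {
	r := &gatewayRegistry{gateways: map[string]*gateway{
		"gw-1": {knownVersion: 7, subscribers: []chan string{make(chan string, 1)}},
	}}
	snap, ok := r.snapshot("gw-1")
	fmt.Println(ok, snap.KnownVersion, len(snap.Subscribers))

	_, ok = r.snapshot("missing")
	fmt.Println(ok)
}
```

A snapshot removes the data race on the struct fields and the slice, but a channel in the copy can still be closed by a concurrent unsubscribe, so sending on snapshot channels needs the same coordination as any other send.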

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 72aee45e-29ff-4a5a-9732-5b135ba6bc9e

📥 Commits

Reviewing files that changed from the base of the PR and between b4190d8 and 54f4c07.

📒 Files selected for processing (3)
  • agent-manager-service/eventhub/sqlbackend.go
  • agent-manager-service/eventhub/topic.go
  • console/workspaces/pages/llm-providers/src/subComponents/AddLLMProviderForm.tsx

Comment thread agent-manager-service/eventhub/sqlbackend.go
Comment thread agent-manager-service/eventhub/topic.go
@menakaj menakaj merged commit 2ef083b into wso2:main May 26, 2026
9 checks passed
RAVEENSR added a commit to RAVEENSR/agent-manager that referenced this pull request May 26, 2026
Resolves conflicts in wiring/wire.go and wiring/wire_gen.go where
upstream added the EventHub provider (wso2#949) alongside this PR's
instrumentation catalog providers. Both provider sets end up in the
merged wire.Build calls; wire_gen.go was regenerated and formatted to
match.
RAVEENSR added a commit to RAVEENSR/agent-manager that referenced this pull request May 26, 2026
upstream/main brought in PR wso2#955 (CORS config for agent deployments)
whose migration was numbered 021. The earlier upstream merge of wso2#949
(EventHub) also landed at 021. Both files declared migration021,
breaking the build.

Renamed 021_add_cors_allow_origins.go to 022_add_cors_allow_origins.go,
updated the symbol from migration021 to migration022, bumped the
struct's ID field to 22, added the new entry to migration_list.go,
and bumped latestVersion from 21 to 22. EventHub keeps 021 since it
landed on main first.

Also picks up upstream's other PR-merge changes (amctl external agent
create, gateway connectivity tweaks, etc) cleanly.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve gateway event delivery to support multi-pod deployments

2 participants