
Trigger evaluation run on past monitor update #438

Merged
a5anka merged 4 commits into wso2:main from nadheesh:past-monitor-updates
Feb 25, 2026

Conversation


nadheesh (Contributor) commented Feb 25, 2026

Problem

When a past monitor is updated (e.g. evaluators, trace window, or sampling rate changed), no new evaluation run is triggered. Unlike future monitors, which have a scheduler, past monitors only triggered a run at creation time, leaving users with stale results after updates.

Goal

Ensure past monitors trigger a fresh evaluation run on update, consistent with the create behavior.

Approach

  • In UpdateMonitor(), after the DB update succeeds, check whether the monitor is a past monitor and call ExecuteMonitorRun to trigger a new evaluation run (sketched after this list)
  • Simplified CreateMonitor() by replacing the if/else block with a direct past-monitor check — no else needed since future monitors are handled by the scheduler
  • Added two integration tests: one confirming past monitor updates trigger runs, another confirming future monitor updates don't
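A minimal self-contained sketch of the update path, using simplified stand-in types rather than the service's actual signatures (the real diff is quoted in the review discussion below):

```go
// Sketch only: Monitor, MonitorType, and Executor are stand-ins for the
// service's real types; the DB update and rollback plumbing are omitted.
package monitors

import (
	"context"
	"fmt"
	"time"
)

type MonitorType int

const (
	MonitorTypePast MonitorType = iota
	MonitorTypeFuture
)

type Monitor struct {
	Type       MonitorType
	TraceStart *time.Time
	TraceEnd   *time.Time
	Evaluators []string
}

type Executor interface {
	ExecuteMonitorRun(ctx context.Context, m *Monitor, start, end time.Time) error
}

// UpdateMonitor persists the changes (omitted here), then re-runs past
// monitors so results reflect the new evaluators, trace window, or sampling rate.
func UpdateMonitor(ctx context.Context, exec Executor, m *Monitor) error {
	if m.Type == MonitorTypePast {
		// Past monitors have no scheduler, so trigger a fresh run immediately.
		if err := exec.ExecuteMonitorRun(ctx, m, *m.TraceStart, *m.TraceEnd); err != nil {
			return fmt.Errorf("monitor updated but failed to trigger evaluation run: %w", err)
		}
	}
	// Future monitors are picked up by the scheduler; no immediate run needed.
	return nil
}
```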

Fixes #437

Test plan

  • TestUpdatePastMonitor_TriggersNewRun — confirms a new WorkflowRun CR is created on past monitor update (see the test sketch after this list)
  • TestUpdateFutureMonitor_DoesNotTriggerRun — confirms no extra CR on future monitor update
  • All existing monitor tests pass (make dev-test)
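A hedged sketch of how the two new tests could be shaped, reusing the stand-in types from the Approach sketch above and a hypothetical fake executor in place of the repo's real Kubernetes-backed harness (which asserts on WorkflowRun CRs instead):

```go
package monitors

import (
	"context"
	"testing"
	"time"
)

// fakeExecutor counts run triggers instead of creating WorkflowRun CRs.
type fakeExecutor struct{ runs int }

func (f *fakeExecutor) ExecuteMonitorRun(ctx context.Context, m *Monitor, start, end time.Time) error {
	f.runs++
	return nil
}

func TestUpdateTriggersRunOnlyForPastMonitors(t *testing.T) {
	start, end := time.Now().Add(-time.Hour), time.Now()
	cases := []struct {
		name     string
		typ      MonitorType
		wantRuns int
	}{
		{"past monitor update triggers a new run", MonitorTypePast, 1},
		{"future monitor update does not", MonitorTypeFuture, 0},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			exec := &fakeExecutor{}
			m := &Monitor{Type: tc.typ, TraceStart: &start, TraceEnd: &end}
			if err := UpdateMonitor(context.Background(), exec, m); err != nil {
				t.Fatal(err)
			}
			if exec.runs != tc.wantRuns {
				t.Fatalf("got %d runs, want %d", exec.runs, tc.wantRuns)
			}
		})
	}
}
```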

Summary by CodeRabbit

  • New Features

    • Automatic evaluation runs now trigger immediately when creating or updating past monitors.
    • Evaluation Extension component added to deployments with configurable publisher API key.
  • Documentation

    • Installation guides updated to include Evaluation Extension setup steps.
    • Added configuration notes for publisher API key alignment between components.
  • Chores

    • Added code quality checks (linting, type checking, formatting validation).
    • Enhanced installation scripts to support Evaluation Extension deployment.


coderabbitai Bot commented Feb 25, 2026

📝 Walkthrough

The PR adds automatic evaluation run triggering for past monitors during creation and update, introduces GitHub Actions CI/CD workflows for evaluation-job, updates deployment configurations to support evaluation extension installation with new secrets (encryption key and publisher API key), and enhances code quality tooling for the evaluation-job component.

Changes

  • GitHub Actions Workflow (.github/workflows/evaluation-job-pr-checks.yaml): New workflow triggering on pull requests to main affecting evaluation-job paths; defines lint (ruff), test (pytest with coverage), and type-check (mypy) jobs.
  • Agent Manager Configuration (agent-manager-service/.env.example, agent-manager-service/config/config_loader.go): Added PUBLISHER_API_KEY environment variable and updated the config loader to read from PUBLISHER_API_KEY instead of INTERNAL_API_KEY, with an updated validation message.
  • Monitor Management Logic (agent-manager-service/services/monitor_manager.go): CreateMonitor now immediately triggers an evaluation run for past monitors, with rollback on failure; UpdateMonitor triggers a new evaluation run for past monitors with the updated config and the monitor's TraceStart/TraceEnd; future monitors retain scheduler-driven behavior (a condensed sketch of the create path follows this list).
  • Monitor Tests (agent-manager-service/tests/monitor_test.go): Added TestUpdatePastMonitor_TriggersNewRun and TestUpdateFutureMonitor_DoesNotTriggerRun to verify past monitors trigger runs on update while future monitors do not.
  • Docker & Helm Deployment (deployments/docker-compose.yml, deployments/helm-charts/wso2-agent-manager/templates/agent-manager-service/deployment.yaml, deployments/helm-charts/wso2-agent-manager/templates/agent-manager-service/secret.yaml, deployments/helm-charts/wso2-agent-manager/values.yaml): Added ENCRYPTION_KEY and PUBLISHER_API_KEY environment variables with secret management; expanded secret.yaml template logic to handle encryption key generation and existing secret lookup.
  • Evaluation Extension Helm Chart (deployments/helm-charts/wso2-amp-evaluation-extension/values.yaml): Updated the image repository to the ghcr.io path, changed the publisher endpoint to the amp-api service, and set a default publisher API key.
  • Quick-start Installation (deployments/quick-start/install-helpers.sh, deployments/quick-start/install.sh, deployments/scripts/setup-openchoreo.sh): Added evaluation extension installation support with new constants, an installation function, and integration into the main install flow; added image repository configuration for the Helm upgrade.
  • Evaluation-job Code Quality (evaluation-job/main.py, evaluation-job/Makefile, evaluation-job/pyproject.toml, evaluation-job/test_main.py): Added explicit type hints to function signatures, introduced Makefile targets (lint, format-check, type-check), configured ruff/mypy/pytest in pyproject.toml, and reformatted test payloads for consistency.
  • Installation Documentation (docs/install/managed-cluster.md, docs/install/self-hosted-cluster.md): Added the Evaluation Extension as a fifth component with new installation steps, notes on the publisherApiKey/ampEvaluation.publisher.apiKey matching requirement, and updated component lists and numbering.
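A condensed sketch of the create-then-rollback path described in the Monitor Management Logic entry, reusing the Monitor and Executor stand-ins from the Approach sketch; Repo and its methods are hypothetical, not the real service API:

```go
// Hypothetical repository interface for the sketch; the service's own
// monitorRepo differs.
type Repo interface {
	Insert(m *Monitor) error
	Delete(m *Monitor) error
}

// CreateMonitor persists the monitor, then immediately runs past monitors.
// If the run cannot be started, the insert is rolled back so a past monitor
// is never left behind without its initial run.
func CreateMonitor(ctx context.Context, repo Repo, exec Executor, m *Monitor) error {
	if err := repo.Insert(m); err != nil {
		return fmt.Errorf("failed to create monitor: %w", err)
	}
	if m.Type == MonitorTypePast {
		if err := exec.ExecuteMonitorRun(ctx, m, *m.TraceStart, *m.TraceEnd); err != nil {
			// Roll back the DB entry before surfacing the error.
			_ = repo.Delete(m)
			return fmt.Errorf("failed to trigger evaluation run: %w", err)
		}
	}
	// Future monitors are left to the scheduler.
	return nil
}
```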

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant MonitorManager as Monitor Manager
    participant Database as Database
    participant Evaluator as Evaluator/Runner

    User->>MonitorManager: CreateMonitor (Past)
    MonitorManager->>Database: Save monitor config
    alt Monitor is Past Type
        MonitorManager->>Evaluator: ExecuteMonitorRun<br/>(monitor.TraceStart/End,<br/>evaluators)
        alt Run succeeds
            Evaluator-->>MonitorManager: Run result
            MonitorManager->>Database: Update latestRun
            MonitorManager-->>User: Monitor created with run
        else Run fails
            Evaluator-->>MonitorManager: Error
            MonitorManager->>Database: Rollback monitor entry
            MonitorManager-->>User: Creation failed
        end
    else Monitor is Future Type
        MonitorManager-->>User: Monitor created<br/>(scheduler will run)
    end

    User->>MonitorManager: UpdateMonitor (Past)
    MonitorManager->>Database: Apply updates to monitor
    alt Monitor is Past Type
        MonitorManager->>Evaluator: ExecuteMonitorRun<br/>(updated config,<br/>monitor.TraceStart/End)
        Evaluator-->>MonitorManager: Run result
        MonitorManager->>Database: Update latestRun
        MonitorManager-->>User: Monitor updated with new run
    else Monitor is Future Type
        MonitorManager-->>User: Monitor updated<br/>(no immediate run)
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Hoppy hops through the code with glee,
Past monitors now run instantly!
Secrets encrypted, workflows in place,
Type hints and tests keep the pace,
Evaluation flows with a springy embrace!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Out of Scope Changes check (⚠️ Warning): Changes to evaluation-job (type hints, linting, GitHub Actions workflow) and Helm/deployment configs are out of scope relative to the core objective of triggering runs on past monitor updates. Resolution: separate evaluation-job and deployment infrastructure changes into distinct PRs focused on their respective concerns; consolidate agent-manager changes addressing issue #437 in this PR.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The title clearly summarizes the main change: triggering evaluation runs when past monitors are updated.
  • Description check (✅ Passed): The description covers Problem, Goal, and Approach sections matching the template structure, with clear problem statement, solution objectives, and implementation details.
  • Linked Issues check (✅ Passed): Code changes directly address issue #437 by implementing ExecuteMonitorRun in UpdateMonitor for past monitors, simplifying CreateMonitor, and adding integration tests as required.
  • Docstring Coverage (✅ Passed): Docstring coverage is 85.00%, which is sufficient. The required threshold is 80.00%.


coderabbitai Bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
evaluation-job/test_main.py (1)

167-168: ⚠️ Potential issue | 🟡 Minor

Inaccurate version note in comment.

datetime.datetime.fromisoformat("2026-01-15") has accepted date-only strings since Python 3.7, not 3.11+. The comment should read # Python 3.7+ (or simply be removed since this is the minimum version already).

🔧 Proposed fix
```diff
-        # datetime.fromisoformat accepts date-only strings in Python 3.11+
+        # datetime.fromisoformat accepts date-only strings in Python 3.7+
         assert validate_time_format("2026-01-15") is True
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@evaluation-job/test_main.py` around lines 167 - 168, The comment above the
assertion in test_main.py incorrectly claims datetime.fromisoformat accepts
date-only strings starting in Python 3.11; update that comment to the correct
minimum version (Python 3.7+) or remove the comment entirely. Locate the
assertion referencing validate_time_format("2026-01-15") and replace the comment
text "# datetime.fromisoformat accepts date-only strings in Python 3.11+" with
either "# datetime.fromisoformat accepts date-only strings since Python 3.7+" or
delete the comment so the test reads without the inaccurate version note.
agent-manager-service/config/config_loader.go (1)

171-176: ⚠️ Potential issue | 🟠 Major

Keep a fallback for INTERNAL_API_KEY during migration.

Reading only PUBLISHER_API_KEY can silently fall back to the dev default in existing setups that still provide INTERNAL_API_KEY, causing auth drift and weaker security posture.

Suggested migration-safe pattern
```diff
 	config.InternalServer = InternalServerConfig{
 		Host:                r.readOptionalString("INTERNAL_SERVER_HOST", ""),
 		Port:                int(r.readOptionalInt64("INTERNAL_SERVER_PORT", 9243)),
 		CertDir:             r.readOptionalString("INTERNAL_SERVER_CERT_DIR", "./data/certs"),
-		APIKey:              r.readOptionalString("PUBLISHER_API_KEY", "dev-publisher-api-key"),
+		APIKey:              "",
 		ReadTimeoutSeconds:  int(r.readOptionalInt64("INTERNAL_SERVER_READ_TIMEOUT_SECONDS", 10)),
 		WriteTimeoutSeconds: int(r.readOptionalInt64("INTERNAL_SERVER_WRITE_TIMEOUT_SECONDS", 90)),
 		IdleTimeoutSeconds:  int(r.readOptionalInt64("INTERNAL_SERVER_IDLE_TIMEOUT_SECONDS", 60)),
 		MaxHeaderBytes:      int(r.readOptionalInt64("INTERNAL_SERVER_MAX_HEADER_BYTES", 65536)),
 	}
+	config.InternalServer.APIKey = r.readOptionalString("PUBLISHER_API_KEY", "")
+	if config.InternalServer.APIKey == "" {
+		config.InternalServer.APIKey = r.readOptionalString("INTERNAL_API_KEY", "dev-publisher-api-key")
+		if os.Getenv("INTERNAL_API_KEY") != "" {
+			slog.Warn("INTERNAL_API_KEY is deprecated — use PUBLISHER_API_KEY")
+		}
+	}
```

Also applies to: 241-243

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/config/config_loader.go` around lines 171 - 176, The
config loader currently only reads PUBLISHER_API_KEY into
config.InternalServer.APIKey which can silently fallback to the dev default and
ignore an existing INTERNAL_API_KEY; update the logic that sets
config.InternalServer.APIKey (the call to r.readOptionalString for
"PUBLISHER_API_KEY") to prefer reading "INTERNAL_API_KEY" first and fall back to
"PUBLISHER_API_KEY" (and then to the existing dev default), using the same
pattern wherever you set API keys (including the similar APIKey assignment later
in the file around the other config block) so migration-era environments
providing INTERNAL_API_KEY are honored.
🧹 Nitpick comments (3)
.github/workflows/evaluation-job-pr-checks.yaml (2)

55-57: pytest-cov is installed but no coverage flags are passed to pytest.

Either add --cov=. (and optionally --cov-report=xml) to the pytest invocation to actually collect coverage, or remove pytest-cov from the install step to avoid the unused dependency.

♻️ Option A — enable coverage reporting

```diff
-          pytest test_main.py -v
+          pytest test_main.py -v --cov=. --cov-report=xml
```

♻️ Option B — drop unused install

```diff
-          pip install requests litellm pytest pytest-cov
+          pip install requests litellm pytest
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/evaluation-job-pr-checks.yaml around lines 55 - 57, The
workflow installs pytest-cov but never passes coverage flags to pytest; either
add coverage flags to the pytest invocation (e.g., update the step that runs
"pytest" to include "--cov=. --cov-report=xml" so pytest-cov actually collects
and emits coverage) or remove "pytest-cov" from the pip install line to avoid
the unused dependency; update the relevant CI job command that currently runs
"pytest" to include the coverage options if you choose Option A, or delete
"pytest-cov" from the pip installs if you choose Option B.

29-30: Pin ruff version to avoid non-deterministic CI failures.

pip install ruff without a version pin means a new ruff release can introduce additional rules and break CI on PRs that haven't changed any linted code. Pin to a known-good version (e.g., ruff==0.9.x) and update it deliberately.

♻️ Proposed fix

```diff
-          pip install ruff
+          pip install ruff==0.9.10  # update intentionally when upgrading
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/evaluation-job-pr-checks.yaml around lines 29 - 30, The
workflow currently installs ruff with an unpinned command ("pip install ruff");
change that to pin ruff to a known-good release (e.g., replace "pip install
ruff" with "pip install ruff==0.9.x" or a specific patch like "ruff==0.9.0") so
CI is deterministic; update the pip install line in the same job step that runs
"python -m pip install --upgrade pip" and remember to intentionally bump this
pinned version when you want to adopt a newer ruff release.
evaluation-job/main.py (1)

45-45: Prefer built-in generics over typing aliases for the Python 3.11 target.

Dict, List from typing are soft-deprecated since Python 3.9 (PEP 585). With target-version = "py311" in pyproject.toml, using the built-in forms is idiomatic and removes the import.

♻️ Proposed fix

```diff
-from typing import Dict, List, Any
+from typing import Any
```

Then update the type hints in publish_scores:

```diff
-def publish_scores(
-    monitor_id: str,
-    run_id: str,
-    scores: Dict[str, EvaluatorSummary],
-    display_name_to_identifier: Dict[str, str],
-    api_endpoint: str,
-    api_key: str,
-) -> bool:
+def publish_scores(
+    monitor_id: str,
+    run_id: str,
+    scores: dict[str, EvaluatorSummary],
+    display_name_to_identifier: dict[str, str],
+    api_endpoint: str,
+    api_key: str,
+) -> bool:
```

And the local variable annotations inside the function body:

```diff
-    individual_scores: List[Dict[str, Any]] = []
-    aggregated_scores: List[Dict[str, Any]] = []
+    individual_scores: list[dict[str, Any]] = []
+    aggregated_scores: list[dict[str, Any]] = []
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@evaluation-job/main.py` at line 45, Replace the typing aliases Dict and List
with built-in generics in evaluation-job/main.py: remove Dict and List from the
import and keep only Any (i.e., change "from typing import Dict, List, Any" to
"from typing import Any"); then update all type hints in the publish_scores
function (and any local variable annotations in its body) from Dict[...] to
dict[...] and List[...] to list[...], preserving Any where used. Ensure function
signature and internal variable annotations use the native dict/list generics
throughout.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1e050d and 0bbcba2.

📒 Files selected for processing (20)
  • .github/workflows/evaluation-job-pr-checks.yaml
  • agent-manager-service/.env.example
  • agent-manager-service/config/config_loader.go
  • agent-manager-service/services/monitor_manager.go
  • agent-manager-service/tests/monitor_test.go
  • deployments/docker-compose.yml
  • deployments/helm-charts/wso2-agent-manager/templates/agent-manager-service/deployment.yaml
  • deployments/helm-charts/wso2-agent-manager/templates/agent-manager-service/secret.yaml
  • deployments/helm-charts/wso2-agent-manager/values.yaml
  • deployments/helm-charts/wso2-amp-evaluation-extension/values.yaml
  • deployments/quick-start/install-helpers.sh
  • deployments/quick-start/install.sh
  • deployments/scripts/setup-openchoreo.sh
  • docs/install/managed-cluster.md
  • docs/install/self-hosted-cluster.md
  • evaluation-job/Makefile
  • evaluation-job/main.py
  • evaluation-job/pyproject.toml
  • evaluation-job/test_main.py
  • libs/amp-evaluation/src/amp_evaluation/py.typed

Comment on lines +13 to +14

```yaml
runs-on:
  - codebuild-wso2_agent-manager-${{ github.run_id }}-${{ github.run_attempt }}
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
cat -n .github/workflows/evaluation-job-pr-checks.yaml
```


🌐 Web query:

GitHub Actions self-hosted runners fork pull requests unavailable behavior

💡 Result:

GitHub Actions treating pull requests from forks “as unavailable” on self-hosted runners is usually expected security behavior:

  • Public repositories: PR workflows from forks may not run automatically and can require a maintainer approval first (policy: first-time contributors / all outside collaborators). Until approved, the run won’t start, which can look like runner unavailability. [1][2]
  • Private/internal repositories: Fork PR workflows are disabled by default unless you enable the “Run workflows from fork pull requests” policy (set at enterprise/org level first, then repo). When enabled, fork PR runs get a read-only GITHUB_TOKEN and no secrets (unless you explicitly allow write tokens/secrets). [3][2]
  • Why GitHub is strict here: Self-hosted runners are not guaranteed to be ephemeral/clean and can be persistently compromised by untrusted PR code—especially from forks—so GitHub strongly cautions against using self-hosted runners for public repos. [4]

Most common fixes/checks

  • For public repos: repo/org settings → Actions → General → set/confirm the fork PR approval policy, then approve pending runs. [1]
  • For private/internal repos: enable Run workflows from fork pull requests (and optionally other fork policies) at the org/enterprise level, then the repo. [3]

Sources
[1] GitHub Docs – Managing Actions settings / fork PR approvals in public repos (docs.github.com)
[2] GitHub Docs – Enterprise policy: fork PR workflows & notes (docs.github.com)
[3] GitHub Enterprise Server Docs – Fork PR workflows in private repos (run/write/secrets/approval) (docs.github.com)
[4] GitHub Docs – Secure use reference (self-hosted runner hardening) (docs.github.com)


Lint and type-check use CodeBuild self-hosted runners while test uses ubuntu-latest — this will cause lint and type-check to stall for fork PRs.

Self-hosted runners require explicit approval or configuration to run on fork pull requests. Without this, the lint and type-check jobs will hang indefinitely on external contributor PRs, while the test job (on ubuntu-latest) will complete successfully, creating a false sense of safety. Either use ubuntu-latest for all three jobs for consistency and to ensure fork PRs complete, or document and configure the fork PR runner approval policy in your repository settings.

Also applies to: 65-66

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/evaluation-job-pr-checks.yaml around lines 13 - 14, The
workflow uses a self-hosted runner in the runs-on field for lint/type-check
which will stall on fork PRs; update the runs-on value for the lint and
type-check jobs (the entries currently using "codebuild-wso2_agent-manager-${{
github.run_id }}-${{ github.run_attempt }}") to "ubuntu-latest" to ensure fork
PRs run, or alternatively document/configure your repository's self-hosted
runner approval policy so those jobs can run for external contributions; locate
and change the runs-on fields for the lint and type-check jobs accordingly.

Comment on lines +324 to +336
```go
	if monitor.Type == models.MonitorTypePast {
		// Past monitors: trigger a new evaluation run with updated config
		result, err := s.executor.ExecuteMonitorRun(ctx, ExecuteMonitorRunParams{
			OrgName:    orgName,
			Monitor:    monitor,
			StartTime:  *monitor.TraceStart,
			EndTime:    *monitor.TraceEnd,
			Evaluators: monitor.Evaluators,
		})
		if err != nil {
			s.logger.Error("Failed to trigger past monitor run after update", "name", monitorName, "error", err)
			return nil, fmt.Errorf("monitor updated but failed to trigger evaluation run: %w", err)
		}
```

⚠️ Potential issue | 🟠 Major

Validate past-monitor trace window before running (and before persisting).

The update path dereferences TraceStart/TraceEnd directly and triggers execution after the DB write. Missing/invalid windows can produce unstable behavior and leave invalid persisted state before returning an error.

Suggested fix
```diff
+	if monitor.Type == models.MonitorTypePast {
+		if monitor.TraceStart == nil || monitor.TraceEnd == nil {
+			return nil, fmt.Errorf("traceStart and traceEnd are required for past monitors: %w", utils.ErrInvalidInput)
+		}
+		if !monitor.TraceEnd.After(*monitor.TraceStart) {
+			return nil, fmt.Errorf("traceEnd must be after traceStart: %w", utils.ErrInvalidInput)
+		}
+		if monitor.TraceEnd.After(time.Now()) {
+			return nil, fmt.Errorf("traceEnd must not be in the future: %w", utils.ErrInvalidInput)
+		}
+	}
 	if err := s.monitorRepo.UpdateMonitor(monitor); err != nil {
 		return nil, fmt.Errorf("failed to update monitor: %w", err)
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@agent-manager-service/services/monitor_manager.go` around lines 324 - 336,
The update path dereferences monitor.TraceStart / monitor.TraceEnd for
models.MonitorTypePast and triggers ExecuteMonitorRun (ExecuteMonitorRunParams)
after persisting, which can crash or leave invalid state; add validation before
persisting and before calling ExecuteMonitorRun: ensure monitor.TraceStart and
monitor.TraceEnd are non-nil, verify traceStart.Before(traceEnd) (and any domain
constraints you need), return a clear error without writing the DB if invalid,
and only then proceed to persist and call s.executor.ExecuteMonitorRun with the
validated values.

Comment on lines +60 to +61
```yaml
      # Encryption Configuration (AES-256-GCM, hex-encoded 32-byte key)
      - ENCRYPTION_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
```

⚠️ Potential issue | 🟠 Major

Avoid committing a fixed ENCRYPTION_KEY in compose defaults.

A hardcoded shared key weakens confidentiality for encrypted-at-rest data across environments. Require it via external env instead.

Suggested fix
```diff
-      # Encryption Configuration (AES-256-GCM, hex-encoded 32-byte key)
-      - ENCRYPTION_KEY=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
+      # Encryption Configuration (AES-256-GCM, hex-encoded 32-byte key)
+      - ENCRYPTION_KEY=${ENCRYPTION_KEY:?Set ENCRYPTION_KEY (openssl rand -hex 32)}
```
🧰 Tools
🪛 Gitleaks (8.30.0)

[high] 61-61: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/docker-compose.yml` around lines 60 - 61, Remove the hardcoded
ENCRYPTION_KEY value from docker-compose defaults and require it be provided
externally; replace the literal assignment for ENCRYPTION_KEY with a reference
to an environment variable (so runtime will read process env or an env_file) or
configure it to come from Docker secrets, and add a note/example in .env.example
or the README that ENCRYPTION_KEY must be set (do not commit real keys). Update
any service environment entries that mention ENCRYPTION_KEY and ensure the
compose config will fail-fast if the variable is missing.
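For context on the "hex-encoded 32-byte key" requirement above: AES-256-GCM needs exactly 32 key bytes, so a well-formed ENCRYPTION_KEY is 64 hex characters. An illustrative standalone Go check (not the service's actual code):

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/hex"
	"fmt"
	"os"
)

// loadKey decodes ENCRYPTION_KEY and builds an AES-256-GCM AEAD,
// failing fast on a malformed or wrong-length key.
func loadKey() (cipher.AEAD, error) {
	key, err := hex.DecodeString(os.Getenv("ENCRYPTION_KEY"))
	if err != nil {
		return nil, fmt.Errorf("ENCRYPTION_KEY is not valid hex: %w", err)
	}
	if len(key) != 32 {
		return nil, fmt.Errorf("ENCRYPTION_KEY must decode to 32 bytes, got %d", len(key))
	}
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func main() {
	if _, err := loadKey(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ENCRYPTION_KEY is well-formed")
}
```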

Comment on lines 214 to 216
```bash
helm upgrade --install amp-evaluation-workflows-extension "${SCRIPT_DIR}/../helm-charts/wso2-amp-evaluation-extension" --namespace openchoreo-build-plane \
  --set ampEvaluation.image.repository="amp-evaluation-monitor" \
  --set ampEvaluation.publisher.endpoint="http://agent-manager-service:8080" \
```

⚠️ Potential issue | 🟠 Major

Use a fully qualified evaluation image repository.

This override sets a short repository name, which can resolve to an unintended registry and cause image pull failures in clean environments. Prefer the chart’s full repo value (or remove the override if the chart default is already correct).

Suggested fix
```diff
 helm upgrade --install amp-evaluation-workflows-extension "${SCRIPT_DIR}/../helm-charts/wso2-amp-evaluation-extension" --namespace openchoreo-build-plane \
-  --set ampEvaluation.image.repository="amp-evaluation-monitor" \
+  --set ampEvaluation.image.repository="ghcr.io/wso2/amp-evaluation-monitor" \
   --set ampEvaluation.publisher.endpoint="http://agent-manager-service:8080" \
   --set ampEvaluation.publisher.apiKey="dev-publisher-api-key"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deployments/scripts/setup-openchoreo.sh` around lines 214 - 216, The helm
upgrade command installs "amp-evaluation-workflows-extension" and overrides
ampEvaluation.image.repository with a short name "amp-evaluation-monitor", which
can cause image-pull failures in clean environments; update the override to a
fully-qualified image repository (e.g., include registry and namespace) or
remove the --set ampEvaluation.image.repository override so the chart default is
used, ensuring the ampEvaluation.image.repository value passed to helm (in the
upgrade/install invocation) is a fully qualified registry/path.

Comment on lines +563 to +572
**Note:** The default `publisher.apiKey` must match the `publisherApiKey.value` configured in the Agent Manager chart (Step 4). Both default to `amp-internal-api-key`. If you changed the Agent Manager's `publisherApiKey.value`, override it here:

```bash
helm install amp-evaluation-extension \
oci://${HELM_CHART_REGISTRY}/wso2-amp-evaluation-extension \
--version 0.0.0-dev \
--namespace ${BUILD_CI_NS} \
--set ampEvaluation.publisher.apiKey="your-custom-key" \
--timeout 1800s
```

⚠️ Potential issue | 🟡 Minor

Clarify this as an alternative command (not a follow-up).

As written, users may run both helm install commands for the same release, and the second one will fail because the release already exists.

✏️ Suggested wording tweak
```diff
-**Note:** The default `publisher.apiKey` must match the `publisherApiKey.value` configured in the Agent Manager chart (Step 4). Both default to `amp-internal-api-key`. If you changed the Agent Manager's `publisherApiKey.value`, override it here:
+**Alternative (use this instead of the previous install command when using a custom key):**
+The `ampEvaluation.publisher.apiKey` value must match `agentManagerService.config.publisherApiKey.value` from the Agent Manager chart (Step 4).
```

Based on learnings: In docs/install/, show a default helm install command and a separate alternate helm install command for customized scenarios (not sequential steps).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/install/managed-cluster.md` around lines 563 - 572, The second helm
install snippet (the one setting ampEvaluation.publisher.apiKey) reads like a
follow-up step and could make users run two installs for the same release; mark
it explicitly as an alternative command (e.g., "Alternative: use this command
instead if you changed publisherApiKey.value") and add a short note that it
replaces the default install to avoid duplicate-release errors; reference the
values publisher.apiKey and publisherApiKey.value and the
amp-evaluation-extension helm install so reviewers can locate and update the
text accordingly.

Comment on lines +506 to +515
**Note:** The default `publisher.apiKey` must match the `publisherApiKey.value` configured in the Agent Manager chart (Step 3). Both default to `amp-internal-api-key`. If you changed the Agent Manager's `publisherApiKey.value`, override it here:

```bash
helm install amp-evaluation-extension \
oci://${HELM_CHART_REGISTRY}/wso2-amp-evaluation-extension \
--version 0.0.0-dev \
--namespace ${BUILD_CI_NS} \
--set ampEvaluation.publisher.apiKey="your-custom-key" \
--timeout 1800s
```

⚠️ Potential issue | 🟡 Minor

Mark this install block as an alternative command.

This currently reads like a second sequential helm install for the same release name, which will fail if users already ran the default command above.

✏️ Suggested wording tweak
```diff
-**Note:** The default `publisher.apiKey` must match the `publisherApiKey.value` configured in the Agent Manager chart (Step 3). Both default to `amp-internal-api-key`. If you changed the Agent Manager's `publisherApiKey.value`, override it here:
+**Alternative (use this instead of the previous install command when using a custom key):**
+The `ampEvaluation.publisher.apiKey` value must match `agentManagerService.config.publisherApiKey.value` from the Agent Manager chart (Step 3).
```

Based on learnings: In docs/install/, present a default helm install command and an alternate customized helm install command as alternatives, not sequential steps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/install/self-hosted-cluster.md` around lines 506 - 515, The install
block presents a second helm install as if sequential, which can fail; mark it
explicitly as an alternative (e.g., start with "Alternatively," or "To override
the default API key, run:") and clarify it is an alternative to the previous
helm install for the same release name; mention the specific values to change
(ampEvaluation.publisher.apiKey vs Agent Manager's publisherApiKey.value) so
users know to use this command only when overriding the default
amp-internal-api-key.

@a5anka a5anka merged commit 56af4d4 into wso2:main Feb 25, 2026
25 checks passed

Development

Successfully merging this pull request may close these issues:

  • Trigger evaluation run when updating a past monitor