
Feat/k8s based integration tests #2

Merged
webpiratt merged 7 commits into main from feat/k8s-based-integration-tests on Feb 25, 2026

Conversation


4others (Collaborator) commented Feb 17, 2026

Summary by CodeRabbit

  • New Features

    • Downloadable artifacts (seeder/smoke/integration logs) per test run.
    • Results dashboard: list and detail views with plugin/status filtering and pagination.
    • Test run support for custom plugin endpoint and API key.
    • CLI test runner with seed, smoke, and integration workflows plus end-to-end policy create/read/delete tests.
  • Infrastructure

    • Distributed worker service for parallel test execution and automated cleanup.
    • Kubernetes manifests for local and production deployments (server, worker, DB, Redis, MinIO, RBAC, ingress, secrets).

4others force-pushed the feat/k8s-based-integration-tests branch from e1179e0 to 6347181 on February 17, 2026 15:04
4others force-pushed the feat/k8s-based-integration-tests branch from 6347181 to 0ece817 on February 19, 2026 15:15

coderabbitai bot commented Feb 24, 2026

Walkthrough

Adds a testrunner CLI and worker binary, refactors Makefile build/docker targets, extends configuration and DB schema for plugin/policy support, implements S3-backed artifacts and extensive Kubernetes orchestration, and adds many k8s/minikube/production manifests plus unit and integration test utilities.

Changes

  • Build system (Makefile): Replaced the single build-api with build-server, build-worker, and build-testrunner; added docker-* image targets and docker-all; updated the PHONY list and CGO for testrunner.
  • CLI entries (cmd/server/main.go, cmd/worker/main.go, cmd/testrunner/main.go): New testrunner and worker mains; server main updated to pass the config object to NewServer.
  • Configuration (config/config.go): Expanded API and Worker configs (ArtifactS3, K8sJobConfig, durations, env prefixes, kubeconfig fallback, concurrency, etc.).
  • API server & handlers (internal/api/server.go, internal/api/handler.go, internal/api/artifacts.go, internal/api/response.go): New NewServer(cfg...) signature; added an artifact GET handler (S3); request payloads extended with plugin endpoint/API key; Response.Data uses any.
  • API templates & results UI (internal/api/results.go, internal/api/templates/*): Added HTML templates (layout, list, detail), results list/detail handlers, template helpers, and pagination.
  • Queue & payloads (internal/queue/tasks.go, internal/queue/redis_test.go): Extended TestRunPayload with PluginEndpoint/PluginAPIKey; added Redis option unit tests.
  • Storage/sqlc queries & migrations (internal/storage/postgres/..., internal/storage/postgres/sqlc/test_runs.sql): List/CountTestRuns now support plugin_id/status filters; added GetDistinctPluginIDs and MarkStaleRunAsError; removed pgcrypto extension lines from schema/migration.
  • Artifacts & worker upload (internal/worker/artifacts.go, internal/api/artifacts.go): New ArtifactUploader for S3 uploads; readArtifact helper and artifact GET handler with a name whitelist and size limit.
  • Worker: consumer, runner, janitor, k8s helpers (internal/worker/consumer.go, internal/worker/runner.go, internal/worker/janitor.go, internal/worker/k8s.go, internal/worker/manifests.go, internal/worker/naming.go): Added a Consumer to handle tasks, a Runner to orchestrate infra/jobs, a Janitor to clean stale namespaces and mark runs, extensive k8s helpers and manifest builders, and DNS-safe naming utilities.
  • Worker & manifest tests (internal/worker/*_test.go): Extensive unit tests for k8s helpers, manifest builders, runner, and naming utilities using a fake k8s client.
  • Testrunner package & DB (internal/testrunner/*, internal/testrunner/sqlc.yaml): New testrunner package: fixtures, Seeder, TestClient, JWT helper and tests, sqlc-generated DB layer/models/queries, seeder/upsert SQL, seeding logic, and S3 bucket creation.
  • MPC, participant & policy flow (internal/testrunner/mpc/wrappers.go, internal/testrunner/participant.go, internal/testrunner/policy.go): Added an MPC wrapper across backends, participant install/reshare orchestration, and a policy CRUD/signing flow (MPC-based signing and HTTP interactions).
  • Testrunner orchestration & tests (internal/testrunner/tests.go, internal/testrunner/*_test.go): TestSuite orchestration for health/plugin/policy tests, plus supporting tests for JWT and fixture handling.
  • Types & unit tests (internal/types/test_run_test.go): Added tests validating TestRunFromQuery conversion across nullable/invalid fields.
  • Manifests: minikube & production (deploy/minikube/*, deploy/production/*): Added minikube namespace, Postgres, Redis, MinIO, server/worker, and RBAC; production namespaces, PVC-backed Postgres/MinIO, server/worker deployments, ingress, cert-manager certs, configmaps, and secrets.
  • Fixtures tool (scripts/fixture-gen/main.go): Added a utility to decode .vult files into JSON fixtures with optional decryption and printed output.
  • Go modules & deps (go.mod): Added many dependencies (AWS SDK, k8s clients, jwt v5, ethereum, vultisig modules, protobuf/genproto/grpc changes) and replace directives.
  • Misc tests (internal/queue/redis_test.go, other *_test.go): Added/expanded unit tests across queue, testrunner, worker, and types packages.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 2.70%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title 'Feat/k8s based integration tests' is concise and clearly summarizes the main feature addition: Kubernetes-based integration test infrastructure with a new worker service, testrunner CLI, and supporting deployment manifests.


4others marked this pull request as ready for review on February 24, 2026 11:27
coderabbitai bot left a comment

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/queue/tasks.go (1)

7-17: ⚠️ Potential issue | 🟠 Major

PluginAPIKey should not be passed as plaintext in the queue payload.

The API key is JSON-marshaled and stored unencrypted in the queue backend for 24 hours (lines 20, 29 in producer.go). Store a secret reference, encrypted token, or temporary credential instead.

Evidence
  • handler.go:79: Raw API key from request is placed directly into payload
  • producer.go:20: Payload is JSON-marshaled without encryption
  • producer.go:29: Task retained for 24 hours in backend
  • tasks.go:17: PluginAPIKey is a plain string field
  • consumer.go:42: Unmarshaled into plain string (no decryption)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/queue/tasks.go` around lines 7 - 17, The TestRunPayload currently
includes PluginAPIKey as a plaintext string; change the payload to carry a
secret reference or an encrypted token instead (e.g. rename PluginAPIKey ->
PluginAPIKeyRef or PluginAPIKeyEncrypted) and update all call sites: in the
request handler (handler.go) store a secret reference or encrypt the key before
placing it on the payload, in producer.go continue JSON-marshalling but with the
reference/encrypted value, and in consumer.go retrieve/decrypt the secret at
runtime using the reference (or fetch from the secrets manager) before using the
API key; ensure all symbols referencing PluginAPIKey (TestRunPayload, handler
code that assigns it, producer code that marshals it, consumer code that
unmarshals it) are updated accordingly and any temporary credentials are
short-lived.
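
A hedged sketch of the reference-based payload shape, assuming a hypothetical SecretStore lookup on the consumer side (SecretStore, PluginAPIKeyRef, and resolveAPIKey are illustrative names, not code from this PR):

package queue

import "context"

// SecretStore abstracts whatever backend actually holds the credential
// (Kubernetes Secret, Vault, a cloud secrets manager, ...).
type SecretStore interface {
	Get(ctx context.Context, ref string) (string, error)
}

// TestRunPayload carries an opaque reference to the plugin API key instead
// of the plaintext value, so the queue backend never stores the raw key.
type TestRunPayload struct {
	RunID           string `json:"run_id"`
	PluginEndpoint  string `json:"plugin_endpoint"`
	PluginAPIKeyRef string `json:"plugin_api_key_ref"`
}

// resolveAPIKey exchanges the reference for the real key on the consumer
// side, just before the key is needed, keeping its lifetime short.
func resolveAPIKey(ctx context.Context, secrets SecretStore, p TestRunPayload) (string, error) {
	if p.PluginAPIKeyRef == "" {
		return "", nil // no key configured for this run
	}
	return secrets.Get(ctx, p.PluginAPIKeyRef)
}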
🟠 Major comments (27)
deploy/minikube/postgres.yaml-20-26 (1)

20-26: ⚠️ Potential issue | 🟠 Major

Avoid hard-coded Postgres credentials in the manifest.
These values will be committed to the repo and can leak into CI logs. Prefer a Secret and reference it via valueFrom.

🔐 Suggested change (reference a Secret)
           env:
             - name: POSTGRES_USER
-              value: vultisig
+              valueFrom:
+                secretKeyRef:
+                  name: postgres-credentials
+                  key: username
             - name: POSTGRES_PASSWORD
-              value: vultisig
+              valueFrom:
+                secretKeyRef:
+                  name: postgres-credentials
+                  key: password
             - name: POSTGRES_DB
               value: plugin-tests
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/postgres.yaml` around lines 20 - 26, Replace the hard-coded
env values for POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB in the
Deployment/Pod spec with references to a Kubernetes Secret: create a Secret
containing keys for user, password, and db, then change each env entry to use
valueFrom.secretKeyRef pointing at that Secret (keeping the env names
POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB) so credentials are not stored in
the manifest or logs.
deploy/minikube/server.yaml-22-41 (1)

22-41: ⚠️ Potential issue | 🟠 Major

Move DB/S3 credentials into Secrets to match production environment.

Inline credentials in deploy/minikube/server.yaml are a security risk and should be sourced from Kubernetes Secrets. The production environment already follows this pattern with separate Secret resources for postgres, redis, and minio—minikube should match this approach.

🔐 Suggested change (align with production pattern)
         - name: PLUGIN_TESTS_API_DATABASE_DSN
-          value: "postgres://vultisig:vultisig@postgres:5432/plugin-tests?sslmode=disable"
+          valueFrom:
+            secretKeyRef:
+              name: postgres
+              key: dsn
         - name: PLUGIN_TESTS_API_ARTIFACT_S3_ACCESS_KEY
-          value: "minioadmin"
+          valueFrom:
+            secretKeyRef:
+              name: minio
+              key: root-user
         - name: PLUGIN_TESTS_API_ARTIFACT_S3_SECRET_KEY
-          value: "minioadmin"
+          valueFrom:
+            secretKeyRef:
+              name: minio
+              key: root-password

Create or update deploy/minikube/secrets.yaml with Secret definitions for postgres and minio (following the structure in deploy/production/secrets.yaml).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/server.yaml` around lines 22 - 41, The YAML currently
hardcodes DB and S3 credentials (env vars like PLUGIN_TESTS_API_DATABASE_DSN,
PLUGIN_TESTS_API_QUEUE_REDIS_HOST/PORT, PLUGIN_TESTS_API_ARTIFACT_S3_ACCESS_KEY,
PLUGIN_TESTS_API_ARTIFACT_S3_SECRET_KEY, PLUGIN_TESTS_API_ARTIFACT_S3_BUCKET) in
deploy/minikube/server.yaml; instead create a deploy/minikube/secrets.yaml
defining Kubernetes Secrets (matching production Secret names such as the
postgres and minio secrets) with the corresponding keys (e.g., database dsn,
redis host/port if desired, s3 access_key/secret_key/bucket) and then update
deploy/minikube/server.yaml to remove inline values and use valueFrom:
secretKeyRef (or envFrom) to reference those Secret keys by name (ensure Secret
keys match the env var names expected by the app).
deploy/minikube/redis.yaml-16-33 (1)

16-33: ⚠️ Potential issue | 🟠 Major

Include complete securityContext with runAsUser for Redis.
The redis:7 image creates a redis user (UID 999) and runs as non-root via the entrypoint, but you must also specify runAsUser: 999 and runAsGroup: 999 in securityContext—setting only runAsNonRoot: true without the UID may cause permission failures on the /data volume.

🛡️ Corrected hardening
         - name: redis
           image: redis:7
           imagePullPolicy: IfNotPresent
+          securityContext:
+            runAsNonRoot: true
+            runAsUser: 999
+            runAsGroup: 999
+            allowPrivilegeEscalation: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/redis.yaml` around lines 16 - 33, The Redis container spec
(container name "redis") is missing a complete securityContext and only using
runAsNonRoot which can still lead to /data permission issues; add a
securityContext on the container (or pod) for the "redis" container that sets
runAsUser: 999, runAsGroup: 999 and runAsNonRoot: true (optionally also set
fsGroup if using a shared volume) so the process runs as the redis UID/GID and
can write to the mounted /data volume.
deploy/minikube/minio.yaml-15-40 (1)

15-40: ⚠️ Potential issue | 🟠 Major

Add container securityContext for MinIO.

Defaults allow privilege escalation and may run as root. Set a restrictive security context to align with least-privilege and address static-analysis findings.

🔒 Suggested securityContext
       containers:
         - name: minio
           image: minio/minio:latest
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 10001
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/minio.yaml` around lines 15 - 40, Add a restrictive
container-level securityContext for the MinIO container (the container with name
"minio") to prevent running as root and to disable privilege escalation: set
runAsNonRoot: true and a non-root runAsUser (e.g., 1000), set
allowPrivilegeEscalation: false, drop all Linux capabilities, and enable
readOnlyRootFilesystem: true (and optionally set runAsGroup/fsGroup at the pod
level if needed); place this securityContext under the "minio" container spec so
it applies to the container while leaving existing readinessProbe, ports, env
and resource settings intact.
scripts/fixture-gen/main.go-18-20 (1)

18-20: ⚠️ Potential issue | 🟠 Major

Don’t embed a default vault password in the CLI.

Hardcoding a real-looking password risks accidental exposure and incorrect usage. Default to empty and require explicit input when decryption is needed.

🔐 Suggested change
-	password := flag.String("password", "Saggy@Commotion@Occupier@Registry1", "vault encryption password")
+	password := flag.String("password", "", "vault encryption password (required if encrypted)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/fixture-gen/main.go` around lines 18 - 20, Remove the hardcoded
default password by changing the password flag default to an empty string
(update the password := flag.String("password", "", "...") declaration and its
help text) and add a post-flag.Parse() validation that checks the password
variable when a vultFile is provided (vultFile)—if decryption is required and
password is empty, log an error and exit so callers must explicitly supply the
vault password rather than relying on an embedded default.
internal/testrunner/jwt.go-17-29 (1)

17-29: ⚠️ Potential issue | 🟠 Major

Validate expireHours to prevent immediately-expired tokens.

expireHours <= 0 yields tokens that are already expired (or expiring instantly), which will break auth flows unpredictably.

✅ Suggested guard
 func GenerateJWT(secret, pubkey, tokenID string, expireHours int) (string, error) {
 	if secret == "" || pubkey == "" {
 		return "", fmt.Errorf("secret and pubkey are required")
 	}
+	if expireHours <= 0 {
+		return "", fmt.Errorf("expireHours must be greater than zero")
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/jwt.go` around lines 17 - 29, GenerateJWT currently
accepts expireHours which if <= 0 produces an already-expired token; update
GenerateJWT to validate expireHours and reject or normalize invalid values.
Inside GenerateJWT (function GenerateJWT), check the expireHours parameter at
the start and either return a clear error (e.g., "expireHours must be > 0") or
set a sensible default positive TTL before computing expirationTime; ensure the
validation happens before computing expirationTime and creating Claims so no
token is ever created with ExpiresAt in the past.
deploy/minikube/minio.yaml-21-25 (1)

21-25: ⚠️ Potential issue | 🟠 Major

Avoid default MinIO root credentials in the manifest.

minioadmin/minioadmin is a known default and should not be committed. Use a Secret (or external secret manager) and rotate.

🔐 Suggested change
-            - name: MINIO_ROOT_USER
-              value: minioadmin
-            - name: MINIO_ROOT_PASSWORD
-              value: minioadmin
+            - name: MINIO_ROOT_USER
+              valueFrom:
+                secretKeyRef:
+                  name: minio-secrets
+                  key: root_user
+            - name: MINIO_ROOT_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: minio-secrets
+                  key: root_password
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/minio.yaml` around lines 21 - 25, The manifest currently
hardcodes MINIO_ROOT_USER and MINIO_ROOT_PASSWORD as "minioadmin" — remove those
literal env entries and instead reference a Kubernetes Secret; create a Secret
(e.g., name minio-creds) holding keys MINIO_ROOT_USER and MINIO_ROOT_PASSWORD
and update the env entries in deploy/minikube/minio.yaml to use
valueFrom.secretKeyRef for MINIO_ROOT_USER and MINIO_ROOT_PASSWORD (or use
envFrom: secretRef) so no credentials are committed; ensure you generate strong
values and document rotation/storage in your secret manager.
deploy/minikube/worker.yaml-16-81 (1)

16-81: ⚠️ Potential issue | 🟠 Major

Set a restrictive securityContext for the worker container.

The pod currently relies on default privileges. Add allowPrivilegeEscalation: false and run as non-root to match least-privilege guidance and avoid Checkov findings.

🔒 Suggested securityContext
       containers:
         - name: worker
           image: plugin-tests-worker:latest
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 10001
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/worker.yaml` around lines 16 - 81, The worker container
currently runs with default privileges; add a restrictive securityContext under
the container named "worker" to prevent privilege escalation and ensure non-root
execution: set allowPrivilegeEscalation: false, runAsNonRoot: true, specify a
non-root runAsUser (e.g. 1000), and drop all Linux capabilities via
capabilities: { drop: ["ALL"] } (optionally add seccompProfile: { type:
"RuntimeDefault" }) to the container spec to enforce least-privilege.
deploy/minikube/worker.yaml-23-64 (1)

23-64: ⚠️ Potential issue | 🟠 Major

Move secrets/credentials to Kubernetes Secrets.

DB DSN, encryption/JWT secrets, API keys, and S3 credentials are hardcoded in the manifest. This risks leakage and makes rotation difficult. Use valueFrom.secretKeyRef (or external secret manager) and keep plaintext out of the Deployment.

🔐 Suggested pattern (apply to all sensitive env vars)
-            - name: PLUGIN_TESTS_WORKER_DATABASE_DSN
-              value: "postgres://vultisig:vultisig@postgres:5432/plugin-tests?sslmode=disable"
+            - name: PLUGIN_TESTS_WORKER_DATABASE_DSN
+              valueFrom:
+                secretKeyRef:
+                  name: worker-secrets
+                  key: database_dsn
...
-            - name: PLUGIN_TESTS_WORKER_K8S_JWT_SECRET
-              value: mysecret
+            - name: PLUGIN_TESTS_WORKER_K8S_JWT_SECRET
+              valueFrom:
+                secretKeyRef:
+                  name: worker-secrets
+                  key: jwt_secret
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/worker.yaml` around lines 23 - 64, The manifest exposes
secrets like PLUGIN_TESTS_WORKER_DATABASE_DSN,
PLUGIN_TESTS_WORKER_K8S_ENCRYPTION_SECRET, PLUGIN_TESTS_WORKER_K8S_JWT_SECRET,
PLUGIN_TESTS_WORKER_K8S_PLUGIN_API_KEY,
PLUGIN_TESTS_WORKER_ARTIFACT_S3_ACCESS_KEY and
PLUGIN_TESTS_WORKER_ARTIFACT_S3_SECRET_KEY; replace each env var's value with
valueFrom.secretKeyRef referencing keys in a Kubernetes Secret (create a Secret
containing dsn, encryption_secret, jwt_secret, plugin_api_key, s3_access_key,
s3_secret_key or similar), update the Deployment to use those secretKeyRefs for
the corresponding env names (e.g., PLUGIN_TESTS_WORKER_DATABASE_DSN ->
secretKeyRef: name: plugin-tests-secrets, key: database_dsn), and ensure any
plaintext copies are removed from the manifest and documented for rotation or
external secret manager use.
go.mod-5-206 (1)

5-206: ⚠️ Potential issue | 🟠 Major

Upgrade vulnerable dependencies to patched versions.

Static analysis correctly identifies high-severity vulnerabilities in key dependencies:

  • github.com/ethereum/go-ethereum (v1.15.11): Remote DoS via p2p messages (GO-2026-4314, GO-2026-4315). Upgrade to v1.16.8+.

  • github.com/cometbft/cometbft (v0.38.17 indirect): Two advisories unpatched—upgrade to v0.38.19 (GO-2025-4025) and v0.38.21 (GO-2026-4361).

  • github.com/cosmos/cosmos-sdk (v0.50.11 indirect): Three advisories unpatched—upgrade to v0.50.12 (GO-2025-3476), v0.50.13 (GO-2025-3516), and v0.50.14 (GO-2025-3803), affecting group, staking, and distribution modules.

  • github.com/consensys/gnark-crypto (v0.16.0 indirect): OOM/DoS via unchecked vector deserialization (GO-2025-4087). Upgrade to v0.18.1+ or v0.19.2+.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` around lines 5 - 206, The go.mod pins vulnerable dependencies: bump
github.com/ethereum/go-ethereum to v1.16.8 or newer, update
github.com/cometbft/cometbft to at least v0.38.21 (or v0.38.19+ as needed),
upgrade github.com/cosmos/cosmos-sdk to v0.50.14, and raise
github.com/consensys/gnark-crypto to v0.18.1+ or v0.19.2+; then run go get
<module>@<version> (or edit the require blocks), run go mod tidy to fix indirect
deps, rebuild and run tests to ensure compatibility, and address any API changes
surfaced by functions/types in those modules.
internal/testrunner/schema/schema.sql-36-47 (1)

36-47: ⚠️ Potential issue | 🟠 Major

Add an FK for plugin_policies.plugin_id.

Without a foreign key, orphaned policies can be created. Consider referencing plugins(id) (and cascading on delete if that’s the intended lifecycle).

🛠️ Suggested constraint
 CREATE TABLE plugin_policies (
     id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     public_key TEXT NOT NULL,
-    plugin_id plugin_id NOT NULL,
+    plugin_id plugin_id NOT NULL REFERENCES plugins(id) ON DELETE CASCADE,
     plugin_version TEXT NOT NULL,
     policy_version INTEGER NOT NULL,
     signature TEXT NOT NULL,
     recipe TEXT NOT NULL,
     active BOOLEAN NOT NULL DEFAULT true,
     created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
     updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
 );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/schema/schema.sql` around lines 36 - 47, Add a foreign
key constraint on plugin_policies.plugin_id referencing plugins(id) to prevent
orphaned policy rows; modify the CREATE TABLE or add an ALTER TABLE for
plugin_policies to reference plugins(id) (optionally WITH ON DELETE CASCADE if
policies should be removed when a plugin is deleted) and ensure the constraint
name is descriptive (e.g., plugin_policies_plugin_id_fkey).
deploy/production/ingress.yaml-6-48 (1)

6-48: ⚠️ Potential issue | 🟠 Major

TLS secret/host mismatch with cert-manager resources.

The Ingress uses secretName: server-tls and hosts tests.plugins.vultisig.com / tests-dev.plugins.vultisig.com, but the cert-manager manifests define secretName: wildcard-test-plugins-tls and wildcard DNS under *.test.plugins... / *.test-dev.plugins.... This mismatch will prevent valid TLS issuance/termination. Please align the Ingress hosts and TLS secret with the Certificate resources (or update the Certificates to match these hosts).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/ingress.yaml` around lines 6 - 48, The Ingress TLS settings
are mismatched with your cert-manager Certificate resources: the Ingress uses
secretName: server-tls and hosts tests.plugins.vultisig.com /
tests-dev.plugins.vultisig.com while the Certificates use secretName:
wildcard-test-plugins-tls and wildcard DNS entries; fix by making the Ingress
TLS secretName and host entries match the Certificate resources (or update the
Certificate resources to use secretName: server-tls and explicit hosts
tests.plugins.vultisig.com and tests-dev.plugins.vultisig.com), targeting the
Ingress named "server" in namespace "plugin-tests-dev" and the cert-manager
secret names (wildcard-test-plugins-tls or server-tls) so cert-manager can
issue/terminate TLS correctly.
deploy/production/secrets.yaml-1-133 (1)

1-133: ⚠️ Potential issue | 🟠 Major

Production secrets are placeholders—add guardrails.

These manifests include "CHANGE_ME" placeholders for production namespaces. If applied as-is, services will start with invalid credentials. Consider moving this to a *-example.yaml, using ExternalSecrets/SealedSecrets, or enforcing a CI gate that rejects CHANGE_ME values before deployment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/secrets.yaml` around lines 1 - 133, The production Secret
manifests contain "CHANGE_ME" placeholders for Secret resources (names:
postgres, redis, minio, encryption, jwt, vault, ghcr-pull-secret) and must not
be deployable as-is; update by moving these manifests to a *-example.yaml (or
remove production entries), replace the production Secret definitions with
references to SealedSecrets/ExternalSecrets (or Kubernetes Secret templates that
expect real values from your secret backend), and add a CI pre-deploy validation
that scans Secret fields (e.g., stringData.dsn, stringData.password,
stringData.uri, stringData["root-user"], stringData["root-password"],
stringData.secret, stringData["vault-b64"], data[".dockerconfigjson"]) to fail
the pipeline if any value equals "CHANGE_ME" or "CHANGE_ME_BASE64" to prevent
accidental production rollout.
deploy/production/redis.yaml-16-39 (1)

16-39: ⚠️ Potential issue | 🟠 Major

Harden Redis container security context.

The deployment doesn’t constrain privilege escalation or root usage. Add a security context to reduce blast radius.

🔒 Suggested securityContext hardening
     spec:
       containers:
         - name: redis
           image: redis:7
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 999
           command: ["redis-server", "--requirepass", "$(REDIS_PASSWORD)"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/redis.yaml` around lines 16 - 39, Add a hardened
securityContext to the Redis container (the entry under containers with name:
redis) to prevent privilege escalation and running as root: set
securityContext.allowPrivilegeEscalation: false, securityContext.runAsNonRoot:
true with securityContext.runAsUser: 1000 (or a non-root UID used in your
environment), securityContext.readOnlyRootFilesystem: true, and drop
capabilities via securityContext.capabilities.drop: ["ALL"]; apply these keys
under the same container spec (next to env/ports/readinessProbe) so the redis
container is constrained from escalating privileges or using root.
deploy/production/server.yaml-16-76 (1)

16-76: ⚠️ Potential issue | 🟠 Major

Harden server container security context.

The deployment doesn’t constrain privilege escalation or root usage. Add a securityContext to reduce risk.

🔒 Suggested securityContext hardening
       containers:
         - name: server
           image: ghcr.io/vultisig/plugin-tests/server:latest
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 1000
           command: ["/app/main"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/server.yaml` around lines 16 - 76, Add a securityContext to
harden the server container: under the container with name "server" set a
container-level securityContext that disables privilege escalation
(allowPrivilegeEscalation: false), drops all capabilities (capabilities.drop:
["ALL"]), enforces a non-root user (runAsNonRoot: true and runAsUser: a non-zero
UID), and enables a read-only root filesystem (readOnlyRootFilesystem: true);
optionally also add a pod-level securityContext to set fsGroup/runAsUser
defaults if needed and ensure image user matches the chosen runAsUser before
deploying.
deploy/production/rbac.yaml-13-41 (1)

13-41: ⚠️ Potential issue | 🟠 Major

ClusterRole permissions are overly broad for a worker.

This role can create/delete namespaces and create secrets and workloads cluster‑wide. If the worker is compromised, it can affect the whole cluster. Please scope to the minimum required resources/verbs (prefer namespaced Roles/RoleBindings) and avoid namespace delete unless strictly necessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/rbac.yaml` around lines 13 - 41, The ClusterRole
plugin-tests-worker grants overly broad, cluster-wide permissions (e.g.,
creating/deleting namespaces, creating secrets and workloads); change it to the
minimum necessary by replacing the ClusterRole with a namespaced Role (or create
a separate Role per target namespace) and use RoleBinding(s) instead of
ClusterRole/ClusterRoleBinding, remove the "delete" verb for namespaces, remove
cluster-wide "namespaces" create permission, narrow "secrets" and workload
permissions to only required verbs and specific resources (e.g., limit to
get/create if needed) and restrict resources to the specific namespaces the
worker operates in; update the role named plugin-tests-worker and its rules for
apiGroups [""] / ["apps"] / ["batch"] / ["networking.k8s.io"] accordingly so it
only allows the exact verbs and resources required.
deploy/production/worker.yaml-20-22 (1)

20-22: ⚠️ Potential issue | 🟠 Major

Avoid :latest for production images.

Using latest makes deployments non‑reproducible and can introduce untested changes. Pin to a version or digest produced by your build pipeline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/worker.yaml` around lines 20 - 22, The deployment uses an
unpinned image for the container named "worker" (image:
ghcr.io/vultisig/plugin-tests/worker:latest); change the image reference to a
reproducible tag or digest provided by your build pipeline (for example replace
":latest" with a CI-produced semantic version tag or an immutable digest) so the
"worker" container image is pinned and deployments are reproducible.
deploy/production/postgres.yaml-16-48 (1)

16-48: ⚠️ Potential issue | 🟠 Major

Harden the Postgres pod/container security context.

This pod runs with default privileges. Recommend explicitly setting non‑root execution and disabling privilege escalation to reduce blast radius.

🔒 Suggested hardening
     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 999
+        runAsGroup: 999
+        fsGroup: 999
       containers:
         - name: postgres
           image: postgres:15
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/postgres.yaml` around lines 16 - 48, The Postgres container
is running with default privileges; update the deployment to add securityContext
settings to the Pod and the container for the postgres container (name:
postgres) to enforce non‑root and restrict privileges: set pod-level
securityContext runAsNonRoot: true and a runAsUser (e.g., 999), and add
container-level securityContext with runAsNonRoot: true, runAsUser matching the
pod, allowPrivilegeEscalation: false, capabilities: drop: ["ALL"], and
readOnlyRootFilesystem: true (and optionally seccompProfile:type:
RuntimeDefault) to the container spec to harden the Postgres pod.
deploy/production/worker.yaml-15-23 (1)

15-23: ⚠️ Potential issue | 🟠 Major

Harden the worker pod/container security context.

Default privileges allow escalation and root execution. Please lock this down explicitly.

🔒 Suggested hardening
     spec:
+      securityContext:
+        runAsNonRoot: true
       containers:
         - name: worker
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/worker.yaml` around lines 15 - 23, The worker pod currently
has no security context; add explicit pod- and container-level hardened settings
to prevent root and privilege escalation: under spec add a podSecurityContext
with fsGroup and runAsNonRoot if needed, and inside the containers entry for
name: worker add a securityContext that sets runAsNonRoot: true, runAsUser to a
non-root uid, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true,
drop capabilities (capabilities.drop: ["ALL"]), and a seccompProfile (e.g.,
type: RuntimeDefault) to lock down syscalls; ensure these keys are placed under
the existing spec and the containers -> name: worker block so Kubernetes will
enforce them.
internal/testrunner/tests.go-108-115 (1)

108-115: ⚠️ Potential issue | 🟠 Major

Add HTTP status assertions to avoid false positives.

Several tests only check connectivity and ignore non‑2xx responses, so failures can be missed. Please assert expected status codes (e.g., 2xx for health/endpoints; specific expected codes where appropriate).

✅ Example pattern to apply
 	resp, err := s.pluginCli.GET("/healthz")
 	if err != nil {
 		return fmt.Errorf("plugin unreachable: %w", err)
 	}
 	defer resp.Body.Close()
+	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+		return fmt.Errorf("expected 2xx, got %d", resp.StatusCode)
+	}
 	return nil

Also applies to: 203-224, 238-263

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/tests.go` around lines 108 - 115, The test only verifies
connectivity for PluginHealth but not the HTTP status, which can mask non-2xx
responses; update the s.run("PluginHealth", ...) handler to assert the response
status (e.g., ensure resp.StatusCode is in the 2xx range or equals
http.StatusOK) and return an error including the status code (and body snippet
if useful) when it is not expected; apply the same pattern to the other test
handlers that call s.pluginCli.GET/POST (the blocks around lines 203-224 and
238-263) so every HTTP call checks resp.StatusCode and fails the test on
non-success codes.
internal/api/handler.go-184-194 (1)

184-194: ⚠️ Potential issue | 🟠 Major

Validate status filter values to avoid DB enum errors.

buildFilterParams accepts any string and passes it to the DB layer; invalid values will likely throw DB errors and return 500. Validate against the known status set and return a 400 on invalid input.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler.go` around lines 184 - 194, buildFilterParams currently
accepts any status string and forwards it to the DB causing enum errors; update
it to validate the incoming status against the known set of allowed values (use
the existing queries.TestRunStatus constants or an explicit map/set of allowed
strings) before constructing queries.NullTestRunStatus. Change buildFilterParams
to return (filterParams, error) (or an additional validation error) so callers
can return a 400 Bad Request on invalid status; inside the function, if the
trimmed c.QueryParam("status") is not in the allowed set, return a descriptive
validation error so the handler can respond with HTTP 400 instead of allowing an
invalid queries.TestRunStatus to reach the DB. Ensure you still construct
queries.NullTestRunStatus only when the value is valid.
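
A minimal sketch of that validation; the status names here are illustrative, and a real implementation should derive the set from the generated queries constants:

import (
	"fmt"
	"strings"
)

// allowedStatuses mirrors the test_run_status DB enum; derive it from the
// generated queries constants rather than hardcoding strings.
var allowedStatuses = map[string]bool{
	"pending":   true,
	"running":   true,
	"completed": true,
	"error":     true,
}

// parseStatusFilter returns the validated status, or an error the handler
// can map to a 400 response instead of letting the DB reject the value.
func parseStatusFilter(raw string) (string, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return "", nil // no filter requested
	}
	if !allowedStatuses[s] {
		return "", fmt.Errorf("invalid status filter %q", s)
	}
	return s, nil
}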
deploy/production/minio.yaml-16-46 (1)

16-46: ⚠️ Potential issue | 🟠 Major

Harden the MinIO container security context.

The container currently allows privilege escalation and runs as root. Add a restrictive securityContext to reduce blast radius in production.

Note: The upstream minio/minio:latest image defaults to root (UID 0), so validate that running as non-root is compatible with your storage setup. Set fsGroup: 1000 at the pod level for proper volume mount permissions if using the suggested UID (fsGroup is not valid in a container-level securityContext).

🔒 Example hardening
+      securityContext:
+        fsGroup: 1000
       containers:
         - name: minio
           image: minio/minio:latest
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 1000
+            runAsGroup: 1000
+            seccompProfile:
+              type: RuntimeDefault
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/production/minio.yaml` around lines 16 - 46, Add a restrictive
securityContext to the MinIO container block (container name "minio") to prevent
privilege escalation and avoid running as root: set
securityContext.allowPrivilegeEscalation to false, set
securityContext.runAsNonRoot to true (and runAsUser/runAsGroup to the suggested
UID/GID if compatible, e.g., 1000), and add pod-level securityContext.fsGroup:
1000 for proper volume permissions; ensure these fields are added next to the
existing container definition and validate the image supports the chosen UID
before deploying.
internal/worker/manifests.go-238-291 (1)

238-291: ⚠️ Potential issue | 🟠 Major

Secrets are embedded in ConfigMaps/env vars.
JWT secrets, encryption secrets, and S3 credentials are rendered into ConfigMaps and plain env vars, which is a security risk in shared clusters. Consider moving them to Kubernetes Secrets and referencing via secretKeyRef / mounted volumes.

Also applies to: 510-544

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/manifests.go` around lines 238 - 291, verifierConfigMap and
workerConfigMap currently serialize sensitive values (v.Server.JWTSecret,
v.EncryptionSecret, w.VaultService.EncryptionSecret,
BlockStorage.AccessKey/Secret, DB DSN, etc.) into ConfigMaps/envs; change these
to read secrets from Kubernetes Secrets instead: create Secret objects for
JWT/encryption/S3/DB credentials and then update verifierConfigMap and
workerConfigMap to not inline sensitive fields but reference them via
secretKeyRef (for env vars) or mount the Secret as a file and populate the JSON
from those files at startup (or inject non-sensitive placeholders into the
ConfigMap and assemble full config at runtime), ensuring json.MarshalIndent no
longer includes secret values.
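
A sketch of that pattern with client-go types; the Secret name, keys, and env var names below are illustrative, not the manifest builders' actual identifiers:

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// runSecret holds the sensitive values in a Secret instead of baking them
// into a ConfigMap alongside non-sensitive config.
func runSecret(ns, jwtSecret, s3AccessKey, s3SecretKey string) *corev1.Secret {
	return &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "run-secrets", Namespace: ns},
		StringData: map[string]string{
			"jwt-secret":    jwtSecret,
			"s3-access-key": s3AccessKey,
			"s3-secret-key": s3SecretKey,
		},
	}
}

// secretEnv references a Secret key from a container env var, so the value
// never appears in the rendered ConfigMap or pod spec.
func secretEnv(name, key string) corev1.EnvVar {
	return corev1.EnvVar{
		Name: name,
		ValueFrom: &corev1.EnvVarSource{
			SecretKeyRef: &corev1.SecretKeySelector{
				LocalObjectReference: corev1.LocalObjectReference{Name: "run-secrets"},
				Key:                  key,
			},
		},
	}
}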
internal/worker/consumer.go-40-136 (1)

40-136: ⚠️ Potential issue | 🟠 Major

Namespace cleanup is fully disabled.
The deferred cleanup path is always skipped, which will leak namespaces/resources in steady state. Please re‑enable cleanup or gate it behind an explicit config/debug flag before release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/consumer.go` around lines 40 - 136, The deferred namespace
cleanup is currently skipped; add a config flag (e.g.,
Consumer.cfg.EnableNamespaceCleanup bool) and modify the defer to run cleanup
only when createdNS is true and cfg.EnableNamespaceCleanup is true: restore the
commented cleanup block inside the defer but wrap it in a conditional that
checks c.cfg.EnableNamespaceCleanup, create a timeout context
(context.WithTimeout) for deleteNamespace(ctx, c.k8s, nsName), call
deleteNamespace and log errors with log.WithError(delErr).Warn(...) or
log.Info(...) on success; ensure the defer closes over nsName, c.k8s and log
correctly and that the new config flag is honored when constructing Consumer.
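
A sketch of one way to gate the cleanup, assuming a hypothetical enable flag on the worker config; the namespace deletion is inlined via client-go here and may differ from the worker's own deleteNamespace helper:

import (
	"context"
	"time"

	"github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupNamespace tears down a per-run namespace when cleanup is enabled.
// It uses a fresh context with its own timeout so cleanup still runs after
// the task context has been cancelled.
func cleanupNamespace(k8s kubernetes.Interface, enabled bool, nsName string, log *logrus.Entry) {
	if !enabled {
		log.Info("namespace cleanup disabled; leaving namespace for debugging")
		return
	}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()
	if err := k8s.CoreV1().Namespaces().Delete(ctx, nsName, metav1.DeleteOptions{}); err != nil {
		log.WithError(err).Warn("failed to delete namespace")
	}
}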
internal/testrunner/seeder.go-45-109 (1)

45-109: ⚠️ Potential issue | 🟠 Major

Guard against a nil fixture before seeding the vault token.
SeedDatabase dereferences s.config.Fixture without validation, which will panic if config is incomplete.

🛠️ Suggested fix
 func (s *Seeder) SeedDatabase(ctx context.Context) error {
+	if s.config.Fixture == nil {
+		return fmt.Errorf("fixture is required for seeding")
+	}
+	if s.config.Fixture.Vault.PublicKey == "" {
+		return fmt.Errorf("fixture vault public_key is required")
+	}
 	pool, err := pgxpool.New(ctx, s.config.DSN)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/seeder.go` around lines 45 - 109, SeedDatabase currently
dereferences s.config.Fixture and s.config.Fixture.Vault when calling
queries.UpsertVaultToken, which panics if Fixture is nil; update
Seeder.SeedDatabase to guard against this by checking if s.config.Fixture == nil
|| s.config.Fixture.Vault == nil before constructing the UpsertVaultToken params
and either return a descriptive error (e.g., fmt.Errorf("missing Fixture.Vault
in config")) or skip vault-token seeding with a logged warning; ensure the check
is placed just before the now := time.Now() / tokenExpiry block so
UpsertVaultToken is never called with a nil pointer.
internal/testrunner/participant.go-159-181 (1)

159-181: ⚠️ Potential issue | 🟠 Major

Add a timeout to the delete‑existing‑plugin call.

http.DefaultClient has no timeout, and the caller passes context.Background() (line 73), so this can hang indefinitely on network issues. Add both a context deadline and client timeout:

🛠️ Suggested fix
 func deleteExistingPlugin(ctx context.Context, verifierURL, jwtToken, pluginID string, logger *logrus.Logger) error {
+	ctx, cancel := context.WithTimeout(ctx, 15*time.Second)
+	defer cancel()
 	url := verifierURL + "/plugin/" + pluginID
 	req, err := http.NewRequestWithContext(ctx, http.MethodDelete, url, nil)
 	if err != nil {
 		return fmt.Errorf("failed to create request: %w", err)
 	}
 	req.Header.Set("Authorization", "Bearer "+jwtToken)
 
-	resp, err := http.DefaultClient.Do(req)
+	client := &http.Client{Timeout: 15 * time.Second}
+	resp, err := client.Do(req)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/participant.go` around lines 159 - 181, The
deleteExistingPlugin function can hang because it uses http.DefaultClient with
no timeout and the passed ctx may be background; wrap the incoming ctx with a
deadline (e.g., context.WithTimeout and defer cancel) and use that derived
context when creating the request, and create a local http.Client with a
sensible Timeout (instead of http.DefaultClient) and call client.Do(req); keep
existing error handling and resp.Body.Close() and still limit reading with
maxErrorResponseBytes.
internal/testrunner/seeder.go-120-149 (1)

120-149: ⚠️ Potential issue | 🟠 Major

Distinguish bucket creation errors—ignore only "already owned" errors, not all failures.

Currently, both CreateBucket calls treat all errors identically (log at Debug level and continue), which masks credential/endpoint configuration issues and falsely reports successful seeding even if no buckets were created.

The AWS SDK v1 distinguishes between BucketAlreadyOwnedByYou (safe to ignore—bucket already exists in your account) and BucketAlreadyExists (critical—bucket name taken globally) via error code inspection. Other errors (credentials, network, endpoint) should propagate.

Refactor to check error codes and return non-idempotent errors:

Example pattern (AWS SDK v1)
if err != nil {
    if aerr, ok := err.(awserr.Error); ok {
        switch aerr.Code() {
        case s3.ErrCodeBucketAlreadyOwnedByYou:
            // Idempotent: bucket already exists in your account
            return nil
        }
    }
    return fmt.Errorf("failed to create bucket: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/seeder.go` around lines 120 - 149, In SeedVaults, don't
treat all CreateBucket errors as ignorable: after each s3Client.CreateBucket
call in Seeder.SeedVaults, assert the error is an awserr.Error and inspect
aerr.Code() and only ignore s3.ErrCodeBucketAlreadyOwnedByYou (idempotent); for
any other code including s3.ErrCodeBucketAlreadyExists or non-awserr errors,
return a wrapped error (e.g., fmt.Errorf("failed to create bucket %s: %w",
bucketName, err)) so credential/endpoint/network issues propagate; apply this
check for both CreateBucket calls that create s.config.S3.Bucket and
"vultisig-dca".
🟡 Minor comments (7)
deploy/minikube/postgres.yaml-16-40 (1)

16-40: ⚠️ Potential issue | 🟡 Minor

Complete the securityContext hardening with explicit UID/GID for postgres:15.
The postgres:15 image creates a postgres user (UID/GID 999) but doesn't set it as the default USER. Using runAsNonRoot: true alone will fail; you must specify the numeric UID explicitly.

🛡️ Correct hardening for postgres:15
+      securityContext:
+        fsGroup: 999
       containers:
         - name: postgres
           image: postgres:15
           imagePullPolicy: IfNotPresent
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 999
+            runAsGroup: 999
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/minikube/postgres.yaml` around lines 16 - 40, The container spec for
the postgres container (name: postgres, image: postgres:15) must include an
explicit securityContext to harden UID/GID; add a securityContext block on the
postgres container with runAsUser: 999 and runAsGroup: 999, set fsGroup: 999 at
the pod level (fsGroup is not a container-level field), and keep runAsNonRoot:
true so the pod runs as the numeric postgres user/group used by the image;
update the container spec (the postgres container entry) to include these
fields.
internal/testrunner/mpc/wrappers.go-138-145 (1)

138-145: ⚠️ Potential issue | 🟡 Minor

Guard negative index before the uint32 cast in the edDSA path.

When a negative index is cast to uint32, it wraps silently (e.g., -1 becomes 4294967295), which can select the wrong party without any error. While current callers pass non-negative indices, defensive validation prevents this risk.

🛡️ Suggested guard
 func (w *Wrapper) SignSessionMessageReceiver(h Handle, message []byte, index int) (string, error) {
 	if w.isEdDSA {
+		if index < 0 {
+			return "", fmt.Errorf("index must be non-negative")
+		}
 		b, err := eddsaSession.SchnorrSignSessionMessageReceiver(eddsaSession.Handle(h), message, uint32(index))
 		return string(b), err
 	}
 	b, err := session.DklsSignSessionMessageReceiver(session.Handle(h), message, index)
 	return string(b), err
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/mpc/wrappers.go` around lines 138 - 145, The edDSA branch
of SignSessionMessageReceiver casts index to uint32 without validating it, which
silently wraps negative values; before calling
eddsaSession.SchnorrSignSessionMessageReceiver validate index >= 0 and return an
error if negative (do not perform the uint32 conversion when invalid), otherwise
cast to uint32 and call
eddsaSession.SchnorrSignSessionMessageReceiver(eddsaSession.Handle(h), message,
uint32(index)); keep the non-edDSA path unchanged.
internal/storage/postgres/sqlc/test_runs.sql-11-13 (1)

11-13: ⚠️ Potential issue | 🟡 Minor

Guard against empty-string filters being treated as real values.

sqlc.narg only ignores NULL. If callers pass "" for plugin_id or status, these queries will filter by empty string and return no rows. Ensure callers coerce empty strings to NULL (e.g., sql.NullString{Valid:false}), or adjust the SQL to treat '' as NULL.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/storage/postgres/sqlc/test_runs.sql` around lines 11 - 13, The query
in test_runs uses sqlc.narg('plugin_id') and sqlc.narg('status') which treat
only NULL as absent, so empty-string filters will be applied; update the WHERE
clauses to treat empty strings as NULL by wrapping the sqlc.narg calls with
NULLIF(...,'') (and keep the status cast to test_run_status), e.g., use
NULLIF(sqlc.narg('plugin_id')::text,'') in place of sqlc.narg('plugin_id')::text
and NULLIF(sqlc.narg('status')::text,'')::test_run_status for the status check
so empty strings are ignored the same as NULL.
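
On the Go side, a small helper can coerce empty input to NULL before it reaches the generated params. A sketch assuming the pgx/v5 driver; depending on sqlc configuration the generated type may be sql.NullString instead:

import (
	"strings"

	"github.com/jackc/pgx/v5/pgtype"
)

// textFilter turns an optional query parameter into a nullable text value:
// empty input becomes SQL NULL, so sqlc.narg() skips the filter entirely.
func textFilter(s string) pgtype.Text {
	s = strings.TrimSpace(s)
	if s == "" {
		return pgtype.Text{} // Valid is false, which marshals as NULL
	}
	return pgtype.Text{String: s, Valid: true}
}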
internal/api/templates/list.html-6-18 (1)

6-18: ⚠️ Potential issue | 🟡 Minor

Add accessible labels for filter controls.

The <select> elements lack labels; screen readers won’t have descriptive names. Add aria-label or explicit <label for>.

🧩 Minimal change
-  <select name="plugin_id">
+  <select name="plugin_id" aria-label="Plugin">
...
-  <select name="status">
+  <select name="status" aria-label="Status">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/templates/list.html` around lines 6 - 18, The two filter selects
in the form (select name="plugin_id" and select name="status" inside the filters
form) lack accessible labels; add descriptive accessible names by either adding
explicit <label for="..."> elements that reference new ids on the selects (e.g.,
id="plugin_id" and id="status") or by adding aria-label attributes (e.g.,
aria-label="Plugin filter" and aria-label="Status filter") to the select
elements so screen readers can identify each control (update the template where
range .PluginIDs and range .Statuses are rendered).
cmd/testrunner/main.go-123-152 (1)

123-152: ⚠️ Potential issue | 🟡 Minor

Inconsistent handling of ENCRYPTION_SECRET across subcommands.

In runSeed (line 55), ENCRYPTION_SECRET is fetched via requireEnv (fatal on empty). In runIntegration (line 133), it uses os.Getenv directly, making it silently optional. If the integration flow actually requires the encryption secret (e.g., for decrypt/verify steps), this will produce confusing downstream errors instead of a clear "variable not set" message at startup.

If it's truly optional here, consider adding a comment explaining why. Otherwise, use requireEnv.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/testrunner/main.go` around lines 123 - 152, runIntegration currently
reads ENCRYPTION_SECRET with os.Getenv while runSeed uses requireEnv, causing
inconsistent behavior; decide whether the secret is required and then make
handling consistent: if required, replace the os.Getenv call in runIntegration
with requireEnv("ENCRYPTION_SECRET") and pass that into
testrunner.InstallConfig.EncryptionSecret; if optional, add a clear comment
above the os.Getenv call explaining why it's optional and ensure downstream code
checks for empty EncryptionSecret before using it (see functions runIntegration,
runSeed, requireEnv, and the InstallConfig.EncryptionSecret field).
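
If the secret really is required, the consistent fix is the fail-fast helper. A sketch of the pattern, assuming requireEnv behaves as described above (fatal on empty); the actual helper in main.go may use a different logger:

import (
	"log"
	"os"
)

// requireEnv exits with a clear message when a mandatory variable is unset,
// so a missing secret fails at startup rather than deep inside the run.
func requireEnv(name string) string {
	v := os.Getenv(name)
	if v == "" {
		log.Fatalf("required environment variable %s is not set", name)
	}
	return v
}

// In runIntegration, the bare os.Getenv lookup would then become:
//   encryptionSecret := requireEnv("ENCRYPTION_SECRET")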
cmd/worker/main.go-71-75 (1)

71-75: ⚠️ Potential issue | 🟡 Minor

srv.Run error after graceful shutdown may trigger Fatal exit.

When ctx is cancelled, the goroutine at line 74 calls srv.Shutdown(). If srv.Run(mux) subsequently returns an error (e.g., asynq returns a sentinel error on shutdown), logger.Fatal at line 99 will call os.Exit(1), bypassing defers (like db.Close() on line 46).

Consider checking whether the error is due to intentional shutdown before calling Fatal:

Proposed approach
 	err = srv.Run(mux)
 	if err != nil {
+		if ctx.Err() != nil {
+			logger.Info("worker stopped")
+			return
+		}
 		logger.WithError(err).Fatal("worker failed")
 	}

Also applies to: 92-100

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/worker/main.go` around lines 71 - 75, The srv.Run(mux) error handling
should ignore expected shutdown errors so logger.Fatal doesn't call os.Exit and
skip defers (e.g., db.Close). Update the post-run error path to check the
returned error from srv.Run(mux) with errors.Is(..., context.Canceled) and also
against the server shutdown sentinel (e.g., asynq.ErrServerClosed) before
calling logger.Fatal; for shutdown-related errors log an Info or Debug and
return normally so deferred cleanup (like db.Close) runs, but only use
logger.Fatal for unexpected errors. Ensure you reference srv.Run(mux),
srv.Shutdown(), ctx.Done(), logger.Fatal and db.Close when making the change.
internal/testrunner/client.go-104-115 (1)

104-115: ⚠️ Potential issue | 🟡 Minor

Properly handle response bodies to prevent connection leaks in polling loops.

net/http can return a non-nil response even when an error occurs (e.g., from redirect policy failures, though those already have closed bodies). More importantly, when a response is successfully received but the status is not OK, the body should be fully consumed or explicitly closed before the next iteration to allow connection reuse.

🛠️ Suggested fix
 		resp, err := c.GET("/plugins")
 		if err == nil {
+			io.Copy(io.Discard, resp.Body) // drain before closing to enable keep-alive
 			resp.Body.Close()
 			if resp.StatusCode == http.StatusOK {
 				return nil
 			}
 		}

Note that a defer inside the polling loop would stack open bodies until WaitForHealth returns, so drain and close inline on each iteration instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/testrunner/client.go` around lines 104 - 115, In
TestClient.WaitForHealth, ensure you always close or fully drain resp.Body to
avoid connection leaks: after calling c.GET("/plugins") handle the case where
err != nil but resp != nil by draining (io.Copy(io.Discard, resp.Body)) and then
resp.Body.Close(), and likewise when resp.StatusCode != http.StatusOK drain and
close before sleeping; update the logic in WaitForHealth to call this
drain+close pattern for any non-nil resp and still close bodies on the success
path as well.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe25e99 and 30141f0.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (67)
  • Makefile
  • cmd/server/main.go
  • cmd/testrunner/main.go
  • cmd/worker/main.go
  • config/config.go
  • deploy/minikube/minio.yaml
  • deploy/minikube/namespace.yaml
  • deploy/minikube/postgres.yaml
  • deploy/minikube/rbac.yaml
  • deploy/minikube/redis.yaml
  • deploy/minikube/server.yaml
  • deploy/minikube/worker.yaml
  • deploy/production/cert-manager.yaml
  • deploy/production/configmaps.yaml
  • deploy/production/ingress.yaml
  • deploy/production/minio.yaml
  • deploy/production/namespace.yaml
  • deploy/production/postgres.yaml
  • deploy/production/rbac.yaml
  • deploy/production/redis.yaml
  • deploy/production/secrets.yaml
  • deploy/production/server.yaml
  • deploy/production/worker.yaml
  • go.mod
  • internal/api/artifacts.go
  • internal/api/handler.go
  • internal/api/response.go
  • internal/api/results.go
  • internal/api/server.go
  • internal/api/templates/detail.html
  • internal/api/templates/layout.html
  • internal/api/templates/list.html
  • internal/queue/redis_test.go
  • internal/queue/tasks.go
  • internal/storage/postgres/migrations/20260216000001_create_test_runs.sql
  • internal/storage/postgres/queries/test_runs.sql.go
  • internal/storage/postgres/schema/schema.sql
  • internal/storage/postgres/sqlc/test_runs.sql
  • internal/testrunner/client.go
  • internal/testrunner/db/db.go
  • internal/testrunner/db/models.go
  • internal/testrunner/db/seeder.sql.go
  • internal/testrunner/fixture.json
  • internal/testrunner/fixtures.go
  • internal/testrunner/jwt.go
  • internal/testrunner/jwt_test.go
  • internal/testrunner/mpc/wrappers.go
  • internal/testrunner/participant.go
  • internal/testrunner/policy.go
  • internal/testrunner/queries/seeder.sql
  • internal/testrunner/schema/schema.sql
  • internal/testrunner/seeder.go
  • internal/testrunner/sqlc.yaml
  • internal/testrunner/tests.go
  • internal/types/test_run_test.go
  • internal/worker/artifacts.go
  • internal/worker/consumer.go
  • internal/worker/janitor.go
  • internal/worker/k8s.go
  • internal/worker/k8s_test.go
  • internal/worker/manifests.go
  • internal/worker/manifests_test.go
  • internal/worker/naming.go
  • internal/worker/naming_test.go
  • internal/worker/runner.go
  • internal/worker/runner_test.go
  • scripts/fixture-gen/main.go
💤 Files with no reviewable changes (2)
  • internal/storage/postgres/migrations/20260216000001_create_test_runs.sql
  • internal/storage/postgres/schema/schema.sql

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
internal/worker/manifests_test.go (2)

161-170: TestSmokeJob has an undocumented TTL-zero sentinel and missing container assertions.

Two issues:

  1. Passing 0 for TTL and asserting nil on TTLSecondsAfterFinished relies on an implicit convention that 0 means "disabled" — which inverts Kubernetes semantics (K8s treats ttlSecondsAfterFinished: 0 as "clean up immediately"). A brief comment or a named constant (e.g., noTTL = 0) would make this contract explicit.

  2. Unlike TestSeederJob, this sub-test never asserts the container name ("testrunner"), the container image ("testrunner:v1"), or that c.Env is non-empty. These omissions leave the smoke-job builder under-tested.

💡 Suggested additions
 func TestSmokeJob(t *testing.T) {
 	envVars := testrunnerEnvVars(testK8sCfg)
 	job := smokeJob("ns", "testrunner:v1", "my-secret", nil, envVars, 0, nil)

 	assert.Equal(t, "smoke", job.Name)
 	require.Len(t, job.Spec.Template.Spec.Containers, 1)
-	assert.Equal(t, []string{"smoke"}, job.Spec.Template.Spec.Containers[0].Args)
-	assert.Nil(t, job.Spec.TTLSecondsAfterFinished)
+	c := job.Spec.Template.Spec.Containers[0]
+	assert.Equal(t, "testrunner", c.Name)
+	assert.Equal(t, "testrunner:v1", c.Image)
+	assert.Equal(t, []string{"smoke"}, c.Args)
+	assert.NotEmpty(t, c.Env)
+	// TTL=0 is treated as "no TTL" — job.Spec.TTLSecondsAfterFinished is left nil
+	// (K8s TTL=0 would mean immediate deletion, so 0 acts as a sentinel for disabled).
+	assert.Nil(t, job.Spec.TTLSecondsAfterFinished)
 	require.Len(t, job.Spec.Template.Spec.ImagePullSecrets, 1)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/manifests_test.go` around lines 161 - 170, TestSmokeJob
relies on an undocumented TTL-zero sentinel and omits container assertions:
introduce a named constant (e.g., noTTL = 0) and use it when calling
smokeJob("ns", "testrunner:v1", "my-secret", nil, envVars, noTTL, nil) or add a
brief comment near the call to document that 0 means "disabled" for the test,
then update assertions around TTLSecondsAfterFinished on the returned job.Spec
to reflect the intended "disabled" contract; additionally, extend the test
assertions for the first container (job.Spec.Template.Spec.Containers[0]) to
check its Name equals "testrunner", Image equals "testrunner:v1", and that Env
(c.Env) is non-empty to match the coverage of TestSeederJob.
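
As a sketch of the contract the test implicitly relies on (the real builder in this PR is assumed, not shown, to behave this way):

package worker

import batchv1 "k8s.io/api/batch/v1"

// noTTL is the sentinel under discussion: 0 means "leave
// ttlSecondsAfterFinished unset", since an explicit 0 in Kubernetes
// deletes a finished Job immediately.
const noTTL int32 = 0

// applyTTL only sets the field for positive values, keeping nil for noTTL.
func applyTTL(spec *batchv1.JobSpec, ttlSeconds int32) {
	if ttlSeconds > noTTL {
		spec.TTLSecondsAfterFinished = &ttlSeconds
	}
}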

172-187: POSTGRES_DSN and MINIO_ENDPOINT are only checked for key presence, not value.

Unlike ENCRYPTION_SECRET, JWT_SECRET, PLUGIN_ENDPOINT, and VERIFIER_URL, which all assert specific values, POSTGRES_DSN and MINIO_ENDPOINT are validated only with assert.Contains. Adding value assertions (e.g., confirming the DSN points to the in-namespace postgres host and MinIO to http://minio:9000) would catch mis-wiring between the manifest builder and the expected in-cluster topology.

💡 Suggested additions
 assert.Contains(t, envMap, "POSTGRES_DSN")
+assert.Contains(t, envMap["POSTGRES_DSN"], "postgres")   // should target the in-namespace postgres service
 assert.Contains(t, envMap, "MINIO_ENDPOINT")
+assert.Equal(t, "http://minio:9000", envMap["MINIO_ENDPOINT"])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/worker/manifests_test.go` around lines 172 - 187, The test
TestTestrunnerEnvVars currently only checks that POSTGRES_DSN and MINIO_ENDPOINT
keys exist; update it to also assert their expected values by reading envMap
(built from testrunnerEnvVars) — e.g., assert that envMap["MINIO_ENDPOINT"]
equals "http://minio:9000" and that envMap["POSTGRES_DSN"] targets the
in-cluster postgres host (assert.Contains(t, envMap["POSTGRES_DSN"],
"postgres:5432") or assert.Contains(t, envMap["POSTGRES_DSN"], "@postgres")) so
the manifest builder wiring is validated.
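
The envMap the prompt refers to can be built with a small helper along these lines (a hypothetical sketch; the test's actual construction may differ):

package worker

import corev1 "k8s.io/api/core/v1"

// envToMap flattens []corev1.EnvVar into a map so tests can assert both
// key presence and exact values; vars sourced via ValueFrom flatten to "".
func envToMap(env []corev1.EnvVar) map[string]string {
	m := make(map[string]string, len(env))
	for _, e := range env {
		m[e.Name] = e.Value
	}
	return m
}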
internal/api/handler.go (2)

22-28: Normalize new request fields alongside existing trims.
Keeps PluginEndpoint / PluginAPIKey consistent with other request fields in case they’re used elsewhere later.

♻️ Suggested change
 req.Version = strings.TrimSpace(req.Version)
 req.RequestedBy = strings.TrimSpace(req.RequestedBy)
+req.PluginEndpoint = strings.TrimSpace(req.PluginEndpoint)
+req.PluginAPIKey = strings.TrimSpace(req.PluginAPIKey)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler.go` around lines 22 - 28, The new fields PluginEndpoint
and PluginAPIKey on CreateTestRunRequest must be normalized (trimmed) like the
other request fields; find where the request is parsed/validated (code handling
CreateTestRunRequest, e.g., the CreateTestRun handler or a normalize/validate
helper) and apply the same strings.TrimSpace normalization to PluginEndpoint and
PluginAPIKey as is done for PluginID, ProposalID, Version, and RequestedBy so
they remain consistent if used elsewhere.
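
One way to consolidate those trims is a single normalize helper, sketched here with a stand-in struct whose fields mirror the ones named in the diffs:

package api

import "strings"

// CreateTestRunRequest is a stand-in for the handler's real request type.
type CreateTestRunRequest struct {
	PluginID, ProposalID, Version, RequestedBy string
	PluginEndpoint, PluginAPIKey               string
}

// normalize trims every user-supplied field in one place so later code
// never needs to re-trim.
func (r *CreateTestRunRequest) normalize() {
	for _, f := range []*string{
		&r.PluginID, &r.ProposalID, &r.Version,
		&r.RequestedBy, &r.PluginEndpoint, &r.PluginAPIKey,
	} {
		*f = strings.TrimSpace(*f)
	}
}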

188-209: Consider case-insensitive status filtering for client ergonomics.
Allowing queued / Queued avoids unnecessary 400s without changing behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler.go` around lines 188 - 209, The status filter in
buildFilterParams currently rejects values with different casing because it
looks up the raw query string in validStatuses; normalize the incoming status
(e.g., v := strings.ToUpper(strings.TrimSpace(c.QueryParam("status")))) before
validating against validStatuses and before constructing
queries.NullTestRunStatus/queries.TestRunStatus so inputs like "queued" or
"Queued" are accepted while preserving the same stored enum values; update the
validation and the TestRunStatus assignment to use the normalized uppercase
value.
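
A sketch of the normalization step; the concrete status spellings below are assumptions, since the real enum lives in the sqlc-generated queries package:

package api

import (
	"fmt"
	"strings"
)

// validStatuses uses the uppercase enum spellings (assumed here).
var validStatuses = map[string]bool{
	"QUEUED": true, "RUNNING": true, "COMPLETED": true, "FAILED": true,
}

// normalizeStatus accepts "queued", "Queued", etc., returning the
// canonical uppercase value or an error for unknown statuses.
func normalizeStatus(raw string) (string, error) {
	v := strings.ToUpper(strings.TrimSpace(raw))
	if v == "" {
		return "", nil // no filter supplied
	}
	if !validStatuses[v] {
		return "", fmt.Errorf("invalid status: %s", v)
	}
	return v, nil
}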
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/api/handler.go`:
- Around line 72-80: The payload currently stores PluginAPIKey in plain text on
TestRunPayload.PluginAPIKey before queuing; remove sensitive data from the
queued payload or encrypt it before marshaling. Fix by (A) deleting PluginAPIKey
from queue payload construction in the handler that builds TestRunPayload and
instead store the key in a secure store (e.g., secrets table or vault) keyed by
RunID/PluginID, then have the consumer lookup the secret after dequeuing; or (B)
if you must include it, encrypt TestRunPayload.PluginAPIKey with a service
encryption key prior to enqueueing and decrypt in the consumer, ensuring
trimming/validation (strings.TrimSpace) happens before encryption and never
logging the decrypted value. Ensure changes touch the code paths that build and
enqueue TestRunPayload and the consumer path that processes it so the consumer
can retrieve/decrypt the secret by RunID/PluginID.
- Around line 151-155: The offset parsed via strconv.Atoi is cast to int32 and
can overflow; add const maxQueryOffset = 1<<31 - 1 and validate the parsed
offset before casting: if parsed > maxQueryOffset return a 400 error (bad
request) saying offset too large. Apply this check in both places where offset
is parsed and cast to int32 (the code that sets QueryOffset and the other
occurrence in results.go), ensuring you reject oversized offsets instead of
allowing the int32 cast to wrap.

---

Nitpick comments:
In `@internal/api/handler.go`:
- Around line 22-28: The new fields PluginEndpoint and PluginAPIKey on
CreateTestRunRequest must be normalized (trimmed) like the other request fields;
find where the request is parsed/validated (code handling CreateTestRunRequest,
e.g., the CreateTestRun handler or a normalize/validate helper) and apply the
same strings.TrimSpace normalization to PluginEndpoint and PluginAPIKey as is
done for PluginID, ProposalID, Version, and RequestedBy so they remain
consistent if used elsewhere.
- Around line 188-209: The status filter in buildFilterParams currently rejects
values with different casing because it looks up the raw query string in
validStatuses; normalize the incoming status (e.g., v :=
strings.ToUpper(strings.TrimSpace(c.QueryParam("status")))) before validating
against validStatuses and before constructing
queries.NullTestRunStatus/queries.TestRunStatus so inputs like "queued" or
"Queued" are accepted while preserving the same stored enum values; update the
validation and the TestRunStatus assignment to use the normalized uppercase
value.

In `@internal/worker/manifests_test.go`:
- Around line 161-170: TestSmokeJob relies on an undocumented TTL-zero sentinel
and omits container assertions: introduce a named constant (e.g., noTTL = 0) and
use it when calling smokeJob("ns", "testrunner:v1", "my-secret", nil, envVars,
noTTL, nil) or add a brief comment near the call to document that 0 means
"disabled" for the test, then update assertions around TTLSecondsAfterFinished
on the returned job.Spec to reflect the intended "disabled" contract;
additionally, extend the test assertions for the first container
(job.Spec.Template.Spec.Containers[0]) to check its Name equals "testrunner",
Image equals "testrunner:v1", and that Env (c.Env) is non-empty to match the
coverage of TestSeederJob.
- Around line 172-187: The test TestTestrunnerEnvVars currently only checks that
POSTGRES_DSN and MINIO_ENDPOINT keys exist; update it to also assert their
expected values by reading envMap (built from testrunnerEnvVars) — e.g., assert
that envMap["MINIO_ENDPOINT"] equals "http://minio:9000" and that
envMap["POSTGRES_DSN"] targets the in-cluster postgres host (assert.Contains t,
envMap["POSTGRES_DSN"], "postgres:5432" or assert.Contains t,
envMap["POSTGRES_DSN"], "@postgres") so the manifest builder wiring is
validated.
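
Two of the prompts above benefit from concrete sketches. For the plaintext PluginAPIKey finding, option (B) could look like this; AES-GCM is one reasonable choice, and everything here, key sourcing included, is an assumption rather than the project's actual crypto layer:

package queue

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
)

// encryptAPIKey seals the plugin API key with AES-GCM before the payload
// is marshaled and enqueued; the consumer decrypts with the same service
// key. Key management is out of scope for this sketch.
func encryptAPIKey(serviceKey []byte, apiKey string) (string, error) {
	block, err := aes.NewCipher(serviceKey) // serviceKey must be 16/24/32 bytes
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, []byte(apiKey), nil) // nonce prefixes ciphertext
	return base64.StdEncoding.EncodeToString(sealed), nil
}

For the offset-overflow finding, the bounds check reduces to a few lines; the constant and function names are illustrative:

package api

import (
	"fmt"
	"strconv"
)

const maxQueryOffset = 1<<31 - 1 // math.MaxInt32

// parseOffset rejects negative or oversized values instead of letting the
// later int32 cast wrap around.
func parseOffset(raw string) (int32, error) {
	if raw == "" {
		return 0, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid offset: %q", raw)
	}
	if n > maxQueryOffset {
		return 0, fmt.Errorf("offset too large (max %d)", maxQueryOffset)
	}
	return int32(n), nil
}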

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30141f0 and 3b2397f.

📒 Files selected for processing (6)
  • internal/api/handler.go
  • internal/testrunner/jwt.go
  • internal/testrunner/schema/schema.sql
  • internal/testrunner/tests.go
  • internal/worker/manifests_test.go
  • internal/worker/naming_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/testrunner/jwt.go
  • internal/testrunner/tests.go
  • internal/testrunner/schema/schema.sql

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
internal/api/handler.go (2)

189-210: Consider making status filter case-insensitive for better UX.
Normalizing to uppercase before validation would let clients pass queued, Queued, etc., without 400s.

Suggested tweak
 	if v := strings.TrimSpace(c.QueryParam("status")); v != "" {
+		v = strings.ToUpper(v)
 		if !validStatuses[v] {
 			return fp, fmt.Errorf("invalid status: %s", v)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler.go` around lines 189 - 210, The status filter is
case-sensitive and rejects values like "queued" or "Queued"; update
buildFilterParams to normalize the status query param to a canonical form before
validating and assigning it: trim and convert the value to strings.ToUpper(v)
(or equivalent) prior to checking against validStatuses and prior to setting
queries.NullTestRunStatus/TestRunStatus so the stored value uses the uppercase
constant; ensure you reference validStatuses, buildFilterParams,
queries.NullTestRunStatus and queries.TestRunStatus when making the change.

73-81: Trim plugin fields alongside other request fields for consistency.
Right now PluginEndpoint/PluginAPIKey are trimmed only at payload construction. Consider trimming them with the rest of the request to keep req consistently sanitized and avoid double TrimSpace.

Suggested refactor
@@
 	req.PluginID = strings.TrimSpace(req.PluginID)
 	req.ProposalID = strings.TrimSpace(req.ProposalID)
 	req.Version = strings.TrimSpace(req.Version)
 	req.RequestedBy = strings.TrimSpace(req.RequestedBy)
+	req.PluginEndpoint = strings.TrimSpace(req.PluginEndpoint)
+	req.PluginAPIKey = strings.TrimSpace(req.PluginAPIKey)
@@
 	payload := queue.TestRunPayload{
 		RunID:          result.ID.String(),
 		PluginID:       req.PluginID,
 		ProposalID:     req.ProposalID,
 		Version:        req.Version,
 		Suite:          defaultSuite,
 		RequestedBy:    req.RequestedBy,
-		PluginEndpoint: strings.TrimSpace(req.PluginEndpoint),
-		PluginAPIKey:   strings.TrimSpace(req.PluginAPIKey),
+		PluginEndpoint: req.PluginEndpoint,
+		PluginAPIKey:   req.PluginAPIKey,
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/api/handler.go` around lines 73 - 81, Trim PluginEndpoint and
PluginAPIKey on the incoming request object instead of only when building the
payload: locate where the request variable req is unmarshalled/validated (the
handler that constructs queue.TestRunPayload) and apply strings.TrimSpace to
req.PluginEndpoint and req.PluginAPIKey early (alongside the other sanitized
fields) so req is consistently sanitized; then remove the redundant TrimSpace
calls in the queue.TestRunPayload construction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@internal/api/handler.go`:
- Around line 189-210: The status filter is case-sensitive and rejects values
like "queued" or "Queued"; update buildFilterParams to normalize the status
query param to a canonical form before validating and assigning it: trim and
convert the value to strings.ToUpper(v) (or equivalent) prior to checking
against validStatuses and prior to setting
queries.NullTestRunStatus/TestRunStatus so the stored value uses the uppercase
constant; ensure you reference validStatuses, buildFilterParams,
queries.NullTestRunStatus and queries.TestRunStatus when making the change.
- Around line 73-81: Trim PluginEndpoint and PluginAPIKey on the incoming
request object instead of only when building the payload: locate where the
request variable req is unmarshalled/validated (the handler that constructs
queue.TestRunPayload) and apply strings.TrimSpace to req.PluginEndpoint and
req.PluginAPIKey early (alongside the other sanitized fields) so req is
consistently sanitized; then remove the redundant TrimSpace calls in the
queue.TestRunPayload construction.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b2397f and d40a3a6.

📒 Files selected for processing (2)
  • internal/api/handler.go
  • internal/api/results.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/api/results.go

@webpiratt webpiratt merged commit 6c33f8a into main Feb 25, 2026
1 check passed