Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 4 minutes and 8 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (37)
📝 WalkthroughWalkthrough实现rginx控制面的完整第0到9阶段系统,包含Axum Web API服务、PostgreSQL/Dragonfly状态存储层、多服务业务逻辑、Postgres数据库迁移、Docker容器化部署、Vue 3前端console应用、边缘节点代理程序、以及架构文档与配置规范。 Changes
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces a new “control plane platform” including backend services (API/worker/store/types), a node agent, a Vue-based console, Docker/Compose delivery consolidation, and Postgres bootstrap migrations/docs.
Changes:
- Added control-plane Rust crates (API/service/store/worker/types) plus node agent and shared domain/types.
- Added Vue 3 console (routing/views/composables/styles) to interact with the control-plane APIs and SSE streams.
- Added Dockerfile/Compose-based delivery + Postgres bootstrap migrations and operational/ADR documentation.
Reviewed changes
Copilot reviewed 104 out of 115 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| web/console/src/views/AuditView.vue | Adds audit log list/detail UI with filters and auth handling |
| web/console/src/style.css | Adds base console styling and responsive layout |
| web/console/src/router/index.ts | Adds console routes (dashboard/audit/nodes/revisions/deployments) |
| web/console/src/main.ts | Boots Vue app with router and global styles |
| web/console/src/lib/display.ts | Adds formatting helpers (dates/lists/bools/stream labels) |
| web/console/src/composables/useNodeDetailStream.ts | Adds SSE-driven node detail streaming composable |
| web/console/src/components/MetricCard.vue | Adds reusable metric display component |
| web/console/src/App.vue | Adds root RouterView host |
| web/console/package.json | Adds console package scripts and dependencies |
| web/console/index.html | Adds console HTML entry |
| scripts/run-nginx-compare-docker.sh | Updates nginx-compare build to use root Dockerfile target |
| migrations/postgres/0009_control_plane_phase8_deployments.sql | Adds deployment orchestration tables/columns/indexes |
| migrations/postgres/0008_control_plane_phase7_revisions.sql | Adds revision/draft metadata and drafts table |
| migrations/postgres/0007_control_plane_phase6_snapshots.sql | Adds node snapshot persistence |
| migrations/postgres/0006_control_plane_phase5_nodes.sql | Extends nodes table with runtime/status fields |
| migrations/postgres/0005_control_plane_auth_seed.sql | Seeds local users/roles and auth bootstrap audit logs |
| migrations/postgres/0004_control_plane_auth_schema.sql | Adds auth schema and request_id to audit logs |
| migrations/postgres/0003_control_plane_seed.sql | Seeds minimal cluster/node/revision/deployment/audit data |
| migrations/postgres/0002_control_plane_phase3_schema.sql | Adds heartbeats and audit log tables |
| migrations/postgres/0001_control_plane_bootstrap.sql | Adds base cluster/node/revision/deployment tables and indexes |
| migrations/README.md | Documents migration phases and initdb ordering |
| docker/nginx-compare/Dockerfile | Removes standalone Dockerfile in favor of root multi-target Dockerfile |
| docker/control-plane/rginx-control-entrypoint.sh | Adds unified entrypoint to run api/worker from one image |
| docker/README.md | Documents unified Docker/Compose approach and operational conventions |
| deploy/control-plane/systemd/rginx-node-agent.service | Adds systemd unit for node agent on edge nodes |
| deploy/control-plane/README.md | Clarifies deploy assets scope (edge node references only) |
| crates/rginx-node-agent/src/main.rs | Adds node agent binary entrypoint |
| crates/rginx-node-agent/src/config.rs | Adds node agent env-based configuration |
| crates/rginx-node-agent/Cargo.toml | Adds node agent crate manifest |
| crates/rginx-control-worker/src/worker.rs | Adds worker loop (tick report + ctrl-c shutdown) |
| crates/rginx-control-worker/src/main.rs | Adds worker binary entrypoint wiring store/services |
| crates/rginx-control-worker/src/config.rs | Adds worker env-based configuration |
| crates/rginx-control-worker/Cargo.toml | Adds worker crate manifest |
| crates/rginx-control-types/src/revisions.rs | Adds revision/draft/diff DTOs and enums |
| crates/rginx-control-types/src/nodes.rs | Adds node DTOs and lifecycle enum |
| crates/rginx-control-types/src/meta.rs | Adds meta DTOs and API version constant |
| crates/rginx-control-types/src/lib.rs | Exposes control-plane DTO modules |
| crates/rginx-control-types/src/events.rs | Adds SSE event DTOs |
| crates/rginx-control-types/src/deployments.rs | Adds deployment/task DTOs and enums |
| crates/rginx-control-types/src/dashboard.rs | Adds dashboard summary DTO |
| crates/rginx-control-types/src/auth.rs | Adds auth DTOs and role enum |
| crates/rginx-control-types/src/audit.rs | Adds audit DTOs |
| crates/rginx-control-types/src/alerts.rs | Adds alert DTOs and severity enum |
| crates/rginx-control-types/Cargo.toml | Adds shared types crate manifest |
| crates/rginx-control-store/src/lib.rs | Exposes store config/repositories/deployment types |
| crates/rginx-control-store/src/dragonfly.rs | Adds Dragonfly keyspace helper + tests |
| crates/rginx-control-store/src/config.rs | Adds env-based Postgres/Dragonfly config |
| crates/rginx-control-store/Cargo.toml | Adds store crate manifest |
| crates/rginx-control-service/src/worker.rs | Adds worker service tick aggregation/reporting |
| crates/rginx-control-service/src/nodes.rs | Adds node service (agent register/heartbeat/snapshot/detail/reconcile) |
| crates/rginx-control-service/src/metrics.rs | Adds Prometheus text exposition rendering |
| crates/rginx-control-service/src/meta.rs | Adds meta service |
| crates/rginx-control-service/src/lib.rs | Wires domain services into ControlPlaneServices |
| crates/rginx-control-service/src/health.rs | Adds health service with dependency checks |
| crates/rginx-control-service/src/error.rs | Adds service error model |
| crates/rginx-control-service/src/dashboard.rs | Adds dashboard service + unit test |
| crates/rginx-control-service/src/config.rs | Adds service config and env parsing |
| crates/rginx-control-service/src/auth.rs | Adds auth service (sessions, pbkdf2, audits, RBAC) |
| crates/rginx-control-service/src/audit.rs | Adds audit service (list/get) with limit validation |
| crates/rginx-control-service/src/alerts.rs | Adds derived alerts service |
| crates/rginx-control-service/Cargo.toml | Adds service crate manifest |
| crates/rginx-control-api/src/state.rs | Adds API AppState (bind addr, agent token, ui dir, services) |
| crates/rginx-control-api/src/routes/users.rs | Adds users routes (list/create) with RBAC guard |
| crates/rginx-control-api/src/routes/revisions.rs | Adds revision/draft routes + diff/validate/publish |
| crates/rginx-control-api/src/routes/nodes.rs | Adds nodes routes (list/detail) |
| crates/rginx-control-api/src/routes/mod.rs | Adds router wiring for all API endpoints |
| crates/rginx-control-api/src/routes/metrics.rs | Adds /metrics route returning Prometheus text |
| crates/rginx-control-api/src/routes/meta.rs | Adds /api/v1/meta route |
| crates/rginx-control-api/src/routes/health.rs | Adds /healthz route |
| crates/rginx-control-api/src/routes/events.rs | Adds SSE event stream route with overview/node/deployment ticks |
| crates/rginx-control-api/src/routes/deployments.rs | Adds deployments routes (list/get/create/pause/resume) |
| crates/rginx-control-api/src/routes/dashboard.rs | Adds dashboard route |
| crates/rginx-control-api/src/routes/auth.rs | Adds auth routes (login/logout/me) |
| crates/rginx-control-api/src/routes/audit.rs | Adds audit log routes (list/get) |
| crates/rginx-control-api/src/routes/alerts.rs | Adds alerts route |
| crates/rginx-control-api/src/routes/agent.rs | Adds agent routes (register/heartbeat/tasks/snapshots) |
| crates/rginx-control-api/src/request_context.rs | Adds request context extractor (request id, idempotency, UA, remote addr) |
| crates/rginx-control-api/src/middleware.rs | Adds request logging + x-request-id response header |
| crates/rginx-control-api/src/main.rs | Adds API binary entrypoint and server startup |
| crates/rginx-control-api/src/error.rs | Adds API error envelope + ServiceError mapping |
| crates/rginx-control-api/src/config.rs | Adds API env-based configuration |
| crates/rginx-control-api/src/auth.rs | Adds auth/role guards + agent guard |
| crates/rginx-control-api/src/app.rs | Adds router build incl. console static serving fallback |
| crates/rginx-control-api/Cargo.toml | Adds API crate manifest |
| compose.yaml | Adds unified Compose topology (postgres/dragonfly/control-api/control-worker) |
| adr/README.md | Adds ADR index |
| adr/ADR-0003-node-agent-pull-model.md | Documents agent pull-based task model decision |
| adr/ADR-0002-control-plane-state-boundary.md | Documents Postgres truth + Dragonfly cache boundary |
| adr/ADR-0001-control-plane-monorepo-boundary.md | Documents monorepo boundary decision |
| README.md | Updates repo structure + control-plane quickstart/docs references |
| Dockerfile | Adds multi-stage build for control-plane + nginx-compare target |
| Cargo.toml | Pins workspace members and adds shared dependencies |
| CONTROL_PLANE_ENVIRONMENT_VARIABLES.md | Documents control-plane env var naming conventions |
| CONTROL_PLANE_DRAGONFLY_KEYSPACE.md | Documents Dragonfly keyspace conventions |
| CONTROL_PLANE_BACKUP_AND_RECOVERY.md | Documents backup and recovery procedures |
| CONTROL_PLANE_API_CONVENTIONS.md | Documents API conventions (paths/errors/envelopes/idempotency) |
| .env.example | Adds example env for Compose + agent sample vars |
Files not reviewed (1)
- web/console/package-lock.json: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (32)
ARCHITECTURE_CONTROL_PLANE_IMPLEMENTATION_PLAN.md-381-387 (1)
381-387:⚠️ Potential issue | 🟠 Major控制面容器拓扑与文档已声明的收口状态冲突。
Line 385 仍把
console列为独立容器,但 Line 156-157 已声明 Console 并入control-api镜像。建议统一为“当前基线拓扑”,避免运维按过期架构落地。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ARCHITECTURE_CONTROL_PLANE_IMPLEMENTATION_PLAN.md` around lines 381 - 387, 文档中“控制面容器”拓扑与前文已声明的基线不一致:在第156-157行已说明 console 已并入 control-api,但本节仍把 `console` 列为独立容器;请统一为当前基线拓扑,移除或合并重复条目,将列表改为仅包含 `control-api`(含 console)、`control-worker`、`postgres`、`dragonfly`,并在本节说明 console 已并入 `control-api` 镜像以避免运维按过期架构实施。CONTROL_PLANE_ENVIRONMENT_VARIABLES.md-122-157 (1)
122-157:⚠️ Potential issue | 🟠 MajorNode Agent 前缀规范与示例变量命名不一致。
Line 122 规定前缀为
RGINX_NODE_AGENT_,但 Line 143-157 示例主要是RGINX_NODE_*等命名。该不一致会直接误导配置落地,建议明确“Node 基础身份变量”和“Agent 进程变量”的双前缀边界。建议修订方向
-### 2.6 Node Agent +### 2.6 Node / Node Agent -前缀: -- `RGINX_NODE_AGENT_` +前缀: +- `RGINX_NODE_`(节点身份与运行信息) +- `RGINX_NODE_AGENT_`(agent 轮询、超时、心跳等进程参数)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CONTROL_PLANE_ENVIRONMENT_VARIABLES.md` around lines 122 - 157, 文档中存在前缀不一致:头部说明用的是 RGINX_NODE_AGENT_,但示例大多为 RGINX_NODE_*,请把“Node 基础身份变量”与“Agent 进程变量”明确区分并统一示例;具体操作是在 CONTROL_PLANE_ENVIRONMENT_VARIABLES.md 中将身份类变量保持为前缀 RGINX_NODE_(例如 RGINX_NODE_ID、RGINX_CLUSTER_ID、RGINX_NODE_ADVERTISE_ADDR、RGINX_NODE_ROLE、RGINX_NODE_RUNNING_VERSION、RGINX_NODE_LIFECYCLE_STATE、RGINX_CONTROL_PLANE_ORIGIN、RGINX_ADMIN_SOCKET、RGINX_NODE_BINARY、RGINX_NODE_CONFIG_PATH、RGINX_NODE_CONFIG_BACKUP_DIR、RGINX_NODE_CONFIG_STAGING_DIR),而把 agent 运行时/监控类变量改为或示例为 RGINX_NODE_AGENT_*(例如 RGINX_NODE_AGENT_HEARTBEAT_SECS、RGINX_NODE_AGENT_REQUEST_TIMEOUT_SECS、RGINX_NODE_AGENT_TASK_POLL_SECS),并在文档正文添加一句明确说明“RGINX_NODE_ 用于节点静态/身份信息;RGINX_NODE_AGENT_ 用于 agent 运行时/周期性配置”以消除歧义。README.md-216-221 (1)
216-221:⚠️ Potential issue | 🟠 Major默认账号与共享令牌说明缺少强警示,存在误部署风险。
Line 216-221 建议加入“仅限本地开发、上线前必须轮换”的明确提示,避免默认凭据进入可达环境。
🛡️ 建议补充文案
- `admin` / `change-me-now` - `operator` / `change-me-now` - `viewer` / `change-me-now` + +> ⚠️ 以上仅用于本地开发演示。任何可达网络环境在首次启动后都必须立即修改默认账号口令, +> 并轮换 `RGINX_CONTROL_AGENT_SHARED_TOKEN`。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 216 - 221, Add a strong, prominent warning to the README around the listed default credentials (`admin`/`change-me-now`, `operator`/`change-me-now`, `viewer`/`change-me-now`) and the `RGINX_CONTROL_AGENT_SHARED_TOKEN` note stating these are for local development only and must be rotated/replaced before any deployment to reachable/staging/production environments; place the warning immediately above or next to those lines and use clear wording like "FOR LOCAL DEVELOPMENT ONLY — MUST ROTATE BEFORE DEPLOYMENT" so operators cannot miss it.crates/rginx-control-api/src/routes/metrics.rs-20-24 (1)
20-24:⚠️ Potential issue | 🟠 Major
/metrics错误响应泄露内部异常细节。Line 23 直接返回
{error}会把内部信息暴露给调用方。建议仅记录日志,对外返回固定错误文本。🔒 建议修复
use axum::{ extract::State, http::{HeaderValue, StatusCode, header}, response::{IntoResponse, Response}, }; +use tracing::warn; @@ - Err(error) => ( - StatusCode::SERVICE_UNAVAILABLE, - [(header::CONTENT_TYPE, HeaderValue::from_static("text/plain; charset=utf-8"))], - format!("metrics collection failed: {error}"), - ) - .into_response(), + Err(error) => { + warn!(error = %error, "metrics collection failed"); + ( + StatusCode::SERVICE_UNAVAILABLE, + [(header::CONTENT_TYPE, HeaderValue::from_static("text/plain; charset=utf-8"))], + "metrics collection failed".to_string(), + ) + .into_response() + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-api/src/routes/metrics.rs` around lines 20 - 24, The handler currently returns the internal error string in the response via format!("metrics collection failed: {error}") which leaks implementation details; instead, log the actual error (e.g., with tracing::error! or the crate's logger) and change the response body to a fixed, generic message like "metrics collection failed" while keeping the same StatusCode::SERVICE_UNAVAILABLE and CONTENT_TYPE header (i.e., replace format!("metrics collection failed: {error}") with a constant string and add a logging call that records the original error).migrations/postgres/0005_control_plane_auth_seed.sql-18-45 (1)
18-45:⚠️ Potential issue | 🟠 Major不要在迁移中固化可登录默认账户的密码哈希
这里直接落库固定
password_hash且is_active=true,会把“默认账号能力”永久写进初始化链路,存在明显的安全暴露面。建议改为运行时注入(一次性 bootstrap 密钥/哈希)或首次启动强制重置口令后再激活账号。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@migrations/postgres/0005_control_plane_auth_seed.sql` around lines 18 - 45, The migration seeds hardcoded password_hashes and sets is_active=true for default accounts ('usr_local_admin','usr_local_operator','usr_local_viewer'), which leaks credentials; remove the fixed password_hash values and avoid activating accounts in the migration (e.g., insert NULL/empty for password_hash and set is_active=false or omit is_active), and update the INSERT block (the tuples for those user_ids) to leave password_hash blank/null and is_active false so account activation and password creation are performed at runtime via a secure bootstrap or first-run reset process (add a comment in the migration pointing to the runtime bootstrap flow).crates/rginx-control-api/src/routes/alerts.rs-14-22 (1)
14-22:⚠️ Potential issue | 🟠 Major返回体与 API envelope 约定不一致
这里直接返回
Vec裸数组,会让该路由与既定{"data": ...}结构脱节,前端和 SDK 会出现特例处理。建议统一为 envelope。💡 建议补丁
use axum::{Json, extract::State}; +use serde::Serialize; @@ use crate::state::AppState; + +#[derive(Serialize)] +pub struct AlertsListResponse { + pub data: Vec<ControlPlaneAlertSummary>, +} @@ -) -> ApiResult<Json<Vec<ControlPlaneAlertSummary>>> { +) -> ApiResult<Json<AlertsListResponse>> { @@ - Ok(Json(alerts)) + Ok(Json(AlertsListResponse { data: alerts })) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-api/src/routes/alerts.rs` around lines 14 - 22, The route currently returns a bare Vec<ControlPlaneAlertSummary> (Json<Vec<...>>) which breaks the API envelope convention; change the handler to return the standard envelope (e.g., Json<ApiEnvelope<Vec<ControlPlaneAlertSummary>>> or whatever project envelope type is used) and wrap the alerts value under the "data" key before Ok(Json(...)); update the return type and the Ok(Json(...)) construction, keeping the same call to state.services().alerts().list_current_alerts().await and preserving the ApiError mapping that uses request_context.request_id..env.example-27-30 (1)
27-30:⚠️ Potential issue | 🟠 Major示例中的默认密钥过弱,容易被误用于真实环境。
Line 27 和 Line 29 的占位值可预测,若被直接沿用会导致会话与 agent 鉴权强度不足。建议改成“必须替换”的哨兵值,并在文档中明确生产环境启动前校验。
💡 建议修复(示例模板)
-RGINX_CONTROL_AUTH_SESSION_SECRET=change-me-for-local-dev +RGINX_CONTROL_AUTH_SESSION_SECRET=__REQUIRED_CHANGE_ME__ @@ -RGINX_CONTROL_AGENT_SHARED_TOKEN=change-me-for-node-agent +RGINX_CONTROL_AGENT_SHARED_TOKEN=__REQUIRED_CHANGE_ME__🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.env.example around lines 27 - 30, The example .env uses weak, predictable placeholders for RGINX_CONTROL_AUTH_SESSION_SECRET and RGINX_CONTROL_AGENT_SHARED_TOKEN; replace those values with explicit sentinel strings like MUST_REPLACE_AUTH_SESSION_SECRET and MUST_REPLACE_AGENT_SHARED_TOKEN (or REPLACE_ME_...) to make it obvious they must be changed, update RGINX_CONTROL_AUTH_SESSION_TTL_SECS/RGINX_CONTROL_NODE_OFFLINE_THRESHOLD_SECS comments if needed, and add a short note in the README/docs stating that these sentinel values must be replaced before production and that the application should validate these env vars at startup (e.g., fail fast if they still equal the sentinel).migrations/postgres/0003_control_plane_seed.sql-95-103 (1)
95-103:⚠️ Potential issue | 🟠 Major种子心跳时间写死会导致环境随时间退化。
Line 95 和 Line 102 使用固定时间戳,后续任意日期初始化都会产生“过期心跳”,节点可能在启动后立即被判为离线。
💡 建议修复(改为相对时间)
- '2026-04-16T08:59:30Z'::timestamptz, + now() - interval '30 seconds', @@ - '2026-04-16T08:58:30Z'::timestamptz, + now() - interval '90 seconds',🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@migrations/postgres/0003_control_plane_seed.sql` around lines 95 - 103, The seed rows that include the tuples with 'edge-sz-01', '/run/rginx/admin.sock', 'draining' are using fixed timestamps ('2026-04-16T08:59:30Z' and '2026-04-16T08:58:30Z') which will age; replace those literal timestamp strings with dynamic expressions such as now() AT TIME ZONE 'UTC' (or current_timestamp) optionally minus a small interval (e.g., now() - interval '30 seconds') and keep the ::timestamptz cast so the inserted heartbeat times are relative and fresh on every migration.migrations/postgres/0008_control_plane_phase7_revisions.sql-10-35 (1)
10-35:⚠️ Potential issue | 🟠 Major避免把历史 revision 的
config_text全量回填成同一模板。Line 10 到 Line 35 会把所有
config_text=''的历史记录写成同一段文本;由于新列默认空串,这通常会覆盖全部既有 revision 的“源配置语义”,后续编辑/审计会失真。💡 建议修复(限制回填范围)
update cp_config_revisions set config_text = $phase7_seed_revision$ @@ $phase7_seed_revision$ -where config_text = ''; +where config_text = '' + and revision_id = 'rev_local_0001';🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@migrations/postgres/0008_control_plane_phase7_revisions.sql` around lines 10 - 35, 当前 migration 的 UPDATE 会把所有 config_text = '' 的历史记录统一回写为 $phase7_seed_revision$,需限制回填范围以免覆盖既有 revision 的源配置语义;修改 cp_config_revisions 上的语句,保留使用 $phase7_seed_revision$ 的内容但加上更具体的 WHERE 条件(例如只针对特定 revision id、特定 created_at/created_by、或只对尚不存在该 seed 的单一记录),或改为插入一条新的 seed revision(INSERT ... ON CONFLICT)而不是全表 UPDATE,从而只影响目标 seed 行而非所有空字符串记录。crates/rginx-control-worker/src/worker.rs-21-47 (1)
21-47:⚠️ Potential issue | 🟠 Major为
collect_tick_report()增加超时,避免停机信号被长时间阻塞。Line 21 当前是直接 await;一旦底层调用(reconcile_deployments、reconcile_stale_nodes、load_runtime_context 等)卡住,
ctrl_c分支(Line 46)要等该 await 返回后才有机会执行,优雅停机会失效。💡 建议修复(示例)
- match services.worker().collect_tick_report().await { + match tokio::time::timeout( + config.poll_interval, + services.worker().collect_tick_report() + ).await { + Ok(Ok(report)) => { tracing::info!( concurrency = config.concurrency, service = %report.service_name, known_nodes = report.known_nodes, active_deployments = report.active_deployments, offline_reconciled_nodes = report.offline_reconciled_nodes, dispatched_targets = report.dispatched_targets, finalized_deployments = report.finalized_deployments, rollback_deployments_created = report.rollback_deployments_created, postgres = %report.postgres_endpoint, dragonfly = %report.dragonfly_endpoint, "control worker heartbeat" ); } + Ok(Err(error)) => { - Err(error) => { tracing::warn!( concurrency = config.concurrency, error = %error, "control worker failed to collect runtime context" ); } + Err(_) => { + tracing::warn!( + concurrency = config.concurrency, + "control worker tick timed out" + ); + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-worker/src/worker.rs` around lines 21 - 47, The call to services.worker().collect_tick_report().await must be wrapped with a timeout so a stuck collect_tick_report (which calls reconcile_deployments, reconcile_stale_nodes, load_runtime_context, etc.) cannot block the ctrl_c branch; replace the direct await with tokio::time::timeout(<duration>, services.worker().collect_tick_report()).await, handle the Timeout error case by logging a warning/error (similar to the existing Err arm) and continue the loop, and ensure tokio::time::timeout is imported and the timeout duration comes from a config value (e.g., config.tick_timeout) or a defined constant; keep the existing success and Err(error) handling for the inner result.crates/rginx-control-api/src/middleware.rs-36-38 (1)
36-38:⚠️ Potential issue | 🟠 Major日志直接记录
remote_addr存在隐私合规风险。建议默认不输出原始 IP(改为哈希/截断,或仅在 debug 级别开启),以减少敏感信息暴露面。
Also applies to: 47-49
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-api/src/middleware.rs` around lines 36 - 38, The log currently prints request_context.remote_addr directly (and similarly at lines 47-49); change logging to avoid raw IPs by either hashing/truncating the value or only emitting the full remote_addr at debug level. Update the middleware logging call that uses request_context.remote_addr and request_context.user_agent (and the equivalent later logging block) to compute a redacted_remote (e.g., SHA256 or mask last octets) or conditionalize the field on log::max_level()/debug, and log that redacted_remote instead of the raw remote_addr.crates/rginx-control-worker/src/config.rs-14-23 (1)
14-23:⚠️ Potential issue | 🟠 Major“正整数”约束未真正生效,需显式拒绝 0。
当前仅做了解析,
0会被接受,但错误提示写的是正整数。建议在解析后显式校验concurrency > 0且poll_interval_secs > 0。🔧 建议修复
-use anyhow::{Context, Result}; +use anyhow::{Context, Result, ensure}; pub fn from_env() -> Result<Self> { - let concurrency = env::var("RGINX_CONTROL_WORKER_CONCURRENCY") + let concurrency: usize = env::var("RGINX_CONTROL_WORKER_CONCURRENCY") .unwrap_or_else(|_| "2".to_string()) .parse() .context("RGINX_CONTROL_WORKER_CONCURRENCY should be a positive integer")?; - let poll_interval_secs = env::var("RGINX_CONTROL_WORKER_POLL_INTERVAL_SECS") + ensure!(concurrency > 0, "RGINX_CONTROL_WORKER_CONCURRENCY should be > 0"); + + let poll_interval_secs: u64 = env::var("RGINX_CONTROL_WORKER_POLL_INTERVAL_SECS") .unwrap_or_else(|_| "5".to_string()) .parse() .context("RGINX_CONTROL_WORKER_POLL_INTERVAL_SECS should be a positive integer")?; + ensure!(poll_interval_secs > 0, "RGINX_CONTROL_WORKER_POLL_INTERVAL_SECS should be > 0"); Ok(Self { concurrency, poll_interval: Duration::from_secs(poll_interval_secs) }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-worker/src/config.rs` around lines 14 - 23, The parsing currently accepts 0 even though the error messages say "positive integer"; after parsing the values for RGINX_CONTROL_WORKER_CONCURRENCY into concurrency and RGINX_CONTROL_WORKER_POLL_INTERVAL_SECS into poll_interval_secs (in the config construction in config.rs), add explicit checks that concurrency > 0 and poll_interval_secs > 0 and return a contexted error if not (mirroring the existing .context messages), before creating Self with Duration::from_secs(poll_interval_secs); ensure the error uses the same descriptive context as the parse errors.web/console/src/views/AuditView.vue-61-66 (1)
61-66:⚠️ Potential issue | 🟠 Major未授权错误处理不一致,且状态码检测方式脆弱
getMe()的失败分支(89-94 行)缺少resetUnauthorized()重置,导致 token 失效时页面不会跳转,用户停留在错误状态;而loadAuditLogs()的失败分支(61-66 行)有此重置。另外,两处都使用
String(caught).includes("401"/"403")判定,这种字符串匹配方式脆弱且不稳定。应改为结构化的caught.response?.status === 401判定,并在getMe()的 catch 块中补充resetUnauthorized()调用。另外
limit解析(55 行)的Number.parseInt(filters.limit, 10) || 50允许负数透传,建议增加边界验证。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/console/src/views/AuditView.vue` around lines 61 - 66, The catch handling is inconsistent and brittle: in loadAuditLogs() you call resetUnauthorized() on string-matched 401/403 but getMe()'s catch lacks that, and both use String(caught).includes(...) which is fragile; change both catch blocks to check structured status via caught?.response?.status === 401 || caught?.response?.status === 403 and call resetUnauthorized() when that condition is true (for getMe() and loadAuditLogs()), otherwise set error.value = extractApiErrorMessage(caught). Also harden the limit parsing by replacing Number.parseInt(filters.limit, 10) || 50 with a bounded check that parses an integer, ensures it's a positive number within acceptable range (e.g., >0), and falls back to 50 if parsing fails or value is non-positive.crates/rginx-control-service/src/health.rs-17-25 (1)
17-25:⚠️ Potential issue | 🟠 Major健康检查范围过窄,可能误报
ok。Line 18-22 只检查了 Postgres,就直接在 Line 24 返回
ok。如果 Dragonfly 或其他关键依赖不可用,这里仍会返回健康,可能导致错误流量继续进入。建议把所有控制面关键依赖纳入同一 readiness 判定。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-service/src/health.rs` around lines 17 - 25, get_service_health currently only calls dependency_repository().ensure_postgres_ready() and returns ok even if other control-plane dependencies (e.g. Dragonfly) are down; update get_service_health to call readiness checks for all critical dependencies via dependency_repository() (e.g., ensure_postgres_ready(), ensure_dragonfly_ready(), etc.), aggregate any errors into a single ServiceError::DependencyUnavailable (including each dependency name and error), and only construct ServiceHealth { service: self.service_name.clone(), status: "ok".to_string() } when all checks succeed.crates/rginx-control-api/src/config.rs-20-21 (1)
20-21:⚠️ Potential issue | 🟠 Major共享令牌缺少“非空”校验,存在安全配置风险。
Line 20-21 仅要求环境变量存在;若值为空字符串或全空白,会进入运行态,等价于弱化鉴权密钥约束。
建议修复(diff)
use anyhow::{Context, Result}; @@ - let agent_shared_token = env::var("RGINX_CONTROL_AGENT_SHARED_TOKEN") - .context("RGINX_CONTROL_AGENT_SHARED_TOKEN is required")?; + let agent_shared_token = env::var("RGINX_CONTROL_AGENT_SHARED_TOKEN") + .context("RGINX_CONTROL_AGENT_SHARED_TOKEN is required")?; + if agent_shared_token.trim().is_empty() { + anyhow::bail!("RGINX_CONTROL_AGENT_SHARED_TOKEN must not be empty"); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-api/src/config.rs` around lines 20 - 21, The code reads RGINX_CONTROL_AGENT_SHARED_TOKEN into agent_shared_token but only checks presence, allowing empty or whitespace-only values; update the logic around the env::var("RGINX_CONTROL_AGENT_SHARED_TOKEN") call (the agent_shared_token assignment in config.rs) to validate that the returned string is non-empty after trimming (reject empty or all-whitespace), and if invalid return an error with a clear context message (e.g., using .context or mapping to a failure) so the configuration load fails rather than accepting a blank shared token.crates/rginx-node-agent/src/config.rs-9-17 (1)
9-17:⚠️ Potential issue | 🟠 Major避免在配置
Debug中泄露control_plane_agent_token配置对象派生
Debug会把 agent token 直接输出到日志,建议移除自动Debug或手写脱敏实现。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-node-agent/src/config.rs` around lines 9 - 17, The struct NodeAgentConfig derives Debug which will print sensitive control_plane_agent_token; remove the automatic #[derive(Debug)] from NodeAgentConfig (or replace it with a manual impl of std::fmt::Debug) and implement a custom Debug for NodeAgentConfig that redacts or omits control_plane_agent_token while preserving other fields when formatting, ensuring any logs or panics won't expose the token.crates/rginx-control-api/src/state.rs-7-12 (1)
7-12:⚠️ Potential issue | 🟠 Major避免在
Debug输出中暴露敏感令牌
AppState自动派生Debug会把agent_shared_token暴露到日志/错误上下文,存在密钥泄露风险。建议移除自动Debug,改为手写并对令牌脱敏。🔐 建议修复
-#[derive(Debug, Clone)] +#[derive(Clone)] pub struct AppState { api_bind_addr: SocketAddr, agent_shared_token: Arc<str>, ui_dir: Option<Arc<PathBuf>>, services: Arc<ControlPlaneServices>, } + +impl std::fmt::Debug for AppState { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.debug_struct("AppState") + .field("api_bind_addr", &self.api_bind_addr) + .field("agent_shared_token", &"<redacted>") + .field("ui_dir", &self.ui_dir) + .field("services", &self.services) + .finish() + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-api/src/state.rs` around lines 7 - 12, The AppState struct currently derives Debug and will print sensitive agent_shared_token; remove #[derive(Debug)] from AppState and implement a custom Debug for AppState that prints api_bind_addr, ui_dir, and services normally but redacts or masks agent_shared_token (e.g., show "<redacted>" or only last 4 chars) to avoid leaking the token; update any places relying on derived Debug to use the new impl for AppState.crates/rginx-node-agent/src/config.rs-31-48 (1)
31-48:⚠️ Potential issue | 🟠 Major“正整数”校验与实现不一致,当前会接受
0秒这里的 parse 仅保证是整数,没有保证
> 0。heartbeat/task_poll/request_timeout为0会带来明显可靠性风险。⏱️ 建议修复
+fn parse_positive_secs(var: &str, default: &str) -> Result<Duration> { + let raw = env::var(var).unwrap_or_else(|_| default.to_string()); + let secs: u64 = raw + .parse() + .with_context(|| format!("{var} should be a positive integer"))?; + if secs == 0 { + anyhow::bail!("{var} should be > 0"); + } + Ok(Duration::from_secs(secs)) +} + impl NodeAgentConfig { pub fn from_env() -> Result<Self> { - let heartbeat_interval = Duration::from_secs( - env::var("RGINX_NODE_AGENT_HEARTBEAT_SECS") - .unwrap_or_else(|_| "10".to_string()) - .parse() - .context("RGINX_NODE_AGENT_HEARTBEAT_SECS should be a positive integer")?, - ); - let request_timeout = Duration::from_secs( - env::var("RGINX_NODE_AGENT_REQUEST_TIMEOUT_SECS") - .unwrap_or_else(|_| "5".to_string()) - .parse() - .context("RGINX_NODE_AGENT_REQUEST_TIMEOUT_SECS should be a positive integer")?, - ); - let task_poll_interval = Duration::from_secs( - env::var("RGINX_NODE_AGENT_TASK_POLL_SECS") - .unwrap_or_else(|_| "3".to_string()) - .parse() - .context("RGINX_NODE_AGENT_TASK_POLL_SECS should be a positive integer")?, - ); + let heartbeat_interval = parse_positive_secs("RGINX_NODE_AGENT_HEARTBEAT_SECS", "10")?; + let request_timeout = parse_positive_secs("RGINX_NODE_AGENT_REQUEST_TIMEOUT_SECS", "5")?; + let task_poll_interval = parse_positive_secs("RGINX_NODE_AGENT_TASK_POLL_SECS", "3")?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-node-agent/src/config.rs` around lines 31 - 48, The current parsing for heartbeat_interval, request_timeout, and task_poll_interval only ensures the env values are integers but allows 0, which is invalid; after parsing each env var (RGINX_NODE_AGENT_HEARTBEAT_SECS, RGINX_NODE_AGENT_REQUEST_TIMEOUT_SECS, RGINX_NODE_AGENT_TASK_POLL_SECS) into a numeric type (e.g., u64 via .parse::<u64>()), add a check that the parsed value > 0 and return a clear error (use anyhow::ensure or map a custom context) if it’s 0, then call Duration::from_secs(validated_value) to build heartbeat_interval, request_timeout, and task_poll_interval; update the error messages to mention the value must be a positive non-zero integer and reference the existing variable names in the message for clarity.crates/rginx-control-store/src/config.rs-6-17 (1)
6-17:⚠️ Potential issue | 🟠 Major
Debug派生会暴露敏感字段(db_password)配置结构体包含数据库密码,直接
derive(Debug)存在日志泄漏风险。建议移除Debug派生或对敏感字段做脱敏输出。💡 建议修复
-#[derive(Debug, Clone)] +#[derive(Clone)] pub struct ControlPlaneStoreConfig { @@ } + +impl std::fmt::Debug for ControlPlaneStoreConfig { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.debug_struct("ControlPlaneStoreConfig") + .field("db_host", &self.db_host) + .field("db_port", &self.db_port) + .field("db_user", &self.db_user) + .field("db_password", &"***redacted***") + .field("db_name", &self.db_name) + .field("db_max_connections", &self.db_max_connections) + .field("dragonfly_host", &self.dragonfly_host) + .field("dragonfly_port", &self.dragonfly_port) + .field("dragonfly_key_prefix", &self.dragonfly_key_prefix) + .finish() + } +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-store/src/config.rs` around lines 6 - 17, The ControlPlaneStoreConfig struct currently derives Debug which will expose the sensitive db_password field; update the type so sensitive data isn't logged by either removing the #[derive(Debug)] from ControlPlaneStoreConfig or implementing a custom Debug for ControlPlaneStoreConfig that redacts or omits db_password (e.g., show "<redacted>" or not include it) while leaving other fields intact; ensure any uses of Debug (e.g., logging or unwraps) will compile against the new implementation and search for ControlPlaneStoreConfig and db_password to locate all affected sites.compose.yaml-94-97 (1)
94-97:⚠️ Potential issue | 🟠 Major请避免可预测的默认鉴权密钥/令牌
RGINX_CONTROL_AUTH_SESSION_SECRET和RGINX_CONTROL_AGENT_SHARED_TOKEN提供了固定默认值,误用于非本地环境时风险较高。建议改为“必填”变量并在缺失时启动失败。💡 建议修复
- RGINX_CONTROL_AUTH_SESSION_SECRET: ${RGINX_CONTROL_AUTH_SESSION_SECRET:-change-me-for-local-dev} + RGINX_CONTROL_AUTH_SESSION_SECRET: ${RGINX_CONTROL_AUTH_SESSION_SECRET:?RGINX_CONTROL_AUTH_SESSION_SECRET is required} @@ - RGINX_CONTROL_AGENT_SHARED_TOKEN: ${RGINX_CONTROL_AGENT_SHARED_TOKEN:-change-me-for-node-agent} + RGINX_CONTROL_AGENT_SHARED_TOKEN: ${RGINX_CONTROL_AGENT_SHARED_TOKEN:?RGINX_CONTROL_AGENT_SHARED_TOKEN is required}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compose.yaml` around lines 94 - 97, Make the two secrets mandatory instead of providing predictable defaults: remove the fallback defaults for RGINX_CONTROL_AUTH_SESSION_SECRET and RGINX_CONTROL_AGENT_SHARED_TOKEN and enforce failure when they are missing (e.g., use the compose variable syntax to require them or make the service entrypoint check and exit). Update references to the variables RGINX_CONTROL_AUTH_SESSION_SECRET and RGINX_CONTROL_AGENT_SHARED_TOKEN in compose.yaml (and any startup scripts or entrypoint logic) so the container fails fast with a clear error when those env vars are not set.crates/rginx-control-service/src/config.rs-59-68 (1)
59-68:⚠️ Potential issue | 🟠 Majorworker 不应静默吞掉坏配置。
for_api()对非法RGINX_CONTROL_NODE_OFFLINE_THRESHOLD_SECS会直接报错,for_worker()却悄悄回退到 30 秒。相同配置源在两个进程里语义不同,离线判定和 API 展示就可能不一致,而且很难排查。建议让for_worker()也返回Result<Self>并保持同样的校验策略。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-service/src/config.rs` around lines 59 - 68, for_worker currently silently ignores invalid RGINX_CONTROL_NODE_OFFLINE_THRESHOLD_SECS and falls back to 30s; change its signature to return Result<Self, E> (matching for_api's error type), parse the env var the same way as for_api (validate value.parse::<u64>() and return an error on failure instead of defaulting), set config.node_offline_threshold = Duration::from_secs(seconds) on success, and update any callers to handle the Result; keep the same error message/type semantics as for_api so worker and API share identical validation behavior for RGINX_CONTROL_NODE_OFFLINE_THRESHOLD_SECS.crates/rginx-control-service/src/nodes.rs-16-16 (1)
16-16:⚠️ Potential issue | 🟠 Major当前 ID 生成器不保证跨进程唯一。
audit_id是数据库主键,但这里的prefix + unix_ms + 进程内计数器只能保证“单进程内”不重复。控制面至少有 API 和 worker 两个独立进程;它们在同一毫秒内各自产生一次audit_*_1并不罕见,最终会把审计写入打成主键冲突。这里更适合改成 ULID/UUID,或者直接让数据库生成主键。Also applies to: 305-308
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-service/src/nodes.rs` at line 16, The current audit ID generation uses a process-local counter (NODE_EVENT_COUNTER) combined with a prefix + unix_ms which does not guarantee cross-process uniqueness for the audit_id primary key; replace this generator with a globally-unique ID strategy (e.g., generate a ULID or UUID via a crate such as ulid/uuid) or let the database assign the primary key instead. Locate the code that composes audit_id (the prefix + unix_ms + NODE_EVENT_COUNTER logic and its usages where audit_id is set/inserted) and change it to call a ULID/UUID generator (or remove the client-side id and use a DB-generated id) and update any types/serialization accordingly so concurrent processes cannot produce colliding audit_* values.crates/rginx-control-service/src/config.rs-45-53 (1)
45-53:⚠️ Potential issue | 🟠 Major拒绝空的 session secret。
这里现在只校验环境变量“存在”,不校验内容。若部署把
RGINX_CONTROL_AUTH_SESSION_SECRET设成空串,服务会正常启动,但会把会话签名退化成空密钥。这个配置应该在启动时直接失败,至少要拦住trim().is_empty()。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-service/src/config.rs` around lines 45 - 53, 在读取 RGINX_CONTROL_AUTH_SESSION_SECRET 并填充 ControlPlaneAuthConfig.session_secret 时,不仅要检查环境变量存在,还要对值做 trim() 并拒绝空字符串;修改从 env::var("RGINX_CONTROL_AUTH_SESSION_SECRET") 获取的流程以去掉前后空白并在结果为空时返回带有上下文的错误(或使用 .context(...)?),确保任何只含空白的值会导致启动失败,避免将空密钥写入 session_secret。crates/rginx-node-agent/src/agent.rs-229-257 (1)
229-257:⚠️ Potential issue | 🟠 Major异步任务执行中的阻塞 I/O 和子进程调用会卡住主循环
execute_task()在异步上下文中直接调用fs::create_dir_all()、fs::write()等阻塞式文件操作,且未使用spawn_blocking包装。run_rginx_command()是同步函数,内部调用Command::new().output()阻塞执行,这会直接阻塞整个 tokio 运行时。当这些操作卡住时(如磁盘 I/O 缓慢、
rginx check/reload子进程挂起或 admin socket 无响应),主循环的tokio::select!会被阻塞,导致心跳(heartbeat)和任务轮询同时停掉,节点容易被误判离线。建议:
- 将
execute_task()中的文件操作迁移到tokio::fs或使用tokio::task::spawn_blocking- 为
run_rginx_command()中的子进程调用加timeout()- 为
read_admin_snapshot()中的 socket 操作(connect、write、read_line)加显式 timeout🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-node-agent/src/agent.rs` around lines 229 - 257, The execute_task path currently performs blocking filesystem and subprocess work directly (fs::create_dir_all, fs::write, run_rginx_check/run_rginx_reload which call run_rginx_command, etc.), so convert those blocking operations to async-safe variants: replace blocking file ops with tokio::fs equivalents or wrap them in tokio::task::spawn_blocking (refer to execute_task, backup_path/staging_path writes, and read_existing_config), and ensure any spawn_blocking calls are .await'd; change run_rginx_command to perform the Command::output() inside spawn_blocking and guard the whole call with tokio::time::timeout so check/reload calls (run_rginx_check, run_rginx_reload) return a timed-out error instead of blocking the runtime; similarly add explicit tokio::time::timeout wrappers around socket connect/read/write in read_admin_snapshot (or use tokio's timeout-able connect/read APIs) so admin socket ops never block the main loop; update error handling in execute_task (restore_previous_config/promote_staging_to_live paths) to handle timeout errors consistently.migrations/postgres/0002_control_plane_phase3_schema.sql-21-31 (1)
21-31:⚠️ Potential issue | 🟠 Major确认
cp_audit_logs是否漏了request_id列。这张表的定义里没有
request_id,但新服务代码在多处都在构造NewAuditLogEntry { request_id, ... }。如果仓储层确实把它写入数据库,这里会在第一条审计日志落库时直接炸掉;如果没写,那请求关联信息也会被静默丢掉。建议让 schema 和仓储结构显式对齐。#!/bin/bash set -euo pipefail echo "== NewAuditLogEntry 定义 ==" rg -n -C3 'struct NewAuditLogEntry|request_id' crates/rginx-control-store echo echo "== cp_audit_logs 的插入/查询语句 ==" rg -n -C3 'cp_audit_logs|request_id' crates/rginx-control-store echo echo "== 迁移里对 cp_audit_logs 的定义 ==" rg -n -C3 'create table if not exists cp_audit_logs|request_id' migrations/postgres🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@migrations/postgres/0002_control_plane_phase3_schema.sql` around lines 21 - 31, The cp_audit_logs table is missing the request_id column but the Rust type NewAuditLogEntry includes request_id, so add a request_id column to the cp_audit_logs migration definition (e.g., request_id text NULL) so the schema matches the repository model; update the INSERT/SELECT SQL in the control-store code to include request_id where necessary and ensure any nullable/NOT NULL choice matches NewAuditLogEntry (or make the struct optional) so writes/reads don’t fail.crates/rginx-control-service/src/auth.rs-79-100 (1)
79-100:⚠️ Potential issue | 🟠 Major不要让失败审计把“无效凭证”变成 500。
这两条无效登录分支都用了
record_failed_login(...).await?。一旦审计写入暂时失败,接口就会返回Internal,把认证可用性绑死在审计存储上。失败登录审计更适合作为 best-effort,审计失败后仍应返回InvalidCredentials。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-service/src/auth.rs` around lines 79 - 100, The failed-login audit call record_failed_login(...) must be best-effort and must not convert audit write failures into a 500: remove the use of the fallible .await? in both branches (the "unknown user or inactive account" and the "password mismatch" paths) and instead call record_failed_login(...).await and ignore or log any error (e.g. match the Result and log the Err) so the function still returns ServiceError::InvalidCredentials from those branches; keep verify_password, stored_user, request_id, user_agent, remote_addr and ServiceError::InvalidCredentials unchanged except for not propagating audit errors.crates/rginx-control-service/src/revisions.rs-464-468 (1)
464-468:⚠️ Potential issue | 🟠 Major这个 ID 生成器在多实例/重启后不能保证唯一。
unix_time_ms + 进程内 AtomicU64只在单进程内有序。控制面一旦横向扩容,或者进程在同一毫秒内重启,draft_/rev_/audit_就可能撞 ID,直接打到主键或唯一键冲突。这里需要换成数据库序列,或真正的全局唯一 ID。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-service/src/revisions.rs` around lines 464 - 468, The current generate_id method (generate_id, REVISION_EVENT_COUNTER, unix_time_ms, SystemTime::now()) relies on an in-process AtomicU64 and timestamp and can collide across multiple instances or restarts; replace this with a globally-unique ID strategy: either fetch an ID from a database sequence when creating revisions (use the DB sequence/serial nextval in the code path that persists the row) or generate a collision-free identifier such as a UUID (e.g., uuid::Uuid::new_v4 or a time-ordered UUID/ULID) instead of unix_time_ms+REVISION_EVENT_COUNTER; update all callers that expect the old string format to accept the new ID shape and ensure any uniqueness constraints use the DB-backed ID or the new UUID values.crates/rginx-control-store/src/deployments.rs-399-417 (1)
399-417:⚠️ Potential issue | 🟠 Majordispatch 事务里没有确认部署仍处于
running。即使外层 reconcile 读到的是
running,操作员也可以在它进入这个事务前先把部署暂停。这里锁住行后没有读取/校验status,仍然会继续派发新任务,导致“已暂停”部署继续扩散。需要在同一把锁下读取并拒绝非running状态。Also applies to: 446-555
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-store/src/deployments.rs` around lines 399 - 417, The dispatch transaction locks the cp_deployments row but never checks the deployment status, so a deployment paused before entering this transaction can still be dispatched; update the query in the dispatch path (the sqlx::query that binds deployment_id and yields let Some(deployment)) to also select the status column from cp_deployments and, while holding the same lock (using the existing transaction and fetched row), verify deployment.status == "running" (or your enum/constant for running) and return early (rolling back the transaction) for any other status; apply the same check to the other dispatch block referenced (lines ~446-555) to ensure non-running deployments are rejected under the same lock.crates/rginx-control-service/src/auth.rs-221-269 (1)
221-269:⚠️ Potential issue | 🟠 Major用户名存在性检查和创建之间有竞态窗口。
这里先查
find_user_credentials_by_username,再执行创建,不是原子的。并发创建同一用户名时,两个请求都可能通过检查;随后只能依赖底层唯一约束兜底,而当前路径又会把那类错误映射成Internal。建议直接依赖数据库唯一约束并把重复键转换成Conflict,或者把检查和创建合并到仓储层原子完成。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-service/src/auth.rs` around lines 221 - 269, The current flow uses auth_repository().find_user_credentials_by_username(...) to pre-check existence before calling create_local_user_with_audit(...), creating a race condition and mapping DB duplicate-key failures to ServiceError::Internal; remove the non-atomic pre-check or move the logic into the repository so creation is atomic — specifically either (A) eliminate the find_user_credentials_by_username(...) call and call create_local_user_with_audit(...) directly, then detect a duplicate-key/database-constraint error from the repository and map it to ServiceError::Conflict (instead of ServiceError::Internal), or (B) add a new repository method (e.g., create_local_user_if_not_exists) that performs the check-and-insert atomically and returns a clear duplicate error the service maps to ServiceError::Conflict; update the error mapping around create_local_user_with_audit/new repo method accordingly.crates/rginx-control-store/src/deployments.rs-621-645 (1)
621-645:⚠️ Potential issue | 🟠 Major暂停/恢复是无条件 UPDATE,会把终态部署改回 running/paused。
服务层的状态检查发生在另一个往返里;如果 worker 在这之后刚把部署标记为
succeeded/failed,这里仍然会直接覆盖状态。建议把允许转换的前置条件放进 SQLWHERE,并根据rows_affected决定是否返回冲突。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-store/src/deployments.rs` around lines 621 - 645, The update in set_deployment_paused unconditionally flips cp_deployments.status and can overwrite terminal states; change the SQL in set_deployment_paused to include a WHERE clause that only allows updates when status is in non-terminal states (e.g. status NOT IN ('succeeded','failed','terminated') or status = any(allowed)), then after .execute(self.store.postgres()).await check the returned rows_affected and if it is 0 return a conflict-style error (instead of silently continuing) so callers get a clear concurrency/invalid-transition result; reference function set_deployment_paused, table cp_deployments, and columns status/status_reason when making this change.crates/rginx-control-store/src/deployments.rs-139-171 (1)
139-171:⚠️ Potential issue | 🟠 Major这里会把重试过的 target 重复返回。
dispatch_next_targets每次重试都会往cp_agent_tasks追加新行,但这里用at.target_id = t.target_id做左连接,会把同一个 target 的历史任务全部连出来。只要一个 target 重试过一次,load_deployment_detail就会返回重复 target,甚至混入旧 task 的状态。这里应该只关联t.task_id指向的当前任务,或者显式选最新一条任务记录。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/rginx-control-store/src/deployments.rs` around lines 139 - 171, 查询把历史任务也连出来导致重复 target 返回;在 load_deployment_detail 的 SQL 中不要用 at.target_id = t.target_id 单纯左连接 cp_agent_tasks,而应改为只关联当前任务(用 t.task_id 指向的任务)或显式选每个 target 的最新任务。具体修复:在该查询(map_deployment_target_row 被调用处)把 left join cp_agent_tasks at 的 join 条件改为基于 t.task_id(例如 at.task_id = t.task_id)或改为一个子查询/聚合(按 target_id 取 max(task_id) 或最新时间戳)再 join,以确保每个 t.target_id 只返回一条当前任务记录;参考符号:dispatch_next_targets、load_deployment_detail、cp_agent_tasks、t.task_id、at.target_id、map_deployment_target_row。web/console/src/api/controlPlane.ts-995-1009 (1)
995-1009:⚠️ Potential issue | 🟠 Major不要把长期认证 token 放进 SSE URL 查询串。
access_token会进入浏览器历史、代理日志、访问日志,以及很多默认的观测链路;这类泄漏通常比请求头更难收敛。SSE 如果不能直接带Authorization,建议改成 cookie,或先换取一个短时、一次性的 stream token。🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/console/src/api/controlPlane.ts` around lines 995 - 1009, The function buildEventsUrl currently reads a long-lived token via getStoredAuthToken and injects it as access_token in the SSE URL query, which leaks credentials; instead stop placing the long-lived token in the query: remove writing access_token from buildEventsUrl and either rely on an Authorization header / secure SameSite HttpOnly cookie for SSE auth or fetch a short-lived stream token from the control-plane (e.g., a dedicated endpoint) and only append that ephemeral token to the URL. Update buildEventsUrl (and its callers) to use the cookie or to call the stream-token endpoint (using getStoredAuthToken for the initial auth header when acquiring the ephemeral token) and then build the events URL with the short-lived token or no query token at all.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 135bc1a8-4ba7-4ce0-aad1-57a2b659b488
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockweb/console/package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (113)
.env.example.gitignoreARCHITECTURE_CONTROL_PLANE_IMPLEMENTATION_PLAN.mdCONTROL_PLANE_API_CONVENTIONS.mdCONTROL_PLANE_BACKUP_AND_RECOVERY.mdCONTROL_PLANE_DRAGONFLY_KEYSPACE.mdCONTROL_PLANE_ENVIRONMENT_VARIABLES.mdCargo.tomlDockerfileREADME.mdadr/ADR-0001-control-plane-monorepo-boundary.mdadr/ADR-0002-control-plane-state-boundary.mdadr/ADR-0003-node-agent-pull-model.mdadr/README.mdcompose.yamlcrates/rginx-control-api/Cargo.tomlcrates/rginx-control-api/src/app.rscrates/rginx-control-api/src/auth.rscrates/rginx-control-api/src/config.rscrates/rginx-control-api/src/error.rscrates/rginx-control-api/src/main.rscrates/rginx-control-api/src/middleware.rscrates/rginx-control-api/src/request_context.rscrates/rginx-control-api/src/routes/agent.rscrates/rginx-control-api/src/routes/alerts.rscrates/rginx-control-api/src/routes/audit.rscrates/rginx-control-api/src/routes/auth.rscrates/rginx-control-api/src/routes/dashboard.rscrates/rginx-control-api/src/routes/deployments.rscrates/rginx-control-api/src/routes/events.rscrates/rginx-control-api/src/routes/health.rscrates/rginx-control-api/src/routes/meta.rscrates/rginx-control-api/src/routes/metrics.rscrates/rginx-control-api/src/routes/mod.rscrates/rginx-control-api/src/routes/nodes.rscrates/rginx-control-api/src/routes/revisions.rscrates/rginx-control-api/src/routes/users.rscrates/rginx-control-api/src/state.rscrates/rginx-control-service/Cargo.tomlcrates/rginx-control-service/src/alerts.rscrates/rginx-control-service/src/audit.rscrates/rginx-control-service/src/auth.rscrates/rginx-control-service/src/config.rscrates/rginx-control-service/src/dashboard.rscrates/rginx-control-service/src/deployments.rscrates/rginx-control-service/src/error.rscrates/rginx-control-service/src/health.rscrates/rginx-control-service/src/lib.rscrates/rginx-control-service/src/meta.rscrates/rginx-control-service/src/metrics.rscrates/rginx-control-service/src/nodes.rscrates/rginx-control-service/src/revisions.rscrates/rginx-control-service/src/worker.rscrates/rginx-control-store/Cargo.tomlcrates/rginx-control-store/src/config.rscrates/rginx-control-store/src/deployments.rscrates/rginx-control-store/src/dragonfly.rscrates/rginx-control-store/src/lib.rscrates/rginx-control-store/src/repositories.rscrates/rginx-control-types/Cargo.tomlcrates/rginx-control-types/src/alerts.rscrates/rginx-control-types/src/audit.rscrates/rginx-control-types/src/auth.rscrates/rginx-control-types/src/dashboard.rscrates/rginx-control-types/src/deployments.rscrates/rginx-control-types/src/events.rscrates/rginx-control-types/src/lib.rscrates/rginx-control-types/src/meta.rscrates/rginx-control-types/src/nodes.rscrates/rginx-control-types/src/revisions.rscrates/rginx-control-worker/Cargo.tomlcrates/rginx-control-worker/src/config.rscrates/rginx-control-worker/src/main.rscrates/rginx-control-worker/src/worker.rscrates/rginx-node-agent/Cargo.tomlcrates/rginx-node-agent/src/agent.rscrates/rginx-node-agent/src/config.rscrates/rginx-node-agent/src/main.rsdeploy/control-plane/README.mddeploy/control-plane/systemd/rginx-node-agent.servicedocker/README.mddocker/control-plane/rginx-control-entrypoint.shdocker/nginx-compare/Dockerfilemigrations/README.mdmigrations/postgres/0001_control_plane_bootstrap.sqlmigrations/postgres/0002_control_plane_phase3_schema.sqlmigrations/postgres/0003_control_plane_seed.sqlmigrations/postgres/0004_control_plane_auth_schema.sqlmigrations/postgres/0005_control_plane_auth_seed.sqlmigrations/postgres/0006_control_plane_phase5_nodes.sqlmigrations/postgres/0007_control_plane_phase6_snapshots.sqlmigrations/postgres/0008_control_plane_phase7_revisions.sqlmigrations/postgres/0009_control_plane_phase8_deployments.sqlscripts/run-nginx-compare-docker.shweb/console/index.htmlweb/console/package.jsonweb/console/src/App.vueweb/console/src/api/controlPlane.tsweb/console/src/components/MetricCard.vueweb/console/src/composables/useNodeDetailStream.tsweb/console/src/env.d.tsweb/console/src/lib/display.tsweb/console/src/main.tsweb/console/src/router/index.tsweb/console/src/style.cssweb/console/src/views/AuditView.vueweb/console/src/views/DashboardView.vueweb/console/src/views/DeploymentsView.vueweb/console/src/views/NodeDetailView.vueweb/console/src/views/NodeTlsView.vueweb/console/src/views/RevisionsView.vueweb/console/tsconfig.jsonweb/console/vite.config.ts
💤 Files with no reviewable changes (1)
- docker/nginx-compare/Dockerfile
Summary
rginx-control,由control-api/control-worker复用Verification
cargo check -p rginx-control-api -p rginx-control-worker -p rginx-node-agentnpm run buildinweb/consoledocker compose -f compose.yaml config -qdocker build --target rginx-control -t rginx-control:test .rginx-control:test后访问http://127.0.0.1:58080/返回控制台首页 HTMLpostgres时,官方 initdb 能顺序执行migrations/postgres/*.sql并通过 schema healthcheck