Container/k8s SAST lint — 4-layer codify gap (Dockerfile + Helm template + Helm values + k8s manifest) #448

@vencil

Context

v2.8.0 Phase .a Track 2 (Policy-as-Code) shipped 7 new lints, but all of them target the Python / JSX / shell layers. The Container / Kubernetes manifest layers have no codified lint coverage at all:

  • Dockerfile: build completeness + over-permissive COPY + missing .dockerignore
  • Helm template: dangerous environment variables (*ALLOW_EMPTY_PASSWORD / INSECURE_*)
  • Helm template: elevated K8s securityContext.capabilities.add
  • Helm values: hardcoded secrets (password: "..." / token: "...")
  • K8s manifest: hostNetwork: true / hostPID: true / privileged container

During the v2.8.0 closure session (2026-05-12), the owner's attempt at a local kind setup exposed this gap:

  • The Dockerfile has been silently broken since v2.7.0 (the cmd/ and internal/ subdirectories were added but the COPY rule was not updated), and no lint caught it
  • The mariadb-instance deployment gained MARIADB_ALLOW_EMPTY_ROOT_PASSWORD: "yes" to fix local startup, and no lint blocked it at commit time
  • 3 capabilities (CHOWN/SETGID/SETUID) were added; mariadb genuinely needs them, but they were never reviewed and no codified policy covers them

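Strategic shift (v2 rewrite, 2026-05-12)

Before v2.8.0, the lint approach was "DIY check_*.py, whack-a-mole style" (as the 50 existing lints attest). This ticket pilots a hybrid policy as a brand-new IaC SAST layer:

Industry tools as the engine (hadolint / kube-linter / trivy config) + a Vibe wrapper that adds project-specific rules (mandatory rationale comments / bilingual error messages / dev-rules links).

Why: IaC rules (runAsNonRoot / hostNetwork / capabilities) are industry standard (CIS-aligned). Rewriting them in Python wastes effort, misses edge cases, and makes compliance hard to demonstrate in an audit. Vibe-specific rules (e.g. "capabilities.add must carry a rationale comment") have no open-source built-in, so the wrapper supplies them.

We will not retroactively migrate the existing 50 lints: they already run and have tests, so the risk of changing them outweighs the benefit. Let natural replacement happen. Once this ticket succeeds, the hybrid policy goes into dev-rules.md and all future lints follow it.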

Pre-flight findings (discovered during the v2 rewrite)

| Issue (original prose / Gemini assumption) | Reality | Impact |
|---|---|---|
| check_helm_template_security.py "AST-scans templates" | The 4 deployment.yaml files contain 14-43 {{ }} Go template expressions each, so a pure YAML parser would crash | Dual mode: text regex scan of the source + render-then-lint (helm template output piped into kube-linter) |
| "Every Dockerfile must have a matching .dockerignore" | 6 Dockerfiles, only 1 has a .dockerignore (da-tools) | AC changed to fix-then-enforce: backfill the 5 missing files first, then turn on enforce mode |
| Gemini assumed mariadb is third-party | helm/mariadb-instance/ is an in-house Helm chart | Reframed as "capability elevation requires an explicit rationale", which applies to in-house charts too |
| "Layer 3 scans values*.yaml" | mariadb-instance keeps its secret in templates/secret.yaml; da-portal has values-tier{1,2}.yaml for multiple environments | Scope expanded: values*.yaml + templates/secret*.yaml |
| Trivy is already used, but only as informational | release.yaml line 97 comment: "Security scan (post-push, informational)"; all 4 images are scanned this way | New IaC SAST Critical findings → block PR; CVE image scan stays informational |

Acceptance criteria

AC 0: Tool engine adoption

  • Layer 1 engine: hadolint v2.12.0+ (runs containerized; no host install needed)
  • Layer 2/4 engine: kube-linter v0.7+ (Go binary, installed in the Dev Container)
  • Layer 2 supplement: trivy config mode (an extension of the existing trivy-action toolchain; zero new dependencies)
  • Vibe wrapper: scripts/tools/lint/check_iac_vibe_rules.py, which aggregates engine output after the engines run, adds Vibe-specific rules, and emits bilingual error messages
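As one illustration of the wrapper's aggregation step, here is a minimal sketch that assumes every engine's output has first been converted into a shared record. The Finding fields, the format_finding / aggregate helpers, and the dev_rules_ref link are hypothetical names for this sketch, not an existing interface of check_iac_vibe_rules.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One normalized finding, whichever engine produced it (hypothetical shape)."""

    engine: str  # "hadolint", "kube-linter", "trivy", or "vibe"
    rule: str  # engine rule ID, e.g. "DL3007" or "host-network"
    path: str  # file the finding points at
    line: int  # 1-based line number; 0 when the engine gives none
    severity: str  # "critical" | "high" | "medium" | "low"
    msg_en: str
    msg_zh: str
    dev_rules_ref: str = ""  # hypothetical anchor into dev-rules.md


def format_finding(f: Finding) -> str:
    """Render one finding as a header line plus English and Chinese message lines."""
    loc = f"{f.path}:{f.line}" if f.line else f.path
    ref = f" (see {f.dev_rules_ref})" if f.dev_rules_ref else ""
    return f"[{f.severity.upper()}] {loc} {f.engine}/{f.rule}\n  EN: {f.msg_en}{ref}\n  ZH: {f.msg_zh}"


def aggregate(*engine_results: list[Finding]) -> list[Finding]:
    """Merge per-engine lists, drop exact duplicates, and sort by file, line, engine, rule."""
    merged = {f for results in engine_results for f in results}
    return sorted(merged, key=lambda f: (f.path, f.line, f.engine, f.rule))
```

Deduplication here only drops exact repeats; findings on the same line from different engines are kept, since their engine and rule IDs differ.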

AC 1: Dockerfile (Layer 1)

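  • hadolint config .hadolint.yaml, with a rule set that includes:
    • COPY <src> must not be a bare * or . (listed here as DL3025, but hadolint's DL3025 checks JSON notation for CMD/ENTRYPOINT; hadolint has no built-in rule for bare COPY sources, so this check would have to be a Vibe wrapper rule)
    • DL3007: no :latest tag
    • DL3008 / DL3009: apt-get install must pin package versions (DL3008) and delete the apt lists afterwards (DL3009)
    • DL3018 / DL3019: apk add must pin package versions (DL3018) and use --no-cache (DL3019)
  • Vibe wrapper adds a rule: every Dockerfile must have either a HEALTHCHECK or a # rationale: K8s probes comment (distroless exceptions listed explicitly)
  • .dockerignore fix-then-enforce:
    • Backfill the 5 missing .dockerignore files (da-portal / tenant-api / threshold-exporter / e2e-bench/driver / e2e-bench/receiver)
    • Baseline content is mandatory (an empty touch is not enough): .git/, tests/, scripts/, docs/, *.md, .github/, *.log, .env*
    • The lint fails if the file is missing or the baseline is incomplete
  • pre-commit hook + CI gate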
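The two Dockerfile-side wrapper checks could look roughly like this minimal sketch. It assumes the build context is each Dockerfile's own directory (so .dockerignore sits next to it), treats .git and .git/ as equivalent, and takes the distroless exceptions as an explicit exempt set; check_dockerignore, check_healthcheck, and the exempt parameter are hypothetical names.

```python
import re
from pathlib import Path

# Baseline entries from AC 1. Trailing "/" is normalized away, so ".git" and ".git/" both count.
DOCKERIGNORE_BASELINE = (".git/", "tests/", "scripts/", "docs/", "*.md", ".github/", "*.log", ".env*")

# Dockerfile instructions are case-insensitive, so HEALTHCHECK is matched in any case.
HEALTHCHECK_RE = re.compile(r"^[ \t]*HEALTHCHECK\b", re.IGNORECASE | re.MULTILINE)
PROBE_RATIONALE_RE = re.compile(r"^[ \t]*#[ \t]*rationale:[ \t]*K8s probes", re.IGNORECASE | re.MULTILINE)


def _norm(entry: str) -> str:
    return entry.strip().rstrip("/")


def check_dockerignore(dockerfile: Path) -> list[str]:
    """Check the .dockerignore assumed to sit next to `dockerfile`; empty list = pass."""
    ignore = dockerfile.parent / ".dockerignore"
    if not ignore.is_file():
        return [f"{ignore}: missing"]
    present = {
        _norm(line)
        for line in ignore.read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [
        f"{ignore}: missing baseline entry {entry}"
        for entry in DOCKERIGNORE_BASELINE
        if _norm(entry) not in present
    ]


def check_healthcheck(dockerfile: Path, exempt: frozenset = frozenset()) -> list[str]:
    """Require HEALTHCHECK or '# rationale: K8s probes' unless the path is exempt (e.g. distroless)."""
    if str(dockerfile) in exempt:
        return []
    text = dockerfile.read_text()
    if HEALTHCHECK_RE.search(text) or PROBE_RATIONALE_RE.search(text):
        return []
    return [f"{dockerfile}: no HEALTHCHECK and no '# rationale: K8s probes' comment"]
```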

AC 2: Helm template security (Layer 2, dual mode)

  • Mode A: text regex scan on the source (catches "a dangerous pattern was written" even when it is wrapped in {{ if }}):
    • *ALLOW_EMPTY_PASSWORD*: "yes" / *ALLOW_EMPTY_*: "true" / INSECURE_*: "true"
    • Failure = ERROR with no rationale escape (these patterns should never be committed)
  • Mode B: render-then-lint (catches "dangerous structures that will actually take effect"):
    • helm template <chart> → pipe to kube-linter lint --format json
    • kube-linter built-in rules: run-as-non-root / privileged-container / host-network / host-pid / no-read-only-root-fs / unset-cpu-requirements
    • Run once per chart for values.yaml and once for each values-tier*.yaml
  • Vibe wrapper adds rules:
    • securityContext.capabilities.add: [...] must have a # rationale: <reason> comment on the same line or the line above
    • Missing rationale = ERROR; rationale present but shorter than 10 characters = WARNING
  • Severity → action mapping (shared by all layers):
    • Critical (privileged / hostNetwork / hostPID / runAsNonRoot: false / ALLOW_EMPTY_PASSWORD) → BLOCK PR
    • High (runAsUser 0 / readOnlyRootFilesystem false / missing resources) → WARNING (logged, not blocking)
    • Medium / Low → INFO
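A minimal sketch of Mode A plus the capabilities rationale rule, working on raw template text so that {{ if }} blocks hide nothing. Two assumptions are worth flagging. First, the value group accepts yes/true/1 for every name, because MARIADB_ALLOW_EMPTY_ROOT_PASSWORD: "yes" from the Context section matches neither *ALLOW_EMPTY_PASSWORD* (the name contains ROOT_) nor the "true"-only *ALLOW_EMPTY_* form. Second, the capabilities check is indentation-based rather than a real YAML parse, since templates are not valid YAML before rendering. scan_mode_a and check_capability_rationale are hypothetical names.

```python
import re

# Mode A names: anything containing ALLOW_EMPTY_ or INSECURE_ (uppercase env-var style).
_NAME = r"[A-Z0-9_]*(?:ALLOW_EMPTY_|INSECURE_)[A-Z0-9_]*"
# Truthy values, quoted or not; yes/true/1 are accepted for every name.
_TRUTHY = r"""["']?(?i:yes|true|1)\b"""
# Map form, e.g.  MARIADB_ALLOW_EMPTY_ROOT_PASSWORD: "yes"
KV_RE = re.compile(rf"^[ \t]*-?[ \t]*({_NAME})[ \t]*:[ \t]*{_TRUTHY}", re.MULTILINE)
# env-list form, e.g.  - name: FOO_INSECURE_TLS  followed by  value: "true"
PAIR_RE = re.compile(rf"name:[ \t]*[\"']?({_NAME})[\"']?[ \t]*\n[ \t]*value:[ \t]*{_TRUTHY}")
RATIONALE_RE = re.compile(r"#\s*rationale:\s*(.*)$")


def _line_of(text: str, pos: int) -> int:
    return text.count("\n", 0, pos) + 1


def scan_mode_a(text: str) -> list[tuple[int, str]]:
    """Return sorted (line, env name) hits; {{ if }} wrappers do not affect a text scan."""
    hits = {(_line_of(text, m.start()), m.group(1)) for rx in (KV_RE, PAIR_RE) for m in rx.finditer(text)}
    return sorted(hits)


def _indent(line: str) -> int:
    return len(line) - len(line.lstrip())


def _parent(lines: list[str], i: int) -> str:
    """Nearest earlier non-comment line that is less indented than lines[i]."""
    for j in range(i - 1, -1, -1):
        s = lines[j].strip()
        if s and not s.startswith("#") and _indent(lines[j]) < _indent(lines[i]):
            return s
    return ""


def check_capability_rationale(text: str) -> list[tuple[int, str]]:
    """(line, level) per capabilities.add with no rationale (ERROR) or one under 10 chars (WARNING)."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        if not re.match(r"\s*add:", line) or not _parent(lines, i).startswith("capabilities:"):
            continue
        # Rationale may sit on the same line or on the line directly above.
        m = RATIONALE_RE.search(line) or (RATIONALE_RE.search(lines[i - 1]) if i else None)
        if m is None:
            out.append((i + 1, "ERROR"))
        elif len(m.group(1).strip()) < 10:
            out.append((i + 1, "WARNING"))
    return out
```

With the AC 7 comment in place (# rationale: mariadb-server requires for file ownership directly above add:), the mariadb-instance capabilities pass this check.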

AC 3: Helm values secret-shape (Layer 3)

  • scripts/tools/lint/check_helm_values_secrets.py (Vibe wrapper; there is no corresponding open-source engine)
  • Scope: helm/*/values*.yaml + helm/*/templates/secret*.yaml
  • Blocked patterns: password: / token: / apiKey: / secret: / clientSecret: followed by a non-empty string
  • Explicit false-positive exclusions (regex allowlist):
    • \$\{[A-Z0-9_]+\}: environment variable interpolation (e.g. ${OAUTH_CLIENT_SECRET})
    • "": empty string (must-be-set marker)
    • {{ .Values.* }}: Helm template reference
    • <placeholder> / <changeme>: documentation examples
  • Does not conflict with the trufflehog L1/L2 layers of #445 (A-14: Secret scan dual-layer + release-preflight digest verification, deferred from v2.8.0): trufflehog catches high-entropy values, while this lint catches YAML shape
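The secret-shape scan might be sketched as below, assuming a line-oriented match on the five key names exactly as listed and a full match against the allowlist after stripping quotes and a trailing YAML comment. scan_secrets, iter_scope, and the helper names are hypothetical; keys such as adminPassword are deliberately not matched by this sketch.

```python
import re
from pathlib import Path

SCOPE_GLOBS = ("helm/*/values*.yaml", "helm/*/templates/secret*.yaml")
SECRET_KEY_RE = re.compile(r"^\s*(password|token|apiKey|secret|clientSecret)\s*:\s*(\S.*?)\s*$")
ALLOWLIST = (
    re.compile(r"\$\{[A-Z0-9_]+\}"),  # env interpolation, e.g. ${OAUTH_CLIENT_SECRET}
    re.compile(r""),  # "" (after unquoting): must-be-set marker
    re.compile(r"\{\{-?\s*\.Values\.[^}]*\}\}"),  # Helm template reference
    re.compile(r"<(?:placeholder|changeme)>"),  # documentation examples
)


def _unquote(value: str) -> str:
    value = re.sub(r"\s+#.*$", "", value)  # drop a trailing YAML comment
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def scan_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line, key) for every secret-shaped key whose literal value is not allowlisted."""
    hits = []
    for n, line in enumerate(text.splitlines(), start=1):
        m = SECRET_KEY_RE.match(line)
        if m and not any(rx.fullmatch(_unquote(m.group(2))) for rx in ALLOWLIST):
            hits.append((n, m.group(1)))
    return hits


def iter_scope(repo: Path):
    """Yield every file the AC 3 scope covers, pattern by pattern."""
    for pattern in SCOPE_GLOBS:
        yield from sorted(repo.glob(pattern))
```

A key with no value on its line (e.g. password: opening a nested map) is not reported, which matches the "followed by a non-empty string" wording.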

AC 4: K8s manifest security (Layer 4, kube-linter only)

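  • If the repo has raw k8s/**/*.yaml manifests (not Helm output), run kube-linter lint k8s/
  • Current grep result: the repo has no standalone raw K8s manifests (everything goes through Helm), so AC 4 is not enabled yet; leave a stub workflow plus a trigger condition for future enablement
  • Trigger condition: as soon as the first file appears under k8s/**/*.yaml, CI enables it automatically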
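The AC 4 trigger condition reduces to a glob check. A minimal sketch, assuming the CI step runs from the repository root and treats a None result as "stay in stub mode"; layer4_command is a hypothetical name, while the kube-linter lint invocation and --format json come from AC 2 and AC 4.

```python
from pathlib import Path
from typing import Optional


def layer4_command(repo: Path) -> Optional[list[str]]:
    """kube-linter invocation once raw manifests exist under k8s/, else None (stub mode)."""
    k8s = repo / "k8s"
    # "**" also matches zero directories, so k8s/ns.yaml counts as well as k8s/a/b.yaml.
    if not k8s.is_dir() or not any(k8s.glob("**/*.yaml")):
        return None
    return ["kube-linter", "lint", "--format", "json", str(k8s)]
```

Deriving enablement from the filesystem means nobody has to remember to flip a flag when the first raw manifest lands.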

AC 5: CI integration

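  • All 4 layers run in Tier 1 (fast), with a total runtime ≤ 30s (hadolint <1s × 6, kube-linter <5s × 4 charts, wrapper <5s; these per-step bounds add up to 31s, so meeting the 30s target relies on typical runs finishing below them)
  • Critical → block PR merge (GitHub branch protection rule + required check)
  • CVE image scan (existing trivy) stays informational (reason: upstream CVEs can land at any time and would block a release without warning)
  • Pre-commit hook auto-stage (Layers 1+2+3; Layer 4 once enabled)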
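Because a required status check only needs the job to fail, the severity → action mapping can end in a single exit code. A minimal sketch, assuming the engines' severities have already been normalized to critical / high / medium / low; gate, Action, and SEVERITY_TO_ACTION are hypothetical names, and treating an unknown severity as BLOCK (fail closed) is an assumption this issue does not specify.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"  # fail the required check, so the PR cannot merge
    WARN = "warn"  # logged, not blocking
    INFO = "info"


# Mapping from AC 2; severities are assumed to be normalized to these four strings.
SEVERITY_TO_ACTION = {
    "critical": Action.BLOCK,
    "high": Action.WARN,
    "medium": Action.INFO,
    "low": Action.INFO,
}


def gate(severities: list[str]) -> tuple[int, dict[Action, int]]:
    """Return (exit code, count per action); the exit code is 1 iff anything maps to BLOCK."""
    counts = {action: 0 for action in Action}
    for sev in severities:
        # Unknown severities fail closed (an assumption, not specified by the issue).
        counts[SEVERITY_TO_ACTION.get(sev.lower(), Action.BLOCK)] += 1
    return (1 if counts[Action.BLOCK] else 0), counts
```

In CI the wrapper would end with something like sys.exit(gate(all_severities)[0]), and the per-action counts feed the log summary.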

AC 6: Doc-as-Code sync (dev-rules #4)

AC 7: Baseline acceptance

  • Run all 4 layers against main HEAD → 0 Critical (required); record the number of High findings as the baseline
  • Every High finding must be listed in docs/internal/iac-lint-baseline.md with a rationale and a planned fix timeline
  • mariadb-instance's CHOWN/SETGID/SETUID need a # rationale: mariadb-server requires for file ownership comment added in the templates

Dependencies

Out of scope

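  • ❌ Retroactively migrating the existing 50 check_*.py lints: already stable, risk > reward. Let natural replacement happen
  • ❌ Runtime admission webhook (OPA Gatekeeper / Kyverno): a production-cluster responsibility, not SAST
  • ❌ Image CVE scanning policy changes: the existing trivy scan stays informational
  • ❌ Custom CIS benchmark scoring: kube-linter's built-in checks are already CIS-aligned; no need to redo them
  • ❌ Cosign / SBOM integration changes to release.yaml: already in place, out of scope for this ticket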

Sizing (v2 revision)

| AC | Estimate | Main work |
|---|---|---|
| AC 0 tool engine adoption | 0.2w | Dev Container install + version pinning + initial config |
| AC 1 Dockerfile + backfill 5 .dockerignore files + Vibe wrapper HEALTHCHECK rule | 0.4w | 5 baseline file backfills + hadolint wire-up |
| AC 2 Helm dual mode (text + render-lint) + rationale wrapper | 0.7w | Heaviest: dual-mode design + kube-linter wrapping |
| AC 3 secret-shape + OAuth ${VAR} exclusion + dual scope (values + secret templates) | 0.4w | Eliminating false positives takes the most time |
| AC 4 kube-linter wrapper (stub mode) | 0.2w | Trigger-condition design; stays a stub until enabled |
| AC 5 CI integration + branch protection | 0.2w | Required check + severity mapping |
| AC 6 Doc-as-Code sync | 0.3w | Hybrid policy into dev-rules + 5 doc locations |
| AC 7 Baseline acceptance + reaching 0 Critical | 0.3w | mariadb rationale comments + tracking High findings |
| Total | ~2.7w | Originally estimated at 1-1.5w; the actual scope after the rewrite is larger |

Why v2.9.0

References

  • v2.8.0 closure session 2026-05-12 audit findings
  • v2.8.0 Policy-as-Code 8 proposals (Phase .a Track 2 baseline)
  • Related issue: #447 Dockerfile precise allowlist (v2.8.1)
  • Companion issue: #445 Secret scan multi-layer (v2.8.1)
  • Companion epic: #449 Try-local onboarding (v2.9.0)
  • Existing lint patterns: #382 codename-leak / #387 (b) class lints
  • Pre-flight finding source: helm/*/templates/*.yaml Go template token count + .dockerignore glob (1/6 covered) + .pre-commit-config.yaml (0 IaC tool) + release.yaml trivy informational scan

Rewritten as v2 (2026-05-12) after a three-round Claude + Gemini review. Focus: a hybrid policy (open-source engine + Vibe wrapper) replacing the whack-a-mole DIY default, Helm dual-mode scanning (text + render-then-lint), severity → action mapping (Critical blocks / High warns), an explicit false-positive escape mechanism, .dockerignore fix-then-enforce with baseline content, and positioning as an independent v2.9.0 P2 epic.

Labels: P2 (Normal priority, backlog), ci (CI / build pipeline), epic (Tracks a multi-PR initiative / epic-level work item), security (Security / CVE / supply-chain)