Add rule engine Phases 4-5: engine, orchestrator, audit, config, rules by wrhalpin · Pull Request #131 · wrhalpin/GNAT

wrhalpin · 2026-04-19T21:14:32Z

Core rule evaluation pipeline and reference rules:

engine.py: AnalysisRuleEngine.evaluate() — priority-sorted rule evaluation with phase gates, transition-slot semantics, dirty-tree refusal, exception logging (never halts). Pure function: returns RuleEvaluationResult without mutating state.
orchestrator.py: RuleOrchestrator.on_hypothesis_changed() — bridges engine to InvestigationService. Audit-first pattern: writes all firing records before applying the primary decision. Confidence is never set (deliberate None).
audit.py: AuditWriter with in-memory fallback when SQLAlchemy is not available. Git SHA capture via subprocess with 5s timeout. git_file_is_clean() for dirty-tree policy enforcement.
alembic 0010: rule_firing_audit table with hypothesis/rule/time indexes.
config.ini.example: [rules] section with 7 keys, feature flag OFF.
rules/promotion/: 3 production-grade .hy rules (support-on-strong- evidence, refute-on-refuting-evidence, analyst-override).
rules/examples/: 4 example rules (evidence, reliability, temporal, AI-source check).
27 new tests (audit, engine, orchestrator), 123 total passing.

https://claude.ai/code/session_01H5UbjsuiiGya5n1eUCxoaR

Core rule evaluation pipeline and reference rules: - engine.py: AnalysisRuleEngine.evaluate() — priority-sorted rule evaluation with phase gates, transition-slot semantics, dirty-tree refusal, exception logging (never halts). Pure function: returns RuleEvaluationResult without mutating state. - orchestrator.py: RuleOrchestrator.on_hypothesis_changed() — bridges engine to InvestigationService. Audit-first pattern: writes all firing records before applying the primary decision. Confidence is never set (deliberate None). - audit.py: AuditWriter with in-memory fallback when SQLAlchemy is not available. Git SHA capture via subprocess with 5s timeout. git_file_is_clean() for dirty-tree policy enforcement. - alembic 0010: rule_firing_audit table with hypothesis/rule/time indexes. - config.ini.example: [rules] section with 7 keys, feature flag OFF. - rules/promotion/: 3 production-grade .hy rules (support-on-strong- evidence, refute-on-refuting-evidence, analyst-override). - rules/examples/: 4 example rules (evidence, reliability, temporal, AI-source check). - 27 new tests (audit, engine, orchestrator), 123 total passing. https://claude.ai/code/session_01H5UbjsuiiGya5n1eUCxoaR

Copilot

Pull request overview

Adds the core hypothesis rule-evaluation pipeline (engine + orchestrator) with auditing, configuration, DB migration, and a set of reference .hy rules, backed by new unit tests.

Changes:

Introduces AnalysisRuleEngine evaluation (phase gating, priority ordering via loader, transition-slot semantics, dirty-rule policy, non-fatal rule errors).
Adds RuleOrchestrator to run evaluation, write audit records first, and apply the primary status-transition decision via InvestigationService.
Adds AuditWriter (DB-backed when available, otherwise in-memory) plus an Alembic migration and example/promotion rule files and config sample, with new unit tests.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`gnat/analysis/rules/engine.py`	Rule evaluation loop producing `RuleEvaluationResult` firings (with git metadata + dirty-file gating).
`gnat/analysis/rules/orchestrator.py`	Bridges engine → `InvestigationService`, audit-first recording, applies primary `SetStatusDecision`.
`gnat/analysis/rules/audit.py`	Audit writer with in-memory fallback, git helpers, DB insert/update paths.
`alembic/versions/0010_add_rule_firing_audit.py`	Adds `rule_firing_audit` table and indexes.
`config/config.ini.example`	Documents `[rules]` configuration keys and default disabled behavior.
`rules/promotion/support-on-strong-evidence.hy`	Promotion rule for OPEN → SUPPORTED based on evidence/confidence/source checks.
`rules/promotion/refute-on-refuting-evidence.hy`	Refutation rule for OPEN → REFUTED when refuting evidence dominates.
`rules/promotion/analyst-override.hy`	Intended analyst override rule (currently stubbed).
`rules/examples/evidence_check.hy`	Example rule: annotate when insufficient supporting evidence.
`rules/examples/reliability_check.hy`	Example rule: block promotion when reliability below threshold.
`rules/examples/temporal_check.hy`	Example rule: mark stale hypotheses inconclusive.
`rules/examples/analyst_flag_check.hy`	Example rule file (content is AI-source check).
`tests/unit/analysis/rules/test_engine.py`	Unit tests for engine semantics (phase gate, slot consumption, exceptions, dirty rules).
`tests/unit/analysis/rules/test_orchestrator.py`	Unit tests for orchestrator audit-first and applying status transitions.
`tests/unit/analysis/rules/test_audit.py`	Unit tests for in-memory audit writer + decision serialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-19T21:20:17Z

+;; Example: AI-only source check
+;;
+;; Demonstrates the source helpers: ai-only?, has-trusted-evidence?,
+;; evidence-sources, within-ai-ceiling?.
+;;
+;; This rule blocks promotion for hypotheses where ALL evidence
+;; originates from AI connectors (ChatGPT, Copilot, etc.).
+
+(require gnat.analysis.rules.macros *)
+(import gnat.analysis.rules.helpers *)
+
+(defrule ai-source-check-example
+  :description "Block promotion when evidence is exclusively AI-sourced"


This file is named analyst_flag_check.hy, but it actually defines ai-source-check-example and its header/doc describe an AI-only source check. Renaming the file (or the rule) to match the behavior will make the examples directory easier to navigate and reduces confusion for users copying example rules.

Suggested change

;; Example: AI-only source check

;;

;; Demonstrates the source helpers: ai-only?, has-trusted-evidence?,

;; evidence-sources, within-ai-ceiling?.

;;

;; This rule blocks promotion for hypotheses where ALL evidence

;; originates from AI connectors (ChatGPT, Copilot, etc.).

(require gnat.analysis.rules.macros *)

(import gnat.analysis.rules.helpers *)

(defrule ai-source-check-example

:description "Block promotion when evidence is exclusively AI-sourced"

;; Example: Analyst flag check

;;

;; Demonstrates the source helpers: ai-only?, has-trusted-evidence?,

;; evidence-sources, within-ai-ceiling?.

;;

;; This example keeps an analyst-oriented check in place for hypotheses

;; where ALL evidence originates from AI connectors (ChatGPT, Copilot, etc.).

(require gnat.analysis.rules.macros *)

(import gnat.analysis.rules.helpers *)

(defrule analyst-flag-check-example

:description "Analyst flag check for hypotheses with exclusively AI-sourced evidence"

Copilot · 2026-04-19T21:20:17Z

+        "idx_rfa_hypothesis",
+        "rule_firing_audit",
+        ["hypothesis_id"],
+    )
+    op.create_index(
+        "idx_rfa_rule_time",
+        "rule_firing_audit",
+        ["rule_name", "fired_at"],
+    )
+    op.create_index(
+        "idx_rfa_investigation",
+        "rule_firing_audit",
+        ["investigation_id", "fired_at"],
+    )
+
+
+def downgrade() -> None:
+    op.drop_index("idx_rfa_investigation", table_name="rule_firing_audit")
+    op.drop_index("idx_rfa_rule_time", table_name="rule_firing_audit")
+    op.drop_index("idx_rfa_hypothesis", table_name="rule_firing_audit")


Index naming here (idx_rfa_*) is inconsistent with the existing Alembic migrations in this repo (which consistently use ix_<table>_<cols>). For consistency and easier DB maintenance, consider renaming these indexes to follow the established ix_... pattern.

Suggested change

"idx_rfa_hypothesis",

"rule_firing_audit",

["hypothesis_id"],

)

op.create_index(

"idx_rfa_rule_time",

"rule_firing_audit",

["rule_name", "fired_at"],

)

op.create_index(

"idx_rfa_investigation",

"rule_firing_audit",

["investigation_id", "fired_at"],

)

def downgrade() -> None:

op.drop_index("idx_rfa_investigation", table_name="rule_firing_audit")

op.drop_index("idx_rfa_rule_time", table_name="rule_firing_audit")

op.drop_index("idx_rfa_hypothesis", table_name="rule_firing_audit")

"ix_rule_firing_audit_hypothesis_id",

"rule_firing_audit",

["hypothesis_id"],

)

op.create_index(

"ix_rule_firing_audit_rule_name_fired_at",

"rule_firing_audit",

["rule_name", "fired_at"],

)

op.create_index(

"ix_rule_firing_audit_investigation_id_fired_at",

"rule_firing_audit",

["investigation_id", "fired_at"],

)

def downgrade() -> None:

op.drop_index(

"ix_rule_firing_audit_investigation_id_fired_at",

table_name="rule_firing_audit",

)

op.drop_index(

"ix_rule_firing_audit_rule_name_fired_at",

table_name="rule_firing_audit",

)

op.drop_index(

"ix_rule_firing_audit_hypothesis_id",

table_name="rule_firing_audit",

)

Copilot · 2026-04-19T21:20:18Z

+                        "applied_at = NOW(), error_message = :err "
+                        "WHERE id = :aid"
+                    ),
+                    {
+                        "app": error_message is None,


_update_db takes an applied_at argument, but the SQL always sets applied_at = NOW() and ignores the passed value. This is inconsistent with the in-memory implementation and makes testing/clock control harder; either use the provided applied_at in the UPDATE statement or drop the parameter to avoid misleading callers.

Suggested change

"applied_at = NOW(), error_message = :err "

"WHERE id = :aid"

),

{

"app": error_message is None,

"applied_at = :applied_at, error_message = :err "

"WHERE id = :aid"

),

{

"app": error_message is None,

"applied_at": applied_at,

Copilot · 2026-04-19T21:20:18Z

+def rule_file_sha(path: str | Path) -> str | None:
+    """Return the latest git commit SHA for a file, or None if dirty/unavailable."""
+    try:
+        result = subprocess.run(
+            ["git", "log", "-n", "1", "--format=%H", "--", str(path)],
+            capture_output=True,
+            text=True,
+            timeout=5,
+        )
+        if result.returncode == 0 and result.stdout.strip():
+            return result.stdout.strip()
+    except Exception:  # noqa: BLE001
+        pass
+    return None


The rule_file_sha docstring says it returns None if the file is dirty, but the implementation does not check cleanliness (it only runs git log). Either update the docstring or incorporate a dirty check (possibly reusing git_file_is_clean) so callers can rely on the documented behavior.

Copilot · 2026-04-19T21:20:18Z

+            except Exception as exc:  # noqa: BLE001
+                logger.error(
+                    "Failed to apply rule decision for hypothesis %s: %s",
+                    hypothesis_id,
+                    exc,
+                )


logger.error(...) here drops the traceback, which makes production debugging of rule-application failures much harder. Use logger.exception(...) or pass exc_info=True so the stack trace is captured in logs.

Copilot · 2026-04-19T21:20:18Z

+            if not self._policy.allow_dirty_rules and rule.source_file and not git_file_is_clean(rule.source_file):
+                logger.warning(
+                    "Skipping rule %s: source file %s has uncommitted changes",
+                    rule.name,
+                    rule.source_file,
+                )
+                continue


Rule evaluation calls git_file_is_clean(...) (a subprocess) for each rule when allow_dirty_rules is false. With many rules and frequent evaluations, this becomes a significant overhead. Consider computing cleanliness once per unique rule.source_file (or once per evaluation when rules are all under the same repo) and caching the result within evaluate().

Copilot · 2026-04-19T21:20:19Z

+            firing = RuleFiring(
+                rule_name=rule.name,
+                rule_source_file=str(rule.source_file),
+                rule_git_sha=rule_file_sha(rule.source_file) if rule.source_file else None,
+                decision=decision,


rule_file_sha(...) is invoked for every firing, spawning a git log subprocess each time. If multiple rules share the same source file or evaluations are frequent, this can be expensive. Consider caching SHAs per rule.source_file within a single evaluate() call (or precomputing them during rule load) to avoid repeated subprocess invocations.

Copilot · 2026-04-19T21:20:19Z

+                result = sess.execute(
+                    text(
+                        "INSERT INTO rule_firing_audit "
+                        "(investigation_id, hypothesis_id, workspace_id, "
+                        "rule_name, rule_source_file, rule_git_sha, "
+                        "fired_at, decision, applied, engine_version) "
+                        "VALUES (:inv, :hyp, :ws, :rn, :rf, :sha, "
+                        "NOW(), :dec, false, :ev) RETURNING id"
+                    ),


The DB write path hard-codes NOW() and uses RETURNING id in raw SQL. This is not portable (the project defaults to SQLite via GNAT_DB_URL), and will cause audit persistence to fail and silently fall back to in-memory mode. Prefer binding fired_at/applied_at values from Python (or using SQLAlchemy func.now() with dialect-aware compilation) and retrieving inserted IDs via SQLAlchemy APIs rather than RETURNING in a text query.

Copilot · 2026-04-19T21:20:19Z

+;; Checks for an explicit analyst flag in Investigation.tags matching
+;; the pattern "hyp:<hypothesis_id>:ready-for-support". When present,
+;; force-promotes the hypothesis regardless of evidence thresholds.
+;;
+;; This convention lets analysts encode explicit overrides in the
+;; investigation's tag list. The flag should be added via the CLI or
+;; UI, reviewed in PR, and removed after the hypothesis transitions.
+;;
+;; Priority 200 — preempts all other rules.
+
+(require gnat.analysis.rules.macros *)
+(import gnat.analysis.rules.helpers *)
+
+(defrule analyst-override
+  :description "Force-promote when analyst sets hyp:<id>:ready-for-support tag"
+  :phase "open"
+  :target-status "supported"
+  :priority 200
+  :tags ["override" "analyst"]
+  :when (fn [h ctx]
+          (let [tag (+ "hyp:" (. h id) ":ready-for-support")]
+            ;; Check if investigation tags contain the override flag
+            ;; ctx doesn't carry investigation directly, so we check
+            ;; the hypothesis's own metadata for the tag convention
+            False))
+  :then (fn [h ctx]
+          (set-status "supported"
+                      :reason "Analyst override via investigation tag")))


This rule's :when predicate is currently hard-coded to False, so the rule can never fire even though it is described as a highest-priority production override. Either implement the actual tag check (which likely requires making the investigation tags available in RuleContext) or remove/relocate this rule so it doesn't look enabled but do nothing.

Suggested change

;; Checks for an explicit analyst flag in Investigation.tags matching

;; the pattern "hyp:<hypothesis_id>:ready-for-support". When present,

;; force-promotes the hypothesis regardless of evidence thresholds.

;;

;; This convention lets analysts encode explicit overrides in the

;; investigation's tag list. The flag should be added via the CLI or

;; UI, reviewed in PR, and removed after the hypothesis transitions.

;;

;; Priority 200 — preempts all other rules.

(require gnat.analysis.rules.macros *)

(import gnat.analysis.rules.helpers *)

(defrule analyst-override

:description "Force-promote when analyst sets hyp:<id>:ready-for-support tag"

:phase "open"

:target-status "supported"

:priority 200

:tags ["override" "analyst"]

:when (fn [h ctx]

(let [tag (+ "hyp:" (. h id) ":ready-for-support")]

;; Check if investigation tags contain the override flag

;; ctx doesn't carry investigation directly, so we check

;; the hypothesis's own metadata for the tag convention

False))

:then (fn [h ctx]

(set-status "supported"

:reason "Analyst override via investigation tag")))

;; Intended behavior: check for an explicit analyst flag in

;; Investigation.tags matching the pattern

;; "hyp:<hypothesis_id>:ready-for-support" and, when present,

;; force-promote the hypothesis regardless of evidence thresholds.

;;

;; This convention would let analysts encode explicit overrides in the

;; investigation's tag list. The flag should be added via the CLI or

;; UI, reviewed in PR, and removed after the hypothesis transitions.

;;

;; NOTE:

;; This file intentionally does not register a rule yet. The previous

;; implementation defined a highest-priority rule whose :when predicate

;; was hard-coded to False because RuleContext does not currently expose

;; the investigation tags needed to evaluate the override.

;;

;; Do not reintroduce the rule until the required tags are made

;; available to the predicate and the real tag check can be implemented.

(require gnat.analysis.rules.macros *)

(import gnat.analysis.rules.helpers *)

Copilot AI review requested due to automatic review settings April 19, 2026 21:14

wrhalpin merged commit 9b7086e into main Apr 19, 2026
2 of 23 checks passed

Copilot started reviewing on behalf of wrhalpin April 19, 2026 21:15 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rule engine Phases 4-5: engine, orchestrator, audit, config, rules#131

Add rule engine Phases 4-5: engine, orchestrator, audit, config, rules#131
wrhalpin merged 1 commit into
mainfrom
claude/create-gnat-admin-guide-BOSrp

wrhalpin commented Apr 19, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Copilot AI Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

wrhalpin commented Apr 19, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants