
feat: Entity-annotated learning captures via terraphim_automata #703

@AlexMikhalev


Summary

Enhance the learning capture system so that the error messages of failed commands are annotated with knowledge-graph (KG) entities, using the existing terraphim_automata Aho-Corasick matching. This makes learn query entity-aware: stored failures can be searched by matched entity or by entity label/type, not just by raw text.

Context

  • Plan: plans/quickner-terraphim-learning-integration.md (Phase 1)
  • The learning system (terraphim_github_runner::learning) already captures FailureTracker, SuccessPattern, and ApplicableLesson records
  • terraphim_automata::matcher::find_matches() already provides fast Aho-Corasick matching against KG terms
  • These two systems are not yet connected

Current Behaviour

  1. learn capture records failed command + error output as plain text
  2. learn query "pattern" does basic text matching against stored failures
  3. No entity-level annotation on error messages
  4. No way to filter by entity type (e.g. "show all failures involving cargo clippy" requires an exact string match)

Proposed Behaviour

  1. When learn capture records a failed command + error output, run terraphim_automata::find_matches() against the error text using the active role's thesaurus
  2. Store matched entities (term, label taken from the normalised term, positions) alongside FailureTracker records
  3. Enhance learn query to support entity-based filtering:
    • learn query --entity "cargo" -- find failures where "cargo" was matched as a KG entity
    • learn query --label "CLI Tool" -- find failures annotated with a specific entity label
  4. Existing plain-text learn query "pattern" continues to work unchanged
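The three query modes can be read as optional filters combined with AND. The sketch below illustrates this under stated assumptions: QueryFilter, matches_filter and query are hypothetical names, the command and error_output fields stand in for the existing FailureTracker fields, and all matching is assumed to be case-insensitive. It is not the actual learn query implementation.

```rust
/// Hypothetical options mirroring the proposed CLI surface.
#[derive(Debug, Default)]
pub struct QueryFilter {
    pub pattern: Option<String>, // existing: learn query "pattern"
    pub entity: Option<String>,  // proposed: --entity <term>
    pub label: Option<String>,   // proposed: --label <label>
}

#[derive(Debug, Clone)]
pub struct MatchedEntity {
    pub term: String,
    pub label: String,
    pub positions: Vec<(usize, usize)>,
}

/// Hypothetical stand-in for a stored FailureTracker record.
#[derive(Debug, Clone)]
pub struct AnnotatedFailure {
    pub command: String,
    pub error_output: String,
    pub matched_entities: Vec<MatchedEntity>,
}

/// A failure passes when every filter that is set matches; unset filters pass.
pub fn matches_filter(failure: &AnnotatedFailure, filter: &QueryFilter) -> bool {
    let text_ok = filter.pattern.as_ref().map_or(true, |p| {
        // Assumed plain-text behaviour: case-insensitive substring match.
        let needle = p.to_lowercase();
        failure.command.to_lowercase().contains(&needle)
            || failure.error_output.to_lowercase().contains(&needle)
    });
    let entity_ok = filter.entity.as_ref().map_or(true, |e| {
        failure.matched_entities.iter().any(|m| m.term.eq_ignore_ascii_case(e))
    });
    let label_ok = filter.label.as_ref().map_or(true, |l| {
        failure.matched_entities.iter().any(|m| m.label.eq_ignore_ascii_case(l))
    });
    text_ok && entity_ok && label_ok
}

pub fn query<'a>(failures: &'a [AnnotatedFailure], filter: &QueryFilter) -> Vec<&'a AnnotatedFailure> {
    failures.iter().filter(|f| matches_filter(f, filter)).collect()
}
```

Because each filter defaults to "match everything", a call that only sets pattern goes through the plain-text path alone, which is one way to keep existing learn query "pattern" behaviour unchanged.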

Implementation Notes

Data Model Changes

Add to FailureTracker (shown here as AnnotatedFailure):

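```rust
pub struct AnnotatedFailure {
    // existing fields...
    /// KG entities matched in the failure's error output.
    pub matched_entities: Vec<MatchedEntity>,
}

/// One KG entity matched in the error text.
#[derive(Debug, Clone, PartialEq)]
pub struct MatchedEntity {
    /// Term as matched in the error text.
    pub term: String,
    /// Entity label, taken from the thesaurus NormalizedTerm.
    pub label: String,
    /// (start, end) position of each occurrence in the error text.
    pub positions: Vec<(usize, usize)>,
}
```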
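Because positions is a Vec, one MatchedEntity can describe several occurrences of the same term. The sketch below shows one way to collapse per-occurrence matcher hits into that shape. RawMatch is a hypothetical stand-in for whatever find_matches() returns (the real result type and field names may differ), and grouping by (term, label) in first-seen order is an assumed design choice.

```rust
use std::collections::HashMap;

/// Hypothetical per-occurrence hit; not the real terraphim_automata type.
#[derive(Debug, Clone)]
pub struct RawMatch {
    pub term: String,
    pub label: String,
    pub pos: Option<(usize, usize)>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MatchedEntity {
    pub term: String,
    pub label: String,
    pub positions: Vec<(usize, usize)>,
}

/// Merge repeated hits of the same (term, label) into one MatchedEntity.
/// Entities keep first-seen order; positions keep match order.
pub fn group_matches(raw: &[RawMatch]) -> Vec<MatchedEntity> {
    let mut index: HashMap<(String, String), usize> = HashMap::new();
    let mut out: Vec<MatchedEntity> = Vec::new();
    for m in raw {
        let key = (m.term.clone(), m.label.clone());
        // Look up (or create) the slot for this entity.
        let i = *index.entry(key).or_insert_with(|| {
            out.push(MatchedEntity { term: m.term.clone(), label: m.label.clone(), positions: Vec::new() });
            out.len() - 1
        });
        // Hits without a position still register the entity.
        if let Some(p) = m.pos {
            out[i].positions.push(p);
        }
    }
    out
}
```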

Integration Points

  1. learn capture / learn hook: After recording the failure, call find_matches(error_text, thesaurus, true) to annotate it
  2. learn query: Add --entity and --label filter flags
  3. learn list: Optionally display matched entities alongside each entry
  4. Thesaurus loading: Use the command thesaurus from learning/thesaurus.rs plus the active role's KG thesaurus
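Integration point 4 combines two dictionaries. The sketch below shows one possible merge, assuming a simplified Dictionary type (lowercase term to label) in place of the real terraphim Thesaurus type, and assuming the role's KG entry wins when both sources define the same term. merge_thesauri is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified thesaurus: term -> entity label.
pub type Dictionary = BTreeMap<String, String>;

/// Merge the built-in command thesaurus with the active role's KG thesaurus.
/// Keys are lowercased so "Docker" and "docker" collide; because role_kg is
/// iterated second, its labels overwrite command-thesaurus labels.
pub fn merge_thesauri(command: &Dictionary, role_kg: &Dictionary) -> Dictionary {
    let mut merged = Dictionary::new();
    for (term, label) in command.iter().chain(role_kg.iter()) {
        merged.insert(term.to_lowercase(), label.clone());
    }
    merged
}
```

Letting the role KG take precedence means a user-curated label replaces the generic command label; the opposite precedence is equally plausible and remains an implementation decision.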

Pre-seeded Command Thesaurus

The existing learning/thesaurus.rs already contains normalised command patterns (cargo, git, npm, docker, etc.). These become the entity dictionary for annotation -- no additional KG setup required.
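To illustrate how pre-seeded command patterns could serve as the entity dictionary, the sketch below annotates an error message with whole-word matches. It deliberately swaps Aho-Corasick for a naive per-term scan with str::match_indices so that it stays dependency-free. command_dictionary, annotate and the "CLI Tool" label are hypothetical; the real contents of thesaurus.rs are not reproduced here.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct MatchedEntity {
    pub term: String,
    pub label: String,
    pub positions: Vec<(usize, usize)>,
}

/// Hypothetical excerpt of a pre-seeded command dictionary: term -> label.
pub fn command_dictionary() -> BTreeMap<String, String> {
    ["cargo", "git", "npm", "docker"]
        .iter()
        .map(|t| (t.to_string(), "CLI Tool".to_string()))
        .collect()
}

/// Bytes that count as part of a word, so "cargoes" does not match "cargo".
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Naive per-term scan (stand-in for Aho-Corasick): finds every whole-word,
/// ASCII case-insensitive occurrence of each dictionary term.
pub fn annotate(error_text: &str, dict: &BTreeMap<String, String>) -> Vec<MatchedEntity> {
    // ASCII lowercasing keeps byte offsets identical to the original text.
    let haystack = error_text.to_ascii_lowercase();
    let bytes = haystack.as_bytes();
    let mut entities = Vec::new();
    for (term, label) in dict {
        let needle = term.to_ascii_lowercase();
        if needle.is_empty() {
            continue;
        }
        let positions: Vec<(usize, usize)> = haystack
            .match_indices(needle.as_str())
            .map(|(start, m)| (start, start + m.len()))
            .filter(|&(s, e)| {
                (s == 0 || !is_word_byte(bytes[s - 1])) && (e == bytes.len() || !is_word_byte(bytes[e]))
            })
            .collect();
        if !positions.is_empty() {
            entities.push(MatchedEntity { term: term.clone(), label: label.clone(), positions });
        }
    }
    entities
}
```

For a handful of short command names the naive scan is adequate; once the active role's KG thesaurus is added, a single Aho-Corasick pass (as find_matches provides) avoids rescanning the text once per term.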

Acceptance Criteria

  • learn capture annotates error messages with matched KG entities
  • learn hook (PostToolUse integration) also annotates
  • learn query --entity <term> filters by matched entity
  • learn query --label <label> filters by entity label/type
  • Existing learn query "text" behaviour unchanged
  • learn list shows entity annotations when present
  • Unit tests for the annotation pipeline
  • Integration test: capture a failure, verify entities extracted, query by entity
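The integration-test criterion (capture a failure, verify entities extracted, query by entity) could take roughly the shape sketched below. LearningStore, capture and query_entity are hypothetical in-memory stand-ins for the learning persistence layer. Annotation is simplified to a dictionary lookup of alphanumeric tokens rather than Aho-Corasick matching, so multi-word terms are not covered.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone)]
pub struct MatchedEntity {
    pub term: String,
    pub label: String,
    pub positions: Vec<(usize, usize)>,
}

#[derive(Debug, Clone)]
pub struct AnnotatedFailure {
    pub command: String,
    pub error_output: String,
    pub matched_entities: Vec<MatchedEntity>,
}

/// Hypothetical in-memory stand-in for the learning store.
pub struct LearningStore {
    dictionary: BTreeMap<String, String>, // lowercase term -> label
    failures: Vec<AnnotatedFailure>,
}

impl LearningStore {
    pub fn with_dictionary(dictionary: BTreeMap<String, String>) -> Self {
        Self { dictionary, failures: Vec::new() }
    }

    /// Record a failure, annotating each alphanumeric token found in the dictionary.
    pub fn capture(&mut self, command: &str, error_output: &str) {
        let mut entities: Vec<MatchedEntity> = Vec::new();
        let mut start: Option<usize> = None;
        // A trailing sentinel space flushes a token that ends the string.
        let chars = error_output.char_indices().chain(std::iter::once((error_output.len(), ' ')));
        for (i, c) in chars {
            if c.is_alphanumeric() {
                if start.is_none() {
                    start = Some(i);
                }
            } else if let Some(s) = start.take() {
                let token = error_output[s..i].to_lowercase();
                if let Some(label) = self.dictionary.get(&token) {
                    if let Some(idx) = entities.iter().position(|e| e.term == token) {
                        entities[idx].positions.push((s, i));
                    } else {
                        entities.push(MatchedEntity { term: token, label: label.clone(), positions: vec![(s, i)] });
                    }
                }
            }
        }
        self.failures.push(AnnotatedFailure {
            command: command.to_string(),
            error_output: error_output.to_string(),
            matched_entities: entities,
        });
    }

    /// In-memory analogue of learn query --entity <term>.
    pub fn query_entity(&self, term: &str) -> Vec<&AnnotatedFailure> {
        self.failures
            .iter()
            .filter(|f| f.matched_entities.iter().any(|e| e.term.eq_ignore_ascii_case(term)))
            .collect()
    }
}
```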

Estimated Effort

2-3 days

Relates To

  • Quickner evaluation: https://github.com/terraphim/quickner (Phase 2 would use this for spaCy/CoNLL export)
  • Knowledge base entry: knowledge/external/learning-rust/quickner-fast-ner-annotator-rust.md

Metadata

  • Assignees: No one assigned
  • Labels: enhancement (New feature or request)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests