# Exception-Handling ChatBot – KG-based Skeleton

This notebook is the notebook version of `exception_handling_chatbot_skeleton.py`.

**Goal:** Based on your PLC knowledge graph (`Test_filled.ttl`), the ChatBot should deterministically (without an LLM) provide the following building blocks:

1. **Signature**: extract the signature of a program/POU (call graph, used variables, ports, and hardware addresses where available)
2. **Checkable vs. not_checkable**: which sensors from a snapshot can the selected program check at all?
3. **Similarity search**: does a similar program/POU already exist (via recursive traversal plus signature comparison)?

> Steps 3 and 4 (hypotheses, monitoring, system reactions, matching against alternative processes) are provided as **integration points** and can later be combined with your `plc_kg_gemini_text2sparql_template`.


## 0) Configuration

Adjust these paths and parameters to your environment.


In [None]:
# Path to your TTL knowledge graph
# Tip: if you run the notebook locally, set the absolute path here.
TTL_PATH = r"D:\MA_Python_Agent\MSRGuard_Anpassung\KGs\Test_filled.ttl" # e.g. r"C:\...\Test_filled.ttl"

# Fallback: if the path does not exist, try a file in the notebook folder
from pathlib import Path
if not Path(TTL_PATH).exists():
    if Path("Test_filled.ttl").exists():
        TTL_PATH = str(Path("Test_filled.ttl").resolve())

# Target program/POU (as stored in the KG under dp:hasPOUName)
TARGET_PROGRAM = "MAIN"

# Example sensor snapshot (user input)
SENSOR_SNAPSHOT = {
    "GVL.Start": True,
    "GVL.Fehler": False,
    "OPCUA.TriggerD1": True,
}

# Index export/import (optional)
OUT_DIR = Path(r"D:\MA_Python_Agent\Notebooks\Zusatzdateien")
OUT_DIR.mkdir(parents=True, exist_ok=True)  # creates the folder if it is missing

INDEX_PATH = OUT_DIR / "routine_index.json"
print("Saving to:", INDEX_PATH)


## 1) Imports, Namespaces, Data Models

In [10]:
"""
Exception Handling ChatBot Skeleton (KG-backed)

This file is meant to be used with your local PLC KG (TTL) as created by your ingestion pipeline.
It focuses on the pipeline you described:

0) User selects a program + provides sensor snapshot
1) Suggest fault hypotheses + classify "checkable via program" vs "not checkable"
2) Before generating a new routine, search for similar routines by recursively resolving signals down to hardware addresses
3) Propose additional monitoring actions to disambiguate faults
4) Propose system reactions + match to alternative processes (JSON)

Notes:
- The KG you shared already contains dp:hasHardwareAddress as a property, but in Test_filled.ttl it is not populated.
  If you populate it during ingestion, the IO-resolution will automatically become much stronger.
- To avoid brittle LLM-generated SPARQL for core logic, this skeleton uses deterministic RDFLib traversals,
  and only expects an LLM for "ideas" (hypotheses, monitoring, reactions) as a separate optional step.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple
import json
import re

from rdflib import Graph, Namespace, RDF, URIRef, Literal


# -----------------------------
# Namespaces (match your KG)
# -----------------------------
AG = Namespace("http://www.semanticweb.org/AgentProgramParams/")
DP = Namespace("http://www.semanticweb.org/AgentProgramParams/dp_")
OP = Namespace("http://www.semanticweb.org/AgentProgramParams/op_")


# -----------------------------
# Data models
# -----------------------------
@dataclass
class SensorSnapshot:
    program_name: str
    sensor_values: Dict[str, Any]  # e.g. {"GVL.Start": True, "OPCUA.Z1": 0, ...}


@dataclass
class RoutineSignature:
    pou_name: str
    pou_uri: str
    reachable_pous: List[str] = field(default_factory=list)
    used_variables: List[str] = field(default_factory=list)          # variable URIs
    used_variable_names: List[str] = field(default_factory=list)     # dp:hasVariableName values
    hardware_addresses: List[str] = field(default_factory=list)      # dp:hasHardwareAddress values (if available)
    called_pou_names: List[str] = field(default_factory=list)        # dp:hasPOUName of called POUs
    ports: List[str] = field(default_factory=list)                   # port URIs
    port_names: List[str] = field(default_factory=list)              # dp:hasPortName values

    def as_dict(self) -> Dict[str, Any]:
        return {
            "pou_name": self.pou_name,
            "pou_uri": self.pou_uri,
            "reachable_pous": self.reachable_pous,
            "used_variables": self.used_variables,
            "used_variable_names": self.used_variable_names,
            "hardware_addresses": self.hardware_addresses,
            "called_pou_names": self.called_pou_names,
            "ports": self.ports,
            "port_names": self.port_names,
        }


# -----------------------------
# KG access layer
# -----------------------------

## 2) KGStore: load with RDFLib + basic traversals

In [11]:
class KGStore:
    def __init__(self, ttl_path: str):
        self.ttl_path = ttl_path
        self.g = Graph()
        self.g.parse(ttl_path, format="turtle")

        # quick lookup caches
        self._varname_to_vars: Dict[str, List[URIRef]] = {}
        self._pouname_to_pou: Dict[str, URIRef] = {}

        self._build_caches()

    def _build_caches(self) -> None:
        # Variables by name (dp:hasVariableName)
        for var, _, name in self.g.triples((None, DP.hasVariableName, None)):
            if not isinstance(name, Literal):
                continue
            key = str(name)
            self._varname_to_vars.setdefault(key, []).append(var)

        # POUs by name (dp:hasPOUName)
        for pou, _, name in self.g.triples((None, DP.hasPOUName, None)):
            if not isinstance(name, Literal):
                continue
            self._pouname_to_pou[str(name)] = pou

    def pou_uri_by_name(self, pou_name: str) -> Optional[URIRef]:
        return self._pouname_to_pou.get(pou_name)

    def variable_uris_by_name(self, name: str) -> List[URIRef]:
        """Returns all var URIs that share the same dp:hasVariableName."""
        return list(self._varname_to_vars.get(name, []))

    def best_variable_uri_by_name(self, name: str) -> Optional[URIRef]:
        """
        Heuristic disambiguation if multiple variables share the same name.
        Priorities:
          1) exact scoped names like 'GVL.X' / 'OPCUA.X'
          2) global variable instances (scope=global)
          3) else first
        """
        cands = self.variable_uris_by_name(name)
        if not cands:
            return None

        # 1) prefer exact qualified name matches
        if "." in name:
            return cands[0]

        # 2) prefer global scope if available
        for v in cands:
            scope = self.g.value(v, DP.hasVariableScope)
            if scope and str(scope).lower() == "global":
                return v

        return cands[0]

    def get_pou_code(self, pou_uri: URIRef) -> str:
        code = self.g.value(pou_uri, DP.hasPOUCode)
        return str(code) if code else ""

    def get_reachable_pous(self, root_pou_uri: URIRef) -> Set[URIRef]:
        """
        BFS via: POU --op:containsPOUCall--> POUCall --op:callsPOU--> called POU
        """
        visited: Set[URIRef] = set()
        queue: List[URIRef] = [root_pou_uri]

        while queue:
            cur = queue.pop(0)
            if cur in visited:
                continue
            visited.add(cur)

            for call in self.g.objects(cur, OP.containsPOUCall):
                for called in self.g.objects(call, OP.callsPOU):
                    if isinstance(called, URIRef) and called not in visited:
                        queue.append(called)

        return visited

    def get_called_pous(self, pou_uri: URIRef) -> Set[URIRef]:
        called: Set[URIRef] = set()
        for call in self.g.objects(pou_uri, OP.containsPOUCall):
            for target in self.g.objects(call, OP.callsPOU):
                if isinstance(target, URIRef):
                    called.add(target)
        return called

    def get_used_variables(self, pou_uri: URIRef) -> Set[URIRef]:
        vars_: Set[URIRef] = set()
        for v in self.g.objects(pou_uri, OP.usesVariable):
            if isinstance(v, URIRef):
                vars_.add(v)
        for v in self.g.objects(pou_uri, OP.hasInternalVariable):
            if isinstance(v, URIRef):
                vars_.add(v)
        return vars_

    def get_ports_of_pou(self, pou_uri: URIRef) -> Set[URIRef]:
        ports: Set[URIRef] = set()
        for p in self.g.objects(pou_uri, OP.hasPort):
            if isinstance(p, URIRef):
                ports.add(p)
        return ports

    def get_port_name(self, port_uri: URIRef) -> str:
        v = self.g.value(port_uri, DP.hasPortName)
        return str(v) if v else ""

    def get_variable_names(self, var_uri: URIRef) -> Set[str]:
        names: Set[str] = set()
        for _, _, name in self.g.triples((var_uri, DP.hasVariableName, None)):
            if isinstance(name, Literal):
                names.add(str(name))
        return names

    def get_hardware_address(self, var_uri: URIRef) -> Optional[str]:
        v = self.g.value(var_uri, DP.hasHardwareAddress)
        return str(v) if v else None


# -----------------------------
# ST token extraction (lightweight)
# -----------------------------
_ST_TOKEN = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)+\b")
# This matches qualified tokens like GVL.Start, fbDiag.Alt_gefunden, etc.

def extract_qualified_tokens_from_st(code: str) -> Set[str]:
    return set(_ST_TOKEN.findall(code or ""))


# -----------------------------
# Signature extraction + similarity
# -----------------------------

## 3) SignatureExtractor: recursively collect call graph + variables/ports/HW addresses

In [12]:
class SignatureExtractor:
    def __init__(self, kg: KGStore):
        self.kg = kg

    def extract_signature(self, pou_name: str) -> RoutineSignature:
        pou_uri = self.kg.pou_uri_by_name(pou_name)
        if pou_uri is None:
            raise ValueError(f"POU '{pou_name}' not found in KG.")

        reachable = self.kg.get_reachable_pous(pou_uri)
        reachable_names = []
        called_names: Set[str] = set()
        used_vars: Set[URIRef] = set()
        used_var_names: Set[str] = set()
        hw_addrs: Set[str] = set()
        ports: Set[URIRef] = set()
        port_names: Set[str] = set()

        for rp in reachable:
            rp_name = self.kg.g.value(rp, DP.hasPOUName)
            if rp_name:
                reachable_names.append(str(rp_name))

            # called pous
            for c in self.kg.get_called_pous(rp):
                cn = self.kg.g.value(c, DP.hasPOUName)
                if cn:
                    called_names.add(str(cn))

            # variables from KG relations
            used_vars |= self.kg.get_used_variables(rp)

            # ports (for FBTypes)
            ports |= self.kg.get_ports_of_pou(rp)

            # variables also from ST code tokens
            code = self.kg.get_pou_code(rp)
            for tok in extract_qualified_tokens_from_st(code):
                v_uri = self.kg.best_variable_uri_by_name(tok)
                if v_uri:
                    used_vars.add(v_uri)

        for v in used_vars:
            used_var_names |= self.kg.get_variable_names(v)
            ha = self.kg.get_hardware_address(v)
            if ha:
                hw_addrs.add(ha)

        for p in ports:
            pn = self.kg.get_port_name(p)
            if pn:
                port_names.add(pn)

        sig = RoutineSignature(
            pou_name=pou_name,
            pou_uri=str(pou_uri),
            reachable_pous=sorted(set(reachable_names)),
            used_variables=sorted({str(v) for v in used_vars}),
            used_variable_names=sorted(used_var_names),
            hardware_addresses=sorted(hw_addrs),
            called_pou_names=sorted(called_names),
            ports=sorted({str(p) for p in ports}),
            port_names=sorted(port_names),
        )
        return sig


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0



## 4) RoutineIndex: Similarity Search (weighted Jaccard)

In [13]:
class RoutineIndex:
    """
    Offline index for fast "is there already something similar?" checks.
    """

    def __init__(self, signatures: List[RoutineSignature]):
        self.signatures = signatures

    @staticmethod
    def build_from_kg(kg: KGStore, only_names: Optional[Iterable[str]] = None) -> "RoutineIndex":
        extractor = SignatureExtractor(kg)

        # default: index all POUs in KG
        if only_names is None:
            only_names = sorted(kg._pouname_to_pou.keys())

        sigs = []
        for name in only_names:
            try:
                sigs.append(extractor.extract_signature(name))
            except Exception:
                # skip things that fail (optional)
                pass
        return RoutineIndex(sigs)

    def save_json(self, path: str) -> None:
        Path(path).write_text(json.dumps([s.as_dict() for s in self.signatures], indent=2), encoding="utf-8")

    @staticmethod
    def load_json(path: str) -> "RoutineIndex":
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        sigs = [RoutineSignature(**d) for d in data]
        return RoutineIndex(sigs)

    def find_similar(
        self,
        target: RoutineSignature,
        top_k: int = 5,
        w_hw: float = 0.55,
        w_varnames: float = 0.25,
        w_called: float = 0.20,
    ) -> List[Tuple[float, RoutineSignature]]:
        """
        Similarity is a weighted combination of:
          - IO similarity via hardware addresses (best if populated)
          - fallback IO similarity via variable names
          - structural similarity via called POU names
        """
        tgt_hw = set(target.hardware_addresses)
        tgt_vars = set(target.used_variable_names)
        tgt_called = set(target.called_pou_names)

        scored = []
        for cand in self.signatures:
            cand_hw = set(cand.hardware_addresses)
            cand_vars = set(cand.used_variable_names)
            cand_called = set(cand.called_pou_names)

            sim_hw = jaccard(tgt_hw, cand_hw) if (tgt_hw or cand_hw) else 0.0
            sim_vars = jaccard(tgt_vars, cand_vars)
            sim_called = jaccard(tgt_called, cand_called)

            score = w_hw * sim_hw + w_varnames * sim_vars + w_called * sim_called
            scored.append((score, cand))

        scored.sort(key=lambda x: x[0], reverse=True)
        return scored[:top_k]


# -----------------------------
# Step (1) "checkable vs not checkable"
# -----------------------------

## 5) Run Step 1 + Step 2 (Checkable & Similarity)

- **Checkable**: the sensor appears in the signature (variable name or hardware address)
- **Similarity**: top-K similar POUs by weighted similarity
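
As a rough illustration of the weighted similarity, the sketch below recomputes the score on toy sets. The weights are the defaults of `find_similar` (0.55, 0.25, 0.20); `weighted_score` and the toy signal and POU names are made up for this example. It assumes, as noted for `Test_filled.ttl`, that no hardware addresses are populated, so the hardware term contributes nothing and even an identical routine cannot score above 0.45.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index; two empty sets count as identical (1.0)."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_score(tgt: dict, cand: dict,
                   w_hw: float = 0.55, w_vars: float = 0.25, w_called: float = 0.20) -> float:
    # Mirrors find_similar: the hardware term is 0.0 when neither side has addresses.
    has_hw = bool(tgt["hw"] or cand["hw"])
    sim_hw = jaccard(tgt["hw"], cand["hw"]) if has_hw else 0.0
    sim_vars = jaccard(tgt["vars"], cand["vars"])
    sim_called = jaccard(tgt["called"], cand["called"])
    return w_hw * sim_hw + w_vars * sim_vars + w_called * sim_called


# Hypothetical toy signatures (names are illustrative, not taken from the KG).
target = {"hw": set(), "vars": {"GVL.Start", "GVL.Fehler"}, "called": {"FB_A", "FB_B"}}
other = {"hw": set(), "vars": {"GVL.Start"}, "called": {"FB_A", "FB_C"}}

self_score = weighted_score(target, target)   # 0.25 * 1 + 0.20 * 1
other_score = weighted_score(target, other)   # 0.25 * 1/2 + 0.20 * 1/3
print(f"self={self_score:.3f} other={other_score:.3f}")
```

Once `dp:hasHardwareAddress` is populated during ingestion, the 0.55 weight on hardware addresses starts to dominate the ranking, which is presumably the intent of the default weights.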


In [14]:
def classify_checkable_sensors(snapshot: SensorSnapshot, sig: RoutineSignature) -> Dict[str, str]:
    """
    Very first pragmatic classifier:
    - A sensor is "checkable" if it appears in the routine signature variable names
      (or if it matches a known hardware address in the signature).
    """
    checkable_set = set(sig.used_variable_names) | set(sig.hardware_addresses)

    out = {}
    for k in snapshot.sensor_values.keys():
        out[k] = "checkable" if k in checkable_set else "not_checkable"
    return out


# -----------------------------
# Example runner
# -----------------------------

In [15]:
# Load the KG
kg = KGStore(TTL_PATH)

# Extract the signature of the target program
extractor = SignatureExtractor(kg)
target_sig = extractor.extract_signature(TARGET_PROGRAM)

print("=== Target Signature (short) ===")
print("POU:", target_sig.pou_name)
print("reachable_pous:", len(target_sig.reachable_pous))
print("used_variable_names:", len(target_sig.used_variable_names))
print("hardware_addresses:", len(target_sig.hardware_addresses))
print("called_pou_names:", len(target_sig.called_pou_names))

# Step 1: checkable vs not_checkable
snap = SensorSnapshot(program_name=TARGET_PROGRAM, sensor_values=SENSOR_SNAPSHOT)
check_map = classify_checkable_sensors(snap, target_sig)
print("\n=== Checkable classification ===")
for k, v in check_map.items():
    print(f"{k}: {v}")

# Step 2: similarity search
# Build the index once or load it from JSON (INDEX_PATH is defined in the configuration cell)
try:
    index = RoutineIndex.load_json(INDEX_PATH)
    print(f"\nLoaded index from: {INDEX_PATH}")
except (OSError, ValueError, TypeError):
    # File missing/unreadable, invalid JSON, or entries that do not match RoutineSignature
    index = RoutineIndex.build_from_kg(kg)
    print("\nBuilt index from KG (consider saving it for faster startup).")
    # optional:
    # index.save_json(INDEX_PATH)

top_k = 5
similar = index.find_similar(target_sig, top_k=top_k)

print(f"\n=== Top {top_k} similar routines ===")
for score, cand in similar:
    print(f"score={score:.3f}  pou={cand.pou_name}")


=== Target Signature (short) ===
POU: MAIN
reachable_pous: 10
used_variable_names: 124
hardware_addresses: 0
called_pou_names: 9

=== Checkable classification ===
GVL.Start: checkable
GVL.Fehler: checkable
OPCUA.TriggerD1: checkable

Built index from KG (consider saving it for faster startup).

=== Top 5 similar routines ===
score=0.450  pou=MAIN
score=0.133  pou=FB_Automatikbetrieb_F1
score=0.097  pou=FB_ProduktionMitStoerung_D3
score=0.095  pou=FB_Betriebsarten
score=0.073  pou=FB_Diagnose_D2


## 6) Next Steps (integration points for LLM / Gemini template)

If you want to use your `plc_kg_gemini_text2sparql_template`, I recommend:

1. **Keep the deterministic base** (signature, similarity, checkable).
2. Use the LLM only for:
   - hypothesis generation (which faults could be present?)
   - additional monitoring measures (which signals should be checked as well?)
   - system reactions + alternative process paths (matching against JSON)

### Output schema (proposal)
You can force the LLM to return **JSON only**, e.g.:

- `fault_hypotheses[]` with `name`, `rationale`, `required_signals[]`, `checkable_signals[]`, `not_checkable_signals[]`
- `monitoring_actions[]` with `action`, `signals[]`, `time_window_ms`, `decision_rule`
- `reactions[]` with `action`, `preconditions[]`, `actuators[]`, `fallbacks[]`

You then validate the suggestions against your KG (the signals/actuators really exist).
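
The validation step could look roughly like the sketch below. It assumes the LLM reply is already available as a JSON string following the proposed schema; `validate_llm_output` and the `allowed` set are hypothetical names. In this notebook the set would be built from `target_sig.used_variable_names` and `target_sig.hardware_addresses`. Treating `actuators[]` like signals is a simplifying assumption of the sketch.

```python
import json
from typing import Dict, List, Set


def validate_llm_output(raw_json: str, allowed_signals: Set[str]) -> Dict[str, List[str]]:
    """Return, per schema section, the referenced names that are not in allowed_signals."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError if the reply is not valid JSON
    # Which keys of each section's entries hold signal names (per the proposed schema).
    # Assumption: actuators are checked against the same allow-list as signals.
    signal_fields = {
        "fault_hypotheses": ["required_signals", "checkable_signals", "not_checkable_signals"],
        "monitoring_actions": ["signals"],
        "reactions": ["actuators"],
    }
    unknown: Dict[str, List[str]] = {}
    for section, fields in signal_fields.items():
        for entry in data.get(section, []):
            for field_name in fields:
                for sig in entry.get(field_name, []):
                    if sig not in allowed_signals:
                        unknown.setdefault(section, []).append(sig)
    return unknown


# Hypothetical reply and allow-list, for illustration only.
allowed = {"GVL.Start", "GVL.Fehler", "OPCUA.TriggerD1"}
reply = json.dumps({
    "fault_hypotheses": [
        {"name": "start signal stuck", "rationale": "Start stays TRUE",
         "required_signals": ["GVL.Start", "GVL.DoesNotExist"],
         "checkable_signals": ["GVL.Start"], "not_checkable_signals": []}
    ],
    "monitoring_actions": [
        {"action": "watch trigger", "signals": ["OPCUA.TriggerD1"],
         "time_window_ms": 500, "decision_rule": "trigger must toggle"}
    ],
    "reactions": [],
})

problems = validate_llm_output(reply, allowed)
print(problems)
```

An empty result means every referenced name is known; anything else can be fed back to the LLM or dropped before the suggestions are shown to the user.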


In [None]:
# (Optional) LLM Hook – Pseudocode
# from your template: init_llm(), text2sparql(), etc.
#
# 1) build an "allowed signals" list from target_sig.used_variable_names + hardware_addresses
# 2) prompt LLM with snapshot + allowed signals + check_map
# 3) parse JSON output
# 4) validate each referenced signal/actuator against KGStore caches
#
# NOTE: Keep KG queries deterministic for robustness.
