# PLC KG ChatBot (Single Notebook)

This notebook builds a **single** chatbot on top of your PLC knowledge graph (TTL/RDF).

Design goals:
- **Deterministic where possible** (tools for call graph, variable info, trace, similarity)
- **LLM only as planner + Text2SPARQL fallback**
- **Guardrails**: SELECT only, enforce LIMIT, strip code fences
- **Plan → Execute → Answer** flow (debuggable)
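
The Plan → Execute → Answer flow can be sketched in a few lines. This is only a minimal illustration, not the notebook's actual agent loop: `plan`, `execute`, `answer` and the stub tool are hypothetical stand-ins (the planner is a keyword rule instead of an LLM, and the program names are made up). It mainly shows why the flow is easy to debug: the plan and every tool result remain inspectable.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stub tool with made-up program names; a real tool would query the KG.
def list_programs() -> List[str]:
    return ["MAIN", "Conveyor"]

TOOLS: Dict[str, Callable[..., Any]] = {"list_programs": list_programs}

def plan(question: str) -> List[Dict[str, Any]]:
    """Stand-in planner: a keyword rule where the notebook would ask an LLM."""
    if "program" in question.lower():
        return [{"tool": "list_programs", "args": {}}]
    return []

def execute(steps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Runs each planned step and keeps the raw result for debugging."""
    trace = []
    for step in steps:
        tool = TOOLS.get(step["tool"])
        result = tool(**step["args"]) if tool else {"error": "unknown tool"}
        trace.append({"step": step, "result": result})
    return trace

def answer(question: str, trace: List[Dict[str, Any]]) -> str:
    """Stand-in answerer: formats tool results instead of calling an LLM."""
    if not trace:
        return "No matching tool found."
    return "; ".join(str(t["result"]) for t in trace)

steps = plan("Which programs exist?")
trace = execute(steps)
reply = answer("Which programs exist?", trace)
print(steps, trace, reply, sep="\n")
```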

References / best practices:
- Plan-and-Execute agent pattern (LangGraph)
- SPARQL QA chains & SPARQL extraction helper (LangChain)
- Tool guardrails & role isolation against prompt injection


## 0) Installation (optional)
If anything is missing locally, install the dependencies here.

In [1]:
# Optional: run once (locally)
%pip install -U rdflib pandas ipywidgets langchain-core langchain-openai langchain-community pydantic faiss-cpu

Note: you may need to restart the kernel to use updated packages.


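## 1) Configuration
Adjust the paths and models. Note that the autodetect step in the cell below is only a placeholder comment: `TTL_PATH` is used exactly as set, so a TTL in the same folder or under /mnt/data is not picked up automatically.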
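
A possible autodetect step could look like the sketch below. It is an assumption about how such a lookup might work, not code from this notebook: the helper name `find_ttl`, the search order (current folder first, then /mnt/data) and the "first file in sorted order" rule are all illustrative choices.

```python
from pathlib import Path
from typing import Iterable, Optional

def find_ttl(search_dirs: Iterable[Path], preferred_name: Optional[str] = None) -> Optional[Path]:
    """Returns the first .ttl file found in search_dirs, preferring preferred_name."""
    for d in search_dirs:
        if not d.is_dir():
            continue  # skip folders that do not exist on this machine
        if preferred_name and (d / preferred_name).is_file():
            return d / preferred_name
        candidates = sorted(d.glob("*.ttl"))
        if candidates:
            return candidates[0]
    return None

# Assumed search order: current folder first, then /mnt/data (e.g. a sandbox).
found = find_ttl([Path.cwd(), Path("/mnt/data")], preferred_name="TestEvents.ttl")
print("autodetected TTL:", found)
```

If `found` is not `None`, it could replace the hard-coded `TTL_PATH`; otherwise the manually configured path remains the fallback.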

In [2]:
from pathlib import Path

# === Path to the TTL file ===
# 1) Local: set your absolute path here.
TTL_PATH = r"D:\MA_Python_Agent\MSRGuard_Anpassung\KGs\TestEvents.ttl"
filename = "TestEvents.ttl"

# 2) Autodetect (e.g. sandbox): not implemented in this cell; TTL_PATH is used as set above.

print("TTL_PATH =", TTL_PATH)

# === Index file (similarity / routine index) ===
index_name = filename.replace(".ttl", "_routine_index.json")
INDEX_DIR = Path(r"D:\MA_Python_Agent\MSRGuard_Anpassung\KGs\ChatBotRoutinen")
INDEX_PATH = str(INDEX_DIR / index_name)
print("INDEX_PATH =", INDEX_PATH)

# === LLM Backend ===
# "openai" (via langchain_openai). You can add "gemini" later.
LLM_BACKEND = "openai"

# OpenAI (LangChain) Settings
OPENAI_MODEL = "gpt-4o-mini"
OPENAI_TEMPERATURE = 0

# Limits
MAX_SPARQL_ROWS = 200

TTL_PATH = D:\MA_Python_Agent\MSRGuard_Anpassung\KGs\TestEvents.ttl
INDEX_PATH = D:\MA_Python_Agent\MSRGuard_Anpassung\KGs\ChatBotRoutinen\TestEvents_routine_index.json


## 2) Load the graph

In [3]:
from rdflib import Graph

g = Graph()
g.parse(TTL_PATH, format="turtle")

print("✅ Graph loaded")
print("Triples:", len(g))
print("Namespaces (excerpt):", list(g.namespaces())[:10])

✅ Graph loaded
Triples: 2256
Namespaces (excerpt): [('brick', rdflib.term.URIRef('https://brickschema.org/schema/Brick#')), ('csvw', rdflib.term.URIRef('http://www.w3.org/ns/csvw#')), ('dc', rdflib.term.URIRef('http://purl.org/dc/elements/1.1/')), ('dcat', rdflib.term.URIRef('http://www.w3.org/ns/dcat#')), ('dcmitype', rdflib.term.URIRef('http://purl.org/dc/dcmitype/')), ('dcterms', rdflib.term.URIRef('http://purl.org/dc/terms/')), ('dcam', rdflib.term.URIRef('http://purl.org/dc/dcam/')), ('doap', rdflib.term.URIRef('http://usefulinc.com/ns/doap#')), ('foaf', rdflib.term.URIRef('http://xmlns.com/foaf/0.1/')), ('geo', rdflib.term.URIRef('http://www.opengis.net/ont/geosparql#'))]


## 3) Schema Card (compact KG overview)
This overview is fed into the planner prompt and the Text2SPARQL prompt.

In [4]:
from collections import Counter
from rdflib.namespace import RDF

def schema_card(graph: Graph, top_n: int = 15) -> str:
    pred_counts = Counter()
    type_counts = Counter()

    for s, p, o in graph:
        try:
            pred_counts[graph.qname(p)] += 1
        except Exception:
            pred_counts[str(p)] += 1

        if p == RDF.type:
            try:
                type_counts[graph.qname(o)] += 1
            except Exception:
                type_counts[str(o)] += 1

    lines = []
    lines.append("TOP CLASSES (rdf:type):")
    for k, v in type_counts.most_common(top_n):
        lines.append(f"  - {k}: {v}")
    lines.append("")
    lines.append("TOP PROPERTIES:")
    for k, v in pred_counts.most_common(top_n):
        lines.append(f"  - {k}: {v}")
    return "\n".join(lines)

SCHEMA_CARD = schema_card(g, top_n=15)
print(SCHEMA_CARD[:2000])

TOP CLASSES (rdf:type):
  - ag:class_Variable: 147
  - ag:class_Port: 81
  - ag:class_ParameterAssignment: 56
  - ag:class_FBInstance: 51
  - owl:NamedIndividual: 50
  - owl:DatatypeProperty: 35
  - owl:ObjectProperty: 31
  - ag:class_POUCall: 27
  - ag:class_SignalSource: 23
  - owl:Class: 19
  - ag:class_FBType: 17
  - ag:class_PortInstance: 16
  - ag:class_StandardFBType: 10
  - ag:class_SourceLiteral: 7
  - ag:class_CustomFBType: 7

TOP PROPERTIES:
  - rdf:type: 587
  - dp:hasVariableName: 170
  - dp:hasVariableType: 147
  - op:usesVariable: 124
  - op:hasInternalVariable: 124
  - op:hasPort: 81
  - dp:hasPortDirection: 81
  - dp:hasPortType: 81
  - dp:hasPortName: 81
  - op:hasAssignment: 56
  - op:assignsFrom: 56
  - rdfs:domain: 55
  - rdfs:range: 55
  - op:assignsToPort: 52
  - op:isInstanceOfFBType: 51


## 4) Tools with SPARQL helper (guardrails)
- SELECT only
- blocks UPDATE/SERVICE
- enforces LIMIT
- results as a list of dicts
- tools answer typical questions

If an LLM wraps SPARQL in code blocks, we extract it robustly.
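
The guardrail rules listed above can be illustrated in isolation. The sketch below is a simplified, standalone stand-in: the names `guard_query` and `FORBIDDEN` are illustrative and not part of this notebook, and the keyword list is only a subset. It shows the three checks in order: require SELECT as the query form, reject update-style keywords, and cap or append LIMIT.

```python
import re

# Illustrative subset of keywords to block; the notebook's own list is longer.
FORBIDDEN = ("INSERT", "DELETE", "LOAD", "CLEAR", "DROP", "SERVICE")

def guard_query(query: str, max_limit: int = 200) -> str:
    """Accepts only SELECT queries, blocks update keywords and caps LIMIT."""
    q = query.strip()
    upper = re.sub(r"\s+", " ", q).upper()
    # Ignore leading PREFIX declarations when checking the query form.
    body = re.sub(r"^(?:PREFIX\s+[^\s:]*:\s*<[^>]*>\s*)+", "", upper)
    if not body.startswith("SELECT"):
        raise ValueError("only SELECT queries are allowed")
    for kw in FORBIDDEN:
        if re.search(rf"\b{kw}\b", upper):
            raise ValueError(f"forbidden keyword: {kw}")
    limit_re = re.compile(r"\bLIMIT\s+(\d+)\b", re.IGNORECASE)
    if limit_re.search(q):
        # Cap an existing LIMIT instead of trusting it.
        return limit_re.sub(lambda m: f"LIMIT {min(int(m.group(1)), max_limit)}", q)
    return f"{q}\nLIMIT {max_limit}"

print(guard_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5000", max_limit=50))
```

A plain keyword scan like this also sees string literals and IRIs, so it can reject harmless queries; that is a deliberate trade-off in favor of a simple, conservative check.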


In [5]:
import re
import inspect
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional
from rdflib import Graph

# ==========================================
# 1. SPARQL HELPER & GUARDRAILS
# ==========================================

DEFAULT_PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ag:  <http://www.semanticweb.org/AgentProgramParams/>
PREFIX dp:  <http://www.semanticweb.org/AgentProgramParams/dp_>
PREFIX op:  <http://www.semanticweb.org/AgentProgramParams/op_>
"""

def _normalize_ws(s: str) -> str:
    return re.sub(r"\s+", " ", s).strip()

def enforce_select_only(query: str, max_limit: int = 200) -> str:
    """Rejects non-SELECT queries and dangerous keywords, and enforces a LIMIT."""
    q = query.strip()
    q_u = _normalize_ws(q).upper()

    # Skip leading PREFIX declarations so that "PREFIX ... CONSTRUCT/ASK/DESCRIBE" is rejected too.
    body = re.sub(r"^(?:PREFIX\s+[^\s:]*:\s*<[^>]*>\s*)+", "", q_u)
    if not body.startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed (optionally with PREFIX).")

    # Note: this scan also sees string literals and IRIs, so it may reject harmless queries.
    forbidden = [
        "INSERT","DELETE","LOAD","CLEAR","CREATE","DROP","MOVE","COPY","ADD",
        "SERVICE","WITH","USING","GRAPH"
    ]
    for kw in forbidden:
        if re.search(rf"\b{kw}\b", q_u):
            raise ValueError(f"Forbidden SPARQL keyword detected: {kw}")

    limit_re = re.compile(r"\bLIMIT\s+(\d+)\b", flags=re.IGNORECASE)
    if limit_re.search(q):
        # Cap every LIMIT (including those in subqueries) at max_limit.
        q = limit_re.sub(lambda m: f"LIMIT {min(int(m.group(1)), max_limit)}", q)
    else:
        q = q.rstrip() + f"\nLIMIT {max_limit}\n"
    return q

def strip_code_fences(text: str) -> str:
    t = text.strip()
    t = re.sub(r"^```[a-zA-Z]*\s*", "", t)
    t = re.sub(r"\s*```$", "", t)
    return t.strip()

# Try to load the LangChain helper (optional)
try:
    from langchain_community.chains.graph_qa.neptune_sparql import extract_sparql as lc_extract_sparql
except Exception:
    lc_extract_sparql = None

def extract_sparql_from_llm(text: str) -> str:
    """Extracts the plain SPARQL query from an LLM response."""
    if lc_extract_sparql is not None:
        try:
            return lc_extract_sparql(text).strip()
        except Exception:
            pass
    t = strip_code_fences(text)
    # Keep PREFIX declarations that precede SELECT instead of dropping them.
    m = re.search(r"((?:PREFIX\b.*?)?\bSELECT\b.*)", t, flags=re.IGNORECASE | re.DOTALL)
    return (m.group(1).strip() if m else t)

def sparql_select_raw(query: str, max_rows: int = 200) -> List[Dict[str, Any]]:
    """
    Runs the query on the global graph 'g'.
    Adds the default prefixes and returns the result as a list of dicts.
    """
    # Access the global variable 'g' (rdflib.Graph)
    if 'g' not in globals():
        raise RuntimeError("Global graph 'g' not found via globals().")
    
    q = query.strip()
    if "PREFIX" not in q.upper():
        q = DEFAULT_PREFIXES + "\n" + q
    
    # Apply guardrails
    q = enforce_select_only(q, max_limit=max_rows)

    res = g.query(q)
    vars_ = [str(v) for v in res.vars]

    out: List[Dict[str, Any]] = []
    for row in res:
        item = {}
        for i, v in enumerate(vars_):
            val = row[i]
            item[v] = None if val is None else str(val)
        out.append(item)
    return out


# ==========================================
# 2. AGENT TOOLS (COMMAND PATTERN)
# ==========================================

class BaseAgentTool(ABC):
    """Abstract base class for all tools."""
    name: str = ""
    description: str = ""
    usage_guide: str = ""

    def get_prompt_signature(self) -> str:
        sig = inspect.signature(self.run)
        params = [
            f"{k}" 
            for k, v in sig.parameters.items() 
            if k != "self" and v.kind != inspect.Parameter.VAR_KEYWORD
        ]
        return f"{self.name}({', '.join(params)})"

    def get_documentation(self) -> str:
        return (
            f"- {self.get_prompt_signature()}\n"
            f"  Description: {self.description}\n"
            f"  When to use: {self.usage_guide}\n"
        )

    @abstractmethod
    def run(self, **kwargs) -> Any:
        pass


class ListProgramsTool(BaseAgentTool):
    name = "list_programs"
    description = "Lists all programs available in the project."
    usage_guide = "When the user asks 'Which programs are there?' or is looking for an entry point."

    def run(self, **kwargs) -> List[Dict[str, Any]]:
        q = """
        SELECT ?programName WHERE {
          ?program rdf:type ag:class_Program ;
                   dp:hasProgramName ?programName .
        } ORDER BY ?programName
        """
        return sparql_select_raw(q)


class CalledPousTool(BaseAgentTool):
    name = "called_pous"
    description = "Shows all POUs that are called by a program."
    usage_guide = "For questions about the call graph, structure, 'Who calls whom?'."

    def run(self, program_name: str, **kwargs) -> List[Dict[str, Any]]:
        # Escape backslashes and quotes so the name cannot break out of the SPARQL literal.
        name = program_name.replace("\\", "\\\\").replace('"', '\\"')
        q = f"""
        SELECT DISTINCT ?calleeName WHERE {{
          ?program rdf:type ag:class_Program ;
                   dp:hasProgramName "{name}" ;
                   op:containsPOUCall ?call .
          ?call op:callsPOU ?callee .
          OPTIONAL {{ ?callee dp:hasPOUName ?calleeName }}
        }} ORDER BY ?calleeName
        """
        return sparql_select_raw(q)


class PouCodeTool(BaseAgentTool):
    name = "pou_code"
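    description = "Fetches the ST code, language and report of a POU."
    usage_guide = "When the user asks about 'code', 'implementation' or 'content'."

    def run(self, pou_name: str, **kwargs) -> List[Dict[str, Any]]:
        # Escape backslashes and quotes so the name cannot break out of the SPARQL literal.
        name = pou_name.replace("\\", "\\\\").replace('"', '\\"')
        q = f"""
        SELECT ?lang ?code ?report WHERE {{
          ?pou dp:hasPOUName "{name}" .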
          OPTIONAL {{ ?pou dp:hasPOULanguage ?lang }}
          OPTIONAL {{ ?pou dp:hasPOUCode ?code }}
          OPTIONAL {{ ?pou dp:hasConsistencyReport ?report }}
        }}
        """
        return sparql_select_raw(q)


class SearchVariablesTool(BaseAgentTool):
    name = "search_variables"
    description = "Searches variables by name (substring)."
    usage_guide = "Questions about a 'variable', 'address', 'I/O' or signal names."

    def run(self, name_contains: str, **kwargs) -> List[Dict[str, Any]]:
        # Escape backslashes first, then quotes, so the needle stays inside the SPARQL literal.
        needle = name_contains.replace("\\", "\\\\").replace('"', '\\"')
        q = f"""
        SELECT DISTINCT ?name ?type ?addr WHERE {{
          ?var rdf:type ag:class_Variable ;
               dp:hasVariableName ?name ;
               dp:hasVariableType ?type .
          FILTER(CONTAINS(LCASE(STR(?name)), LCASE("{needle}")))
          OPTIONAL {{ ?var dp:hasHardwareAddress ?addr }}
        }} ORDER BY ?name
        """
        return sparql_select_raw(q)


class VariableTraceTool(BaseAgentTool):
    name = "variable_trace"
    description = "Analyzes write/read accesses to variables (data flow)."
    usage_guide = "Questions like 'Where does signal X come from?', 'Who uses variable Y?'."

    def run(self, name_contains: str, **kwargs) -> List[Dict[str, Any]]:
        # Escape backslashes first, then quotes, so the needle stays inside the SPARQL literal.
        needle = name_contains.replace("\\", "\\\\").replace('"', '\\"')
        q = f"""
        SELECT DISTINCT ?varName ?exprText ?calleeName WHERE {{
          ?var rdf:type ag:class_Variable ;
               dp:hasVariableName ?varName .
          FILTER(CONTAINS(LCASE(STR(?varName)), LCASE("{needle}")))
          
          OPTIONAL {{
            ?expr rdf:type ag:class_Expression ;
                  dp:hasExpressionText ?exprText ;
                  op:isExpressionCreatedBy ?var .
            OPTIONAL {{
              ?assign rdf:type ag:class_ParameterAssignment ;
                      op:assignsFrom ?expr .
              OPTIONAL {{
                ?pouCall rdf:type ag:class_POUCall ;
                         op:hasAssignment ?assign ;
                         op:callsPOU ?callee .
                OPTIONAL {{ ?callee dp:hasPOUName ?calleeName }}
              }}
            }}
          }}
        }}
        """
        return sparql_select_raw(q)

class PouCallersTool(BaseAgentTool):
    name = "pou_callers"
    description = "Finds out which programs or FBs call a POU (reverse call graph)."
    # 'What does ... do' is included here so that the planner picks up this tool
    usage_guide = "Use for questions like 'Who uses X?', 'Where is X used?' or, more generally, 'What does X do?' (to show the context)."

    def run(self, pou_name: str, **kwargs) -> List[Dict[str, Any]]:
        # Find all POUs (?caller) that contain a call (?call)
        # pointing at the target POU (?targetPou).
        # Escape backslashes and quotes so the name cannot break out of the SPARQL literal.
        name = pou_name.replace("\\", "\\\\").replace('"', '\\"')
        q = f"""
        SELECT DISTINCT ?callerName WHERE {{
          ?targetPou dp:hasPOUName "{name}" .
          ?call op:callsPOU ?targetPou .
          
          ?caller op:containsPOUCall ?call ;
                  dp:hasPOUName ?callerName .
        }} ORDER BY ?callerName
        """
        return sparql_select_raw(q)
    
class ExceptionAnalysisTool(BaseAgentTool):
    name = "exception_prep"
    description = "Analyzes a snapshot against routine signatures."
    usage_guide = "For concrete sensor values or a 'fault pattern'."

    def __init__(self, kg_store, index):
        self.kg = kg_store
        self.index = index

    def run(self, program_name: str, snapshot: Dict[str, Any], top_k: int = 5, **kwargs) -> Dict[str, Any]:
        # Note: the SignatureExtractor/SensorSnapshot classes must be defined in the notebook
        extractor = SignatureExtractor(self.kg)
        try:
            sig = extractor.extract_signature(program_name)
        except ValueError as e:
            return {"error": str(e)}
            
        snap = SensorSnapshot(program_name=program_name, sensor_values=snapshot)
        check_map = classify_checkable_sensors(snap, sig)
        similar = self.index.find_similar(sig, top_k=top_k)
        
        return {
            "signature": sig.as_dict(),
            "checkable": check_map,
            "similar": similar,
        }


class Text2SparqlTool(BaseAgentTool):
    name = "text2sparql_select"
    description = "Generates and runs a SPARQL SELECT query (fallback)."
    usage_guide = "Use ONLY when no other tool fits."

    def __init__(self, llm_invoke_fn: Callable, schema_card_text: str):
        self.llm_invoke = llm_invoke_fn
        self.schema_card = schema_card_text

    def run(self, question: str, max_rows: int = 50, **kwargs) -> Dict[str, Any]:
        system_prompt = f"""
        You are a SPARQL generator.
        Rules: SELECT only, use the prefixes (rdf, ag, dp, op).
        Schema:
        {self.schema_card}
        """
        raw = self.llm_invoke(system_prompt, question)
        # Extract the query with the helper functions defined above:
        q = extract_sparql_from_llm(raw)
        rows = sparql_select_raw(q, max_rows=max_rows)
        return {"sparql": q, "rows": rows}


# ==========================================
# 3. REGISTRY & SETUP
# ==========================================

class ToolRegistry:
    def __init__(self):
        self._tools: Dict[str, BaseAgentTool] = {}

    def register(self, tool: BaseAgentTool):
        self._tools[tool.name] = tool

    def get_system_prompt_part(self) -> str:
        parts = [t.get_documentation() for t in self._tools.values()]
        return "Available tools:\n" + "".join(parts)

    def execute(self, tool_name: str, args: Dict[str, Any]) -> Any:
        tool = self._tools.get(tool_name)
        if not tool:
            return {"error": f"Tool '{tool_name}' not found."}
        try:
            return tool.run(**args)
        except Exception as e:
            return {"error": f"Error in '{tool_name}': {e}"}

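# Init
registry = ToolRegistry()

# Simple tools (no external dependencies); without them, execute("pou_code", ...) returns "not found"
for _tool_cls in (ListProgramsTool, CalledPousTool, PouCodeTool,
                  SearchVariablesTool, VariableTraceTool, PouCallersTool):
    registry.register(_tool_cls())

# Complex tools (need objects from other cells; re-run this cell once
# 'kg', 'routine_index' and 'llm_invoke' exist)
if 'kg' in globals() and 'routine_index' in globals():
    registry.register(ExceptionAnalysisTool(kg, routine_index))

if 'llm_invoke' in globals() and 'SCHEMA_CARD' in globals():
    registry.register(Text2SparqlTool(llm_invoke, SCHEMA_CARD))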



In [6]:
from dataclasses import dataclass
from typing import Optional, Set, Tuple
from rdflib import URIRef, Literal, Namespace
from rdflib.namespace import RDF

AG = Namespace("http://www.semanticweb.org/AgentProgramParams/")
DP = Namespace("http://www.semanticweb.org/AgentProgramParams/dp_")
OP = Namespace("http://www.semanticweb.org/AgentProgramParams/op_")

@dataclass
class SensorSnapshot:
    program_name: str
    sensor_values: Dict[str, Any]

@dataclass
class RoutineSignature:
    pou_name: str
    reachable_pous: List[str]
    called_pou_names: List[str]
    used_variable_names: List[str]
    hardware_addresses: List[str]
    port_names: List[str]

    def as_dict(self) -> Dict[str, Any]:
        return {
            "pou_name": self.pou_name,
            "reachable_pous": self.reachable_pous,
            "called_pou_names": self.called_pou_names,
            "used_variable_names": self.used_variable_names,
            "hardware_addresses": self.hardware_addresses,
            "port_names": self.port_names,
        }

class KGStore:
    def __init__(self, graph: Graph):
        self.g = graph
        self._pou_by_name: Dict[str, URIRef] = {}
        self._build_cache()

    def _build_cache(self) -> None:
        for pou, _, name in self.g.triples((None, DP.hasPOUName, None)):
            if isinstance(name, Literal):
                self._pou_by_name[str(name)] = pou

    def pou_uri_by_name(self, pou_name: str) -> Optional[URIRef]:
        return self._pou_by_name.get(pou_name)

    def pou_name(self, pou_uri: URIRef) -> str:
        v = self.g.value(pou_uri, DP.hasPOUName)
        return str(v) if v else str(pou_uri)

    def get_reachable_pous(self, root_pou_uri: URIRef) -> Set[URIRef]:
        visited: Set[URIRef] = set()
        queue: List[URIRef] = [root_pou_uri]
        while queue:
            cur = queue.pop(0)
            if cur in visited:
                continue
            visited.add(cur)
            for call in self.g.objects(cur, OP.containsPOUCall):
                for called in self.g.objects(call, OP.callsPOU):
                    if isinstance(called, URIRef) and called not in visited:
                        queue.append(called)
        return visited

    def get_called_pous(self, pou_uri: URIRef) -> Set[URIRef]:
        called: Set[URIRef] = set()
        for call in self.g.objects(pou_uri, OP.containsPOUCall):
            for target in self.g.objects(call, OP.callsPOU):
                if isinstance(target, URIRef):
                    called.add(target)
        return called

    def get_used_variables(self, pou_uri: URIRef) -> Set[URIRef]:
        vars_: Set[URIRef] = set()
        for v in self.g.objects(pou_uri, OP.usesVariable):
            if isinstance(v, URIRef):
                vars_.add(v)
        for v in self.g.objects(pou_uri, OP.hasInternalVariable):
            if isinstance(v, URIRef):
                vars_.add(v)
        return vars_

    def get_variable_names(self, var_uri: URIRef) -> Set[str]:
        names: Set[str] = set()
        for _, _, name in self.g.triples((var_uri, DP.hasVariableName, None)):
            if isinstance(name, Literal):
                names.add(str(name))
        return names

    def get_hardware_address(self, var_uri: URIRef) -> Optional[str]:
        v = self.g.value(var_uri, DP.hasHardwareAddress)
        return str(v) if v else None

    def get_ports_of_pou(self, pou_uri: URIRef) -> Set[URIRef]:
        ports: Set[URIRef] = set()
        for p in self.g.objects(pou_uri, OP.hasPort):
            if isinstance(p, URIRef):
                ports.add(p)
        return ports

    def get_port_name(self, port_uri: URIRef) -> str:
        v = self.g.value(port_uri, DP.hasPortName)
        return str(v) if v else ""

kg = KGStore(g)

def tool_list_programs() -> List[Dict[str, Any]]:
    q = """
    SELECT ?programName WHERE {
      ?program rdf:type ag:class_Program ;
               dp:hasProgramName ?programName .
    } ORDER BY ?programName
    """
    return sparql_select_raw(q, max_rows=MAX_SPARQL_ROWS)

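def tool_get_program_overview(program_name: str) -> List[Dict[str, Any]]:
    # Escape backslashes and quotes so the name cannot break out of the SPARQL literal.
    name = program_name.replace("\\", "\\\\").replace('"', '\\"')
    q = f"""
    SELECT ?report WHERE {{
      ?program rdf:type ag:class_Program ;
               dp:hasProgramName "{name}" .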
      OPTIONAL {{ ?program dp:hasConsistencyReport ?report }}
    }}
    """
    return sparql_select_raw(q, max_rows=MAX_SPARQL_ROWS)

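def tool_get_called_pous(program_name: str) -> List[Dict[str, Any]]:
    # Escape backslashes and quotes so the name cannot break out of the SPARQL literal.
    name = program_name.replace("\\", "\\\\").replace('"', '\\"')
    q = f"""
    SELECT DISTINCT ?calleeName WHERE {{
      ?program rdf:type ag:class_Program ;
               dp:hasProgramName "{name}" ;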
               op:containsPOUCall ?call .
      ?call op:callsPOU ?callee .
      OPTIONAL {{ ?callee dp:hasPOUName ?calleeName }}
    }} ORDER BY ?calleeName
    """
    return sparql_select_raw(q, max_rows=MAX_SPARQL_ROWS)

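def tool_get_pou_code(pou_name: str) -> List[Dict[str, Any]]:
    # Escape backslashes and quotes so the name cannot break out of the SPARQL literal.
    name = pou_name.replace("\\", "\\\\").replace('"', '\\"')
    q = f"""
    SELECT ?lang ?code ?report WHERE {{
      ?pou dp:hasPOUName "{name}" .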
      OPTIONAL {{ ?pou dp:hasPOULanguage ?lang }}
      OPTIONAL {{ ?pou dp:hasPOUCode ?code }}
      OPTIONAL {{ ?pou dp:hasConsistencyReport ?report }}
    }}
    """
    return sparql_select_raw(q, max_rows=MAX_SPARQL_ROWS)

def tool_search_variables(name_contains: str) -> List[Dict[str, Any]]:
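    # Escape backslashes first, then quotes, so the needle stays inside the SPARQL literal.
    needle = name_contains.replace("\\", "\\\\").replace('"', '\\"')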
    q = f"""
    SELECT DISTINCT ?name ?type ?addr WHERE {{
      ?var rdf:type ag:class_Variable ;
           dp:hasVariableName ?name ;
           dp:hasVariableType ?type .
      FILTER(CONTAINS(LCASE(STR(?name)), LCASE(\"{needle}\")))
      OPTIONAL {{ ?var dp:hasHardwareAddress ?addr }}
    }} ORDER BY ?name
    """
    return sparql_select_raw(q, max_rows=MAX_SPARQL_ROWS)

def tool_get_variable_trace(name_contains: str) -> List[Dict[str, Any]]:
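    # Escape backslashes first, then quotes, so the needle stays inside the SPARQL literal.
    needle = name_contains.replace("\\", "\\\\").replace('"', '\\"')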
    q = f"""
    SELECT DISTINCT ?varName ?exprText ?calleeName WHERE {{
      ?var rdf:type ag:class_Variable ;
           dp:hasVariableName ?varName .
      FILTER(CONTAINS(LCASE(STR(?varName)), LCASE(\"{needle}\")))

      OPTIONAL {{
        ?expr rdf:type ag:class_Expression ;
              dp:hasExpressionText ?exprText ;
              op:isExpressionCreatedBy ?var .
        OPTIONAL {{
          ?assign rdf:type ag:class_ParameterAssignment ;
                  op:assignsFrom ?expr .
          OPTIONAL {{
            ?pouCall rdf:type ag:class_POUCall ;
                     op:hasAssignment ?assign ;
                     op:callsPOU ?callee .
            OPTIONAL {{ ?callee dp:hasPOUName ?calleeName }}
          }}
        }}
      }}
    }}
    """
    return sparql_select_raw(q, max_rows=MAX_SPARQL_ROWS)



## 6) Routine signatures + similarity index
Stores the signatures in a JSON file at `INDEX_PATH` (the `ChatBotRoutinen` folder beside the TTL) so that similarity checks are fast.
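
To make the similarity score concrete, here is a small standalone sketch of a weighted Jaccard comparison. It uses the same 0.55 / 0.25 / 0.20 weighting that `find_similar` in this section applies to hardware addresses, variable names and called POUs; the two signatures themselves are made up for illustration, and the dict-based layout is a simplification of `RoutineSignature`.

```python
from typing import Dict, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical signatures: addresses, variable names and callees are made up.
sig_a = {"hw": {"%IX0.0", "%QX0.1"}, "vars": {"bStart", "bStop"}, "called": {"FB_Motor"}}
sig_b = {"hw": {"%IX0.0"}, "vars": {"bStart", "bReset"}, "called": {"FB_Motor"}}

# Hardware addresses weigh most, then variable names, then called POUs.
WEIGHTS: Dict[str, float] = {"hw": 0.55, "vars": 0.25, "called": 0.20}

def similarity(x: Dict[str, Set[str]], y: Dict[str, Set[str]]) -> float:
    """Weighted sum of the per-feature Jaccard similarities."""
    return sum(w * jaccard(x[k], y[k]) for k, w in WEIGHTS.items())

score = similarity(sig_a, sig_b)
print(round(score, 4))  # 0.55 * 1/2 + 0.25 * 1/3 + 0.20 * 1/1
```

Because hardware addresses carry more than half of the weight, two routines without any hardware addresses can reach a score of at most 0.45 under this weighting.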

In [7]:
import json
from pathlib import Path

def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 0.0
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

class SignatureExtractor:
    def __init__(self, kg: KGStore):
        self.kg = kg

    def extract_signature(self, pou_name: str) -> RoutineSignature:
        pou_uri = self.kg.pou_uri_by_name(pou_name)
        if pou_uri is None:
            raise ValueError(f"POU '{pou_name}' not found in KG.")

        reachable = self.kg.get_reachable_pous(pou_uri)

        reachable_names: Set[str] = set()
        called_names: Set[str] = set()
        used_var_names: Set[str] = set()
        hw_addrs: Set[str] = set()
        port_names: Set[str] = set()

        for rp in reachable:
            reachable_names.add(self.kg.pou_name(rp))
            for callee in self.kg.get_called_pous(rp):
                called_names.add(self.kg.pou_name(callee))
            for var in self.kg.get_used_variables(rp):
                used_var_names |= self.kg.get_variable_names(var)
                ha = self.kg.get_hardware_address(var)
                if ha:
                    hw_addrs.add(ha)
            for port in self.kg.get_ports_of_pou(rp):
                pn = self.kg.get_port_name(port)
                if pn:
                    port_names.add(pn)

        return RoutineSignature(
            pou_name=pou_name,
            reachable_pous=sorted(reachable_names),
            called_pou_names=sorted(called_names),
            used_variable_names=sorted(used_var_names),
            hardware_addresses=sorted(hw_addrs),
            port_names=sorted(port_names),
        )

class RoutineIndex:
    def __init__(self, signatures: List[RoutineSignature]):
        self.signatures = signatures

    def save(self, path: str) -> None:
        Path(path).write_text(
            json.dumps([s.as_dict() for s in self.signatures], indent=2, ensure_ascii=False),
            encoding="utf-8"
        )

    @staticmethod
    def load(path: str) -> "RoutineIndex":
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        sigs = [RoutineSignature(**d) for d in data]
        return RoutineIndex(sigs)

    @staticmethod
    def build_from_kg(kg: KGStore, only_pous: Optional[List[str]] = None) -> "RoutineIndex":
        extractor = SignatureExtractor(kg)
        if only_pous is None:
            only_pous = sorted(kg._pou_by_name.keys())

        sigs: List[RoutineSignature] = []
        for name in only_pous:
            try:
                sigs.append(extractor.extract_signature(name))
            except Exception:
                pass
        return RoutineIndex(sigs)

    def find_similar(self, target: RoutineSignature, top_k: int = 5) -> List[Dict[str, Any]]:
        tgt_hw = set(target.hardware_addresses)
        tgt_vars = set(target.used_variable_names)
        tgt_called = set(target.called_pou_names)

        scored: List[Tuple[float, RoutineSignature]] = []
        for cand in self.signatures:
            cand_hw = set(cand.hardware_addresses)
            cand_vars = set(cand.used_variable_names)
            cand_called = set(cand.called_pou_names)

            sim_hw = jaccard(tgt_hw, cand_hw) if (tgt_hw or cand_hw) else 0.0
            sim_vars = jaccard(tgt_vars, cand_vars)
            sim_called = jaccard(tgt_called, cand_called)

            score = 0.55 * sim_hw + 0.25 * sim_vars + 0.20 * sim_called
            scored.append((score, cand))

        scored.sort(key=lambda x: x[0], reverse=True)
        return [{"score": round(s, 4), "pou_name": r.pou_name} for s, r in scored[:top_k]]

def classify_checkable_sensors(snapshot: SensorSnapshot, sig: RoutineSignature) -> Dict[str, str]:
    checkable_set = set(sig.used_variable_names) | set(sig.hardware_addresses)
    return {k: ("checkable" if k in checkable_set else "not_checkable") for k in snapshot.sensor_values.keys()}

# Build / Load index (Path and json are already imported at the top of this cell)
from json import JSONDecodeError

p = Path(INDEX_PATH)

def try_load_index(path: Path):
    try:
        if not path.exists() or path.stat().st_size == 0:
            return None
        # BOM-safe read, then strip whitespace
        raw = path.read_text(encoding="utf-8-sig").strip()
        if not raw:
            return None
        data = json.loads(raw)
        sigs = [RoutineSignature(**d) for d in data]
        return RoutineIndex(sigs)
    except (JSONDecodeError, UnicodeError, TypeError):
        # TypeError: the stored entries do not match the RoutineSignature fields
        return None

routine_index = try_load_index(p)
if routine_index is None:
    routine_index = RoutineIndex.build_from_kg(kg)
    p.parent.mkdir(parents=True, exist_ok=True)  # make sure the index folder exists
    routine_index.save(str(p))
    print("✅ RoutineIndex rebuilt & saved:", p)
else:
    print("✅ RoutineIndex loaded:", p)

✅ RoutineIndex loaded: D:\MA_Python_Agent\MSRGuard_Anpassung\KGs\ChatBotRoutinen\TestEvents_routine_index.json


## 7) LLM Setup
Planner + Text2SPARQL + Answerer.

In [8]:
import os
from typing import Callable

# === NEW: read the API key ===
# The key is read from your file and set as an environment variable.
key_path = r"C:\Users\Alexander Verkhov\Desktop\OpenAI API Key.txt"

try:
    with open(key_path, "r", encoding="utf-8") as f:
        # .strip() removes whitespace/newlines at the start and end
        api_key = f.read().strip()
        os.environ["OPENAI_API_KEY"] = api_key
    print("✅ OpenAI API key loaded from file.")
except Exception as e:
    print(f"❌ Error while loading the API key: {e}")
    # Optional: abort if the key is missing
    # raise e

def get_llm_invoke() -> Callable[[str, str], str]:
    if LLM_BACKEND == "openai":
        try:
            from langchain_openai import ChatOpenAI
            from langchain_core.messages import SystemMessage, HumanMessage
        except Exception as e:
            raise RuntimeError(
                "Please install langchain-openai + langchain-core.\n"
                "pip install -U langchain-openai langchain-core"
            ) from e

        # ChatOpenAI picks up os.environ["OPENAI_API_KEY"] automatically
        llm = ChatOpenAI(
            model=OPENAI_MODEL, 
            temperature=OPENAI_TEMPERATURE, 
            max_tokens=1200
        )

        def _invoke(system: str, user: str) -> str:
            msgs = [SystemMessage(content=system), HumanMessage(content=user)]
            return llm.invoke(msgs).content

        return _invoke

    raise ValueError("LLM_BACKEND not supported. Set LLM_BACKEND='openai' or extend the wrapper.")

llm_invoke = get_llm_invoke()
print("✅ LLM wrapper ready:", LLM_BACKEND, OPENAI_MODEL)

✅ OpenAI API key loaded from file.
✅ LLM wrapper ready: openai gpt-4o-mini


## 8) Text2SPARQL (Fallback)

In [9]:
TEXT2SPARQL_SYSTEM = f"""
You generate exclusively a SPARQL SELECT query for an RDF knowledge graph of a PLC program.
Rules:
- Return ONLY SPARQL (no explanation, no Markdown).
- SELECT only (no INSERT/DELETE/UPDATE, no SERVICE).
- Use the prefixes: rdf, ag, dp, op.
Schema Card:
{SCHEMA_CARD}
"""

def text2sparql(question: str) -> str:
    raw = llm_invoke(TEXT2SPARQL_SYSTEM, question)
    return extract_sparql_from_llm(raw).strip()

def tool_text2sparql_select(question: str, max_rows: int = 50) -> Dict[str, Any]:
    q = text2sparql(question)
    rows = sparql_select_raw(q, max_rows=max_rows)
    return {"sparql": q, "rows": rows}

In [10]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
import json

# ==========================================
# 1. RAG / SEMANTIC SEARCH SETUP
# ==========================================

def build_vector_index(kg_store, tool_registry):
    """
    Builds a FAISS index from POU names and code snippets.
    Uses the registry to fetch the code.
    """
    print("🔄 Building vector index...")
    docs = []
    
    # Iterate over all POUs known to the KG
    for pou_name in kg_store._pou_by_name.keys():
        try:
            # Fetch the code via the existing tool
            code_res = tool_registry.execute("pou_code", {"pou_name": pou_name})
            
            # Check that the result is valid
            if isinstance(code_res, list) and code_res and "code" in code_res[0]:
                code_text = code_res[0]["code"]
                if code_text:
                    # Build the document: name + code
                    # The code is truncated to 1000 characters for the embedding
                    content = f"POU Name: {pou_name}\nCode Content: {code_text[:1000]}"
                    meta = {"type": "POU", "name": pou_name}
                    docs.append(Document(page_content=content, metadata=meta))
        except Exception:
            pass  # skip faulty POUs

    if not docs:
        print("‚ö†Ô∏è Keine Dokumente f√ºr RAG gefunden.")
        return None

    # Embeddings initialisieren
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vector_store = FAISS.from_documents(docs, embeddings)
    print(f"‚úÖ Vektor-Index bereit mit {len(docs)} Dokumenten.")
    return vector_store

# Index einmalig bauen (ben√∂tigt 'kg' aus Cell 21 und 'registry' aus Cell 20)
if 'kg' in globals() and 'registry' in globals():
    vector_index = build_vector_index(kg, registry)
else:
    print("‚ö†Ô∏è 'kg' oder 'registry' nicht gefunden. Bitte vorherige Zellen ausf√ºhren!")
    vector_index = None

# Define a new tool for RAG
class SemanticSearchTool(BaseAgentTool):
    name = "semantic_search"
    description = "Searches semantically for POUs or logic based on descriptions (RAG)."
    usage_guide = "When the user gives vague descriptions (e.g. 'How does the emergency stop work?') and does not know the exact name. Also useful as a fallback."

    def __init__(self, vector_store):
        self.vs = vector_store

    def run(self, query: str, k: int = 3, **kwargs) -> List[Dict[str, Any]]:
        if not self.vs:
            return [{"error": "No vector index available."}]

        docs = self.vs.similarity_search(query, k=k)
        results = []
        for d in docs:
            results.append({
                "pou_name": d.metadata.get("name"),
                "snippet": d.page_content[:300] + "..."
            })
        return results
    
class GeneralSearchTool(BaseAgentTool):
    name = "general_search"
    description = "Searches universally for POUs, variables, or ports. Returns type and name."
    usage_guide = "Use when it is unclear whether a name is a POU, a variable, or a port (e.g. names containing dots)."

    def run(self, name_contains: str, **kwargs) -> List[Dict[str, Any]]:
        # Search for the exact string AND for the __dot__ variant used in URIs.
        # Escape backslashes first, then quotes, so the needle stays a valid SPARQL string literal.
        needle = name_contains.replace("\\", "\\\\").replace('"', '\\"')

        # In case the user asks about URIs (e.g. for debugging), also build the __dot__ variant
        needle_dot = needle.replace(".", "__dot__")

        q = f"""
        SELECT DISTINCT ?name ?type ?category WHERE {{
          {{
            ?s rdf:type ag:class_POU ;
               dp:hasPOUName ?name .
            BIND("POU" AS ?category)
          }}
          UNION
          {{
            ?s rdf:type ag:class_Variable ;
               dp:hasVariableName ?name ;
               dp:hasVariableType ?type .
            BIND("Variable" AS ?category)
          }}
          UNION
          {{
            ?s rdf:type ag:class_Port ;
               dp:hasPortName ?name ;
               dp:hasPortType ?type .
            BIND("Port" AS ?category)
          }}

          # Match both the plain name and parts of the URI
          FILTER(
            CONTAINS(LCASE(STR(?name)), LCASE("{needle}")) ||
            CONTAINS(LCASE(STR(?s)), LCASE("{needle_dot}"))
          )
        }} LIMIT 20
        """
        return sparql_select_raw(q)


class StringTripleSearchTool(BaseAgentTool):
    name = "string_triple_search"
    description = "Searches for a string as a substring in all triples (subject, predicate, object)."
    usage_guide = "Last fallback when the structured tools return no hits."

    def __init__(self, kg_store=None):
        self.kg_store = kg_store if kg_store is not None else globals().get("kg", None)
        self.graph = getattr(self.kg_store, "g", None) if self.kg_store is not None else globals().get("g", None)

    def _short_pred(self, p) -> str:
        s = str(p)
        if "#" in s:
            return s.split("#")[-1]
        return s.rstrip("/").split("/")[-1]

    def _resolve_pou_name(self, s) -> Optional[str]:
        try:
            DP_ns = globals().get("DP", None)
            if self.graph is not None and DP_ns is not None:
                v = self.graph.value(s, DP_ns.hasPOUName)
                if v is not None:
                    return str(v)
        except Exception:
            pass
        return None

    def run(
        self,
        term: str,
        max_hits: int = 20,
        context_lines: int = 2,
        only_predicates: Optional[List[str]] = None,
        **kwargs
    ) -> List[Dict[str, Any]]:
        if not self.graph:
            return [{"error": "Global graph not found. Run the graph loading cell first."}]

        import re
        from rdflib.term import Literal

        pat = re.compile(re.escape(term), re.IGNORECASE)
        pred_allow = set(only_predicates) if only_predicates else None

        results: List[Dict[str, Any]] = []
        for s, p, o in self.graph:
            p_short = self._short_pred(p)

            if pred_allow is not None and p_short not in pred_allow:
                continue

            in_s = bool(pat.search(str(s)))
            in_p = bool(pat.search(str(p)))
            in_o = bool(pat.search(str(o)))

            if not (in_s or in_p or in_o):
                continue

            item: Dict[str, Any] = {
                "subject": str(s),
                "subject_pou_name": self._resolve_pou_name(s),
                "predicate": str(p),
                "predicate_short": p_short,
                "match_in": [k for k, v in (("subject", in_s), ("predicate", in_p), ("object", in_o)) if v],
            }

            if isinstance(o, Literal):
                text = str(o)
                lines = text.splitlines()
                ctxs = []

                for i, line in enumerate(lines):
                    if pat.search(line):
                        start = max(0, i - context_lines)
                        end = min(len(lines), i + context_lines + 1)
                        ctxs.append({"line": i + 1, "context": "\n".join(lines[start:end])})
                        if len(ctxs) >= 5:
                            break

                item["object_type"] = "literal"
                item["object_preview"] = text[:400] + ("..." if len(text) > 400 else "")
                if ctxs:
                    item["contexts"] = ctxs
            else:
                item["object_type"] = type(o).__name__
                item["object_preview"] = str(o)

            results.append(item)
            if len(results) >= max_hits:
                break

        return results


# Register (if the index exists and the tool is not registered yet)
if vector_index:
    # FIX: access _tools directly because get_tool does not exist
    if "semantic_search" not in registry._tools:
        registry.register(SemanticSearchTool(vector_index))


# ==========================================
# 2. CHATBOT CLASS (History + Dynamic Planner)
# ==========================================

class ChatBot:
    def __init__(self, registry: ToolRegistry, llm_invoke_fn: Callable):
        self.registry = registry
        self.llm = llm_invoke_fn
        self.history = [] 

    def _get_dynamic_planner_prompt(self, retry_hint: str = "") -> str:
        tool_docs = self.registry.get_system_prompt_part()
        
        heuristics = []
        for tool in self.registry._tools.values():
            if tool.usage_guide:
                heuristics.append(f"- {tool.usage_guide} -> {tool.name}")
        
        retry_msg = f"\nWARNING - PREVIOUS ATTEMPT FAILED:\n{retry_hint}\n" if retry_hint else ""

        return f"""
You are a planner for a PLC knowledge-graph ChatBot.
Break the request down into tool calls.

{tool_docs}

STRATEGY FOR DOTTED NAMES (e.g. "GVL.Start"):
- A dot often indicates a variable, port, or instance.
- Use 'general_search' to find out what it is (POU vs. variable).
- If you are sure it is a variable -> 'variable_trace' or 'search_variables'.

Heuristics:
{chr(10).join(heuristics)}
- If several tool calls return no hits, use string_triple_search(term) as the last fallback
- Otherwise -> text2sparql_select

{retry_msg}

Output format (JSON ONLY):
{{
  "steps": [
    {{"tool": "tool_name", "args": {{"arg1": "wert1"}} }}
  ]
}}
"""

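    def _is_result_empty(self, results: Dict[str, Any]) -> bool:
        """Return True if no step produced a usable payload."""
        if not results:
            return True
        for val in results.values():
            # text2sparql_select returns {"sparql": ..., "rows": [...]}: only the rows count as a hit.
            if isinstance(val, dict) and "rows" in val and "error" not in val:
                if val["rows"]:
                    return False
                continue
            # Error dicts and any other non-empty payload count as "not empty".
            if val:
                return False
        return True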

    def _generate_split_hint(self, user_msg: str) -> str:
        """
        Scan the user request for dotted names and generate search suggestions for the parts.
        """
        # Find words containing a dot (e.g. GVL.Start)
        import re
        candidates = re.findall(r"([a-zA-Z0-9_]+\.[a-zA-Z0-9_]+)", user_msg)

        hint = "The previous search returned NO results.\n"

        if candidates:
            for c in candidates:
                parts = c.split('.')
                hint += f"The term '{c}' contains a dot. If the exact search failed, search for the parts individually:\n"
                hint += f" -> Try general_search('{parts[0]}') (container/instance?)\n"
                hint += f" -> Try general_search('{parts[1]}') (element/port?)\n"
        else:
            hint += "Try shortening the search term or using 'general_search'."

        return hint


    def _extract_identifier_candidates(self, user_msg: str) -> List[str]:
        """
        Extract possible identifiers from the question.
        Only used for the final string fallback.
        """
        import re

        # Stop words: the German originals plus their English equivalents
        stop = {
            "wo","wird","ist","sind","warum","wie","was","wer","welche","welcher","welches",
            "implementiert","implementierung","genau","bitte","frage","antwort",
            "in","im","am","an","auf","von","zu","mit","ohne","für","und","oder","der","die","das","ein","eine","einer","eines",
            "where","is","are","why","how","what","who","which",
            "implemented","implementation","exactly","please","question","answer",
            "at","on","of","from","to","with","without","for","and","or","the","a",
        }

        candidates: List[str] = []

        # 1) Prioritize quoted content
        candidates += re.findall(r"'([^']{2,80})'", user_msg)
        candidates += re.findall(r'"([^"]{2,80})"', user_msg)

        # 2) Tokens containing a digit (very typical for skills, e.g. TestSkill3)
        candidates += re.findall(r"\b[A-Za-z_]*[A-Za-z]+[A-Za-z_]*\d+[A-Za-z0-9_]*\b", user_msg)

        # 3) General identifiers (fallback)
        candidates += re.findall(r"\b[A-Za-z_][A-Za-z0-9_]{2,}\b", user_msg)

        out: List[str] = []
        seen = set()
        for t in candidates:
            t = t.strip()
            if not t:
                continue
            if t.lower() in stop:
                continue
            key = t.lower()
            if key in seen:
                continue
            seen.add(key)
            out.append(t)

        return out[:5]


    def chat(self, user_msg: str, debug: bool = True):
        # 1. Context & plan
        history_text = "Previous chat:\n" + "\n".join([f"{r}: {m}" for r, m in self.history[-3:]])
        full_prompt_input = f"{history_text}\nCurrent request: {user_msg}"

        # Helper: tolerant JSON parser for LLM output
        def safe_json_loads(s):
            import re
            t = s.strip().replace("```json", "").replace("```", "")
            m = re.search(r"(\{.*\})", t, flags=re.DOTALL)
            return json.loads(m.group(1) if m else t)

        # === ATTEMPT 1 ===
        planner_sys = self._get_dynamic_planner_prompt()
        plan_raw = self.llm(planner_sys, full_prompt_input)

        try:
            plan = safe_json_loads(plan_raw)
        except Exception:
            # Planner output was not valid JSON: fall back to Text2SPARQL
            plan = {"steps": [{"tool": "text2sparql_select", "args": {"question": user_msg}}]}

        results = {}
        for i, step in enumerate(plan.get("steps", []), 1):
            t_name = step.get("tool")
            t_args = step.get("args", {})
            results[f"step_{i}_{t_name}"] = self.registry.execute(t_name, t_args)

        # === ATTEMPT 2 (retry with split logic) ===
        if self._is_result_empty(results):
            if debug: print("⚠️ No results. Starting smart retry (split search)...")

            # Generate the targeted hint for the planner
            split_hint = self._generate_split_hint(user_msg)

            planner_sys_retry = self._get_dynamic_planner_prompt(retry_hint=split_hint)
            plan_retry_raw = self.llm(planner_sys_retry, full_prompt_input)

            try:
                new_plan = safe_json_loads(plan_retry_raw)
                if new_plan.get("steps"):
                    plan = new_plan
                    results = {}
                    for i, step in enumerate(plan.get("steps", []), 1):
                        t_name = step.get("tool")
                        t_args = step.get("args", {})
                        results[f"step_{i}_{t_name}_retry"] = self.registry.execute(t_name, t_args)
            except Exception:
                # Ignore a retry plan that cannot be parsed or executed
                pass


        # === ATTEMPT 3 (deterministic string fallback) ===
        if self._is_result_empty(results):
            if debug:
                print("⚠️ Still no results. Starting string fallback (triple scan)...")

            terms = self._extract_identifier_candidates(user_msg)
            if not terms:
                terms = [user_msg.strip()[:50]]

            for term in terms:
                res3 = self.registry.execute(
                    "string_triple_search",
                    {
                        "term": term,
                        "max_hits": 20,
                        "context_lines": 3,
                        "only_predicates": ["dp_hasPOUCode", "dp_hasProgramCode", "dp_hasExpressionText"],
                    },
                )
                results[f"step_string_triple_search_{term}"] = res3

                if isinstance(res3, list) and res3 and not (isinstance(res3[0], dict) and "error" in res3[0]):
                    break


        # 4. Answer
        answer_sys = """
        You are a helpful PLC expert. Use ONLY the tool results.
        Explain how things relate.
        """
        payload = {
            "history": self.history[-3:], 
            "current_question": user_msg,
            "final_plan": plan,
            "tool_results": results
        }
        answer = self.llm(answer_sys, json.dumps(payload, ensure_ascii=False, indent=2))

        self.history.append(("User", user_msg))
        self.history.append(("AI", answer))

        return {"answer": answer, "plan": plan, "results": results}

# Standard tools
registry.register(GeneralSearchTool())
registry.register(ListProgramsTool())
registry.register(CalledPousTool())
registry.register(PouCodeTool())
registry.register(SearchVariablesTool())
registry.register(VariableTraceTool())
registry.register(PouCallersTool())
registry.register(StringTripleSearchTool(kg if "kg" in globals() else None))

print("✅ SPARQL helper + OOP tool registry initialized successfully.")
print("-" * 30)
print(registry.get_system_prompt_part())

# Instantiate the bot
bot = ChatBot(registry, llm_invoke)
print("✅ Smart ChatBot (with general search & split retry) ready.")

🔄 Building vector index...
⚠️ No documents found for RAG.
✅ SPARQL helper + OOP tool registry initialized successfully.
------------------------------
Available tools:
- general_search(name_contains)
  Description: Searches universally for POUs, variables, or ports. Returns type and name.
  When to use: Use when it is unclear whether a name is a POU, a variable, or a port (e.g. names containing dots).
- list_programs()
  Description: Lists all available programs in the project.
  When to use: When the user asks 'Which programs are there?' or is looking for an entry point.
- called_pous(program_name)
  Description: Shows all POUs called by a program.
  When to use: For questions about the call graph, structure, 'Who calls whom?'.
- pou_code(pou_name)
  Description: Fetches ST code, language, and report of a POU.
  When to use: When the user asks for 'code', 'implementation', or 'content'.
- search_variables(name_contains)
  Description:
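
The planner is asked to reply with JSON only, but a reply can still arrive wrapped in prose or code fences, or contain malformed steps. Below is a minimal sketch of a stricter parser. `parse_plan` is a hypothetical helper, not part of the notebook. It assumes the `{"steps": [...]}` format from the planner prompt and falls back to a single `text2sparql_select` step, mirroring what `ChatBot.chat` does when parsing fails.

```python
import json
import re
from typing import Any, Dict, List

FENCE_RE = re.compile(r"```(?:json)?", re.IGNORECASE)


def parse_plan(raw: str, question: str) -> Dict[str, List[Dict[str, Any]]]:
    """Parse the planner reply into {"steps": [...]}, falling back to text2sparql_select."""
    fallback = {"steps": [{"tool": "text2sparql_select", "args": {"question": question}}]}

    # Drop code-fence markers, then take the outermost {...} span if there is one.
    text = FENCE_RE.sub("", raw).strip()
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    try:
        plan = json.loads(match.group(0) if match else text)
    except json.JSONDecodeError:
        return fallback

    steps = plan.get("steps") if isinstance(plan, dict) else None
    if not isinstance(steps, list):
        return fallback

    # Keep only well-formed steps: a tool name plus a dict of arguments.
    clean = [
        {"tool": s["tool"], "args": s.get("args") or {}}
        for s in steps
        if isinstance(s, dict)
        and isinstance(s.get("tool"), str)
        and isinstance(s.get("args") or {}, dict)
    ]
    return {"steps": clean} if clean else fallback
```

Validating each step before execution keeps a malformed entry from reaching `registry.execute`, at the cost of silently dropping it; logging the dropped steps in debug mode would be a reasonable extension.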

## 11) Chat UI (ipywidgets)

In [None]:
import ipywidgets as widgets
from IPython.display import display, Markdown
import json

# 1. Define the widgets
debug_toggle = widgets.Checkbox(value=True, description="Debug (show plan + tool results)")
input_box = widgets.Textarea(
    placeholder="Ask a question... (e.g. 'What does the emergency stop do?')",
    layout=widgets.Layout(width="100%", height="90px")
)
send_btn = widgets.Button(description="Send", button_style="primary")
out = widgets.Output()

display(debug_toggle, input_box, send_btn, out)

# 2. Event handler logic
def on_send(b):
    # Disable immediately to prevent double clicks
    b.disabled = True

    try:
        user_msg = input_box.value.strip()
        out.clear_output(wait=True)  # wait=True prevents flicker

        if not user_msg:
            return

        with out:
            print(f"User: {user_msg}")

            # Call the bot
            try:
                # Uses the global 'bot' object
                resp = bot.chat(user_msg, debug=debug_toggle.value)

                display(Markdown("### Answer"))
                print(resp["answer"])

                if debug_toggle.value:
                    display(Markdown("### Plan"))
                    print(json.dumps(resp["plan"], ensure_ascii=False, indent=2))

                    display(Markdown("### Tool Results (truncated)"))
                    print(json.dumps(resp["results"], ensure_ascii=False, indent=2)[:8000])

            except Exception as e:
                print(f"❌ An error occurred: {e}")

    finally:
        # Always re-enable the button at the end
        b.disabled = False

# 3. Register the handler
# Important: the button is re-created above, so it is "fresh".
# A single .on_click call is enough.
send_btn.on_click(on_send)

Checkbox(value=True, description='Debug (show plan + tool results)')

Textarea(value='', layout=Layout(height='90px', width='100%'), placeholder="Ask a question... (e.g. 'What does …

Button(button_style='primary', description='Send', style=ButtonStyle())

Output()
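
The click handler locks the button first and unlocks it in a `finally` block so that a failing bot call cannot leave the UI stuck. The same pattern can be factored into a small context manager, sketched below. `busy` is a hypothetical helper, not part of the notebook. It only needs an object with a `disabled` attribute, so a `SimpleNamespace` stands in for `widgets.Button` here and the sketch runs without a notebook frontend.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def busy(button):
    """Disable a button-like object for the duration of the block, re-enabling it even on errors."""
    button.disabled = True
    try:
        yield button
    finally:
        button.disabled = False


# Stand-in for a widgets.Button (assumption: only the 'disabled' attribute matters here).
fake_button = SimpleNamespace(disabled=False)

with busy(fake_button):
    state_inside = fake_button.disabled  # the button is locked while the body runs

state_after = fake_button.disabled  # the button is unlocked again afterwards

# The button is also released when the body raises.
try:
    with busy(fake_button):
        raise RuntimeError("simulated bot failure")
except RuntimeError:
    state_after_error = fake_button.disabled
```

Inside `on_send`, the body of the `try` block would move under `with busy(b):`, which removes the need for the explicit `finally`.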

## 12) Quick Tests

In [12]:
tests = [
    "Which programs are there?",
    "Which POUs does HRL_SkillSet call?",
    "Show me the code of JobMethode_Schablone",
    "Search for variables containing 'NotAus'",
    "Trace for DI04_EncoderStart02",
]

for q in tests:
    print("\n---\n", q)
    try:
        # Call the ChatBot instance created above
        resp = bot.chat(q, debug=False)
        print(resp["answer"][:700])
    except Exception as e:
        print("Error:", e)
