# Agentic Text2Cypher Inference

This notebook runs agentic-loop inference for the Text2Cypher pipeline.

## Features:
- Self-correction mechanism with at most 3 iterations
- 9 configurations (3 prompts × 3 schemas)
- Tracking per-question dan per-iteration
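
As a rough illustration of the self-correction feature listed above, here is a minimal sketch of a bounded generate/validate/retry loop. The `LoopState` class and the `generate` and `validate` callables are hypothetical stand-ins, not the pipeline's actual classes; only the attribute names `success`, `total_iterations`, and `final_query` mirror what this notebook prints later.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 3  # matches the limit stated in the feature list


@dataclass
class LoopState:
    """Per-question record (hypothetical; field names are illustrative)."""
    question: str
    # One (query, error) pair per iteration; error is None when accepted.
    attempts: List[Tuple[str, Optional[str]]] = field(default_factory=list)
    final_query: Optional[str] = None
    success: bool = False

    @property
    def total_iterations(self) -> int:
        return len(self.attempts)


def run_self_correction(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_iterations: int = MAX_ITERATIONS,
) -> LoopState:
    """Generate a query, validate it, and retry with the error as feedback."""
    state = LoopState(question=question)
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        query = generate(question, feedback)  # feedback is None on the first pass
        error = validate(query)               # None means the query was accepted
        state.attempts.append((query, error))
        state.final_query = query
        if error is None:
            state.success = True
            break
        feedback = error                      # fed into the next generation
    return state
```

In this sketch `generate` would wrap the LLM call and `validate` would run the query against the graph and return an error message, or `None` on success. The `attempts` list gives the per-iteration trace for each question.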

In [1]:
# Set up paths - IMPORTANT: run this cell first!
import sys
import os
from pathlib import Path

# Get the agentic directory (parent of notebooks)
agentic_dir = Path.cwd().parent
project_root = agentic_dir

# Change working directory to agentic folder
os.chdir(agentic_dir)

# Add to Python path
if str(agentic_dir) not in sys.path:
    sys.path.insert(0, str(agentic_dir))

print(f"Working directory: {os.getcwd()}")
print(f"Python path includes: {agentic_dir}")

Working directory: /Users/tsimiscouse/Docs/Sarjana/Skripsi/kg-luthfi/agentic
Python path includes: /Users/tsimiscouse/Docs/Sarjana/Skripsi/kg-luthfi/agentic
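
Note that the cell above is not idempotent: it takes `Path.cwd().parent` and then calls `os.chdir`, so re-running it moves the working directory up one more level each time. A possible alternative is sketched below. It searches upward for a marker directory instead of assuming a fixed depth. The helper name `find_project_root` and the choice of `config` as the marker are assumptions (the latter inferred from the `config.settings` import used later), not part of the project.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "config") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None  # marker not found anywhere above `start`
```

Calling `find_project_root(Path.cwd())` returns the same directory whether the kernel is in `notebooks/` or already in the project folder, so the setup cell can be run repeatedly.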


In [2]:
# Load environment variables
from dotenv import load_dotenv
load_dotenv()

import os
print(f"GROQ_API_KEY set: {'Yes' if os.getenv('GROQ_API_KEY') else 'No'}")
print(f"NEO4J_URI set: {'Yes' if os.getenv('NEO4J_URI') else 'No'}")

GROQ_API_KEY set: Yes
NEO4J_URI set: Yes


In [3]:
# Reload .env from the agentic directory
load_dotenv("/Users/tsimiscouse/Docs/Sarjana/Skripsi/kg-luthfi/agentic/.env", override=True)

api_key = os.getenv("GROQ_API_KEY")
print(f"API Key loaded: {api_key[:20]}..." if api_key else "API Key NOT loaded!")

API Key loaded: gsk_3cnZbdKzvw4uXizF...
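
Printing the first 20 characters of a key leaves a sizeable part of the secret in the saved notebook output. A minimal sketch of a masking helper follows. `mask_secret` is a hypothetical name, not something the project provides.

```python
from typing import Optional


def mask_secret(value: Optional[str], visible: int = 4) -> str:
    """Show only the last `visible` characters of a secret."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)  # too short to reveal anything safely
    return "*" * (len(value) - visible) + value[-visible:]
```

It could be used as `print(f"API Key loaded: {mask_secret(api_key)}")`, which still confirms that the key was loaded and how long it is.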


In [4]:
# Test the Groq API with the latest model
import os
from dotenv import load_dotenv

# Force reload .env
env_path = "/Users/tsimiscouse/Docs/Sarjana/Skripsi/kg-luthfi/agentic/.env"
load_dotenv(env_path, override=True)

api_key = os.getenv("GROQ_API_KEY")
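# Guard against a missing key so the slice does not raise TypeError on None
print(f"API Key: {api_key[:20]}..." if api_key else "API Key NOT loaded!")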
print(f"API Key length: {len(api_key) if api_key else 0}")

# Test with OpenAI client (Groq compatible)
from openai import OpenAI

client = OpenAI(
    api_key=api_key,
    base_url="https://api.groq.com/openai/v1"
)

# Model updated: qwen-2.5-coder-32b deprecated, using qwen/qwen3-32b
MODEL = "qwen/qwen3-32b"

try:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Say hello in Indonesian"}],
        max_tokens=50
    )
    print(f"Model: {MODEL}")
    print(f"SUCCESS: {response.choices[0].message.content}")
except Exception as e:
    print(f"ERROR: {e}")

API Key: gsk_3cnZbdKzvw4uXizF...
API Key length: 56
Model: qwen/qwen3-32b
SUCCESS: <think>
Okay, the user asked to say hello in Indonesian. Let me think about how to respond. The most common greeting in Indonesian is "Halo." That's straightforward. But maybe I should consider if there are other greetings too. For example
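
The response above opens with a `<think>` block and is cut off by `max_tokens=50` before any answer appears. If such reasoning text has to be removed before a Cypher query is extracted, one option is a small regex helper like the sketch below. Whether the pipeline already does this is not shown in this notebook, and `strip_reasoning` is a hypothetical name.

```python
import re

# Matches a complete <think>...</think> block, or an unclosed one that runs
# to the end of the text (as happens when the response is truncated).
_THINK_RE = re.compile(r"<think>.*?(?:</think>|\Z)", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think> blocks and surrounding whitespace from a model response."""
    return _THINK_RE.sub("", text).strip()
```

Applied to the truncated response above, this returns an empty string, which is a useful signal that the token budget was spent entirely on reasoning.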


In [5]:
# Import modules
from config.settings import Settings
from config.llm_config import LLMConfig
from data.ground_truth_loader import GroundTruthLoader
from prompts.prompt_manager import PromptManager, PromptType, SchemaFormat
from experiment.experiment_runner import ExperimentRunner, run_experiment
from experiment.batch_processor import BatchProcessor

## 1. Load Ground Truth Data

In [6]:
# Load ground truth
loader = GroundTruthLoader()
ground_truth = loader.load()

print(f"Total questions: {len(ground_truth)}")
print(f"\nStatistics:")
stats = loader.get_statistics()
for key, value in stats.items():
    print(f"  {key}: {value}")

Total questions: 52

Statistics:
  total_items: 52
  by_complexity: {'Easy': 16, 'Medium': 20, 'Hard': 16}
  by_reasoning_level: {'Fakta Eksplisit': 16, 'Fakta Implisit': 36}
  by_sublevel: {'Nodes': 16, 'One-hop': 20, 'Multi-hop': 16}


In [7]:
# Preview first 5 questions
import pandas as pd

df_preview = pd.DataFrame([item.to_dict() for item in ground_truth[:5]])
df_preview

Unnamed: 0,id,reasoning_level,sublevel,complexity,question,cypher_query
0,1,Fakta Eksplisit,Nodes,Easy,Berapakah jumlah SKS untuk mata kuliah 'Aljaba...,MATCH (n:MK {nama: 'Aljabar Linear'}) RETURN n...
1,2,Fakta Eksplisit,Nodes,Easy,Apakah nama lengkap dari mata kuliah dengan ko...,MATCH (n:MK {kode: 'TKU211131'}) RETURN n.nama
2,3,Fakta Eksplisit,Nodes,Easy,Bagaimanakah deskripsi untuk Capaian Lulusan (...,MATCH (n:SO {kode: 'KP.1'}) RETURN n.deskripsi
3,4,Fakta Eksplisit,Nodes,Easy,Tampilkan seluruh mata kuliah yang ditawarkan ...,MATCH (n:MK) WHERE n.semester = 1 RETURN n.nama
4,5,Fakta Eksplisit,Nodes,Easy,Tampilkan nama dan kode untuk semua mata kulia...,MATCH (n:MK) WHERE n.tipe = 'Pilihan' RETURN n...


## 2. Test Single Question

In [8]:
# Initialize components for testing
settings = Settings()
llm_config = LLMConfig()

print(f"Max iterations: {settings.max_iterations}")
print(f"LLM Model: {llm_config.model}")

Max iterations: 3
LLM Model: qwen/qwen3-32b


In [9]:
# Test with single question
from experiment.batch_processor import create_batch_processor

processor = create_batch_processor()

# Process first question with Zero-Shot + Full Schema
test_item = ground_truth[0]
print(f"Question: {test_item.question}")
print(f"Ground Truth: {test_item.cypher_query}")

Question: Berapakah jumlah SKS untuk mata kuliah 'Aljabar Linear'?
Ground Truth: MATCH (n:MK {nama: 'Aljabar Linear'}) RETURN n.sks


In [10]:
# Run single question
state = processor.process_single(
    item=test_item,
    prompt_type=PromptType.ZERO_SHOT,
    schema_format=SchemaFormat.FULL_SCHEMA
)

print(f"\nResult:")
print(f"  Success: {state.success}")
print(f"  Iterations: {state.total_iterations}")
print(f"  Final Query: {state.final_query}")

processor.close()


Result:
  Success: True
  Iterations: 1
  Final Query: MATCH (m:MK {nama: 'Aljabar Linear'}) RETURN m.sks AS jumlah_sks
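
The generated query differs textually from the ground truth (variable `m` instead of `n`, plus an `AS jumlah_sks` alias) yet is still reported as a success, which suggests the check is not a plain string match. How the pipeline decides success is not shown here. The sketch below is one possible approach under that assumption: compare the returned rows as multisets of values and ignore column names. The function name and the sample rows in the usage note are illustrative.

```python
from collections import Counter
from typing import Any, Dict, List


def same_result_values(rows_a: List[Dict[str, Any]], rows_b: List[Dict[str, Any]]) -> bool:
    """Compare two query results by row values, ignoring column names and row order.

    Assumes every value is hashable and that columns appear in the same order.
    """
    def normalize(rows: List[Dict[str, Any]]) -> Counter:
        return Counter(tuple(row.values()) for row in rows)

    return normalize(rows_a) == normalize(rows_b)
```

For example, `same_result_values([{"n.sks": 3}], [{"jumlah_sks": 3}])` treats the two rows as equal because only the alias differs (the value 3 is made up for illustration).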


## 3. Run Single Configuration

In [11]:
# Run single configuration (e.g., Zero-Shot + Full Schema)
from experiment.experiment_runner import ExperimentRunner

runner = ExperimentRunner()

# Limit to first 5 questions for testing
result = runner.run_configuration(
    prompt_type=PromptType.ZERO_SHOT,
    schema_format=SchemaFormat.FULL_SCHEMA,
    ground_truth_items=ground_truth[:5]
)

print(f"Configuration: {result.config_name}")
print(f"Pass@1 Rate: {result.pass_at_1_rate:.2f}%")
print(f"KG Valid Rate: {result.kg_valid_rate:.2f}%")
print(f"LLMetric: {result.llmetric:.2f}")

runner.close()

The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()


Configuration: Zero-Shot_Full
Pass@1 Rate: 80.00%
KG Valid Rate: 100.00%
LLMetric: 88.40
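
The BLEU warnings above appear whenever a generated query shares no 3-gram or 4-gram with its reference: the geometric mean of the n-gram precisions then collapses to 0. The standard-library sketch below illustrates why, and how an epsilon-style smoothing (the idea behind the `SmoothingFunction()` the warning suggests) avoids it. This is an illustration only, not the implementation the pipeline uses. The `bleu` function, its `epsilon` parameter, and the toy token lists in the usage note are assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(reference: List[str], hypothesis: List[str],
         max_n: int = 4, epsilon: float = 0.0) -> float:
    """Sentence BLEU with one reference and uniform weights.

    With epsilon > 0, an n-gram order with zero matches contributes
    epsilon / total instead of 0.
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0  # hypothesis has fewer than n tokens (simplification)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if matches == 0:
            if epsilon == 0.0:
                return 0.0  # one zero precision zeroes the whole score
            precision = epsilon / total
        else:
            precision = matches / total
        log_precisions.append(math.log(precision))
    # Brevity penalty for hypotheses no longer than the reference.
    if len(hypothesis) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(hypothesis))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

With `reference = "a b c d e".split()` and `hypothesis = "a b c x e".split()`, no 4-gram matches, so `bleu(reference, hypothesis)` is 0 while `bleu(reference, hypothesis, epsilon=0.1)` is a small positive number.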


## 4. Run All 9 Configurations
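
The nine configurations are the Cartesian product of three prompt types and three schema formats. The sketch below enumerates such a grid with stand-in enums. `DemoPromptType` and `DemoSchemaFormat` are hypothetical: the real `PromptType` and `SchemaFormat` live in `prompts.prompt_manager`. Only `FULL_SCHEMA` and the name `Zero-Shot_Full` appear in this notebook, so the other schema members and the string values are placeholders.

```python
from enum import Enum
from itertools import product


class DemoPromptType(Enum):
    ZERO_SHOT = "Zero-Shot"
    FEW_SHOT = "Few-Shot"          # string value is a placeholder
    CHAIN_OF_THOUGHT = "CoT"       # string value is a placeholder


class DemoSchemaFormat(Enum):
    FULL_SCHEMA = "Full"
    SCHEMA_B = "SchemaB"           # placeholder for the second schema format
    SCHEMA_C = "SchemaC"           # placeholder for the third schema format


def config_name(prompt: DemoPromptType, schema: DemoSchemaFormat) -> str:
    """Build a label in the style of 'Zero-Shot_Full'."""
    return f"{prompt.value}_{schema.value}"


# Every (prompt, schema) pair: 3 x 3 = 9 configurations.
all_configs = list(product(DemoPromptType, DemoSchemaFormat))
```
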

In [15]:
# Run all configurations (FULL EXPERIMENT)
# WARNING: This will take a long time!

# Full experiment over every question (currently enabled):
results = run_experiment()

# Or limit to the first N questions for testing:
# results = run_experiment(max_questions=10)

The hypothesis contains 0 counts of 3-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
The hypothesis contains 0 counts of 4-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()

In [None]:
# Run specific configurations only
specific_configs = [
    (PromptType.ZERO_SHOT, SchemaFormat.FULL_SCHEMA),
    (PromptType.FEW_SHOT, SchemaFormat.FULL_SCHEMA),
    (PromptType.CHAIN_OF_THOUGHT, SchemaFormat.FULL_SCHEMA),
]

# Uncomment to run:
# results = run_experiment(max_questions=5, configurations=specific_configs)

## 5. View Results

In [13]:
# Load saved results
import json

results_path = project_root / "results" / "experiment_summary.json"

if results_path.exists():
    with open(results_path, "r") as f:
        summary = json.load(f)
    
    print(f"Experiment completed at: {summary['timestamp']}")
    print(f"Total configurations: {summary['total_configurations']}")
else:
    print("No results found. Run the experiment first.")

Experiment completed at: 2025-12-30T15:33:39.924275
Total configurations: 9


In [14]:
# Display results as DataFrame
if results_path.exists():
    configs = summary.get("configurations", {})
    
    rows = []
    for name, config in configs.items():
        rows.append({
            "Configuration": name,
            "Pass@1 Rate": f"{config['pass_at_1_rate']:.2f}%",
            "KG Valid Rate": f"{config['kg_valid_rate']:.2f}%",
            "LLMetric": f"{config['llmetric']:.2f}",
            "Avg Iterations": f"{config['agentic_metrics']['avg_iterations']:.2f}" if config.get('agentic_metrics') else "N/A",
            "Recovery Rate": f"{config['agentic_metrics']['recovery_rate']*100:.2f}%" if config.get('agentic_metrics') else "N/A",
        })
    
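    df_results = pd.DataFrame(rows)
    # A bare expression inside an `if` block is not auto-displayed by Jupyter,
    # so call display() (provided by IPython) explicitly.
    display(df_results)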
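
Because the table above stores each metric as a formatted string, sorting it directly would compare text rather than numbers. The sketch below ranks configurations from the raw `summary` dictionary instead. It assumes the same `configurations` layout that the cell above reads. `rank_configurations` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def rank_configurations(summary: Dict[str, Any],
                        metric: str = "pass_at_1_rate") -> List[Tuple[str, float]]:
    """Return (configuration name, metric value) pairs, best first."""
    configs = summary.get("configurations", {})
    pairs = [(name, cfg[metric]) for name, cfg in configs.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_configurations(summary)[0]` gives the configuration with the highest Pass@1 rate. Passing `metric="kg_valid_rate"` or `metric="llmetric"` ranks by those keys instead.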