# Agentic Text2Cypher Inference (Ollama - Local LLM)

This notebook runs agentic-loop inference using **Ollama** (local LLM).

## Features:
- Self-correction mechanism with at most 3 iterations
- 9 configurations (3 prompts × 3 schemas)
- Tracking per question and per iteration
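
The self-correction mechanism listed above can be sketched as a bounded generate, validate, retry loop. This is only an illustration of the idea, not the project's implementation: `LoopState`, `run_self_correction`, `fake_generate`, and `fake_validate` are hypothetical names, and the stand-in generator and validator exist only so the sketch runs without an LLM or a Neo4j instance. The fields `success`, `total_iterations`, and `final_query` mirror the attributes printed later in this notebook.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 3  # the limit stated in the feature list


@dataclass
class LoopState:
    """Hypothetical per-question record; not the project's actual state class."""
    success: bool = False
    total_iterations: int = 0
    final_query: Optional[str] = None
    errors: List[str] = field(default_factory=list)


def run_self_correction(
    question: str,
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str], Tuple[bool, str]],
    max_iterations: int = MAX_ITERATIONS,
) -> LoopState:
    """Generate a query, validate it, and retry with earlier errors as feedback."""
    state = LoopState()
    for iteration in range(1, max_iterations + 1):
        state.total_iterations = iteration
        query = generate(question, state.errors)
        state.final_query = query
        ok, error = validate(query)
        if ok:
            state.success = True
            break
        state.errors.append(error)  # fed back into the next generation attempt
    return state


# Stand-ins (assumptions): the first attempt is malformed, the retry is fixed.
def fake_generate(question: str, errors: List[str]) -> str:
    return "MATCH (n:MK) RETURN n.sks" if errors else "MATCH (n:MK RETURN n.sks"


def fake_validate(query: str) -> Tuple[bool, str]:
    balanced = query.count("(") == query.count(")")
    return balanced, "" if balanced else "unbalanced parentheses"


demo_state = run_self_correction("How many credits?", fake_generate, fake_validate)
print(demo_state.success, demo_state.total_iterations, demo_state.final_query)
```

In this toy run the first query fails validation and the second succeeds, so the loop stops after two of the three allowed iterations.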

In [1]:
# Set up paths - IMPORTANT: run this cell first!
import sys
import os
from pathlib import Path

# Get the agentic directory (parent of notebooks)
agentic_dir = Path.cwd().parent
project_root = agentic_dir

# Change working directory to agentic folder
os.chdir(agentic_dir)

# Add to Python path
if str(agentic_dir) not in sys.path:
    sys.path.insert(0, str(agentic_dir))

print(f"Working directory: {os.getcwd()}")
print(f"Python path includes: {agentic_dir}")

Working directory: /Users/tsimiscouse/Docs/Sarjana/Skripsi/kg-luthfi/agentic
Python path includes: /Users/tsimiscouse/Docs/Sarjana/Skripsi/kg-luthfi/agentic


In [2]:
# Load environment variables
from dotenv import load_dotenv
load_dotenv()

print(f"NEO4J_URI set: {'Yes' if os.getenv('NEO4J_URI') else 'No'}")

NEO4J_URI set: Yes


In [3]:
# Check Ollama Status
import requests

def check_ollama():
    """Check whether Ollama is running and list the available models."""
    try:
        response = requests.get("http://localhost:11434/api/tags", timeout=5)
    except requests.exceptions.ConnectionError:
        print("❌ Ollama is not running!")
        print("   Run: ollama serve")
        return False

    # Return False explicitly on a non-200 response instead of falling through to None
    if response.status_code != 200:
        print(f"❌ Ollama responded with HTTP {response.status_code}")
        return False

    models = response.json().get("models", [])
    print("✅ Ollama is running!")
    print("\nAvailable models:")
    for m in models:
        size_gb = m.get('size', 0) / (1024**3)
        print(f"  - {m['name']} ({size_gb:.1f} GB)")
    return True

check_ollama()

✅ Ollama is running!

Available models:
  - qwen2.5-coder:3b (1.8 GB)


True

In [4]:
# Test Ollama API
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Ignored by Ollama, but the client requires a non-empty value
)

MODEL = "qwen2.5-coder:3b"

try:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Say hello in Indonesian"}],
        max_tokens=50
    )
    print(f"Model: {MODEL}")
    print(f"SUCCESS: {response.choices[0].message.content}")
except Exception as e:
    print(f"ERROR: {e}")

Model: qwen2.5-coder:3b
SUCCESS: Hello! How can I assist you today?


In [5]:
# Suppress warnings for cleaner output
import warnings
import logging

warnings.filterwarnings('ignore', category=UserWarning, module='nltk')
logging.getLogger('validators').setLevel(logging.ERROR)

In [6]:
# Import modules
from config.settings import Settings
from config.llm_config_ollama import LLMConfigOllama
from data.ground_truth_loader import GroundTruthLoader
from prompts.prompt_manager import PromptManager, PromptType, SchemaFormat

## 1. Load Ground Truth Data

In [7]:
# Load ground truth
loader = GroundTruthLoader()
ground_truth = loader.load()

print(f"Total questions: {len(ground_truth)}")
print(f"\nStatistics:")
stats = loader.get_statistics()
for key, value in stats.items():
    print(f"  {key}: {value}")

Total questions: 52

Statistics:
  total_items: 52
  by_complexity: {'Easy': 16, 'Medium': 20, 'Hard': 16}
  by_reasoning_level: {'Fakta Eksplisit': 16, 'Fakta Implisit': 36}
  by_sublevel: {'Nodes': 16, 'One-hop': 20, 'Multi-hop': 16}


In [8]:
# Preview first 5 questions
import pandas as pd

df_preview = pd.DataFrame([item.to_dict() for item in ground_truth[:5]])
df_preview

| | id | reasoning_level | sublevel | complexity | question | cypher_query |
|---|---|---|---|---|---|---|
| 0 | 1 | Fakta Eksplisit | Nodes | Easy | Berapakah jumlah SKS untuk mata kuliah 'Aljaba... | MATCH (n:MK {nama: 'Aljabar Linear'}) RETURN n... |
| 1 | 2 | Fakta Eksplisit | Nodes | Easy | Apakah nama lengkap dari mata kuliah dengan ko... | MATCH (n:MK {kode: 'TKU211131'}) RETURN n.nama |
| 2 | 3 | Fakta Eksplisit | Nodes | Easy | Bagaimanakah deskripsi untuk Capaian Lulusan (... | MATCH (n:SO {kode: 'KP.1'}) RETURN n.deskripsi |
| 3 | 4 | Fakta Eksplisit | Nodes | Easy | Tampilkan seluruh mata kuliah yang ditawarkan ... | MATCH (n:MK) WHERE n.semester = 1 RETURN n.nama |
| 4 | 5 | Fakta Eksplisit | Nodes | Easy | Tampilkan nama dan kode untuk semua mata kulia... | MATCH (n:MK) WHERE n.tipe = 'Pilihan' RETURN n... |


## 2. Test Single Question with Ollama

In [9]:
# Initialize components for testing with Ollama
settings = Settings()
llm_config = LLMConfigOllama()

print(f"Max iterations: {settings.max_iterations}")
print(f"LLM Provider: {llm_config.provider}")
print(f"LLM Model: {llm_config.model}")
print(f"Base URL: {llm_config.base_url}")

Max iterations: 3
LLM Provider: ollama
LLM Model: qwen2.5-coder:3b
Base URL: http://localhost:11434/v1


In [10]:
# Create batch processor with Ollama config
from experiment.batch_processor_ollama import create_batch_processor_ollama

processor = create_batch_processor_ollama()

# Process first question with Zero-Shot + Full Schema
test_item = ground_truth[0]
print(f"Question: {test_item.question}")
print(f"Ground Truth: {test_item.cypher_query}")

Question: Berapakah jumlah SKS untuk mata kuliah 'Aljabar Linear'?
Ground Truth: MATCH (n:MK {nama: 'Aljabar Linear'}) RETURN n.sks


In [11]:
# Run single question
state = processor.process_single(
    item=test_item,
    prompt_type=PromptType.ZERO_SHOT,
    schema_format=SchemaFormat.FULL_SCHEMA
)

print(f"\nResult:")
print(f"  Success: {state.success}")
print(f"  Iterations: {state.total_iterations}")
print(f"  Final Query: {state.final_query}")

processor.close()


Result:
  Success: True
  Iterations: 1
  Final Query: MATCH (m:MK {nama: 'Aljabar Linear'}) RETURN m.sks


## 3. Run Single Configuration

In [12]:
# Run single configuration (e.g., Zero-Shot + Full Schema)
from experiment.experiment_runner_ollama import ExperimentRunnerOllama

runner = ExperimentRunnerOllama()

# Limit to first 5 questions for testing
result = runner.run_configuration(
    prompt_type=PromptType.ZERO_SHOT,
    schema_format=SchemaFormat.FULL_SCHEMA,
    ground_truth_items=ground_truth[:5]
)

print(f"Configuration: {result.config_name}")
print(f"Pass@1 Rate: {result.pass_at_1_rate:.2f}%")
print(f"KG Valid Rate: {result.kg_valid_rate:.2f}%")
print(f"LLMetric: {result.llmetric:.2f}")

runner.close()



Configuration: Zero-Shot_Full
Pass@1 Rate: 80.00%
KG Valid Rate: 100.00%
LLMetric: 88.64


## 4. Run All 9 Configurations (Full Experiment)
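
The nine configurations are the cross product of three prompt types and three schema formats. The short sketch below rebuilds the configuration names as they appear in the results summary (for example `Zero-Shot_Full`). The label lists and the `"{prompt}_{schema}"` naming pattern are inferred from that printed output; how `run_experiment_ollama` builds the names internally is an assumption.

```python
from itertools import product

# Labels taken from the configuration names printed in the results summary.
PROMPT_LABELS = ["Zero-Shot", "Few-Shot", "CoT"]
SCHEMA_LABELS = ["Full", "Nodes+Paths", "Paths"]

# Cross product: every prompt type paired with every schema format.
config_names = [
    f"{prompt}_{schema}" for prompt, schema in product(PROMPT_LABELS, SCHEMA_LABELS)
]

print(len(config_names))
for name in config_names:
    print(name)
```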

In [19]:
# Run all configurations with Ollama
# A local model has no API rate limit, so the full experiment can run in one go

from experiment.experiment_runner_ollama import run_experiment_ollama

# Quick test on a subset first (here: 5 questions)
# results = run_experiment_ollama(max_questions=5)

# Full experiment (52 questions)
results = run_experiment_ollama()

ERROR:neo4j.io:[#F484]  _: <CONNECTION> error: Failed to read from defunct connection IPv4Address(('si-3574e0e1-9d4c.production-orch-0064.neo4j.io', 7687)) (ResolvedIPv4Address(('34.126.64.110', 7687))): OSError('No data')
ERROR:validators.syntax_validator:Syntax validation error: Failed to read from defunct connection IPv4Address(('si-3574e0e1-9d4c.production-orch-0064.neo4j.io', 7687)) (ResolvedIPv4Address(('34.126.64.110', 7687)))
ERROR:validators.syntax_validator:Syntax validation error: Failed to DNS resolve address 3574e0e1.databases.neo4j.io:7687: [Errno 8] nodename nor servname provided, or not known
ERROR:validators.syntax_validator:Syntax validation error: Failed to DNS resolve address 3574e0e1.databases.neo4j.io:7687: [Errno 8] nodename nor servname provided, or not known
ERROR:validators.syntax_validator:Syntax validation error: Failed to DNS resolve address 3574e0e1.databases.neo4j.io:7687: [Errno 8] nodename nor servname provided, or not known
ERROR:validators.syntax_vali
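
The log above shows validation calls failing on network problems (a defunct connection, then DNS resolution errors) rather than on the generated queries themselves. One common way to keep such transient failures from being counted as invalid queries is to retry the call with exponential backoff. The helper below is a generic sketch under stated assumptions: `call_with_retry` and `flaky_validation` are hypothetical names, and because the exception types raised by this project's validator are not shown in the notebook, `retry_on` is left as a parameter (defaulting to `OSError` purely for the demo).

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying listed exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Demo: a stand-in operation that fails twice, then succeeds.
attempts = {"count": 0}
recorded_delays: List[float] = []


def flaky_validation() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("No data")
    return "valid"


# Delays are recorded instead of slept so the demo finishes immediately.
result = call_with_retry(flaky_validation, sleep=recorded_delays.append)
print(result, attempts["count"], recorded_delays)
```

Passing `sleep` as a parameter keeps the helper testable; in a real run the default `time.sleep` would be used.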

In [20]:
# Display results summary
print("\n" + "="*60)
print("EXPERIMENT RESULTS SUMMARY (Ollama - Local LLM)")
print("="*60)

for config_name, result in results.items():
    print(f"\n{config_name}:")
    print(f"  Pass@1: {result.pass_at_1_rate:.1f}%")
    print(f"  KG Valid: {result.kg_valid_rate:.1f}%")
    print(f"  LLMetric: {result.llmetric:.2f}")


EXPERIMENT RESULTS SUMMARY (Ollama - Local LLM)

Zero-Shot_Full:
  Pass@1: 23.1%
  KG Valid: 50.0%
  LLMetric: 39.34

Zero-Shot_Nodes+Paths:
  Pass@1: 21.2%
  KG Valid: 44.2%
  LLMetric: 35.84

Zero-Shot_Paths:
  Pass@1: 21.2%
  KG Valid: 48.1%
  LLMetric: 37.75

Few-Shot_Full:
  Pass@1: 28.8%
  KG Valid: 63.5%
  LLMetric: 48.74

Few-Shot_Nodes+Paths:
  Pass@1: 25.0%
  KG Valid: 59.6%
  LLMetric: 45.30

Few-Shot_Paths:
  Pass@1: 25.0%
  KG Valid: 63.5%
  LLMetric: 46.88

CoT_Full:
  Pass@1: 21.2%
  KG Valid: 59.6%
  LLMetric: 43.05

CoT_Nodes+Paths:
  Pass@1: 25.0%
  KG Valid: 57.7%
  LLMetric: 44.64

CoT_Paths:
  Pass@1: 19.2%
  KG Valid: 44.2%
  LLMetric: 36.30
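
To read the summary at a glance, the configurations can be ranked by a chosen metric. The sketch below hard-codes the values printed above so that it runs on its own; in the notebook the same ranking could be applied to the `results` dictionary instead. The names `summary_rows` and `rank_by` are introduced here for illustration.

```python
# (Pass@1 %, KG Valid %, LLMetric) copied from the summary printed above.
summary_rows = {
    "Zero-Shot_Full": (23.1, 50.0, 39.34),
    "Zero-Shot_Nodes+Paths": (21.2, 44.2, 35.84),
    "Zero-Shot_Paths": (21.2, 48.1, 37.75),
    "Few-Shot_Full": (28.8, 63.5, 48.74),
    "Few-Shot_Nodes+Paths": (25.0, 59.6, 45.30),
    "Few-Shot_Paths": (25.0, 63.5, 46.88),
    "CoT_Full": (21.2, 59.6, 43.05),
    "CoT_Nodes+Paths": (25.0, 57.7, 44.64),
    "CoT_Paths": (19.2, 44.2, 36.30),
}

METRIC_INDEX = {"pass_at_1": 0, "kg_valid": 1, "llmetric": 2}


def rank_by(metric: str):
    """Return (name, value) pairs sorted from best to worst for one metric."""
    idx = METRIC_INDEX[metric]
    pairs = [(name, values[idx]) for name, values in summary_rows.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked_pass = rank_by("pass_at_1")
ranked_llmetric = rank_by("llmetric")

for name, value in ranked_pass:
    print(f"{name:<24} Pass@1 = {value:.1f}%")
```

On these numbers, `Few-Shot_Full` ranks first on both Pass@1 and LLMetric, and `CoT_Paths` has the lowest Pass@1.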


## 5. View Results

In [21]:
# Load saved results
import json

results_path = project_root / "results_ollama" / "experiment_summary.json"

if results_path.exists():
    with open(results_path, "r") as f:
        summary = json.load(f)
    
    print(f"Experiment completed at: {summary['timestamp']}")
    print(f"Total configurations: {summary['total_configurations']}")
    print(f"LLM Provider: Ollama (Local)")
else:
    print("No results found. Run the experiment first.")

Experiment completed at: 2025-12-31T12:37:28.356860
Total configurations: 9
LLM Provider: Ollama (Local)


In [16]:
# Display results as DataFrame
from IPython.display import display

if results_path.exists():
    configs = summary.get("configurations", {})
    
    rows = []
    for name, config in configs.items():
        rows.append({
            "Configuration": name,
            "Pass@1 Rate": f"{config['pass_at_1_rate']:.2f}%",
            "KG Valid Rate": f"{config['kg_valid_rate']:.2f}%",
            "LLMetric": f"{config['llmetric']:.2f}",
            "Avg Iterations": f"{config['agentic_metrics']['avg_iterations']:.2f}" if config.get('agentic_metrics') else "N/A",
            "Recovery Rate": f"{config['agentic_metrics']['recovery_rate']*100:.2f}%" if config.get('agentic_metrics') else "N/A",
        })
    
    df_results = pd.DataFrame(rows)
    # A bare expression inside an `if` block is not rendered, so display it explicitly
    display(df_results)

## 6. Compare with Groq Results (Optional)

In [17]:
# Compare Ollama vs Groq results
groq_results_path = project_root / "results" / "experiment_summary.json"
ollama_results_path = project_root / "results_ollama" / "experiment_summary.json"

if groq_results_path.exists() and ollama_results_path.exists():
    with open(groq_results_path) as f:
        groq_summary = json.load(f)
    with open(ollama_results_path) as f:
        ollama_summary = json.load(f)
    
    print("Comparison: Groq (Cloud) vs Ollama (Local)")
    print("="*60)
    
    for config_name in groq_summary.get("configurations", {}).keys():
        groq_config = groq_summary["configurations"].get(config_name, {})
        ollama_config = ollama_summary["configurations"].get(config_name, {})
        
        if groq_config and ollama_config:
            print(f"\n{config_name}:")
            print(f"  Groq Pass@1: {groq_config.get('pass_at_1_rate', 0):.1f}% | Ollama: {ollama_config.get('pass_at_1_rate', 0):.1f}%")
            print(f"  Groq LLMetric: {groq_config.get('llmetric', 0):.2f} | Ollama: {ollama_config.get('llmetric', 0):.2f}")
else:
    print("Run both Groq and Ollama experiments to compare.")

Comparison: Groq (Cloud) vs Ollama (Local)

Zero-Shot_Full:
  Groq Pass@1: 34.6% | Ollama: 80.0%
  Groq LLMetric: 53.56 | Ollama: 88.64

Zero-Shot_Nodes+Paths:
  Groq Pass@1: 25.0% | Ollama: 80.0%
  Groq LLMetric: 47.07 | Ollama: 88.74

Zero-Shot_Paths:
  Groq Pass@1: 28.8% | Ollama: 80.0%
  Groq LLMetric: 48.98 | Ollama: 88.74

Few-Shot_Full:
  Groq Pass@1: 34.6% | Ollama: 80.0%
  Groq LLMetric: 39.68 | Ollama: 88.69

Few-Shot_Nodes+Paths:
  Groq Pass@1: 0.0% | Ollama: 80.0%
  Groq LLMetric: 0.00 | Ollama: 88.69

Few-Shot_Paths:
  Groq Pass@1: 0.0% | Ollama: 80.0%
  Groq LLMetric: 0.00 | Ollama: 88.69

CoT_Full:
  Groq Pass@1: 0.0% | Ollama: 80.0%
  Groq LLMetric: 0.00 | Ollama: 88.57

CoT_Nodes+Paths:
  Groq Pass@1: 0.0% | Ollama: 80.0%
  Groq LLMetric: 0.00 | Ollama: 88.57

CoT_Paths:
  Groq Pass@1: 0.0% | Ollama: 80.0%
  Groq LLMetric: 0.00 | Ollama: 88.69
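
The side-by-side printout can also be reduced to a per-configuration difference, which makes gaps easier to scan. The sketch below uses only the summary keys already read in the comparison cell (`configurations`, `pass_at_1_rate`, `llmetric`); the function name `compare_summaries` and the demo dictionaries are hypothetical, with demo values copied from two rows of the output above. Note that the Ollama figures in that output (80.0% for every configuration) differ from the full-run summary in section 4, and the cell numbers show the comparison (`In [17]`) ran before the full experiment (`In [19]`), so it presumably read an earlier summary file.

```python
from typing import Dict, List


def compare_summaries(
    groq_summary: Dict, ollama_summary: Dict, metric: str = "pass_at_1_rate"
) -> List[Dict]:
    """Return one row per configuration present in both summaries."""
    groq_configs = groq_summary.get("configurations", {})
    ollama_configs = ollama_summary.get("configurations", {})
    rows = []
    for name, groq_config in groq_configs.items():
        if name not in ollama_configs:
            continue  # skip configurations that only one run produced
        groq_value = groq_config.get(metric, 0)
        ollama_value = ollama_configs[name].get(metric, 0)
        rows.append({
            "config": name,
            "groq": groq_value,
            "ollama": ollama_value,
            "delta": ollama_value - groq_value,  # positive: Ollama scored higher
        })
    return rows


# Demo data: two rows copied from the comparison output above.
demo_groq = {"configurations": {
    "Zero-Shot_Full": {"pass_at_1_rate": 34.6, "llmetric": 53.56},
    "Zero-Shot_Paths": {"pass_at_1_rate": 28.8, "llmetric": 48.98},
}}
demo_ollama = {"configurations": {
    "Zero-Shot_Full": {"pass_at_1_rate": 80.0, "llmetric": 88.64},
    "Zero-Shot_Paths": {"pass_at_1_rate": 80.0, "llmetric": 88.74},
}}

comparison_rows = compare_summaries(demo_groq, demo_ollama)
for row in comparison_rows:
    print(f"{row['config']}: delta Pass@1 = {row['delta']:+.1f} points")
```

Passing `metric="llmetric"` produces the same table for the LLMetric scores.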
