# Screener-Reasoner Pipeline Walkthrough

**Date**: 2026-01-30

This notebook explains the complete pipeline for explainable log anomaly detection.

## Architecture Overview

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Log Data   │ ──▶ │  Screener   │ ──▶ │  Retriever  │ ──▶ │  Reasoner   │
│  (BGL/HDFS) │     │ (AllLinLog) │     │   (BM25)    │     │   (LLM)     │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                           │                   │                   │
                           ▼                   ▼                   ▼
                    Anomaly Detection    Evidence Docs    Structured Explanation
```

## What We Built Today

| Module | Purpose |
|--------|--------|
| `data_loader.py` | Load BGL/HDFS datasets into Session objects |
| `screener.py` | AllLinLog model wrapper for anomaly detection |
| `evidence_store.py` | Build evidence corpus from training data |
| `retriever.py` | BM25 retrieval for RAG |
| `llm_client.py` | Unified LLM client (Ollama/OpenAI) |
| `prompt_builder.py` | Build structured prompts for explanation |
| `verifier.py` | Verify claims against evidence (implementation pending; see Next Steps) |

## Step 0: Setup

```python
import sys
sys.path.insert(0, '..')

# Import all modules
from src import (
    BGLDataLoader, Session,
    Screener, ScreenerOutput,
    EvidenceStore,
    BM25Retriever,
    PromptBuilder, TraceExplanation,
    LLMClient,
)

print("✓ All modules imported successfully!")
```

## Step 1: Load Data

The `BGLDataLoader` reads BGL log files and creates `Session` objects using a sliding window approach.

- **Window size**: 10 log lines per session
- **Label**: If ANY line in the window is anomalous, the session is labeled as anomaly
- **Split**: 70% train, 15% validation, 15% test (stratified)
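The windowing and split rules above can be sketched in plain Python. This is a minimal illustration, not the `BGLDataLoader` implementation. It makes two assumptions: a step equal to the window size (non-overlapping windows; pass a smaller `step` for overlapping ones), and the LogHub BGL convention where a line whose first field is `-` is a non-alert (normal) message. `ToySession`, `make_sessions`, and `stratified_split` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToySession:
    """Hypothetical stand-in for src.Session."""
    session_id: str
    lines: List[str]
    label: int  # 1 = anomaly, 0 = normal


def line_is_alert(raw: str) -> bool:
    # Assumption: LogHub BGL format, where "-" in the first field marks a non-alert line.
    return raw.split(" ", 1)[0] != "-"


def make_sessions(raw_lines: List[str], window_size: int = 10,
                  step: Optional[int] = None) -> List[ToySession]:
    step = step or window_size  # assumption: non-overlapping windows by default
    sessions = []
    for start in range(0, len(raw_lines) - window_size + 1, step):
        window = raw_lines[start:start + window_size]
        # A window is anomalous if ANY of its lines is an alert.
        label = int(any(line_is_alert(line) for line in window))
        sessions.append(ToySession(f"bgl_{start}", window, label))
    return sessions


def stratified_split(sessions: List[ToySession], percents=(70, 15, 15),
                     seed: int = 42) -> Dict[str, List[ToySession]]:
    rng = random.Random(seed)
    splits: Dict[str, List[ToySession]] = {"train": [], "val": [], "test": []}
    for label in (0, 1):
        # Split each class separately so every split keeps the same anomaly rate.
        group = [s for s in sessions if s.label == label]
        rng.shuffle(group)
        n_train = len(group) * percents[0] // 100
        n_val = len(group) * percents[1] // 100
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]  # remainder goes to test
    return splits


# Usage on synthetic lines: one alert every 20 lines.
raw = ["KERNDTLB 1118536327 data TLB error interrupt" if i % 20 == 0
       else "- 1117838570 RAS KERNEL INFO instruction cache parity error corrected"
       for i in range(200)]
sessions = make_sessions(raw, window_size=10)
splits = stratified_split(sessions)
print(len(sessions), sum(s.label for s in sessions))  # 20 10
print({k: len(v) for k, v in splits.items()})  # {'train': 14, 'val': 2, 'test': 4}
```

Integer percentages keep the split sizes exact; with only 10 sessions per class, flooring pushes the leftover session into the test split.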

```python
# Load BGL dataset
loader = BGLDataLoader('../logs/BGL.log', windows_size=10)
loader.load()

# Get splits
train_sessions = loader.get_sessions(split='train')
test_sessions = loader.get_sessions(split='test')

print(f"Train sessions: {len(train_sessions):,}")
print(f"Test sessions:  {len(test_sessions):,}")

# Check label distribution
train_anomaly = sum(1 for s in train_sessions if s.label == 1)
test_anomaly = sum(1 for s in test_sessions if s.label == 1)
print(f"\nTrain anomaly rate: {train_anomaly/len(train_sessions):.1%}")
print(f"Test anomaly rate:  {test_anomaly/len(test_sessions):.1%}")
```

```python
# Examine a sample session
sample = test_sessions[0]
print(f"Session ID: {sample.session_id}")
print(f"Label: {'Anomaly' if sample.label == 1 else 'Normal'}")
print(f"Number of lines: {len(sample.lines)}")
print("\nLog content (first 5 lines, truncated to 100 chars):")
for i, line in enumerate(sample.lines[:5], 1):
    suffix = "..." if len(line) > 100 else ""
    print(f"  {i}. {line[:100]}{suffix}")
```

## Step 2: Screener (Anomaly Detection)

The `Screener` wraps the pre-trained **AllLinLog** model:
- **Architecture**: Linformer (linear attention) for efficient processing of long sequences
- **Tokenizer**: GPT-4 BPE (cl100k_base)
- **Output**: Binary classification (normal/anomaly) with class probabilities; `prob[1]` is the anomaly probability
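How a two-class output can turn into `pred` and `prob` is sketched below. This assumes the model emits two logits (normal, anomaly) and that `prob` is their softmax; `softmax` and `to_prediction` are hypothetical helpers, and the real `Screener` may use a different threshold. (For the tokenizer step, the `tiktoken` package exposes the same encoding via `tiktoken.get_encoding("cl100k_base")`.)

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits: List[float], threshold: float = 0.5) -> Tuple[int, List[float]]:
    """Map [normal_logit, anomaly_logit] to (pred, prob), with prob[1] = P(anomaly)."""
    prob = softmax(logits)
    pred = int(prob[1] >= threshold)
    return pred, prob


pred, prob = to_prediction([0.3, 2.1])
print(pred, round(prob[1], 4))  # 1 0.8581
```

With two classes, a 0.5 threshold on `prob[1]` matches taking the argmax; lowering it flags more sessions (higher recall) at the cost of more false alarms.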

```python
# Load the pre-trained Screener model
screener = Screener(
    model_path='../best_model/best_model_20250724_072857.pth',
    dataset='BGL'
)
screener.load()
```

```python
# Test on a few sessions
print("Testing Screener predictions:\n")

# Get some anomaly and normal sessions
anomaly_sessions = [s for s in test_sessions if s.label == 1][:3]
normal_sessions = [s for s in test_sessions if s.label == 0][:3]

for session in anomaly_sessions + normal_sessions:
    result = screener.predict(session)
    actual = "Anomaly" if session.label == 1 else "Normal"
    predicted = "Anomaly" if result.pred == 1 else "Normal"
    match = "✓" if session.label == result.pred else "✗"
    print(f"{session.session_id}: Actual={actual:7}, Pred={predicted:7}, Prob={result.prob[1]:.4f} {match}")
```

## Step 3: Evidence Store & Retriever (RAG)

For explainability, we use RAG (Retrieval-Augmented Generation):

1. **Evidence Store**: Build a corpus from training sessions
2. **BM25 Retriever**: Find similar sessions as evidence for explanation
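BM25 itself is short enough to write out. The sketch below is the standard Okapi BM25 scoring function with common defaults `k1=1.5`, `b=0.75` and the non-negative IDF variant used by Lucene. It is not the `BM25Retriever` source: the tokenizer (lowercase alphanumeric runs) and treating a session as its lines joined into one query string are assumptions, and `ToyBM25` is a hypothetical name.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase alphanumeric runs, so "R02-M1-N0" -> ["r02", "m1", "n0"].
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyBM25:
    """Hypothetical Okapi BM25 index over a list of evidence texts."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.doc_tfs]
        self.avgdl = sum(self.doc_lens) / len(docs)
        n = len(docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for tf in self.doc_tfs for term in tf)
        # Non-negative IDF, so very common terms never subtract from the score.
        self.idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1.0) for t, d in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.doc_tfs[i], self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and document-length normalisation (b).
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def top_k(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        scored = [(i, self.score(query, i)) for i in range(len(self.doc_tfs))]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


evidence = [
    "KERNEL data TLB error interrupt",
    "ciod: generated 128 core files for program",
    "instruction cache parity error corrected",
]
toy_index = ToyBM25(evidence)
# Treat a session as one query: its lines joined with spaces.
session_lines = ["data TLB error interrupt", "data TLB error interrupt"]
toy_hits = toy_index.top_k(" ".join(session_lines), k=3)
print(toy_hits[0][0])  # 0
```

Because BM25 rewards exact token overlap weighted by rarity, it suits log text, where template words and component names repeat verbatim across sessions.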

```python
# Build evidence store from training data
# Using a subset for speed (in production, use all training data)
evidence_store = EvidenceStore(dataset='BGL')
evidence_store.build_from_sessions(train_sessions[:3000])

print(f"Evidence store size: {len(evidence_store)} documents")
```

```python
# Build BM25 retriever
retriever = BM25Retriever(evidence_store=evidence_store)
retriever.build_index()
```

```python
# Test retrieval for an anomaly session
query_session = anomaly_sessions[0]
print(f"Query session: {query_session.session_id}")
print(f"Query content: {query_session.lines[0][:80]}...\n")

# Retrieve similar evidence
hits = retriever.retrieve_for_session(query_session, top_k=3)

print(f"Top {len(hits)} retrieved evidence:")
for i, hit in enumerate(hits, 1):
    print(f"\n{i}. Score: {hit.score:.4f}")
    print(f"   ID: {hit.evidence_id}")
    print(f"   Text: {hit.text[:100]}...")
```

## Step 4: Prompt Builder

The `PromptBuilder` creates structured prompts for the LLM:
- Includes the anomalous session content
- Includes retrieved evidence with IDs (E1, E2, ...)
- Requests JSON output with traceable claims
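The prompt structure described above can be sketched as a plain function. The wording, `SCHEMA_HINT`, and `build_prompt_sketch` are hypothetical and not the `PromptBuilder` templates; the sketch only shows the three ingredients: the session lines, evidence numbered `E1`, `E2`, ..., and a request for JSON whose claims cite those IDs.

```python
import json
from typing import List, Tuple

# Hypothetical schema mirroring the fields described in Step 6.
SCHEMA_HINT = {
    "prediction": "anomaly | normal",
    "summary": "string",
    "claims": [{"claim": "string", "evidence_ids": ["E1"]}],
    "insufficient_evidence": False,
}


def build_prompt_sketch(session_lines: List[str], anomaly_prob: float,
                        evidence_texts: List[str]) -> Tuple[str, str]:
    system = (
        "You explain log anomalies. Use ONLY the numbered evidence and cite "
        "evidence IDs for every claim. Respond with a single JSON object "
        "matching this schema:\n" + json.dumps(SCHEMA_HINT, indent=2)
    )
    session_block = "\n".join(f"{i}. {line}" for i, line in enumerate(session_lines, 1))
    # Evidence IDs follow retrieval order: the first hit is E1, the second E2, ...
    evidence_block = "\n".join(f"[E{i}] {text}" for i, text in enumerate(evidence_texts, 1))
    user = (
        f"Screener anomaly probability: {anomaly_prob:.4f}\n\n"
        f"Session log lines:\n{session_block}\n\n"
        f"Evidence:\n{evidence_block}\n\n"
        "Explain why this session is or is not anomalous."
    )
    return system, user


sys_msg, user_msg = build_prompt_sketch(
    ["RAS KERNEL FATAL data TLB error interrupt"],
    anomaly_prob=0.97,
    evidence_texts=["KERNEL data TLB error interrupt",
                    "instruction cache parity error corrected"],
)
print(user_msg)
```

Numbering the evidence in the prompt gives every claim a short, checkable handle, which is what a later verifier can match against the retrieved hits.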

```python
# Build prompt for explanation
builder = PromptBuilder()
screener_output = screener.predict(query_session)

system_prompt, user_prompt = builder.build_prompt(
    session=query_session,
    screener_output=screener_output,
    evidence_hits=hits
)

print("=== SYSTEM PROMPT ===")
print(system_prompt[:500])
print("\n=== USER PROMPT (first 1500 chars) ===")
print(user_prompt[:1500])
```

## Step 5: LLM Reasoner

The `LLMClient` supports multiple providers with a unified interface:

| Provider | Model | Cost |
|----------|-------|------|
| `ollama` | llama3.1:8b | Free (local) |
| `openai` | gpt-4o | ~$0.007/explanation |
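Cost for a metered API is token count times per-token price. The sketch below shows the arithmetic behind a per-explanation estimate; `estimate_cost_usd` is a hypothetical helper, the prices are example values to be replaced with the provider's current price list, and the 2,000/200 token counts are hypothetical, chosen only to show how a figure on the order of $0.007 can arise.

```python
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call given per-million-token prices (look these up per model)."""
    return (prompt_tokens * usd_per_m_input
            + completion_tokens * usd_per_m_output) / 1_000_000


# Hypothetical call: 2,000 prompt tokens, 200 completion tokens,
# at example prices of $2.50 / $10.00 per million input / output tokens.
print(f"{estimate_cost_usd(2000, 200, 2.50, 10.00):.4f}")  # 0.0070
# A local Ollama model has no per-token charge.
print(estimate_cost_usd(2000, 200, 0.0, 0.0))  # 0.0
```

Since the prompt carries the session plus every retrieved evidence document, input tokens usually dominate, so `top_k` is the main cost lever.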

```python
# Test both LLM providers
print("Available LLM providers:\n")

# Ollama (local)
ollama_client = LLMClient(provider="ollama", model="llama3.1:8b")
print(f"Ollama available: {ollama_client.is_available()}")
print(f"  Models: {ollama_client.list_models()}")

# OpenAI (cloud)
openai_client = LLMClient(provider="openai", model="gpt-4o")
print(f"\nOpenAI available: {openai_client.is_available()}")
```

```python
# Generate explanation with Ollama (local, free)
print("Generating explanation with Ollama (llama3.1:8b)...\n")

response = ollama_client.generate(
    prompt=user_prompt,
    system_prompt=system_prompt,
    json_mode=True,
    temperature=0.1,
    max_tokens=1024
)

print(f"Latency: {response.latency_ms:.0f}ms")
print(f"Tokens: {response.total_tokens}")
print("\n=== EXPLANATION ===")
print(response.content)
```

```python
# Generate explanation with OpenAI (cloud, paid; requires API credentials)
if openai_client.is_available():
    print("Generating explanation with OpenAI (gpt-4o)...\n")

    response = openai_client.generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        json_mode=True,
        temperature=0.1,
        max_tokens=1024
    )

    print(f"Latency: {response.latency_ms:.0f}ms")
    print(f"Tokens: {response.total_tokens}")
    print(f"Cost: ${response.cost_usd:.4f}")
    print("\n=== EXPLANATION ===")
    print(response.content)
else:
    print("OpenAI not available; keeping the Ollama response from the previous cell.")
```

## Step 6: Parse Structured Explanation

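The LLM returns a JSON object with:
- `prediction`: "anomaly" or "normal"
- `summary`: brief natural-language explanation
- `claims`: list of claims, each an object with the claim text (`claim`) and the evidence IDs that support it (`evidence_ids`)
- `insufficient_evidence`: Boolean flag indicating whether the retrieved evidence was too weak to support an explanation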
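Before trusting a parsed reply, it helps to check its structure. The sketch below is a structural check only (required keys, allowed `prediction` values, and that every cited ID was actually retrieved); it does not judge faithfulness, which is the planned `Verifier`'s job. It assumes the IDs follow the `E1`, `E2`, ... numbering from Step 4 and that some models wrap JSON in a fenced block even in JSON mode. `parse_llm_json` and `check_explanation` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Iterable, List

REQUIRED_KEYS = {"prediction", "summary", "claims", "insufficient_evidence"}
FENCE = "`" * 3  # avoids a literal triple backtick in this source


def parse_llm_json(text: str) -> Dict[str, Any]:
    """Parse an LLM reply, tolerating an optional fenced wrapper around the JSON."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag) ...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ... and the closing fence, if present.
        cleaned = cleaned.rstrip()
        if cleaned.endswith(FENCE):
            cleaned = cleaned[: -len(FENCE)]
    return json.loads(cleaned)


def check_explanation(obj: Dict[str, Any], valid_ids: Iterable[str]) -> List[str]:
    """Return structural problems; an empty list means the object passed."""
    valid = set(valid_ids)
    problems = []
    missing = REQUIRED_KEYS - set(obj)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if obj.get("prediction") not in ("anomaly", "normal"):
        problems.append(f"unexpected prediction: {obj.get('prediction')!r}")
    for i, claim in enumerate(obj.get("claims", []), 1):
        ids = claim.get("evidence_ids", [])
        if not ids:
            problems.append(f"claim {i} cites no evidence")
        unknown = [e for e in ids if e not in valid]
        if unknown:
            problems.append(f"claim {i} cites unknown ids: {unknown}")
    return problems


reply = (FENCE + 'json\n{"prediction": "anomaly", "summary": "TLB errors", '
         '"claims": [{"claim": "Repeated data TLB errors", "evidence_ids": ["E1", "E7"]}], '
         '"insufficient_evidence": false}\n' + FENCE)
parsed = parse_llm_json(reply)
print(check_explanation(parsed, valid_ids=["E1", "E2", "E3"]))
# ["claim 1 cites unknown ids: ['E7']"]
```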

```python
import json

# Parse the JSON response.
# `response` holds whichever generate() call ran last (OpenAI, if that cell ran).
explanation = json.loads(response.content)

print(f"Prediction: {explanation['prediction']}")
print(f"\nSummary: {explanation['summary']}")
print("\nClaims:")
for i, claim in enumerate(explanation['claims'], 1):
    print(f"  {i}. {claim['claim']}")
    print(f"     Evidence: {claim['evidence_ids']}")
```

## Complete Pipeline Function

Here's the complete pipeline wrapped in a single function. Sessions the Screener classifies as normal return early, without retrieval or an LLM call:

```python
import json


def explain_anomaly(
    session: Session,
    screener: Screener,
    retriever: BM25Retriever,
    llm_client: LLMClient,
    top_k: int = 5
) -> dict:
    """
    Complete pipeline to explain a log anomaly.

    Args:
        session: Session to explain
        screener: Loaded Screener model
        retriever: BM25 retriever with a built index
        llm_client: LLM client (Ollama or OpenAI)
        top_k: Number of evidence documents to retrieve

    Returns:
        Dict with the same keys for both outcomes: session_id, prediction,
        screener_prob, explanation, evidence_count, llm_latency_ms, llm_cost_usd
    """
    # Step 1: Screen for anomaly
    screener_output = screener.predict(session)
    anomaly_prob = float(screener_output.prob[1])

    if screener_output.pred == 0:
        # Normal sessions skip retrieval and the LLM call entirely.
        return {
            "session_id": session.session_id,
            "prediction": "normal",
            "screener_prob": anomaly_prob,
            "explanation": "Session classified as normal by the screener.",
            "evidence_count": 0,
            "llm_latency_ms": 0.0,
            "llm_cost_usd": 0.0,
        }

    # Step 2: Retrieve evidence
    hits = retriever.retrieve_for_session(session, top_k=top_k)

    # Step 3: Build prompt
    builder = PromptBuilder()
    system_prompt, user_prompt = builder.build_prompt(
        session=session,
        screener_output=screener_output,
        evidence_hits=hits
    )

    # Step 4: Generate explanation
    response = llm_client.generate(
        prompt=user_prompt,
        system_prompt=system_prompt,
        json_mode=True
    )

    # Step 5: Parse and return
    explanation = json.loads(response.content)

    return {
        "session_id": session.session_id,
        "prediction": "anomaly",
        "screener_prob": anomaly_prob,
        "explanation": explanation,
        "evidence_count": len(hits),
        "llm_latency_ms": response.latency_ms,
        "llm_cost_usd": response.cost_usd,
    }

print("✓ Pipeline function defined")
```

```python
# Test the complete pipeline
result = explain_anomaly(
    session=anomaly_sessions[0],
    screener=screener,
    retriever=retriever,
    llm_client=ollama_client  # Use local LLM
)

print(f"Session: {result['session_id']}")
print(f"Screener probability: {result['screener_prob']:.4f}")
print(f"Evidence retrieved: {result['evidence_count']}")
print(f"LLM latency: {result['llm_latency_ms']:.0f}ms")
print("\nExplanation:")
print(json.dumps(result['explanation'], indent=2))
```

## Summary

### What We Built Today

1. **Data Loading** (`BGLDataLoader`)
   - Sliding window session creation
   - Stratified train/val/test split

2. **Anomaly Detection** (`Screener`)
   - AllLinLog model wrapper
   - GPT-4 BPE tokenizer (cl100k_base)
   - Binary classification with probability

3. **RAG Pipeline**
   - `EvidenceStore`: Build corpus from training data
   - `BM25Retriever`: Retrieve similar sessions

4. **LLM Explanation** (`LLMClient`)
   - Unified interface for Ollama/OpenAI
   - Structured JSON output with traceable claims

### Next Steps

- [ ] Implement `Verifier` to check claim faithfulness
- [ ] Test on HDFS dataset
- [ ] Batch processing pipeline
- [ ] Evaluation metrics (faithfulness, coverage)