# 🧪 Pipeline Testing Notebook

This notebook tests each pipeline component directly, without running Streamlit.

**Structure:**
1. Setup & Imports
2. Test Models (Database)
3. Test Tools (Function Calling: Sanctions, Thresholds)
4. Test LLM Service
5. Test Full Pipeline (Processor)
6. Test RBAC/ABAC (Different Users)
7. Test Validation/Guardrails
8. Test Scenarios (E2E)
9. Debug Helpers

**Usage:** Run the Setup cells first, then run the cells within a section in order. After Setup, the sections do not depend on each other.


## 1. Setup & Imports


In [None]:
# Add project root to path
import sys
from pathlib import Path

# Resolve the project root (assumes the working directory is notebooks/, one level below it)
PROJECT_ROOT = Path.cwd().parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print(f"Project root: {PROJECT_ROOT}")


In [None]:
# Set environment variables BEFORE imports
import os

# Required for local development
os.environ.setdefault("ENV", "LOCAL")
os.environ.setdefault("LLM_PROVIDER", "ollama")  # or "openai", "azure", "anthropic"
os.environ.setdefault("OLLAMA_BASE_URL", "http://localhost:11434")
os.environ.setdefault("OLLAMA_MODEL", "llama3.2")

# Database (use docker compose postgres)
os.environ.setdefault("DATABASE_HOST", "localhost")
os.environ.setdefault("DATABASE_PORT", "5432")
os.environ.setdefault("DATABASE_NAME", "genai_db")
os.environ.setdefault("DATABASE_USER", "genai_user")
os.environ.setdefault("DATABASE_PASSWORD", "localdevpassword123")

print("Environment configured:")
print(f"  ENV: {os.environ['ENV']}")
print(f"  LLM_PROVIDER: {os.environ['LLM_PROVIDER']}")

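Because every variable above is set with `os.environ.setdefault`, a value that is already present in the environment (exported in the shell, or set in an earlier cell) takes precedence over the notebook default. The sketch below illustrates that behaviour with throwaway `DEMO_*` variable names, which are hypothetical and not read by the project:

```python
import os

# Hypothetical demo variables, used only so real configuration is not touched.
os.environ["DEMO_LLM_PROVIDER"] = "openai"  # pretend this was exported in the shell

# setdefault keeps the existing value and returns it; the default is ignored.
kept = os.environ.setdefault("DEMO_LLM_PROVIDER", "ollama")

# When the variable is unset, setdefault stores and returns the default.
os.environ.pop("DEMO_OLLAMA_MODEL", None)
filled = os.environ.setdefault("DEMO_OLLAMA_MODEL", "llama3.2")

# Direct assignment is what forces an override.
os.environ["DEMO_LLM_PROVIDER"] = "ollama"
forced = os.environ["DEMO_LLM_PROVIDER"]

print(kept, filled, forced)
```

To switch providers for a single session, assign to `os.environ[...]` directly before running the imports cell, or restart the kernel with the variable already exported.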

In [None]:
# Core imports
from app.models import Request, RequestCreate, AnalysisResult, AnalysisOutput
from app.database import init_db, get_session
from app.services.processor import Processor
from app.services.llm_service import get_llm_service
from app.services.auth_mock import get_current_user, UserProfile, Permission, MOCK_USERS
from app.services.validation import run_all_validations
from app.services.tools.definitions import TOOL_DEFINITIONS, TOOL_FUNCTIONS, execute_tool

print("✅ All imports successful!")


## 2. Test Models (Database)


In [None]:
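# Check model fields - verify your schema changes
def get_model_fields(model):
    """Return the field map on Pydantic v2 (model_fields) or v1 (__fields__)."""
    return getattr(model, "model_fields", None) or model.__fields__

print("Request fields:")
for name, field in get_model_fields(Request).items():
    print(f"  {name}: {field.annotation}")

print("\nAnalysisResult fields:")
for name, field in get_model_fields(AnalysisResult).items():
    print(f"  {name}: {field.annotation}")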


In [None]:
# Initialize database (creates tables if they do not exist)
init_db()
print("✅ Database initialized")


In [None]:
# Test creating a request manually (with rollback - won't pollute DB)
with get_session() as session:
    test_request = Request(
        input_text="Test transaction comment",
        context="Testing from notebook",
        group="test_group",
    )
    session.add(test_request)
    session.flush()  # Get ID without committing
    
    print(f"✅ Created request with ID: {test_request.id}")
    print(f"   Input: {test_request.input_text}")
    print(f"   Group: {test_request.group}")
    
    session.rollback()
    print("   (rolled back - test only)")


## 3. Test Tools (Function Calling)


In [None]:
# Check available tools
print(f"Registered tools: {len(TOOL_DEFINITIONS)}")

if TOOL_DEFINITIONS:
    for tool in TOOL_DEFINITIONS:
        func = tool["function"]
        print(f"\n📌 {func['name']}")
        print(f"   Description: {func['description'][:80]}...")
        print(f"   Parameters: {list(func['parameters']['properties'].keys())}")
else:
    print("ℹ️ No tools defined yet. Add tools in Phase 2.")
    print("   File: app/services/tools/definitions.py")

print(f"\nTool functions available: {list(TOOL_FUNCTIONS.keys())}")

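The inspection loop above reads `tool["function"]["name"]`, `["description"]`, and `["parameters"]["properties"]`, which matches the OpenAI-style function-calling schema. As a rough sketch of what one entry might look like under that assumption (the parameter name and the descriptions are illustrative, not taken from the project):

```python
# Hypothetical entry: the parameter name and descriptions are illustrative assumptions.
EXAMPLE_TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "check_sanctions_list",
            "description": "Check whether a name appears on a sanctions list.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "Full name to screen"},
                },
                "required": ["name"],
            },
        },
    },
]


def summarize_tools(definitions):
    """Return (tool name, parameter names) pairs, mirroring the inspection loop."""
    summary = []
    for tool in definitions:
        func = tool["function"]
        summary.append((func["name"], list(func["parameters"]["properties"].keys())))
    return summary


print(summarize_tools(EXAMPLE_TOOL_DEFINITIONS))
```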

In [None]:
# Test tools directly (uncomment after implementing in Phase 2)
# These tests run WITHOUT LLM - just the tool functions

# Example: Test sanctions check
# from app.services.tools.sanctions import check_sanctions_list
# result = check_sanctions_list("Ahmed Ivanov")
# print("Sanctions check result:")
# print(result)

# Example: Test threshold validation  
# from app.services.tools.thresholds import validate_amount_threshold
# result = validate_amount_threshold(9500, "USD")
# print("Threshold check result:")
# print(result)

print("ℹ️ Uncomment tool tests after implementing tools in Phase 2")

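Until the Phase 2 tools exist, a stand-in can help check the notebook wiring. The function below is only a sketch of what a threshold check could return: the name `sketch_validate_amount_threshold`, the 10,000 USD threshold, the 10% "near" margin, and the result keys are all assumptions, not the project's implementation.

```python
# All values below are illustrative assumptions, not the project's configuration.
REPORTING_THRESHOLDS = {"USD": 10_000}
NEAR_MARGIN = 0.10  # flag amounts within 10% below the threshold


def sketch_validate_amount_threshold(amount: float, currency: str) -> dict:
    """Classify an amount as below, near, or at/above a reporting threshold."""
    threshold = REPORTING_THRESHOLDS.get(currency.upper())
    if threshold is None:
        # No threshold configured for this currency in the sketch.
        return {"status": "unknown_currency", "threshold": None}

    if amount >= threshold:
        status = "at_or_above_threshold"
    elif amount >= threshold * (1 - NEAR_MARGIN):
        status = "near_threshold"
    else:
        status = "below_threshold"
    return {"status": status, "threshold": threshold}


print(sketch_validate_amount_threshold(9500, "USD"))
```

A pure function like this can be tested without the LLM, which is the point of this section: tool logic is verified first, and agent behaviour is checked separately in Section 4.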

## 4. Test LLM Service


In [None]:
# Get LLM service instance
llm_service = get_llm_service()

print(f"LLM Provider: {llm_service.provider.provider_name}")
print(f"Model: {llm_service.provider.get_model_version()}")


In [None]:
# Test simple analysis (no tools)
test_input = "Payment for consulting services from John Smith, amount $5000"

print("Testing simple analysis...")
print(f"Input: {test_input}\n")

try:
    response = llm_service.analyze(test_input)
    print("✅ LLM Response:")
    print(f"   Score: {response.score}")
    print(f"   Categories: {response.categories}")
    print(f"   Summary: {response.summary[:200]}...")
except Exception as e:
    print(f"❌ Error: {e}")
    print("   Make sure LLM provider is running (ollama, openai key, etc.)")


In [None]:
# Test analysis WITH tools (agent mode)
# Only works if TOOL_DEFINITIONS is not empty

if TOOL_DEFINITIONS:
    test_input = "Wire transfer from Ahmed Ivanov for $9500 USD"
    
    print("Testing agent mode with tools...")
    print(f"Input: {test_input}\n")
    
    try:
        response = llm_service.analyze_with_tools(test_input)
        print("✅ Agent Response:")
        print(f"   Score: {response.score}")
        print(f"   Categories: {response.categories}")
        print(f"   Tools used: {response.tools_used}")
        print(f"   Summary: {response.summary[:300]}...")
        
        if response.trace:
            print(f"\n   Trace keys: {list(response.trace.keys())}")
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
else:
    print("ℹ️ No tools defined yet. Agent mode test skipped.")
    print("   Define TOOL_DEFINITIONS in app/services/tools/definitions.py")


## 5. Test Full Pipeline (Processor)

This section runs the same end-to-end analysis flow that Streamlit triggers behind the scenes.


In [None]:
# Get a test user for RBAC
user = get_current_user("analyst_a")
print(f"Testing as user: {user.user_id} (role: {user.role})")
print(f"  Permissions: {[p.value for p in user.permissions]}")
print(f"  Groups: {user.groups}")


In [None]:
# Run full analysis pipeline
test_data = RequestCreate(
    input_text="International wire transfer from Elena Volkova for real estate purchase, amount $150,000",
    context="High-value cross-border transaction",
    group="default",
)

print("Processing request...")
print(f"  Input: {test_data.input_text}")
print(f"  Context: {test_data.context}\n")

with get_session() as session:
    processor = Processor(session, user=user)
    
    try:
        request, result = processor.process_request(test_data)
        
        print("✅ Pipeline completed!")
        print(f"\n📋 Request (ID: {request.id})")
        print(f"   Input: {request.input_text[:80]}...")
        print(f"   Group: {request.group}")
        
        print(f"\n📊 Analysis Result (ID: {result.id})")
        print(f"   Score: {result.score}")
        print(f"   Categories: {result.categories}")
        print(f"   Summary: {result.summary[:200]}...")
        print(f"   Model: {result.model_version}")
        print(f"   Validation: {result.validation_status}")
        
        if result.llm_trace:
            print("\n🔍 LLM Trace:")
            print(f"   Keys: {list(result.llm_trace.keys())}")
            if "tools_called" in result.llm_trace:
                print(f"   Tools called: {result.llm_trace['tools_called']}")
                
    except PermissionError as e:
        print(f"❌ Permission denied: {e}")
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()


## 6. Test RBAC/ABAC (Different Users)


In [None]:
# List all available mock users
print("Available mock users:")
for user_key, user in MOCK_USERS.items():
    print(f"\n  {user_key}:")
    print(f"    Role: {user.role}")
    print(f"    Permissions: {[p.value for p in user.permissions]}")
    print(f"    Groups: {user.groups}")


In [None]:
# Test RBAC - viewer should NOT be able to analyze
viewer = get_current_user("viewer_a")
print(f"Testing as VIEWER: {viewer.user_id}")
print(f"  Permissions: {[p.value for p in viewer.permissions]}")

with get_session() as session:
    processor = Processor(session, user=viewer)
    
    try:
        request, result = processor.process_request(RequestCreate(
            input_text="Test transaction",
        ))
        print("❌ Should have failed! Viewer shouldn't be able to analyze.")
    except PermissionError as e:
        print(f"✅ Correctly blocked: {e}")


In [None]:
# Test ABAC - analysts only see their group's data
analyst_a = get_current_user("analyst_a")
analyst_b = get_current_user("analyst_b")

print(f"Analyst A groups: {analyst_a.groups}")
print(f"Analyst B groups: {analyst_b.groups}")

with get_session() as session:
    processor_a = Processor(session, user=analyst_a)
    results_a = processor_a.get_recent_results(limit=10)
    
    processor_b = Processor(session, user=analyst_b)
    results_b = processor_b.get_recent_results(limit=10)
    
    print(f"\nAnalyst A sees {len(results_a)} results")
    print(f"Analyst B sees {len(results_b)} results")
    
    if results_a:
        print(f"Analyst A result groups: {set(r.group for r in results_a)}")
    if results_b:
        print(f"Analyst B result groups: {set(r.group for r in results_b)}")

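The ABAC cell only prints which groups each analyst sees, so a leak has to be spotted by eye. As a minimal sketch of the rule being tested (a result is visible only if its group is one of the user's groups), the following uses hypothetical stand-in classes and group names rather than the project's `UserProfile` or ORM models:

```python
from dataclasses import dataclass, field


@dataclass
class SketchUser:
    """Hypothetical stand-in for UserProfile; only the groups attribute matters here."""
    user_id: str
    groups: list[str] = field(default_factory=list)


@dataclass
class SketchResult:
    """Hypothetical stand-in for a stored result row."""
    id: int
    group: str


def visible_results(user, results):
    """Attribute-based filter: keep rows whose group is one of the user's groups."""
    allowed = set(user.groups)
    return [r for r in results if r.group in allowed]


def leaked_groups(user, results):
    """Groups present in the results that the user should not be able to see."""
    return {r.group for r in results} - set(user.groups)


rows = [SketchResult(1, "group_a"), SketchResult(2, "group_b"), SketchResult(3, "group_a")]
analyst = SketchUser("analyst_a", groups=["group_a"])
print([r.id for r in visible_results(analyst, rows)])
print(leaked_groups(analyst, visible_results(analyst, rows)))
```

A helper like `leaked_groups` could be applied to `results_a` and `results_b` from the cell above to turn the visual check into an assertion, assuming the real result objects expose a `group` attribute as the cell's own `r.group` access suggests.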

## 7. Test Validation / Guardrails


In [None]:
# Test validation functions directly
from app.services.llm_service import LLMResponse

# Create mock LLM response WITH potential PII leakage
mock_response = LLMResponse(
    score=75,
    categories=["suspicious", "high_value"],
    summary="This transaction shows signs of potential money laundering. The sender's SSN is 123-45-6789.",
    reasoning="Based on the pattern analysis...",
)

original_input = "Wire transfer from John Smith"

status, details = run_all_validations(mock_response, original_input)

print(f"Validation status: {status}")
print(f"Details: {details}")

# If you implemented PII detection, this should catch the SSN
if details and "PII" in str(details):
    print("\n✅ PII detection working!")
else:
    print("\nℹ️ PII detection not implemented yet (Phase 4)")

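The check above passes only if the validation details mention "PII". One way such a guardrail might be approached is a regular-expression scan of the model output. The sketch below covers just US SSN-shaped strings; the function name `detect_pii` and the detail message are assumptions, not part of `run_all_validations`:

```python
import re

# Matches strings shaped like a US Social Security number, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def detect_pii(text: str) -> list[str]:
    """Return a list of human-readable findings; empty when nothing is detected."""
    details = []
    if SSN_PATTERN.search(text):
        details.append("PII: SSN-like pattern found")
    return details


summary = (
    "This transaction shows signs of potential money laundering. "
    "The sender's SSN is 123-45-6789."
)
print(detect_pii(summary))
print(detect_pii("Wire transfer from John Smith"))
```

A pattern match like this is deliberately narrow: it will miss other identifier formats and can flag unrelated digit groups, so it is best treated as a first guardrail rather than complete PII detection.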

## 8. Test Scenarios (E2E)


In [None]:
# Define test scenarios for the KYC/AML case
TEST_SCENARIOS = [
    {
        "name": "Clean Transaction",
        "input": "Payment for consulting services from ABC Corp, $2,500",
        "expected_risk": "LOW",
    },
    {
        "name": "Near Threshold (Structuring)",
        "input": "Cash deposit $9,500 - monthly savings",
        "expected_risk": "MEDIUM",
    },
    {
        "name": "Sanctions Match",
        "input": "Wire transfer from Ahmed Ivanov for equipment purchase, $15,000",
        "expected_risk": "CRITICAL",
    },
    {
        "name": "PEP Transaction",
        "input": "Donation from Elena Volkova for charity event, $50,000",
        "expected_risk": "HIGH",
    },
]

print(f"Defined {len(TEST_SCENARIOS)} test scenarios:")
for i, scenario in enumerate(TEST_SCENARIOS, 1):
    print(f"  {i}. {scenario['name']} - Expected: {scenario['expected_risk']}")


In [None]:
# Run all test scenarios
def run_test_scenario(scenario: dict, user: UserProfile):
    """Run a single test scenario and return results."""
    with get_session() as session:
        processor = Processor(session, user=user)
        
        request_data = RequestCreate(
            input_text=scenario["input"],
            context=f"Test: {scenario['name']}",
        )
        
        request, result = processor.process_request(request_data)
        
        return {
            "name": scenario["name"],
            "expected": scenario["expected_risk"],
            "actual_score": result.score,
            "categories": result.categories,
            "validation": result.validation_status,
            "summary": result.summary[:100] + "...",
        }

# Run scenarios
user = get_current_user("analyst_a")

print("Running test scenarios...\n")
for scenario in TEST_SCENARIOS:
    try:
        result = run_test_scenario(scenario, user)
        
        # Determine risk level from score
        score = result["actual_score"]
        if score <= 25:
            actual_level = "LOW"
        elif score <= 50:
            actual_level = "MEDIUM"
        elif score <= 75:
            actual_level = "HIGH"
        else:
            actual_level = "CRITICAL"
        
        match = "✅" if actual_level == result["expected"] else "⚠️"
        
        print(f"{match} {result['name']}")
        print(f"   Expected: {result['expected']}, Got: {actual_level} (score: {score})")
        print(f"   Categories: {result['categories']}")
        print(f"   Validation: {result['validation']}")
        print()
        
    except Exception as e:
        print(f"❌ {scenario['name']}: {e}\n")

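The score-to-level mapping in the loop above (up to 25 is LOW, up to 50 is MEDIUM, up to 75 is HIGH, anything higher is CRITICAL) can be pulled into a small helper so the boundaries are testable on their own. The function name `score_to_risk_level` is a suggestion; the thresholds are copied from the cell above:

```python
def score_to_risk_level(score: float) -> str:
    """Map a numeric risk score to the level names used in TEST_SCENARIOS."""
    if score <= 25:
        return "LOW"
    if score <= 50:
        return "MEDIUM"
    if score <= 75:
        return "HIGH"
    return "CRITICAL"


# Check each boundary value and its neighbour.
for value in (0, 25, 26, 50, 51, 75, 76, 100):
    print(value, score_to_risk_level(value))
```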

## 9. Debug Helpers


In [None]:
# Helper: View recent results from DB
from sqlmodel import select

with get_session() as session:
    stmt = select(AnalysisResult).order_by(AnalysisResult.created_at.desc()).limit(5)
    results = session.exec(stmt).all()
    
    print(f"Last {len(results)} analysis results:\n")
    for r in results:
        print(f"ID: {r.id} | Score: {r.score} | Status: {r.validation_status}")
        print(f"   Categories: {r.categories}")
        print(f"   Created: {r.created_at}")
        print()


In [None]:
# Helper: View LLM trace for a specific result
import json

result_id = 1  # Change this to inspect different results

with get_session() as session:
    result = session.get(AnalysisResult, result_id)
    
    if result and result.llm_trace:
        print(f"LLM Trace for result {result_id}:")
        print(json.dumps(result.llm_trace, indent=2, default=str))
    else:
        print(f"No trace found for result {result_id}")


In [None]:
# Helper: Clear all test data (use carefully!)
# Uncomment to run

# from sqlmodel import text
# with get_session() as session:
#     session.exec(text("DELETE FROM analysis_results"))
#     session.exec(text("DELETE FROM requests"))
#     print("✅ All test data cleared")


---

## 📝 Quick Reference

### Before Interview:
1. `docker compose up -d postgres` - Start database
2. Check LLM provider (Ollama running OR API key set)
3. Run the three Setup cells in Section 1

### During Interview - Quick Validation:
| Phase | Test Section |
|-------|-------------|
| Phase 1 (Models) | Section 2 |
| Phase 2 (Tools) | Section 3 |
| Phase 3 (Prompts) | Sections 4-5 |
| Phase 4 (Validation) | Section 7 |
| Phase 5 (Processor) | Section 5 |
| Phase 6 (UI) | Streamlit browser |

### Hotkeys:
- `Shift+Enter` - Run cell and move to next
- `Ctrl+Enter` - Run cell and stay
- `Esc + A` - Insert cell above
- `Esc + B` - Insert cell below
