# 🧪 Pipeline Testing Notebook

This notebook allows testing all components without running Streamlit.

**Structure:**
1. Setup & Imports
2. Test Models (Database)
3. Test Tools (Function Calling)
4. Test LLM Service
5. Test Full Pipeline (Processor)
6. Test RBAC/ABAC (Different Users)
7. Test Validation/Guardrails
8. Test Scenarios (E2E)
9. Test RAG (Retrieval-Augmented Generation)
10. Debug Helpers

**Usage:** Run cells sequentially. Each section is independent after Setup.


## 1. Setup & Imports


In [1]:
# Add project root to path
import sys
from pathlib import Path

# Navigate to project root (one level up from notebooks/)
# Path().absolute() gets current directory (notebooks/), .parent gets project root
PROJECT_ROOT = Path().absolute().parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print(f"Project root: {PROJECT_ROOT}")


Project root: c:\Users\zamko\Documents\vlzm\kyc-analyzer


In [2]:
# Set environment variables BEFORE imports
import os

# Required for local development
os.environ.setdefault("ENV", "LOCAL")
os.environ.setdefault("LLM_PROVIDER", "openai")  # or "ollama", "azure", "anthropic"
os.environ.setdefault("OLLAMA_BASE_URL", "http://localhost:11434")
os.environ.setdefault("OLLAMA_MODEL", "llama3.2")

# Database (matches docker-compose.yml - service name is "db", not "postgres")
# Run: docker compose up -d db
# NOTE: Using = instead of setdefault() to OVERRIDE .env file values
os.environ["DATABASE_HOST"] = "localhost"
os.environ["DATABASE_PORT"] = "5432"
os.environ["DATABASE_NAME"] = "app_db"  # Matches POSTGRES_DB in docker-compose (overrides .env)
os.environ["DATABASE_USER"] = "postgres"  # Matches POSTGRES_USER in docker-compose
os.environ["DATABASE_PASSWORD"] = "localdevpassword123"

print("Environment configured:")
print(f"  ENV: {os.environ['ENV']}")
print(f"  LLM_PROVIDER: {os.environ['LLM_PROVIDER']}")
print(f"  DATABASE: {os.environ['DATABASE_NAME']} @ {os.environ['DATABASE_HOST']}:{os.environ['DATABASE_PORT']}")


Environment configured:
  ENV: LOCAL
  LLM_PROVIDER: openai
  DATABASE: app_db @ localhost:5432
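The cell above mixes `os.environ.setdefault()` with plain assignment. The difference matters only when a value is already present in the process environment (for example because the IDE or a dotenv loader populated it before the cell ran). A minimal sketch of that behaviour, using a hypothetical variable name `DEMO_DB_NAME` that is not part of the project:

```python
import os

# DEMO_DB_NAME is a hypothetical variable used only for this demonstration.
os.environ["DEMO_DB_NAME"] = "from_dotenv"  # pretend something set this earlier

# setdefault() leaves an existing value untouched.
os.environ.setdefault("DEMO_DB_NAME", "app_db")
after_setdefault = os.environ["DEMO_DB_NAME"]

# Plain assignment always replaces the value.
os.environ["DEMO_DB_NAME"] = "app_db"
after_assignment = os.environ["DEMO_DB_NAME"]

print(f"after setdefault: {after_setdefault}")
print(f"after assignment: {after_assignment}")

# Clean up so the demo does not leak into later cells.
del os.environ["DEMO_DB_NAME"]
```

This is why the database settings use assignment while the LLM settings, which may legitimately come from the environment, use `setdefault()`.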


In [3]:
# Core imports
from app.models import Request, RequestCreate, AnalysisResult, AnalysisOutput
from app.database import init_db, get_session
from app.services.processor import Processor
from app.services.llm_service import get_llm_service
from app.services.auth_mock import get_current_user, UserProfile, Permission, MOCK_USERS, ROLE_PERMISSIONS
from app.services.validation import run_all_validations
from app.services.tools.definitions import TOOL_DEFINITIONS, TOOL_FUNCTIONS, execute_tool

print("✅ All imports successful!")


✅ All imports successful!


## 2. Test Models (Database)


In [4]:
# Check model fields - verify your schema changes
print("Request fields:")
for name, field in Request.model_fields.items():
    print(f"  {name}: {field.annotation}")

print("\nAnalysisResult fields:")
for name, field in AnalysisResult.model_fields.items():
    print(f"  {name}: {field.annotation}")


Request fields:
  id: typing.Optional[int]
  input_text: <class 'str'>
  context: typing.Optional[str]
  group: <class 'str'>
  created_by_user_id: typing.Optional[str]
  created_at: <class 'datetime.datetime'>

AnalysisResult fields:
  id: typing.Optional[int]
  request_id: <class 'int'>
  score: <class 'int'>
  categories: list[str]
  summary: <class 'str'>
  processed_content: typing.Optional[str]
  model_version: <class 'str'>
  group: <class 'str'>
  analyzed_by_user_id: typing.Optional[str]
  llm_trace: <class 'dict'>
  human_feedback: typing.Optional[bool]
  feedback_comment: typing.Optional[str]
  feedback_by_user_id: typing.Optional[str]
  feedback_at: typing.Optional[datetime.datetime]
  validation_status: <class 'str'>
  validation_details: typing.Optional[str]
  embedding: typing.Optional[typing.List[float]]
  created_at: <class 'datetime.datetime'>


In [5]:
# Initialize database (creates tables if not exist)
init_db()
print("✅ Database initialized")

✅ Database initialized


In [6]:
# Test creating a request manually (with rollback - won't pollute DB)
# This is just for testing the model structure
with get_session() as session:
    test_request = Request(
        input_text="Test document content",
        context="Testing from notebook",
        group="test_group",
    )
    session.add(test_request)
    session.flush()  # Get ID without committing
    
    print(f"✅ Created request with ID: {test_request.id}")
    print(f"   Input: {test_request.input_text}")
    print(f"   Group: {test_request.group}")
    
    session.rollback()
    print("   (rolled back - test only)")


✅ Created request with ID: 5
   Input: Test document content
   Group: test_group
   (rolled back - test only)


In [7]:
# Create a request that WILL be saved to DB (no rollback)
with get_session() as session:
    saved_request = Request(
        input_text="Customer service email from John Smith at ABC Corp regarding account inquiry",
        context="Test document saved to DB",
        group="default",
    )
    session.add(saved_request)
    # No rollback - this will be committed!
    
print("✅ Saved request to DB:")
print(f"   ID: {saved_request.id}")
print(f"   Input: {saved_request.input_text}")
print(f"   Group: {saved_request.group}")


✅ Saved request to DB:
   ID: 6
   Input: Customer service email from John Smith at ABC Corp regarding account inquiry
   Group: default


In [8]:
# Check what's actually in the database
from sqlmodel import select

with get_session() as session:
    # Query all requests
    stmt = select(Request)
    requests = session.exec(stmt).all()
    
    print(f"📊 Found {len(requests)} requests in database:\n")
    for req in requests:
        print(f"  ID: {req.id}")
        print(f"    Input: {req.input_text[:60]}...")
        print(f"    Group: {req.group}")
        print(f"    Created: {req.created_at}")
        print()


📊 Found 4 requests in database:

  ID: 1
    Input: Customer John Smith requested password reset for account A12...
    Group: group_a
    Created: 2026-01-09 19:23:56.431698

  ID: 2
    Input: Suspicious login attempt detected for user account from unkn...
    Group: group_a
    Created: 2026-01-09 19:24:15.013165

  ID: 4
    Input: Customer service email from John Smith at ABC Corp regarding...
    Group: default
    Created: 2026-01-09 19:43:08.568944

  ID: 6
    Input: Customer service email from John Smith at ABC Corp regarding...
    Group: default
    Created: 2026-01-09 19:48:01.345932



## 3. Test Tools (Function Calling)


In [9]:
# Check available tools
print(f"Registered tools: {len(TOOL_DEFINITIONS)}")

if TOOL_DEFINITIONS:
    for tool in TOOL_DEFINITIONS:
        func = tool["function"]
        print(f"\n📌 {func['name']}")
        print(f"   Description: {func['description'][:80]}...")
        print(f"   Parameters: {list(func['parameters']['properties'].keys())}")
else:
    print("ℹ️ No tools defined yet. Add tools in Phase 2.")
    print("   File: app/services/tools/definitions.py")

print(f"\nTool functions available: {list(TOOL_FUNCTIONS.keys())}")


Registered tools: 3

📌 get_current_time
   Description: Returns the current date and time in ISO format. Use this when you need to know ...
   Parameters: ['timezone']

📌 calculate
   Description: Performs arithmetic calculations. Supports basic operations: addition (+), subtr...
   Parameters: ['expression']

📌 lookup_database
   Description: Looks up information in the database by query string. Use this when you need to ...
   Parameters: ['query', 'table']

Tool functions available: ['get_current_time', 'calculate', 'lookup_database']


In [10]:
# Test tools directly (uncomment after implementing in Phase 2)
# These tests run WITHOUT LLM - just the tool functions

# Example: Test restricted entity check
# from app.services.tools.restrictions import check_restricted_list
# result = check_restricted_list("Ahmed Ivanov")
# print("Restricted entity check result:")
# print(result)

# Example: Test data classification validation  
# from app.services.tools.classification import validate_data_classification
# result = validate_data_classification("confidential", "internal")
# print("Data classification check result:")
# print(result)

print("ℹ️ Uncomment tool tests after implementing tools in Phase 2")


ℹ️ Uncomment tool tests after implementing tools in Phase 2
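Tool functions can be exercised without the LLM, as the commented examples above suggest. The registered `calculate` tool is described as supporting `+`, `-`, `*`, `/`, and `**`; its real implementation is not shown in this notebook. As a rough illustration of how such a tool could evaluate expressions without `eval()`, here is a sketch with a hypothetical function `safe_calculate`:

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Operators this sketch allows; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> Number:
    """Evaluate a basic arithmetic expression by walking its syntax tree.

    Hypothetical helper, not the project's `calculate` tool.
    Exponents are not bounded here, so very large powers can still be slow.
    """

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Constant):
            value = node.value
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_calculate("100 + 200 + 300"))
print(safe_calculate("(50 + 30) / 2"))
```

Function calls, names, and attribute access all raise `ValueError`, which keeps arbitrary code out of the tool.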


## 4. Test LLM Service



In [11]:
# Get LLM service instance
llm_service = get_llm_service()

print(f"LLM Provider: {llm_service.provider.provider_name}")
print(f"Model: {llm_service.provider.get_model_version()}")


LLM Provider: openai
Model: openai/gpt-5.2


In [12]:
# Test simple analysis (no tools)
test_input = "Customer support email from John Smith requesting account information update"

print(f"Testing simple analysis...")
print(f"Input: {test_input}\n")

try:
    response = llm_service.analyze(test_input)
    print("✅ LLM Response:")
    print(f"   Score: {response.score}")
    print(f"   Categories: {response.categories}")
    print(f"   Reasoning: {response.reasoning[:200]}...")
except Exception as e:
    print(f"❌ Error: {e}")
    print("   Make sure the LLM provider is reachable (Ollama running, OpenAI key set, etc.)")


Testing simple analysis...
Input: Customer support email from John Smith requesting account information update

✅ LLM Response:
   Score: 18
   Categories: ['Customer Support', 'Account Management', 'Account Information Update Request', 'Potential PII (Names/Identifiers)']
   Reasoning: The input is a brief description of a customer support email from an individual (John Smith) requesting an account information update. No actual email body, account identifiers, credentials, or sensit...


In [15]:
TOOL_DEFINITIONS

[{'type': 'function',
  'function': {'name': 'get_current_time',
   'description': 'Returns the current date and time in ISO format. Use this when you need to know the current time or date for time-sensitive analysis or logging purposes.',
   'parameters': {'type': 'object',
    'properties': {'timezone': {'type': 'string',
      'description': "Optional timezone name (e.g., 'UTC', 'US/Eastern'). Defaults to UTC."}},
    'required': []}}},
 {'type': 'function',
  'function': {'name': 'calculate',
   'description': 'Performs arithmetic calculations. Supports basic operations: addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (**). Use this to compute numeric values during analysis.',
   'parameters': {'type': 'object',
    'properties': {'expression': {'type': 'string',
      'description': "Mathematical expression to evaluate (e.g., '100 * 1.15', '(50 + 30) / 2')"}},
    'required': ['expression']}}},
 {'type': 'function',
  'function': {'name': 'looku

In [16]:
# Test analysis WITH tools (agent mode)
# Only works if TOOL_DEFINITIONS is not empty

if TOOL_DEFINITIONS:
    test_input = "Sum 100, 200, and 300and you will find how much money i am sending to you"
    
    print(f"Testing agent mode with tools...")
    print(f"Input: {test_input}\n")
    
    try:
        response = llm_service.analyze_with_tools(test_input)
        print("✅ Agent Response:")
        print(f"   Score: {response.score}")
        print(f"   Categories: {response.categories}")
        print(f"   Tools used: {response.tools_used}")
        print(f"   Reasoning: {response.reasoning[:300]}...")
        
        if response.trace:
            print(f"\n   Trace keys: {list(response.trace.keys())}")
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
else:
    print("ℹ️ No tools defined yet. Agent mode test skipped.")
    print("   Define TOOL_DEFINITIONS in app/services/tools/definitions.py")


Testing agent mode with tools...
Input: Sum 100, 200, and 300and you will find how much money i am sending to you

✅ Agent Response:
   Score: 18
   Categories: ['financial_transaction', 'arithmetic_request', 'informal_message']
   Tools used: ['calculate']
   Reasoning: The text asks the reader to add 100, 200, and 300 to determine the amount of money being sent. Using the calculation tool, 100 + 200 + 300 = 600. The content implies a monetary transfer but provides no payment method, personal data, threats, or coercion. Primary theme is a simple arithmetic-based st...

   Trace keys: ['started_at', 'model', 'mode', 'input', 'tool_calls', 'total_iterations', 'completed_at']


In [14]:
response.trace

{'started_at': '2026-01-09T19:48:13.495993',
 'model': 'openai/gpt-5.2',
 'mode': 'agent',
 'input': {'input_text': 'Document contains personal information about Ahmed Ivanov including contact details and identification numbers',
  'context': None},
 'tool_calls': [{'tool': 'lookup_database',
   'arguments': {'query': 'Ahmed Ivanov'},
   'result': '{"found": false, "query": "Ahmed Ivanov", "table": null, "message": "No results found for \'Ahmed Ivanov\'"}',
   'status': 'success'}],
 'total_iterations': 2,
 'completed_at': '2026-01-09T19:48:18.107434'}

## 5. Test Full Pipeline (Processor)

This is what Streamlit does behind the scenes - the complete analysis flow.


In [18]:
# Get a test user for RBAC
user = get_current_user("analyst_a")
print(f"Testing as user: {user.id} (role: {user.role})")
print(f"  Permissions: {[p.value for p in ROLE_PERMISSIONS.get(user.role, set())]}")
print(f"  Group: {user.group.value}")


Testing as user: usr_003 (role: UserRole.ANALYST)
  Permissions: ['view', 'view_sensitive', 'analyze']
  Group: group_a


In [19]:
# Run full analysis pipeline
test_data = RequestCreate(
    input_text="Internal document contains sensitive personal information about Elena Volkova including full address, phone number, and date of birth",
    context="High-risk data exposure scenario",
    group="default",
)

print(f"Processing request...")
print(f"  Input: {test_data.input_text}")
print(f"  Context: {test_data.context}\n")

with get_session() as session:
    processor = Processor(session, user=user)
    
    try:
        request, result = processor.process_request(test_data)
        
        print("✅ Pipeline completed!")
        print(f"\n📋 Request (ID: {request.id})")
        print(f"   Input: {request.input_text[:80]}...")
        print(f"   Group: {request.group}")
        
        print(f"\n📊 Analysis Result (ID: {result.id})")
        print(f"   Score: {result.score}")
        print(f"   Categories: {result.categories}")
        print(f"   Summary: {result.summary[:200]}...")
        print(f"   Model: {result.model_version}")
        print(f"   Validation: {result.validation_status}")
        
        if result.llm_trace:
            print("\n🔍 LLM Trace:")
            print(f"   Keys: {list(result.llm_trace.keys())}")
            # The agent trace shown in section 4 stores tool usage under "tool_calls"
            if "tool_calls" in result.llm_trace:
                print(f"   Tools called: {result.llm_trace['tool_calls']}")
                
    except PermissionError as e:
        print(f"❌ Permission denied: {e}")
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()


Processing request...
  Input: Internal document contains sensitive personal information about Elena Volkova including full address, phone number, and date of birth
  Context: High-risk data exposure scenario

✅ Pipeline completed!

📋 Request (ID: 3)
   Input: Internal document contains sensitive personal information about Elena Volkova in...
   Group: group_a

📊 Analysis Result (ID: 1)
   Score: 92
   Categories: ['Personally Identifiable Information (PII)', 'Sensitive Data Exposure', 'Privacy/Compliance Risk', 'Internal Document Handling', 'Data Loss Prevention (DLP)']
   Summary: The input indicates an internal document contains sensitive personal information about an identifiable individual (Elena Volkova), specifically full address, phone number, and date of birth. These ele...
   Model: openai/gpt-5.2
   Validation: PASS

🔍 LLM Trace:
   Keys: ['started_at', 'model', 'mode', 'input', 'completed_at', 'raw_response_preview']


## 6. Test RBAC/ABAC (Different Users)

In [22]:
# List all available mock users
print("Available mock users:")
for user_key, user in MOCK_USERS.items():
    print(f"\n  {user_key}:")
    print(f"    Role: {user.role}")
    print(f"  Permissions: {[p.value for p in ROLE_PERMISSIONS.get(user.role, set())]}")
    print(f"    Groups: {user.group}")


Available mock users:

  admin_default:
    Role: UserRole.ADMIN
  Permissions: ['view_sensitive', 'view', 'analyze', 'export_data', 'view_all_groups', 'manage_users']
    Groups: Group.DEFAULT

  senior_default:
    Role: UserRole.SENIOR_ANALYST
  Permissions: ['view_sensitive', 'view', 'analyze', 'export_data', 'view_all_groups']
    Groups: Group.DEFAULT

  analyst_a:
    Role: UserRole.ANALYST
  Permissions: ['view', 'view_sensitive', 'analyze']
    Groups: Group.GROUP_A

  analyst_b:
    Role: UserRole.ANALYST
  Permissions: ['view', 'view_sensitive', 'analyze']
    Groups: Group.GROUP_B

  viewer_a:
    Role: UserRole.VIEWER
  Permissions: ['view']
    Groups: Group.GROUP_A


In [23]:
# Test RBAC - viewer should NOT be able to analyze
viewer = get_current_user("viewer_a")
print(f"Testing as VIEWER: {viewer.id}")
# Look up the viewer's own role, not the `user` variable left over from earlier cells
print(f"  Permissions: {[p.value for p in ROLE_PERMISSIONS.get(viewer.role, set())]}")

with get_session() as session:
    processor = Processor(session, user=viewer)
    
    try:
        request, result = processor.process_request(RequestCreate(
            input_text="Test document content",
        ))
        print("❌ Should have failed! Viewer shouldn't be able to analyze.")
    except PermissionError as e:
        print(f"✅ Correctly blocked: {e}")


Testing as VIEWER: usr_005
  Permissions: ['view']
✅ Correctly blocked: Access denied. User 'Eve Viewer (Group A)' with role 'viewer' does not have permission 'analyze'.


In [24]:
# Test ABAC - analysts only see their group's data
analyst_a = get_current_user("analyst_a")
analyst_b = get_current_user("analyst_b")

print(f"Analyst A groups: {analyst_a.group}")
print(f"Analyst B groups: {analyst_b.group}")

with get_session() as session:
    processor_a = Processor(session, user=analyst_a)
    results_a = processor_a.get_recent_results(limit=10)
    
    processor_b = Processor(session, user=analyst_b)
    results_b = processor_b.get_recent_results(limit=10)
    
    print(f"\nAnalyst A sees {len(results_a)} results")
    print(f"Analyst B sees {len(results_b)} results")
    
    if results_a:
        print(f"Analyst A result groups: {set(r.group for r in results_a)}")
    if results_b:
        print(f"Analyst B result groups: {set(r.group for r in results_b)}")


Analyst A groups: Group.GROUP_A
Analyst B groups: Group.GROUP_B

Analyst A sees 1 results
Analyst B sees 0 results
Analyst A result groups: {'group_a'}


## 7. Test Validation / Guardrails

In [25]:
# Test validation functions directly
from app.services.llm_service import LLMResponse

# Create mock LLM response WITH potential PII leakage
mock_response = LLMResponse(
    score=75,
    categories=["sensitive_data", "high_risk"],
    reasoning="This document contains sensitive personal information. The individual's SSN is 123-45-6789.",
)

original_input = "Document from John Smith"

result = run_all_validations(
    response_text=mock_response.reasoning,
    score=mock_response.score,              
    categories=mock_response.categories
)

print(f"Validation status: {result.status}")
print(f"Details: {result.details}")
print(f"Passed: {result.passed}")


Validation status: PASS
Details: None
Passed: True
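Note that the run above returns `PASS` even though the mock reasoning contains an SSN-like string. If the guardrails are expected to flag that kind of leakage, a pattern check along these lines could be one starting point. This is a sketch with a hypothetical helper `find_ssn_like`; it is not part of `run_all_validations`, and a simple regex like this will miss other formats and can produce false positives:

```python
import re
from typing import List

# Matches three digit groups separated by hyphens, e.g. 123-45-6789.
SSN_LIKE_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> List[str]:
    """Return every SSN-shaped substring found in the text."""
    return SSN_LIKE_PATTERN.findall(text)


leaky = "This document contains sensitive personal information. The individual's SSN is 123-45-6789."
clean = "Document from John Smith"

print(find_ssn_like(leaky))
print(find_ssn_like(clean))
```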


## 8. Test Scenarios (E2E)

In [26]:
# Define test scenarios for content analysis
TEST_SCENARIOS = [
    {
        "name": "Clean Document",
        "input": "General customer service email from ABC Corp regarding product inquiry",
        "expected_risk": "LOW",
    },
    {
        "name": "Partial PII Exposure",
        "input": "Customer support ticket contains name and email address for monthly subscription inquiry",
        "expected_risk": "MEDIUM",
    },
    {
        "name": "Restricted Entity Match",
        "input": "Document contains personal information about Ahmed Ivanov including full contact details and identification numbers",
        "expected_risk": "CRITICAL",
    },
    {
        "name": "Sensitive Person Data",
        "input": "Internal document includes comprehensive personal information about Elena Volkova: full address, phone number, date of birth, and employment details",
        "expected_risk": "HIGH",
    },
]

print(f"Defined {len(TEST_SCENARIOS)} test scenarios:")
for i, scenario in enumerate(TEST_SCENARIOS, 1):
    print(f"  {i}. {scenario['name']} - Expected: {scenario['expected_risk']}")


Defined 4 test scenarios:
  1. Clean Document - Expected: LOW
  2. Partial PII Exposure - Expected: MEDIUM
  3. Restricted Entity Match - Expected: CRITICAL
  4. Sensitive Person Data - Expected: HIGH


## 9. Test RAG (Retrieval-Augmented Generation)

RAG uses pgvector to find similar historical cases based on semantic similarity.
This feature can be disabled with `RAG_ENABLED=false`.


In [None]:
from app.services.rag_service import calculate_similarity, get_rag_service

with get_session() as session:
    rag_service = get_rag_service(session)

    text_1 = "Welcome powerpoint presentation"
    text_2 = "Customer John Smith requested password reset for account A12345"
    embedding_1 = rag_service.get_embedding(text_1)
    embedding_2 = rag_service.get_embedding(text_2)

    distance, similarity_pct = calculate_similarity(embedding_1, embedding_2)

    print(f"Cosine distance: {distance:.4f}")
    print(f"Similarity: {similarity_pct:.1f}%")
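The cell above relies on `calculate_similarity` returning a cosine distance and a similarity percentage. How the project computes these internally is not shown in this notebook. The sketch below assumes the common convention that distance is `1 - cosine similarity` and that the percentage is the cosine similarity scaled by 100; the function name is hypothetical:

```python
import math
from typing import Sequence, Tuple


def cosine_distance_and_pct(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (cosine distance, similarity percentage) for two vectors.

    Assumption: distance = 1 - cos(a, b), percentage = cos(a, b) * 100.
    """
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    cosine = dot / (norm_a * norm_b)
    return 1.0 - cosine, cosine * 100.0


same_dist, same_pct = cosine_distance_and_pct([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
orth_dist, orth_pct = cosine_distance_and_pct([1.0, 0.0], [0.0, 1.0])
print(f"identical vectors:  distance={same_dist:.4f}, similarity={same_pct:.1f}%")
print(f"orthogonal vectors: distance={orth_dist:.4f}, similarity={orth_pct:.1f}%")
```

Under this convention, unrelated texts such as the two in the cell above would be expected to produce a larger distance than near-duplicates.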

In [27]:
# Check if RAG is enabled and available
from app.services.rag_service import RAGService, get_rag_service
from app.services.secret_manager import get_settings

settings = get_settings()
print(f"RAG Enabled: {settings.rag_enabled}")
print(f"Embedding Model: {settings.embedding_model}")
print(f"Embedding Dimensions: {settings.embedding_dimensions}")

# Check pgvector availability
from app.models import PGVECTOR_AVAILABLE
print(f"pgvector Available: {PGVECTOR_AVAILABLE}")


RAG Enabled: True
Embedding Model: text-embedding-3-small
Embedding Dimensions: 1536
pgvector Available: True


In [29]:
# Test embedding generation directly
with get_session() as session:
    rag_service = get_rag_service(session)
    
    if rag_service.is_enabled:
        test_text = "Customer support email requesting account information update"
        
        print(f"Testing embedding generation...")
        print(f"Input: {test_text}\n")
        
        try:
            embedding = rag_service.get_embedding(test_text)
            print("✅ Embedding generated successfully!")
            print(f"   Dimensions: {len(embedding)}")
            print(f"   First 5 values: {embedding[:5]}")
        except Exception as e:
            print(f"❌ Error generating embedding: {e}")
            print("   Make sure OPENAI_API_KEY is set in your environment")
    else:
        print("ℹ️ RAG is disabled. Set RAG_ENABLED=true to test embeddings.")


Testing embedding generation...
Input: Customer support email requesting account information update

✅ Embedding generated successfully!
   Dimensions: 1536
   First 5 values: [0.05845680832862854, -0.015425351448357105, 0.02822798863053322, -0.0036467909812927246, 0.0382862351834774]


In [30]:
# Test full pipeline with embedding generation
# This creates analysis results WITH embeddings for similarity search

user = get_current_user("analyst_a")

# Create several test cases for RAG to search through
test_cases = [
    {
        "input": "Customer John Smith requested password reset for account A12345",
        "context": "Password reset scenario",
    },
    {
        "input": "Suspicious login attempt detected for user account from unknown IP address",
        "context": "Security alert scenario",
    },
    {
        "input": "Internal memo regarding employee salary information and benefits",
        "context": "HR document scenario",
    },
]

print("Creating test cases with embeddings...\n")

created_results = []
with get_session() as session:
    processor = Processor(session, user=user)
    
    # Check if RAG is enabled
    print(f"RAG enabled: {processor.is_rag_enabled()}\n")
    
    for i, case in enumerate(test_cases, 1):
        print(f"Processing case {i}: {case['input'][:50]}...")
        
        try:
            request, result = processor.process_request(RequestCreate(
                input_text=case["input"],
                context=case["context"],
            ))
            
            created_results.append(result)
            
            # Check if embedding was created
            has_embedding = result.embedding is not None and len(result.embedding) > 0
            print(f"   ✅ Result {result.id} created, Score: {result.score}, Has embedding: {has_embedding}")
            
        except Exception as e:
            print(f"   ❌ Error: {e}")

print(f"\n✅ Created {len(created_results)} results for similarity testing")


Creating test cases with embeddings...

RAG enabled: True

Processing case 1: Customer John Smith requested password reset for a...
   ✅ Result 2 created, Score: 35, Has embedding: True
Processing case 2: Suspicious login attempt detected for user account...


Validation failed for request 5: FAIL_LOW_QUALITY - Uncertainty detected: 'unknown'


   ✅ Result 3 created, Score: 68, Has embedding: True
Processing case 3: Internal memo regarding employee salary informatio...
   ✅ Result 4 created, Score: 62, Has embedding: True

✅ Created 3 results for similarity testing


In [43]:
# Test similarity search
# Search for cases similar to a new query

test_query = "Welcome powerpoint presentation"

print("🔍 Searching for similar cases...")
print(f"Query: {test_query}\n")

with get_session() as session:
    rag_service = get_rag_service(session)
    
    if rag_service.is_enabled:
        try:
            similar_cases = rag_service.find_similar_cases(
                query_text=test_query,
                limit=3,
            )
            
            if similar_cases:
                print(f"✅ Found {len(similar_cases)} similar cases:\n")
                for i, case in enumerate(similar_cases, 1):
                    print(f"  {i}. Result ID: {case.id}")
                    print(f"     Score: {case.score}")
                    print(f"     Categories: {case.categories}")
                    print(f"     Summary: {case.summary[:100]}...")
                    print()
            else:
                print("ℹ️ No similar cases found (database may be empty or no embeddings)")
                
        except Exception as e:
            print(f"❌ Error during similarity search: {e}")
    else:
        print("ℹ️ RAG is disabled. Set RAG_ENABLED=true to test similarity search.")


🔍 Searching for similar cases...
Query: Welcome powerpoint presentation

✅ Found 3 similar cases:

  1. Result ID: 7
     Score: 5
     Categories: ['Document/File Description', 'Presentation', 'Low Information Content']
     Summary: The input is a brief, generic description indicating a 'Welcome' PowerPoint presentation with the ad...

  2. Result ID: 4
     Score: 8
     Categories: ['Document type: presentation', 'General/benign content', 'Low informational density']
     Summary: The input consists of a short title-like phrase: "Welcome presentation" with an added note indicatin...

  3. Result ID: 3
     Score: 62
     Categories: ['HR/Employment', 'Compensation & Benefits', 'Confidential/Proprietary Information', 'Personal Data (PII) - Potential', 'Internal Communications']
     Summary: The input describes an internal HR memo focused on employee salary information and benefits. This ty...



In [32]:
# Test finding similar cases through Processor (with ABAC filtering)
# This demonstrates how the UI/API would use similar case search

user = get_current_user("analyst_a")

with get_session() as session:
    processor = Processor(session, user=user)
    
    # First, get the most recent result
    recent_results = processor.get_recent_results(limit=1)
    
    if recent_results:
        current_result = recent_results[0]
        print(f"Finding cases similar to Result {current_result.id}:")
        print(f"   Score: {current_result.score}")
        print(f"   Categories: {current_result.categories}\n")
        
        if processor.is_rag_enabled():
            similar = processor.find_similar_cases(current_result, limit=3)
            
            if similar:
                print(f"✅ Found {len(similar)} similar cases:\n")
                for i, case in enumerate(similar, 1):
                    print(f"  {i}. Result ID: {case.id}")
                    print(f"     Score: {case.score}")
                    print(f"     Summary: {case.summary[:80]}...")
                    print()
            else:
                print("ℹ️ No similar cases found")
        else:
            print("ℹ️ RAG is disabled")
    else:
        print("ℹ️ No results in database to test with")


Finding cases similar to Result 4:
   Score: 62
   Categories: ['HR/Employment', 'Compensation & Benefits', 'Confidential/PII Risk', 'Internal Communications', 'Compliance & Policy']

✅ Found 3 similar cases:

  1. Result ID: 1
     Score: 92
     Summary: The input indicates an internal document contains sensitive personal information...

  2. Result ID: 2
     Score: 35
     Summary: The input describes a customer support/security event: a password reset request ...

  3. Result ID: 3
     Score: 68
     Summary: The input describes a suspicious login attempt for a user account originating fr...



In [33]:
# Run all test scenarios
def run_test_scenario(scenario: dict, user: UserProfile):
    """Run a single test scenario and return results."""
    with get_session() as session:
        processor = Processor(session, user=user)
        
        request_data = RequestCreate(
            input_text=scenario["input"],
            context=f"Test: {scenario['name']}",
        )
        
        request, result = processor.process_request(request_data)
        
        return {
            "name": scenario["name"],
            "expected": scenario["expected_risk"],
            "actual_score": result.score,
            "categories": result.categories,
            "validation": result.validation_status,
            "summary": result.summary[:100] + "...",
        }

# Run scenarios
user = get_current_user("analyst_a")

print("Running test scenarios...\n")
for scenario in TEST_SCENARIOS:
    try:
        result = run_test_scenario(scenario, user)
        
        # Determine risk level from score
        score = result["actual_score"]
        if score <= 25:
            actual_level = "LOW"
        elif score <= 50:
            actual_level = "MEDIUM"
        elif score <= 75:
            actual_level = "HIGH"
        else:
            actual_level = "CRITICAL"
        
        match = "✅" if actual_level == result["expected"] else "⚠️"
        
        print(f"{match} {result['name']}")
        print(f"   Expected: {result['expected']}, Got: {actual_level} (score: {score})")
        print(f"   Categories: {result['categories']}")
        print(f"   Validation: {result['validation']}")
        print()
        
    except Exception as e:
        print(f"❌ {scenario['name']}: {e}\n")


Running test scenarios...

✅ Clean Document
   Expected: LOW, Got: LOW (score: 5)
   Categories: ['Customer Service', 'Product Inquiry', 'Administrative/Meta Content', 'Low-Risk/Benign']
   Validation: PASS

✅ Partial PII Exposure
   Expected: MEDIUM, Got: MEDIUM (score: 42)
   Categories: ['PII', 'Customer Support', 'Subscription/Billing Inquiry', 'Data Privacy']
   Validation: PASS

⚠️ Restricted Entity Match
   Expected: CRITICAL, Got: HIGH (score: 72)
   Categories: ['Personal Data / PII', 'Sensitive Identifiers', 'Contact Information', 'Restricted Entity / Watchlist Screening Context', 'Compliance / Privacy Risk']
   Validation: PASS



Validation failed for request 10: FAIL_LOW_QUALITY - Uncertainty detected: 'n/a'


⚠️ Sensitive Person Data
   Expected: HIGH, Got: CRITICAL (score: 88)
   Categories: ['Personal Data (PII)', 'Sensitive Personal Information', 'Privacy/Confidentiality Risk', 'Internal Document Handling', 'Data Protection/Compliance']
   Validation: FAIL_LOW_QUALITY
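
The score-to-level thresholds used in the scenario loop (25, 50, 75) can be factored into a small helper so the same mapping is reusable in other cells. This is a minimal sketch: `score_to_level`, `LEVEL_BOUNDS`, and `LEVEL_NAMES` are hypothetical names that do not exist in the project, and the boundaries are simply copied from the `if`/`elif` chain in the runner.

```python
from bisect import bisect_left

# Inclusive upper bounds for LOW, MEDIUM, HIGH; anything above 75 is CRITICAL.
# (Assumption: these mirror the thresholds hard-coded in the scenario loop.)
LEVEL_BOUNDS = [25, 50, 75]
LEVEL_NAMES = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def score_to_level(score: int) -> str:
    """Map a 0-100 risk score to a level name using inclusive upper bounds."""
    return LEVEL_NAMES[bisect_left(LEVEL_BOUNDS, score)]


# Scores taken from the scenario run output
for score in (5, 42, 72, 88):
    print(score, score_to_level(score))
```

With these boundaries, 72 maps to HIGH and 88 maps to CRITICAL, so the two warnings in the run come from scores landing on the opposite side of the 75 threshold from what each scenario expects.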



## 10. Debug Helpers

In [34]:
# Helper: View recent results from DB
from sqlmodel import select

with get_session() as session:
    stmt = select(AnalysisResult).order_by(AnalysisResult.created_at.desc()).limit(5)
    results = session.exec(stmt).all()
    
    print(f"Last {len(results)} analysis results:\n")
    for r in results:
        print(f"ID: {r.id} | Score: {r.score} | Status: {r.validation_status}")
        print(f"   Categories: {r.categories}")
        print(f"   Created: {r.created_at}")
        print()


Last 5 analysis results:

ID: 8 | Score: 88 | Status: FAIL_LOW_QUALITY
   Categories: ['Personal Data (PII)', 'Sensitive Personal Information', 'Privacy/Confidentiality Risk', 'Internal Document Handling', 'Data Protection/Compliance']
   Created: 2026-01-09 17:44:39.323043

ID: 7 | Score: 72 | Status: PASS
   Categories: ['Personal Data / PII', 'Sensitive Identifiers', 'Contact Information', 'Restricted Entity / Watchlist Screening Context', 'Compliance / Privacy Risk']
   Created: 2026-01-09 17:44:34.184519

ID: 6 | Score: 42 | Status: PASS
   Categories: ['PII', 'Customer Support', 'Subscription/Billing Inquiry', 'Data Privacy']
   Created: 2026-01-09 17:44:29.010817

ID: 5 | Score: 5 | Status: PASS
   Categories: ['Customer Service', 'Product Inquiry', 'Administrative/Meta Content', 'Low-Risk/Benign']
   Created: 2026-01-09 17:44:23.161128

ID: 4 | Score: 62 | Status: PASS
   Categories: ['HR/Employment', 'Compensation & Benefits', 'Confidential/PII Risk', 'Internal Communications'

In [36]:
# Helper: View LLM trace for a specific result
import json

result_id = 1  # Change this to inspect different results

with get_session() as session:
    result = session.get(AnalysisResult, result_id)
    
    if result and result.llm_trace:
        print(f"LLM Trace for result {result_id}:")
        print(json.dumps(result.llm_trace, indent=2, default=str))
    else:
        print(f"No trace found for result {result_id}")


LLM Trace for result 1:
{
  "started_at": "2026-01-09T17:38:50.802751",
  "model": "openai/gpt-5.2",
  "mode": "simple",
  "input": {
    "input_text": "Internal document contains sensitive personal information about Elena Volkova including full address, phone number, and date of birth",
    "context": "High-risk data exposure scenario"
  },
  "completed_at": "2026-01-09T17:38:56.324234",
  "raw_response_preview": "{\n  \"score\": 92,\n  \"categories\": [\n    \"Personally Identifiable Information (PII)\",\n    \"Sensitive Data Exposure\",\n    \"Privacy/Compliance Risk\",\n    \"Internal Document Handling\",\n    \"Data Loss Prevention (DLP)\"\n  ],\n  \"summary\": \"The input indicates an internal document contains sensitive personal information about an identifiable individual (Elena Volkova), specifically full address, phone number, and date of birth. These elements constitute high-risk PII because they can enable identity thef"
}
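
Two quick checks are often useful on a trace like this one: how long the LLM call took, and what score the model returned. The sketch below assumes the trace is a dict with the `started_at`, `completed_at`, and `raw_response_preview` keys seen in the output; `trace_latency_seconds` and `preview_score` are hypothetical helper names. Because the preview is cut off mid-string, it is not valid JSON, so the score is pulled out with a regular expression instead of `json.loads`.

```python
import re
from datetime import datetime
from typing import Optional


def trace_latency_seconds(trace: dict) -> Optional[float]:
    """Return completed_at - started_at in seconds, or None if either is missing."""
    started, completed = trace.get("started_at"), trace.get("completed_at")
    if not started or not completed:
        return None
    delta = datetime.fromisoformat(str(completed)) - datetime.fromisoformat(str(started))
    return delta.total_seconds()


def preview_score(trace: dict) -> Optional[int]:
    """Extract the score from a possibly truncated JSON preview using a regex."""
    preview = trace.get("raw_response_preview") or ""
    match = re.search(r'"score"\s*:\s*(\d+)', preview)
    return int(match.group(1)) if match else None


# Timestamps copied from the trace output; the preview is shortened for brevity
sample_trace = {
    "started_at": "2026-01-09T17:38:50.802751",
    "completed_at": "2026-01-09T17:38:56.324234",
    "raw_response_preview": '{\n  "score": 92,\n  "categories": [\n    "Personally Identif',
}

print(f"Latency: {trace_latency_seconds(sample_trace):.2f}s")
print(f"Score in preview: {preview_score(sample_trace)}")
```

In the notebook, the same helpers could be called on `result.llm_trace` inside the session block.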


In [38]:
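# Helper: Clear all test data (use carefully!)
# Destructive: set CONFIRM_CLEAR = True to run

from sqlmodel import text

CONFIRM_CLEAR = False  # Safety switch; flip to True to truncate both tables

if CONFIRM_CLEAR:
    with get_session() as session:
        session.exec(text("TRUNCATE analysis_results RESTART IDENTITY CASCADE"))
        session.exec(text("TRUNCATE requests RESTART IDENTITY CASCADE"))
        print("✅ All test data cleared")
else:
    print("Skipped: set CONFIRM_CLEAR = True to clear test data")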



---

## 📝 Quick Reference

### Before Starting:
1. `docker compose up -d db` - Start database with pgvector (service name is **db**)
2. Check LLM provider (Ollama running OR API key set)
3. Run Setup cells (1.1 - 1.3)
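
The first two steps above can be checked from a cell before running anything else. This is a rough sketch, not part of the project: `db_port_open` and `llm_config_problem` are hypothetical helpers, and the `<PROVIDER>_API_KEY` naming for hosted providers is an assumption that may not match how the app actually reads its keys. The TCP check only shows that something is listening on the database port; it does not confirm that Postgres or pgvector is ready.

```python
import os
import socket
from typing import Mapping


def db_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def llm_config_problem(env: Mapping[str, str]) -> str:
    """Return an empty string if the LLM settings look usable, else a short hint."""
    provider = env.get("LLM_PROVIDER", "").lower()
    if not provider:
        return "LLM_PROVIDER is not set"
    if provider == "ollama":
        return "" if env.get("OLLAMA_BASE_URL") else "OLLAMA_BASE_URL is not set"
    # Assumption: hosted providers read a key named <PROVIDER>_API_KEY
    key_name = f"{provider.upper()}_API_KEY"
    return "" if env.get(key_name) else f"{key_name} is not set"


host = os.environ.get("DATABASE_HOST", "localhost")
port = int(os.environ.get("DATABASE_PORT", "5432"))
print(f"Database reachable at {host}:{port}: {db_port_open(host, port)}")
print(f"LLM config: {llm_config_problem(os.environ) or 'ok'}")
```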

### Quick Validation:
| Phase | Test Section |
|-------|-------------|
| Phase 1 (Models) | Section 2 |
| Phase 2 (Tools) | Section 3 |
| Phase 3 (Prompts) | Sections 4-5 |
| Phase 4 (Validation) | Section 7 |
| Phase 5 (Processor) | Section 5 |
| Phase 6 (UI) | Streamlit browser |
| RAG (Vector Search) | Section 9 |

### Hotkeys:
- `Shift+Enter` - Run cell and move to next
- `Ctrl+Enter` - Run cell and stay
- `Esc + A` - Insert cell above
- `Esc + B` - Insert cell below
