# 🔗 Workflow Integration - Complete SAR Processing System

Welcome to Phase 4 of the Financial Services Agentic AI Project!

In this notebook, you'll integrate both AI agents into an **end-to-end Suspicious Activity Report (SAR) processing workflow** that demonstrates financial compliance automation.

## 🎯 Learning Objectives
- Build a complete two-stage AI workflow with human oversight
- Implement human-in-the-loop decision gates for compliance
- Generate complete SAR documents from AI analysis
- Create comprehensive audit trails for regulatory examination
- Demonstrate cost optimization by running the Compliance Officer agent only on human-approved cases

## 📋 Business Context
This workflow simulates how banks actually process suspicious activity reports:
1. **Risk Screening**: AI agents analyze transaction patterns for suspicious activity
2. **Human Review**: Compliance officers review AI findings before proceeding
3. **Narrative Generation**: Only approved cases get full compliance documentation
4. **SAR Filing**: Complete regulatory forms are generated for submission
5. **Audit Documentation**: Every decision is logged for regulatory examination

## 🏗️ System Architecture

```
📊 CSV Data → 🔍 Risk Analyst → 👤 Human Decision → ✅ Compliance Officer → 📄 SAR Document
              (Chain-of-Thought)    (Gate)         (ReAct Framework)     (FinCEN Ready)
```

## 🚀 Prerequisites Check

Before starting, ensure you have completed:
- ✅ Phase 1: Foundation components (`foundation_sar.py`)
- ✅ Phase 2: Risk Analyst Agent (`risk_analyst_agent.py`)
- ✅ Phase 3: Compliance Officer Agent (`compliance_officer_agent.py`)
- ✅ Both agents pass their comprehensive test scenarios

If any component is missing, return to previous notebooks to complete implementation.

In [None]:
# Setup and Environment Configuration
import os
import sys
import json
import pandas as pd
import uuid
import hashlib
from datetime import datetime, timedelta
from dotenv import load_dotenv

# Add src directory to Python path for module imports
sys.path.append(os.path.abspath('../src'))

# Load environment variables
load_dotenv('../.env')

print("📚 Libraries imported successfully!")
print("🔐 Environment variables loaded")
print("📂 Source directory added to Python path")

In [None]:
# OpenAI Setup for Vocareum
import openai

# Initialize OpenAI client for Vocareum
openai_api_key = os.getenv('OPENAI_API_KEY')
client = None  # Stays None when no API key is configured, so later cells can check for it

if not openai_api_key:
    print("⚠️ WARNING: No OpenAI API key found!")
    print("Please set OPENAI_API_KEY in your .env file")
    print("Get your Vocareum OpenAI API key from 'Cloud Resources' in your workspace")
else:
    # Vocareum requires routing through their servers
    client = openai.OpenAI(
        base_url="https://openai.vocareum.com/v1",
        api_key=openai_api_key
    )
    print("✅ OpenAI client initialized with Vocareum routing")
    # Show only the last four characters so the key is not exposed in saved notebook output
    print(f"🔑 API key: ...{openai_api_key[-4:]}")
    print("📍 Base URL: https://openai.vocareum.com/v1")

In [None]:
# TODO: Import Your Implemented Components
# Students: Import your foundation components and agents

print("📋 TODO: Import your implemented components")
print("Uncomment and modify these imports once you've implemented all components:")

# from foundation_sar import (
#     CustomerData,
#     AccountData,
#     TransactionData,
#     CaseData,
#     RiskAnalystOutput,
#     ComplianceOfficerOutput,
#     ExplainabilityLogger,
#     DataLoader,
#     load_csv_data
# )
# from risk_analyst_agent import RiskAnalystAgent
# from compliance_officer_agent import ComplianceOfficerAgent

# TODO: Create agent instances
# explainability_logger = ExplainabilityLogger("../outputs/audit_logs/workflow_integration.jsonl")
# risk_agent = RiskAnalystAgent(client, explainability_logger)
# compliance_agent = ComplianceOfficerAgent(client, explainability_logger)

print("✅ Ready to import components after implementation")

## 📊 Step 1: Data Loading and Preprocessing

Load the financial data and prepare it for analysis.
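
The missing-value handling this step calls for can be tried out without the project's data files. The sketch below parses a small in-memory CSV with the standard library's `csv` module instead of pandas. The column names, sample rows, and the `load_transactions` helper are assumptions made for illustration and may not match the actual schema of `transactions.csv`.

```python
import csv
import io

# Hypothetical sample data; the real transactions.csv may use different columns.
SAMPLE_CSV = """transaction_id,account_id,amount,counterparty,location
T1,A1,9500.00,ACME LLC,New York
T2,A1,-120.50,,
T3,A2,4800.00,Cash Deposit,
"""

# Fields that are allowed to be empty (assumed for this sketch).
OPTIONAL_FIELDS = ("counterparty", "location")


def load_transactions(csv_text):
    """Parse CSV text into a list of dicts with float amounts and no None values."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["amount"] = float(row["amount"])
        for field in OPTIONAL_FIELDS:
            # Missing optional values become empty strings, similar to fillna('').
            row[field] = (row.get(field) or "").strip()
        records.append(row)
    return records


demo_transactions = load_transactions(SAMPLE_CSV)
print(len(demo_transactions), repr(demo_transactions[1]["counterparty"]))
```

Normalizing optional fields to empty strings up front means later steps never have to special-case `None` or `NaN` when they build prompts or records from these values.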

In [None]:
# TODO: Load and Preprocess Financial Data
# Students: Load customer, account, and transaction data

def load_and_preprocess_data():
    """
    TODO: Load CSV data and prepare for analysis
    
    This function should:
    1. Load customers.csv, accounts.csv, transactions.csv
    2. Handle missing values appropriately
    3. Create data dictionaries for processing
    4. Return cleaned datasets
    """
    print("📊 Loading Financial Data")
    print("📋 TODO: Load CSV files from ../data/ directory")
    print("📋 TODO: Handle NaN values in optional fields")
    print("📋 TODO: Convert to dictionaries for processing")
    
    # Example structure (uncomment and modify):
    # customers_df = pd.read_csv("../data/customers.csv", dtype={'ssn_last_4': str})
    # accounts_df = pd.read_csv("../data/accounts.csv")
    # transactions_df = pd.read_csv("../data/transactions.csv")
    # 
    # # Handle NaN values
    # transactions_df['counterparty'] = transactions_df['counterparty'].fillna('')
    # transactions_df['location'] = transactions_df['location'].fillna('')
    # customers_df['phone'] = customers_df['phone'].fillna('')
    # 
    # # Convert to dictionaries
    # customers_data = customers_df.to_dict('records')
    # accounts_data = accounts_df.to_dict('records')
    # transactions_data = transactions_df.to_dict('records')
    # 
    # print(f"📈 Loaded: {len(customers_data)} customers, {len(accounts_data)} accounts, {len(transactions_data)} transactions")
    # return customers_data, accounts_data, transactions_data
    
    return None, None, None

# Load data
customers_data, accounts_data, transactions_data = load_and_preprocess_data()

## 🎯 Step 2: Customer Risk Screening

Implement intelligent customer screening to identify high-risk cases for detailed analysis.
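
One way to express the screening criteria is a small scoring function that returns the risk flags for a single customer. The sketch below runs on made-up in-memory records. The `score_customer` helper, the flag names, and the sample values are illustrative assumptions, while the thresholds follow the criteria listed in the screening cell of this notebook.

```python
def score_customer(customer, accounts, transactions):
    """Return (risk_flags, total_amount) for one customer."""
    # Collect this customer's account IDs once so transaction matching is a set lookup.
    account_ids = {
        acc["account_id"]
        for acc in accounts
        if acc["customer_id"] == customer["customer_id"]
    }
    own_txns = [t for t in transactions if t["account_id"] in account_ids]
    total_amount = sum(abs(t["amount"]) for t in own_txns)

    flags = []
    if customer["risk_rating"] in ("Medium", "High"):
        flags.append("elevated_risk_rating")
    if total_amount > 100_000:
        flags.append("large_amounts")
    if len(own_txns) > 50:
        flags.append("high_frequency")
    return flags, total_amount


# Hypothetical records for illustration only.
demo_customers = [
    {"customer_id": "C1", "risk_rating": "High"},
    {"customer_id": "C2", "risk_rating": "Low"},
]
demo_accounts = [
    {"account_id": "A1", "customer_id": "C1"},
    {"account_id": "A2", "customer_id": "C2"},
]
demo_txns = [
    {"account_id": "A1", "amount": -75_000.0},
    {"account_id": "A1", "amount": 60_000.0},
    {"account_id": "A2", "amount": 500.0},
]

for demo_customer in demo_customers:
    print(demo_customer["customer_id"], score_customer(demo_customer, demo_accounts, demo_txns))
```

Summing absolute amounts counts both debits and credits toward the total, which is why the first customer crosses the $100K threshold even though one transaction is negative.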

In [None]:
# TODO: Implement Customer Risk Screening
# Students: Create risk-based customer screening logic

def screen_high_risk_customers(customers_data, accounts_data, transactions_data, top_n=5):
    """
    TODO: Implement risk-based customer screening
    
    Screening criteria should include:
    1. Elevated risk ratings (Medium or High)
    2. Large transaction amounts (>$100K total)
    3. High transaction frequency (>50 transactions)
    4. Recent activity patterns
    
    Returns top N highest-risk customers for detailed analysis
    """
    print("🔍 Customer Risk Screening")
    print("📋 TODO: Implement risk-based screening criteria")
    print("📋 TODO: Calculate risk scores for each customer")
    print("📋 TODO: Select top N customers for SAR analysis")
    
    # Example screening logic (uncomment and modify):
    # selected_customers = []
    # 
    # for customer in customers_data:
    #     # Get customer accounts and transactions
    #     customer_accounts = [acc for acc in accounts_data if acc['customer_id'] == customer['customer_id']]
    #     account_ids = {acc['account_id'] for acc in customer_accounts}
    #     customer_transactions = [txn for txn in transactions_data if txn['account_id'] in account_ids]
    #     
    #     # Calculate risk indicators
    #     total_amount = sum(abs(txn['amount']) for txn in customer_transactions)
    #     transaction_count = len(customer_transactions)
    #     risk_rating = customer['risk_rating']
    #     
    #     # Apply screening criteria
    #     risk_flags = []
    #     if risk_rating in ['Medium', 'High']:
    #         risk_flags.append('high_risk_rating')
    #     if total_amount > 100000:
    #         risk_flags.append('large_amounts')
    #     if transaction_count > 50:
    #         risk_flags.append('high_frequency')
    #     
    #     # Select high-risk customers
    #     if len(risk_flags) >= 2:  # Multiple risk flags
    #         selected_customers.append({
    #             'customer': customer,
    #             'accounts': customer_accounts,
    #             'transactions': customer_transactions,
    #             'total_amount': total_amount,
    #             'transaction_count': transaction_count,
    #             'risk_flags': risk_flags
    #         })
    # 
    # # Sort by number of risk flags, then by total amount, and take top N
    # selected_customers.sort(key=lambda x: (len(x['risk_flags']), x['total_amount']), reverse=True)
    # return selected_customers[:top_n]
    
    print("📊 Selected 0 customers for analysis (implement screening logic)")
    return []

# Run customer screening
selected_customers = screen_high_risk_customers(customers_data, accounts_data, transactions_data)

## 🤖 Step 3: Two-Stage AI Analysis with Human Gates

Implement the core two-stage workflow:
1. **Stage 1**: Risk Analyst performs Chain-of-Thought analysis
2. **Human Gate**: Review and decision to proceed
3. **Stage 2**: Compliance Officer generates ReAct narratives (only if approved)
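
The gating logic in the three stages above can be sketched independently of the real agents. In the example below, the prompt function is passed in as a parameter so that scripted answers can replace keyboard input during testing. The names `human_gate`, `run_case`, `fake_analyze`, and `fake_narrate` are hypothetical stand-ins and not part of the project's modules.

```python
def human_gate(summary, ask=input):
    """Ask a reviewer to approve or reject, re-prompting until the answer is clear."""
    print(summary)
    while True:
        answer = ask("Proceed with SAR filing? (yes/no): ").strip().lower()
        if answer in ("yes", "y"):
            return True
        if answer in ("no", "n"):
            return False
        print("Please answer 'yes' or 'no'.")


def run_case(case_id, analyze, narrate, ask=input):
    """Stage 1 always runs; Stage 2 runs only when the reviewer approves."""
    analysis = analyze(case_id)
    approved = human_gate(f"Case {case_id}: {analysis}", ask=ask)
    narrative = narrate(case_id, analysis) if approved else None
    return {"case_id": case_id, "approved": approved, "narrative": narrative}


# Scripted reviewer answers stand in for keyboard input in this demo.
scripted_answers = iter(["maybe", "yes", "no"])
stage2_calls = []


def scripted_ask(prompt):
    return next(scripted_answers)


def fake_analyze(case_id):
    # Placeholder for the Risk Analyst agent.
    return "structuring suspected"


def fake_narrate(case_id, analysis):
    # Placeholder for the Compliance Officer agent; records that it was called.
    stage2_calls.append(case_id)
    return f"Narrative for {case_id}"


demo_results = [
    run_case(cid, fake_analyze, fake_narrate, ask=scripted_ask)
    for cid in ("CASE-1", "CASE-2")
]
print(demo_results)
print("Stage 2 ran for:", stage2_calls)
```

Re-prompting on unclear answers avoids treating a typo as a rejection, and the `stage2_calls` list shows that the second stage is skipped entirely for the rejected case.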

In [None]:
# TODO: Implement Two-Stage AI Workflow
# Students: Build the complete workflow with human decision gates

def run_two_stage_sar_workflow(selected_customers):
    """
    TODO: Implement complete two-stage SAR processing workflow
    
    For each customer:
    1. Create CaseData object
    2. Run Risk Analyst analysis (Chain-of-Thought)
    3. Present findings to human reviewer
    4. Get human decision (proceed/reject)
    5. If approved: Run Compliance Officer (ReAct)
    6. Generate complete SAR document
    7. Log all decisions for audit
    """
    print("🤖 Two-Stage SAR Processing Workflow")
    print("📋 TODO: Implement complete workflow logic")
    
    # Initialize tracking
    processed_cases = []
    approved_sars = []
    rejected_cases = []
    audit_decisions = []
    
    print("📋 TODO: For each selected customer:")
    print("   1. Create CaseData from customer, accounts, transactions")
    print("   2. Run Risk Analyst analysis")
    print("   3. Display analysis results to human reviewer")
    print("   4. Get human decision (input('Proceed with SAR filing? (yes/no): '))")
    print("   5. If 'yes': Run Compliance Officer narrative generation")
    print("   6. Create complete SAR document with all metadata")
    print("   7. Save SAR to ../outputs/filed_sars/ directory")
    print("   8. Log decision to audit trail")
    
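    # Example workflow structure (uncomment and implement).
    # Note: create_sar_document and save_sar_document are defined in Step 4,
    # so run that cell before calling this function.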
    # for i, customer_data in enumerate(selected_customers, 1):
    #     print(f"\n🔍 CUSTOMER {i}/{len(selected_customers)}: {customer_data['customer']['name']}")
    #     print("=" * 60)
    #     
    #     try:
    #         # Create case data
    #         loader = DataLoader(explainability_logger)
    #         case_data = loader.create_case_from_data(
    #             customer_data['customer'],
    #             customer_data['accounts'], 
    #             customer_data['transactions']
    #         )
    #         
    #         # STAGE 1: Risk Analysis
    #         print("🔍 STAGE 1: Risk Analysis")
    #         risk_analysis = risk_agent.analyze_case(case_data)
    #         
    #         # Display analysis results
    #         print(f"Classification: {risk_analysis.classification}")
    #         print(f"Confidence: {risk_analysis.confidence_score:.2f}")
    #         print(f"Risk Level: {risk_analysis.risk_level}")
    #         print(f"Reasoning: {risk_analysis.reasoning}")
    #         
    #         # HUMAN DECISION GATE
    #         decision = input("🤔 Proceed with SAR filing? (yes/no): ").strip().lower()
    #         should_proceed = decision in ['yes', 'y']
    #         
    #         if should_proceed:
    #             # STAGE 2: Compliance Narrative
    #             print("📝 STAGE 2: Compliance Narrative Generation")
    #             compliance_review = compliance_agent.generate_compliance_narrative(case_data, risk_analysis)
    #             
    #             # Generate complete SAR document
    #             sar_document = create_sar_document(case_data, risk_analysis, compliance_review)
    #             
    #             # Save SAR
    #             save_sar_document(sar_document)
    #             approved_sars.append(sar_document)
    #             
    #             print(f"✅ SAR FILED: {sar_document['sar_metadata']['sar_id']}")
    #         else:
    #             rejected_cases.append({'case_id': case_data.case_id, 'reason': 'human_rejection'})
    #             print("❌ SAR REJECTED by human reviewer")
    #         
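    #         # Record the case as processed so the Step 5 metrics can count it
    #         processed_cases.append(case_data.case_id)
    #         
    #         # Log decision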
    #         audit_decisions.append({
    #             'case_id': case_data.case_id,
    #             'customer_name': case_data.customer.name,
    #             'decision': 'PROCEED' if should_proceed else 'REJECT',
    #             'ai_classification': risk_analysis.classification,
    #             'ai_confidence': risk_analysis.confidence_score,
    #             'reviewer_decision': decision
    #         })
    #         
    #     except Exception as e:
    #         print(f"❌ Error processing customer: {e}")
    
    return processed_cases, approved_sars, rejected_cases, audit_decisions

# Run the complete workflow
processed_cases, approved_sars, rejected_cases, audit_decisions = run_two_stage_sar_workflow(selected_customers)

## 📄 Step 4: SAR Document Generation

Create complete, FinCEN-ready SAR documents with all required metadata.
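
SAR metadata can carry a checksum so that later changes to a saved document are detectable. The sketch below shows one possible approach: hash a canonical JSON serialization of the document with SHA-256. The helper names `add_checksum` and `verify_checksum` and the demo document are assumptions for illustration. A plain hash only detects accidental or unrecorded edits and is not a digital signature.

```python
import hashlib
import json


def add_checksum(sar_document):
    """Return a copy of the document with a SHA-256 checksum in its metadata."""
    # Exclude any existing checksum so the hash covers only the document content.
    metadata = {
        key: value
        for key, value in sar_document.get("sar_metadata", {}).items()
        if key != "checksum"
    }
    unsigned = {**sar_document, "sar_metadata": metadata}
    # Sorted keys and fixed separators give the same bytes for the same content.
    canonical = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**unsigned, "sar_metadata": {**metadata, "checksum": digest}}


def verify_checksum(sar_document):
    """Recompute the checksum and compare it with the stored value."""
    stored = sar_document.get("sar_metadata", {}).get("checksum")
    if stored is None:
        return False
    return add_checksum(sar_document)["sar_metadata"]["checksum"] == stored


# Hypothetical document for illustration only.
demo_sar = {
    "sar_metadata": {"sar_id": "SAR_demo"},
    "suspicious_activity": {"narrative": "Example narrative."},
}
stamped_sar = add_checksum(demo_sar)
tampered_sar = {**stamped_sar, "suspicious_activity": {"narrative": "Edited."}}
print(verify_checksum(stamped_sar), verify_checksum(tampered_sar))
```

Because `add_checksum` returns a new dictionary rather than mutating its input, the original document can still be compared against the stamped copy.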

In [None]:
# TODO: Implement SAR Document Generation
# Students: Create complete SAR documents for regulatory submission

def create_sar_document(case_data, risk_analysis, compliance_review):
    """
    TODO: Create complete SAR document
    
    SAR document should include:
    1. SAR metadata (ID, filing date, type, checksum)
    2. Subject information (customer details)
    3. Suspicious activity description
    4. AI analysis results
    5. Compliance narrative
    6. Regulatory citations
    7. Filing institution information
    """
    print("📄 Creating SAR Document")
    print("📋 TODO: Generate unique SAR ID")
    print("📋 TODO: Include all required metadata")
    print("📋 TODO: Format for FinCEN submission")
    
    # Example SAR document structure (uncomment and implement):
    # sar_id = f"SAR_{uuid.uuid4()}"
    # filing_date = datetime.now().isoformat()
    # 
    # sar_document = {
    #     'sar_metadata': {
    #         'sar_id': sar_id,
    #         'filing_date': filing_date,
    #         'filing_type': 'Suspicious Activity Report',
    #         'ai_generated': True,
    #         'review_status': 'human_approved'
    #     },
    #     'subject_information': {
    #         'customer_name': case_data.customer.name,
    #         'customer_id': case_data.customer.customer_id,
    #         'address': case_data.customer.address,
    #         'customer_since': case_data.customer.customer_since,
    #         'risk_rating': case_data.customer.risk_rating
    #     },
    #     'suspicious_activity': {
    #         'classification': risk_analysis.classification,
    #         'risk_level': risk_analysis.risk_level,
    #         'confidence_score': risk_analysis.confidence_score,
    #         'narrative': compliance_review.narrative,
    #         'key_indicators': risk_analysis.key_indicators,
    #         'ai_reasoning': risk_analysis.reasoning
    #     },
    #     'regulatory_compliance': {
    #         'citations': getattr(compliance_review, 'regulatory_citations', []),
    #         'narrative_word_count': len(compliance_review.narrative.split()),
    #         'compliance_status': 'approved'
    #     },
    #     'audit_trail': {
    #         'case_id': case_data.case_id,
    #         'processing_date': filing_date,
    #         'ai_agents_used': ['RiskAnalyst', 'ComplianceOfficer'],
    #         'human_reviewer': 'compliance_officer'
    #     }
    # }
    # 
    # return sar_document
    
    return {}

def save_sar_document(sar_document):
    """TODO: Save SAR document to outputs directory"""
    print("📋 TODO: Save SAR to ../outputs/filed_sars/ directory")
    # os.makedirs("../outputs/filed_sars", exist_ok=True)
    # filename = f"../outputs/filed_sars/{sar_document['sar_metadata']['sar_id']}.json"
    # with open(filename, 'w') as f:
    #     json.dump(sar_document, f, indent=2)

print("📄 SAR document generation functions defined")

## 📊 Step 5: Workflow Metrics and Analysis

Analyze the efficiency and effectiveness of your AI-powered SAR processing system.
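
The cost argument for two-stage processing can be made concrete with a small calculation: Stage 1 runs on every case, while Stage 2 runs only on approved ones. The sketch below assumes audit entries with a `decision` field set to `PROCEED` or `REJECT`. The `summarize_workflow` helper and the relative cost units are hypothetical placeholders and not measured API prices.

```python
def summarize_workflow(audit_decisions, stage1_cost=1.0, stage2_cost=3.0):
    """Compute approval rate and estimated savings versus running both stages on every case."""
    total = len(audit_decisions)
    approved = sum(1 for d in audit_decisions if d["decision"] == "PROCEED")
    rejected = total - approved

    # Two-stage: every case pays for Stage 1, only approved cases pay for Stage 2.
    two_stage_cost = total * stage1_cost + approved * stage2_cost
    # Single-stage baseline: every case pays for both stages.
    single_stage_cost = total * (stage1_cost + stage2_cost)

    return {
        "total": total,
        "approved": approved,
        "rejected": rejected,
        "approval_rate": approved / total if total else 0.0,
        "two_stage_cost": two_stage_cost,
        "single_stage_cost": single_stage_cost,
        "savings": 1 - two_stage_cost / single_stage_cost if single_stage_cost else 0.0,
    }


# Hypothetical decisions for illustration only.
demo_decisions = [
    {"case_id": "CASE-1", "decision": "PROCEED"},
    {"case_id": "CASE-2", "decision": "REJECT"},
    {"case_id": "CASE-3", "decision": "REJECT"},
    {"case_id": "CASE-4", "decision": "REJECT"},
]
demo_summary = summarize_workflow(demo_decisions)
print(f"Approval rate: {demo_summary['approval_rate']:.1%}")
print(f"Estimated savings: {demo_summary['savings']:.1%}")
```

With these placeholder costs, one approval out of four cases costs 7 units instead of 16, so the estimated savings depend on both the rejection rate and how much more expensive Stage 2 is than Stage 1.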

In [None]:
# TODO: Implement Workflow Analysis and Metrics
# Students: Calculate efficiency metrics and cost analysis

def analyze_workflow_efficiency(processed_cases, approved_sars, rejected_cases, audit_decisions):
    """
    TODO: Calculate workflow efficiency metrics
    
    Metrics to calculate:
    1. Processing efficiency (time per case)
    2. Cost optimization (two-stage vs single-stage)
    3. Human decision patterns
    4. AI accuracy validation
    5. Regulatory compliance rates
    """
    print("📊 Workflow Efficiency Analysis")
    print("📋 TODO: Calculate processing metrics")
    
    # Example metrics calculation (uncomment and implement):
    # total_cases = len(processed_cases)
    # approved_cases = len(approved_sars)
    # rejected_cases_count = len(rejected_cases)
    # 
    # if total_cases > 0:
    #     approval_rate = approved_cases / total_cases
    #     rejection_rate = rejected_cases_count / total_cases
    # else:
    #     approval_rate = rejection_rate = 0
    # 
    # print("📈 WORKFLOW METRICS:")
    # print(f"   Total Cases Processed: {total_cases}")
    # print(f"   SARs Filed: {approved_cases}")
    # print(f"   Cases Rejected: {rejected_cases_count}")
    # print(f"   Approval Rate: {approval_rate:.1%}")
    # print(f"   Rejection Rate: {rejection_rate:.1%}")
    # 
    # # Cost optimization analysis
    # print("\n💰 COST OPTIMIZATION:")
    # print("   Two-stage processing saves costs by only running")
    # print("   expensive compliance generation on approved cases")
    # print(f"   Cost savings: {rejection_rate:.1%} of compliance calls avoided")
    
    print("💡 Implement metrics calculation after running workflow")

def validate_ai_decisions(audit_decisions):
    """TODO: Analyze AI decision patterns and accuracy"""
    print("📋 TODO: Validate AI classification accuracy")
    print("📋 TODO: Analyze confidence score distributions") 
    print("📋 TODO: Review human override patterns")

# Run analysis
analyze_workflow_efficiency(processed_cases, approved_sars, rejected_cases, audit_decisions)
validate_ai_decisions(audit_decisions)

## 🏁 Step 6: Complete System Demonstration

Test your complete system with comprehensive scenarios to validate production readiness.
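
One check a demonstration run might include is confirming that every processed case has an entry in the audit trail. The sketch below writes entries to a temporary JSON Lines file and reports any case IDs that are missing. The entry fields and the helpers `append_audit_entry` and `find_unlogged_cases` are assumptions for illustration; the format written by your `ExplainabilityLogger` may differ.

```python
import json
import os
import tempfile


def append_audit_entry(path, entry):
    """Append one audit record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def find_unlogged_cases(path, case_ids):
    """Return the case IDs that have no entry in the audit log."""
    logged = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                logged.add(json.loads(line)["case_id"])
    return sorted(set(case_ids) - logged)


# A temporary directory keeps the demo from touching the real outputs folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_log_path = os.path.join(tmp_dir, "demo_audit.jsonl")
    append_audit_entry(demo_log_path, {"case_id": "CASE-1", "decision": "PROCEED"})
    append_audit_entry(demo_log_path, {"case_id": "CASE-2", "decision": "REJECT"})
    missing_cases = find_unlogged_cases(demo_log_path, ["CASE-1", "CASE-2", "CASE-3"])

print("Cases without an audit entry:", missing_cases)
```

An empty result from a check like this is a quick signal that rejected cases and error paths were logged as well as approved ones.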

In [None]:
# TODO: Run Complete System Test
# Students: Demonstrate your complete SAR processing system

def demonstrate_complete_system():
    """
    TODO: Run complete system demonstration
    
    This should:
    1. Process multiple customers through the complete workflow
    2. Show both approved and rejected cases
    3. Generate multiple SAR documents
    4. Demonstrate audit trail creation
    5. Show efficiency metrics
    """
    print("🏁 Complete SAR Processing System Demonstration")
    print("📋 TODO: Run complete workflow with multiple customers")
    print("📋 TODO: Show both approval and rejection scenarios")
    print("📋 TODO: Generate audit reports")
    print("📋 TODO: Calculate final efficiency metrics")
    
    # Example demonstration (uncomment after implementation):
    # print("🚀 Running complete system test...")
    # 
    # # Load fresh data
    # customers_data, accounts_data, transactions_data = load_and_preprocess_data()
    # 
    # # Screen customers
    # selected_customers = screen_high_risk_customers(customers_data, accounts_data, transactions_data, top_n=3)
    # 
    # # Run workflow
    # processed_cases, approved_sars, rejected_cases, audit_decisions = run_two_stage_sar_workflow(selected_customers)
    # 
    # # Generate final report
    # analyze_workflow_efficiency(processed_cases, approved_sars, rejected_cases, audit_decisions)
    # 
    # print("🎉 System demonstration complete!")
    # print(f"📄 SAR documents saved to: ../outputs/filed_sars/")
    # print(f"📊 Audit logs saved to: ../outputs/audit_logs/")

demonstrate_complete_system()

## 📝 Implementation Checklist

### ✅ Workflow Integration Deliverables
- [ ] **Data Loading**: Load and preprocess CSV data with proper error handling
- [ ] **Customer Screening**: Implement risk-based screening to identify high-risk cases
- [ ] **Two-Stage Workflow**: Build complete Risk Analyst → Human Gate → Compliance Officer flow
- [ ] **Human Decision Gates**: Implement interactive approval/rejection points
- [ ] **SAR Document Generation**: Create complete FinCEN-ready documents with metadata
- [ ] **Audit Trail Creation**: Log all decisions and reasoning for regulatory examination
- [ ] **Efficiency Metrics**: Calculate cost optimization and processing efficiency
- [ ] **System Demonstration**: Test complete workflow with multiple scenarios

### ✅ Technical Requirements
- [ ] **Error Handling**: Robust exception handling for all workflow steps
- [ ] **Data Validation**: Proper validation of all inputs and outputs
- [ ] **File Management**: Organize outputs in appropriate directories
- [ ] **Logging**: Comprehensive audit logging for compliance
- [ ] **Performance**: Efficient processing of multiple cases
- [ ] **User Experience**: Clear prompts and feedback for human reviewers

### ✅ Business Requirements
- [ ] **Regulatory Compliance**: Ensure all SAR documents meet FinCEN requirements
- [ ] **Cost Optimization**: Demonstrate savings from two-stage processing
- [ ] **Audit Readiness**: Create examination-ready documentation
- [ ] **Quality Assurance**: Validate AI decisions with human oversight
- [ ] **Scalability**: Design for processing larger datasets

## 🎯 Success Criteria

By completion, your integrated system should:
- ✅ Process real financial data with proper validation
- ✅ Execute complete two-stage AI workflow with human gates
- ✅ Generate regulatory-compliant SAR documents
- ✅ Create comprehensive audit trails for all decisions
- ✅ Demonstrate measurable cost optimization benefits
- ✅ Handle errors gracefully and provide clear user feedback

## 🚀 Next Steps

1. **Complete Implementation**: Fill in all TODO sections with working code
2. **Test Thoroughly**: Run complete workflow with various scenarios
3. **Validate Outputs**: Ensure SAR documents meet regulatory requirements
4. **Document Results**: Create final project documentation and metrics
5. **Prepare Presentation**: Demonstrate your system's capabilities and business value

**Congratulations on building a complete AI-powered SAR processing system! 🎉**