# Lesson 3: Financial Risk Assessment Prompt Optimization (SOLUTION)

## Systematic Refinement for Financial AI Accuracy

This notebook is the complete solution for the exercise on systematically refining and optimizing prompts for financial risk assessment.

## Outline:

- **Baseline Risk Assessment**: Initial prompt for customer risk evaluation
- **Systematic Refinement Framework**: Automated testing and measurement systems
- **Optimization Techniques**: Best practices for instruction clarity and compliance
- **Advanced Refinement**: Context optimization and parameter tuning
- **Performance Evaluation**: Quantitative comparison of prompt variations
- **Production Validation**: Testing with complex real-world scenarios

In [1]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display
import json
import time
import statistics
from typing import List, Dict, Tuple

# Load environment variables from the root .env file
load_dotenv('../../.env')

True

In [2]:
# Setup OpenAI client
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY")  # Load from .env file
)

def get_completion(system_prompt, user_prompt, model="gpt-4o-mini", temperature=0.3):
    """
    Function to get a completion from the OpenAI API.
    """
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            temperature=temperature,
        )
        return response.choices[0].message.content
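    except Exception as e:
        # The error is returned as text rather than raised, so a failed call
        # is scored downstream like any other model response.
        return f"An error occurred: {e}"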

In [3]:
# Test scenarios for prompt optimization
test_scenarios = {
    "low_risk": {
        "scenario": """
Customer Profile: Local Retail Store
- Business Type: Family-owned grocery store
- Account Age: 8 years
- Monthly Revenue: $45,000
- Transaction Pattern: Daily deposits $1,200-2,000, consistent timing
- Geographic Scope: Single location, local suppliers
- Documentation: Complete business licenses, tax returns current
- Owner Background: Clean credit history, local community member
- Banking History: No previous compliance issues
""",
        "expected_risk": "Low"
    },
    "medium_risk": {
        "scenario": """
Customer Profile: Import/Export Business
- Business Type: International trade company
- Account Age: 2.5 years
- Monthly Revenue: $800,000
- Transaction Pattern: Large wire transfers ($50K-200K), irregular timing
- Geographic Scope: Asia-Pacific region, multiple countries
- Documentation: Some supplier documentation missing
- Owner Background: Limited credit history, recent immigrant
- Banking History: Minor documentation delays, no violations
""",
        "expected_risk": "Medium"
    },
    "high_risk": {
        "scenario": """
Customer Profile: Consulting Services LLC
- Business Type: Financial consulting
- Account Age: 6 months
- Monthly Revenue: $2.3M
- Transaction Pattern: Immediate large transfers after deposits
- Geographic Scope: Offshore jurisdictions (Cayman, Malta)
- Documentation: Minimal business documentation
- Owner Background: Multiple previous business dissolutions
- Banking History: Previous account closures at other banks
""",
        "expected_risk": "High"
    }
}

print("Test scenarios loaded for prompt optimization...")

Test scenarios loaded for prompt optimization...


## 1. Baseline Risk Assessment

Basic prompt to establish performance benchmark.

In [4]:
# Baseline prompt
baseline_prompt = "You are a bank risk analyst. Assess the customer risk level as Low, Medium, or High."

def test_prompt_baseline(prompt, scenarios):
    """Test a prompt against all scenarios and return results"""
    results = {}
    for risk_level, data in scenarios.items():
        response = get_completion(prompt, data["scenario"])
        results[risk_level] = {
            "response": response,
            "expected": data["expected_risk"]
        }
    return results

# Test baseline prompt
print("=== BASELINE PROMPT PERFORMANCE ===")
baseline_results = test_prompt_baseline(baseline_prompt, test_scenarios)

for risk_level, result in baseline_results.items():
    print(f"\n{risk_level.upper()} RISK SCENARIO:")
    print(f"Expected: {result['expected']}")
    print(f"Response: {result['response'][:200]}...")  # Truncated for display

=== BASELINE PROMPT PERFORMANCE ===

LOW_RISK RISK SCENARIO:
Expected: Low
Response: Based on the provided customer profile for the local retail store, the risk assessment is as follows:

- **Business Type**: Family-owned grocery stores typically have a lower risk profile due to their...

MEDIUM_RISK RISK SCENARIO:
Expected: Medium
Response: Based on the provided customer profile, I would assess the customer risk level as **Medium**. 

Here are the key factors influencing this assessment:

1. **Business Type**: Import/export businesses ca...

HIGH_RISK RISK SCENARIO:
Expected: High
Response: Based on the provided customer profile for Consulting Services LLC, the risk assessment is as follows:

1. **Business Type**: Financial consulting can be associated with higher risk due to the nature ...


## 2. Systematic Refinement Framework

Automated testing framework with multiple prompt variations.

In [5]:
# Prompt variations for systematic testing
prompt_variations = {
    "baseline": baseline_prompt,
    
    "structured_output": """
You are a bank risk analyst with expertise in customer due diligence. Assess customer risk and respond with:
Risk Level: [Low/Medium/High]
Key Factors: [List main risk factors]
Confidence: [High/Medium/Low]
Recommendation: [Specific next steps]
""",
    
    "detailed_criteria": """
You are a bank risk analyst. Evaluate customers using these criteria:
- Account age and banking history (longer = lower risk)
- Transaction patterns and volumes (consistent patterns = lower risk)
- Geographic risk factors (high-risk jurisdictions = higher risk)
- Documentation completeness (complete docs = lower risk)
- Business type and industry risk (regulated industries = lower risk)

Assign risk level: Low, Medium, or High based on overall assessment.
""",
    
    "compliance_focused": """
You are a compliance officer conducting customer due diligence under BSA/AML guidelines.
Apply regulatory standards to assess risk level based on:
- Customer profile and beneficial ownership
- Transaction monitoring and pattern analysis
- Geographic and jurisdictional risk factors
- Documentation and KYC compliance status

Classify as: Low Risk, Medium Risk, or High Risk with regulatory justification.
"""
}

print("Prompt variations created for testing...")

Prompt variations created for testing...


In [6]:
# Automated testing framework
def evaluate_prompt_accuracy(results, scenarios):
    """Evaluate how well prompt results match expected risk levels"""
    correct = 0
    total = 0
    
    for risk_level, result in results.items():
        expected = result['expected']
        response = result['response'].upper()
        
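        # Simple substring match on the expected label. This can over-count:
        # "LOW" also matches "LOWER" or "FOLLOWING", and "HIGH" matches a
        # "Confidence: High" line, so treat the accuracy as an upper bound.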
        if expected.upper() in response:
            correct += 1
        total += 1
    
    return correct / total if total > 0 else 0

def test_all_variations(variations, scenarios):
    """Test all prompt variations and calculate performance metrics"""
    results = {}
    
    for variation_name, prompt in variations.items():
        print(f"\nTesting: {variation_name}")
        variation_results = test_prompt_baseline(prompt, scenarios)
        accuracy = evaluate_prompt_accuracy(variation_results, scenarios)
        
        results[variation_name] = {
            "prompt": prompt,
            "results": variation_results,
            "accuracy": accuracy
        }
        
        print(f"Accuracy: {accuracy:.2%}")
    
    return results

print("=== AUTOMATED PROMPT TESTING ===")
all_results = test_all_variations(prompt_variations, test_scenarios)

=== AUTOMATED PROMPT TESTING ===

Testing: baseline
Accuracy: 100.00%

Testing: structured_output
Accuracy: 100.00%

Testing: detailed_criteria
Accuracy: 66.67%

Testing: compliance_focused
Accuracy: 100.00%
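
The substring check in `evaluate_prompt_accuracy` counts a response as correct whenever the expected word appears anywhere in it, so `LOW` inside `FOLLOW` or `HIGH` in a `Confidence: High` line can register as a match. A minimal sketch of a stricter parser follows. `extract_risk_level` and `strict_accuracy` are hypothetical helpers, not part of the original notebook, and the regular expression assumes the model writes its label shortly after the words "risk level".

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: capture the label that follows "risk level",
# tolerating Markdown emphasis ("**Risk Level:** Medium") and short
# connecting words ("risk level as **Medium**").
_RISK_LEVEL_RE = re.compile(
    r"risk\s+level\W*(?:as|is|of)?\W*\b(low|medium|high)\b",
    re.IGNORECASE,
)


def extract_risk_level(response: str) -> Optional[str]:
    """Return 'Low', 'Medium', or 'High', or None if no label is found."""
    match = _RISK_LEVEL_RE.search(response)
    return match.group(1).capitalize() if match else None


def strict_accuracy(results: Dict[str, Dict[str, str]]) -> float:
    """Share of results whose parsed label equals the expected label."""
    if not results:
        return 0.0
    correct = sum(
        1
        for result in results.values()
        if extract_risk_level(result["response"]) == result["expected"]
    )
    return correct / len(results)


# Made-up responses for illustration only (not model output).
sample = {
    "a": {"response": "RISK LEVEL: High\nCONFIDENCE: Low", "expected": "High"},
    "b": {"response": "Risk Level: Medium. Follow up as needed.", "expected": "Low"},
}
print(strict_accuracy(sample))
```

In sample `b` the substring check would report a match because `FOLLOW` contains `LOW`, while the parser reads the label as `Medium` and marks the case incorrect.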


## 3. Optimization Techniques

Advanced prompt optimization incorporating best practices.

In [7]:
# Optimized prompt combining best techniques
optimized_prompt = """
You are a Senior Risk Assessment Specialist at a major financial institution with 10+ years of experience in customer due diligence and BSA/AML compliance.

Evaluate the customer using this systematic framework:

1. ACCOUNT FACTORS (Weight: 25%)
   - Account age and relationship history
   - Previous compliance issues or violations
   - Banking relationship stability

2. TRANSACTION ANALYSIS (Weight: 35%)
   - Volume relative to stated business model
   - Pattern consistency and predictability
   - Timing and frequency anomalies

3. GEOGRAPHIC RISK (Weight: 20%)
   - High-risk jurisdiction involvement
   - Cross-border transaction complexity
   - Regulatory environment considerations

4. DOCUMENTATION & COMPLIANCE (Weight: 20%)
   - KYC documentation completeness
   - Business model verification
   - Beneficial ownership transparency

Provide your assessment in this format:
RISK LEVEL: [Low/Medium/High]
CONFIDENCE: [High/Medium/Low]
KEY RISK FACTORS: [Top 2-3 concerning factors]
MITIGATING FACTORS: [Positive factors that reduce risk]
RECOMMENDATION: [Specific action required]
REGULATORY NOTES: [BSA/AML compliance considerations]
"""

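# Optimization test configurations (defined for reference; the run below
# evaluates only optimized_prompt at the default temperature)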
optimization_tests = {
    "instruction_clarity": {
        "prompt": optimized_prompt,
        "temperature": 0.1  # Lower temperature for consistency
    },
    "context_optimization": {
        "prompt": optimized_prompt + "\n\nEmphasize regulatory compliance and risk mitigation in your assessment.",
        "temperature": 0.3
    }
}

print("Testing optimized prompt...")
optimized_results = test_prompt_baseline(optimized_prompt, test_scenarios)
optimized_accuracy = evaluate_prompt_accuracy(optimized_results, test_scenarios)
print(f"Optimized Prompt Accuracy: {optimized_accuracy:.2%}")

Testing optimized prompt...
Optimized Prompt Accuracy: 100.00%
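
The `optimization_tests` dictionary pairs each prompt with a temperature, but the cell above only evaluates `optimized_prompt` at the default temperature. One way to exercise every configuration is sketched below. `run_optimization_tests` is a hypothetical helper, and the completion function is passed in as an argument so that a stub can stand in for the API; its signature is assumed to match the notebook's `get_completion`.

```python
from typing import Callable, Dict

# Assumed signature, mirroring the notebook's get_completion:
# (system_prompt, user_prompt, temperature=...) -> str
CompletionFn = Callable[..., str]


def run_optimization_tests(
    tests: Dict[str, dict],
    scenarios: Dict[str, dict],
    completion_fn: CompletionFn,
) -> Dict[str, Dict[str, dict]]:
    """Run every scenario under every prompt/temperature configuration."""
    results: Dict[str, Dict[str, dict]] = {}
    for test_name, config in tests.items():
        results[test_name] = {}
        for scenario_name, data in scenarios.items():
            response = completion_fn(
                config["prompt"],
                data["scenario"],
                temperature=config["temperature"],
            )
            results[test_name][scenario_name] = {
                "response": response,
                "expected": data["expected_risk"],
            }
    return results


# Stub standing in for the API so the sketch runs offline.
def fake_completion(system_prompt, user_prompt, temperature=0.3):
    return f"RISK LEVEL: Medium (temperature={temperature})"


demo_tests = {"instruction_clarity": {"prompt": "demo prompt", "temperature": 0.1}}
demo_scenarios = {
    "medium_risk": {"scenario": "demo scenario", "expected_risk": "Medium"}
}
demo = run_optimization_tests(demo_tests, demo_scenarios, fake_completion)
print(demo["instruction_clarity"]["medium_risk"]["response"])
```

In the notebook, calling `run_optimization_tests(optimization_tests, test_scenarios, get_completion)` should, under these assumptions, return one result set per configuration, each in the shape that `evaluate_prompt_accuracy` expects.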


## 4. Advanced Refinement

Testing parameter optimization and consistency.

In [8]:
# Temperature consistency testing
def test_temperature_consistency(prompt, scenario, temperatures=(0.1, 0.3, 0.7)):
    """Test how different temperatures affect response consistency"""
    consistency_results = {}
    
    for temp in temperatures:
        responses = []
        # Run the same prompt several times at this temperature
        for _ in range(3):  # 3 runs per temperature
            response = get_completion(prompt, scenario, temperature=temp)
            responses.append(response)
        
        consistency_results[temp] = responses
    
    return consistency_results

print("=== TEMPERATURE CONSISTENCY TEST ===")
temp_results = test_temperature_consistency(
    optimized_prompt, 
    test_scenarios["medium_risk"]["scenario"]
)

for temp, responses in temp_results.items():
    print(f"\nTemperature {temp}:")
    # Check for consistent risk level assignment
    risk_levels = []
    for response in responses:
        if "RISK LEVEL: Low" in response:
            risk_levels.append("Low")
        elif "RISK LEVEL: Medium" in response:
            risk_levels.append("Medium")
        elif "RISK LEVEL: High" in response:
            risk_levels.append("High")
    
    print(f"Risk Level Consistency: {risk_levels}")
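    # Consistent only if every response yielded a label and all labels agree
    if len(risk_levels) == len(responses) and len(set(risk_levels)) == 1: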
        print("✅ Consistent risk assessment")
    else:
        print("⚠️ Inconsistent risk assessment")

=== TEMPERATURE CONSISTENCY TEST ===



Temperature 0.1:
Risk Level Consistency: ['Medium', 'Medium', 'Medium']
✅ Consistent risk assessment

Temperature 0.3:
Risk Level Consistency: ['Medium', 'Medium', 'Medium']
✅ Consistent risk assessment

Temperature 0.7:
Risk Level Consistency: ['Medium', 'Medium', 'Medium']
✅ Consistent risk assessment
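
The check above gives a pass/fail answer per temperature. With more runs, a graded measure can be more informative. The sketch below computes the share of runs that agree with the most common label; `agreement_rate` is a hypothetical helper, and treating unparseable responses (`None`) as disagreements is an assumption made for this illustration.

```python
from collections import Counter
from typing import List, Optional, Tuple


def agreement_rate(labels: List[Optional[str]]) -> Tuple[Optional[str], float]:
    """Return the most common label and the share of runs that produced it.

    None entries (responses with no parseable label) count as disagreements.
    """
    parsed = [label for label in labels if label is not None]
    if not parsed:
        return None, 0.0
    top_label, top_count = Counter(parsed).most_common(1)[0]
    # Divide by all runs, not only the parsed ones, so that missing
    # labels lower the rate.
    return top_label, top_count / len(labels)


print(agreement_rate(["Medium", "Medium", "Medium"]))
print(agreement_rate(["Medium", "Medium", "High", None]))
```

The first call corresponds to the results shown above (three matching labels); the second uses made-up labels to show how a disagreement and a missing label reduce the rate.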


## 5. Performance Evaluation

Comprehensive comparison of all prompt variations.

In [9]:
# Comprehensive evaluation framework
def comprehensive_evaluation(prompt, scenarios):
    """Evaluate accuracy, length consistency, structure adherence, and compliance mentions.

    Each call queries the model again, so scores can differ from earlier cells.
    """
    results = test_prompt_baseline(prompt, scenarios)
    
    # Calculate accuracy
    accuracy = evaluate_prompt_accuracy(results, scenarios)
    
    # Calculate response length consistency
    response_lengths = [len(result['response']) for result in results.values()]
    length_consistency = 1 - (statistics.stdev(response_lengths) / statistics.mean(response_lengths)) if len(response_lengths) > 1 else 1
    
    # Calculate structured response adherence
    structured_responses = sum(1 for result in results.values() if 'RISK LEVEL:' in result['response'] or 'Risk Level:' in result['response'])
    structure_adherence = structured_responses / len(results)
    
    # Calculate regulatory compliance mentions
    compliance_mentions = sum(1 for result in results.values() if any(term in result['response'].upper() for term in ['BSA', 'AML', 'REGULATORY', 'COMPLIANCE']))
    compliance_focus = compliance_mentions / len(results)
    
    return {
        "accuracy": accuracy,
        "length_consistency": max(0, length_consistency),  # Ensure non-negative
        "structure_adherence": structure_adherence,
        "compliance_focus": compliance_focus,
        "overall_score": (accuracy + max(0, length_consistency) + structure_adherence + compliance_focus) / 4
    }

print("=== COMPREHENSIVE EVALUATION ===")
final_comparison = {
    "baseline": comprehensive_evaluation(baseline_prompt, test_scenarios),
    "structured_output": comprehensive_evaluation(prompt_variations["structured_output"], test_scenarios),
    "detailed_criteria": comprehensive_evaluation(prompt_variations["detailed_criteria"], test_scenarios),
    "compliance_focused": comprehensive_evaluation(prompt_variations["compliance_focused"], test_scenarios),
    "optimized": comprehensive_evaluation(optimized_prompt, test_scenarios)
}

for prompt_name, metrics in final_comparison.items():
    print(f"\n{prompt_name.upper()} PROMPT:")
    print(f"Accuracy: {metrics['accuracy']:.2%}")
    print(f"Length Consistency: {metrics['length_consistency']:.2%}")
    print(f"Structure Adherence: {metrics['structure_adherence']:.2%}")
    print(f"Compliance Focus: {metrics['compliance_focus']:.2%}")
    print(f"Overall Score: {metrics['overall_score']:.2%}")

# Find best performing prompt
best_prompt = max(final_comparison.items(), key=lambda x: x[1]['overall_score'])
print(f"\n🏆 BEST PERFORMING PROMPT: {best_prompt[0].upper()}")
print(f"Overall Score: {best_prompt[1]['overall_score']:.2%}")

=== COMPREHENSIVE EVALUATION ===

BASELINE PROMPT:
Accuracy: 100.00%
Length Consistency: 79.63%
Structure Adherence: 0.00%
Compliance Focus: 100.00%
Overall Score: 69.91%

STRUCTURED_OUTPUT PROMPT:
Accuracy: 100.00%
Length Consistency: 54.13%
Structure Adherence: 100.00%
Compliance Focus: 100.00%
Overall Score: 88.53%

DETAILED_CRITERIA PROMPT:
Accuracy: 100.00%
Length Consistency: 90.53%
Structure Adherence: 33.33%
Compliance Focus: 100.00%
Overall Score: 80.97%

COMPLIANCE_FOCUSED PROMPT:
Accuracy: 100.00%
Length Consistency: 81.25%
Structure Adherence: 0.00%
Compliance Focus: 100.00%
Overall Score: 70.31%

OPTIMIZED PROMPT:
Accuracy: 100.00%
Length Consistency: 59.24%
Structure Adherence: 100.00%
Compliance Focus: 100.00%
Overall Score: 89.81%

🏆 BEST PERFORMING PROMPT: OPTIMIZED
Overall Score: 89.81%
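
To make the scoring arithmetic concrete, the sketch below recomputes two of the overall scores from the metrics printed above and shows how the length consistency formula (one minus the coefficient of variation, floored at zero) behaves. The function names are hypothetical, and the response lengths in the last example are made up for illustration.

```python
import statistics


def length_consistency(lengths):
    """1 minus the coefficient of variation, floored at 0."""
    if len(lengths) < 2:
        return 1.0
    cv = statistics.stdev(lengths) / statistics.mean(lengths)
    return max(0.0, 1 - cv)


def overall_score(accuracy, length_cons, structure, compliance):
    """Unweighted mean of the four metrics."""
    return (accuracy + length_cons + structure + compliance) / 4


# Metrics reported above for the optimized prompt.
optimized = overall_score(1.0, 0.5924, 1.0, 1.0)
# Metrics reported above for the baseline prompt.
baseline = overall_score(1.0, 0.7963, 0.0, 1.0)
print(f"optimized={optimized:.2%} baseline={baseline:.2%}")

# Hypothetical response lengths in characters, for illustration only.
print(round(length_consistency([900, 1000, 1100]), 2))
```

Because accuracy and compliance focus were 100% for both prompts, the gap between them comes from structure adherence (0% versus 100%), partly offset by the optimized prompt's lower length consistency.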


## 6. Production Validation

Testing with complex, realistic scenarios.

In [10]:
# Complex production scenario
production_scenario = """
Customer Profile: TechFinance Solutions Group
- Business Type: Financial technology consulting and software development
- Account Age: 14 months
- Monthly Revenue: $1.8M (highly variable month-to-month)
- Ownership Structure: 3 beneficial owners, 2 located internationally

Transaction Patterns:
- Large incoming wires from fintech clients ($100K-500K)
- Immediate outbound transfers to software development contractors
- 60% of outbound transfers go to Eastern Europe and Asia
- Some transfers occur outside normal business hours
- Total monthly volume: $2.1M average

Risk Factors:
- Rapid business growth (revenue up 400% in 12 months)
- Some contractor documentation incomplete
- Beneficial owners include non-US persons
- Industry known for regulatory complexity
- Previous bank relationship ended due to "business restructuring"

Mitigating Factors:
- Licensed fintech business in regulated state
- Provides monthly business reports voluntarily
- Contracts and client relationships well-documented
- CEO has clean background and strong industry reputation
- Business model aligns with transaction patterns
- All tax obligations current and properly filed
"""

print("=== PRODUCTION VALIDATION ===")
production_result = get_completion(optimized_prompt, production_scenario)
print("Complex Scenario Assessment:")
print(production_result)

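# Manual review checklist: these lines print unconditionally and do not
# inspect production_result.
print("\n=== VALIDATION ANALYSIS ===")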
print("Key factors the optimized prompt should identify:")
print("✓ Mixed risk signals requiring careful analysis")
print("✓ Industry-specific considerations")
print("✓ Geographic and jurisdictional factors")
print("✓ Documentation and compliance gaps")
print("✓ Business model validation")
print("✓ Regulatory compliance recommendations")

=== PRODUCTION VALIDATION ===
Complex Scenario Assessment:
RISK LEVEL: High  
CONFIDENCE: Medium  
KEY RISK FACTORS:  
1. High volume of large incoming and outgoing wires, especially to high-risk jurisdictions (Eastern Europe and Asia).  
2. Incomplete contractor documentation and previous banking relationship issues.  
3. Rapid business growth with significant revenue fluctuations.  

MITIGATING FACTORS:  
- The business is licensed in a regulated state, indicating compliance with local regulations.  
- The CEO has a clean background and a strong reputation in the industry, which adds credibility.  
- The company provides monthly business reports, demonstrating transparency and proactive communication.  

RECOMMENDATION:  
Conduct a comprehensive review of the incomplete contractor documentation and implement enhanced due diligence (EDD) for all international transactions, particularly those involving Eastern Europe and Asia. Consider monitoring the account more closely for unusual tr
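
The checklist printed by the cell is static, so it does not confirm that the response contains what the prompt asked for. A minimal sketch of an automated check follows: it tests whether each section header from the optimized prompt's output format appears in a response. `check_sections` is a hypothetical helper, and the sample text is an abbreviated stand-in rather than real model output.

```python
from typing import Dict, Sequence

# Section headers requested by the optimized prompt's output format.
REQUIRED_SECTIONS = (
    "RISK LEVEL:",
    "CONFIDENCE:",
    "KEY RISK FACTORS:",
    "MITIGATING FACTORS:",
    "RECOMMENDATION:",
    "REGULATORY NOTES:",
)


def check_sections(
    response: str, sections: Sequence[str] = REQUIRED_SECTIONS
) -> Dict[str, bool]:
    """Map each required header to whether it appears in the response."""
    upper = response.upper()
    return {section: section in upper for section in sections}


# Abbreviated stand-in for a response that stops before the last section.
truncated = (
    "RISK LEVEL: High\nCONFIDENCE: Medium\nKEY RISK FACTORS:\n1. ...\n"
    "MITIGATING FACTORS:\n- ...\nRECOMMENDATION:\nConduct a review"
)
report = check_sections(truncated)
missing = [name for name, present in report.items() if not present]
print(missing)
```

Applied to the output as displayed above, which stops before a `REGULATORY NOTES` section appears, a check like this would flag that section as missing.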

## Summary

This exercise demonstrated systematic prompt refinement for financial risk assessment:

### Key Optimization Results:
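- **Baseline**: The one-sentence prompt classified all three test scenarios correctly but produced no structured output (0% structure adherence)
- **Structured Output**: Raised structure adherence to 100%, although response lengths varied more (54.13% length consistency versus 79.63% for the baseline)
- **Detailed Criteria**: Makes the decision criteria explicit; accuracy varied between runs (66.67% in the automated test, 100% in the comprehensive evaluation)
- **Compliance Focus**: Frames the assessment in BSA/AML terms; its overall score (70.31%) was close to the baseline's (69.91%)
- **Optimized Framework**: Combines weighted criteria with a fixed output format and achieved the highest overall score (89.81%)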

### Performance Improvements:
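- Accuracy was already 100% for most prompts on these three scenarios, so the measured gains came mainly from output structure
- Structure adherence rose from 0% (baseline) to 100% (structured output and optimized prompts)
- Every prompt, including the baseline, mentioned compliance terms in all responses, so this metric did not separate the variations
- Risk level assignments for the medium-risk scenario stayed consistent at temperatures 0.1, 0.3, and 0.7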

### Financial Services Applications:
- **Risk Assessment**: Systematic evaluation of customer risk profiles
- **Regulatory Compliance**: Built-in BSA/AML consideration
- **Decision Support**: Structured, actionable recommendations
- **Quality Assurance**: Measurable performance improvements

### Optimization Methodology:
- **Systematic Testing**: Automated evaluation of multiple variations
- **Quantitative Metrics**: Objective performance measurement
- **Iterative Improvement**: Continuous refinement based on results
- **Production Validation**: Real-world scenario testing
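
Because each evaluation makes fresh API calls, a single run can be misleading: `detailed_criteria` scored 66.67% in the automated test and 100.00% in the comprehensive evaluation. One way to support the iterative step is to repeat an evaluation and report its mean and spread, as sketched below. `repeated_accuracy` is a hypothetical helper, and `evaluate_once` is assumed to be any zero-argument callable that returns an accuracy between 0 and 1.

```python
import statistics
from typing import Callable, Dict


def repeated_accuracy(
    evaluate_once: Callable[[], float], runs: int = 5
) -> Dict[str, float]:
    """Call evaluate_once `runs` times and summarize the accuracies."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    scores = [evaluate_once() for _ in range(runs)]
    return {
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(scores) if runs > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


# Stub that replays fixed, made-up accuracies so the sketch runs offline.
replay = iter([1.0, 1.0, 0.5, 1.0])
summary = repeated_accuracy(lambda: next(replay), runs=4)
print(summary)
```

In the notebook, `evaluate_once` could be `lambda: evaluate_prompt_accuracy(test_prompt_baseline(prompt, test_scenarios), test_scenarios)` for whichever `prompt` is under test.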

These techniques provide a repeatable framework for optimizing AI prompts in financial services applications! 🎉