# Lesson 3: Financial Risk Assessment Prompt Optimization

## Systematic Refinement for Financial AI Accuracy

In this hands-on exercise, you will learn to systematically refine and optimize prompts for a financial risk assessment system. You'll implement automated testing frameworks, measure prompt performance, and iteratively improve accuracy for critical financial decision-making scenarios.

## Outline

- **Baseline Risk Assessment**: Initial prompt for customer risk evaluation
- **Systematic Refinement Framework**: Automated testing and measurement systems
- **Optimization Techniques**: Best practices for instruction clarity and compliance
- **Advanced Refinement**: Context optimization and parameter tuning
- **Performance Evaluation**: Quantitative comparison of prompt variations
- **Production Validation**: Testing with complex real-world scenarios

## Setup Instructions

Before starting this exercise:

1. **Install Required Packages**: Run `pip install -r ../../requirements.txt` in your terminal
2. **Configure API Key**: 
   - Open the `.env` file in the root directory
   - Replace `your_openai_api_key_here` with your actual OpenAI API key
   - Save the file
3. **Verify Setup**: Run the import and setup cells below to ensure everything works

**Note**: The notebook automatically loads your API key from the `.env` file in the root directory.

In [None]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display
import json
import time
import statistics
from typing import List, Dict, Tuple

# Load environment variables from the root .env file
load_dotenv('../../.env')

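The verification in step 3 can include a quick check that the key was actually loaded and is not still the placeholder from the setup instructions. The following is a minimal sketch under that assumption; the helper name `api_key_is_configured` is hypothetical and not part of the course code.

```python
import os

PLACEHOLDER = "your_openai_api_key_here"  # placeholder text named in the setup instructions

def api_key_is_configured(value):
    """Return True if `value` is non-empty and is not the placeholder text."""
    return bool(value) and value.strip() not in ("", PLACEHOLDER)

# Hypothetical usage: warn early instead of failing on the first API call
if not api_key_is_configured(os.getenv("OPENAI_API_KEY")):
    print("OPENAI_API_KEY is missing or still set to the placeholder; check your .env file.")
```

Running a check like this right after `load_dotenv` makes a wrong `.env` path show up immediately rather than as an error string returned by a later API call.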
In [None]:
# Setup OpenAI client
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY")  # Load from .env file
)

def get_completion(system_prompt, user_prompt, model="gpt-4o-mini", temperature=0.3):
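    """
    Get a chat completion from the OpenAI API.

    Returns the response text, or a string starting with "An error occurred:"
    if the request fails (the exception is not re-raised).
    """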
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            temperature=temperature,
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

## Financial Risk Assessment Scenarios

We'll use these customer scenarios to test and optimize our risk assessment prompts:

In [None]:
# Test scenarios for prompt optimization
test_scenarios = {
    "low_risk": {
        "scenario": """
Customer Profile: Local Retail Store
- Business Type: Family-owned grocery store
- Account Age: 8 years
- Monthly Revenue: $45,000
- Transaction Pattern: Daily deposits $1,200-2,000, consistent timing
- Geographic Scope: Single location, local suppliers
- Documentation: Complete business licenses, tax returns current
- Owner Background: Clean credit history, local community member
- Banking History: No previous compliance issues
""",
        "expected_risk": "Low"
    },
    "medium_risk": {
        "scenario": """
Customer Profile: Import/Export Business
- Business Type: International trade company
- Account Age: 2.5 years
- Monthly Revenue: $800,000
- Transaction Pattern: Large wire transfers ($50K-200K), irregular timing
- Geographic Scope: Asia-Pacific region, multiple countries
- Documentation: Some supplier documentation missing
- Owner Background: Limited credit history, recent immigrant
- Banking History: Minor documentation delays, no violations
""",
        "expected_risk": "Medium"
    },
    "high_risk": {
        "scenario": """
Customer Profile: Consulting Services LLC
- Business Type: Financial consulting
- Account Age: 6 months
- Monthly Revenue: $2.3M
- Transaction Pattern: Immediate large transfers after deposits
- Geographic Scope: Offshore jurisdictions (Cayman, Malta)
- Documentation: Minimal business documentation
- Owner Background: Multiple previous business dissolutions
- Banking History: Previous account closures at other banks
""",
        "expected_risk": "High"
    }
}

print("Test scenarios loaded for prompt optimization...")

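Because every later measurement depends on these scenarios, it can help to confirm that each entry has the expected shape before spending API calls on it. The sketch below is one possible check; `validate_scenarios` and `ALLOWED_RISK_LEVELS` are hypothetical names, and the stand-in data only mimics the structure of `test_scenarios`.

```python
ALLOWED_RISK_LEVELS = {"Low", "Medium", "High"}

def validate_scenarios(scenarios):
    """Return a list of problems found; an empty list means the dict looks usable."""
    problems = []
    for name, data in scenarios.items():
        if not data.get("scenario", "").strip():
            problems.append(f"{name}: missing or empty 'scenario' text")
        if data.get("expected_risk") not in ALLOWED_RISK_LEVELS:
            problems.append(f"{name}: 'expected_risk' must be one of {sorted(ALLOWED_RISK_LEVELS)}")
    return problems

# Tiny stand-in data with the same shape as test_scenarios
example = {
    "ok": {"scenario": "Customer Profile: ...", "expected_risk": "Low"},
    "bad": {"scenario": "", "expected_risk": "Severe"},
}
print(validate_scenarios(example))
```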
## 1. Baseline Risk Assessment (3 min)

First, let's create a basic prompt and establish baseline performance for comparison.

In [None]:
# TODO: Fill in the missing parts marked with ++++++++++++
# Your Task: Create a basic risk assessment prompt
# For example: baseline_prompt = "You are a bank risk analyst. Assess the customer risk level as Low, Medium, or High."
baseline_prompt = "++++++++++++++"

def test_prompt_baseline(prompt, scenarios):
    """Test a prompt against all scenarios and return results"""
    results = {}
    for risk_level, data in scenarios.items():
        response = get_completion(prompt, data["scenario"])
        results[risk_level] = {
            "response": response,
            "expected": data["expected_risk"]
        }
    return results

# Test baseline prompt
print("=== BASELINE PROMPT PERFORMANCE ===")
baseline_results = test_prompt_baseline(baseline_prompt, test_scenarios)

for risk_level, result in baseline_results.items():
    print(f"\n{risk_level.upper()} RISK SCENARIO:")
    print(f"Expected: {result['expected']}")
    print(f"Response: {result['response'][:200]}...")  # Truncated for display

## 2. Systematic Refinement Framework (5 min)

Now let's build an automated testing framework to systematically improve our prompts.

In [None]:
# TODO: Fill in the missing prompt variations marked with ++++++++++++
# Your Task: Create prompt variations to test different approaches
prompt_variations = {
    "baseline": baseline_prompt,
    
    "structured_output": """++++++++++++++
(Create a prompt that requests structured output with specific format)
Example: You are a bank risk analyst. Assess customer risk and respond with:
Risk Level: [Low/Medium/High]
Key Factors: [List main risk factors]
Confidence: [High/Medium/Low]
""",
    
    "detailed_criteria": """++++++++++++++
(Create a prompt with specific risk assessment criteria)
Example: You are a bank risk analyst. Evaluate customers using these criteria:
- Account age and history
- Transaction patterns and volumes
- Geographic risk factors
- Documentation completeness
Assign risk level: Low, Medium, or High
""",
    
    "compliance_focused": """++++++++++++++
(Create a prompt emphasizing regulatory compliance)
Example: You are a compliance officer conducting customer due diligence.
Apply BSA/AML guidelines to assess risk level based on:
- Customer profile and background
- Transaction monitoring findings
- Geographic and jurisdictional factors
Classify as: Low Risk, Medium Risk, or High Risk
"""
}

print("Prompt variations created for testing...")

In [None]:
# Automated testing framework
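import re

def evaluate_prompt_accuracy(results, scenarios):
    """Evaluate how well prompt results match expected risk levels"""
    correct = 0
    total = 0
    
    for risk_level, result in results.items():
        expected = result['expected'].upper()
        response = result['response'].upper()
        
        # Prefer a label that follows "Risk" or "Risk Level" (e.g. "Risk Level: High").
        # Otherwise fall back to the first whole-word label, so "LOW" does not
        # match inside words such as "BELOW" or "FOLLOW".
        labeled = re.search(r"RISK(?:\s+LEVEL)?\W*(LOW|MEDIUM|HIGH)\b", response)
        match = labeled or re.search(r"\b(LOW|MEDIUM|HIGH)\b", response)
        if match and match.group(1) == expected:
            correct += 1
        total += 1
    
    return correct / total if total > 0 else 0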

def test_all_variations(variations, scenarios):
    """Test all prompt variations and calculate performance metrics"""
    results = {}
    
    for variation_name, prompt in variations.items():
        print(f"\nTesting: {variation_name}")
        variation_results = test_prompt_baseline(prompt, scenarios)
        accuracy = evaluate_prompt_accuracy(variation_results, scenarios)
        
        results[variation_name] = {
            "prompt": prompt,
            "results": variation_results,
            "accuracy": accuracy
        }
        
        print(f"Accuracy: {accuracy:.2%}")
    
    return results

# TODO: Run the automated testing (uncomment when prompts are filled in)
# print("=== AUTOMATED PROMPT TESTING ===")
# all_results = test_all_variations(prompt_variations, test_scenarios)

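Once `test_all_variations` has run, its return value can be sorted to see which variation scored best. The sketch below assumes only the `{"name": {"accuracy": ...}}` shape built above; `rank_variations` is a hypothetical helper, and the accuracies are made up purely to show the data flow.

```python
def rank_variations(results):
    """Return (variation_name, accuracy) pairs sorted from highest to lowest accuracy."""
    pairs = [(name, info["accuracy"]) for name, info in results.items()]
    # sorted() is stable, so tied variations keep their original insertion order
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Made-up accuracies, not real measurements
example_results = {
    "baseline": {"accuracy": 1 / 3},
    "structured_output": {"accuracy": 1.0},
    "detailed_criteria": {"accuracy": 2 / 3},
}
for name, accuracy in rank_variations(example_results):
    print(f"{name}: {accuracy:.2%}")
```

With only three scenarios, accuracy can take just four values (0, 1/3, 2/3, 1), so ties between variations are likely and a ranking like this is best read as a rough signal.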
## 3. Optimization Techniques (4 min)

Let's apply specific optimization techniques to improve prompt performance.

In [None]:
# TODO: Fill in the optimized prompt marked with ++++++++++++
# Your Task: Create an optimized prompt combining the best techniques from your testing
optimized_prompt = """++++++++++++++
(Create your best prompt incorporating lessons from variation testing)

Consider including:
- Clear role definition
- Specific assessment criteria
- Structured output format
- Compliance considerations
- Examples or guidelines

Example structure:
You are a [specific role] with expertise in [domain].
Evaluate the customer using these criteria:
[List specific criteria]
Provide your assessment in this format:
[Specific output structure]
"""

# Test optimization techniques
optimization_tests = {
    "instruction_clarity": {
        "prompt": optimized_prompt,
        "temperature": 0.1  # Lower temperature for consistency
    },
    "context_optimization": {
        "prompt": optimized_prompt + "\n\nFocus on regulatory compliance and risk mitigation.",
        "temperature": 0.3
    }
}

print("Optimization techniques prepared for testing...")

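The `optimization_tests` dictionary pairs each prompt with a temperature, but nothing above runs it yet. One way to do so is sketched below. `run_optimization_tests` is a hypothetical helper, and the completion function is passed in as a parameter so the example can run with a stand-in instead of a real API call; in the notebook, `get_completion` could be passed as `complete`.

```python
def run_optimization_tests(tests, scenarios, complete):
    """Run each configuration against every scenario.

    `complete` is any callable used as
    complete(system_prompt, user_prompt, temperature=...) and returning a string.
    """
    outcomes = {}
    for test_name, config in tests.items():
        outcomes[test_name] = {
            risk_level: {
                "response": complete(
                    config["prompt"], data["scenario"], temperature=config["temperature"]
                ),
                "expected": data["expected_risk"],
            }
            for risk_level, data in scenarios.items()
        }
    return outcomes

# Stand-in completion function so the sketch runs without an API call
def fake_complete(system_prompt, user_prompt, temperature=0.3):
    return f"Risk Level: Low (temperature={temperature})"

demo_tests = {"instruction_clarity": {"prompt": "You are a bank risk analyst.", "temperature": 0.1}}
demo_scenarios = {"low_risk": {"scenario": "Customer Profile: ...", "expected_risk": "Low"}}
demo_outcomes = run_optimization_tests(demo_tests, demo_scenarios, fake_complete)
print(demo_outcomes["instruction_clarity"]["low_risk"]["response"])
```

Each entry in the returned dictionary has the same `response`/`expected` shape that `test_prompt_baseline` produces, so it can be scored the same way.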
## 4. Advanced Refinement (3 min)

Let's test advanced refinement techniques including parameter optimization.

In [None]:
# TODO: Test different temperature settings
# Your Task: Test how temperature affects response consistency
def test_temperature_consistency(prompt, scenario, temperatures=(0.1, 0.3, 0.7)):
    """Test how different temperatures affect response consistency"""
    consistency_results = {}
    
    for temp in temperatures:
        responses = []
        # Run the same prompt several times at this temperature
        for _ in range(3):  # 3 runs per temperature setting
            response = get_completion(prompt, scenario, temperature=temp)
            responses.append(response)
        
        consistency_results[temp] = responses
    
    return consistency_results

# TODO: Test your optimized prompt with different temperatures
print("=== TEMPERATURE CONSISTENCY TEST ===")
# Uncomment when optimized_prompt is ready:
# temp_results = test_temperature_consistency(
#     optimized_prompt, 
#     test_scenarios["medium_risk"]["scenario"]
# )
# 
# for temp, responses in temp_results.items():
#     print(f"\nTemperature {temp}:")
#     for i, response in enumerate(responses, 1):
#         print(f"Run {i}: {response[:100]}...")  # Truncated for display

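The temperature test collects raw responses, which leaves consistency to be judged by eye. A simple way to quantify it, assuming a risk label has already been extracted from each response, is the share of runs that agree with the most common label. `label_agreement` is a hypothetical helper, and the labels below are made up for illustration.

```python
from collections import Counter

def label_agreement(labels):
    """Return the most common label and the fraction of runs that produced it."""
    if not labels:
        return None, 0.0
    top_label, count = Counter(labels).most_common(1)[0]
    return top_label, count / len(labels)

# Made-up labels standing in for three runs at each temperature
example_runs = {
    0.1: ["Medium", "Medium", "Medium"],
    0.7: ["Medium", "High", "Medium"],
}
for temp, labels in example_runs.items():
    label, agreement = label_agreement(labels)
    print(f"Temperature {temp}: {label} in {agreement:.0%} of runs")
```

Three runs per temperature is a very small sample, so an agreement figure from this test is only a first indication of how stable a setting is.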
## 5. Performance Evaluation (3 min)

Let's compare all our prompt variations and select the best performing approach.

In [None]:
# TODO: Create a comprehensive evaluation framework
# Your Task: Implement evaluation metrics beyond simple accuracy

def comprehensive_evaluation(prompt, scenarios):
    """Comprehensive evaluation including accuracy, consistency, and quality"""
    results = test_prompt_baseline(prompt, scenarios)
    
    # Calculate accuracy
    accuracy = evaluate_prompt_accuracy(results, scenarios)
    
    # TODO: Add more evaluation metrics
    # Calculate response length consistency as 1 - (stdev / mean), clamped at 0
    # so that widely varying lengths cannot produce a negative score
    response_lengths = [len(result['response']) for result in results.values()]
    mean_length = statistics.mean(response_lengths) if response_lengths else 0
    if len(response_lengths) > 1 and mean_length > 0:
        variation = statistics.stdev(response_lengths) / mean_length
        length_consistency = max(0.0, 1 - variation)
    else:
        length_consistency = 1
    
    # Calculate structured response adherence (simple check)
    structured_responses = sum(
        1 for result in results.values()
        if 'Risk Level:' in result['response'] or 'Risk:' in result['response']
    )
    structure_adherence = structured_responses / len(results) if results else 0
    
    return {
        "accuracy": accuracy,
        "length_consistency": length_consistency,
        "structure_adherence": structure_adherence,
        "overall_score": (accuracy + length_consistency + structure_adherence) / 3
    }

# TODO: Compare your best prompts
print("=== COMPREHENSIVE EVALUATION ===")
# Uncomment when prompts are ready:
# final_comparison = {
#     "baseline": comprehensive_evaluation(baseline_prompt, test_scenarios),
#     "optimized": comprehensive_evaluation(optimized_prompt, test_scenarios)
# }
# 
# for prompt_name, metrics in final_comparison.items():
#     print(f"\n{prompt_name.upper()} PROMPT:")
#     print(f"Accuracy: {metrics['accuracy']:.2%}")
#     print(f"Length Consistency: {metrics['length_consistency']:.2%}")
#     print(f"Structure Adherence: {metrics['structure_adherence']:.2%}")
#     print(f"Overall Score: {metrics['overall_score']:.2%}")

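The `overall_score` above weights accuracy, length consistency, and structure adherence equally. If a correct risk label matters more than uniform response length, the metrics can be combined with explicit weights instead. The sketch below shows one way to do that; `weighted_score` is a hypothetical helper, and the weights are illustrative assumptions rather than recommended values.

```python
def weighted_score(metrics, weights):
    """Combine metric values using the given weights, normalized by their sum."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(metrics[name] * weight for name, weight in weights.items()) / total_weight

# Illustrative weights only: accuracy counts six times as much as length consistency
example_weights = {"accuracy": 6, "structure_adherence": 3, "length_consistency": 1}
example_metrics = {"accuracy": 1.0, "structure_adherence": 0.5, "length_consistency": 0.0}
print(f"{weighted_score(example_metrics, example_weights):.2f}")
```

The metric names match the keys returned by `comprehensive_evaluation`, so its result dictionary could be passed in directly as `metrics`.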
## 6. Production Validation (2 min)

Test your refined prompts with complex, realistic scenarios.

In [None]:
# TODO: Create a complex validation scenario
# Your Task: Design a challenging real-world scenario to test your optimized prompt
production_scenario = """++++++++++++++
(Create a complex, realistic customer scenario that combines multiple risk factors)

Example elements to include:
- Mixed risk signals (some concerning, some legitimate)
- Multiple business entities or relationships
- Complex transaction patterns
- Regulatory edge cases
- Documentation and compliance challenges

Customer Profile: [Your complex scenario]
- Business Type: ++++++++++++++
- Account Details: ++++++++++++++
- Transaction Patterns: ++++++++++++++
- Risk Factors: ++++++++++++++
- Mitigating Factors: ++++++++++++++
"""

# TODO: Test your optimized prompt with the production scenario
print("=== PRODUCTION VALIDATION ===")
# Uncomment when scenario and optimized_prompt are ready:
# production_result = get_completion(optimized_prompt, production_scenario)
# print("Complex Scenario Assessment:")
# print(production_result)
print("Create your production scenario above, then uncomment the test code.")

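For a complex scenario there is no single expected label to compare against, so one lightweight check is whether the response still follows the requested output format. The sketch below assumes the optimized prompt asks for the three fields used in the structured-output example earlier (`Risk Level:`, `Key Factors:`, `Confidence:`); both `REQUIRED_FIELDS` and `missing_fields` are hypothetical and should be adapted to whatever format your prompt actually specifies.

```python
REQUIRED_FIELDS = ("Risk Level:", "Key Factors:", "Confidence:")  # assumed output format

def missing_fields(response, required=REQUIRED_FIELDS):
    """Return the required field labels that do not appear in the response (case-insensitive)."""
    lowered = response.lower()
    return [field for field in required if field.lower() not in lowered]

# Made-up response text used only to exercise the check
example_response = "Risk Level: Medium\nKey Factors: irregular wires, incomplete supplier documents"
print(missing_fields(example_response))
```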
## 7. Reflection & Optimization Analysis

Analyze your optimization process and document key learnings.

### Which optimization techniques were most effective for financial risk assessment?

**TODO: Add your analysis below where you see ++++++++++++**

**Prompt Structure Improvements:**
- Most effective structural changes: ++++++++++++
- Impact on accuracy: ++++++++++++
- Impact on consistency: ++++++++++++

**Instruction Clarity:**
- Key refinements that improved clarity: ++++++++++++
- Effect on response quality: ++++++++++++
- Compliance alignment improvements: ++++++++++++

**Parameter Optimization:**
- Optimal temperature setting: ++++++++++++
- Trade-offs between creativity and consistency: ++++++++++++
- Model selection considerations: ++++++++++++

**Performance Metrics:**
- Accuracy improvement (baseline vs optimized): ++++++++++++
- Most valuable evaluation metric: ++++++++++++
- Areas requiring further refinement: ++++++++++++

**Financial Services Applications:**
- Key regulatory considerations: ++++++++++++
- Risk mitigation through prompt design: ++++++++++++
- Recommendations for production deployment: ++++++++++++

**Optimization Process Insights:**
- Most valuable refinement methodology: ++++++++++++
- Challenges in automated evaluation: ++++++++++++
- Recommendations for continuous improvement: ++++++++++++

## Summary

In this exercise, we implemented systematic prompt refinement for financial risk assessment:

1. **Baseline Assessment**: Established initial prompt performance benchmarks
2. **Systematic Framework**: Built automated testing and evaluation systems
3. **Optimization Techniques**: Applied prompt engineering best practices
4. **Advanced Refinement**: Tested parameter optimization and consistency
5. **Performance Evaluation**: Implemented comprehensive quality metrics
6. **Production Validation**: Tested with complex, realistic scenarios

Key insights for financial AI prompt optimization:
- **Systematic Testing**: Automated frameworks enable objective comparison
- **Multiple Metrics**: Accuracy alone isn't sufficient for financial applications
- **Regulatory Alignment**: Compliance considerations must be built into prompts
- **Iterative Improvement**: Continuous refinement delivers measurable gains
- **Production Validation**: Real-world scenarios reveal optimization opportunities

These optimization techniques provide a robust foundation for building reliable, compliant, and accurate financial AI systems! 🎉