# Lesson 5: Self-Learning Loan Agent - SOLUTION
## How AI Agents Learn Financial Approval Patterns Through Feedback

**Objective**: Build an agent that learns loan approval patterns using categorical credit and income levels through feedback loops.

**Complex Financial Rule**: The agent discovers that loans are approved when `(MEDIUM credit + HIGH income) OR (HIGH credit)` through experience!

### Core Concept: Self-Learning Through Feedback
🤖 **Agent sees (credit level, income level)** → ⚖️ **Decides APPROVE/DENY** → 📊 **Gets ground truth feedback** → 🧠 **Learns categorical combinations**

**Key Insight**: Agent learns which categorical combinations lead to successful loans through systematic feedback and self-correction!
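
The loop above can be sketched without any LLM call. The following is a minimal illustration rather than the lesson's agent: `TallyAgent`, `hidden_rule`, and `run_loop` are hypothetical names, and the "approve when there is no evidence" default is an assumption chosen only to keep the example deterministic.

```python
from collections import defaultdict

def hidden_rule(credit_level, income_level):
    """Ground truth from the lesson: (MEDIUM credit + HIGH income) OR (HIGH credit)."""
    return credit_level == "HIGH" or (credit_level == "MEDIUM" and income_level == "HIGH")

class TallyAgent:
    """Hypothetical stand-in for the LLM agent: decides from per-combination counts."""

    def __init__(self):
        # (credit_level, income_level) -> [positive outcomes seen, negative outcomes seen]
        self.tally = defaultdict(lambda: [0, 0])

    def decide(self, combo):
        good, bad = self.tally[combo]
        # Assumption: with no evidence (or a tie) the agent explores by approving
        return "APPROVE" if good >= bad else "DENY"

    def learn(self, combo, should_approve):
        # Feedback updates the tally whatever the agent decided
        self.tally[combo][0 if should_approve else 1] += 1

def run_loop(agent, combos):
    """See combo -> decide -> get ground truth -> learn; return per-step correctness."""
    results = []
    for combo in combos:
        decision = agent.decide(combo)
        truth = hidden_rule(*combo)
        results.append((decision == "APPROVE") == truth)
        agent.learn(combo, truth)
    return results

agent = TallyAgent()
stream = [("LOW", "LOW"), ("MEDIUM", "HIGH"), ("LOW", "LOW"), ("HIGH", "LOW")]
print(run_loop(agent, stream))  # [False, True, True, True]
```

The first LOW + LOW application is wrongly approved, the feedback is recorded, and the second LOW + LOW application is then denied correctly.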

## Setup

In [23]:
import openai
import random

# Initialize OpenAI client
client = openai.OpenAI()

print("🔧 Ready to build a self-correcting agent!")

🔧 Ready to build a self-correcting agent!


## The Categorical Learning Scenario: Credit Level + Income Level

Our agent analyzes categorical financial data to discover loan approval patterns! It must learn which combinations of credit and income levels lead to successful loans.

In [27]:
# Two-variable loan applications with CATEGORICAL variables
# Credit Score: LOW (300-599) | MEDIUM (600-749) | HIGH (750-850)
# Income: LOW (<50K) | MEDIUM (50K-79K) | HIGH (80K+)

def categorize_credit(score):
    if score < 600: return "LOW"
    elif score < 750: return "MEDIUM" 
    else: return "HIGH"

def categorize_income(income):
    if income < 50000: return "LOW"
    elif income < 80000: return "MEDIUM"
    else: return "HIGH"

# 20 training examples for comprehensive learning
LOAN_APPLICATIONS = [
    # LOW CREDIT examples (should all be DENIED)
    {'name': 'Alex', 'credit_score': 550, 'income': 45000},   # LOW + LOW → DENY
    {'name': 'Blake', 'credit_score': 580, 'income': 65000},  # LOW + MEDIUM → DENY
    {'name': 'Casey', 'credit_score': 590, 'income': 85000},  # LOW + HIGH → DENY
    
    # MEDIUM CREDIT examples (only approve if HIGH income)
    {'name': 'Dana', 'credit_score': 650, 'income': 40000},   # MEDIUM + LOW → DENY
    {'name': 'Emma', 'credit_score': 680, 'income': 65000},   # MEDIUM + MEDIUM → DENY
    {'name': 'Felix', 'credit_score': 670, 'income': 90000},  # MEDIUM + HIGH → APPROVE
    {'name': 'Grace', 'credit_score': 720, 'income': 85000},  # MEDIUM + HIGH → APPROVE
    {'name': 'Henry', 'credit_score': 640, 'income': 75000},  # MEDIUM + MEDIUM → DENY
    {'name': 'Iris', 'credit_score': 690, 'income': 95000},   # MEDIUM + HIGH → APPROVE
    
    # HIGH CREDIT examples (should all be APPROVED)
    {'name': 'Jack', 'credit_score': 780, 'income': 35000},   # HIGH + LOW → APPROVE
    {'name': 'Kate', 'credit_score': 820, 'income': 60000},   # HIGH + MEDIUM → APPROVE
    {'name': 'Leo', 'credit_score': 760, 'income': 120000},   # HIGH + HIGH → APPROVE
    {'name': 'Maya', 'credit_score': 800, 'income': 45000},   # HIGH + LOW → APPROVE
    {'name': 'Noah', 'credit_score': 770, 'income': 70000},   # HIGH + MEDIUM → APPROVE
    
    # Additional mixed examples for thorough learning
    {'name': 'Olivia', 'credit_score': 610, 'income': 55000}, # MEDIUM + MEDIUM → DENY
    {'name': 'Paul', 'credit_score': 740, 'income': 40000},   # MEDIUM + LOW → DENY
    {'name': 'Quinn', 'credit_score': 750, 'income': 100000}, # HIGH + HIGH → APPROVE
    {'name': 'Ruby', 'credit_score': 660, 'income': 88000},   # MEDIUM + HIGH → APPROVE
    {'name': 'Sam', 'credit_score': 570, 'income': 75000},    # LOW + MEDIUM → DENY
    {'name': 'Tina', 'credit_score': 790, 'income': 55000},   # HIGH + MEDIUM → APPROVE
]

# GROUND TRUTH RULE (Hidden from agent):
# APPROVE = (credit_score >= 600 AND income >= 80000) OR (credit_score >= 750)
# In categorical terms: APPROVE = (MEDIUM credit AND HIGH income) OR (HIGH credit)

def get_ground_truth_outcome(credit_score, income):
    """The actual rule the agent needs to learn"""
    return (credit_score >= 600 and income >= 80000) or (credit_score >= 750)

print(f"📊 Created {len(LOAN_APPLICATIONS)} training examples")
print("🎯 Ground Truth Rule: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)")
print("🤖 Agent will learn this pattern through feedback...")

print("🏦 Categorical Learning System Ready!")
print("Variables: Credit Level + Income Level")
print("\nSample applications:")
for app in LOAN_APPLICATIONS[:4]:
    credit_cat = categorize_credit(app['credit_score'])
    income_cat = categorize_income(app['income'])
    print(f"  {app['name']}: {credit_cat} credit + {income_cat} income")

print("\n🤐 HIDDEN GROUND TRUTH (agent must discover this):")
print("   ✅ SUCCESS: (MEDIUM credit + HIGH income) OR (HIGH credit)")
print("   ❌ FAILURE: All other combinations")
print("\n💡 Agent will learn this categorical rule through trial and error!")

📊 Created 20 training examples
🎯 Ground Truth Rule: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)
🤖 Agent will learn this pattern through feedback...
🏦 Categorical Learning System Ready!
Variables: Credit Level + Income Level

Sample applications:
  Alex: LOW credit + LOW income
  Blake: LOW credit + MEDIUM income
  Casey: LOW credit + HIGH income
  Dana: MEDIUM credit + LOW income

🤐 HIDDEN GROUND TRUTH (agent must discover this):
   ✅ SUCCESS: (MEDIUM credit + HIGH income) OR (HIGH credit)
   ❌ FAILURE: All other combinations

💡 Agent will learn this categorical rule through trial and error!
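
The code comments state the ground truth rule twice, once with numeric thresholds and once in categorical terms. A quick check, sketched below, can confirm that the two forms agree on both sides of every category boundary. The functions `numeric_rule` and `categorical_rule` are hypothetical names introduced for this check; the categorizers repeat the lesson's thresholds so the block runs on its own.

```python
def categorize_credit(score):
    if score < 600:
        return "LOW"
    elif score < 750:
        return "MEDIUM"
    return "HIGH"

def categorize_income(income):
    if income < 50000:
        return "LOW"
    elif income < 80000:
        return "MEDIUM"
    return "HIGH"

def numeric_rule(score, income):
    # Same expression as get_ground_truth_outcome in the lesson
    return (score >= 600 and income >= 80000) or score >= 750

def categorical_rule(credit_cat, income_cat):
    return credit_cat == "HIGH" or (credit_cat == "MEDIUM" and income_cat == "HIGH")

# Probe values on both sides of every category boundary
scores = [300, 599, 600, 749, 750, 850]
incomes = [0, 49999, 50000, 79999, 80000, 200000]

mismatches = [
    (s, i) for s in scores for i in incomes
    if numeric_rule(s, i) != categorical_rule(categorize_credit(s), categorize_income(i))
]
print("mismatches:", mismatches)  # mismatches: []

# Print the 3x3 decision grid implied by the rule
levels = ["LOW", "MEDIUM", "HIGH"]
for credit in levels:
    row = ["APPROVE" if categorical_rule(credit, income) else "DENY" for income in levels]
    print(f"{credit:>6} credit: {row}")
```

An empty mismatch list means the agent loses nothing by working with categories instead of raw numbers: four of the nine combinations are approved and five are denied.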


## Step 1: Categorical Learning Agent - SOLUTION

In [25]:
class CategoricalLearningAgent:
    def __init__(self):
        # Learning state - track successes and failures by categorical combinations
        self.learnings = {
            # Track outcomes for each categorical combination
            'combinations': {},  # {(credit_cat, income_cat): {'successes': 0, 'failures': 0}}
            'total_decisions': 0,
            'correct_decisions': 0
        }
        self.decision_history = []
        
    def make_loan_decision(self, applicant):
        """Make decision using categorical variables and learned patterns"""
        credit_score = applicant['credit_score']
        income = applicant['income']
        name = applicant['name']
        
        # Convert to categories
        credit_cat = categorize_credit(credit_score)
        income_cat = categorize_income(income)
        
        # Check learned patterns for this combination
        combo_key = (credit_cat, income_cat)
        learned_info = ""
        
        if combo_key in self.learnings['combinations']:
            combo_data = self.learnings['combinations'][combo_key]
            successes = combo_data['successes']
            failures = combo_data['failures']
            total = successes + failures
            if total > 0:
                success_rate = successes / total
                learned_info = f"\nLearned pattern for {credit_cat} credit + {income_cat} income: {successes} successes, {failures} failures (success rate: {success_rate:.1%})"
        
        # Show general learning state if we have enough experience
        experience_summary = ""
        if self.learnings['total_decisions'] >= 3:
            accuracy = self.learnings['correct_decisions'] / self.learnings['total_decisions']
            experience_summary = f"\nCurrent learning accuracy: {accuracy:.1%} ({self.learnings['correct_decisions']}/{self.learnings['total_decisions']})"
            
            # Show successful combinations learned so far
            successful_combos = []
            for combo, data in self.learnings['combinations'].items():
                if data['successes'] > data['failures']:
                    successful_combos.append(f"{combo[0]} credit + {combo[1]} income")
            
            if successful_combos:
                experience_summary += f"\nLearned successful patterns: {', '.join(successful_combos[:3])}"
        
        # For first few decisions, make random choices to explore
        if self.learnings['total_decisions'] < 3:
            decision = random.choice(['APPROVE', 'DENY'])
            reasoning = f"Early exploration phase ({self.learnings['total_decisions'] + 1}/3) - making random decision to learn patterns"
        else:
            # Use LLM with learned knowledge
            prompt = f"""
You are a loan officer who has learned approval patterns from experience.

Current Application:
- Applicant: {name}
- Credit Score: {credit_score} ({credit_cat})
- Annual Income: ${income:,} ({income_cat})
{learned_info}{experience_summary}

Based on your learned patterns, decide APPROVE or DENY.
If you've seen this combination before, use that knowledge.
If not, make your best guess based on similar patterns.

Format: DECISION | Reason: [brief explanation]
"""
            
            try:
                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You learn loan approval patterns from categorical combinations of credit score and income levels."},
                        {"role": "user", "content": prompt}
                    ],
                    max_tokens=100,
                    temperature=0.3
                )
                
                decision_text = response.choices[0].message.content or ""
                # Expected shape: "DECISION | Reason: ...", but tolerate a missing "|" or "Reason:" label
                decision_part, _, reason_part = decision_text.partition("|")
                reasoning = reason_part.split(":", 1)[-1].strip() or "No specific reasoning"
                
                decision = 'APPROVE' if 'APPROVE' in decision_part.upper() else 'DENY'
                
            except Exception as e:
                decision = 'DENY'
                reasoning = f"Error in decision making: {str(e)}"
        
        return {
            'decision': decision,
            'applicant': name,
            'credit_score': credit_score,
            'income': income,
            'credit_category': credit_cat,
            'income_category': income_cat,
            'reasoning': reasoning
        }
    
    def receive_feedback(self, decision_result, ground_truth_outcome):
        """Learn from every decision, right or wrong, and update the tally for its categorical combination"""
        credit_cat = decision_result['credit_category']
        income_cat = decision_result['income_category']
        agent_decision = decision_result['decision']
        applicant = decision_result['applicant']
        
        # Update decision tracking
        self.learnings['total_decisions'] += 1
        self.decision_history.append({
            'applicant': applicant,
            'agent_decision': agent_decision,
            'ground_truth': ground_truth_outcome,
            'combination': (credit_cat, income_cat)
        })
        
        combo_key = (credit_cat, income_cat)
        if combo_key not in self.learnings['combinations']:
            self.learnings['combinations'][combo_key] = {'successes': 0, 'failures': 0}
        
        # Determine if this was a correct decision
        was_correct = False
        
        if agent_decision == 'APPROVE' and ground_truth_outcome:
            # Correct approval
            self.learnings['combinations'][combo_key]['successes'] += 1
            self.learnings['correct_decisions'] += 1
            was_correct = True
            print(f"   ✅ CORRECT APPROVAL: {credit_cat} + {income_cat} → SUCCESS")
            
        elif agent_decision == 'APPROVE' and not ground_truth_outcome:
            # Wrong approval  
            self.learnings['combinations'][combo_key]['failures'] += 1
            print(f"   ❌ WRONG APPROVAL: {credit_cat} + {income_cat} → FAILURE")
            
        elif agent_decision == 'DENY' and not ground_truth_outcome:
            # Correct denial
            self.learnings['correct_decisions'] += 1
            was_correct = True
            print(f"   ✅ CORRECT DENIAL: {credit_cat} + {income_cat} → Good judgment")
            
        elif agent_decision == 'DENY' and ground_truth_outcome:
            # Missed opportunity - should have approved
            self.learnings['combinations'][combo_key]['successes'] += 1  # Learn this should succeed
            print(f"   🔄 MISSED OPPORTUNITY: {credit_cat} + {income_cat} → Should have APPROVED!")
            print(f"      Learning this combination for future decisions...")
        
        return was_correct
    
    def analyze_learned_patterns(self):
        """Analyze the categorical patterns discovered"""
        if not self.learnings['combinations']:
            return "📚 No patterns learned yet - need more experience"
        
        analysis = f"🧠 CATEGORICAL LEARNING STATE after {self.learnings['total_decisions']} decisions:\n\n"
        
        # Show accuracy
        if self.learnings['total_decisions'] > 0:
            accuracy = self.learnings['correct_decisions'] / self.learnings['total_decisions']
            analysis += f"🎯 Current Accuracy: {accuracy:.1%} ({self.learnings['correct_decisions']}/{self.learnings['total_decisions']})\n\n"
        
        # Analyze each combination
        analysis += "📊 Learned Combination Patterns:\n"
        for combo, data in sorted(self.learnings['combinations'].items()):
            credit_cat, income_cat = combo
            successes = data['successes']
            failures = data['failures']
            total = successes + failures
            
            if total > 0:
                success_rate = successes / total
                status = "✅ APPROVE" if success_rate > 0.5 else "❌ DENY"
                analysis += f"   {credit_cat} credit + {income_cat} income: {status} ({successes}✅ {failures}❌, {success_rate:.0%} success)\n"
        
        # Generate hypothesis about the rule
        analysis += f"\n🔍 DISCOVERED RULE HYPOTHESIS:\n"
        successful_combos = []
        for combo, data in self.learnings['combinations'].items():
            if data['successes'] > data['failures']:
                successful_combos.append(f"{combo[0]} credit + {combo[1]} income")
        
        if successful_combos:
            analysis += f"   APPROVE when: {' OR '.join(successful_combos)}\n"
        else:
            analysis += "   Still learning the approval patterns...\n"
        
        return analysis

print("🎯 CategoricalLearningAgent ready - learns categorical approval patterns!")

🎯 CategoricalLearningAgent ready - learns categorical approval patterns!
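
The agent asks the model to answer in the form `DECISION | Reason: [brief explanation]`, but a model may not always follow that format. The sketch below shows one defensive way to parse such a reply. `parse_decision` is a hypothetical helper, not part of the lesson's class, and defaulting to DENY when no decision word is found is an assumption that mirrors the agent's own error fallback.

```python
import re

def parse_decision(reply):
    """Return (decision, reasoning) from a reply shaped like 'DECISION | Reason: ...'."""
    text = (reply or "").strip()
    decision_part, _, reason_part = text.partition("|")

    # Take the first decision word that appears before the "|" separator
    match = re.search(r"\b(APPROVE|DENY|DENIED)", decision_part.upper())
    # Assumption: fail closed (DENY) when the reply contains no recognizable decision
    decision = "APPROVE" if match and match.group(1) == "APPROVE" else "DENY"

    # Strip an optional "Reason:" label, whatever its capitalization
    reasoning = re.sub(r"^\s*reason\s*:\s*", "", reason_part, flags=re.IGNORECASE).strip()
    return decision, reasoning or "No specific reasoning"

print(parse_decision("APPROVE | Reason: HIGH credit has always succeeded"))
print(parse_decision("Deny | low credit"))
print(parse_decision(""))
```

This keyword matching is still crude: a reply such as "I would not approve" would be read as APPROVE. A stricter prompt or a structured output format would be needed to rule that out.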


## Step 2: Two-Variable Decision Simulation - SOLUTION

In [32]:
def run_categorical_learning_simulation(rounds=20):
    """Run simulation with categorical learning agent"""
    
    print("🎯 CATEGORICAL LEARNING SIMULATION")
    print("="*60)
    print("🎲 Variables: Credit (LOW/MEDIUM/HIGH) + Income (LOW/MEDIUM/HIGH)")
    print("🤐 Hidden Rule: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)")
    print("🧠 Agent Strategy: Random exploration → Pattern-based decisions")
    print()
    
    # Create learning agent
    agent = CategoricalLearningAgent()
    accuracy_over_time = []
    
    for round_num in range(rounds):
        print(f"📋 Round {round_num + 1}/{rounds}")
        print("-" * 30)
        
        # Get application
        applicant = LOAN_APPLICATIONS[round_num % len(LOAN_APPLICATIONS)]
        
        credit_cat = categorize_credit(applicant['credit_score'])
        income_cat = categorize_income(applicant['income'])
        
        print(f"👤 {applicant['name']}")
        print(f"💳 Credit: {applicant['credit_score']} ({credit_cat})")
        print(f"💰 Income: ${applicant['income']:,} ({income_cat})")
        
        # Ground truth outcome
        ground_truth = get_ground_truth_outcome(applicant['credit_score'], applicant['income'])
        print(f"🤐 Ground Truth: {'SUCCESS' if ground_truth else 'FAILURE'}")
        
        # Agent makes decision
        print("🧠 Agent deciding...")
        decision_result = agent.make_loan_decision(applicant)
        
        print(f"🤖 Agent Decision: {decision_result['decision']}")
        print(f"💭 Reasoning: {decision_result['reasoning']}")
        
        # Apply feedback and learning
        print("📚 Learning from outcome...")
        was_correct = agent.receive_feedback(decision_result, ground_truth)
        
        # Track accuracy over time
        if agent.learnings['total_decisions'] > 0:
            current_accuracy = agent.learnings['correct_decisions'] / agent.learnings['total_decisions']
            accuracy_over_time.append(current_accuracy)
        
        print(f"📊 Current Decision #{agent.learnings['total_decisions']}")
        if round_num >= 2:  # Show accuracy after a few rounds
            print(f"🎯 Running Accuracy: {current_accuracy:.1%}")
        
        print()
        
        # Show learning analysis every 5 rounds
        if (round_num + 1) % 5 == 0:
            print("🧠 LEARNING ANALYSIS:")
            print("-" * 40)
            print(agent.analyze_learned_patterns())
            print()
    
    return agent, accuracy_over_time

# Run the simulation
print("🚀 Starting Categorical Learning Simulation...")
trained_agent, accuracy_progression = run_categorical_learning_simulation(20)

print("\n🎯 SIMULATION COMPLETE!")
print("="*50)
print(f"✅ Trained agent ready for evaluation")
print(f"📈 Accuracy progression tracked over {len(accuracy_progression)} decisions")

🚀 Starting Categorical Learning Simulation...
🎯 CATEGORICAL LEARNING SIMULATION
🎲 Variables: Credit (LOW/MEDIUM/HIGH) + Income (LOW/MEDIUM/HIGH)
🤐 Hidden Rule: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)
🧠 Agent Strategy: Random exploration → Pattern-based decisions

📋 Round 1/20
------------------------------
👤 Alex
💳 Credit: 550 (LOW)
💰 Income: $45,000 (LOW)
🤐 Ground Truth: FAILURE
🧠 Agent deciding...
🤖 Agent Decision: APPROVE
💭 Reasoning: Early exploration phase (1/3) - making random decision to learn patterns
📚 Learning from outcome...
   ❌ WRONG APPROVAL: LOW + LOW → FAILURE
📊 Current Decision #1

📋 Round 2/20
------------------------------
👤 Blake
💳 Credit: 580 (LOW)
💰 Income: $65,000 (MEDIUM)
🤐 Ground Truth: FAILURE
🧠 Agent deciding...
🤖 Agent Decision: DENY
💭 Reasoning: Early exploration phase (2/3) - making random decision to learn patterns
📚 Learning from outcome...
   ✅ CORRECT DENIAL: LOW + MEDIUM → Good judgment
📊 Current Decision #2

📋 Round 3/20
-----------

## Step 3: Run the Loan Approval Simulation - SOLUTION

In [None]:
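# SOLUTION: Run the categorical learning simulation
# The function returns the trained agent and its running accuracy per round
agent, accuracy_progression = run_categorical_learning_simulation(rounds=12)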

## Analyze the Two-Variable Learning - SOLUTION

In [None]:
# SOLUTION: Analyze what the agent learned from categorical combinations
print("🎯 CATEGORICAL PATTERN ANALYSIS")
print("="*50)

print("📊 LOAN DECISIONS & GROUND TRUTH OUTCOMES:")
for i, history in enumerate(agent.decision_history):
    credit_cat, income_cat = history['combination']
    agent_decision = history['agent_decision']
    # ground_truth is a bool: True means the loan should have been approved
    ground_truth_decision = 'APPROVE' if history['ground_truth'] else 'DENY'
    outcome = "✅ Success" if history['ground_truth'] else "❌ Failure"
    
    # Show match/mismatch between agent and ground truth
    decision_match = "✅" if agent_decision == ground_truth_decision else "❌"
    print(f"   {i+1}. {history['applicant']}: {credit_cat} credit + {income_cat} income")
    print(f"      Agent: {agent_decision} | Ground Truth: {ground_truth_decision} {decision_match} | Outcome: {outcome}")

# Show what the agent learned vs the ground truth
print(f"\n🤐 GROUND TRUTH RULE:")
print(f"   SUCCESS = (MEDIUM credit AND HIGH income) OR (HIGH credit)")

print(f"\n🧠 AGENT'S DISCOVERED PATTERNS:")
for (credit_cat, income_cat), data in sorted(agent.learnings['combinations'].items()):
    if data['successes'] == data['failures']:
        continue  # no evidence either way for this combination
    learned_approve = data['successes'] > data['failures']
    # Restate the ground truth rule in categorical terms
    meets_rule = credit_cat == 'HIGH' or (credit_cat == 'MEDIUM' and income_cat == 'HIGH')
    rule_check = "✓ Matches rule" if learned_approve == meets_rule else "✗ Rule violation"
    label = "✅ APPROVE" if learned_approve else "❌ DENY"
    print(f"   {label}: {credit_cat} credit + {income_cat} income ({rule_check})")

# Analyze learning progression
print(f"\n📈 LEARNING PROGRESSION ANALYSIS:")
# ground_truth is stored as a bool, so compare it with "did the agent approve?"
correct_decisions = sum(1 for h in agent.decision_history if (h['agent_decision'] == 'APPROVE') == h['ground_truth'])
total_decisions = len(agent.decision_history)
accuracy = correct_decisions / total_decisions * 100 if total_decisions > 0 else 0

print(f"   Overall accuracy: {correct_decisions}/{total_decisions} ({accuracy:.1f}%)")

# Check early vs late accuracy
if total_decisions >= 6:
    early_decisions = agent.decision_history[:total_decisions//2]
    late_decisions = agent.decision_history[total_decisions//2:]
    
    early_correct = sum(1 for h in early_decisions if (h['agent_decision'] == 'APPROVE') == h['ground_truth'])
    late_correct = sum(1 for h in late_decisions if (h['agent_decision'] == 'APPROVE') == h['ground_truth'])
    
    early_accuracy = early_correct / len(early_decisions) * 100
    late_accuracy = late_correct / len(late_decisions) * 100
    
    print(f"   Early decisions accuracy: {early_correct}/{len(early_decisions)} ({early_accuracy:.1f}%)")
    print(f"   Later decisions accuracy: {late_correct}/{len(late_decisions)} ({late_accuracy:.1f}%)")
    
    if late_accuracy > early_accuracy:
        print(f"   🎯 IMPROVEMENT: Agent got better over time! (+{late_accuracy - early_accuracy:.1f}%)")
    elif late_accuracy < early_accuracy:
        print(f"   📉 DECLINE: Agent got worse over time (-{early_accuracy - late_accuracy:.1f}%)")
    else:
        print(f"   📊 STABLE: Agent maintained consistent performance")

# Test the agent's learning against new combinations
print(f"\n🔍 TESTING LEARNED RULE ON NEW COMBINATIONS:")
test_cases = [
    (720, 80000, "Medium credit + High income"),
    (650, 90000, "Medium credit + High income"),
    (750, 45000, "High credit + Low income"),
    (620, 60000, "Medium credit + Medium income"),
    (680, 75000, "Medium credit + Medium income"),
]

for credit, income, description in test_cases:
    # What ground truth says
    ground_truth = get_ground_truth_outcome(credit, income)
    expected = "SUCCESS" if ground_truth else "FAILURE"
    
    # What the agent would likely decide, based on its tally for this combination
    combo_key = (categorize_credit(credit), categorize_income(income))
    tally = agent.learnings['combinations'].get(combo_key, {'successes': 0, 'failures': 0})
    
    if tally['successes'] > tally['failures']:
        agent_likely = "Would likely APPROVE"
    elif tally['failures'] > tally['successes']:
        agent_likely = "Would likely DENY"
    else:
        agent_likely = "Uncertain"
    
    print(f"   Credit {credit} + Income ${income:,} ({description})")
    print(f"     Ground truth: {expected} | Agent: {agent_likely}")

print(f"\n💡 KEY INSIGHTS:")
print(f"   🎯 Agent learns from BOTH correct and incorrect decisions")
print(f"   🔄 Missed opportunities teach agent about good combinations it denied")
print(f"   📈 Wrong approvals teach agent about bad combinations to avoid")
print(f"   🧠 Self-correction: Agent adjusts future decisions based on all feedback")
print(f"   🎭 Pattern recognition: Agent discovers complex rules through trial and error")

# Show learning progression
success_count = sum(d['successes'] for d in agent.learnings['combinations'].values())
failure_count = sum(d['failures'] for d in agent.learnings['combinations'].values())
print(f"\n📈 LEARNING PROGRESSION:")
print(f"   Total experiences: {agent.learnings['total_decisions']}")
print(f"   Success observations recorded: {success_count}")
print(f"   Failure observations recorded: {failure_count}")
print(f"   🚀 Agent now has experience-based knowledge for future decisions!")

# Show specific self-corrections
print(f"\n🔄 SELF-CORRECTION EXAMPLES:")
print(f"   When agent DENIES but should APPROVE → Learns it as success pattern")
print(f"   When agent APPROVES but should DENY → Learns it as failure pattern")
print(f"   Next similar case → Agent uses learned patterns to decide correctly")

## Understanding Check: Two-Variable Financial Pattern Learning

This demo shows **financial AI learning** with exactly TWO variables and a **ground truth rule** - perfect for understanding pattern discovery:

### 🎯 **The Two-Variable Scenario**
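- **Input**: Credit level (LOW/MEDIUM/HIGH, from a 300-850 score) + income level (LOW/MEDIUM/HIGH, from annual income)
- **Decision**: APPROVE or DENY loans
- **Ground Truth**: `(Credit ≥ 600 AND Income ≥ $80K) OR (Credit ≥ 750)`, i.e. (MEDIUM credit + HIGH income) OR (HIGH credit)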
- **Learning**: Discover this hidden rule through experience

### 🤐 **Ground Truth Examples**
- **Alice**: Credit 720 + Income $80K → ✅ SUCCESS (medium credit + high income)
- **Bob**: Credit 650 + Income $90K → ✅ SUCCESS (medium credit + high income)  
- **Carol**: Credit 750 + Income $45K → ✅ SUCCESS (high credit overrides low income)
- **David**: Credit 600 + Income $40K → ❌ FAILURE (doesn't meet either condition)

### 🔄 **Learning Loop with Instant Feedback & Self-Correction**
1. **See Combination** → Agent analyzes (Credit: 650 → MEDIUM, Income: $90K → HIGH)
2. **Make Decision** → Agent decides DENY (initially cautious)
3. **Compare vs Ground Truth** → Ground Truth says "Should APPROVE!"
4. **Get Outcome Feedback** → "This would have SUCCEEDED!"
5. **Self-Correct** → Agent records a success for the (MEDIUM, HIGH) combination
6. **Apply Learning** → On the next MEDIUM credit + HIGH income case, the recorded success appears in the prompt and nudges the agent toward APPROVE

### 🎯 **Key Self-Correction Moments**
- **Agent denies** Credit 650 + Income $90K → **Ground Truth**: "Should APPROVE!" → **Learns**: This combination works (missed opportunity)
- **Agent approves** Credit 620 + Income $60K → **Ground Truth**: "Should DENY!" → **Learns**: This combination fails (wrong approval)
- **Agent denies** Credit 600 + Income $40K → **Ground Truth**: "Should DENY!" → **Confirms**: Correct decision (no learning needed)

### 🧠 **What Agent Discovers**
- **Specific Combinations**: Agent learns which (credit level, income level) pairs lead to successful loans
- **Pattern Recognition**: Gradually identifies HIGH credit OR (MEDIUM credit + HIGH income)
- **Rule Convergence**: With enough feedback on every combination, the learned tallies line up with the hidden ground truth rule
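
Going from a list of approved combinations to a compact rule is a small simplification step. The sketch below shows one way to do it, assuming the agent has already settled on a set of approved (credit level, income level) pairs; `summarize_rule` is a hypothetical helper and the `learned` set is written out by hand for illustration.

```python
LEVELS = ["LOW", "MEDIUM", "HIGH"]

def summarize_rule(approved):
    """Collapse a set of approved (credit, income) pairs into a readable rule."""
    clauses = []
    for credit in LEVELS:
        incomes = [income for income in LEVELS if (credit, income) in approved]
        if len(incomes) == len(LEVELS):
            # Every income level is approved, so income does not matter for this credit level
            clauses.append(f"{credit} credit")
        elif incomes:
            clauses.append(f"{credit} credit + {'/'.join(incomes)} income")
    return " OR ".join(f"({clause})" for clause in clauses) or "no approvals learned yet"

# Hand-written example: the four combinations the hidden rule approves
learned = {
    ("MEDIUM", "HIGH"),
    ("HIGH", "LOW"), ("HIGH", "MEDIUM"), ("HIGH", "HIGH"),
}
print(summarize_rule(learned))  # (MEDIUM credit + HIGH income) OR (HIGH credit)
```

When the agent has seen successes for only some HIGH credit combinations, the summary lists those income levels explicitly instead of collapsing them, which makes gaps in experience visible.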

### 💡 **Why This Works Perfectly**
- **Deterministic**: Ground truth gives consistent feedback (no randomness)
- **Clear Patterns**: Two variables create observable combinations
- **Instant Feedback**: Agent knows immediately if decision was correct
- **Corrective Learning**: Wrong decisions provide precise correction
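
Because the feedback is deterministic, a single observation per combination is enough to pin down its outcome, so what matters is whether the training data covers all nine combinations. The sketch below counts that coverage. The (credit_score, income) pairs are copied from `LOAN_APPLICATIONS`; `APPLICATIONS` and `coverage` are names introduced only for this example.

```python
from collections import Counter

def categorize_credit(score):
    if score < 600:
        return "LOW"
    elif score < 750:
        return "MEDIUM"
    return "HIGH"

def categorize_income(income):
    if income < 50000:
        return "LOW"
    elif income < 80000:
        return "MEDIUM"
    return "HIGH"

# (credit_score, income) pairs from the 20 training examples
APPLICATIONS = [
    (550, 45000), (580, 65000), (590, 85000),
    (650, 40000), (680, 65000), (670, 90000), (720, 85000), (640, 75000), (690, 95000),
    (780, 35000), (820, 60000), (760, 120000), (800, 45000), (770, 70000),
    (610, 55000), (740, 40000), (750, 100000), (660, 88000), (570, 75000), (790, 55000),
]

# Count how many training examples fall into each categorical combination
coverage = Counter((categorize_credit(s), categorize_income(i)) for s, i in APPLICATIONS)
for combo, count in sorted(coverage.items()):
    print(combo, count)
print("distinct combinations seen:", len(coverage), "of 9")
```

All nine combinations appear at least once, with MEDIUM credit + HIGH income seen most often (four times) and LOW + LOW and LOW + HIGH seen only once each.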

### 🔍 **Observable Pattern Discovery**
You can literally watch the agent learn:
1. **Early**: Makes inconsistent decisions
2. **Middle**: Starts recognizing successful combinations  
3. **Late**: Discovers rule structure (high credit OR medium credit + high income)
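
A running (cumulative) accuracy can hide this early-to-late progression, because early mistakes keep weighing on the average. Accuracy per window of decisions shows the phases more directly. The sketch below compares the two views; the `flags` list is made-up data for illustration only, and both helper functions are hypothetical.

```python
def windowed_accuracy(correct_flags, window=5):
    """Accuracy over consecutive, non-overlapping windows of decisions."""
    result = []
    for start in range(0, len(correct_flags), window):
        chunk = correct_flags[start:start + window]
        result.append(sum(chunk) / len(chunk))
    return result

def cumulative_accuracy(correct_flags):
    """Running accuracy after each decision, as tracked by the simulation loop."""
    running, out = 0, []
    for n, flag in enumerate(correct_flags, start=1):
        running += flag
        out.append(running / n)
    return out

# Made-up outcomes: True means the agent's decision matched ground truth
flags = [False, True, False, False, True,
         True, False, True, True, True,
         True, True, True, True, True]

print(windowed_accuracy(flags))                   # [0.4, 0.8, 1.0]
print(round(cumulative_accuracy(flags)[-1], 2))   # 0.73
```

In this invented run the last window is perfect, yet the cumulative figure still reads 73% because it averages in the exploratory early rounds.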

**Real Financial Application**: This mirrors, in simplified form, how recommendation systems, credit scoring models, and risk assessment AI learn decision boundaries from historical data.

**Educational Value**: Students see moment-by-moment learning as the agent discovers a two-variable rule through experience rather than being programmed with it.

print("\nSample applications:")
for app in LOAN_APPLICATIONS[:4]:
    print(f"  {app['name']}: Credit {app['credit_score']}, Income ${app['income']:,}")

print("\n🤐 HIDDEN GROUND TRUTH (agent must discover this):")
print("   ✅ SUCCESS: (Credit ≥ 600 AND Income ≥ $80K) OR (Credit ≥ 750)")
print("   ❌ FAILURE: All other combinations")
print("\n💡 Agent will learn this rule through trial and error!")

## Step 1: Two-Variable Learning Agent - SOLUTION

In [None]:
class CategoricalLearningAgent:
    def __init__(self):
        # Learning state - track successes and failures by categorical combinations
        self.learnings = {
            # Track outcomes for each categorical combination
            'combinations': {},  # {(credit_cat, income_cat): {'successes': 0, 'failures': 0}}
            'total_decisions': 0,
            'correct_decisions': 0
        }
        self.decision_history = []
        
    def make_loan_decision(self, applicant):
        """Make decision using categorical variables and learned patterns"""
        credit_score = applicant['credit_score']
        income = applicant['income']
        name = applicant['name']
        
        # Convert to categories
        credit_cat = categorize_credit(credit_score)
        income_cat = categorize_income(income)
        
        # Check learned patterns for this combination
        combo_key = (credit_cat, income_cat)
        learned_info = ""
        
        if combo_key in self.learnings['combinations']:
            combo_data = self.learnings['combinations'][combo_key]
            successes = combo_data['successes']
            failures = combo_data['failures']
            total = successes + failures
            if total > 0:
                success_rate = successes / total
                learned_info = f"\nLearned pattern for {credit_cat} credit + {income_cat} income: {successes} successes, {failures} failures (success rate: {success_rate:.1%})"
        
        # Show general learning state if we have enough experience
        experience_summary = ""
        if self.learnings['total_decisions'] >= 3:
            accuracy = self.learnings['correct_decisions'] / self.learnings['total_decisions']
            experience_summary = f"\nCurrent learning accuracy: {accuracy:.1%} ({self.learnings['correct_decisions']}/{self.learnings['total_decisions']})"
            
            # Show successful combinations learned so far
            successful_combos = []
            for combo, data in self.learnings['combinations'].items():
                if data['successes'] > data['failures']:
                    successful_combos.append(f"{combo[0]} credit + {combo[1]} income")
            
            if successful_combos:
                experience_summary += f"\nLearned successful patterns: {', '.join(successful_combos[:3])}"
        
        # For first few decisions, make random choices to explore
        if self.learnings['total_decisions'] < 3:
            decision = random.choice(['APPROVE', 'DENY'])
            reasoning = f"Early exploration phase ({self.learnings['total_decisions'] + 1}/3) - making random decision to learn patterns"
        else:
            # Use LLM with learned knowledge
            prompt = f"""
You are a loan officer who has learned approval patterns from experience.

Current Application:
- Applicant: {name}
- Credit Score: {credit_score} ({credit_cat})
- Annual Income: ${income:,} ({income_cat})
{learned_info}{experience_summary}

Based on your learned patterns, decide APPROVE or DENY.
If you've seen this combination before, use that knowledge.
If not, make your best guess based on similar patterns.

Format: DECISION | Reason: [brief explanation]
"""
            
            try:
                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You learn loan approval patterns from categorical combinations of credit score and income levels."},
                        {"role": "user", "content": prompt}
                    ],
                    max_tokens=100,
                    temperature=0.3
                )
                
                decision_text = response.choices[0].message.content or ""
                decision_part, _, reason_part = decision_text.partition("|")
                # Drop the "Reason:" label if present; a reply without a colon
                # keeps its raw text instead of raising IndexError
                reasoning = reason_part.split(":", 1)[-1].strip() or "No specific reasoning"
                
                decision = 'APPROVE' if 'APPROVE' in decision_part.upper() else 'DENY'
                
            except Exception as e:
                decision = 'DENY'
                reasoning = f"Error in decision making: {str(e)}"
        
        return {
            'decision': decision,
            'applicant': name,
            'credit_score': credit_score,
            'income': income,
            'credit_category': credit_cat,
            'income_category': income_cat,
            'reasoning': reasoning
        }
    
    def receive_feedback(self, decision_result, ground_truth_outcome):
        """Learn from the outcome and update categorical combination knowledge"""
        credit_cat = decision_result['credit_category']
        income_cat = decision_result['income_category']
        agent_decision = decision_result['decision']
        applicant = decision_result['applicant']
        
        # Update decision tracking
        self.learnings['total_decisions'] += 1
        self.decision_history.append({
            'applicant': applicant,
            'agent_decision': agent_decision,
            'ground_truth': ground_truth_outcome,
            'combination': (credit_cat, income_cat)
        })
        
        combo_key = (credit_cat, income_cat)
        if combo_key not in self.learnings['combinations']:
            self.learnings['combinations'][combo_key] = {'successes': 0, 'failures': 0}
        
        # Determine if this was a correct decision
        was_correct = False
        
        if agent_decision == 'APPROVE' and ground_truth_outcome:
            # Correct approval
            self.learnings['combinations'][combo_key]['successes'] += 1
            self.learnings['correct_decisions'] += 1
            was_correct = True
            print(f"   ✅ CORRECT APPROVAL: {credit_cat} + {income_cat} → SUCCESS")
            
        elif agent_decision == 'APPROVE' and not ground_truth_outcome:
            # Wrong approval  
            self.learnings['combinations'][combo_key]['failures'] += 1
            print(f"   ❌ WRONG APPROVAL: {credit_cat} + {income_cat} → FAILURE")
            
        elif agent_decision == 'DENY' and not ground_truth_outcome:
            # Correct denial
            self.learnings['correct_decisions'] += 1
            was_correct = True
            print(f"   ✅ CORRECT DENIAL: {credit_cat} + {income_cat} → Good judgment")
            
        elif agent_decision == 'DENY' and ground_truth_outcome:
            # Missed opportunity - should have approved
            self.learnings['combinations'][combo_key]['successes'] += 1  # Learn this should succeed
            print(f"   🔄 MISSED OPPORTUNITY: {credit_cat} + {income_cat} → Should have APPROVED!")
            print(f"      Learning this combination for future decisions...")
        
        return was_correct
    
    def analyze_learned_patterns(self):
        """Analyze the categorical patterns discovered"""
        if not self.learnings['combinations']:
            return "📚 No patterns learned yet - need more experience"
        
        analysis = f"🧠 CATEGORICAL LEARNING STATE after {self.learnings['total_decisions']} decisions:\n\n"
        
        # Show accuracy
        if self.learnings['total_decisions'] > 0:
            accuracy = self.learnings['correct_decisions'] / self.learnings['total_decisions']
            analysis += f"🎯 Current Accuracy: {accuracy:.1%} ({self.learnings['correct_decisions']}/{self.learnings['total_decisions']})\n\n"
        
        # Analyze each combination
        analysis += "📊 Learned Combination Patterns:\n"
        for combo, data in sorted(self.learnings['combinations'].items()):
            credit_cat, income_cat = combo
            successes = data['successes']
            failures = data['failures']
            total = successes + failures
            
            if total > 0:
                success_rate = successes / total
                status = "✅ APPROVE" if success_rate > 0.5 else "❌ DENY"
                analysis += f"   {credit_cat} credit + {income_cat} income: {status} ({successes}✅ {failures}❌, {success_rate:.0%} success)\n"
        
        # Generate hypothesis about the rule
        analysis += f"\n🔍 DISCOVERED RULE HYPOTHESIS:\n"
        successful_combos = []
        for combo, data in self.learnings['combinations'].items():
            if data['successes'] > data['failures']:
                successful_combos.append(f"{combo[0]} credit + {combo[1]} income")
        
        if successful_combos:
            analysis += f"   APPROVE when: {' OR '.join(successful_combos)}\n"
        else:
            analysis += "   Still learning the approval patterns...\n"
        
        return analysis

print("🎯 CategoricalLearningAgent ready - learns categorical approval patterns!")

## Step 2: Two-Variable Decision Simulation - SOLUTION

In [None]:
def run_categorical_learning_simulation(rounds=20):
    """Run simulation with categorical learning agent"""
    
    print("🎯 CATEGORICAL LEARNING SIMULATION")
    print("="*60)
    print("🎲 Variables: Credit (LOW/MEDIUM/HIGH) + Income (LOW/MEDIUM/HIGH)")
    print("🤐 Hidden Rule: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)")
    print("🧠 Agent Strategy: Random exploration → Pattern-based decisions")
    print()
    
    # Create learning agent
    agent = CategoricalLearningAgent()
    accuracy_over_time = []
    
    for round_num in range(rounds):
        print(f"📋 Round {round_num + 1}/{rounds}")
        print("-" * 30)
        
        # Get application
        applicant = LOAN_APPLICATIONS[round_num % len(LOAN_APPLICATIONS)]
        
        credit_cat = categorize_credit(applicant['credit_score'])
        income_cat = categorize_income(applicant['income'])
        
        print(f"👤 {applicant['name']}")
        print(f"💳 Credit: {applicant['credit_score']} ({credit_cat})")
        print(f"💰 Income: ${applicant['income']:,} ({income_cat})")
        
        # Ground truth outcome
        ground_truth = get_ground_truth_outcome(applicant['credit_score'], applicant['income'])
        print(f"🤐 Ground Truth: {'SUCCESS' if ground_truth else 'FAILURE'}")
        
        # Agent makes decision
        print("🧠 Agent deciding...")
        decision_result = agent.make_loan_decision(applicant)
        
        print(f"🤖 Agent Decision: {decision_result['decision']}")
        print(f"💭 Reasoning: {decision_result['reasoning']}")
        
        # Apply feedback and learning
        print("📚 Learning from outcome...")
        was_correct = agent.receive_feedback(decision_result, ground_truth)
        
        # Track accuracy over time
        if agent.learnings['total_decisions'] > 0:
            current_accuracy = agent.learnings['correct_decisions'] / agent.learnings['total_decisions']
            accuracy_over_time.append(current_accuracy)
        
        print(f"📊 Current Decision #{agent.learnings['total_decisions']}")
        if round_num >= 2:  # Show accuracy after a few rounds
            print(f"🎯 Running Accuracy: {current_accuracy:.1%}")
        
        print()
        
        # Show learning analysis every 5 rounds
        if (round_num + 1) % 5 == 0:
            print("🧠 LEARNING ANALYSIS:")
            print("-" * 40)
            print(agent.analyze_learned_patterns())
            print()
    
    return agent, accuracy_over_time

print("🎯 Categorical learning simulation ready!")

## Step 3: Run the Loan Approval Simulation - SOLUTION

In [None]:
# SOLUTION: Run the categorical learning simulation
print("🚀 Starting Categorical Learning Simulation...")
trained_agent, accuracy_progression = run_categorical_learning_simulation(20)

# The analysis below refers to the trained agent as `agent`
agent = trained_agent

## Analyze the Two-Variable Learning - SOLUTION

In [None]:
# SOLUTION: Analyze what the agent learned from two-variable combinations
print("🎯 TWO-VARIABLE PATTERN ANALYSIS")
print("="*50)

print("📊 LOAN DECISIONS & GROUND TRUTH OUTCOMES:")
for i, history in enumerate(agent.decision_history):
    # Each entry stores the applicant name, the agent's decision, the boolean
    # ground truth, and the (credit, income) category pair
    credit_cat, income_cat = history['combination']
    agent_decision = history['agent_decision']
    ground_truth_decision = 'APPROVE' if history['ground_truth'] else 'DENY'
    outcome = "✅ Success" if history['ground_truth'] else "❌ Failure"
    
    # Show match/mismatch between agent and ground truth
    decision_match = "✅" if agent_decision == ground_truth_decision else "❌"
    print(f"   {i+1}. {history['applicant']}: {credit_cat} credit + {income_cat} income")
    print(f"      Agent: {agent_decision} | Ground Truth: {ground_truth_decision} {decision_match} | Outcome: {outcome}")

# Show what the agent learned vs the ground truth
print(f"\n🤐 GROUND TRUTH RULE:")
print(f"   APPROVE = (MEDIUM credit AND HIGH income) OR (HIGH credit)")

print(f"\n🧠 AGENT'S DISCOVERED PATTERNS:")
for (credit_cat, income_cat), data in sorted(agent.learnings['combinations'].items()):
    # A combination counts as "learned APPROVE" when successes outnumber failures
    learned_approve = data['successes'] > data['failures']
    rule_approves = (credit_cat == 'MEDIUM' and income_cat == 'HIGH') or credit_cat == 'HIGH'
    rule_check = "✓ Matches rule" if learned_approve == rule_approves else "✗ Rule violation"
    learned = "APPROVE" if learned_approve else "DENY"
    print(f"   {credit_cat} credit + {income_cat} income → learned {learned} "
          f"({data['successes']} successes, {data['failures']} failures, {rule_check})")

# Analyze learning progression
print(f"\n📈 LEARNING PROGRESSION ANALYSIS:")
# decision_history stores the ground truth as a boolean under 'ground_truth'
correct_decisions = sum(1 for h in agent.decision_history if (h['agent_decision'] == 'APPROVE') == h['ground_truth'])
total_decisions = len(agent.decision_history)
accuracy = correct_decisions / total_decisions * 100 if total_decisions > 0 else 0

print(f"   Overall accuracy: {correct_decisions}/{total_decisions} ({accuracy:.1f}%)")

# Check early vs late accuracy
if total_decisions >= 6:
    early_decisions = agent.decision_history[:total_decisions//2]
    late_decisions = agent.decision_history[total_decisions//2:]
    
    early_correct = sum(1 for h in early_decisions if (h['agent_decision'] == 'APPROVE') == h['ground_truth'])
    late_correct = sum(1 for h in late_decisions if (h['agent_decision'] == 'APPROVE') == h['ground_truth'])
    
    early_accuracy = early_correct / len(early_decisions) * 100
    late_accuracy = late_correct / len(late_decisions) * 100
    
    print(f"   Early decisions accuracy: {early_correct}/{len(early_decisions)} ({early_accuracy:.1f}%)")
    print(f"   Later decisions accuracy: {late_correct}/{len(late_decisions)} ({late_accuracy:.1f}%)")
    
    if late_accuracy > early_accuracy:
        print(f"   🎯 IMPROVEMENT: Agent got better over time! (+{late_accuracy - early_accuracy:.1f}%)")
    elif late_accuracy < early_accuracy:
        print(f"   📉 DECLINE: Agent got worse over time (-{early_accuracy - late_accuracy:.1f}%)")
    else:
        print(f"   📊 STABLE: Agent maintained consistent performance")

# Test the agent's learning against new combinations
print(f"\n🔍 TESTING LEARNED RULE ON NEW COMBINATIONS:")
test_cases = [
    (720, 80000, "Medium credit + High income"),
    (650, 90000, "Medium credit + High income"),  
    (750, 45000, "High credit + Low income"),
    (620, 60000, "Medium credit + Medium income"),
    (680, 75000, "Medium credit + Medium income"),
]

for credit, income, description in test_cases:
    # What ground truth says
    ground_truth = get_ground_truth_outcome(credit, income)
    expected = "SUCCESS" if ground_truth else "FAILURE"
    
    # What the agent would likely decide (based on the learned category tallies)
    combo_key = (categorize_credit(credit), categorize_income(income))
    combo_data = agent.learnings['combinations'].get(combo_key)
    
    if combo_data is None:
        agent_likely = "Uncertain (combination never seen)"
    elif combo_data['successes'] > combo_data['failures']:
        agent_likely = "Would likely APPROVE"
    else:
        agent_likely = "Would likely DENY"
    
    print(f"   Credit {credit} + Income ${income:,} ({description})")
    print(f"     Ground truth: {expected} | Agent: {agent_likely}")

print(f"\n💡 KEY INSIGHTS:")
print(f"   🎯 Agent learns from BOTH correct and incorrect decisions")
print(f"   🔄 Missed opportunities teach agent about good combinations it denied")
print(f"   📈 Wrong approvals teach agent about bad combinations to avoid")
print(f"   🧠 Self-correction: Agent adjusts future decisions based on all feedback")
print(f"   🎭 Pattern recognition: Agent discovers complex rules through trial and error")

# Show learning progression
combos = agent.learnings['combinations']
success_count = sum(1 for d in combos.values() if d['successes'] > d['failures'])
failure_count = len(combos) - success_count
print(f"\n📈 LEARNING PROGRESSION:")
print(f"   Total experiences: {agent.learnings['total_decisions']}")
print(f"   Approve patterns learned: {success_count}")
print(f"   Deny patterns learned: {failure_count}")
print(f"   🚀 Agent now has experience-based knowledge for future decisions!")

# Show specific self-corrections
print(f"\n🔄 SELF-CORRECTION EXAMPLES:")
print(f"   When agent DENIES but should APPROVE → Learns it as success pattern")
print(f"   When agent APPROVES but should DENY → Learns it as failure pattern")
print(f"   Next similar case → Agent uses learned patterns to decide correctly")

## Understanding Check: Two-Variable Financial Pattern Learning

This demo shows **financial AI learning** with exactly TWO variables and a **ground truth rule** - perfect for understanding pattern discovery:

### 🎯 **The Two-Variable Scenario**
- **Input**: Credit Score (300-850) + Annual Income ($20K-$200K)
- **Decision**: APPROVE or DENY loans
- **Ground Truth**: `(Credit ≥ 600 AND Income ≥ $80K) OR (Credit ≥ 750)`, i.e. (MEDIUM credit + HIGH income) OR (HIGH credit)
- **Learning**: Discover this hidden rule through experience

### 🤐 **Ground Truth Examples**
- **Alice**: Credit 720 + Income $80K → ✅ SUCCESS (medium credit + high income)
- **Bob**: Credit 650 + Income $90K → ✅ SUCCESS (medium credit + high income)  
- **Carol**: Credit 750 + Income $45K → ✅ SUCCESS (high credit overrides low income)
- **David**: Credit 600 + Income $40K → ❌ FAILURE (doesn't meet either condition)
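
These four cases can be checked mechanically. The sketch below re-declares the threshold rule from `get_ground_truth_outcome` in the data cell so that it runs on its own; the `EXAMPLES` list and the `explain` helper are illustrative names, not part of the notebook.

```python
def get_ground_truth_outcome(credit_score, income):
    """Same thresholds as the notebook's ground truth rule."""
    return (credit_score >= 600 and income >= 80000) or (credit_score >= 750)

def explain(credit_score, income):
    """Hypothetical helper: name the clause of the rule that fires, if any."""
    if credit_score >= 750:
        return "HIGH credit"
    if credit_score >= 600 and income >= 80000:
        return "MEDIUM credit + HIGH income"
    return "neither condition met"

EXAMPLES = [
    ("Alice", 720, 80000),
    ("Bob", 650, 90000),
    ("Carol", 750, 45000),
    ("David", 600, 40000),
]

results = {}
for name, credit, income in EXAMPLES:
    approved = get_ground_truth_outcome(credit, income)
    reason = explain(credit, income)
    results[name] = (approved, reason)
    label = "SUCCESS" if approved else "FAILURE"
    print(f"{name}: Credit {credit} + Income ${income:,} -> {label} ({reason})")
```

Alice's 720 falls in the MEDIUM band (600-749), so it is her HIGH income that gets her approved.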

### 🔄 **Learning Loop with Instant Feedback & Self-Correction**
1. **See Combination** → Agent analyzes (Credit: 650, Income: $90K)
2. **Make Decision** → Agent decides DENY (initially cautious)
3. **Compare vs Ground Truth** → Ground Truth says "Should APPROVE!"
4. **Get Outcome Feedback** → "This would have SUCCEEDED!"
5. **Self-Correct** → Agent records a success for the (MEDIUM credit, HIGH income) combination
6. **Apply Learning** → Next similar case, agent approves
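
The loop above can be sketched without any LLM call. This is a minimal illustration, assuming the same tally scheme as `receive_feedback` (correct approvals and missed opportunities add a success, wrong approvals add a failure, correct denials change nothing). `record_feedback` and `learned_decision` are hypothetical names; in the notebook the final decision comes from the model, which sees the tallies in its prompt, rather than from a hard-coded comparison.

```python
from collections import defaultdict

# Tally of outcomes per (credit_category, income_category) pair
tallies = defaultdict(lambda: {"successes": 0, "failures": 0})

def record_feedback(combo, agent_decision, should_approve):
    """Update the tally for one decision and report whether the agent was correct."""
    correct = (agent_decision == "APPROVE") == should_approve
    if should_approve:
        # Correct approvals and missed opportunities both count as evidence to approve
        tallies[combo]["successes"] += 1
    elif agent_decision == "APPROVE":
        # Wrong approval: evidence against this combination
        tallies[combo]["failures"] += 1
    return correct

def learned_decision(combo):
    """Stand-in for the LLM: approve only when successes outnumber failures."""
    t = tallies[combo]
    return "APPROVE" if t["successes"] > t["failures"] else "DENY"

combo = ("MEDIUM", "HIGH")   # e.g. credit 650, income $90K
first = "DENY"               # step 2: cautious first decision
was_correct = record_feedback(combo, first, should_approve=True)  # steps 3-5
second = learned_decision(combo)                                  # step 6

print(f"First decision: {first} (correct: {was_correct})")
print(f"Next decision for {combo}: {second}")
```

One missed opportunity is enough to flip the next decision for that combination, which is why the agent can converge within a single pass over the training examples.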

### 🎯 **Key Self-Correction Moments**
- **Agent denies** Credit 650 + Income $90K → **Ground Truth**: "Should APPROVE!" → **Learns**: This combination works (missed opportunity)
- **Agent approves** Credit 620 + Income $60K → **Ground Truth**: "Should DENY!" → **Learns**: This combination fails (wrong approval)
- **Agent denies** Credit 600 + Income $40K → **Ground Truth**: "Should DENY!" → **Confirms**: Correct decision (no learning needed)

### 🧠 **What Agent Discovers**
- **Specific Combinations**: Agent learns which (credit level, income level) pairs work
- **Pattern Recognition**: Gradually identifies high credit OR (medium credit + high income)
- **Rule Convergence**: Eventually discovers the hidden ground truth rule
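
To make "rule convergence" concrete, here is a rough sketch of how per-combination tallies could be collapsed into a compact rule. The tallies are hypothetical values chosen for illustration, and `summarize_rule` is an assumed helper, not something the notebook defines.

```python
CREDIT_LEVELS = ["LOW", "MEDIUM", "HIGH"]
INCOME_LEVELS = ["LOW", "MEDIUM", "HIGH"]

# Hypothetical tallies after training: (credit, income) -> (successes, failures)
tallies = {
    ("LOW", "LOW"): (0, 1), ("LOW", "MEDIUM"): (0, 1), ("LOW", "HIGH"): (0, 1),
    ("MEDIUM", "LOW"): (0, 2), ("MEDIUM", "MEDIUM"): (0, 1), ("MEDIUM", "HIGH"): (4, 0),
    ("HIGH", "LOW"): (2, 0), ("HIGH", "MEDIUM"): (3, 0), ("HIGH", "HIGH"): (2, 0),
}

def is_approved(tallies, credit, income):
    """A combination is approved when its successes outnumber its failures."""
    successes, failures = tallies.get((credit, income), (0, 0))
    return successes > failures

def summarize_rule(tallies):
    """Collapse per-combination tallies into one clause per credit level."""
    clauses = []
    for credit in CREDIT_LEVELS:
        approved = [inc for inc in INCOME_LEVELS if is_approved(tallies, credit, inc)]
        if len(approved) == len(INCOME_LEVELS):
            # Every income level is approved, so income no longer matters
            clauses.append(f"({credit} credit)")
        elif approved:
            clauses.append(f"({credit} credit + {'/'.join(approved)} income)")
    return " OR ".join(clauses) if clauses else "no approval pattern yet"

rule = summarize_rule(tallies)
print(rule)
```

With these tallies the summary reduces to the two-clause form of the hidden rule; with sparser experience it would list only the combinations seen so far.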

### 💡 **Why This Works Perfectly**
- **Deterministic**: Ground truth gives consistent feedback (no randomness)
- **Clear Patterns**: Two variables create observable combinations
- **Instant Feedback**: Agent knows immediately if decision was correct
- **Corrective Learning**: Wrong decisions provide precise correction

### 🔍 **Observable Pattern Discovery**
You can literally watch the agent learn:
1. **Early**: Makes inconsistent decisions
2. **Middle**: Starts recognizing successful combinations  
3. **Late**: Discovers rule structure (high credit OR medium credit + high income)

**Real Financial Application**: This mirrors, in simplified form, how recommendation systems, credit scoring models, and risk assessment AI learn decision boundaries from historical data.

**Educational Value**: Students see moment-by-moment learning as the agent discovers a complex two-variable rule through experience rather than being programmed with it.

## Optional: Performance Evaluation - Learned vs Non-Learned Agent

Let's test our trained agent against a baseline agent that hasn't learned any patterns to see the improvement from feedback loops.
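
As a reference point, a coin-flip agent should land near 50% accuracy whatever the hidden rule is, so only a score well above that suggests learned patterns. The Monte Carlo sketch below illustrates this under stated assumptions: it re-declares the ground truth rule so it runs standalone, and `SAMPLE`, `random_accuracy`, and the seed are illustrative choices, not part of the notebook.

```python
import random

def get_ground_truth_outcome(credit_score, income):
    """Same thresholds as the notebook's ground truth rule."""
    return (credit_score >= 600 and income >= 80000) or (credit_score >= 750)

# A handful of (credit_score, income) pairs, chosen only for illustration
SAMPLE = [(550, 45000), (650, 40000), (670, 90000),
          (780, 35000), (610, 55000), (760, 120000)]

def random_accuracy(rng):
    """Accuracy of one pass of coin-flip decisions over SAMPLE."""
    hits = 0
    for credit, income in SAMPLE:
        guess_approve = rng.random() < 0.5
        hits += guess_approve == get_ground_truth_outcome(credit, income)
    return hits / len(SAMPLE)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
trials = 5000
mean_accuracy = sum(random_accuracy(rng) for _ in range(trials)) / trials
print(f"Mean random-baseline accuracy over {trials} passes: {mean_accuracy:.1%}")
```

A single pass over a small test set can still deviate a lot from 50%, so it is worth comparing the agents on the same test cases rather than reading too much into one baseline run.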

In [33]:
class BaselineAgent:
    """Non-learning agent for comparison - makes random decisions"""
    def __init__(self):
        self.decisions_made = 0
    
    def make_loan_decision(self, applicant):
        """Make random decisions without learning"""
        credit_score = applicant['credit_score']
        income = applicant['income']
        name = applicant['name']
        
        # Make random decision
        decision = random.choice(['APPROVE', 'DENY'])
        self.decisions_made += 1
        
        return {
            'decision': decision,
            'applicant': name,
            'credit_score': credit_score,
            'income': income,
            'credit_category': categorize_credit(credit_score),
            'income_category': categorize_income(income),
            'reasoning': f"Random decision #{self.decisions_made} (no learning)"
        }

def evaluate_agent_performance(agent, test_applications, agent_name):
    """Evaluate an agent's performance on test cases"""
    correct_decisions = 0
    total_decisions = len(test_applications)
    detailed_results = []
    
    print(f"\n🧪 TESTING {agent_name}")
    print("-" * 40)
    
    for app in test_applications:
        # Agent makes decision
        decision_result = agent.make_loan_decision(app)
        agent_decision = decision_result['decision']
        
        # Check against ground truth (truthy means the loan should be approved)
        should_approve = get_ground_truth_outcome(app['credit_score'], app['income'])
        
        # Determine if decision was correct
        correct = (agent_decision == 'APPROVE' and should_approve) or (agent_decision == 'DENY' and not should_approve)
        
        if correct:
            correct_decisions += 1
            status = "✅"
        else:
            status = "❌"
        
        credit_cat = categorize_credit(app['credit_score'])
        income_cat = categorize_income(app['income'])
        
        detailed_results.append({
            'name': app['name'],
            'combination': f"{credit_cat}+{income_cat}",
            'agent_decision': agent_decision,
            'should_approve': should_approve,
            'correct': correct
        })
        
        print(f"{status} {app['name']}: {credit_cat} credit + {income_cat} income → Agent: {agent_decision}, Truth: {'APPROVE' if should_approve else 'DENY'}")
    
    # Guard against an empty test set to avoid ZeroDivisionError
    accuracy = correct_decisions / total_decisions if total_decisions else 0.0
    print(f"\n📊 {agent_name} Performance:")
    print(f"   Accuracy: {accuracy:.1%} ({correct_decisions}/{total_decisions})")
    
    return accuracy, detailed_results

print("🏗️ Performance evaluation framework ready!")

🏗️ Performance evaluation framework ready!
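
The `detailed_results` list returned by `evaluate_agent_performance` records a `combination` and a `correct` flag for every test case, so it can show which credit and income combinations an agent still gets wrong. The sketch below is one possible way to summarize it. `accuracy_by_combination` is a hypothetical helper, and the sample rows are made up to mirror the shape of `detailed_results`.

```python
from collections import defaultdict

def accuracy_by_combination(detailed_results):
    """Group evaluation rows by credit+income combination.

    Returns a dict mapping combination -> (correct, total).
    """
    tally = defaultdict(lambda: [0, 0])  # combination -> [correct, total]
    for row in detailed_results:
        counts = tally[row['combination']]
        counts[1] += 1
        if row['correct']:
            counts[0] += 1
    return {combo: tuple(counts) for combo, counts in tally.items()}

# Hypothetical rows, shaped like the entries in detailed_results
sample_results = [
    {'name': 'A', 'combination': 'MEDIUM+HIGH', 'correct': True},
    {'name': 'B', 'combination': 'MEDIUM+HIGH', 'correct': True},
    {'name': 'C', 'combination': 'LOW+HIGH', 'correct': False},
    {'name': 'D', 'combination': 'HIGH+LOW', 'correct': True},
]

summary = accuracy_by_combination(sample_results)
for combo, (correct, total) in sorted(summary.items()):
    print(f"{combo}: {correct}/{total} correct")
```

A combination that scores low here is a candidate for more training examples, since the agent has probably not yet seen enough feedback for it.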


In [34]:
# Create test cases - new applications not seen during training
test_cases = [
    {'name': 'Test1', 'credit_score': 620, 'income': 45000},   # MEDIUM + LOW → DENY
    {'name': 'Test2', 'credit_score': 680, 'income': 85000},   # MEDIUM + HIGH → APPROVE  
    {'name': 'Test3', 'credit_score': 780, 'income': 40000},   # HIGH + LOW → APPROVE
    {'name': 'Test4', 'credit_score': 550, 'income': 90000},   # LOW + HIGH → DENY
    {'name': 'Test5', 'credit_score': 720, 'income': 65000},   # MEDIUM + MEDIUM → DENY
    {'name': 'Test6', 'credit_score': 800, 'income': 95000},   # HIGH + HIGH → APPROVE
    {'name': 'Test7', 'credit_score': 590, 'income': 55000},   # LOW + MEDIUM → DENY
    {'name': 'Test8', 'credit_score': 660, 'income': 88000},   # MEDIUM + HIGH → APPROVE
]

print("🏆 PERFORMANCE COMPARISON")
print("="*50)
print("Testing both agents on NEW applications they haven't seen...")

# Test the trained agent (assuming it exists from previous simulation)
if 'trained_agent' in globals():
    trained_accuracy, trained_results = evaluate_agent_performance(trained_agent, test_cases, "TRAINED AGENT (with learning)")
    
    # Test baseline agent (average over multiple runs due to randomness)
    baseline_accuracies = []
    for run in range(5):  # Average over 5 runs
        baseline_agent = BaselineAgent()
        accuracy, _ = evaluate_agent_performance(baseline_agent, test_cases, f"BASELINE AGENT (run {run+1})")
        baseline_accuracies.append(accuracy)
    
    avg_baseline_accuracy = sum(baseline_accuracies) / len(baseline_accuracies)
    
    print("\n🎯 FINAL PERFORMANCE COMPARISON:")
    print("="*50)
    print(f"✅ TRAINED AGENT (with feedback loops):  {trained_accuracy:.1%}")
    print(f"🎲 BASELINE AGENT (random decisions):     {avg_baseline_accuracy:.1%}")
    print(f"📈 IMPROVEMENT: {trained_accuracy - avg_baseline_accuracy:+.1%}")
    
    if trained_accuracy > avg_baseline_accuracy:
        print("\n🏆 SUCCESS! The learning agent outperformed random decisions!")
        print("   The feedback loop enabled the agent to discover the categorical pattern:")
        print("   APPROVE when: (MEDIUM credit + HIGH income) OR (HIGH credit)")
    else:
        print("\n⚠️  The agent needs more training to learn the pattern effectively.")
else:
    print("⚠️ Run the training simulation first to create the trained_agent!")

🏆 PERFORMANCE COMPARISON
==================================================
Testing both agents on NEW applications they haven't seen...

🧪 TESTING TRAINED AGENT (with learning)
----------------------------------------
✅ Test1: MEDIUM credit + LOW income → Agent: DENY, Truth: DENY
✅ Test2: MEDIUM credit + HIGH income → Agent: APPROVE, Truth: APPROVE
✅ Test3: HIGH credit + LOW income → Agent: APPROVE, Truth: APPROVE
❌ Test4: LOW credit + HIGH income → Agent: APPROVE, Truth: DENY
✅ Test5: MEDIUM credit + MEDIUM income → Agent: DENY, Truth: DENY
✅ Test6: HIGH credit + HIGH income → Agent: APPROVE, Truth: APPROVE
✅ Test7: LOW credit + MEDIUM income → Agent: DENY, Truth: DENY
✅ Test8: MEDIUM credit + HIGH income → Agent: APPROVE, Truth: APPROVE

📊 TRAINED AGENT (with learning) Performance:
   Accuracy: 87.5% (7/8)

🧪 TESTING BASELINE AGENT (run 1)
----------------------------------------
❌ Test1: MEDIUM credit + LOW income → Agent: APPROVE, Truth: DENY
❌ Test2: MEDIUM credit + HIGH income → Agent: DENY, Truth: APPROVE
✅ Test3: HIGH credit +