# Lesson 5: Self-Learning Loan Agent - EXERCISE
## How AI Agents Learn Financial Approval Patterns Through Feedback

**Objective**: Build an agent that learns loan approval patterns using categorical credit and income levels through feedback loops.

**Your Mission**: Help the agent discover that loans are approved when `(MEDIUM credit + HIGH income) OR (HIGH credit)` through experience!

### Core Concept: Self-Learning Through Feedback
🤖 **Agent sees (credit level, income level)** → ⚖️ **Decides APPROVE/DENY** → 📊 **Gets ground truth feedback** → 🧠 **Learns categorical combinations**

**Key Challenge**: Agent starts with random decisions but gradually learns which categorical combinations lead to success through feedback loops!
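
The loop above can be illustrated without any LLM call. The sketch below is a simplified stand-in, not the exercise solution: `TallyLearner` and `hidden_rule` are hypothetical names introduced here, and the learner records the ground truth outcome for every combination it sees, then follows the majority.

```python
import random
from collections import defaultdict

def hidden_rule(credit, income):
    """Ground truth from the mission statement, in categorical form."""
    return credit == "HIGH" or (credit == "MEDIUM" and income == "HIGH")

class TallyLearner:
    """Hypothetical stand-in for the agent: counts outcomes per combination."""

    def __init__(self, seed=0):
        self.counts = defaultdict(lambda: {"successes": 0, "failures": 0})
        self.rng = random.Random(seed)

    def decide(self, credit, income):
        seen = self.counts.get((credit, income))
        if seen is None:
            # Unseen combination: explore with a random guess.
            return self.rng.choice(["APPROVE", "DENY"])
        return "APPROVE" if seen["successes"] > seen["failures"] else "DENY"

    def feedback(self, credit, income, outcome):
        # Record what actually happened, regardless of what was decided.
        key = "successes" if outcome else "failures"
        self.counts[(credit, income)][key] += 1

levels = ["LOW", "MEDIUM", "HIGH"]
learner = TallyLearner()
for _ in range(2):  # two passes over all nine combinations
    for credit in levels:
        for income in levels:
            decision = learner.decide(credit, income)
            learner.feedback(credit, income, hidden_rule(credit, income))
```

After one pass every combination has been seen once, so the second pass should follow the hidden rule. The exercise agent differs in that it hands the tallies to an LLM instead of taking a strict majority vote.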

## Setup

In [None]:
import openai
import random

# Initialize OpenAI client
client = openai.OpenAI()

print("🔧 Ready to build a self-learning agent!")

## The Self-Learning Scenario: Credit Level + Income Level

Our agent analyzes categorical financial data to discover loan approval patterns! It must learn which combinations of credit and income levels lead to successful loans.
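
One possible way to turn numeric values into categories is a sorted list of cutoffs and a binary search. The sketch below assumes the ranges given in the exercise comments (credit: MEDIUM from 600, HIGH from 750; income: MEDIUM from 50K, HIGH from 80K). The names `bucket`, `CREDIT_CUTOFFS`, and `INCOME_CUTOFFS` are hypothetical, and the hinted if/elif/else approach works just as well.

```python
from bisect import bisect_right

LEVELS = ["LOW", "MEDIUM", "HIGH"]

def bucket(value, cutoffs):
    """Map a number to LOW/MEDIUM/HIGH given two ascending lower bounds."""
    # bisect_right counts how many cutoffs are <= value (0, 1, or 2).
    return LEVELS[bisect_right(cutoffs, value)]

CREDIT_CUTOFFS = [600, 750]         # MEDIUM starts at 600, HIGH at 750
INCOME_CUTOFFS = [50_000, 80_000]   # MEDIUM starts at 50K, HIGH at 80K

print(bucket(720, CREDIT_CUTOFFS), bucket(85_000, INCOME_CUTOFFS))
```

Keeping the cutoffs in data rather than in branches makes it easy to experiment with different category boundaries later.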

In [None]:
# TODO: Define categorical helper functions
# Credit Score: LOW (300-599) | MEDIUM (600-749) | HIGH (750-850)
# Income: LOW (<50K) | MEDIUM (50K to under 80K) | HIGH (80K+)

def categorize_credit(score):
    # TODO: Implement credit categorization logic
    # HINT: Use if/elif/else to return "LOW", "MEDIUM", or "HIGH"
    pass

def categorize_income(income):
    # TODO: Implement income categorization logic
    # HINT: Use if/elif/else to return "LOW", "MEDIUM", or "HIGH"
    pass

# Training examples for comprehensive learning
LOAN_APPLICATIONS = [
    # LOW CREDIT examples (should all be DENIED)
    {'name': 'Alex', 'credit_score': 550, 'income': 45000},   # LOW + LOW → DENY
    {'name': 'Blake', 'credit_score': 580, 'income': 65000},  # LOW + MEDIUM → DENY
    {'name': 'Casey', 'credit_score': 590, 'income': 85000},  # LOW + HIGH → DENY
    
    # MEDIUM CREDIT examples (only approve if HIGH income)
    {'name': 'Dana', 'credit_score': 650, 'income': 40000},   # MEDIUM + LOW → DENY
    {'name': 'Emma', 'credit_score': 680, 'income': 65000},   # MEDIUM + MEDIUM → DENY
    {'name': 'Felix', 'credit_score': 670, 'income': 90000},  # MEDIUM + HIGH → APPROVE
    {'name': 'Grace', 'credit_score': 720, 'income': 85000},  # MEDIUM + HIGH → APPROVE
    {'name': 'Henry', 'credit_score': 640, 'income': 75000},  # MEDIUM + MEDIUM → DENY
    {'name': 'Iris', 'credit_score': 690, 'income': 95000},   # MEDIUM + HIGH → APPROVE
    
    # HIGH CREDIT examples (should all be APPROVED)
    {'name': 'Jack', 'credit_score': 780, 'income': 35000},   # HIGH + LOW → APPROVE
    {'name': 'Kate', 'credit_score': 820, 'income': 60000},   # HIGH + MEDIUM → APPROVE
    {'name': 'Leo', 'credit_score': 760, 'income': 120000},   # HIGH + HIGH → APPROVE
    {'name': 'Maya', 'credit_score': 800, 'income': 45000},   # HIGH + LOW → APPROVE
    {'name': 'Noah', 'credit_score': 770, 'income': 70000},   # HIGH + MEDIUM → APPROVE
    
    # Additional mixed examples for thorough learning
    {'name': 'Olivia', 'credit_score': 610, 'income': 55000}, # MEDIUM + MEDIUM → DENY
    {'name': 'Paul', 'credit_score': 740, 'income': 40000},   # MEDIUM + LOW → DENY
    {'name': 'Quinn', 'credit_score': 750, 'income': 100000}, # HIGH + HIGH → APPROVE
    {'name': 'Ruby', 'credit_score': 660, 'income': 88000},   # MEDIUM + HIGH → APPROVE
    {'name': 'Sam', 'credit_score': 570, 'income': 75000},    # LOW + MEDIUM → DENY
    {'name': 'Tina', 'credit_score': 790, 'income': 55000},   # HIGH + MEDIUM → APPROVE
]

# TODO: Define the ground truth function
# GROUND TRUTH RULE (Hidden from agent):
# APPROVE = (credit_score >= 600 AND income >= 80000) OR (credit_score >= 750)
# In categorical terms: APPROVE = (MEDIUM credit AND HIGH income) OR (HIGH credit)

def get_ground_truth_outcome(credit_score, income):
    """The actual rule the agent needs to learn"""
    # TODO: Implement the ground truth rule
    # HINT: Return True if loan should be approved, False otherwise
    pass

print(f"📊 Created {len(LOAN_APPLICATIONS)} training examples")
print("🎯 Ground Truth Rule: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)")
print("🤖 Agent will learn this pattern through feedback...")

print("🏦 Self-Learning System Ready!")
print("Variables: Credit Level + Income Level")
print("\nSample applications:")
for app in LOAN_APPLICATIONS[:4]:
    credit_cat = categorize_credit(app['credit_score'])
    income_cat = categorize_income(app['income'])
    print(f"  {app['name']}: {credit_cat} credit + {income_cat} income")

print("\n🤐 HIDDEN GROUND TRUTH (agent must discover this):")
print("   ✅ SUCCESS: (MEDIUM credit + HIGH income) OR (HIGH credit)")
print("   ❌ FAILURE: All other combinations")
print("\n💡 Agent will learn this categorical rule through trial and error!")

## Step 1: Self-Learning Agent Class

**Your Task**: Complete the `SelfLearningLoanAgent` class that learns from feedback and improves its decisions over time.
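
The prompt in this class asks the model to answer in the form `DECISION | Reason: [brief explanation]`, so the reply has to be split back into its two parts. The helper below is one hedged way to do that. `parse_decision` is a hypothetical name, and falling back to `DENY` when no approval is found is an assumption chosen here as the conservative default, not something the exercise prescribes.

```python
def parse_decision(text):
    """Split 'DECISION | Reason: ...' into (decision, reasoning)."""
    head, _, tail = text.partition("|")
    # Assumption: anything that does not clearly say APPROVE counts as DENY.
    decision = "APPROVE" if "APPROVE" in head.upper() else "DENY"
    # Take the text after the first colon if there is one, else the whole tail.
    reasoning = tail.split(":", 1)[-1].strip() or "No specific reasoning"
    return decision, reasoning

print(parse_decision("APPROVE | Reason: strong credit history"))
```

A simple substring check like this would misread a reply such as "DO NOT APPROVE", so a stricter match on the first word may be worth considering.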

In [None]:
class SelfLearningLoanAgent:
    def __init__(self):
        # TODO: Initialize learning state
        # HINT: Track successes and failures by categorical combinations
        self.learnings = {
            # TODO: Initialize combinations dictionary
            # HINT: {(credit_cat, income_cat): {'successes': 0, 'failures': 0}}
            'combinations': {},
            'total_decisions': 0,
            'correct_decisions': 0
        }
        self.decision_history = []
        
    def make_loan_decision(self, applicant):
        """Make decision using categorical variables and learned patterns"""
        credit_score = applicant['credit_score']
        income = applicant['income']
        name = applicant['name']
        
        # TODO: Convert to categories using helper functions
        credit_cat = # TODO: Use categorize_credit function
        income_cat = # TODO: Use categorize_income function
        
        # Check learned patterns for this combination
        combo_key = (credit_cat, income_cat)
        learned_info = ""
        
        # TODO: Check if we've seen this combination before
        if combo_key in self.learnings['combinations']:
            combo_data = self.learnings['combinations'][combo_key]
            successes = combo_data['successes']
            failures = combo_data['failures']
            total = successes + failures
            if total > 0:
                success_rate = successes / total
                learned_info = f"\nLearned pattern for {credit_cat} credit + {income_cat} income: {successes} successes, {failures} failures (success rate: {success_rate:.1%})"
        
        # TODO: For first few decisions, make random choices to explore
        if self.learnings['total_decisions'] < 3:
            decision = # TODO: Make random choice between 'APPROVE' and 'DENY'
            reasoning = f"Early exploration phase ({self.learnings['total_decisions'] + 1}/3) - making random decision to learn patterns"
        else:
            # TODO: Use LLM with learned knowledge
            prompt = f"""
You are a loan officer who has learned approval patterns from experience.

Current Application:
- Applicant: {name}
- Credit Score: {credit_score} ({credit_cat})
- Annual Income: ${income:,} ({income_cat})
{learned_info}

Based on your learned patterns, decide APPROVE or DENY.
If you've seen this combination before, use that knowledge.
If not, make your best guess based on similar patterns.

Format: DECISION | Reason: [brief explanation]
"""
            
            try:
                # TODO: Make API call to OpenAI
                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You learn loan approval patterns from categorical combinations of credit score and income levels."},
                        {"role": "user", "content": prompt}
                    ],
                    max_tokens=100,
                    temperature=0.3
                )
                
                decision_text = response.choices[0].message.content
                parts = decision_text.split("|")
                decision_part = parts[0].strip()
                # [-1] keeps the whole text when the reply has no "Reason:" prefix, avoiding an IndexError
                reasoning = parts[1].split(":", 1)[-1].strip() if len(parts) > 1 else "No specific reasoning"
                
                # TODO: Parse decision from response
                decision = # TODO: Extract 'APPROVE' or 'DENY' from decision_part
                
            except Exception as e:
                decision = 'DENY'
                reasoning = f"Error in decision making: {str(e)}"
        
        return {
            'decision': decision,
            'applicant': name,
            'credit_score': credit_score,
            'income': income,
            'credit_category': credit_cat,
            'income_category': income_cat,
            'reasoning': reasoning
        }
    
    def receive_feedback(self, decision_result, ground_truth_outcome):
        """Learn from the outcome and update categorical combination knowledge"""
        # TODO: Extract information from decision result
        credit_cat = # TODO: Get credit category from decision_result
        income_cat = # TODO: Get income category from decision_result
        agent_decision = # TODO: Get agent decision from decision_result
        applicant = # TODO: Get applicant name from decision_result
        
        # TODO: Update decision tracking
        self.learnings['total_decisions'] += 1
        self.decision_history.append({
            'applicant': applicant,
            'agent_decision': agent_decision,
            'ground_truth': ground_truth_outcome,
            'combination': (credit_cat, income_cat)
        })
        
        combo_key = (credit_cat, income_cat)
        if combo_key not in self.learnings['combinations']:
            self.learnings['combinations'][combo_key] = {'successes': 0, 'failures': 0}
        
        # TODO: Determine if this was a correct decision and update learning
        was_correct = False
        
        if agent_decision == 'APPROVE' and ground_truth_outcome:
            # TODO: Handle correct approval
            # HINT: Increment successes and correct_decisions
            pass
            
        elif agent_decision == 'APPROVE' and not ground_truth_outcome:
            # TODO: Handle wrong approval
            # HINT: Increment failures
            pass
            
        elif agent_decision == 'DENY' and not ground_truth_outcome:
            # TODO: Handle correct denial
            # HINT: Increment correct_decisions
            pass
            
        elif agent_decision == 'DENY' and ground_truth_outcome:
            # TODO: Handle missed opportunity - should have approved
            # HINT: This should be learned as a success pattern
            pass
        
        return was_correct
    
    def analyze_learned_patterns(self):
        """Analyze the categorical patterns discovered"""
        if not self.learnings['combinations']:
            return "📚 No patterns learned yet - need more experience"
        
        analysis = f"🧠 LEARNING STATE after {self.learnings['total_decisions']} decisions:\n\n"
        
        # Show accuracy
        if self.learnings['total_decisions'] > 0:
            accuracy = self.learnings['correct_decisions'] / self.learnings['total_decisions']
            analysis += f"🎯 Current Accuracy: {accuracy:.1%} ({self.learnings['correct_decisions']}/{self.learnings['total_decisions']})\n\n"
        
        # TODO: Analyze each combination
        analysis += "📊 Learned Combination Patterns:\n"
        for combo, data in sorted(self.learnings['combinations'].items()):
            credit_cat, income_cat = combo
            successes = data['successes']
            failures = data['failures']
            total = successes + failures
            
            if total > 0:
                success_rate = successes / total
                status = "✅ APPROVE" if success_rate > 0.5 else "❌ DENY"
                analysis += f"   {credit_cat} credit + {income_cat} income: {status} ({successes}✅ {failures}❌, {success_rate:.0%} success)\n"
        
        return analysis

print("🎯 SelfLearningLoanAgent class ready for implementation!")

## Step 2: Learning Simulation

**Your Task**: Complete the simulation function that runs the agent through multiple loan applications.
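
The simulation tracks accuracy after every decision. As a small standalone illustration of that bookkeeping, the sketch below computes the same running accuracy from a list of correct/incorrect flags. `running_accuracy` is a hypothetical helper that is not part of the exercise code.

```python
from itertools import accumulate

def running_accuracy(correct_flags):
    """Return cumulative accuracy after each decision."""
    # Running count of correct decisions: e.g. [0, 1, 2, 3].
    totals = accumulate(int(flag) for flag in correct_flags)
    # Divide each running count by the number of decisions made so far.
    return [hits / n for n, hits in enumerate(totals, start=1)]

print(running_accuracy([False, True, True, True]))
```

A curve that rises after the first few random decisions suggests the feedback is being used. A flat curve usually points at a bug in how outcomes are recorded.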

In [None]:
def run_self_learning_simulation(rounds=20):
    """Run simulation with self-learning agent"""
    
    print("🎯 SELF-LEARNING SIMULATION")
    print("="*60)
    print("🎲 Variables: Credit (LOW/MEDIUM/HIGH) + Income (LOW/MEDIUM/HIGH)")
    print("🤐 Hidden Rule: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)")
    print("🧠 Agent Strategy: Random exploration → Pattern-based decisions")
    print()
    
    # TODO: Create learning agent
    agent = # TODO: Instantiate SelfLearningLoanAgent
    accuracy_over_time = []
    
    for round_num in range(rounds):
        print(f"📋 Round {round_num + 1}/{rounds}")
        print("-" * 30)
        
        # TODO: Get application
        applicant = LOAN_APPLICATIONS[round_num % len(LOAN_APPLICATIONS)]
        
        credit_cat = categorize_credit(applicant['credit_score'])
        income_cat = categorize_income(applicant['income'])
        
        print(f"👤 {applicant['name']}")
        print(f"💳 Credit: {applicant['credit_score']} ({credit_cat})")
        print(f"💰 Income: ${applicant['income']:,} ({income_cat})")
        
        # TODO: Get ground truth outcome
        ground_truth = # TODO: Use get_ground_truth_outcome function
        print(f"🤐 Ground Truth: {'SUCCESS' if ground_truth else 'FAILURE'}")
        
        # TODO: Agent makes decision
        print("🧠 Agent deciding...")
        decision_result = # TODO: Call agent.make_loan_decision
        
        print(f"🤖 Agent Decision: {decision_result['decision']}")
        print(f"💭 Reasoning: {decision_result['reasoning']}")
        
        # TODO: Apply feedback and learning
        print("📚 Learning from outcome...")
        was_correct = # TODO: Call agent.receive_feedback
        
        # Track accuracy over time
        if agent.learnings['total_decisions'] > 0:
            current_accuracy = agent.learnings['correct_decisions'] / agent.learnings['total_decisions']
            accuracy_over_time.append(current_accuracy)
        
        print(f"📊 Current Decision #{agent.learnings['total_decisions']}")
        if round_num >= 2:  # Show accuracy after a few rounds
            print(f"🎯 Running Accuracy: {current_accuracy:.1%}")
        
        print()
        
        # Show learning analysis every 5 rounds
        if (round_num + 1) % 5 == 0:
            print("🧠 LEARNING ANALYSIS:")
            print("-" * 40)
            print(agent.analyze_learned_patterns())
            print()
    
    return agent, accuracy_over_time

print("🚀 Ready to run self-learning simulation!")

## Step 3: Run the Simulation

**Your Task**: Execute the simulation and observe how the agent learns.

In [None]:
# TODO: Run the simulation
print("🚀 Starting Self-Learning Simulation...")
# TODO: Call run_self_learning_simulation and store results
trained_agent, accuracy_progression = # TODO: Call simulation function

print("\n🎯 SIMULATION COMPLETE!")
print("="*50)
print("✅ Trained agent ready for evaluation")
print(f"📈 Accuracy progression tracked over {len(accuracy_progression)} decisions")

## Step 4: Analyze Learning Results

**Your Task**: Examine what patterns the agent discovered through feedback loops.
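
One way to examine the result is to compare each learned combination against the target rule and list the disagreements. The sketch below assumes a dictionary shaped like `learnings['combinations']` in the agent class and uses the same "success rate above 50%" threshold as `analyze_learned_patterns`. The function name `compare_to_rule` and the sample data are hypothetical.

```python
def compare_to_rule(combinations):
    """List combinations whose learned majority disagrees with the target rule."""
    mismatches = []
    for (credit, income), data in sorted(combinations.items()):
        total = data["successes"] + data["failures"]
        if total == 0:
            continue  # no evidence recorded for this combination
        learned_approve = data["successes"] / total > 0.5
        rule_approve = credit == "HIGH" or (credit == "MEDIUM" and income == "HIGH")
        if learned_approve != rule_approve:
            mismatches.append((credit, income))
    return mismatches

# Made-up tallies for illustration only, not real simulation output.
sample = {
    ("HIGH", "LOW"): {"successes": 2, "failures": 0},
    ("MEDIUM", "HIGH"): {"successes": 3, "failures": 0},
    ("LOW", "HIGH"): {"successes": 0, "failures": 1},
    ("MEDIUM", "LOW"): {"successes": 1, "failures": 0},
}
print(compare_to_rule(sample))
```

An empty list means every combination with recorded evidence agrees with the rule. Combinations that were never seen, or have no recorded outcomes, are not checked.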

In [None]:
# TODO: Show final learning analysis
print("🧠 FINAL LEARNING ANALYSIS")
print("="*50)

# TODO: Display learned patterns
final_analysis = # TODO: Call trained_agent.analyze_learned_patterns()
print(final_analysis)

print("\n🎯 TARGET RULE VERIFICATION:")
print("Ground truth: APPROVE if (MEDIUM credit + HIGH income) OR (HIGH credit)")
print("\n💡 Did the agent discover this rule through feedback loops?")

## 🎉 Congratulations!

You've successfully built a self-learning loan agent that:

✅ **Learns from Feedback**: Uses ground truth outcomes to improve decisions  
✅ **Discovers Patterns**: Identifies categorical combinations that lead to success  
✅ **Self-Corrects**: Learns from both correct decisions and mistakes  
✅ **Improves Over Time**: Accuracy tends to increase as the agent sees more combinations  

### Key Concepts Learned:
- **Feedback Loops**: How agents learn from outcomes to improve future decisions
- **Pattern Recognition**: Discovering hidden rules through experience
- **Self-Correction**: Learning from missed opportunities and wrong decisions
- **Categorical Learning**: Working with discrete categories vs continuous variables

### Real-World Applications:
- Credit scoring systems
- Risk assessment algorithms
- Recommendation engines
- Any ML system that learns decision boundaries

**Next Steps**: Try modifying the ground truth rule or adding more categorical variables to see how the agent adapts!