# Database AI Agents: Text-to-SQL for Finance Operations

**Objective**: Build an AI agent that converts natural language questions into safe SQL queries for finance operations.

**Key Features**:
- 🔍 Natural language to SQL conversion
- 🛡️ Safety guardrails (read-only, whitelisted tables, limits)
- ⏰ Time window enforcement for transaction queries
- 🔗 SQLAlchemy integration with PostgreSQL/SQLite
- 📊 Professional financial summaries
- 🔄 Retry logic with error feedback

**Time**: ~25-30 minutes

**Scenario**: Support a Finance Ops team with quick answers about corporate card spend without writing SQL.

## ⚠️ IMPORTANT: Database Setup Required

**Before starting**: You must create the finance database by running:
```bash
python setup_database.py
```

This creates `finance_demo.db` with realistic corporate spending data for the exercises.

## 📚 Learning Objectives

In this exercise, you'll learn to:
1. **Design data models** for text-to-SQL systems
2. **Build LLM prompts** with database schema context
3. **Implement safety guardrails** for production SQL generation
4. **Add retry logic** with error feedback for robustness
5. **Create professional summaries** using AI

In [None]:
# Import required libraries
import os
import re
import json
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import List, Optional, Dict, Any, Tuple
import pandas as pd
from sqlalchemy import create_engine, text, MetaData, Table
from sqlalchemy.exc import SQLAlchemyError
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI(
    base_url="https://openai.vocareum.com/v1",
    api_key=os.getenv("OPENAI_API_KEY")
)

print("üîß Environment Setup:")
print(f"   ‚úÖ OpenAI API Key: {'‚úì Configured' if os.getenv('OPENAI_API_KEY') else '‚ùå Missing'}")
print(f"   üîß Database: {'‚úì Will use SQLite for demo' if not os.getenv('DATABASE_URL') else '‚úì PostgreSQL configured'}")

## 📋 Task 1: Define Data Models

Create the core data structures for our text-to-SQL system. You'll need:
1. A `QueryResult` class to store SQL generation results
2. A `DatabaseSchema` class to represent our database structure
3. The actual finance schema definition
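
A note on the `assumptions_made` default. Dataclasses reject a bare mutable default such as `= []`, which is why the hints below use `None` plus `__post_init__`. A common alternative is `field(default_factory=list)`. The minimal sketch below shows both behaviors. `QueryResultSketch` is a hypothetical, trimmed-down class used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, trimmed-down variant of QueryResult used only for this illustration.
@dataclass
class QueryResultSketch:
    original_question: str
    row_count: int = 0
    # default_factory builds a fresh list for every instance, so no __post_init__ is needed
    assumptions_made: List[str] = field(default_factory=list)

first = QueryResultSketch("Top merchants by spend?")
second = QueryResultSketch("Headcount by department?")
first.assumptions_made.append("Added LIMIT 20 for performance")
print(second.assumptions_made)  # prints []

# By contrast, a bare mutable default is rejected when the class is defined.
mutable_default_rejected = False
try:
    @dataclass
    class BrokenResult:
        assumptions_made: list = []
except ValueError as exc:
    mutable_default_rejected = True
    print(f"ValueError: {exc}")
```

Either pattern works. The `None` + `__post_init__` version keeps the type hint `Optional[List[str]]` honest about what callers may pass in.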

In [None]:
# Task 1: Data models for our text-to-SQL system

@dataclass
class QueryResult:
    """Represents the result of a text-to-SQL operation"""
    # YOUR CODE HERE
    # 💡 HINT: Use @dataclass field annotations to define these attributes.
    # 📚 LEARNING TIP: @dataclass automatically generates __init__, __repr__, etc.
    # Use type hints to make your code self-documenting!
    original_question: str                        # The user's natural language question
    generated_sql: str                            # The SQL generated by the LLM
    executed_sql: str                             # The final SQL after safety checks
    data: pd.DataFrame                            # The query results
    row_count: int                                # Number of rows returned
    summary: str                                  # AI-generated summary of results
    time_filter_applied: Optional[str] = None     # Applied time filter, if any
    assumptions_made: Optional[List[str]] = None  # List of assumptions made
    
    def __post_init__(self):
        # 📚 LEARNING TIP: __post_init__ runs after the dataclass __init__ method,
        # so it is a good place to set derived or default values.
        # Normalize a missing list to [] so callers can always iterate over it.
        if self.assumptions_made is None:
            self.assumptions_made = []

@dataclass
class DatabaseSchema:
    """Represents our known database schema for validation"""
    # YOUR CODE HERE
    # 📚 LEARNING TIP: Strong typing helps catch errors early and makes code more readable.
    # The schema will be used by the LLM to understand available tables and relationships.
    tables: Dict[str, List[str]]     # Maps table names to their column names
    relationships: Dict[str, str]    # Maps foreign keys to primary keys
    time_columns: Dict[str, str]     # Maps table names to their time columns

# Define our finance database schema
FINANCE_SCHEMA = DatabaseSchema(
    tables={
        'employees': ['employee_id', 'full_name', 'department', 'cost_center'],
        'cards': ['card_id', 'employee_id', 'last4', 'status'],
        'merchants': ['merchant_id', 'merchant_name', 'category'],
        'transactions': ['txn_id', 'card_id', 'merchant_id', 'txn_time', 'amount_usd', 'currency_code', 'city', 'channel'],
        'departments': ['department', 'cost_center_manager']
    },
    relationships={
        'cards.employee_id': 'employees.employee_id',
        'transactions.card_id': 'cards.card_id',
        'transactions.merchant_id': 'merchants.merchant_id',
        'employees.department': 'departments.department'
    },
    time_columns={
        'transactions': 'txn_time'
    }
)

print("üìã Database Schema Loaded:")
for table, columns in FINANCE_SCHEMA.tables.items():
    print(f"   üìä {table}: {len(columns)} columns")

In [None]:
# Database connection and schema utilities

def get_schema_description(schema: DatabaseSchema) -> str:
    """Get a formatted description of database schema for LLM"""
    desc = "Available Tables and Columns:\n"
    
    for table, columns in schema.tables.items():
        desc += f"\n{table}:\n"
        for col in columns:
            desc += f"  - {col}\n"
    
    desc += "\nKey Relationships:\n"
    for rel, target in schema.relationships.items():
        desc += f"  - {rel} → {target}\n"
        
    desc += "\nTime Columns (for filtering):\n"
    for table, time_col in schema.time_columns.items():
        desc += f"  - {table}.{time_col}\n"
            
    return desc

def check_database_exists():
    """Check if the finance database exists and show stats"""
    db_path = "finance_demo.db"
    if not os.path.exists(db_path):
        print(f"❌ Database file '{db_path}' not found!")
        print("   Run 'python setup_database.py' first to create the database.")
        return False
    
    # Connect and show stats
    engine = create_engine(f"sqlite:///{db_path}", echo=False)
    with engine.connect() as conn:
        # Get table counts
        tables_info = []
        for table in FINANCE_SCHEMA.tables.keys():
            result = conn.execute(text(f"SELECT COUNT(*) FROM {table}"))
            count = result.fetchone()[0]
            tables_info.append(f"   üìä {table}: {count:,} records")
        
        # Get total transaction volume
        result = conn.execute(text("SELECT SUM(amount_usd), COUNT(*) FROM transactions"))
        total_amount, txn_count = result.fetchone()
        
        print("‚úÖ Database connection successful!")
        print("üìã Database Statistics:")
        for info in tables_info:
            print(info)
        print(f"   üí∞ Total transaction volume: ${total_amount:,.2f}")
        print(f"   üìÖ Data range: Last 120 days of realistic financial data")
    
    return engine

# Initialize database connection
print("üîó Connecting to finance database...")
engine = check_database_exists()

if engine:
    print("ü§ñ Ready to initialize the Finance Text-to-SQL Agent!")
else:
    print("‚ö†Ô∏è  Please run the database setup script first.")

## 🤖 Task 2: Build the AI Agent Class

Create the main `FinanceTextToSQLAgent` class. You'll implement:
1. The main `process_question()` method
2. SQL generation with retry logic
3. Safety validation
4. Query execution
5. Summary generation
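
The retry-with-feedback step (item 2) can be exercised without an API key. The minimal sketch below replaces the LLM with `fake_generate`, a hypothetical scripted generator. It emits MySQL-style `MONTH()` on its first try and corrects itself once it receives the error message. `validate` re-implements a subset of the checks in `_validate_sql_syntax`. Both helpers are illustrative assumptions, not part of the agent.

```python
from typing import Callable, List, Optional, Tuple

def validate(sql: str) -> Optional[str]:
    """Subset of _validate_sql_syntax: return an error message, or None if the SQL looks OK."""
    sql_upper = sql.upper().strip()
    if 'MONTH(' in sql_upper or 'YEAR(' in sql_upper:
        return "Invalid function: Use strftime() instead of MONTH()/YEAR()"
    if not sql_upper.startswith('SELECT'):
        return "Query must start with SELECT"
    return None

def generate_with_retry(
    generate: Callable[[Optional[str]], str], max_attempts: int = 3
) -> Tuple[str, int]:
    """Call generate(previous_error) until validate() passes or attempts run out."""
    last_error: Optional[str] = None
    sql = ""
    for attempt in range(1, max_attempts + 1):
        sql = generate(last_error)       # feed the previous error back to the generator
        last_error = validate(sql)
        if last_error is None:
            return sql, attempt
    return sql, max_attempts             # graceful degradation: return the last attempt

# Hypothetical stand-in for the LLM: wrong on the first try, fixed once given feedback.
prompts_seen: List[Optional[str]] = []

def fake_generate(previous_error: Optional[str]) -> str:
    prompts_seen.append(previous_error)
    if previous_error is None:
        return "SELECT MONTH(txn_time) AS month, SUM(amount_usd) FROM transactions GROUP BY 1 LIMIT 20"
    return "SELECT strftime('%m', txn_time) AS month, SUM(amount_usd) FROM transactions GROUP BY 1 LIMIT 20"

sql, attempts = generate_with_retry(fake_generate)
print(attempts)  # prints 2
print(sql)
```

The retry only helps because the previous error is passed back into the generator. Retrying with an identical prompt at low temperature would likely reproduce the same mistake.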

In [None]:
class FinanceTextToSQLAgent:
    """AI agent for converting natural language to safe SQL queries for finance operations"""
    
    def __init__(self, engine, schema: DatabaseSchema):
        self.engine = engine
        self.schema = schema
        self.query_history = []
        
    def process_question(self, question: str, show_sql_answer: bool = False) -> QueryResult:
        """
        Main method to process a natural language question
        
        Args:
            question: Natural language question about finance data
            show_sql_answer: Whether to display SQL queries during processing
            
        Returns:
            QueryResult with SQL, data, and summary
        """
        print(f"üîç Processing: {question}")
        
        # YOUR CODE HERE
        # üí° IMPLEMENTATION GUIDE: Build the main processing pipeline with these steps:
        
        # STEP 1: Generate SQL with retry logic
        # - Call self._generate_sql_with_retry(question, show_sql_answer)
        # - This returns (generated_sql, generation_attempts)
        # - Conditionally print the SQL if show_sql_answer is True
        
        # STEP 2: Apply safety checks  
        # - Call self._apply_safety_checks(generated_sql, question)
        # - This returns (safe_sql, assumptions) 
        # - Conditionally print the safe SQL if show_sql_answer is True
        
        # STEP 3: Execute the query
        # - Call self._execute_query(safe_sql)
        # - This returns (data, row_count)
        
        # STEP 4: Generate summary
        # - Call self._generate_summary(question, safe_sql, data, assumptions)
        # - This returns a summary string
        
        # STEP 5: Create and store the result
        # - Create a QueryResult object with all the collected data
        # - Add it to self.query_history list
        # - Return the result
        
        # üìö LEARNING TIP: This pipeline separates concerns - generation, safety, execution, summarization
        # Each step can be tested and debugged independently
        
        pass
    
    def _generate_sql_with_retry(self, question: str, show_sql_answer: bool = False, max_attempts: int = 3) -> Tuple[str, int]:
        """
        Generate SQL with retry logic and error feedback
        
        Args:
            question: Natural language question
            show_sql_answer: Whether to show SQL generation details
            max_attempts: Maximum retry attempts
            
        Returns:
            Tuple of (final_sql, attempts_used)
        """
        # YOUR CODE HERE
        # 💡 IMPLEMENTATION GUIDE: Build robust SQL generation with retry logic
        
        # STEP 1: Set up retry loop
        # - Create a loop from 1 to max_attempts (inclusive)
        # - Initialize last_error = None outside the loop
        # - Use try/except to handle generation errors
        
        # STEP 2: For each attempt:
        # - Call self._generate_sql(question, previous_error=last_error, attempt=attempt)
        # - Call self._validate_sql_syntax(sql) to check for common issues
        # - If validation passes (returns None), return (sql, attempt)
        # - If validation fails, store the error in last_error and continue
        
        # STEP 3: Handle display logic
        # - If show_sql_answer is True, print attempt results
        # - Show success message when SQL generation succeeds on retry
        # - Show error messages and retry notifications
        
        # STEP 4: Fallback handling  
        # - If all attempts fail, return the last generated SQL anyway
        # - Production systems need graceful degradation
        
        # 📚 LEARNING TIP: Retry logic with error feedback lets the LLM correct its mistakes:
        # each retry includes the previous error to guide better generation
        
        last_error = None
        # Fallback so `sql` is always defined, even if every attempt raises an exception
        sql = "SELECT 'Error generating SQL' AS error_message LIMIT 1;"
        
        for attempt in range(1, max_attempts + 1):
            try:
                # Generate SQL with optional error feedback
                sql = self._generate_sql(question, previous_error=last_error, attempt=attempt)
                
                # Test the generated SQL with a quick validation
                validation_error = self._validate_sql_syntax(sql)
                
                if validation_error is None:
                    if attempt > 1 and show_sql_answer:
                        print(f"✅ SQL generation successful on attempt {attempt}")
                    return sql, attempt
                else:
                    last_error = validation_error
                    if show_sql_answer:
                        print(f"❌ Attempt {attempt} failed: {validation_error}")
                        if attempt < max_attempts:
                            print(f"🔄 Retry attempt {attempt + 1} with error feedback...")
                    
            except Exception as e:
                last_error = f"Generation error: {e}"
                if show_sql_answer:
                    print(f"❌ Attempt {attempt} failed: {last_error}")
        
        # If all attempts failed, return the last generated SQL anyway
        if show_sql_answer:
            print(f"⚠️  All {max_attempts} attempts failed, using last attempt")
        return sql, max_attempts
    
    def _generate_sql(self, question: str, previous_error: Optional[str] = None, attempt: int = 1) -> str:
        """Generate SQL query from natural language using LLM with optional error feedback"""
        
        schema_info = get_schema_description(self.schema)
        
        # YOUR CODE HERE
        # Build the prompt for SQL generation:
        # 1. Include schema information
        # 2. Add error feedback if this is a retry attempt  
        # 3. Include SQLite-specific rules and forbidden functions
        # 4. Call OpenAI API to generate SQL
        # 5. Clean up the response and return
        
        # Base prompt rules
        base_rules = """Important Rules:
1. Only use SELECT statements (no INSERT, UPDATE, DELETE, DROP, etc.)
2. Only query from the tables listed above
3. Always include a LIMIT clause (max 20 rows)
4. For transaction queries, always include a time filter on txn_time
5. Use proper JOINs to get related data
6. Use meaningful column aliases for readability
7. Order results logically (e.g., by amount DESC for spending queries)
8. **CRITICAL: Use SQLite functions ONLY - NO MySQL/PostgreSQL syntax**

SQLite Date/Time Functions (USE THESE):
- Time filters: txn_time >= datetime('now', '-30 days')
- Extract month: strftime('%m', txn_time) AS month
- Extract year: strftime('%Y', txn_time) AS year
- Extract date: date(txn_time) AS transaction_date
- Month name: strftime('%B', txn_time) AS month_name

FORBIDDEN Functions (DO NOT USE):
- MONTH() ❌ Use strftime('%m', txn_time) ✅
- YEAR() ❌ Use strftime('%Y', txn_time) ✅
- NOW() ❌ Use datetime('now') ✅
- INTERVAL ❌ Use datetime('now', '-X days') ✅
- DATE_FORMAT() ❌ Use strftime() ✅"""
        
        # Add error feedback for retry attempts
        error_feedback = ""
        if previous_error and attempt > 1:
            error_feedback = f"""PREVIOUS ATTEMPT FAILED (attempt {attempt - 1}):
{previous_error}
Fix this issue in the new query.

"""
        
        prompt = f"""You are a SQL expert helping finance operations teams. Convert this natural language question into a SELECT SQL query.

Database Schema:
{schema_info}

{error_feedback}{base_rules}

Question: {question}

Return only the SQL query, no explanations or markdown formatting:"""
        
        try:
            # YOUR CODE HERE
            # 💡 IMPLEMENTATION GUIDE: Call OpenAI API for SQL generation
            
            # STEP 1: Make the API call (low temperature for consistent SQL generation)
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": prompt},
                    {"role": "user", "content": question},
                ],
                temperature=0.1,
                max_tokens=500,  # enough for complex queries
            )
            
            # STEP 2: Extract and clean the response
            sql = response.choices[0].message.content.strip()
            # Regex cleanup handles markdown fences that LLMs often add
            sql = re.sub(r'```sql\n?', '', sql)  # Remove ```sql
            sql = re.sub(r'```\n?', '', sql)     # Remove closing ```
            
            # STEP 3: Return the cleaned SQL
            return sql.strip()
            
        except Exception as e:
            error_msg = f"Error generating SQL (attempt {attempt})" if attempt > 1 else "Error generating SQL"
            print(f"❌ {error_msg}: {e}")
            return "SELECT 'Error generating SQL' as error_message LIMIT 1;"
    
    def _validate_sql_syntax(self, sql: str) -> Optional[str]:
        """
        Quick validation of SQL syntax and common issues
        
        Returns:
            None if valid, error message if invalid
        """
        # YOUR CODE HERE
        # 💡 IMPLEMENTATION GUIDE: Quick syntax validation for common SQL issues
        
        # STEP 1: Prepare for validation
        # - Convert SQL to uppercase: sql_upper = sql.upper().strip()
        # - This makes pattern matching case-insensitive
        
        # STEP 2: Check for forbidden database functions
        # - Look for 'MONTH(' or 'YEAR(' in sql_upper (MySQL/PostgreSQL syntax)
        # - Return: "Invalid function: Use strftime() instead of MONTH()/YEAR()"
        # - Check for 'NOW()' AND 'INTERVAL' together (another MySQL pattern)
        # - Return: "Invalid syntax: Use datetime('now', '-X days') instead of NOW() - INTERVAL"
        # - Check for 'DATE_FORMAT(' (MySQL function)
        # - Return: "Invalid function: Use strftime() instead of DATE_FORMAT()"
        
        # STEP 3: Validate basic structure
        # - Ensure query starts with 'SELECT': if not sql_upper.startswith('SELECT')
        # - Return: "Query must start with SELECT"
        
        # STEP 4: Check business rules
        # - For transaction queries, ensure LIMIT is present
        # - If 'TRANSACTIONS' in sql_upper and 'LIMIT' not in sql_upper
        # - Return: "Missing LIMIT clause for transaction query"
        
        # STEP 5: Return validation result
        # - Return None if all checks pass (valid SQL)
        # - Return error message string if any check fails
        
        # 📚 LEARNING TIP: Fast syntax validation catches many common LLM mistakes (such as
        # MySQL-only functions) before they reach the database, and its messages guide retries
        
        sql_upper = sql.upper().strip()
        
        # Check for common MySQL/PostgreSQL syntax issues
        if 'MONTH(' in sql_upper or 'YEAR(' in sql_upper:
            return "Invalid function: Use strftime() instead of MONTH()/YEAR()"
        
        if 'NOW()' in sql_upper and 'INTERVAL' in sql_upper:
            return "Invalid syntax: Use datetime('now', '-X days') instead of NOW() - INTERVAL"
        
        if 'DATE_FORMAT(' in sql_upper:
            return "Invalid function: Use strftime() instead of DATE_FORMAT()"
        
        # Check for basic structure
        if not sql_upper.startswith('SELECT'):
            return "Query must start with SELECT"
        
        # Check for required elements in transaction queries
        if 'TRANSACTIONS' in sql_upper and 'LIMIT' not in sql_upper:
            return "Missing LIMIT clause for transaction query"
        
        return None
    
    def _apply_safety_checks(self, sql: str, question: str) -> Tuple[str, List[str]]:
        """Apply safety checks and modifications to the generated SQL"""
        
        # Implement safety checks:
        # 1. Ensure it's a SELECT statement
        # 2. Check for forbidden keywords (INSERT, UPDATE, DELETE, etc.)
        # 3. Ensure LIMIT is present
        # 4. Add time filter for transaction queries if missing
        # 5. Return (safe_sql, assumptions_made)
        
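        assumptions = []
        sql_upper = sql.upper().strip()
        # Harmless placeholder returned when the generated SQL is rejected
        blocked_sql = "SELECT 'Query blocked by safety checks' AS error_message LIMIT 1;"
        
        # 1. Ensure it's a SELECT statement
        # YOUR CODE HERE
        if not sql_upper.startswith('SELECT'):
            assumptions.append("Blocked: only SELECT statements are allowed")
            return blocked_sql, assumptions
        
        # 2. Check for forbidden keywords (whole words only, so a column such as
        #    "created_at" does not trigger a false positive)
        forbidden = ['INSERT', 'UPDATE', 'DELETE', 'DROP', 'CREATE', 'ALTER', 'EXEC']
        # YOUR CODE HERE
        for keyword in forbidden:
            if re.search(rf'\b{keyword}\b', sql_upper):
                assumptions.append(f"Blocked: forbidden keyword {keyword}")
                return blocked_sql, assumptions
        
        # 3. Ensure LIMIT is present
        if not re.search(r'\bLIMIT\b', sql_upper):
            sql = sql.rstrip().rstrip(';') + ' LIMIT 20;'
            assumptions.append("Added LIMIT 20 for performance")
        
        # 4. Add time filter for transaction queries if missing.
        #    Heuristic for single-level SELECTs (no subqueries/CTEs): the WHERE clause must
        #    precede GROUP BY / HAVING / ORDER BY / LIMIT, and existing conditions are
        #    parenthesized so an OR cannot bypass the time filter.
        if 'transactions' in sql.lower() and 'txn_time' not in sql.lower():
            where_clause = "txn_time >= datetime('now', '-90 days')"
            body = sql.rstrip().rstrip(';')
            tail = re.search(r'\b(GROUP\s+BY|HAVING|ORDER\s+BY|LIMIT)\b', body, flags=re.IGNORECASE)
            cut = tail.start() if tail else len(body)
            head, rest = body[:cut].rstrip(), body[cut:]
            existing = re.search(r'\bWHERE\b', head, flags=re.IGNORECASE)
            if existing:
                conditions = head[existing.end():].strip()
                head = f"{head[:existing.start()]}WHERE {where_clause} AND ({conditions})"
            else:
                head = f"{head} WHERE {where_clause}"
            sql = f"{head} {rest}".rstrip() + ';'
            
            assumptions.append("Applied default 90-day time filter for transactions")
        
        return sql, assumptions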
    
    def _execute_query(self, sql: str) -> Tuple[pd.DataFrame, int]:
        """Execute SQL query and return results as DataFrame"""
        
        try:
            with self.engine.connect() as conn:
                result = conn.execute(text(sql))
                # Build the DataFrame from the fetched rows and the result's column names
                df = pd.DataFrame(result.fetchall(), columns=list(result.keys()))
                row_count = len(df)
                
                print(f"📊 Query executed: {row_count} rows returned")
                return df, row_count
                
        except SQLAlchemyError as e:
            print(f"❌ Database error: {e}")
            error_df = pd.DataFrame({'error': [f"Database error: {str(e)}"]})
            return error_df, 0
        except Exception as e:
            print(f"❌ Execution error: {e}")
            error_df = pd.DataFrame({'error': [f"Execution error: {str(e)}"]})
            return error_df, 0
    
    def _generate_summary(self, question: str, sql: str, data: pd.DataFrame, assumptions: List[str]) -> str:
        """Generate natural language summary of query results"""
        
        if 'error' in data.columns:
            return f"Query failed: {data['error'].iloc[0]}"
        
        row_count = len(data)
        summary_stats = self._get_data_summary(data)
        
        # YOUR CODE HERE
        # Build prompt for summary generation:
        # 1. Include question, SQL, row count, data summary, assumptions
        # 2. Ask for a professional 2-4 sentence summary
        # 3. Call OpenAI API to generate summary
        # 4. Return the summary text
        
        prompt = f"""You are a financial analyst summarizing query results for a finance operations team.

Original Question: {question}
SQL Executed: {sql}
Rows Returned: {row_count}
Data Summary: {summary_stats}
Assumptions: {', '.join(assumptions) if assumptions else 'None'}

Write a 2-4 sentence professional summary that:
1. Describes what was analyzed
2. Mentions time filters or assumptions made
3. Highlights key insights from the results
4. Uses clear language for finance operations staff

Summary:"""
        
        try:
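            # Call OpenAI API to generate summary.
            # Model matches _generate_sql; temperature and max_tokens are assumed values to tune.
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.3,
                max_tokens=250,
            )
            return response.choices[0].message.content.strip()
            
        except Exception as e:
            print(f"❌ Error generating summary: {e}")
            assumptions_text = f" (Assumptions: {', '.join(assumptions)})" if assumptions else ""
            return f"Query returned {row_count} rows{assumptions_text}. Review results for insights."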
    
    def _get_data_summary(self, data: pd.DataFrame) -> str:
        """Get summary statistics for LLM context"""
        
        if data.empty:
            return "No data returned"
        
        stats = []
        
        # Amount columns
        amount_cols = [col for col in data.columns if 'amount' in col.lower() or 'spend' in col.lower()]
        for col in amount_cols:
            if data[col].dtype in ['float64', 'int64']:
                total = data[col].sum()
                avg = data[col].mean()
                stats.append(f"{col} total: ${total:,.2f}, average: ${avg:,.2f}")
        
        # Categorical columns
        categorical_cols = [col for col in data.columns if data[col].dtype == 'object']
        for col in categorical_cols[:2]:
            unique_count = data[col].nunique()
            stats.append(f"{col}: {unique_count} unique values")
        
        return "; ".join(stats) if stats else "Mixed data types"

# Initialize the agent
agent = FinanceTextToSQLAgent(engine, FINANCE_SCHEMA)
print("ü§ñ Finance Text-to-SQL Agent initialized and ready!")

In [None]:
# Utility function for displaying results

def display_result(result: QueryResult, show_sql: bool = True):
    """Display query result in a formatted, professional way"""
    
    print("=" * 80)
    print("üìä FINANCE DATABASE QUERY RESULT")
    print("=" * 80)
    
    print(f"\nüîç Question:")
    print(f"   {result.original_question}")
    
    if show_sql:
        print(f"\nüìù Executed SQL:")
        print(f"   {result.executed_sql}")
    
    if result.assumptions_made:
        print(f"\n‚ö†Ô∏è Assumptions Made:")
        for assumption in result.assumptions_made:
            print(f"   ‚Ä¢ {assumption}")
    
    print(f"\nüìä Results ({result.row_count} rows):")
    if not result.data.empty and 'error' not in result.data.columns:
        # Format the display nicely
        pd.set_option('display.max_columns', None)
        pd.set_option('display.width', None)
        pd.set_option('display.max_colwidth', 30)
        print(result.data.to_string(index=False, max_rows=20))
    else:
        print("   No data returned or error occurred")
    
    print(f"\nüí° Summary:")
    print(f"   {result.summary}")
    
    print("=" * 80)

print("‚úÖ Display utilities loaded")

## 🧪 Task 3: Test Your Implementation

Test your AI agent with various finance questions. Try both simple and complex queries to see how your retry logic and safety features work.
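
Before spending API calls, it can help to test the time-window rewrite on its own. The sketch below is a standalone, simplified rewrite helper. `add_default_time_filter` is a hypothetical name, and the helper assumes a single-level SELECT without subqueries or CTEs. It places the 90-day filter before any `GROUP BY`, `HAVING`, `ORDER BY`, or `LIMIT`, because SQL requires `WHERE` to precede those clauses. It also wraps existing conditions in parentheses so an `OR` cannot bypass the filter. The rewritten queries are then run against a throwaway in-memory SQLite table to confirm they parse and filter as intended.

```python
import re
import sqlite3

TIME_FILTER = "txn_time >= datetime('now', '-90 days')"

def add_default_time_filter(sql: str) -> str:
    """Hypothetical helper: add a 90-day txn_time filter to transaction queries that lack one.

    Assumes a single-level SELECT (no subqueries or CTEs).
    """
    if 'transactions' not in sql.lower() or 'txn_time' in sql.lower():
        return sql
    body = sql.rstrip().rstrip(';')
    # WHERE must come before the first GROUP BY / HAVING / ORDER BY / LIMIT
    tail = re.search(r'\b(GROUP\s+BY|HAVING|ORDER\s+BY|LIMIT)\b', body, flags=re.IGNORECASE)
    cut = tail.start() if tail else len(body)
    head, rest = body[:cut].rstrip(), body[cut:]
    existing = re.search(r'\bWHERE\b', head, flags=re.IGNORECASE)
    if existing:
        # Parenthesize existing conditions so an OR cannot bypass the time filter
        conditions = head[existing.end():].strip()
        head = f"{head[:existing.start()]}WHERE {TIME_FILTER} AND ({conditions})"
    else:
        head = f"{head} WHERE {TIME_FILTER}"
    return f"{head} {rest}".rstrip() + ";"

# Throwaway in-memory table: two recent rows and one 200-day-old row
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id INTEGER, txn_time TEXT, amount_usd REAL, city TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, datetime('now', ?), ?, ?)",
    [(1, '-10 days', 120.0, 'Austin'), (2, '-200 days', 999.0, 'Austin'), (3, '-5 days', 40.0, 'Boston')],
)

queries = [
    "SELECT city, SUM(amount_usd) AS total FROM transactions GROUP BY city ORDER BY total DESC LIMIT 20",
    "SELECT * FROM transactions where city = 'Austin' OR amount_usd > 500 LIMIT 20;",
]
for query in queries:
    rewritten = add_default_time_filter(query)
    print(rewritten)
    print(conn.execute(rewritten).fetchall())
```

The second query shows why the parentheses matter. Without them, `filter AND city = 'Austin' OR amount_usd > 500` groups as `(filter AND city = 'Austin') OR amount_usd > 500`, so the 200-day-old $999 transaction would also be returned.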

In [None]:
# Test Case 1: Top merchants by spend
print("üß™ Test Case 1: Top merchants by total spend")
result1 = agent.process_question("Show me the top 10 merchants by total spend in the last 30 days", show_sql_answer=True)
display_result(result1, show_sql=False)  # SQL already shown during processing

In [None]:
# Test Case 2: Employee headcount by department
print("\n🧪 Test Case 2: Number of employees by department")
result2 = agent.process_question("Show the number of employees by department", show_sql_answer=True)
display_result(result2, show_sql=False)  # Hide SQL for cleaner output

In [None]:
# Test Case 3: High-value transactions
print("\nüß™ Test Case 3: High-value transactions with employee details")
result3 = agent.process_question("All transactions over $1000 in the past 30 days, show employee name, merchant, amount, and card last 4 digits")
display_result(result3)

In [None]:
# Test Case 4: Travel expense analysis
print("\nüß™ Test Case 4: Travel expenses by employee")
result4 = agent.process_question("Total travel expenses by employee in the last 60 days, include employee name and department")
display_result(result4)

# Test Case 5: Retry logic demonstration
print("\nüß™ Test Case 5: Complex query that might trigger retry logic")
result5 = agent.process_question("Show quarterly spending trends by month and department with year-over-year comparison", show_sql_answer=True)
display_result(result5, show_sql=False)

## 🛡️ Task 4: Test Safety Features

Verify that your safety guardrails work correctly by testing potentially dangerous queries.

In [None]:
# Safety Test 1: Attempt forbidden operations
print("üõ°Ô∏è Safety Test 1: Attempt to DELETE data")
safety_result1 = agent.process_question("Delete all transactions from Alice Johnson")
display_result(safety_result1)

In [None]:
# Safety Test 2: Query without time filter (should add default)
print("\nüõ°Ô∏è Safety Test 2: Missing time filter - should add 90-day default")
safety_result2 = agent.process_question("Show all transactions by employee Maya Patel")
display_result(safety_result2)

## 🎯 Exercise Summary

Congratulations! You've built a comprehensive Database AI Agent for finance operations. 

### 🎓 **What You've Learned**

1. **Data Model Design**: Created structured classes for query results and database schemas
2. **LLM Integration**: Built sophisticated prompts with context and error feedback
3. **Safety Systems**: Implemented multiple layers of validation and guardrails
4. **Retry Logic**: Added robust error handling with iterative improvement
5. **Professional Output**: Generated business-ready summaries and formatted results

### ✅ **Core Features Implemented**

1. **Natural Language Processing**: Converts plain English questions to SQL using GPT-4
2. **Safety Guardrails**: Enforces SELECT-only queries and row limits, and instructs the LLM to use only whitelisted tables  
3. **Time Window Enforcement**: Automatically adds time constraints for transaction queries
4. **Professional Summaries**: Generates clear explanations suitable for finance teams
5. **Database Integration**: SQLAlchemy connectivity (SQLite in this demo; PostgreSQL would need the SQLite-specific prompt rules adapted)
6. **Retry Logic**: Up to 3 attempts with error feedback for robust SQL generation
7. **Configurable Output**: Optional SQL display for cleaner user experience

### 🛡️ **Security & Safety Measures**

- **Query Validation**: Blocks write and DDL statements (INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, EXEC)
- **Table Whitelisting**: Instructs the LLM to query only the approved schema tables
- **Automatic Limits**: Adds LIMIT 20 to queries that lack a LIMIT, preventing large result sets
- **Time Constraints**: Adds a default 90-day time filter to transaction queries that lack one
- **Error Handling**: Graceful failure with informative error messages
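
In the code above, table whitelisting relies on the prompt (rule 2, "Only query from the tables listed above") rather than on a code-level check. A minimal sketch of such a check follows. `find_unapproved_tables` is a hypothetical helper. Its regex only inspects the identifier immediately after `FROM` or `JOIN`, so it is a heuristic rather than a SQL parser; schema-qualified names and some subquery forms are not handled.

```python
import re
from typing import Iterable, List

def find_unapproved_tables(sql: str, allowed: Iterable[str]) -> List[str]:
    """Hypothetical helper: return table names after FROM/JOIN that are not in `allowed`."""
    allowed_lower = {name.lower() for name in allowed}
    # Capture the identifier that follows FROM or JOIN (optionally quoted); heuristic only.
    referenced = re.findall(r'\b(?:FROM|JOIN)\s+["`\[]?(\w+)', sql, flags=re.IGNORECASE)
    return sorted({name for name in referenced if name.lower() not in allowed_lower})

ALLOWED = ['employees', 'cards', 'merchants', 'transactions', 'departments']

ok_sql = """SELECT m.merchant_name, SUM(t.amount_usd) AS total_spend
FROM transactions t JOIN merchants m ON t.merchant_id = m.merchant_id
GROUP BY m.merchant_name ORDER BY total_spend DESC LIMIT 10"""
bad_sql = "SELECT name, sql FROM sqlite_master LIMIT 20"

print(find_unapproved_tables(ok_sql, ALLOWED))   # prints []
print(find_unapproved_tables(bad_sql, ALLOWED))  # prints ['sqlite_master']
```

Inside `_apply_safety_checks`, a non-empty result could be handled the same way as a forbidden keyword, by rejecting the query before it is executed.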

### 💡 **Key Learning Outcomes**

You now understand how to build AI systems that:
- Safely bridge natural language and database operations
- Implement robust guardrails for production environments
- Handle edge cases and provide meaningful error messages
- Generate professional summaries for business stakeholders
- Track and audit AI-generated database interactions

This foundation enables building sophisticated financial analysis tools that democratize data access while maintaining security and compliance standards!

### 🚀 **Next Steps**

Consider extending your agent with:
- **Role-Based Access Control**: Different permissions by user role
- **Query Optimization**: Suggest indexes for frequently used patterns
- **Saved Reports**: Allow users to bookmark and schedule common queries
- **Data Visualization**: Generate charts and graphs from query results
- **Advanced Analytics**: Trend analysis and spending pattern detection