# BudgetBot: AI Personal Finance Advisor

## Introduction

Welcome to BudgetBot, an AI-powered personal finance advisor that helps users analyze their spending patterns, provides personalized financial recommendations, and answers finance-related questions with reliable sources.

This project is my submission for the 5-Day Gen AI Intensive Course Capstone Project. As an IITM BS Data Science student, I've applied the knowledge gained from the course to create a practical solution that demonstrates multiple Gen AI capabilities.

### Problem Statement

Many individuals struggle with managing their finances effectively. They need help with:
- Understanding their spending patterns
- Getting personalized financial advice
- Learning about financial concepts and best practices
- Creating and maintaining budgets
- Planning for financial goals

BudgetBot addresses these needs by providing an intelligent assistant that can analyze transaction data, offer personalized recommendations, and answer finance-related questions in a conversational manner.

### Gen AI Capabilities Showcased

This project demonstrates the following Gen AI capabilities:

1. **Structured Output/JSON Mode**: For transaction categorization, budget breakdown, and financial health reports
2. **RAG (Retrieval Augmented Generation)**: For accessing financial knowledge and providing evidence-based recommendations
3. **Embeddings**: For semantic understanding of transactions and finding spending patterns
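4. **Function Calling**: For financial calculations and data processing
5. **Agents (LangGraph)**: For routing each query through calculation, retrieval, and analysis steps before generating a response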



## Setup

First, let's install the necessary libraries for our project.

In [29]:
%pip install -q google-generativeai langgraph langchain-google-genai pandas numpy matplotlib seaborn scikit-learn langchain

Note: you may need to restart the kernel to use updated packages.





### Setting up the Gemini API

We'll use the Gemini API for our generative AI capabilities. Let's set up the API key and client.

In [30]:
import os
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import TypedDict, List, Dict, Any, Optional
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

import google.generativeai as genai
from langgraph.graph import StateGraph, END
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage
from IPython.display import Markdown, display

# Read the API key from the environment; otherwise replace the placeholder with your key
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY", "your_google_api_key")
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('models/gemini-2.0-flash-lite')
embedding_model = 'models/embedding-001'

# Function to display model responses
def display_response(response):
    if hasattr(response, 'text'):
        display(Markdown(response.text))
    else:
        display(Markdown(response))

print("Setup complete!")

Setup complete!


### Automated retry for API requests

Let's set up an automatic retry mechanism to handle potential API rate limits.
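
One possible shape for such a mechanism is a small decorator with exponential backoff. This is a minimal sketch, not the notebook's own implementation: `with_retry`, `max_attempts`, `base_delay`, and `retry_on` are hypothetical names, and the exception types the Gemini client raises on rate limiting are not shown here, so the caller supplies them.

```python
import functools
import time


def with_retry(max_attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Retry the wrapped function with exponential backoff (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # Out of attempts: surface the last error
                    # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

A call such as `with_retry(max_attempts=5)(model.generate_content)` would then wrap the model call used in the later cells, assuming the default catch-all `retry_on` is acceptable.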

## 1. Create Sample Financial Data


In [31]:
def create_sample_data():
    """Create sample transaction data"""
    np.random.seed(42)
    
    categories = {
        'Groceries': ['Whole Foods', 'Trader Joe\'s', 'Safeway', 'Walmart'],
        'Dining': ['Starbucks', 'Chipotle', 'McDonald\'s', 'Subway'],
        'Transportation': ['Uber', 'Lyft', 'Gas Station', 'Public Transit'],
        'Shopping': ['Amazon', 'Best Buy', 'Apple Store', 'Target'],
        'Entertainment': ['Netflix', 'Spotify', 'Movie Theater', 'Hulu'],
        'Utilities': ['Electric Bill', 'Water Bill', 'Internet Provider', 'Phone Bill'],
        'Housing': ['Rent Payment', 'Mortgage Payment', 'Home Insurance'],
        'Healthcare': ['Pharmacy', 'Doctor Visit', 'Health Insurance', 'Gym'],
        'Income': ['Salary Deposit', 'Freelance Payment', 'Tax Refund']
    }
    
    amount_ranges = {
        'Groceries': (30, 200),
        'Dining': (10, 100),
        'Transportation': (5, 150),
        'Shopping': (20, 500),
        'Entertainment': (10, 100),
        'Utilities': (50, 300),
        'Housing': (800, 2500),
        'Healthcare': (20, 500),
        'Income': (1000, 5000)
    }
    
    # Generate 2 months of transactions
    import datetime
    end_date = datetime.datetime.now()
    start_date = end_date - datetime.timedelta(days=60)
    date_range = pd.date_range(start=start_date, end=end_date, freq='D')
    
    transactions = []
    for date in date_range:
        num_transactions = np.random.randint(1, 4)
        for _ in range(num_transactions):
            category = np.random.choice(list(categories.keys()))
            merchant = np.random.choice(categories[category])
            min_amount, max_amount = amount_ranges[category]
            amount = round(np.random.uniform(min_amount, max_amount), 2)
            if category != 'Income':
                amount = -amount
            transactions.append({
                'Date': date,
                'Description': merchant,
                'Amount': amount,
                'Category': category
            })
    
    return pd.DataFrame(transactions), categories

# Create data
transactions_df, categories = create_sample_data()
print(f"Generated {len(transactions_df)} transactions")
print(transactions_df.head())


Generated 124 transactions
                        Date  Description  Amount        Category
0 2025-08-07 17:59:57.758430       Amazon -108.05        Shopping
1 2025-08-07 17:59:57.758430     Pharmacy -306.49      Healthcare
2 2025-08-07 17:59:57.758430  Gas Station  -13.42  Transportation
3 2025-08-08 17:59:57.758430       Target -359.87        Shopping
4 2025-08-09 17:59:57.758430      Spotify  -74.98   Entertainment


## 2. Capability 1: Structured Output/JSON Mode


In [41]:
def categorize_transaction(description, amount):
    """Categorize transaction using structured output"""
    prompt = f"""
    Categorize this transaction into one of these categories:
    {', '.join(categories.keys())}

    Transaction: {description}
    Amount: ${abs(amount):.2f} {'spent' if amount < 0 else 'received'}

    Respond with JSON:
    {{
        "category": "category_name",
        "confidence": 0.85,
        "reasoning": "explanation"
    }}
    
    IMPORTANT: Replace 0.85 with a realistic confidence score between 0.5 and 1.0 based on how certain you are about the categorization.
    """
    
    response = model.generate_content(prompt)
    
    try:
        json_start = response.text.find('{')
        json_end = response.text.rfind('}')
        if json_start != -1 and json_end != -1:
            json_str = response.text[json_start:json_end+1]
            return json.loads(json_str)
    except ValueError:
        # Malformed JSON (json.JSONDecodeError is a ValueError): fall through to the default
        pass
    
    return {
        "category": "Shopping" if amount < 0 else "Income",
        "confidence": 0.5,
        "reasoning": "Default categorization"
    }

# Test structured output
test_transactions = [
    ("Whole Foods", -85.42),
    ("Netflix", -15.99),
    ("Salary Deposit", 3500.00)
]

print("Testing Structured Output/JSON Mode:")
for desc, amount in test_transactions:
    result = categorize_transaction(desc, amount)
    print(f"{desc}: {result['category']} (Confidence: {result['confidence']:.2f})")

print("\nStructured Output/JSON Mode working!")


Testing Structured Output/JSON Mode:
Whole Foods: Groceries (Confidence: 0.95)
Netflix: Entertainment (Confidence: 0.95)
Salary Deposit: Income (Confidence: 0.98)

Structured Output/JSON Mode working!
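
The parsing step inside `categorize_transaction` can be separated out and checked without calling the API. The following is a minimal sketch, assuming the model's reply contains a single JSON object somewhere in its text; `parse_category_reply` and its `allowed` argument are hypothetical names, not part of the notebook.

```python
import json


def parse_category_reply(text, allowed, default="Shopping"):
    """Extract and validate the JSON object from a model reply (hypothetical helper)."""
    fallback = {"category": default, "confidence": 0.5, "reasoning": "Default categorization"}
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return fallback
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    category = data.get("category")
    # Reject categories the prompt did not offer
    if not isinstance(category, str) or category not in allowed:
        return fallback
    try:
        confidence = float(data.get("confidence", 0.5))
    except (TypeError, ValueError):
        confidence = 0.5
    # Clamp to the 0.5-1.0 range the prompt asks for
    data["confidence"] = min(1.0, max(0.5, confidence))
    return data


reply = 'Here is the result: {"category": "Groceries", "confidence": 0.95, "reasoning": "Supermarket"}'
print(parse_category_reply(reply, {"Groceries", "Dining"}))
```

Validating the category against the allowed set means an unexpected label from the model falls back to the default instead of propagating into later analysis.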


## 3. Capability 2: Embeddings


In [33]:
def get_embedding(text):
    """Get embedding for text"""
    try:
        result = genai.embed_content(
            model=embedding_model,
            content=text,
            task_type="retrieval_document"
        )
        return result['embedding']
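    except Exception as exc:
        # Report the failure instead of hiding it; a zero vector scores 0 similarity
        print(f"Embedding request failed: {exc}")
        return [0.0] * 768  # Fallback with embedding-001's 768 dimensions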

def find_similar_transactions(query, df, top_n=3):
    """Find similar transactions using embeddings"""
    # Get embeddings for a sample of transactions
    sample_df = df.sample(min(20, len(df))).copy()
    
    embeddings = []
    for _, row in sample_df.iterrows():
        text = f"{row['Description']} - ${abs(row['Amount']):.2f}"
        embedding = get_embedding(text)
        embeddings.append(embedding)
    
    # Get query embedding
    query_embedding = get_embedding(query)
    
    # Calculate similarities
    similarities = cosine_similarity([query_embedding], embeddings)[0]
    
    # Get top similar transactions
    top_indices = np.argsort(similarities)[-top_n:][::-1]
    
    results = []
    for idx in top_indices:
        row = sample_df.iloc[idx]
        results.append({
            'description': row['Description'],
            'amount': row['Amount'],
            'category': row['Category'],
            'similarity': similarities[idx]
        })
    
    return results

# Test embeddings
query = "Coffee shop spending"
similar = find_similar_transactions(query, transactions_df)
print(f"Similar transactions to '{query}':")
for tx in similar:
    print(f"  {tx['description']} - ${abs(tx['amount']):.2f} (Similarity: {tx['similarity']:.3f})")

print("\nEmbeddings working!")


Similar transactions to 'Coffee shop spending':
  Starbucks - $54.05 (Similarity: 0.836)
  Spotify - $43.31 (Similarity: 0.798)
  Movie Theater - $17.68 (Similarity: 0.773)

Embeddings working!
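
The ranking step in `find_similar_transactions` reduces to cosine similarity followed by a top-N selection. Here is a small sketch in plain Python, with hand-made 3-dimensional vectors standing in for real embeddings; the vectors, labels, and the `top_n_similar` helper are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_similar(query_vec, doc_vecs, n=2):
    """Return (index, similarity) pairs for the n vectors closest to query_vec."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


labels = ["Starbucks", "Electric Bill", "Chipotle"]
vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.3, 0.1]]
for idx, score in top_n_similar([1.0, 0.0, 0.0], vectors):
    print(f"{labels[idx]}: {score:.3f}")
```

The zero-norm guard matters when an embedding request fails and a zero vector is used as a fallback: such a vector gets a similarity of 0 rather than causing a division error.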


## 4. Capability 3: RAG (Retrieval Augmented Generation)


In [34]:
# Financial knowledge base
financial_knowledge = [
    "The 50/30/20 rule suggests allocating 50% of income to needs, 30% to wants, and 20% to savings.",
    "An emergency fund should cover 3-6 months of essential expenses.",
    "High-interest debt should be prioritized for repayment before investing.",
    "Dollar-cost averaging is investing a fixed amount regularly regardless of market conditions.",
    "A good credit score (above 700) helps qualify for better interest rates.",
    "Diversification reduces risk by spreading money across different asset classes.",
    "Compound interest is interest earned on both principal and previously earned interest.",
    "Inflation erodes purchasing power over time, making investing important for long-term goals.",
    "Automating savings and bill payments ensures consistency and avoids late fees.",
    "The rule of 72 estimates how long it takes for an investment to double: divide 72 by annual return."
]

def get_financial_advice(query, num_results=3):
    """Get financial advice using RAG"""
    # Simple similarity search (in production, use a proper vector database)
    query_embedding = get_embedding(query)
    
    similarities = []
    for knowledge in financial_knowledge:
        knowledge_embedding = get_embedding(knowledge)
        similarity = cosine_similarity([query_embedding], [knowledge_embedding])[0][0]
        similarities.append((knowledge, similarity))
    
    # Get top relevant knowledge
    similarities.sort(key=lambda x: x[1], reverse=True)
    relevant_knowledge = [item[0] for item in similarities[:num_results]]
    
    # Generate response
    prompt = f"""
    You are a financial advisor. Answer this question using the provided knowledge:
    
    Question: {query}
    
    Knowledge:
    {chr(10).join([f"- {k}" for k in relevant_knowledge])}
    
    Provide helpful, specific advice based on the knowledge.
    """
    
    response = model.generate_content(prompt)
    
    return {
        'response': response.text,
        'sources': relevant_knowledge
    }

# Test RAG
query = "How should I budget my monthly income?"
advice = get_financial_advice(query)
print(f"Question: {query}")
print(f"\nAdvice: {advice['response'][:300]}...")
print(f"\nSources: {advice['sources']}")

print("\nRAG working!")


Question: How should I budget my monthly income?

Advice: Okay, here's a breakdown of how to budget your monthly income, incorporating the provided knowledge:

**1. Assess Your Current Situation:**

*   **Calculate Your Net Monthly Income:**  This is your income *after* taxes and any other deductions (like health insurance premiums) are taken out. This is ...

Sources: ['An emergency fund should cover 3-6 months of essential expenses.', 'The 50/30/20 rule suggests allocating 50% of income to needs, 30% to wants, and 20% to savings.', 'High-interest debt should be prioritized for repayment before investing.']

RAG working!
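
`get_financial_advice` re-embeds every knowledge-base entry on each call. Below is a minimal sketch of embedding the knowledge base once and reusing the vectors, assuming an `embed` callable is passed in. `KnowledgeIndex`, `build_prompt`, and the word-count `toy_embed` are hypothetical stand-ins so the example runs without an API key; `toy_embed` is not a real embedding model.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedder: lowercase word counts (NOT a real embedding model)."""
    return Counter(text.lower().replace(".", " ").replace(",", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KnowledgeIndex:
    """Embeds each document once at construction, then answers top-k queries."""

    def __init__(self, documents, embed):
        self.embed = embed
        self.entries = [(doc, embed(doc)) for doc in documents]

    def search(self, query, k=2):
        query_vec = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(query, sources):
    """Assemble the retrieved sources and the question into one prompt string."""
    bullets = "\n".join(f"- {s}" for s in sources)
    return f"Question: {query}\n\nKnowledge:\n{bullets}\n\nAnswer using only the knowledge above."


docs = [
    "An emergency fund should cover 3-6 months of essential expenses.",
    "Compound interest is interest earned on both principal and previously earned interest.",
]
index = KnowledgeIndex(docs, toy_embed)
print(build_prompt("How big should my emergency fund be?", index.search("emergency fund size", k=1)))
```

With this layout, each query costs one embedding request instead of one per knowledge-base entry plus one for the query.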


## 5. Capability 4: Function Calling


In [35]:
def calculate_budget_allocation(monthly_income):
    """Calculate 50/30/20 budget allocation"""
    return {
        "needs": monthly_income * 0.5,
        "wants": monthly_income * 0.3,
        "savings": monthly_income * 0.2
    }

def calculate_emergency_fund(monthly_expenses, months=6):
    """Calculate emergency fund amount"""
    return monthly_expenses * months

def calculate_investment_growth(principal, annual_return, years):
    """Calculate investment growth"""
    return principal * (1 + annual_return) ** years

def financial_calculator(query):
    """Route query to appropriate financial function"""
    prompt = f"""
    Based on this query, determine which function to call:
    Query: {query}
    
    Available functions:
    1. calculate_budget_allocation(monthly_income)
    2. calculate_emergency_fund(monthly_expenses, months=6)
    3. calculate_investment_growth(principal, annual_return, years)
    
    Respond with JSON:
    {{
        "function": "function_name",
        "parameters": {{"param1": value1, "param2": value2}},
        "explanation": "why this function"
    }}
    """
    
    response = model.generate_content(prompt)
    
    try:
        json_start = response.text.find('{')
        json_end = response.text.rfind('}')
        if json_start != -1 and json_end != -1:
            json_str = response.text[json_start:json_end+1]
            function_call = json.loads(json_str)
            
            # Execute function
            func_name = function_call['function']
            params = function_call['parameters']
            
            if func_name == 'calculate_budget_allocation':
                result = calculate_budget_allocation(**params)
            elif func_name == 'calculate_emergency_fund':
                result = calculate_emergency_fund(**params)
            elif func_name == 'calculate_investment_growth':
                result = calculate_investment_growth(**params)
            else:
                result = {"error": "Unknown function"}
            
            return {
                'function': func_name,
                'parameters': params,
                'result': result,
                'explanation': function_call['explanation']
            }
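    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing keys, or parameters that do not match the function
        pass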
    
    return {'error': 'Failed to parse function call'}

# Test function calling
query = "How should I budget $5000 monthly income?"
result = financial_calculator(query)
print(f"Query: {query}")
print(f"Function: {result.get('function', 'Error')}")
print(f"Result: {result.get('result', 'Error')}")

print("\nFunction Calling working!")


Query: How should I budget $5000 monthly income?
Function: calculate_budget_allocation
Result: {'needs': 2500.0, 'wants': 1500.0, 'savings': 1000.0}

Function Calling working!
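
The if/elif chain in `financial_calculator` can also be written as a lookup table that validates parameter names before calling anything. This is a sketch under the assumption that the model's reply has already been parsed into a dict; `REGISTRY` and `dispatch` are hypothetical names, and two of the calculator functions are repeated so the block runs on its own.

```python
import inspect


def calculate_budget_allocation(monthly_income):
    """50/30/20 split, as defined in the cell above."""
    return {"needs": monthly_income * 0.5, "wants": monthly_income * 0.3, "savings": monthly_income * 0.2}


def calculate_emergency_fund(monthly_expenses, months=6):
    """Emergency fund target, as defined in the cell above."""
    return monthly_expenses * months


# Only functions listed here can ever be called on the model's behalf
REGISTRY = {f.__name__: f for f in (calculate_budget_allocation, calculate_emergency_fund)}


def dispatch(call):
    """Run a parsed {'function': ..., 'parameters': {...}} request safely."""
    func = REGISTRY.get(call.get("function"))
    if func is None:
        return {"error": f"Unknown function: {call.get('function')}"}
    params = call.get("parameters") or {}
    try:
        # bind() raises TypeError on missing or unexpected argument names
        bound = inspect.signature(func).bind(**params)
    except TypeError as exc:
        return {"error": f"Bad parameters: {exc}"}
    return {"result": func(*bound.args, **bound.kwargs)}


print(dispatch({"function": "calculate_emergency_fund", "parameters": {"monthly_expenses": 3000}}))
print(dispatch({"function": "calculate_budget_allocation", "parameters": {"income": 5000}}))
```

The second call uses a misspelled parameter name (`income`) to show that a mismatch is reported as an error instead of raising inside the calculator.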


## 6. Capability 5: LangGraph Agents


In [36]:
# Define state for LangGraph
class FinancialState(TypedDict):
    query: str
    conversation_history: List[Dict[str, str]]
    analysis_results: Dict[str, Any]
    final_response: str
    requires_calculation: bool
    requires_rag: bool
    requires_analysis: bool

# Initialize LangChain model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-lite",
    google_api_key=GOOGLE_API_KEY,
    temperature=0.1
)

print("LangGraph setup complete!")


LangGraph setup complete!


In [37]:
# LangGraph Nodes
def classify_query(state: FinancialState) -> FinancialState:
    """Classify what the user needs"""
    query = state["query"]
    
    # Simple classification logic
    state["requires_calculation"] = any(word in query.lower() for word in ["calculate", "budget", "how much", "payment"])
    state["requires_rag"] = any(word in query.lower() for word in ["what", "how", "explain", "rule", "strategy"])
    state["requires_analysis"] = any(word in query.lower() for word in ["analyze", "pattern", "spending", "similar"])
    
    return state

def financial_calculator_node(state: FinancialState) -> FinancialState:
    """Perform financial calculations"""
    if state["requires_calculation"]:
        result = financial_calculator(state["query"])
        state["analysis_results"] = {"calculation": result}
    return state

def rag_node(state: FinancialState) -> FinancialState:
    """Retrieve financial knowledge"""
    if state["requires_rag"]:
        advice = get_financial_advice(state["query"])
        state["analysis_results"] = {**state.get("analysis_results", {}), "rag": advice}
    return state

def analysis_node(state: FinancialState) -> FinancialState:
    """Analyze transaction data"""
    if state["requires_analysis"]:
        similar = find_similar_transactions(state["query"], transactions_df)
        state["analysis_results"] = {**state.get("analysis_results", {}), "analysis": similar}
    return state

def response_generator(state: FinancialState) -> FinancialState:
    """Generate final response"""
    query = state["query"]
    results = state.get("analysis_results", {})
    
    # Build context
    context_parts = []
    
    if "calculation" in results:
        calc = results["calculation"]
        context_parts.append(f"Calculation: {calc.get('function', 'N/A')} - {calc.get('result', 'N/A')}")
    
    if "rag" in results:
        rag = results["rag"]
        context_parts.append(f"Financial Knowledge: {rag.get('response', 'N/A')[:200]}...")
    
    if "analysis" in results:
        analysis = results["analysis"]
        context_parts.append(f"Similar Transactions: {len(analysis)} found")
    
    context = "\n".join(context_parts)
    
    # Generate response
    prompt = f"""
    You are a financial advisor. Answer this question:
    
    Question: {query}
    
    Context:
    {context}
    
    Provide a helpful, comprehensive response.
    """
    
    response = llm.invoke([HumanMessage(content=prompt)])
    state["final_response"] = response.content
    
    return state

print("LangGraph nodes defined!")


LangGraph nodes defined!
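
`classify_query` uses substring checks, so a keyword such as "how" also matches inside "show". Here is a sketch of a whole-word variant using the same RAG keyword list; `matches_any` is a hypothetical helper, not part of the notebook.

```python
import re


def matches_any(query, keywords):
    """True if any keyword appears in query as a whole word or phrase (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, query, flags=re.IGNORECASE) is not None


RAG_WORDS = ["what", "how", "explain", "rule", "strategy"]

query = "Show me similar transactions"
# Substring check: "how" is found inside "show"
print("substring:", any(word in query.lower() for word in RAG_WORDS))
# Whole-word check: no keyword stands alone in the query
print("whole word:", matches_any(query, RAG_WORDS))
```

The trade-off is that whole-word matching no longer treats "patterns" as a hit for "pattern", so plural forms would need to be listed explicitly.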


In [38]:
# Build LangGraph workflow
def build_financial_advisor():
    workflow = StateGraph(FinancialState)
    
    # Add nodes
    workflow.add_node("classify", classify_query)
    workflow.add_node("calculator", financial_calculator_node)
    workflow.add_node("rag", rag_node)
    workflow.add_node("analysis", analysis_node)
    workflow.add_node("response", response_generator)
    
    # Define workflow
    workflow.set_entry_point("classify")
    workflow.add_edge("classify", "calculator")
    workflow.add_edge("calculator", "rag")
    workflow.add_edge("rag", "analysis")
    workflow.add_edge("analysis", "response")
    workflow.add_edge("response", END)
    
    return workflow.compile()

# Create the advisor
financial_advisor = build_financial_advisor()

def chat_with_advisor(query: str):
    """Chat with the financial advisor"""
    initial_state = {
        "query": query,
        "conversation_history": [],
        "analysis_results": {},
        "final_response": "",
        "requires_calculation": False,
        "requires_rag": False,
        "requires_analysis": False
    }
    
    result = financial_advisor.invoke(initial_state)
    return result

print("LangGraph workflow built!")


LangGraph workflow built!


## 7. Test All Capabilities


In [39]:
# Test the complete LangGraph financial advisor
test_queries = [
    "How should I budget my $5000 monthly income?",
    "What's the 50/30/20 rule?",
    "Analyze my coffee shop spending patterns",
    "Calculate my emergency fund if I spend $3000 monthly"
]

for i, query in enumerate(test_queries, 1):
    print(f"\n{'='*60}")
    print(f"TEST {i}: {query}")
    print(f"{'='*60}")
    
    result = chat_with_advisor(query)
    
    print(f"\nResponse:")
    print(result["final_response"])
    
    print(f"\nCapabilities used:")
    print(f"- Calculation: {result['requires_calculation']}")
    print(f"- RAG: {result['requires_rag']}")
    print(f"- Analysis: {result['requires_analysis']}")

print(f"\n{'='*60}")
print("ALL 5 GEN AI CAPABILITIES WORKING!")
print("Structured Output/JSON Mode")
print("Embeddings")
print("RAG (Retrieval Augmented Generation)")
print("Function Calling")
print("LangGraph Agents")
print(f"{'='*60}")



TEST 1: How should I budget my $5000 monthly income?

Response:
Okay, let's create a budget for your $5000 monthly income, using the 50/30/20 rule as a solid starting point. Here's a breakdown, along with some specific actions and recommendations, based on the calculation you provided:

**1. The 50/30/20 Rule Applied to Your Income:**

*   **Needs (50%): $2500** - These are your essential expenses.
*   **Wants (30%): $1500** - These are your discretionary expenses.
*   **Savings & Debt Repayment (20%): $1000** - This is for your financial future and paying down any debts.

**2. Detailed Breakdown and Recommendations:**

**A. Needs ($2500):**

*   **Housing (Rent/Mortgage):** This is typically your largest expense. Ensure it fits comfortably within your needs allocation.
    *   **Action:** Review your current housing costs. If they exceed a reasonable portion of your needs, consider downsizing or finding a more affordable option.
*   **Utilities:** Electricity, gas, water, internet, a