# Single Agent RAG: Employee Benefits Assistant

This notebook demonstrates building a **single-agent RAG (Retrieval-Augmented Generation) system** that retrieves employee benefits and HR policy information from a vector store and provides accurate answers.

## Key Concepts Demonstrated

- **Document Retrieval**: Semantic search with Chroma vector database
- **LLM-Powered Answers**: GPT-4 generates responses from retrieved context
- **Autonomous Decision-Making**: Agent decides when to re-search for better information
- **Self-Improving Queries**: LLM generates better search terms when needed
- **Confidence Assessment**: Rate each answer high, medium, or low using the retrieved chunk count and the best match distance

## Scenario

An HR chatbot that helps employees get instant answers to benefits questions without searching through policy documents. The agent handles questions about health insurance, PTO, 401(k), remote work, professional development, and parental leave.

## Architecture

**Simple & Autonomous RAG Pattern:**
1. User asks a question
2. System retrieves relevant policy documents
3. LLM generates answer from retrieved context
4. LLM evaluates if answer is complete
5. If incomplete, LLM generates better search query and retries
6. Return answer with sources and confidence level
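
The steps above can be sketched as a small control loop. This is a minimal illustration, not the agent built later in this notebook: `retrieve`, `generate`, `is_complete`, and `rewrite` are hypothetical stand-ins for the Chroma query and the LLM calls, and the toy index is placeholder data so the sketch runs without an API key.

```python
from typing import Callable, List, Tuple

def answer_with_retry(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    is_complete: Callable[[str, str], bool],
    rewrite: Callable[[str, str], str],
) -> Tuple[str, bool]:
    """Run retrieve -> generate -> evaluate, with at most one retry."""
    docs = retrieve(question)                 # step 2: retrieve documents
    answer = generate(question, docs)         # step 3: answer from context
    if is_complete(question, answer):         # step 4: evaluate the answer
        return answer, False
    better_query = rewrite(question, answer)  # step 5: improve the query
    docs = retrieve(better_query)
    # Step 6 (sources and confidence) is omitted in this sketch
    return generate(question, docs), True

# Toy stand-ins (hypothetical) so the loop can be exercised offline
toy_index = {"paid time off": "PTO policy text (placeholder)"}

def retrieve(query: str) -> List[str]:
    return [text for key, text in toy_index.items() if key in query.lower()]

def generate(question: str, docs: List[str]) -> str:
    return docs[0] if docs else "not enough information"

def is_complete(question: str, answer: str) -> bool:
    return "not enough information" not in answer

def rewrite(question: str, answer: str) -> str:
    return question.lower().replace("vacation", "paid time off")

answer, retried = answer_with_retry(
    "How much vacation do I get?", retrieve, generate, is_complete, rewrite
)
print(answer, retried)
```

The first search misses because the toy index has no entry for "vacation"; the rewritten query matches, so the loop reports that a retry was needed.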

In [1]:
# Import required libraries
import os
import json
from dataclasses import dataclass
from typing import List, Optional, Dict, Any
from datetime import datetime

# OpenAI for LLM calls (embeddings are generated by Chroma's default embedding function)
from openai import OpenAI

# Chroma for vector storage
import chromadb

# Initialize OpenAI client with Vocareum endpoint
client = OpenAI(
    base_url="https://openai.vocareum.com/v1",
    api_key=os.getenv("OPENAI_API_KEY")
)

print("🔧 Environment Setup:")
print(f"   ✅ OpenAI API Key: {'✓ Configured' if os.getenv('OPENAI_API_KEY') else '❌ Missing'}")
print("   📚 Knowledge Base: Employee benefits and HR policies")
print("   🤖 Autonomous RAG with OpenAI + Chroma")

🔧 Environment Setup:
   ✅ OpenAI API Key: ✓ Configured
   📚 Knowledge Base: Employee benefits and HR policies
   🤖 Autonomous RAG with OpenAI + Chroma


## Data Models for RAG System

Define the structure for our RAG responses:

In [2]:
@dataclass
class RAGResponse:
    """Response from our autonomous RAG agent"""
    query: str
    answer: str
    sources: List[str]
    retrieved_chunks: int
    needed_retry: bool
    confidence: str  # "high", "medium", "low"

print("📋 Data model defined for RAG responses")

📋 Data model defined for RAG responses
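
To show how the fields fit together, here is a small sketch that builds a `RAGResponse` by hand and serializes it with `dataclasses.asdict`. The dataclass is repeated so the block runs on its own, and every field value is a made-up placeholder rather than output from the agent.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class RAGResponse:
    """Same fields as the data model above, repeated for a standalone example"""
    query: str
    answer: str
    sources: List[str]
    retrieved_chunks: int
    needed_retry: bool
    confidence: str  # "high", "medium", "low"

# Placeholder values for illustration only
example = RAGResponse(
    query="example question",
    answer="example answer",
    sources=["Example Policy"],
    retrieved_chunks=4,
    needed_retry=False,
    confidence="medium",
)

# asdict() turns the dataclass into a plain dict, which is JSON-serializable
payload = json.dumps(asdict(example), indent=2)
print(payload)
```

Serializing to JSON like this is one way to log responses or return them from an API endpoint.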


## Knowledge Base Setup

Load HR policy documents into Chroma vector database:

In [3]:
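def split_text_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into overlapping word-based chunks for better retrieval"""
    # A non-positive step would raise in range() or silently yield no chunks
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    
    words = text.split()
    chunks = []
    
    # Each chunk starts (chunk_size - overlap) words after the previous one
    for i in range(0, len(words), chunk_size - overlap):
        chunk_words = words[i:i + chunk_size]
        chunks.append(' '.join(chunk_words))
    
    return chunks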

def load_hr_policies_to_chroma():
    """Load HR policy documents into Chroma with embeddings"""
    
    # Initialize Chroma client
    chroma_client = chromadb.PersistentClient(path="./chroma_db")
    
    collection_name = "hr_policies"
    
    # Reuse the collection if it already exists and contains documents;
    # otherwise create it (or refill an existing empty one)
    collection = chroma_client.get_or_create_collection(
        name=collection_name,
        metadata={"description": "Employee benefits and HR policies"}
    )
    if collection.count() > 0:
        print(f"📚 Using existing collection: {collection_name}")
        print(f"   📊 Documents in collection: {collection.count()}")
        return collection
    
    print(f"📝 Creating new collection: {collection_name}")
    
    # Load documents from data directory
    data_dir = "./data"
    documents = []
    metadatas = []
    ids = []
    
    if not os.path.exists(data_dir):
        print(f"❌ Data directory '{data_dir}' not found!")
        return collection
    
    doc_id = 0
    for filename in os.listdir(data_dir):
        if filename.endswith('.txt'):
            file_path = os.path.join(data_dir, filename)
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()
                
                # Split content into chunks
                chunks = split_text_into_chunks(content, chunk_size=500, overlap=50)
                
                for i, chunk in enumerate(chunks):
                    documents.append(chunk)
                    metadatas.append({
                        'filename': filename,
                        'doc_title': filename.replace('_', ' ').replace('.txt', '').replace('2025', '').title().strip(),
                        'chunk_id': f"{filename}_{i}"
                    })
                    ids.append(f"doc_{doc_id}")
                    doc_id += 1
                
                print(f"   📄 Loaded: {filename} ({len(chunks)} chunks)")
    
    if documents:
        # Add documents to collection (Chroma auto-generates embeddings)
        collection.add(
            documents=documents,
            metadatas=metadatas,
            ids=ids
        )
        print(f"🎉 Added {len(documents)} document chunks to Chroma")
    else:
        print("❌ No documents found to add")
    
    return collection

# Setup the knowledge base
print("🚀 Setting up RAG system...")
chroma_collection = load_hr_policies_to_chroma()

if chroma_collection.count() > 0:
    print("\n✅ RAG system ready!")
    print(f"   📊 Collection: {chroma_collection.count()} document chunks")
    print(f"   🔍 Ready to answer HR policy questions")
else:
    print("❌ Failed to set up RAG system")

🚀 Setting up RAG system...
📝 Creating new collection: hr_policies
   📄 Loaded: parental_leave_policy_2025.txt (3 chunks)
   📄 Loaded: health_insurance_benefits_2025.txt (1 chunks)
   📄 Loaded: professional_development_policy_2025.txt (3 chunks)
   📄 Loaded: paid_time_off_policy_2025.txt (1 chunks)
   📄 Loaded: remote_work_policy_2025.txt (2 chunks)
   📄 Loaded: retirement_401k_plan_2025.txt (2 chunks)
🎉 Added 12 document chunks to Chroma

✅ RAG system ready!
   📊 Collection: 12 document chunks
   🔍 Ready to answer HR policy questions
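
The chunk counts above follow from the window arithmetic: a new chunk starts every `chunk_size - overlap = 450` words, so a document of `n` words yields `ceil(n / 450)` chunks. The sketch below checks this on synthetic word lists. The word counts are illustrative assumptions, not the actual lengths of the policy files, and the splitter is a compact restatement of the one in the cell above.

```python
import math
from typing import List

def split_text_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Word-based sliding window, equivalent to the splitter in the cell above"""
    words = text.split()
    step = chunk_size - overlap
    return [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

def make_text(n_words: int) -> str:
    """Build a synthetic document of n distinct words (w0, w1, ...)"""
    return ' '.join(f"w{i}" for i in range(n_words))

for n_words in (300, 700, 1000):
    chunks = split_text_into_chunks(make_text(n_words))
    assert len(chunks) == math.ceil(n_words / 450)
    print(f"{n_words} words -> {len(chunks)} chunk(s)")

# Consecutive chunks share exactly `overlap` words at the boundary
chunks = split_text_into_chunks(make_text(1000))
first, second = chunks[0].split(), chunks[1].split()
assert first[-50:] == second[:50]
```

Under these defaults, a file that loads as a single chunk is at most 450 words long, and a three-chunk file has between 901 and 1350 words.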


## Build the Autonomous RAG Agent

This agent orchestrates the entire RAG pipeline with autonomous decision-making:

In [4]:
class EmployeeBenefitsRAGAgent:
    """Autonomous RAG agent for employee benefits questions"""
    
    def __init__(self, chroma_collection):
        self.collection = chroma_collection
        self.query_history = []
    
    def retrieve_documents(self, query: str, n_results: int = 4) -> Dict[str, Any]:
        """Retrieve relevant documents from Chroma"""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results
        )
        
        return {
            'documents': results['documents'][0] if results['documents'] else [],
            'metadatas': results['metadatas'][0] if results['metadatas'] else [],
            'distances': results['distances'][0] if results['distances'] else [],
            'count': len(results['documents'][0]) if results['documents'] else 0
        }
    
    def generate_answer(self, query: str, context_docs: List[str]) -> str:
        """Generate answer using LLM with retrieved context"""
        
        context = "\n\n".join([f"Document {i+1}:\n{doc}" for i, doc in enumerate(context_docs)])
        
        prompt = f"""You are a helpful HR assistant helping employees understand their benefits and policies. Answer based ONLY on the provided documents.

RETRIEVED POLICY DOCUMENTS:
{context}

EMPLOYEE QUESTION: {query}

INSTRUCTIONS:
- Answer based ONLY on information in the documents above
- If documents contain the answer, provide clear, helpful response
- If documents lack information, say "I don't have enough information in the policy documents to answer that question"
- Include specific details like amounts, time periods, eligibility requirements
- Be conversational but professional
- For complex topics, break down the answer into clear points

ANSWER:"""

        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,
            max_tokens=500
        )
        
        return response.choices[0].message.content
    
    def should_retry(self, query: str, answer: str, retrieved_docs: List[str]) -> bool:
        """Let LLM decide if we need to search again"""
        
        prompt = f"""Evaluate if a RAG system should search again for better information.

QUESTION: {query}
ANSWER: {answer}
DOCUMENTS FOUND: {len(retrieved_docs)}

Respond with ONLY "YES" or "NO".

Say "YES" if:
- Answer says not enough information
- Answer is vague or incomplete
- Question asks for specifics but answer doesn't provide them

Say "NO" if:
- Answer provides specific, helpful information
- Answer appropriately explains what's unavailable

DECISION:"""

        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=10
        )
        
        # Accept variants such as "Yes." or "YES - ..." instead of requiring an exact match
        return response.choices[0].message.content.strip().upper().startswith("YES")
    
    def improve_query(self, original_query: str, previous_answer: str) -> str:
        """Generate better search query for retry"""
        
        prompt = f"""Generate a better search query for HR policy documents.

ORIGINAL: {original_query}
PREVIOUS ANSWER: {previous_answer}

The search didn't find good information. Generate a NEW search query that might find better results.

Tips:
- Use different keywords or synonyms
- Try broader or more specific terms
- Focus on key HR/benefits concepts
- Keep under 10 words

NEW QUERY:"""

        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3,
            max_tokens=50
        )
        
        return response.choices[0].message.content.strip()
    
    def assess_confidence(self, answer: str, retrieved_count: int, best_distance: float) -> str:
        """Assess confidence in the answer"""
        if "don't have enough information" in answer.lower():
            return "low"
        elif retrieved_count >= 3 and best_distance < 0.8:
            return "high"
        elif retrieved_count >= 2:
            return "medium"
        else:
            return "low"
    
    def process_query(self, query: str, show_thinking: bool = False) -> RAGResponse:
        """Main method - autonomous RAG processing with retry logic"""
        
        if show_thinking:
            print(f"🤔 Processing: {query}")
        
        # Step 1: Initial retrieval
        results = self.retrieve_documents(query, n_results=4)
        docs = results['documents']
        
        if show_thinking:
            print(f"📚 Found {results['count']} relevant documents")
        
        # Step 2: Generate initial answer
        answer = self.generate_answer(query, docs)
        
        if show_thinking:
            print(f"💭 Generated initial answer")
        
        # Step 3: Let LLM decide if retry needed
        needs_retry = False
        total_chunks = results['count']
        
        if results['count'] > 0:
            needs_retry = self.should_retry(query, answer, docs)
        
        # Step 4: Retry if needed
        if needs_retry:
            if show_thinking:
                print(f"🔄 Agent decided to search again...")
            
            better_query = self.improve_query(query, answer)
            if show_thinking:
                print(f"🎯 New search: {better_query}")
            
            retry_results = self.retrieve_documents(better_query, n_results=6)
            
            # The retry is accepted only if it returned more chunks than the first
            # search; with n_results=6 vs. 4 this holds whenever the collection
            # has more than 4 chunks, regardless of how relevant they are
            if retry_results['count'] > results['count']:
                results = retry_results
                docs = results['documents']
                answer = self.generate_answer(query, docs)
                total_chunks = retry_results['count']
                
                if show_thinking:
                    print(f"✅ Found {retry_results['count']} documents with improved query")
        
        # Step 5: Extract sources
        sources = []
        if results['metadatas']:
            sources = list(set([meta.get('doc_title', 'Unknown') for meta in results['metadatas']]))
        
        # Step 6: Assess confidence
        best_distance = min(results['distances']) if results['distances'] else 1.0
        confidence = self.assess_confidence(answer, results['count'], best_distance)
        
        # Create response
        response = RAGResponse(
            query=query,
            answer=answer,
            sources=sources,
            retrieved_chunks=total_chunks,
            needed_retry=needs_retry,
            confidence=confidence
        )
        
        self.query_history.append(response)
        return response

# Initialize the agent
if chroma_collection.count() > 0:
    rag_agent = EmployeeBenefitsRAGAgent(chroma_collection)
    print("🤖 Employee Benefits RAG Agent initialized!")
    print("   🧠 Autonomous decision-making enabled")
    print("   🔄 Self-improving search queries")
    print("   📊 Confidence assessment active")
else:
    print("❌ Cannot initialize agent - no documents loaded")

🤖 Employee Benefits RAG Agent initialized!
   🧠 Autonomous decision-making enabled
   🔄 Self-improving search queries
   📊 Confidence assessment active
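
Unlike the retry decision, the confidence rating is a rule-based heuristic rather than an LLM call. The sketch below restates `assess_confidence` as a standalone function and runs it on a few made-up inputs to show where each rule applies. The input values are illustrative assumptions; the 0.8 cutoff comes from the agent code, and what counts as a "close" distance depends on the collection's distance metric.

```python
def assess_confidence(answer: str, retrieved_count: int, best_distance: float) -> str:
    """Standalone copy of the agent's confidence rules"""
    if "don't have enough information" in answer.lower():
        return "low"
    elif retrieved_count >= 3 and best_distance < 0.8:
        return "high"
    elif retrieved_count >= 2:
        return "medium"
    else:
        return "low"

# (answer, retrieved_count, best_distance): made-up inputs for illustration
cases = [
    ("Specific answer", 4, 0.45),   # enough chunks and a close match
    ("Specific answer", 4, 1.10),   # enough chunks, but no close match
    ("Specific answer", 1, 0.30),   # close match, but only one chunk
    ("I don't have enough information in the policy documents to answer that question", 6, 0.20),
]

ratings = [assess_confidence(answer, count, distance) for answer, count, distance in cases]
for (_, count, distance), rating in zip(cases, ratings):
    print(f"count={count} best_distance={distance:.2f} -> {rating}")
# count=4 best_distance=0.45 -> high
# count=4 best_distance=1.10 -> medium
# count=1 best_distance=0.30 -> low
# count=6 best_distance=0.20 -> low
```

The last case shows that the refusal phrase overrides everything else: an answer that admits missing information is rated low even when retrieval looks strong.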


## Test the RAG System

Let's test with realistic employee benefits questions:

In [5]:
def display_response(response: RAGResponse):
    """Pretty print RAG response"""
    print("=" * 70)
    print(f"❓ Question: {response.query}")
    print("=" * 70)
    print(f"\n💬 Answer:\n{response.answer}")
    
    if response.sources:
        print(f"\n📚 Sources:")
        for source in response.sources:
            print(f"   • {source}")
    
    print(f"\n📊 Metadata:")
    print(f"   • Chunks retrieved: {response.retrieved_chunks}")
    print(f"   • Confidence: {response.confidence}")
    print(f"   • Needed retry: {'Yes' if response.needed_retry else 'No'}")
    print("\n")

# Test queries
test_questions = [
    "How much does the company match for 401k contributions?",
    "What are the PTO accrual rates for someone with 4 years of service?",
    "What's covered under the Premium health insurance plan?",
    "How much paid parental leave do primary caregivers get?"
]

print("🧪 Testing Employee Benefits RAG Agent\n")

for question in test_questions:
    response = rag_agent.process_query(question, show_thinking=True)
    display_response(response)

🧪 Testing Employee Benefits RAG Agent

🤔 Processing: How much does the company match for 401k contributions?
📚 Found 4 relevant documents
💭 Generated initial answer
❓ Question: How much does the company match for 401k contributions?

💬 Answer:
The company matches 401(k) contributions based on the following formula:

- The company will match 100% of the first 3% of your salary that you contribute.
- Then, the company will match 50% of the next 2% of your salary that you contribute.

This means that if you contribute 5% or more of your salary, the maximum company contribution will be 4% of your salary. For example, if you contribute 1% of your salary, the company will match 1%, making a total of 2%. If you contribute 3%, the company will match 3%, making a total of 6%. If you contribute 5% or more, the company will match 4%, so if you contribute 10%, the total contribution will be 14%.

The company match is contributed with each paycheck and is subject to a vesting sched

## Test Autonomous Re-Retrieval

Test with an ambiguous query that should trigger the agent's autonomous retry logic:

In [6]:
# Ambiguous query that may need better search terms
ambiguous_query = "What are the remote work requirements?"

print("🔍 Testing Autonomous Decision-Making")
print(f"Query: {ambiguous_query}")
print("-" * 70)

response = rag_agent.process_query(ambiguous_query, show_thinking=True)
display_response(response)

print("🧠 Agent Analysis:")
if response.needed_retry:
    print("   ✅ Agent autonomously decided to search again for better results")
else:
    print("   ✅ Agent was satisfied with initial retrieval")

🔍 Testing Autonomous Decision-Making
Query: What are the remote work requirements?
----------------------------------------------------------------------
🤔 Processing: What are the remote work requirements?
📚 Found 4 relevant documents
💭 Generated initial answer
❓ Question: What are the remote work requirements?

💬 Answer:
The remote work requirements are outlined in the Remote Work and Flexible Schedule Policy. Here are the key points:

1. **Eligibility**: Your role should not require on-site presence and you should have demonstrated self-management and productivity. You should have consistent performance ratings of "Meets Expectations" or above. You also need a reliable internet connection (minimum 50 Mbps download) and a dedicated workspace free from distractions. 

2. **Work Hours and Availability**: You should be available for meetings and collaboration during core hours, which are 10AM-3PM local time. You're expected to respond to messages within 30 minutes. You 

## Test Out-of-Scope Handling

Test how the agent handles questions outside the knowledge base:

In [7]:
# Question outside our HR policies
out_of_scope_query = "What's the company's revenue for last quarter?"

print("🛡️ Testing Out-of-Scope Question Handling")
print(f"Query: {out_of_scope_query}")
print("-" * 70)

response = rag_agent.process_query(out_of_scope_query, show_thinking=True)
display_response(response)

if "don't have enough information" in response.answer.lower():
    print("✅ Agent correctly identified insufficient information")
else:
    print("❓ Review agent's handling of out-of-scope questions")

🛡️ Testing Out-of-Scope Question Handling
Query: What's the company's revenue for last quarter?
----------------------------------------------------------------------
🤔 Processing: What's the company's revenue for last quarter?
📚 Found 4 relevant documents
💭 Generated initial answer
🔄 Agent decided to search again...
🎯 New search: "Company's HR policies on employee compensation and benefits"
✅ Found 6 documents with improved query
❓ Question: What's the company's revenue for last quarter?

💬 Answer:
I don't have enough information in the policy documents to answer that question.

📚 Sources:
   • Paid Time Off Policy
   • Remote Work Policy
   • Parental Leave Policy
   • Retirement 401K Plan
   • Professional Development Policy

📊 Metadata:
   • Chunks retrieved: 6
   • Confidence: low
   • Needed retry: Yes


✅ Agent correctly identified insufficient information
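
In the run above, the out-of-scope question still goes through the full pipeline (initial answer, retry decision, query rewrite, second answer) before the agent reports that it lacks information. One possible optimization, sketched here as an assumption rather than something this notebook implements, is to check the retrieval distances first and skip the LLM calls when nothing is close. The helper name `is_probably_out_of_scope`, the 1.5 threshold, and the sample distances are all hypothetical; a real threshold would need tuning against the collection's distance metric.

```python
from typing import List

def is_probably_out_of_scope(distances: List[float], max_distance: float = 1.5) -> bool:
    """True when nothing was retrieved or even the closest chunk is farther than max_distance.

    max_distance=1.5 is a hypothetical cutoff, not a value taken from the notebook.
    """
    return not distances or min(distances) > max_distance

# Made-up distance lists for illustration
in_scope = is_probably_out_of_scope([0.42, 0.77, 1.30])
far_away = is_probably_out_of_scope([1.71, 1.83, 1.90])
nothing = is_probably_out_of_scope([])

print(in_scope)  # False: the closest chunk is within the cutoff
print(far_away)  # True: every chunk is beyond the cutoff
print(nothing)   # True: no chunks retrieved
```

A gate like this trades a small risk of refusing answerable questions for fewer LLM calls on clearly unrelated ones.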


## Query History Analysis

Analyze the agent's performance across all queries:

In [8]:
print("📊 RAG Agent Performance Summary")
print("=" * 70)
print(f"Total queries processed: {len(rag_agent.query_history)}\n")

if len(rag_agent.query_history) > 0:
    # Calculate metrics
    retry_count = sum(1 for r in rag_agent.query_history if r.needed_retry)
    high_confidence = sum(1 for r in rag_agent.query_history if r.confidence == "high")
    total = len(rag_agent.query_history)
    
    print(f"🤖 Autonomous Behavior:")
    print(f"   • Retry rate: {retry_count}/{total} ({retry_count/total*100:.1f}%)")
    print(f"   • High confidence answers: {high_confidence}/{total} ({high_confidence/total*100:.1f}%)")
    
    avg_chunks = sum(r.retrieved_chunks for r in rag_agent.query_history) / total
    print(f"\n📈 Retrieval Metrics:")
    print(f"   • Average chunks per query: {avg_chunks:.1f}")
    
    # Show confidence distribution
    confidence_dist = {}
    for r in rag_agent.query_history:
        confidence_dist[r.confidence] = confidence_dist.get(r.confidence, 0) + 1
    
    print(f"\n🎯 Confidence Distribution:")
    for conf, count in sorted(confidence_dist.items()):
        print(f"   • {conf.capitalize()}: {count} ({count/total*100:.1f}%)")

print(f"\n✨ Key Features Demonstrated:")
print(f"   ✅ Autonomous retry decisions (LLM-driven)")
print(f"   ✅ Self-improving search queries")
print(f"   ✅ Confidence assessment")
print(f"   ✅ Source citation")
print(f"   ✅ Graceful handling of missing information")

📊 RAG Agent Performance Summary
Total queries processed: 6

🤖 Autonomous Behavior:
   • Retry rate: 1/6 (16.7%)
   • High confidence answers: 2/6 (33.3%)

📈 Retrieval Metrics:
   • Average chunks per query: 4.3

🎯 Confidence Distribution:
   • High: 2 (33.3%)
   • Low: 1 (16.7%)
   • Medium: 3 (50.0%)

✨ Key Features Demonstrated:
   ✅ Autonomous retry decisions (LLM-driven)
   ✅ Self-improving search queries
   ✅ Confidence assessment
   ✅ Source citation
   ✅ Graceful handling of missing information
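
The same summary can be computed more compactly with `collections.Counter`. The sketch below is an alternative formulation, not the notebook's code: `HistoryEntry` is a hypothetical stand-in for `RAGResponse`, and the six entries are reconstructed from the totals printed above (two high, three medium, one low; one retry that retrieved 6 chunks, the rest 4).

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class HistoryEntry:
    """Hypothetical stand-in carrying only the fields the summary needs"""
    confidence: str
    needed_retry: bool
    retrieved_chunks: int

# Entries reconstructed from the printed summary, not real agent output
history = (
    [HistoryEntry("high", False, 4)] * 2
    + [HistoryEntry("medium", False, 4)] * 3
    + [HistoryEntry("low", True, 6)]
)

total = len(history)
confidence_dist = Counter(entry.confidence for entry in history)
retry_rate = sum(entry.needed_retry for entry in history) / total
avg_chunks = sum(entry.retrieved_chunks for entry in history) / total

# Iterate in severity order instead of the alphabetical order sorted() gives
for level in ("high", "medium", "low"):
    count = confidence_dist[level]
    print(f"{level.capitalize()}: {count} ({count / total:.1%})")
print(f"Retry rate: {retry_rate:.1%}, average chunks per query: {avg_chunks:.1f}")
# High: 2 (33.3%)
# Medium: 3 (50.0%)
# Low: 1 (16.7%)
# Retry rate: 16.7%, average chunks per query: 4.3
```

`Counter` returns 0 for missing keys, so a level with no answers prints as zero instead of being skipped.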


## Key Takeaways

### ✅ **Core RAG Concepts Demonstrated**

1. **Document Retrieval** - Semantic search with vector embeddings
2. **Context-Aware Generation** - LLM generates answers from retrieved documents only
3. **Autonomous Decision-Making** - Agent decides when to search again
4. **Query Improvement** - LLM generates better search terms automatically
5. **Source Attribution** - Reports the titles of the retrieved documents
6. **Confidence Assessment** - Rates each answer high, medium, or low from the retrieved chunk count and best match distance

### 🏗️ **Architecture Pattern**

**Simple Autonomous RAG:**
- Direct OpenAI + Chroma integration
- No complex frameworks needed
- LLM handles the retry decision and query rewriting; confidence is a rule-based heuristic
- Self-improving through retry logic
- Minimal code, maximum intelligence

### 🚀 **Production Considerations**

- **Chunking Strategy**: A 50-word overlap between 500-word chunks reduces the chance that context is lost at chunk boundaries
- **Retrieval Count**: Start with 4, increase to 6 on retry for better coverage
- **Temperature**: 0 for the yes/no retry decision, low (0.1) for factual answers, slightly higher (0.3) for query generation
- **Prompt Engineering**: Clear instructions ensure LLM stays grounded in documents
- **Error Handling**: Gracefully admits when information is insufficient
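
On the retrieval count point: the agent keeps the retry results whenever the second search returns more chunks than the first, which with 6 versus 4 requested results is nearly always the case. One alternative, sketched here as an assumption rather than the notebook's behavior, is to keep whichever result set has the closer best match. The helper name `pick_better_results` and the sample distances are hypothetical; the dictionaries use the same `documents` and `distances` keys that `retrieve_documents` returns.

```python
from typing import Any, Dict

def pick_better_results(first: Dict[str, Any], retry: Dict[str, Any]) -> Dict[str, Any]:
    """Prefer the result set whose closest chunk has the smaller distance.

    Falls back to the other set when one of them retrieved nothing.
    """
    if not retry["distances"]:
        return first
    if not first["distances"]:
        return retry
    return retry if min(retry["distances"]) < min(first["distances"]) else first

# Made-up result sets for illustration
first = {"documents": ["a", "b"], "distances": [0.9, 1.2]}
retry = {"documents": ["c", "d", "e"], "distances": [0.6, 1.0, 1.4]}

chosen = pick_better_results(first, retry)
print(chosen["documents"])  # ['c', 'd', 'e'] because 0.6 < 0.9
```

With this rule a retry that only returns more, but less relevant, chunks would not replace the original results.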

### 💡 **Applications**

This pattern extends to:
- **Customer Support**: Product documentation, troubleshooting guides
- **Legal/Compliance**: Policy documents, regulatory requirements
- **Healthcare**: Medical protocols, patient information
- **Education**: Course materials, study guides
- **Finance**: Product information, regulatory filings