# Task 3: RAG Qualitative Evaluation

This notebook runs 10 representative test questions through the RAG pipeline and generates an evaluation table for the final report.

**Prerequisites:**
- FAISS index built (`python src/index_vector_store.py`)
- Ollama running, with the Mistral model pulled (`ollama pull mistral:7b-instruct`)

In [1]:
import sys
sys.path.insert(0, '..')

from src.rag import RAGPipeline
from datetime import datetime
import json
from pathlib import Path
import pandas as pd

## Test Questions

10 representative questions covering:
- Single product queries
- Issue-specific queries
- Cross-product analysis
- Company response patterns

In [2]:
TEST_QUESTIONS = [
    {"question": "What are the main complaints about credit cards?", "product_filter": None},
    {"question": "Why are customers unhappy with personal loans?", "product_filter": "personal_loan"},
    {"question": "What issues do people have with money transfers?", "product_filter": "money_transfer"},
    {"question": "What are common problems with savings accounts?", "product_filter": "savings_account"},
    {"question": "What billing disputes are customers reporting?", "product_filter": None},
    {"question": "Are there complaints about unauthorized transactions or fraud?", "product_filter": None},
    {"question": "What problems do customers face when trying to close accounts?", "product_filter": None},
    {"question": "What are the most frequent types of complaints across all products?", "product_filter": None},
    {"question": "How do companies typically respond to customer complaints?", "product_filter": None},
    {"question": "What issues are related to fees and interest charges?", "product_filter": None},
]

## Initialize RAG Pipeline

In [3]:
print("Initializing RAG pipeline...")
rag = RAGPipeline()
print("Ready!")

Initializing RAG pipeline...
Ready!


## Run Evaluation

In [4]:
results = []

for i, q in enumerate(TEST_QUESTIONS, 1):
    print(f"\n{'='*70}")
    print(f"Question {i}: {q['question']}")
    if q['product_filter']:
        print(f"Filter: {q['product_filter']}")
    print("-" * 70)
    
    answer, sources = rag.answer(q['question'], product_filter=q['product_filter'])
    
    print(f"\nAnswer:\n{answer}")
    print(f"\nSources ({len(sources)}):")
    for j, src in enumerate(sources, 1):
        print(f"  {j}. [{src['product']}] {src['issue']}")
    
    results.append({
        "id": i,
        "question": q['question'],
        "filter": q['product_filter'],
        "answer": answer,
        "num_sources": len(sources),
        "products": list(set(s['product'] for s in sources)),
        "issues": [s['issue'] for s in sources],
        "source_texts": [s['text'][:200] for s in sources]
    })


Question 1: What are the main complaints about credit cards?
----------------------------------------------------------------------

Answer:
The main complaints about credit cards include issues with credit card applications being denied (Complaint 1), concerns about predatory terms and promotional offers (Complaint 2), disputes related to purchases shown on statements (Complaint 4), and problems with making payments (Complaint 5). Additionally, there are complaints about the resolution process for credit card issues, as mentioned in Complaint 3.

Sources (5):
  1. [credit_card] Getting a credit card
  2. [credit_card] Advertising and marketing, including promotional offers
  3. [credit_card] Other features, terms, or problems
  4. [credit_card] Problem with a purchase shown on your statement
  5. [credit_card] Problem when making payments

Question 2: Why are customers unhappy with personal loans?
Filter: personal_loan
-----------------------------------------------------------------

## Evaluation Summary Table

In [5]:
# Create summary DataFrame
summary_df = pd.DataFrame([
    {
        "#": r['id'],
        "Question": r['question'][:50] + "..." if len(r['question']) > 50 else r['question'],
        "Filter": r['filter'] or "All",
        "Sources": r['num_sources'],
        "Products": ", ".join(r['products'][:2])
    }
    for r in results
])

print("\nEVALUATION SUMMARY")
print("=" * 70)
summary_df


EVALUATION SUMMARY


|   | # | Question | Filter | Sources | Products |
|---|---|----------|--------|---------|----------|
| 0 | 1 | What are the main complaints about credit cards? | All | 5 | credit_card |
| 1 | 2 | Why are customers unhappy with personal loans? | personal_loan | 5 | personal_loan |
| 2 | 3 | What issues do people have with money transfers? | money_transfer | 5 | money_transfer |
| 3 | 4 | What are common problems with savings accounts? | savings_account | 5 | savings_account |
| 4 | 5 | What billing disputes are customers reporting? | All | 5 | credit_card, personal_loan |
| 5 | 6 | Are there complaints about unauthorized transa... | All | 5 | savings_account, money_transfer |
| 6 | 7 | What problems do customers face when trying to... | All | 5 | savings_account, credit_card |
| 7 | 8 | What are the most frequent types of complaints... | All | 5 | savings_account, money_transfer |
| 8 | 9 | How do companies typically respond to customer... | All | 5 | savings_account, credit_card |
| 9 | 10 | What issues are related to fees and interest c... | All | 5 | savings_account, credit_card |


## Save Results

In [6]:
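# Save to the reports folder (create it, and any missing parents, if needed)
reports_dir = Path('../reports')
reports_dir.mkdir(parents=True, exist_ok=True)

# Take a single timestamp so the filenames and the JSON metadata agree
run_time = datetime.now()
timestamp = run_time.strftime('%Y%m%d_%H%M%S')

# Save full results as JSON (UTF-8, non-ASCII characters kept readable)
json_path = reports_dir / f'evaluation_{timestamp}.json'
with open(json_path, 'w', encoding='utf-8') as f:
    json.dump({
        'timestamp': run_time.isoformat(),
        'model': 'mistral:7b-instruct',
        'embedding_model': 'paraphrase-MiniLM-L3-v2',
        'results': results
    }, f, indent=2, ensure_ascii=False)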
print(f"Saved: {json_path}")

# Save CSV summary
csv_path = reports_dir / f'evaluation_summary_{timestamp}.csv'
summary_df.to_csv(csv_path, index=False)
print(f"Saved: {csv_path}")

Saved: ../reports/evaluation_20251226_115115.json
Saved: ../reports/evaluation_summary_20251226_115115.csv


## Detailed Results for Report

In [7]:
# Generate Markdown table for report
print("\n## Evaluation Results (Markdown)\n")
print("| # | Question | Sources | Products | Quality |")
print("|---|----------|---------|----------|---------|")
for r in results:
    q_short = r['question'][:40] + "..." if len(r['question']) > 40 else r['question']
    products = ", ".join(r['products'][:2])
    # Quality score placeholder - to be filled manually
    print(f"| {r['id']} | {q_short} | {r['num_sources']} | {products} | ⭐⭐⭐ |")


## Evaluation Results (Markdown)

| # | Question | Sources | Products | Quality |
|---|----------|---------|----------|---------|
| 1 | What are the main complaints about credi... | 5 | credit_card | ⭐⭐⭐ |
| 2 | Why are customers unhappy with personal ... | 5 | personal_loan | ⭐⭐⭐ |
| 3 | What issues do people have with money tr... | 5 | money_transfer | ⭐⭐⭐ |
| 4 | What are common problems with savings ac... | 5 | savings_account | ⭐⭐⭐ |
| 5 | What billing disputes are customers repo... | 5 | credit_card, personal_loan | ⭐⭐⭐ |
| 6 | Are there complaints about unauthorized ... | 5 | savings_account, money_transfer | ⭐⭐⭐ |
| 7 | What problems do customers face when try... | 5 | savings_account, credit_card | ⭐⭐⭐ |
| 8 | What are the most frequent types of comp... | 5 | savings_account, money_transfer | ⭐⭐⭐ |
| 9 | How do companies typically respond to cu... | 5 | savings_account, credit_card | ⭐⭐⭐ |
| 10 | What issues are related to fees and inte... | 5 | savings_account, credit_card | ⭐⭐⭐ |
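
The star rating in the table above is a hard-coded placeholder, identical for every row. One possible way to fill it in, sketched here under assumptions, is to keep manual ratings in a dictionary keyed by question id and render them while building each row. The `MANUAL_SCORES` mapping, the `stars` and `quality_row` helpers, and the 1-5 scale are hypothetical and not part of the notebook; the scores shown are illustrative values, not actual ratings.

```python
# Hypothetical manual ratings on a 1-5 scale, keyed by question id.
# These numbers are illustrative only, not real evaluation scores.
MANUAL_SCORES = {1: 4, 2: 3}


def stars(score, max_score=5):
    """Render a score as stars, or 'TBD' when no score has been entered."""
    if score is None:
        return "TBD"
    if not 1 <= score <= max_score:
        raise ValueError(f"score must be between 1 and {max_score}, got {score}")
    return "⭐" * score


def quality_row(result, scores):
    """Build one Markdown table row from a dict shaped like an entry of `results`."""
    question = result["question"]
    q_short = question[:40] + "..." if len(question) > 40 else question
    products = ", ".join(result["products"][:2])
    quality = stars(scores.get(result["id"]))
    return f"| {result['id']} | {q_short} | {result['num_sources']} | {products} | {quality} |"


# Minimal stand-in for one entry of `results`.
sample = {
    "id": 1,
    "question": "What are the main complaints about credit cards?",
    "num_sources": 5,
    "products": ["credit_card"],
}
print(quality_row(sample, MANUAL_SCORES))
```

Rows without a manual score render as `TBD`, which makes unrated questions easy to spot in the report.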

## Sample Q&A for Report

In [8]:
# Show first 3 Q&A pairs in detail for the report
print("\n## Sample Questions & Answers\n")
for r in results[:3]:
    print(f"### Q{r['id']}: {r['question']}\n")
    print(f"**Answer:**\n{r['answer']}\n")
    print(f"**Sources:** {r['num_sources']} chunks from {', '.join(r['products'])}\n")
    print("---\n")


## Sample Questions & Answers

### Q1: What are the main complaints about credit cards?

**Answer:**
The main complaints about credit cards include issues with credit card applications being denied (Complaint 1), concerns about predatory terms and promotional offers (Complaint 2), disputes related to purchases shown on statements (Complaint 4), and problems with making payments (Complaint 5). Additionally, there are complaints about the resolution process for credit card issues, as mentioned in Complaint 3.

**Sources:** 5 chunks from credit_card

---

### Q2: Why are customers unhappy with personal loans?

**Answer:**
Customers are unhappy with CrediTrust Financial's personal loans due to several issues. Some complaints revolve around unclear communication regarding loan approval, taking out the loan or lease (Complaints 1, 2), and misleading information provided during the application process. Another common theme is problems with the payoff process at the end of the loan and a lack

## Observations

### Strengths
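- Retrieval returns five sources for every query, and the retrieved issue categories match the question topic (for Q1, all five sources are `credit_card` complaints even without a filter)
- Product filtering works as intended: the three filtered queries (Q2, Q3, Q4) returned sources only from the requested product
- The LLM synthesizes information from multiple sources (the Q1 answer cites all five retrieved complaints)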

### Limitations
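- Some retrieved chunks are duplicates (different chunks of the same complaint), which reduces the variety of evidence passed to the LLM
- L2 distance is sensitive to vector magnitude, so rankings can differ from cosine similarity unless the embeddings are normalized
- Answers are sometimes generic when the retrieved sources lack specifics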
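
The L2 limitation listed above can be illustrated with a small sketch in plain Python (no FAISS index or embedding model needed). It uses made-up 2-D toy vectors, not real embeddings, and the helper names are hypothetical. It shows that L2 can rank a less aligned vector ahead of a perfectly aligned but longer one, and that after unit normalization the squared L2 distance equals `2 - 2 * cosine`, so both metrics give the same ranking.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = norm(v)
    return [x / n for x in v]


# Toy vectors (illustrative values only).
query = [1.0, 0.0]
aligned_long = [5.0, 0.0]   # same direction as the query, large magnitude
tilted_short = [0.8, 0.6]   # different direction, unit magnitude

# Raw L2 prefers the tilted vector; cosine prefers the aligned one.
print("L2:    ", l2(query, aligned_long), l2(query, tilted_short))
print("cosine:", cosine(query, aligned_long), cosine(query, tilted_short))

# After normalization, L2 agrees with cosine: d^2 = 2 - 2 * cos.
d_aligned = l2(normalize(query), normalize(aligned_long))
d_tilted = l2(normalize(query), normalize(tilted_short))
print("normalized L2:", d_aligned, d_tilted)
```

With FAISS, a common way to get cosine ranking is to normalize the embeddings (for example with `faiss.normalize_L2`) and search an inner-product index such as `faiss.IndexFlatIP`. Whether this pipeline's index can be swapped that way is an assumption that would need checking against `src/index_vector_store.py`.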

### Future Improvements
- Add a reranker to improve the relevance ordering of retrieved chunks
- Deduplicate retrieved chunks by `complaint_id`
- Use cosine similarity instead of L2 distance
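
The deduplication idea above could look roughly like the following sketch. It assumes each retrieved source is a dict with a `complaint_id` key and that sources arrive ordered best match first. Neither assumption is confirmed by this notebook, which only shows the `product`, `issue`, and `text` fields. The function name and the toy data are hypothetical.

```python
def dedupe_by_complaint_id(sources, k=5):
    """Keep the first (highest-ranked) chunk per complaint, up to k results.

    Chunks without a complaint_id are kept, since they cannot be
    matched against anything.
    """
    seen = set()
    unique = []
    for src in sources:
        cid = src.get("complaint_id")
        if cid is not None:
            if cid in seen:
                continue  # a better-ranked chunk of this complaint is already kept
            seen.add(cid)
        unique.append(src)
        if len(unique) == k:
            break
    return unique


# Toy retrieval output (illustrative values only), ordered best match first.
retrieved = [
    {"complaint_id": "A", "text": "chunk 1 of complaint A"},
    {"complaint_id": "A", "text": "chunk 2 of complaint A"},
    {"complaint_id": "B", "text": "chunk 1 of complaint B"},
    {"complaint_id": "C", "text": "chunk 1 of complaint C"},
    {"complaint_id": "B", "text": "chunk 2 of complaint B"},
]

for src in dedupe_by_complaint_id(retrieved, k=3):
    print(src["complaint_id"], "-", src["text"])
```

Because deduplication shrinks the result list, the retriever would typically need to fetch more than `k` candidates (for example `3 * k`) before filtering, so that `k` distinct complaints still reach the prompt.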