# EduVerse USA Chatbot — Response Generation

## NLP Pipeline Module 6

This notebook demonstrates template-based response generation that grounds answers in retrieved passages and applies guardrails against hallucination.

### Key Principles
- Ground responses in retrieved passages
- Use intent and entities for personalization
- Prevent hallucination with guardrails
- Recommend official source verification

---
## 1. Setup

In [3]:
from dataclasses import dataclass
from typing import Dict, List, Optional
import textwrap

print("Setup complete")

Setup complete


---
## 2. Input Structure

In [5]:
@dataclass
class GenerationInput:
    """Inputs for response generation."""
    user_query: str
    intent: str
    confidence: float
    entities: Dict[str, List[str]]  # entity type -> extracted values
    passages: List[str]
    context: Optional[str] = None

example = GenerationInput(
    user_query="What GRE score do I need for Stanford CS?",
    intent="test_prep",
    confidence=0.92,
    entities={'universities': ['Stanford'], 'tests': ['GRE']},
    passages=["GRE: Verbal (130-170), Quant (130-170). Target 315+ for MS, 320+ for PhD."],
    context="Universities: Stanford | Programs: CS"
)

print("Example input created")

Example input created
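
The fields above come from upstream modules (intent classification, entity extraction, retrieval), so a light validation step can catch malformed inputs before generation. The sketch below is one possible approach; `validate_input` and its specific checks are illustrative assumptions, not part of the pipeline shown in this notebook.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GenerationInput:
    """Same fields as the dataclass defined above."""
    user_query: str
    intent: str
    confidence: float
    entities: Dict[str, List[str]]
    passages: List[str]
    context: Optional[str] = None


def validate_input(inp: GenerationInput) -> List[str]:
    """Return a list of problems; an empty list means the input looks usable.

    Hypothetical helper: the checks are assumptions about what the
    generator needs, not rules taken from the notebook.
    """
    problems = []
    if not inp.user_query.strip():
        problems.append("user_query is empty")
    if not 0.0 <= inp.confidence <= 1.0:
        problems.append(f"confidence {inp.confidence} is outside [0, 1]")
    for key, values in inp.entities.items():
        # The generator joins entity values, so each must be a list of strings
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            problems.append(f"entities[{key!r}] is not a list of strings")
    if not inp.passages:
        problems.append("no retrieved passages; the response cannot be grounded")
    return problems


# Deliberately malformed input: confidence out of range, entity value
# given as a bare string, and no passages.
bad = GenerationInput(
    user_query="What GRE score do I need?",
    intent="test_prep",
    confidence=1.4,
    entities={"tests": "GRE"},
    passages=[],
)
for problem in validate_input(bad):
    print(problem)
```

Returning a list of problems rather than raising lets the caller decide whether to abort or fall back to a clarifying question.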


---
## 3. Hallucination Guardrails

In [7]:
GUARDRAILS = [
    "Only state facts from retrieved passages.",
    "For specific deadlines/fees, recommend checking official sources.",
    "If information is missing, ask clarifying questions.",
    "Use hedging language ('typically', 'generally') for uncertainty.",
    "Never invent statistics or specific numbers."
]

print("Guardrails:")
for i, rule in enumerate(GUARDRAILS, 1):
    print(f"  {i}. {rule}")

Guardrails:
  1. Only state facts from retrieved passages.
  2. For specific deadlines/fees, recommend checking official sources.
  3. If information is missing, ask clarifying questions.
  4. Use hedging language ('typically', 'generally') for uncertainty.
  5. Never invent statistics or specific numbers.
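
The rules above are stated as text, so nothing enforces them yet. As a minimal sketch of how the fifth rule might be checked automatically, the code below flags any number in a response that does not appear in the retrieved passages. `find_ungrounded_numbers` and `NUMBER_PATTERN` are hypothetical names introduced for illustration, and plain string matching on digits is a simplifying assumption (it would not treat "3.50" and "3.5" as equal, for example).

```python
import re
from typing import List

# Matches integers and decimals such as "320", "3.8", "170".
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def find_ungrounded_numbers(response: str, passages: List[str]) -> List[str]:
    """Return numbers found in the response but in none of the passages."""
    grounded = set()
    for passage in passages:
        grounded.update(NUMBER_PATTERN.findall(passage))
    return sorted(set(NUMBER_PATTERN.findall(response)) - grounded)


passages = ["GRE: Verbal (130-170), Quant (130-170). Target 315+ for MS, 320+ for PhD."]

# Every number here comes from the passage, so nothing is flagged.
print(find_ungrounded_numbers("Aim for GRE 320+ for a PhD.", passages))

# "325" and "3.8" are not in any passage, so both are flagged.
print(find_ungrounded_numbers("Aim for GRE 325+ and a 3.8 GPA.", passages))
```

A non-empty result could be used to block the response or to strip the offending sentence before it reaches the user.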


---
## 4. Response Templates

In [9]:
TEMPLATES = {
    'admissions': {
        'greeting': "Here's what you need to know about admissions",
        'prompt': "Would you like more details about any requirement?"
    },
    'sop': {
        'greeting': "Let me help with your Statement of Purpose",
        'prompt': "Would you like me to review your SOP draft?"
    },
    'scholarships': {
        'greeting': "Here are funding options to consider",
        'prompt': "Want details about specific scholarships?"
    },
    'test_prep': {
        'greeting': "Here's guidance for test preparation",
        'prompt': "Would you like a study plan?"
    }
}

print(f"Templates defined for {len(TEMPLATES)} intents")

Templates defined for 4 intents
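
Each template is expected to provide both a `greeting` and a `prompt`. A small completeness check can catch a missing or blank key when new intents are added. This is a sketch under assumptions: `REQUIRED_KEYS` and `find_incomplete_templates` are hypothetical names, and the `visa` entry is a made-up intent used only to show a failing case.

```python
from typing import Dict, List

# Assumed to be the only keys a generator reads from each template.
REQUIRED_KEYS = ("greeting", "prompt")


def find_incomplete_templates(templates: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each intent to the required keys it is missing or leaves blank."""
    incomplete = {}
    for intent, template in templates.items():
        missing = [key for key in REQUIRED_KEYS if not template.get(key, "").strip()]
        if missing:
            incomplete[intent] = missing
    return incomplete


# Trimmed-down templates for illustration; 'visa' is hypothetical and lacks a prompt.
sample_templates = {
    "test_prep": {
        "greeting": "Here's guidance for test preparation",
        "prompt": "Would you like a study plan?",
    },
    "visa": {"greeting": "Here's an overview of the visa process"},
}

print(find_incomplete_templates(sample_templates))  # {'visa': ['prompt']}
```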


---
## 5. Response Generator

In [11]:
def generate_response(inp: GenerationInput) -> str:
    """Assemble a response from the intent template, entities, and retrieved passages."""
    parts = []
    template = TEMPLATES.get(inp.intent, {})
    
    # Greeting with entities
    greeting = template.get('greeting', 'Here is the information')
    if inp.entities.get('universities'):
        greeting += f" for {', '.join(inp.entities['universities'])}"
    parts.append(greeting + ":")
    parts.append("")
    
    # Retrieved content (at most the first two passages)
    if inp.passages:
        parts.append("**Key Information:**")
        for passage in inp.passages[:2]:
            parts.append(textwrap.fill(passage, width=70))
        parts.append("")
    
    # Intent-specific guidance
    if inp.intent == 'test_prep' and 'GRE' in inp.entities.get('tests', []):
        parts.append("**Recommendations:**")
        parts.append("• For competitive CS programs, aim for GRE 320+")
        parts.append("• Focus on quantitative section for STEM")
        parts.append("")
    
    # Follow-up prompt: italicized as a clarification request when
    # intent confidence falls below 0.7, plain otherwise
    if inp.confidence < 0.7:
        parts.append(f"*{template.get('prompt', 'Can you clarify?')}*")
    else:
        parts.append(template.get('prompt', ''))
    
    # Verification reminder
    parts.append("")
    parts.append("*Always verify official deadlines on university websites.*")
    
    return "\n".join(parts)
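
When `passages` is empty, `generate_response` still returns a greeting and a follow-up prompt, so the guardrail about asking clarifying questions when information is missing is not covered by the function itself. One possible way to handle that case is sketched below. `CLARIFYING_QUESTIONS` and `clarification_for` are hypothetical, and the question wording is an assumption rather than part of the pipeline.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from entity type to the question that asks for it.
CLARIFYING_QUESTIONS = {
    "universities": "Which university are you interested in?",
    "programs": "Which program or degree are you applying to?",
}


def clarification_for(entities: Dict[str, List[str]], passages: List[str]) -> Optional[str]:
    """Return a clarifying question when nothing was retrieved, else None."""
    if passages:
        return None  # there is content to ground a normal response in
    # Ask for the first entity type the user has not supplied yet
    for entity_type, question in CLARIFYING_QUESTIONS.items():
        if not entities.get(entity_type):
            return question
    return "Could you rephrase or add more detail to your question?"


print(clarification_for({}, []))
print(clarification_for({"universities": ["MIT"]}, []))
print(clarification_for({"universities": ["MIT"]}, ["Options: merit scholarships."]))
```

A caller could check this helper first and only fall through to `generate_response` when it returns `None`.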

---
## 6. Test Response

In [13]:
response = generate_response(example)

print("Generated Response:")
print("=" * 60)
print(response)

Generated Response:
Here's guidance for test preparation for Stanford:

**Key Information:**
GRE: Verbal (130-170), Quant (130-170). Target 315+ for MS, 320+ for
PhD.

**Recommendations:**
• For competitive CS programs, aim for GRE 320+
• Focus on quantitative section for STEM

Would you like a study plan?

*Always verify official deadlines on university websites.*


---
## 7. Multiple Test Cases

In [15]:
test_cases = [
    GenerationInput(
        user_query="How do I write a good SOP?",
        intent="sop",
        confidence=0.95,
        entities={'programs': ['MS in CS']},
        passages=["A strong SOP includes: motivation, academic background, experience, program fit, career goals."]
    ),
    GenerationInput(
        user_query="Scholarships at MIT?",
        intent="scholarships",
        confidence=0.88,
        entities={'universities': ['MIT']},
        passages=["Options: merit scholarships, TA/RA assistantships, fellowships, tuition waivers."]
    ),
    GenerationInput(
        user_query="What do I need to apply?",
        intent="admissions",
        confidence=0.65,  # Low confidence
        entities={},
        passages=["Requirements: transcripts, test scores, SOP, LORs, resume."]
    )
]

for i, test in enumerate(test_cases, 1):
    print(f"\n{'='*60}")
    print(f"Test {i}: {test.user_query}")
    print(f"Intent: {test.intent} ({test.confidence:.0%})")
    print(f"{'='*60}")
    print(generate_response(test))


Test 1: How do I write a good SOP?
Intent: sop (95%)
Let me help with your Statement of Purpose:

**Key Information:**
A strong SOP includes: motivation, academic background, experience,
program fit, career goals.

Would you like me to review your SOP draft?

*Always verify official deadlines on university websites.*

Test 2: Scholarships at MIT?
Intent: scholarships (88%)
Here are funding options to consider for MIT:

**Key Information:**
Options: merit scholarships, TA/RA assistantships, fellowships,
tuition waivers.

Want details about specific scholarships?

*Always verify official deadlines on university websites.*

Test 3: What do I need to apply?
Intent: admissions (65%)
Here's what you need to know about admissions:

**Key Information:**
Requirements: transcripts, test scores, SOP, LORs, resume.

*Would you like more details about any requirement?*

*Always verify official deadlines on university websites.*


---
## 8. Quality Metrics

In [17]:
def evaluate_response(response: str, inp: GenerationInput) -> Dict:
    """Compute simple heuristic quality checks for a generated response."""
    metrics = {}
    
    # Grounding check: the first 30 characters of at least one passage
    # must appear verbatim in the response (any() is False with no passages)
    metrics['grounded'] = any(
        p[:30].lower() in response.lower() for p in inp.passages
    )
    
    # Entity mention: true if any extracted entity value appears in the response
    response_lower = response.lower()
    metrics['entity_used'] = any(
        v.lower() in response_lower
        for vals in inp.entities.values()
        for v in vals
    )
    
    # Verification reminder
    metrics['has_verification'] = 'verify' in response.lower()
    
    # Length
    metrics['word_count'] = len(response.split())
    
    return metrics

metrics = evaluate_response(response, example)

print("Quality Metrics:")
for key, val in metrics.items():
    # Booleans render as check marks; other values (e.g. word_count) print as-is
    status = ("✓" if val else "✗") if isinstance(val, bool) else val
    print(f"  {key}: {status}")

Quality Metrics:
  grounded: ✓
  entity_used: ✓
  has_verification: ✓
  word_count: 51
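
The `grounded` metric above passes as soon as the first 30 characters of one passage appear in the response, even if the rest of the passage was dropped. A coverage score gives a finer-grained signal. The sketch below is an assumption-laden alternative, not the notebook's method: `passage_coverage` is a hypothetical name, and token overlap is only a rough proxy for grounding.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, ignoring punctuation and markup."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def passage_coverage(response: str, passages: List[str]) -> float:
    """Fraction of distinct passage tokens that also occur in the response."""
    passage_tokens: Set[str] = set()
    for passage in passages:
        passage_tokens |= _tokens(passage)
    if not passage_tokens:
        return 0.0  # nothing to be grounded in
    return len(passage_tokens & _tokens(response)) / len(passage_tokens)


passages = ["Requirements: transcripts, test scores, SOP, LORs, resume."]
full = "**Key Information:**\nRequirements: transcripts, test scores, SOP, LORs, resume."
partial = "Requirements: transcripts and a resume."

print(f"{passage_coverage(full, passages):.2f}")     # 1.00
print(f"{passage_coverage(partial, passages):.2f}")  # 0.43
```

Note that this measures how much of the passages the response reuses, not whether the response adds unsupported claims, so it complements rather than replaces a check on the response's own content.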


---
## 9. Summary

### Response Generation Features
- **Grounded generation** using retrieved passages
- **Intent-aware templates** for structure
- **Entity personalization** based on context
- **Hallucination mitigation** through passage grounding and verification reminders

### Key Takeaways
1. Template + retrieval = consistent, informative responses
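2. A confidence threshold (0.7 here) turns the follow-up prompt into a clarification request
3. Quality metrics help evaluate responses
4. Verification reminders point users to official sources, limiting the impact of outdated or incomplete passages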