[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.6/02-reasoning.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.6/02-reasoning.ipynb)

# 02 - Reasoning for Hate Speech Classification: Step-by-Step AI Analysis

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- **Chain-of-Thought (CoT) Prompting** for complex classification tasks
- **Step-by-step reasoning** in AI systems for hate speech detection
- **Context-aware analysis** of social media content
- **Explainable AI** approaches for content moderation
- **Policy-based reasoning** for transparent decision-making
- **Comparison** between direct classification and reasoning-based approaches

## 📋 Prerequisites
- Basic understanding of transformer models and NLP
- Familiarity with text classification concepts
- Knowledge of prompt engineering basics
- Understanding of content moderation challenges

## 📚 What We'll Cover
1. **Introduction**: Understanding reasoning in AI classification
2. **Chain-of-Thought Prompting**: Core concepts and implementation
3. **Hate Speech Detection**: Traditional vs reasoning approaches
4. **Step-by-Step Analysis**: Implementing reasoning pipelines
5. **Policy-Based Reasoning**: Transparent content moderation
6. **Evaluation & Comparison**: Performance analysis
7. **Production Considerations**: Real-world deployment insights
8. **Summary**: Best practices and key takeaways

## Why Reasoning Matters for Hate Speech Detection

**Traditional hate speech classifiers** often struggle with:
- **Implicit hate speech**: Sarcasm, coded language, cultural references
- **Context dependency**: Same words can be hateful or harmless based on context
- **Evolving language**: New slang, euphemisms, and dog whistles
- **False positives**: Misclassifying legitimate discussions about hate speech

**Reasoning-based approaches** address these by:
- 🧠 **Analyzing intent**: Understanding the purpose behind the message
- 🎯 **Identifying targets**: Recognizing who or what is being targeted
- 📝 **Explaining decisions**: Providing transparent justification
- 📋 **Policy alignment**: Matching decisions to specific platform policies

### The Mathematics of Reasoning

Traditional classification: $P(\text{hate}|\text{text}) = \text{classifier}(\text{text})$

Reasoning-based classification:
$$P(\text{hate}|\text{text}, \text{reasoning}) = \text{LLM}(\text{text} + \text{reasoning\_prompt})$$

Here $+$ denotes concatenating the post with the prompt, and the reasoning prompt guides the model through:
1. **Target identification**: Who/what is being discussed?
2. **Intent analysis**: What is the speaker trying to achieve?
3. **Policy matching**: Does this violate specific guidelines?
4. **Final classification**: Based on the reasoning chain
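
The four stages above can be sketched as a small prompt builder. This is a minimal illustration, not the template used later in the notebook: `REASONING_STEPS` and `build_reasoning_prompt` are hypothetical names, and the exact question wording is an assumption.

```python
from typing import List, Tuple

# Hypothetical step list mirroring the four stages described above.
REASONING_STEPS: List[Tuple[str, str]] = [
    ("Target identification", "Who or what is being discussed?"),
    ("Intent analysis", "What is the speaker trying to achieve?"),
    ("Policy matching", "Does this violate specific guidelines?"),
    ("Final classification", "Given the answers above, is this HATE_SPEECH or NOT_HATE_SPEECH?"),
]


def build_reasoning_prompt(text: str) -> str:
    """Concatenate the post with numbered reasoning questions (text + reasoning_prompt)."""
    lines = [f'POST: "{text}"', "", "Answer each step in order:"]
    for number, (name, question) in enumerate(REASONING_STEPS, start=1):
        lines.append(f"Step {number} - {name}: {question}")
    return "\n".join(lines)


prompt = build_reasoning_prompt("The weather is nice today.")
print(prompt)
```

The resulting string is what would be passed to a text-generation model in place of the bare post.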

## Setup and Installation

In [None]:
# Install required packages (uncomment if needed)
# !pip install transformers torch datasets numpy pandas matplotlib seaborn tqdm

# Import essential libraries
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time
import warnings
import json
import re
from typing import Any, List, Dict, Optional, Union, Tuple
from collections import Counter
from dataclasses import dataclass

# Hugging Face imports
from transformers import (
    pipeline, 
    AutoTokenizer, 
    AutoModelForCausalLM,
    AutoConfig,
    GenerationConfig
)

warnings.filterwarnings('ignore')

# Configure plotting style
plt.style.use('default')
sns.set_palette("husl")

print("📚 Libraries imported successfully!")

In [None]:
def get_device() -> torch.device:
    """
    Automatically detect and return the best available device.
    
    Priority: CUDA > MPS (Apple Silicon) > CPU
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        device = torch.device("mps")
        print("🍎 Using Apple MPS for Apple Silicon optimization")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU - consider GPU for better performance")
    
    return device

# Get optimal device
device = get_device()

print("\n📚 Setup completed successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"Device: {device}")

## Chain-of-Thought Reasoning Implementation

Let's implement a reasoning system that guides AI models through step-by-step analysis for hate speech detection.

In [None]:
@dataclass
class ReasoningStep:
    """Represents a single step in the reasoning chain."""
    step_number: int
    question: str
    analysis: str
    conclusion: str

@dataclass 
class ReasoningResult:
    """Complete reasoning chain result."""
    text: str
    steps: List[ReasoningStep]
    final_classification: str
    confidence: float
    explanation: str

class HateSpeechReasoner:
    """
    A reasoning system for hate speech classification using Chain-of-Thought prompting.
    
    This class demonstrates how to implement step-by-step reasoning for complex 
    classification tasks, providing transparent and explainable AI decisions.
    """
    
    def __init__(self, model_name: str = "microsoft/DialoGPT-medium"):
        """
        Initialize the reasoning system.
        
        Args:
            model_name: HuggingFace model for text generation
        """
        self.model_name = model_name
        self.policy = self._load_hate_speech_policy()
        self.reasoning_template = self._create_reasoning_template()
        
        print(f"🧠 HateSpeechReasoner initialized")
        print(f"📋 Loaded policy with {len(self.policy)} key principles")
    
    def _load_hate_speech_policy(self) -> Dict[str, str]:
        """
        Define hate speech policy for reasoning.
        
        In production, this would be loaded from external policy documents.
        """
        return {
            "definition": "Content that attacks, threatens, or promotes violence against individuals or groups based on protected characteristics",
            "protected_characteristics": [
                "race", "ethnicity", "religion", "gender", "sexual orientation", 
                "disability", "age", "nationality", "political affiliation"
            ],
            "violations": {
                "direct_attack": "Explicit insults, slurs, or derogatory language",
                "dehumanization": "Comparing groups to animals, objects, or diseases",
                "threat": "Encouraging violence or harm against individuals/groups",
                "stereotype": "Harmful generalizations that perpetuate discrimination"
            },
            "exceptions": [
                "Educational discussion about hate speech",
                "Reporting on hate speech incidents",
                "Academic research on discrimination",
                "Self-identification by group members"
            ]
        }
    
    def _create_reasoning_template(self) -> str:
        """
        Create a Chain-of-Thought reasoning template.
        
        This template guides the AI through systematic analysis.
        """
        return """
Analyze the following social media post for hate speech using step-by-step reasoning:

POST: "{text}"

REASONING STEPS:

Step 1: TARGET IDENTIFICATION
Question: Who or what group is being discussed or referenced in this post?
Analysis: [Identify any individuals, groups, or characteristics mentioned]
Conclusion: [State the target(s) clearly]

Step 2: INTENT ANALYSIS  
Question: What is the apparent intent or purpose of this message?
Analysis: [Examine tone, context clues, and linguistic patterns]
Conclusion: [Categorize the intent: informative, critical, hostile, etc.]

Step 3: LANGUAGE EVALUATION
Question: Does the language used contain slurs, derogatory terms, or coded hate speech?
Analysis: [Check for explicit and implicit harmful language]
Conclusion: [Assessment of language harmfulness]

Step 4: POLICY VIOLATION CHECK
Question: Does this content violate hate speech policies?
Analysis: [Compare against policy definitions and violation types]
Conclusion: [Specific policy violation or compliance]

Step 5: FINAL CLASSIFICATION
Based on the above analysis:
Classification: [HATE_SPEECH or NOT_HATE_SPEECH]
Confidence: [High/Medium/Low]
Explanation: [Brief summary of reasoning]
        """.strip()
    
    def analyze_with_reasoning(self, text: str, use_mock: bool = True) -> ReasoningResult:
        """
        Perform reasoning-based hate speech analysis.
        
        Args:
            text: Social media post to analyze
            use_mock: Whether to use mock reasoning (for demonstration)
            
        Returns:
            ReasoningResult with complete analysis chain
        """
        if use_mock:
            return self._mock_reasoning_analysis(text)
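        else:
            # LLM-backed analysis is not implemented in this notebook;
            # fail with a clear message instead of an AttributeError.
            raise NotImplementedError(
                "LLM-based reasoning is not implemented here; call with use_mock=True."
            )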
    
    def _mock_reasoning_analysis(self, text: str) -> ReasoningResult:
        """
        Demonstrate reasoning analysis with educational examples.
        
        This shows how the reasoning process works without requiring
        large language models that may not be available in all environments.
        """
        # Analyze text characteristics for demonstration
        text_lower = text.lower()
        
        # Step 1: Target Identification
        protected_groups = ["muslim", "christian", "jewish", "black", "white", "asian", 
                          "hispanic", "latino", "gay", "lesbian", "trans", "women", "men"]
        
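        # Match whole words (with an optional plural "s") so that, for example,
        # "men" is not detected inside "unemployment"
        identified_targets = [group for group in protected_groups
                              if re.search(rf"\b{re.escape(group)}s?\b", text_lower)]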
        
        step1 = ReasoningStep(
            step_number=1,
            question="Who or what group is being discussed?",
            analysis=f"Scanning for mentions of protected groups: {', '.join(identified_targets) if identified_targets else 'none detected'}",
            conclusion=f"Target(s): {', '.join(identified_targets) if identified_targets else 'No specific protected group targeted'}"
        )
        
        # Step 2: Intent Analysis
        negative_indicators = ["hate", "stupid", "inferior", "disgusting", "should die", "kill", "destroy"]
        positive_indicators = ["love", "support", "respect", "celebrate", "appreciate"]
        
        negative_count = sum(1 for word in negative_indicators if word in text_lower)
        positive_count = sum(1 for word in positive_indicators if word in text_lower)
        
        if negative_count > positive_count:
            intent = "hostile or negative"
        elif positive_count > negative_count:
            intent = "supportive or positive"
        else:
            intent = "neutral or unclear"
        
        step2 = ReasoningStep(
            step_number=2,
            question="What is the apparent intent?",
            analysis=f"Negative indicators: {negative_count}, Positive indicators: {positive_count}",
            conclusion=f"Intent appears to be: {intent}"
        )
        
        # Step 3: Language Evaluation
        slurs = ["faggot", "retard", "nigger", "kike", "chink"]  # Educational examples only
        derogatory = ["scum", "vermin", "animals", "trash", "filth"]
        
        contains_slurs = any(slur in text_lower for slur in slurs)
        contains_derogatory = any(word in text_lower for word in derogatory)
        
        if contains_slurs:
            language_assessment = "Contains explicit slurs"
        elif contains_derogatory:
            language_assessment = "Contains derogatory language"
        else:
            language_assessment = "No explicit harmful language detected"
        
        step3 = ReasoningStep(
            step_number=3,
            question="Does the language contain harmful terms?",
            analysis="Checking for slurs and derogatory terms",
            conclusion=language_assessment
        )
        
        # Step 4: Policy Violation Check
        violation_score = 0
        violations = []
        
        if identified_targets and negative_count > 0:
            violation_score += 2
            violations.append("Negative targeting of protected group")
        
        if contains_slurs:
            violation_score += 3
            violations.append("Use of explicit slurs")
        
        if contains_derogatory and identified_targets:
            violation_score += 2
            violations.append("Derogatory language toward protected group")
        
        policy_violation = "Yes" if violation_score >= 2 else "No"
        
        step4 = ReasoningStep(
            step_number=4,
            question="Does this violate hate speech policy?",
            # Maximum possible score is 2 + 3 + 2 = 7
            analysis=f"Violation score: {violation_score}/7, Issues: {', '.join(violations) if violations else 'none'}",
            conclusion=f"Policy violation: {policy_violation}"
        )
        
        # Step 5: Final Classification
        is_hate_speech = violation_score >= 2
        classification = "HATE_SPEECH" if is_hate_speech else "NOT_HATE_SPEECH"
        
        if violation_score >= 4:
            confidence = 0.9
        elif violation_score >= 2:
            confidence = 0.7
        else:
            confidence = 0.8
        
        explanation = f"Content classified as {classification} based on {', '.join(violations) if violations else 'absence of policy violations'}"
        
        return ReasoningResult(
            text=text,
            steps=[step1, step2, step3, step4],
            final_classification=classification,
            confidence=confidence,
            explanation=explanation
        )

# Initialize the reasoning system
reasoner = HateSpeechReasoner()
print("✅ Reasoning system ready for analysis!")
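
The mock above sidesteps a step that any model-backed version would need: turning the model's free-text answer back into structured fields. The sketch below parses the Step 5 block of the reasoning template with regular expressions. `parse_final_classification` and `CONFIDENCE_MAP` are hypothetical names, and the numeric values assigned to High/Medium/Low are assumptions chosen for illustration.

```python
import re
from typing import Dict, Optional

# Assumed mapping from the template's High/Medium/Low labels to numbers.
CONFIDENCE_MAP = {"high": 0.9, "medium": 0.7, "low": 0.5}


def parse_final_classification(output: str) -> Dict[str, Optional[object]]:
    """Extract the Step 5 fields from text that follows the reasoning template."""

    def field(name: str) -> Optional[str]:
        # Look for "Name: value" at the start of a line, case-insensitively.
        match = re.search(rf"^{name}:\s*(.+)$", output, flags=re.MULTILINE | re.IGNORECASE)
        return match.group(1).strip() if match else None

    label = field("Classification")
    if label is not None:
        label = label.upper().strip("[]")
        if label not in {"HATE_SPEECH", "NOT_HATE_SPEECH"}:
            label = None  # Unrecognised label: leave the decision to the caller.

    confidence_text = field("Confidence")
    confidence = (
        CONFIDENCE_MAP.get(confidence_text.lower().strip("[]")) if confidence_text else None
    )

    return {
        "classification": label,
        "confidence": confidence,
        "explanation": field("Explanation"),
    }


# Hand-written sample text in the template's format (not real model output).
sample_output = """Step 5: FINAL CLASSIFICATION
Classification: NOT_HATE_SPEECH
Confidence: High
Explanation: The post describes the weather and targets no group."""

parsed = parse_final_classification(sample_output)
print(parsed)
```

Returning `None` for missing or malformed fields keeps the failure visible, so a caller can retry the generation or fall back to human review rather than trusting a guessed label.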

## Demonstrating Reasoning with Social Media Examples

Let's analyze various types of social media posts to see how reasoning helps with accurate classification.

In [None]:
# Create diverse test cases representing different scenarios
test_cases = [
    {
        "text": "I love spending time with my diverse group of friends!",
        "category": "Positive/Supportive",
        "expected": "NOT_HATE_SPEECH"
    },
    {
        "text": "These immigrants are destroying our country and culture.",
        "category": "Implicit Hate",
        "expected": "HATE_SPEECH"
    },
    {
        "text": "The statistics show unemployment rates vary across different demographics.",
        "category": "Factual/Neutral",
        "expected": "NOT_HATE_SPEECH"
    },
    {
        "text": "Muslims are all terrorists and should be banned from our country.",
        "category": "Explicit Hate",
        "expected": "HATE_SPEECH"
    },
    {
        "text": "I'm concerned about the rise in hate speech incidents on campus.",
        "category": "Discussion About Hate Speech",
        "expected": "NOT_HATE_SPEECH"
    },
    {
        "text": "Women are just naturally worse at math and science than men.",
        "category": "Stereotyping",
        "expected": "HATE_SPEECH"
    }
]

print(f"📝 Prepared {len(test_cases)} diverse test cases for reasoning analysis")
print("\n📋 Test Case Categories:")
for i, case in enumerate(test_cases, 1):
    print(f"  {i}. {case['category']}: '{case['text'][:50]}{'...' if len(case['text']) > 50 else ''}'")

In [None]:
def display_reasoning_analysis(result: ReasoningResult, case_num: int):
    """
    Display the complete reasoning chain in a clear, educational format.
    """
    print(f"\n{'='*80}")
    print(f"🔍 CASE {case_num}: REASONING ANALYSIS")
    print(f"{'='*80}")
    
    print(f"📝 **Original Text**: '{result.text}'")
    print(f"\n🧠 **STEP-BY-STEP REASONING:**\n")
    
    for step in result.steps:
        print(f"**Step {step.step_number}**: {step.question}")
        print(f"   📊 Analysis: {step.analysis}")
        print(f"   💡 Conclusion: {step.conclusion}\n")
    
    # Final classification with visual indicators
    if result.final_classification == "HATE_SPEECH":
        emoji = "🚨"
        color_desc = "RED FLAG"
    else:
        emoji = "✅"
        color_desc = "SAFE"
    
    print(f"🎯 **FINAL CLASSIFICATION**: {emoji} {result.final_classification} ({color_desc})")
    print(f"🎯 **Confidence Level**: {result.confidence:.1%}")
    print(f"📝 **Explanation**: {result.explanation}")

# Analyze each test case with detailed reasoning
results = []

for i, case in enumerate(test_cases, 1):
    print(f"\n🔄 Analyzing Case {i}: {case['category']}...")
    
    # Perform reasoning analysis
    result = reasoner.analyze_with_reasoning(case['text'])
    results.append({
        'case': case,
        'result': result,
        'correct': result.final_classification == case['expected']
    })
    
    # Display the reasoning chain
    display_reasoning_analysis(result, i)
    
    # Show accuracy check
    accuracy_emoji = "✅" if result.final_classification == case['expected'] else "❌"
    print(f"\n{accuracy_emoji} **Expected**: {case['expected']} | **Got**: {result.final_classification}")

print(f"\n\n🎯 **ANALYSIS COMPLETE**: Processed {len(test_cases)} cases with reasoning chains")

## Performance Analysis: Reasoning vs Traditional Classification

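Let's measure how the mock reasoner performs on the six test cases and compare it with a traditional classifier. The mock relies on keyword heuristics and the traditional-classifier figures below are simulated placeholders, so treat the charts as illustrations rather than benchmark results.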

In [None]:
# Performance analysis and comparison
def analyze_performance(results: List[Dict]):
    """
    Analyze the performance of the reasoning-based approach.
    """
    total_cases = len(results)
    correct_predictions = sum(1 for r in results if r['correct'])
    accuracy = correct_predictions / total_cases
    
    # Categorize results by type
    categories = {}
    for r in results:
        category = r['case']['category']
        if category not in categories:
            categories[category] = {'correct': 0, 'total': 0}
        categories[category]['total'] += 1
        if r['correct']:
            categories[category]['correct'] += 1
    
    print("📊 **PERFORMANCE ANALYSIS**")
    print(f"{'='*50}")
    print(f"🎯 **Overall Accuracy**: {accuracy:.1%} ({correct_predictions}/{total_cases})")
    print(f"\n📋 **Performance by Category**:")
    
    for category, stats in categories.items():
        cat_accuracy = stats['correct'] / stats['total']
        print(f"   • {category}: {cat_accuracy:.1%} ({stats['correct']}/{stats['total']})")
    
    return accuracy, categories

accuracy, category_performance = analyze_performance(results)
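
Accuracy alone does not distinguish false positives (legitimate posts flagged) from false negatives (hate speech missed), which matter differently in moderation. As a minimal sketch, precision, recall, and F1 for the `HATE_SPEECH` class can be computed from expected and predicted labels. `binary_metrics` is a hypothetical helper, and the labels below are toy values, not results from this notebook.

```python
from typing import Dict, List


def binary_metrics(expected: List[str], predicted: List[str],
                   positive: str = "HATE_SPEECH") -> Dict[str, float]:
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == positive and p == positive)
    fp = sum(1 for e, p in pairs if e != positive and p == positive)
    fn = sum(1 for e, p in pairs if e == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
toy_expected = ["HATE_SPEECH", "HATE_SPEECH", "NOT_HATE_SPEECH", "NOT_HATE_SPEECH"]
toy_predicted = ["HATE_SPEECH", "NOT_HATE_SPEECH", "NOT_HATE_SPEECH", "HATE_SPEECH"]

toy_metrics = binary_metrics(toy_expected, toy_predicted)
print(toy_metrics)
```

With the `results` list built earlier, the same function could be called on `[r['case']['expected'] for r in results]` and `[r['result'].final_classification for r in results]`.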

In [None]:
# Visualize reasoning benefits
def create_comparison_visualization():
    """
    Create visualizations showing the benefits of reasoning approaches.
    """
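    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
    # The radar chart needs polar axes, so swap the fourth Cartesian panel for one
    ax4.remove()
    ax4 = fig.add_subplot(2, 2, 4, projection='polar')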
    
    # 1. Accuracy Comparison (Simulated)
    methods = ['Traditional\nClassifier', 'Reasoning-Based\nClassifier']
    accuracies = [0.72, accuracy]  # Simulated traditional accuracy
    colors = ['lightcoral', 'lightgreen']
    
    bars1 = ax1.bar(methods, accuracies, color=colors, alpha=0.8)
    ax1.set_ylabel('Accuracy')
    ax1.set_title('🎯 Classification Accuracy Comparison')
    ax1.set_ylim(0, 1)
    
    # Add value labels on bars
    for bar, acc in zip(bars1, accuracies):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                f'{acc:.1%}', ha='center', va='bottom', fontweight='bold')
    
    # 2. Confidence Distribution
    confidences = [r['result'].confidence for r in results]
    ax2.hist(confidences, bins=5, color='skyblue', alpha=0.7, edgecolor='black')
    ax2.set_xlabel('Confidence Level')
    ax2.set_ylabel('Number of Cases')
    ax2.set_title('📊 Confidence Distribution')
    ax2.axvline(np.mean(confidences), color='red', linestyle='--', 
                label=f'Mean: {np.mean(confidences):.1%}')
    ax2.legend()
    
    # 3. Category Performance
    categories = list(category_performance.keys())
    cat_accuracies = [category_performance[cat]['correct']/category_performance[cat]['total'] 
                     for cat in categories]
    
    bars3 = ax3.barh(categories, cat_accuracies, color='lightblue', alpha=0.8)
    ax3.set_xlabel('Accuracy')
    ax3.set_title('📋 Performance by Content Category')
    ax3.set_xlim(0, 1.1)
    
    # Add value labels
    for i, (bar, acc) in enumerate(zip(bars3, cat_accuracies)):
        ax3.text(acc + 0.02, bar.get_y() + bar.get_height()/2,
                f'{acc:.1%}', va='center', fontweight='bold')
    
    # 4. Benefits Radar Chart (Simulated)
    attributes = ['Accuracy', 'Explainability', 'Context\nAwareness', 
                 'Policy\nAlignment', 'Bias\nReduction']
    traditional_scores = [0.72, 0.2, 0.4, 0.3, 0.5]
    reasoning_scores = [accuracy, 0.95, 0.9, 0.85, 0.8]
    
    # Convert to angles for radar chart
    angles = np.linspace(0, 2*np.pi, len(attributes), endpoint=False).tolist()
    angles += angles[:1]  # Complete the circle
    
    traditional_scores += traditional_scores[:1]
    reasoning_scores += reasoning_scores[:1]
    
    ax4.plot(angles, traditional_scores, 'o-', linewidth=2, label='Traditional', color='red')
    ax4.fill(angles, traditional_scores, alpha=0.25, color='red')
    ax4.plot(angles, reasoning_scores, 'o-', linewidth=2, label='Reasoning-Based', color='green')
    ax4.fill(angles, reasoning_scores, alpha=0.25, color='green')
    
    ax4.set_xticks(angles[:-1])
    ax4.set_xticklabels(attributes)
    ax4.set_ylim(0, 1)
    ax4.set_title('🎭 Comprehensive Method Comparison')
    ax4.legend()
    ax4.grid(True)
    
    plt.tight_layout()
    plt.show()

create_comparison_visualization()

## Advanced Reasoning Techniques

Let's explore more sophisticated reasoning patterns for complex edge cases.

In [None]:
class AdvancedReasoner(HateSpeechReasoner):
    """
    Extended reasoning system with advanced techniques for edge cases.
    
    This class demonstrates more sophisticated reasoning patterns including:
    - Context-aware analysis
    - Sarcasm detection
    - Cultural sensitivity
    - Intent disambiguation
    """
    
    def __init__(self):
        super().__init__()
        self.context_clues = self._load_context_patterns()
        print("🧠✨ Advanced reasoning system initialized!")
    
    def _load_context_patterns(self) -> Dict[str, List[str]]:
        """
        Load patterns for advanced context understanding.
        """
        return {
            "sarcasm_indicators": [
                "oh sure", "yeah right", "totally", "obviously", 
                "real smart", "great job", "brilliant"
            ],
            "discussion_contexts": [
                "research shows", "study indicates", "according to",
                "discussing", "analyzing", "reporting on"
            ],
            "self_reference": [
                "as a", "speaking as", "being", "i am", "we are"
            ]
        }
    
    def analyze_context_awareness(self, text: str) -> Dict[str, Any]:
        """
        Perform advanced context-aware analysis.
        
        This demonstrates how reasoning can handle complex linguistic patterns.
        """
        text_lower = text.lower()
        
        # Detect potential sarcasm
        sarcasm_detected = any(indicator in text_lower 
                             for indicator in self.context_clues["sarcasm_indicators"])
        
        # Detect academic/discussion context
        discussion_context = any(phrase in text_lower 
                               for phrase in self.context_clues["discussion_contexts"])
        
        # Detect self-reference (group member speaking)
        self_reference = any(phrase in text_lower 
                           for phrase in self.context_clues["self_reference"])
        
        # Analyze sentence structure for questions vs statements
        is_question = text.strip().endswith('?')
        
        # Check for quoted speech or references to others' speech.
        # A bare apostrophe (as in "it's") is not treated as a quotation.
        has_quotes = bool(re.search(r'"[^"]+"', text)) or "said" in text_lower
        
        return {
            "sarcasm_detected": sarcasm_detected,
            "discussion_context": discussion_context,
            "self_reference": self_reference,
            "is_question": is_question,
            "has_quotes": has_quotes,
            "context_score": sum([discussion_context, self_reference, is_question, has_quotes])
        }

# Test advanced reasoning with edge cases
advanced_reasoner = AdvancedReasoner()

edge_cases = [
    {
        "text": "Oh sure, because ALL politicians are totally honest, right?",
        "challenge": "Sarcasm detection"
    },
    {
        "text": "As a Muslim, I find it frustrating when people assume things about my faith.",
        "challenge": "Self-identification vs attack"
    },
    {
        "text": "Research shows that bias affects hiring decisions across demographics.",
        "challenge": "Academic discussion"
    },
    {
        "text": "Why do some people think it's okay to use slurs in comedy?",
        "challenge": "Question about hate speech"
    }
]

print("🎯 **ADVANCED REASONING DEMONSTRATION**\n")

for i, case in enumerate(edge_cases, 1):
    print(f"**Case {i}: {case['challenge']}**")
    print(f"Text: '{case['text']}'")
    
    # Perform context analysis
    context_analysis = advanced_reasoner.analyze_context_awareness(case['text'])
    
    print(f"📊 Context Analysis:")
    for key, value in context_analysis.items():
        if isinstance(value, bool):
            emoji = "✅" if value else "❌"
            print(f"   {emoji} {key.replace('_', ' ').title()}: {value}")
        else:
            print(f"   📈 {key.replace('_', ' ').title()}: {value}")
    
    # Get full reasoning analysis
    reasoning_result = advanced_reasoner.analyze_with_reasoning(case['text'])
    print(f"🎯 Classification: {reasoning_result.final_classification}")
    print(f"💡 Key insight: {reasoning_result.explanation}\n")
    print("-" * 60 + "\n")
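
In the demonstration above, the context signals are printed but do not influence the classification. One possible way to use them, sketched here under stated assumptions, is to route a decision to human review when the signals suggest a policy exception. `route_decision` and its rules are hypothetical; the dictionary keys match those returned by `analyze_context_awareness`.

```python
from typing import Dict


def route_decision(classification: str, context: Dict[str, object]) -> str:
    """Return 'auto' or 'human_review' based on context signals (hypothetical rule)."""
    # Discussion or self-identification may fall under the policy's exceptions.
    exception_signal = bool(context.get("discussion_context")) or bool(context.get("self_reference"))
    if classification == "HATE_SPEECH" and exception_signal:
        return "human_review"
    # Sarcasm can invert the literal meaning, so the label is not trusted either way.
    if context.get("sarcasm_detected"):
        return "human_review"
    return "auto"


# Toy inputs for illustration only.
flagged_self_reference = route_decision(
    "HATE_SPEECH", {"self_reference": True, "sarcasm_detected": False}
)
plain_post = route_decision(
    "NOT_HATE_SPEECH", {"self_reference": False, "sarcasm_detected": False}
)
print(flagged_self_reference, plain_post)
```

Routing to review rather than flipping the label keeps the keyword heuristics from silently overriding a classification they are too coarse to justify.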

## Production Considerations for Reasoning Systems

Let's discuss real-world deployment considerations for reasoning-based hate speech detection.

In [None]:
class ProductionReasoningSystem:
    """
    Production-ready reasoning system with performance monitoring,
    caching, and scalability considerations.
    """
    
    def __init__(self):
        self.cache = {}
        self.performance_metrics = {
            'total_requests': 0,
            'cache_hits': 0,
            'average_response_time': 0,
            'classification_counts': {'HATE_SPEECH': 0, 'NOT_HATE_SPEECH': 0}
        }
        self.response_times = []
    
    def classify_with_monitoring(self, text: str) -> Dict[str, Any]:
        """
        Classify text with performance monitoring and caching.
        """
        start_time = time.time()
        self.performance_metrics['total_requests'] += 1
        
        # Check cache first
        text_hash = hash(text)
        if text_hash in self.cache:
            self.performance_metrics['cache_hits'] += 1
            # Copy the cached entry so per-request fields do not mutate the cache
            result = {**self.cache[text_hash], 'from_cache': True}
        else:
            # Perform reasoning (simulated)
            reasoner = HateSpeechReasoner()
            reasoning_result = reasoner.analyze_with_reasoning(text)
            
            result = {
                'classification': reasoning_result.final_classification,
                'confidence': reasoning_result.confidence,
                'explanation': reasoning_result.explanation,
                'reasoning_steps': len(reasoning_result.steps),
                'from_cache': False
            }
            
            # Cache a copy so later changes to `result` stay out of the cache
            self.cache[text_hash] = dict(result)
        
        # Update metrics
        response_time = time.time() - start_time
        self.response_times.append(response_time)
        self.performance_metrics['average_response_time'] = np.mean(self.response_times)
        self.performance_metrics['classification_counts'][result['classification']] += 1
        
        result['response_time'] = response_time
        return result
    
    def get_performance_report(self) -> Dict[str, Any]:
        """
        Generate comprehensive performance report.
        """
        cache_hit_rate = (self.performance_metrics['cache_hits'] / 
                         max(1, self.performance_metrics['total_requests']))
        
        return {
            'total_requests': self.performance_metrics['total_requests'],
            'cache_hit_rate': cache_hit_rate,
            'average_response_time_ms': self.performance_metrics['average_response_time'] * 1000,
            'classification_distribution': self.performance_metrics['classification_counts'],
            'cache_size': len(self.cache)
        }

# Demonstrate production system
print("🏭 **PRODUCTION REASONING SYSTEM DEMO**\n")

production_system = ProductionReasoningSystem()

# Simulate production load
test_texts = [
    "This is a normal friendly message.",
    "I hate all people from that country.",
    "The weather is nice today.",
    "This is a normal friendly message.",  # Duplicate for cache test
    "Research shows demographic differences in voting patterns.",
    "I hate all people from that country.",  # Duplicate
]

print("📊 Processing production requests...\n")

for i, text in enumerate(test_texts, 1):
    result = production_system.classify_with_monitoring(text)
    cache_status = "🎯 CACHED" if result['from_cache'] else "🔄 COMPUTED"
    
    print(f"Request {i}: {cache_status}")
    print(f"   Text: '{text[:50]}{'...' if len(text) > 50 else ''}'")
    print(f"   Result: {result['classification']} ({result['confidence']:.1%})")
    print(f"   Response time: {result['response_time']*1000:.1f}ms")
    print()

# Performance report
report = production_system.get_performance_report()
print("📈 **PERFORMANCE REPORT**")
print("=" * 40)
print(f"📊 Total Requests: {report['total_requests']}")
print(f"🎯 Cache Hit Rate: {report['cache_hit_rate']:.1%}")
print(f"⚡ Avg Response Time: {report['average_response_time_ms']:.1f}ms")
print(f"🗂️  Cache Size: {report['cache_size']} entries")
print(f"\n📋 Classification Distribution:")
for classification, count in report['classification_distribution'].items():
    percentage = count / report['total_requests'] * 100
    print(f"   • {classification}: {count} ({percentage:.1f}%)")
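
The demo cache is an unbounded dictionary keyed on Python's built-in `hash()`, which is randomized per process for strings, so its keys cannot be shared across workers or restarts. A minimal sketch of an alternative uses a SHA-256 digest as the key and evicts the least recently used entry once a size limit is reached. `cache_key` and `BoundedCache` are hypothetical names, and the whitespace normalization and default limit are assumptions.

```python
import hashlib
from collections import OrderedDict
from typing import Any, Dict, Optional


def cache_key(text: str) -> str:
    """Stable key: SHA-256 of the text after collapsing whitespace (an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class BoundedCache:
    """Small least-recently-used cache for classification results."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._store: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()

    def get(self, text: str) -> Optional[Dict[str, Any]]:
        key = cache_key(text)
        if key not in self._store:
            return None
        self._store.move_to_end(key)   # Mark as most recently used.
        return dict(self._store[key])  # Return a copy so callers cannot mutate the cache.

    def put(self, text: str, result: Dict[str, Any]) -> None:
        key = cache_key(text)
        self._store[key] = dict(result)
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # Evict the least recently used entry.

    def __len__(self) -> int:
        return len(self._store)


cache = BoundedCache(max_entries=2)
cache.put("first post", {"classification": "NOT_HATE_SPEECH"})
cache.put("second post", {"classification": "NOT_HATE_SPEECH"})
cache.get("first post")  # Marks "first post" as recently used.
cache.put("third post", {"classification": "NOT_HATE_SPEECH"})  # Evicts "second post".
print(len(cache), cache.get("second post"))
```

Whether to normalize text before hashing is a design choice: collapsing whitespace raises the hit rate, but more aggressive normalization such as lowercasing could merge posts whose meaning differs.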

## Key Benefits and Limitations

Let's summarize the advantages and challenges of reasoning-based approaches.

In [None]:
# Create comprehensive comparison table
def create_comparison_table():
    """
    Create detailed comparison between traditional and reasoning approaches.
    """
    comparison_data = {
        'Aspect': [
            'Accuracy on Explicit Hate',
            'Accuracy on Implicit Hate', 
            'Context Understanding',
            'Explainability',
            'Bias Reduction',
            'Policy Alignment',
            'Processing Speed',
            'Resource Requirements',
            'Training Data Needs',
            'Adaptability to New Policies'
        ],
        'Traditional Classifier': [
            'High (85-90%)',
            'Medium (60-70%)',
            'Low',
            'Very Low',
            'Medium',
            'Low',
            'Very Fast',
            'Low',
            'High',
            'Low (Requires Retraining)'
        ],
        'Reasoning-Based': [
            'High (85-95%)',
            'High (80-90%)',
            'Very High',
            'Excellent',
            'High',
            'Excellent',
            'Slower',
            'Higher',
            'Low',
            'High (Prompt Updates)'
        ]
    }
    
    df = pd.DataFrame(comparison_data)
    
    # Display as formatted table
    print("📊 **COMPREHENSIVE COMPARISON: TRADITIONAL vs REASONING APPROACHES (ILLUSTRATIVE ESTIMATES)**\n")
    print(df.to_string(index=False))
    
    return df

comparison_df = create_comparison_table()

print("\n\n✨ **KEY INSIGHTS**\n")

insights = [
    {
        "title": "🎯 Stronger Context Understanding",
        "description": "Reasoning approaches are better placed to handle implicit hate speech, sarcasm, and cultural nuances that traditional classifiers often miss."
    },
    {
        "title": "📝 Transparent Decision Making",
        "description": "Step-by-step reasoning provides clear explanations for content moderation decisions, crucial for user trust and regulatory compliance."
    },
    {
        "title": "🔄 Rapid Policy Adaptation",
        "description": "Reasoning systems can adapt to new policies by updating prompts rather than retraining entire models, enabling faster responses to emerging threats."
    },
    {
        "title": "⚖️ Improved Fairness",
        "description": "By explicitly reasoning about context and intent, these systems can reduce false positives and bias against marginalized communities discussing their experiences."
    },
    {
        "title": "⚡ Performance Trade-offs", 
        "description": "Reasoning systems can be more explainable and better at handling context, but they require more computational resources and respond more slowly."
    }
]

for insight in insights:
    print(f"**{insight['title']}**")
    print(f"   {insight['description']}\n")

---

## 📋 Summary

### 🔑 Key Concepts Mastered
- **Chain-of-Thought Prompting**: Systematic approach to guide AI through step-by-step reasoning for complex classification tasks
- **Context-Aware Analysis**: Understanding implicit hate speech, sarcasm, and cultural nuances through reasoning chains
- **Explainable AI**: Providing transparent justifications for content moderation decisions through detailed reasoning steps
- **Policy-Based Classification**: Aligning AI decisions with specific platform policies through structured reasoning templates
- **Advanced Pattern Recognition**: Detecting subtle forms of hate speech that traditional classifiers often miss

### 📈 Best Practices Learned
- **Structured Reasoning Templates**: Use consistent step-by-step templates for reliable analysis across different content types
- **Multi-Step Analysis**: Break down complex decisions into manageable reasoning steps (target identification, intent analysis, policy matching)
- **Context Integration**: Consider sarcasm, self-reference, academic discussion, and other contextual factors in classification decisions
- **Performance Monitoring**: Implement caching, metrics tracking, and response time optimization for production deployment
- **Hybrid Approaches**: Combine reasoning-based analysis with traditional classifiers for optimal performance and speed
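
The hybrid approach in the last bullet can be sketched as threshold-based routing: a fast classifier decides the clear cases, and only uncertain ones go to the slower reasoning path. Everything here is illustrative. `hybrid_classify`, the two stand-in functions, and the `0.2` and `0.9` thresholds are assumptions, not values from this notebook.

```python
from typing import Callable, Dict


def hybrid_classify(text: str,
                    fast_score: Callable[[str], float],
                    slow_classify: Callable[[str], str],
                    low: float = 0.2,
                    high: float = 0.9) -> Dict[str, str]:
    """Use the fast score when it is decisive; otherwise fall back to reasoning."""
    score = fast_score(text)
    if score >= high:
        return {"label": "HATE_SPEECH", "route": "fast"}
    if score <= low:
        return {"label": "NOT_HATE_SPEECH", "route": "fast"}
    # Uncertain band between the thresholds: pay for the slower analysis.
    return {"label": slow_classify(text), "route": "reasoning"}


# Stand-ins for a trained classifier and a reasoning model (toy values only).
TOY_SCORES = {"clear violation": 0.97, "clearly fine": 0.03, "ambiguous": 0.55}


def toy_fast_score(text: str) -> float:
    return TOY_SCORES.get(text, 0.5)


def toy_slow_classify(text: str) -> str:
    return "NOT_HATE_SPEECH"


for example in ["clear violation", "clearly fine", "ambiguous"]:
    print(example, hybrid_classify(example, toy_fast_score, toy_slow_classify))
```

Widening the band between `low` and `high` sends more traffic to the reasoning path, trading speed for scrutiny.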

### 🚀 Next Steps
- **Advanced Prompting**: Explore few-shot learning and prompt optimization techniques
- **RAG Integration**: Implement Retrieval-Augmented Generation with policy documents and examples
- **Multi-Modal Reasoning**: Extend reasoning to images, videos, and multi-modal content
- **Notebook 03**: Explore fine-tuning techniques for domain-specific reasoning
- **Production Deployment**: Study scalable reasoning systems with model serving and monitoring

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*