[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.6/01-few-shot-learning.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.6/01-few-shot-learning.ipynb)

# 01 - Few-Shot Learning: Learning from Examples in Context

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- What few-shot learning is and how it differs from zero-shot learning
- How to use few-shot learning for hate speech classification
- The concept of in-context learning popularized by GPT-3
- How to structure prompts with 2-3 examples for optimal performance
- Performance comparison between zero-shot and few-shot approaches
- Best practices for selecting representative examples

## 📋 Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python and text classification
- Knowledge of transformers (refer to [Notebook 01](../01_intro_hf_transformers.ipynb))
- Understanding of zero-shot classification (refer to [Zero-Shot Classification](../basic1.2/02-zero-shot-classification.ipynb))
- Understanding of NLP fundamentals (refer to [NLP Learning Journey](https://github.com/vuhung16au/nlp-learning-journey))

## 📚 What We'll Cover
1. **Introduction**: Few-shot learning concepts and motivation
2. **Setup and Imports**: Environment preparation and model loading
3. **Understanding Few-Shot Learning**: GPT-3 style in-context learning
4. **Hate Speech Classification**: Practical implementation with 2-3 examples
5. **Prompt Engineering**: Crafting effective few-shot prompts
6. **Performance Analysis**: Zero-shot vs Few-shot comparison
7. **Advanced Techniques**: Example selection and optimization
8. **Summary and Best Practices**: Key takeaways

## What is Few-Shot Learning?

**Few-shot learning** is a machine learning approach in which a model performs a task after seeing only a handful of examples. With large language models such as GPT-3 and GPT-4, this is achieved through **in-context learning**: a few examples (typically 2-5) are placed directly in the prompt, and the model's parameters are never updated.

### Key Concepts:
- 🧠 **In-context Learning**: Learning from examples in the input prompt
- 🎯 **No Parameter Updates**: Model weights remain unchanged during inference
- 📚 **Example-Driven**: Performance improves with well-chosen examples
- 🚀 **Immediate Application**: No training or fine-tuning required

### Few-Shot vs Zero-Shot vs Fine-Tuning:

| Approach | Examples Needed | Training Required | Typical Accuracy | Setup / Inference Speed |
|----------|-----------------|-------------------|------------------|-------------------------|
| **Zero-shot** | 0 | No | Good | No setup / Fast |
| **Few-shot** | 2-5 | No | Often better | Minimal setup / Fast, though longer prompts cost more |
| **Fine-tuning** | 100s-1000s | Yes | Usually best | Slow setup / Fast |

### When to Use Few-Shot Learning:
- 🔄 When zero-shot performance isn't sufficient
- ⚡ When you need quick improvements without training
- 📝 When you have a few high-quality examples
- 🎯 For specialized tasks that benefit from demonstration
- 💰 When compute resources for fine-tuning are limited

## Setup and Environment

In [None]:
# Install required packages (uncomment and run if needed)
# !pip install transformers torch datasets tokenizers matplotlib seaborn plotly

# Import essential libraries
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time
import warnings
from typing import List, Dict, Optional, Union
from collections import Counter

# Hugging Face imports
from transformers import (
    pipeline, 
    AutoTokenizer, 
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoConfig
)

warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

import sys
import transformers

print("📚 Libraries imported successfully!")
print(f"🐍 Python version: {sys.version.split()[0]}")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"🤗 Transformers version: {transformers.__version__}")

In [None]:
def get_device() -> torch.device:
    """
    Get the best available device for PyTorch operations.
    
    Priority order: CUDA > MPS (Apple Silicon) > CPU
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
        print("🍎 Using Apple MPS for Apple Silicon optimization")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU - consider GPU for better performance")
    
    return device

# Initialize device
device = get_device()
print(f"Device set to: {device}")

## Part 1: Understanding Few-Shot Learning with Text Generation Models

Few-shot learning with language models works by providing examples in the prompt itself. The model learns the pattern from these examples and applies it to new inputs. Let's start by understanding the basic concept.

In [None]:
def demonstrate_few_shot_concept():
    """
    Demonstrate the basic concept of few-shot learning with prompt structure.
    This shows the pattern without actual model inference.
    """
    
    print("🧠 FEW-SHOT LEARNING CONCEPT")
    print("=" * 50)
    
    # Few-shot prompt structure for hate speech classification
    few_shot_prompt = """
Classify the following social media posts as either "hate speech" or "normal speech":

Post: "Great job on the presentation! Really well done."
Classification: normal speech

Post: "This movie was terrible, waste of time."
Classification: normal speech

Post: "All people from that country are criminals and should be deported."
Classification: hate speech

Post: "I love spending time with my diverse group of friends."
Classification: normal speech

Now classify this new post:
Post: "The new policy will benefit everyone in our community."
Classification: """
    
    print("📝 Few-Shot Prompt Structure:")
    print(few_shot_prompt)
    print("\n🎯 Expected Output: normal speech")
    
    print("\n" + "="*50)
    print("📊 Key Elements of Effective Few-Shot Prompts:")
    print("1. Clear task description")
    print("2. Consistent format for examples")
    print("3. Diverse, representative examples")
    print("4. Clear input-output mapping")
    print("5. Explicit prompt for new classification")

demonstrate_few_shot_concept()

## Part 2: Loading Models for Few-Shot Learning

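For few-shot learning, we typically use decoder-only generative models such as GPT-2 or GPT-3, which continue a prompt with generated text. This notebook loads GPT-2 because it is small and fast. In-context learning ability improves markedly with model scale, so expect noisy predictions from a model this size.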

In [None]:
def load_few_shot_model(model_name: str = "gpt2"):
    """
    Load a generative model suitable for few-shot learning.
    
    Args:
        model_name: Name of the model to load (default: gpt2)
        
    Returns:
        Tuple of (tokenizer, model)
    """
    try:
        print(f"📥 Loading model: {model_name}")
        start_time = time.time()
        
        # Load tokenizer and model
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        
        # Add pad token if not present (common for GPT models)
        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token
        
        # Move model to optimal device
        model = model.to(device)
        
        load_time = time.time() - start_time
        print(f"✅ Model loaded successfully in {load_time:.2f} seconds")
        print(f"📊 Model size: {model.num_parameters():,} parameters")
        print(f"🏷️  Model type: {model.config.model_type}")
        
        return tokenizer, model
        
    except Exception as e:
        print(f"❌ Error loading model {model_name}: {e}")
        print("💡 Try checking model name or network connection")
        raise

# Load GPT-2 for few-shot learning (lightweight and fast)
print("🤖 Loading GPT-2 for few-shot learning...")
tokenizer, model = load_few_shot_model("gpt2")

In [None]:
# Also load a zero-shot classifier for comparison
print("📥 Loading zero-shot classification pipeline for comparison...")
zero_shot_classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",  # The pipeline's default checkpoint, named explicitly
    device=device  # Reuse the device selected above (CUDA, MPS, or CPU)
)
print(f"✅ Zero-shot classifier ready: {zero_shot_classifier.model.config.name_or_path}")

## Part 3: Implementing Few-Shot Hate Speech Classification

Now let's implement the core functionality for few-shot hate speech classification. We'll create a system that can classify social media posts using just a few examples.

In [None]:
class FewShotHateSpeechClassifier:
    """
    A few-shot learning classifier for hate speech detection.
    
    This class demonstrates how to use generative models for classification
    tasks through few-shot prompting.
    """
    
    def __init__(self, tokenizer, model, device):
        self.tokenizer = tokenizer
        self.model = model
        self.device = device
        
        # Pre-defined examples for few-shot learning
        # Ordered so that any prefix of the list mixes both classes
        self.examples = [
            {
                "post": "Great job on your presentation! Really impressive work.",
                "label": "normal speech"
            },
            {
                "post": "All immigrants are criminals and should be sent back where they came from.",
                "label": "hate speech"
            },
            {
                "post": "I disagree with this policy, but I respect your opinion.",
                "label": "normal speech"
            },
            {
                "post": "Women are too emotional to be leaders in any field.",
                "label": "hate speech"
            },
            {
                "post": "Looking forward to working with the new diverse team members.",
                "label": "normal speech"
            }
        ]
    
    def create_few_shot_prompt(self, text_to_classify: str, num_examples: int = 3) -> str:
        """
        Create a few-shot learning prompt for hate speech classification.
        
        Args:
            text_to_classify: The text to classify
            num_examples: Number of examples to include (2-5 recommended)
            
        Returns:
            Formatted prompt string
        """
        prompt = "Classify the following social media posts as either 'hate speech' or 'normal speech':\n\n"
        
        # Add few-shot examples
        for i, example in enumerate(self.examples[:num_examples]):
            prompt += f"Post: \"{example['post']}\"\n"
            prompt += f"Classification: {example['label']}\n\n"
        
        # Add the text to classify
        prompt += f"Post: \"{text_to_classify}\"\n"
        prompt += "Classification:"
        
        return prompt
    
    def classify(self, text: str, num_examples: int = 3, max_length: int = 10) -> Dict:
        """
        Classify text using few-shot learning.
        
        Args:
            text: Text to classify
            num_examples: Number of examples to use
            max_length: Maximum length of generated response
            
        Returns:
            Dictionary with classification results
        """
        try:
            # Create prompt
            prompt = self.create_few_shot_prompt(text, num_examples)
            
            # Tokenize input, leaving room for the generated tokens
            # (GPT-2's context window is 1024 tokens)
            inputs = self.tokenizer(
                prompt, 
                return_tensors="pt", 
                truncation=True, 
                max_length=1024 - max_length
            ).to(self.device)
            prompt_length = inputs.input_ids.shape[1]
            
            # Generate response
            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    max_new_tokens=max_length,  # Caps newly generated tokens only
                    num_return_sequences=1,
                    temperature=0.3,  # Low temperature for more consistent outputs
                    do_sample=True,
                    pad_token_id=self.tokenizer.eos_token_id
                )
            
            # Decode only the newly generated tokens, not the prompt
            generated_text = self.tokenizer.decode(
                outputs[0][prompt_length:], skip_special_tokens=True
            )
            
            # Keep the first non-empty line so that any follow-on "Post: ..."
            # invented by the model does not leak into the label
            lines = [line.strip() for line in generated_text.splitlines() if line.strip()]
            classification = lines[0].lower() if lines else ""
            
            
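            # Note: the confidence values below are fixed placeholders,
            # not probabilities produced by the model
            if "hate speech" in classification:
                prediction = "hate speech"
                confidence = 0.8  # Placeholder for a parseable label
            elif "normal speech" in classification:
                prediction = "normal speech"
                confidence = 0.8  # Placeholder for a parseable label
            else:
                prediction = "unclear"
                confidence = 0.3  # Placeholder for unparseable output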
            
            return {
                "text": text,
                "prediction": prediction,
                "confidence": confidence,
                "raw_response": classification,
                "prompt_used": prompt,
                "num_examples": num_examples
            }
            
        except Exception as e:
            print(f"❌ Error during classification: {e}")
            return {
                "text": text,
                "prediction": "error",
                "confidence": 0.0,
                "error": str(e)
            }

# Initialize the few-shot classifier
few_shot_classifier = FewShotHateSpeechClassifier(tokenizer, model, device)
print("✅ Few-shot hate speech classifier initialized!")
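
The classifier above reports hard-coded confidence values. One common alternative is to score each candidate label by the log-likelihood the model assigns to it as a continuation of the prompt, then normalize those scores. The sketch below covers only the normalization step and assumes the per-label log-likelihoods have already been computed (for example by summing token log-probabilities). `normalize_label_scores` is a hypothetical helper, and the numbers are made up for illustration.

```python
import math
from typing import Dict


def normalize_label_scores(label_logprobs: Dict[str, float]) -> Dict[str, float]:
    """Turn per-label log-likelihoods into scores that sum to 1 (a softmax)."""
    if not label_logprobs:
        raise ValueError("label_logprobs must not be empty")
    # Subtract the maximum before exponentiating for numerical stability
    max_logprob = max(label_logprobs.values())
    exp_scores = {
        label: math.exp(logprob - max_logprob)
        for label, logprob in label_logprobs.items()
    }
    total = sum(exp_scores.values())
    return {label: score / total for label, score in exp_scores.items()}


# Illustrative (made-up) log-likelihoods for the two candidate labels
example_logprobs = {"hate speech": -4.2, "normal speech": -2.9}
confidences = normalize_label_scores(example_logprobs)
prediction = max(confidences, key=confidences.get)
print(f"{prediction} (confidence: {confidences[prediction]:.3f})")
```

Scores obtained this way are relative to the candidate labels only and are not guaranteed to be well calibrated, so treat them as a ranking signal rather than a true probability.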

## Part 4: Testing Few-Shot Classification

Let's test our few-shot classifier with various examples and see how it performs on hate speech detection.

In [None]:
# Test cases for hate speech classification
test_posts = [
    # Normal speech examples
    "I really enjoyed the new restaurant downtown. Great food and service!",
    "Looking forward to collaborating with colleagues from different backgrounds.",
    "The weather today is perfect for a walk in the park.",
    "I disagree with this decision, but I understand the reasoning behind it.",
    
    # Potential hate speech examples (educational purposes)
    "People from that religion are all terrorists and dangerous.",
    "Women shouldn't be allowed in leadership positions.",
    "All refugees are just looking for free handouts.",
    
    # Borderline/ambiguous cases
    "This movie was absolutely terrible, complete waste of time.",
    "The government's new policy is completely wrong and stupid."
]

print("🧪 TESTING FEW-SHOT HATE SPEECH CLASSIFICATION")
print("=" * 60)

results = []
for i, post in enumerate(test_posts, 1):
    print(f"\n📝 Test {i}: \"{post}\"")
    
    # Classify with few-shot learning (3 examples)
    result = few_shot_classifier.classify(post, num_examples=3)
    results.append(result)
    
    print(f"   🎯 Prediction: {result['prediction']}")
    print(f"   📊 Confidence: {result['confidence']:.2f}")
    if 'raw_response' in result:
        print(f"   🔍 Raw response: '{result['raw_response']}'")

print("\n✅ Few-shot classification testing complete!")
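
Because `classify` samples with `do_sample=True`, repeated calls on the same post can return different labels. One way to gauge stability is to classify the same post several times and take a majority vote, using the vote share as a rough consistency score. The sketch below works on a plain list of prediction strings, so it runs without a model; `majority_vote` is a hypothetical helper and the sampled labels are illustrative.

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(predictions: List[str]) -> Tuple[str, float]:
    """Return the most frequent prediction and its share of the votes."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Illustrative outputs from five repeated runs on a single post
sampled = ["normal speech", "normal speech", "unclear", "normal speech", "hate speech"]
label, share = majority_vote(sampled)
print(f"{label} (vote share: {share:.2f})")
```

In this notebook the input list could be collected with something like `[few_shot_classifier.classify(post)["prediction"] for _ in range(5)]`, at the cost of five generation calls per post.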

## Part 5: Comparing Zero-Shot vs Few-Shot Performance

Let's compare how zero-shot and few-shot approaches perform on the same texts to understand the benefits of providing examples.

In [None]:
def compare_zero_shot_vs_few_shot(test_texts: List[str]):
    """
    Compare zero-shot and few-shot classification performance.
    
    Args:
        test_texts: List of texts to classify
    """
    print("⚖️ ZERO-SHOT vs FEW-SHOT COMPARISON")
    print("=" * 50)
    
    comparison_results = []
    
    for i, text in enumerate(test_texts, 1):
        print(f"\n📝 Text {i}: \"{text[:60]}{'...' if len(text) > 60 else ''}\"")
        
        # Zero-shot classification
        zero_shot_start = time.time()
        try:
            zero_shot_result = zero_shot_classifier(
                text, 
                candidate_labels=["hate speech", "normal speech"]
            )
            zero_shot_time = time.time() - zero_shot_start
            zero_shot_pred = zero_shot_result['labels'][0]
            zero_shot_conf = zero_shot_result['scores'][0]
        except Exception as e:
            print(f"   ❌ Zero-shot error: {e}")
            continue
        
        # Few-shot classification
        few_shot_start = time.time()
        few_shot_result = few_shot_classifier.classify(text, num_examples=3)
        few_shot_time = time.time() - few_shot_start
        
        # Store results
        comparison_results.append({
            'text': text,
            'zero_shot_pred': zero_shot_pred,
            'zero_shot_conf': zero_shot_conf,
            'zero_shot_time': zero_shot_time,
            'few_shot_pred': few_shot_result['prediction'],
            'few_shot_conf': few_shot_result['confidence'],
            'few_shot_time': few_shot_time
        })
        
        print(f"   🔵 Zero-shot: {zero_shot_pred} (conf: {zero_shot_conf:.3f}, time: {zero_shot_time:.2f}s)")
        print(f"   🟡 Few-shot:  {few_shot_result['prediction']} (conf: {few_shot_result['confidence']:.3f}, time: {few_shot_time:.2f}s)")
        
        # Highlight agreement/disagreement
        if zero_shot_pred == few_shot_result['prediction']:
            print(f"   ✅ Agreement: Both methods agree")
        else:
            print(f"   ⚠️  Disagreement: Methods differ")
    
    return comparison_results

# Run comparison on a subset of test posts
comparison_data = compare_zero_shot_vs_few_shot(test_posts[:6])  # Use first 6 for detailed comparison

In [None]:
# Create visualization of the comparison
if comparison_data:
    df_comparison = pd.DataFrame(comparison_data)
    
    # Plot performance comparison
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    
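    # Confidence comparison (few-shot confidence is a hard-coded value,
    # not a model probability, so read this panel loosely)
    axes[0].scatter(df_comparison['zero_shot_conf'], df_comparison['few_shot_conf'], 
                   alpha=0.7, s=100, c='blue')
    axes[0].plot([0, 1], [0, 1], 'r--', alpha=0.5, label='Equal confidence (y = x)')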
    axes[0].set_xlabel('Zero-Shot Confidence')
    axes[0].set_ylabel('Few-Shot Confidence')
    axes[0].set_title('Confidence Comparison')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Time comparison
    methods = ['Zero-Shot', 'Few-Shot']
    avg_times = [df_comparison['zero_shot_time'].mean(), df_comparison['few_shot_time'].mean()]
    
    bars = axes[1].bar(methods, avg_times, color=['lightblue', 'orange'], alpha=0.7)
    axes[1].set_ylabel('Average Time (seconds)')
    axes[1].set_title('Average Classification Time')
    axes[1].grid(True, alpha=0.3, axis='y')
    
    # Add value labels on bars
    for bar, time_val in zip(bars, avg_times):
        axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
                    f'{time_val:.2f}s', ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()
    
    # Calculate agreement rate
    agreements = sum(1 for row in comparison_data 
                    if row['zero_shot_pred'] == row['few_shot_pred'])
    agreement_rate = agreements / len(comparison_data) * 100
    
    print(f"\n📊 COMPARISON SUMMARY:")
    print(f"   Agreement Rate: {agreement_rate:.1f}% ({agreements}/{len(comparison_data)})")
    print(f"   Average Zero-Shot Time: {df_comparison['zero_shot_time'].mean():.2f}s")
    print(f"   Average Few-Shot Time: {df_comparison['few_shot_time'].mean():.2f}s")
    print(f"   Average Zero-Shot Confidence: {df_comparison['zero_shot_conf'].mean():.3f}")
    print(f"   Average Few-Shot Confidence: {df_comparison['few_shot_conf'].mean():.3f}")
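
Agreement between the two methods does not show which one is right. With reference labels for the test posts, each method can be scored directly. The sketch below is a minimal version of that idea: `accuracy` and `confusion_counts` are hypothetical helpers, and the labels and predictions are made-up placeholders rather than results from this notebook.

```python
from collections import Counter
from typing import List


def accuracy(predictions: List[str], gold_labels: List[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("gold_labels must not be empty")
    correct = sum(pred == gold for pred, gold in zip(predictions, gold_labels))
    return correct / len(gold_labels)


def confusion_counts(predictions: List[str], gold_labels: List[str]) -> Counter:
    """Count (gold, predicted) pairs to show where the errors fall."""
    return Counter(zip(gold_labels, predictions))


# Placeholder data for illustration only
gold = ["normal speech", "normal speech", "hate speech", "hate speech"]
method_a = ["normal speech", "hate speech", "hate speech", "hate speech"]
method_b = ["normal speech", "normal speech", "hate speech", "unclear"]

for name, preds in [("method_a", method_a), ("method_b", method_b)]:
    print(f"{name}: accuracy={accuracy(preds, gold):.2f}")
    print(f"   errors: {[pair for pair in confusion_counts(preds, gold) if pair[0] != pair[1]]}")
```

In this notebook the prediction lists could be taken from `comparison_data`, for example `[row["few_shot_pred"] for row in comparison_data]`, paired with gold labels that you supply.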

## Part 6: Prompt Engineering for Better Few-Shot Performance

The quality and selection of examples in few-shot learning significantly impacts performance. Let's explore different prompt engineering techniques.

In [None]:
def experiment_with_different_prompts(text_to_classify: str):
    """
    Experiment with different prompt structures and example counts.
    
    Args:
        text_to_classify: Text to classify with different approaches
    """
    print(f"🔬 PROMPT ENGINEERING EXPERIMENTS")
    print(f"Text to classify: \"{text_to_classify}\"")
    print("=" * 60)
    
    # Experiment 1: Different numbers of examples
    print("\n📊 Experiment 1: Effect of Example Count")
    for num_examples in [2, 3, 4, 5]:
        result = few_shot_classifier.classify(text_to_classify, num_examples=num_examples)
        print(f"   {num_examples} examples: {result['prediction']} (conf: {result['confidence']:.2f})")
    
    # Experiment 2: Different prompt styles
    print("\n📝 Experiment 2: Different Prompt Styles")
    
    # Style 1: More detailed instructions
    detailed_prompt = f"""
You are an expert content moderator. Classify social media posts as "hate speech" or "normal speech".

Hate speech includes content that attacks or demeans people based on protected characteristics like race, religion, gender, etc.
Normal speech includes opinions, criticism, and regular communication that doesn't target protected groups.

Examples:

Post: "Great job on your presentation! Really impressive work."
Classification: normal speech

Post: "All immigrants are criminals and should be sent back where they came from."
Classification: hate speech

Post: "I disagree with this policy, but I respect your opinion."
Classification: normal speech

Now classify:
Post: "{text_to_classify}"
Classification:"""
    
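    # Baseline: the classifier's built-in prompt with 3 examples
    result_standard = few_shot_classifier.classify(text_to_classify, num_examples=3)
    print(f"   Standard prompt: {result_standard['prediction']} (conf: {result_standard['confidence']:.2f})")
    
    # Run the detailed prompt through the same model (greedy decoding for repeatability)
    detailed_inputs = tokenizer(detailed_prompt.strip(), return_tensors="pt").to(device)
    with torch.no_grad():
        detailed_outputs = model.generate(
            **detailed_inputs,
            max_new_tokens=10,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id
        )
    new_tokens = detailed_outputs[0][detailed_inputs["input_ids"].shape[1]:]
    detailed_lines = tokenizer.decode(new_tokens, skip_special_tokens=True).strip().splitlines()
    detailed_raw = detailed_lines[0].strip().lower() if detailed_lines else ""
    print(f"   Detailed prompt: '{detailed_raw}' (raw model output)")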
    
    print("\n💡 Key Insights:")
    print("   - More examples generally improve consistency")
    print("   - Diverse examples help with edge cases")
    print("   - Clear instructions reduce ambiguity")
    print("   - Balanced examples prevent bias toward one class")

# Test with an ambiguous example
experiment_with_different_prompts(
    "The government's new immigration policy is completely misguided and harmful."
)
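
Taking the first N examples from a list can leave one class under-represented, depending on how the list happens to be ordered. One way to keep small prompts balanced is to interleave examples by label before slicing. The sketch below assumes examples are dictionaries with a `label` key, as in the classifier above; `select_balanced_examples` is a hypothetical helper and the post texts are placeholders.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List


def select_balanced_examples(
    examples: List[Dict[str, str]], num_examples: int
) -> List[Dict[str, str]]:
    """Pick examples round-robin across labels so small prompts stay balanced."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example["label"]].append(example)
    # Take one example per label in turn; skip labels that have run out
    interleaved = [
        example
        for group in zip_longest(*by_label.values())
        for example in group
        if example is not None
    ]
    return interleaved[:num_examples]


# Placeholder pool: three "normal speech" examples followed by two "hate speech"
pool = [
    {"post": "normal example 1", "label": "normal speech"},
    {"post": "normal example 2", "label": "normal speech"},
    {"post": "normal example 3", "label": "normal speech"},
    {"post": "hateful example 1", "label": "hate speech"},
    {"post": "hateful example 2", "label": "hate speech"},
]

selected = select_balanced_examples(pool, num_examples=2)
print([example["label"] for example in selected])
```

Round-robin selection is only one option. Choosing the examples most similar to the input is another common strategy, but it needs an embedding model and is not covered here.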

## Part 7: Best Practices for Few-Shot Learning

Based on our experiments, let's summarize the best practices for effective few-shot learning in hate speech classification and other text classification tasks.

In [None]:
def demonstrate_best_practices():
    """
    Demonstrate best practices for few-shot learning.
    """
    print("✅ FEW-SHOT LEARNING BEST PRACTICES")
    print("=" * 50)
    
    practices = {
        "📊 Example Selection": [
            "Use 2-5 examples (diminishing returns beyond 5)",
            "Include diverse, representative cases",
            "Balance positive and negative examples",
            "Choose clear, unambiguous examples",
            "Include edge cases if relevant to your domain"
        ],
        
        "📝 Prompt Structure": [
            "Start with clear task description",
            "Use consistent format for all examples",
            "Provide explicit input-output mapping",
            "End with clear prompt for new classification",
            "Use natural, readable language"
        ],
        
        "🎯 Model Configuration": [
            "Use low temperature (0.1-0.3) for consistency",
            "Limit max generation length to prevent rambling",
            "Consider using stop tokens for cleaner output",
            "Monitor for repetition and adjust sampling",
            "Handle edge cases and parsing errors"
        ],
        
        "⚖️ Evaluation Strategies": [
            "Compare against zero-shot baselines",
            "Test on held-out examples",
            "Monitor consistency across similar inputs",
            "Check for bias in example selection",
            "Validate on edge cases and ambiguous examples"
        ]
    }
    
    for category, tips in practices.items():
        print(f"\n{category}:")
        for tip in tips:
            print(f"  • {tip}")
    
    print("\n" + "="*50)
    print("🚨 Common Pitfalls to Avoid:")
    pitfalls = [
        "Using biased or non-representative examples",
        "Inconsistent formatting between examples",
        "Too many examples (leads to context overflow)",
        "Ambiguous or controversial examples",
        "Not handling model generation errors",
        "Ignoring computational cost vs. zero-shot alternatives"
    ]
    
    for pitfall in pitfalls:
        print(f"  ❌ {pitfall}")

demonstrate_best_practices()
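
The tip about stop tokens can be approximated in post-processing: cut the generated text at the first stop sequence, so that anything the model writes after the label is discarded. The sketch below is plain string handling and assumes the prompt format used in this notebook. `truncate_at_stop` and its default stop sequences are illustrative choices, not part of the `transformers` API.

```python
from typing import Sequence


def truncate_at_stop(
    generated_text: str, stop_sequences: Sequence[str] = ("\n", "Post:")
) -> str:
    """Cut generated text at the earliest stop sequence and trim whitespace."""
    # Drop leading whitespace first so an initial newline does not empty the result
    text = generated_text.lstrip()
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].strip()


# Illustrative raw continuation in which the model keeps writing after the label
raw = ' normal speech\n\nPost: "Another invented post"\nClassification: hate speech'
print(truncate_at_stop(raw))
```

Truncating after generation still pays for the extra tokens. Limiting the number of generated tokens, as the classifier above already does, keeps that cost small.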

## Part 8: Real-World Applications and Limitations

Let's explore when few-shot learning is most effective and understand its limitations in production environments.

In [None]:
def analyze_real_world_applications():
    """
    Analyze real-world applications and limitations of few-shot learning.
    """
    print("🌍 REAL-WORLD APPLICATIONS & LIMITATIONS")
    print("=" * 50)
    
    applications = {
        "🎯 Ideal Use Cases": [
            "Content moderation for new platforms",
            "Customer service ticket classification",
            "Social media monitoring and brand safety",
            "Legal document categorization",
            "Medical text classification (with domain examples)",
            "Sentiment analysis for specific domains"
        ],
        
        "⚠️ Limitations": [
            "Computational cost higher than zero-shot",
            "Sensitive to example quality and selection",
            "May not scale to very large taxonomies",
            "Context length limits number of examples",
            "Consistency can vary between similar inputs",
            "Difficult to version control and A/B test"
        ],
        
        "📈 When to Choose Few-Shot over Alternatives": [
            "Zero-shot performance insufficient",
            "Limited labeled data available",
            "Need rapid deployment",
            "Domain-specific terminology important",
            "Classifications require nuanced understanding",
            "Cost/time prohibits fine-tuning"
        ]
    }
    
    for category, items in applications.items():
        print(f"\n{category}:")
        for item in items:
            print(f"  • {item}")
    
    # Performance characteristics table
    print("\n📊 PERFORMANCE CHARACTERISTICS COMPARISON:")
    print("-" * 60)
    
    comparison_table = pd.DataFrame({
        'Approach': ['Zero-Shot', 'Few-Shot', 'Fine-Tuning'],
        'Setup Time': ['Instant', 'Minutes', 'Hours/Days'],
        'Data Needed': ['0 examples', '2-5 examples', '100s-1000s examples'],
        'Compute Cost': ['Low', 'Medium', 'High'],
        'Accuracy': ['Good', 'Better', 'Best'],
        'Consistency': ['High', 'Medium', 'High'],
        'Flexibility': ['High', 'High', 'Low']
    })
    
    print(comparison_table.to_string(index=False))
    
    print("\n💡 Decision Framework:")
    print("   1. Try zero-shot first (fastest, cheapest)")
    print("   2. Add few-shot if zero-shot insufficient")
    print("   3. Consider fine-tuning for production at scale")
    print("   4. Measure and compare all approaches on your data")

analyze_real_world_applications()
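
The context-length limitation can be made concrete with a small budget check that keeps adding examples until the prompt would exceed a token limit. The sketch below takes the token counter as a parameter so that it runs without a model. `fit_examples_to_budget` is a hypothetical helper, and the whitespace counter in the demo is a crude stand-in for a real tokenizer.

```python
from typing import Callable, List


def fit_examples_to_budget(
    example_blocks: List[str],
    fixed_text: str,
    count_tokens: Callable[[str], int],
    token_budget: int,
) -> List[str]:
    """Keep examples in order until adding the next one would exceed the budget."""
    used = count_tokens(fixed_text)  # Task description plus the post to classify
    kept = []
    for block in example_blocks:
        cost = count_tokens(block)
        if used + cost > token_budget:
            break
        kept.append(block)
        used += cost
    return kept


def whitespace_count(text: str) -> int:
    """Crude token estimate: the number of whitespace-separated words."""
    return len(text.split())


# Placeholder text with 2 fixed words and example blocks of 3, 2, and 4 words
blocks = ["one two three", "four five", "six seven eight nine"]
kept_blocks = fit_examples_to_budget(blocks, "task query", whitespace_count, token_budget=8)
print(f"Kept {len(kept_blocks)} of {len(blocks)} examples")
```

With the GPT-2 tokenizer loaded earlier, a counter such as `lambda text: len(tokenizer(text).input_ids)` could be passed instead. Per-block counts only approximate the count of the joined prompt, so leave some headroom below the model's 1024-token window.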

---

## 📋 Summary

### 🔑 Key Concepts Mastered
- **Few-Shot Learning**: Learning from 2-5 examples provided in the prompt without parameter updates
- **In-Context Learning**: GPT-3 style learning where examples teach the model the task format
- **Prompt Engineering**: Crafting effective prompts with representative examples for optimal performance
- **Hate Speech Classification**: Practical application of few-shot learning for content moderation
- **Performance Trade-offs**: Understanding when few-shot outperforms zero-shot approaches

### 📈 Best Practices Learned
- Use 2-5 diverse, representative examples for optimal balance of performance and efficiency
- Maintain consistent formatting and clear task descriptions in prompts
- Configure models with low temperature (0.1-0.3) for more consistent outputs
- Compare against zero-shot baselines to validate the added complexity is worthwhile
- Handle edge cases and parsing errors gracefully in production systems
- Balance example diversity with clarity to avoid confusing the model

### 🚀 Next Steps
- **Advanced Prompting**: Explore chain-of-thought and instruction following techniques
- **Fine-Tuning**: Learn when and how to fine-tune models for better performance ([Notebook 05](../05_fine_tuning_trainer.ipynb))
- **Evaluation Metrics**: Implement comprehensive evaluation frameworks for classification tasks
- **Production Deployment**: Scale few-shot learning systems for real-world applications
- **Multimodal Learning**: Extend few-shot learning to image and text combinations

### 🎯 Key Takeaway
Few-shot learning provides a practical middle ground between zero-shot and fine-tuning approaches. By carefully selecting 2-5 representative examples and engineering effective prompts, you can often improve classification performance without the time and resource investment required for fine-tuning, although the gain depends on the model and should be measured on your own data. This makes few-shot learning particularly valuable for rapid prototyping, domain-specific applications, and scenarios where labeled data is scarce but some examples are available.

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*