[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.2/03-text-generation.ipynb)
[![View on GitHub](https://img.shields.io/badge/View_on-GitHub-blue?logo=github)](https://github.com/vuhung16au/hf-transformer-trove/blob/main/examples/basic1.2/03-text-generation.ipynb)

# Text Generation with Hugging Face Transformers

## 🎯 Learning Objectives
By the end of this notebook, you will understand:
- How to use the text generation pipeline for creating text
- Different text generation models and their characteristics
- Key parameters for controlling text generation quality
- Best practices for text generation applications
- How to handle different use cases and scenarios

## 📋 Prerequisites
- Basic understanding of machine learning concepts
- Familiarity with Python and PyTorch
- Knowledge of NLP fundamentals (refer to [NLP Learning Journey](https://github.com/vuhung16au/nlp-learning-journey))
- Understanding of transformer architectures

## 📚 What We'll Cover
1. **Basic Text Generation**: Simple pipeline usage
2. **Model Selection**: Different models for various use cases
3. **Parameter Control**: Understanding generation parameters
4. **Advanced Techniques**: Sampling strategies and control methods
5. **Real-world Applications**: Practical examples and use cases
6. **Performance & Optimization**: Memory management and speed considerations

## Setup and Installation

In [None]:
# Install required packages (uncomment if needed)
# !pip install transformers torch datasets tokenizers matplotlib seaborn

import torch
import time
import os
import warnings
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from typing import List, Dict, Optional

warnings.filterwarnings('ignore')

# For Google Colab compatibility
try:
    from google.colab import userdata
    COLAB_AVAILABLE = True
    print("📍 Running in Google Colab")
except ImportError:
    COLAB_AVAILABLE = False
    print("💻 Running in local environment")

# Load environment variables for local development
try:
    from dotenv import load_dotenv
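    # load_dotenv returns False when the file is missing or sets no variables
    if load_dotenv('.env.local', override=True):
        print("✅ Environment variables loaded from .env.local")
    else:
        print("ℹ️  No variables loaded from .env.local (file missing or empty)")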
except ImportError:
    print("ℹ️  python-dotenv not installed, skipping .env.local loading")

## Device Detection and Setup

In [None]:
def get_device() -> torch.device:
    """
    Automatically detect and return the best available device.
    
    Priority: CUDA > MPS (Apple Silicon) > CPU
    
    Returns:
        torch.device: The optimal device for current hardware
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        print(f"🚀 Using CUDA GPU: {torch.cuda.get_device_name()}")
        print(f"   GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f}GB")
    elif torch.backends.mps.is_available():
        device = torch.device("mps") 
        print("🍎 Using Apple MPS (Apple Silicon)")
    else:
        device = torch.device("cpu")
        print("💻 Using CPU (consider GPU for better performance)")
    
    return device

# Set up device
device = get_device()

## Part 1: Basic Text Generation

Let's start with the simplest possible example: a single pipeline call with a short prompt. This is the most basic way to generate text with Hugging Face Transformers.

In [None]:
# Basic text generation with the default pipeline
from transformers import pipeline

# Create a text generation pipeline (uses default GPT-2 model)
generator = pipeline("text-generation")

# Generate text with the provided prompt
result = generator("In this course, we will teach you how to")

print("📝 Basic Text Generation Result:")
print("=" * 50)
print(result[0]['generated_text'])

### Understanding the Basic Pipeline

The simple `pipeline("text-generation")` call does several things automatically:
- Downloads and loads the default GPT-2 model (~500MB)
- Sets up tokenization and model inference
- Uses default generation parameters

Let's explore what's happening under the hood:

In [None]:
# Let's examine what model the pipeline is using
print(f"📊 Pipeline Model Information:")
print(f"Model: {generator.model.config.name_or_path if hasattr(generator.model.config, 'name_or_path') else 'gpt2'}")
print(f"Model type: {generator.model.config.model_type}")
print(f"Vocabulary size: {generator.model.config.vocab_size:,}")
print(f"Maximum position embeddings: {generator.model.config.n_positions:,}")
print(f"Number of parameters: {generator.model.num_parameters():,}")

# Show tokenizer information
print(f"\n🔤 Tokenizer Information:")
print(f"Tokenizer type: {type(generator.tokenizer).__name__}")
print(f"Vocabulary size: {len(generator.tokenizer):,}")
print(f"Special tokens: {generator.tokenizer.special_tokens_map}")
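
To make the "under the hood" steps concrete, here is a minimal sketch of the loop the pipeline runs: tokenize the prompt, repeatedly pick a next token, and decode the result. It is a toy, not the Transformers implementation: `TOY_NEXT_TOKEN`, `toy_tokenize`, and `toy_generate` are hypothetical names, and the hand-written lookup table stands in for GPT-2, which scores its whole vocabulary at every step.

```python
from typing import List

# Invented lookup table standing in for a language model:
# it maps the last token to its single most likely successor.
TOY_NEXT_TOKEN = {
    "we": "will",
    "will": "teach",
    "teach": "you",
    "you": "how",
    "how": "to",
    "to": "<eos>",
}

def toy_tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (real tokenizers use subword units)."""
    return text.lower().split()

def toy_generate(prompt: str, max_new_tokens: int = 10) -> str:
    """Greedy autoregressive decoding: append one token at a time until <eos>."""
    tokens = toy_tokenize(prompt)
    if not tokens:
        return ""
    for _ in range(max_new_tokens):
        next_token = TOY_NEXT_TOKEN.get(tokens[-1], "<eos>")
        if next_token == "<eos>":
            break  # the model signalled the end of the sequence
        tokens.append(next_token)
    return " ".join(tokens)

print(toy_generate("In this course, we"))
```

Each new token depends only on what has been generated so far, which is why generation time grows with the number of tokens requested.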

## Part 2: Model Selection and Comparison

Different models have different characteristics. Let's explore various text generation models and compare their outputs.

In [None]:
# Define different models for text generation
models_to_test = {
    "GPT-2 (Small)": "gpt2",  # ~124M parameters
    "DistilGPT-2": "distilgpt2",  # ~82M parameters (faster, smaller)
    "GPT-2 Medium": "gpt2-medium",  # ~355M parameters (better quality)
}

# Initialize pipelines with different models
generators = {}
load_times = {}

# Iterate over a copy so a failed model can be deleted from the dict safely
for name, model_name in list(models_to_test.items()):
    print(f"📥 Loading {name} ({model_name})...")
    start_time = time.time()
    
    try:
        generators[name] = pipeline(
            "text-generation", 
            model=model_name,
            device=device  # Reuse the device detected above (CUDA, MPS, or CPU)
        )
        load_time = time.time() - start_time
        load_times[name] = load_time
        print(f"   ✅ Loaded in {load_time:.2f}s")
        
        # Print model info
        model = generators[name].model
        print(f"   📊 Parameters: {model.num_parameters():,}")
        
    except Exception as e:
        print(f"   ❌ Failed to load: {e}")
        # Remove from models_to_test if failed
        if name in models_to_test:
            del models_to_test[name]
    
    print()

### Model Comparison with Same Prompt

In [None]:
# Test all models with the same prompt
test_prompt = "In this course, we will teach you how to"
print(f"🎯 Testing with prompt: \"{test_prompt}\"") 
print("=" * 60)

# Generate text with each model
results = {}
generation_times = {}

for name, gen in generators.items():
    print(f"\n🤖 {name}:")
    start_time = time.time()
    
    try:
        # Generate text with consistent parameters
        output = gen(
            test_prompt,
            max_length=100,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=gen.tokenizer.eos_token_id  # Avoid warning
        )
        
        generation_time = time.time() - start_time
        generation_times[name] = generation_time
        results[name] = output[0]['generated_text']
        
        print(f"   ⏱️  Generated in {generation_time:.2f}s")
        print(f"   📝 Output: {output[0]['generated_text']}")
        
    except Exception as e:
        print(f"   ❌ Generation failed: {e}")
        results[name] = f"Error: {e}"
        generation_times[name] = 0

## Part 3: Understanding Generation Parameters

The quality of generated text depends heavily on the generation parameters. Let's explore the key parameters and their effects.

In [None]:
# Use the first available generator for parameter exploration
demo_generator = list(generators.values())[0]
demo_model_name = list(generators.keys())[0]

print(f"🔧 Exploring parameters with {demo_model_name}")
print("=" * 50)

# Define parameter sets to test
parameter_sets = {
    "Conservative (Low Temperature)": {
        "temperature": 0.3,
        "do_sample": True,
        "max_length": 80
    },
    "Balanced (Medium Temperature)": {
        "temperature": 0.7,
        "do_sample": True,
        "max_length": 80
    },
    "Creative (High Temperature)": {
        "temperature": 1.2,
        "do_sample": True,
        "max_length": 80
    },
    "Deterministic (Greedy)": {
        "do_sample": False,
        "max_length": 80
    }
}

prompt = "The future of artificial intelligence"
print(f"🎯 Prompt: \"{prompt}\"\n")

for param_name, params in parameter_sets.items():
    print(f"🎛️  {param_name}:")
    print(f"   Parameters: {params}")
    
    try:
        output = demo_generator(
            prompt,
            **params,
            pad_token_id=demo_generator.tokenizer.eos_token_id
        )
        print(f"   📝 Result: {output[0]['generated_text']}")
    except Exception as e:
        print(f"   ❌ Error: {e}")
    
    print()

### Parameter Explanation

Let's understand what each parameter does:

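- **`temperature`**: Controls randomness by rescaling the model's logits before sampling (typical range 0.1-2.0); it only has an effect when `do_sample=True`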
  - Low (0.1-0.5): More focused, conservative
  - Medium (0.6-0.8): Balanced creativity
  - High (0.9-2.0): More random, creative

- **`do_sample`**: Whether to use sampling vs. greedy decoding
  - `True`: Uses sampling (more diverse)
  - `False`: Uses greedy decoding (deterministic)

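- **`max_length`**: Maximum total length in tokens, counting the prompt plus the generated text; use `max_new_tokens` to limit only the newly generated tokens
- **`num_return_sequences`**: Number of independent outputs to generate for the same prompt
- **`top_p`**: Nucleus sampling threshold; sample only from the smallest set of most probable tokens whose cumulative probability reaches `top_p`
- **`top_k`**: Top-k sampling; sample only from the `top_k` most probable tokens at each step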
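
The effect of these parameters is easier to see on a tiny example. The sketch below applies temperature scaling, top-k filtering, and nucleus (top-p) filtering to four made-up logits. The function names and the `toy_logits` values are assumptions for illustration; the library's own implementation works on tensors and may differ in edge cases such as ties.

```python
import math
from typing import Dict

def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: q / total for tok, q in kept.items()}

# Hypothetical logits for four candidate next tokens
toy_logits = {"learn": 2.0, "build": 1.0, "dance": 0.0, "xylophone": -1.0}

for t in (0.3, 0.7, 1.2):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}:", {tok: round(p, 3) for tok, p in probs.items()})

base = softmax_with_temperature(toy_logits, 1.0)
print("top_k=2:", {tok: round(p, 3) for tok, p in top_k_filter(base, 2).items()})
print("top_p=0.9:", {tok: round(p, 3) for tok, p in top_p_filter(base, 0.9).items()})
```

Lower temperatures concentrate probability on the top token, while top-k and top-p remove the unlikely tail before sampling.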

## Part 4: Advanced Generation Techniques

In [None]:
# Advanced sampling techniques
print("🧠 Advanced Sampling Techniques")
print("=" * 40)

advanced_techniques = {
    "Top-k Sampling (k=50)": {
        "do_sample": True,
        "top_k": 50,
        "temperature": 0.7,
        "max_length": 80
    },
    "Nucleus Sampling (p=0.9)": {
        "do_sample": True,
        "top_p": 0.9,
        "temperature": 0.7,
        "max_length": 80
    },
    "Combined (top_k=40, top_p=0.95)": {
        "do_sample": True,
        "top_k": 40,
        "top_p": 0.95,
        "temperature": 0.8,
        "max_length": 80
    },
    "Multiple Sequences (n=3)": {
        "do_sample": True,
        "num_return_sequences": 3,
        "temperature": 0.8,
        "max_length": 60
    }
}

prompt = "To build a successful AI application, you need to"
print(f"🎯 Prompt: \"{prompt}\"\n")

for technique_name, params in advanced_techniques.items():
    print(f"⚡ {technique_name}:")
    
    try:
        outputs = demo_generator(
            prompt,
            **params,
            pad_token_id=demo_generator.tokenizer.eos_token_id
        )
        
        if isinstance(outputs, list) and len(outputs) > 1:
            for i, output in enumerate(outputs, 1):
                print(f"   📝 Sequence {i}: {output['generated_text']}")
        else:
            print(f"   📝 Result: {outputs[0]['generated_text']}")
            
    except Exception as e:
        print(f"   ❌ Error: {e}")
    
    print()

## Part 5: Real-world Applications

Let's explore practical applications of text generation.

In [None]:
class TextGenerationHelper:
    """Helper class for different text generation applications."""
    
    def __init__(self, generator):
        self.generator = generator
    
    def creative_writing(self, prompt: str, style: str = "balanced") -> str:
        """Generate creative writing content."""
        style_params = {
            "conservative": {"temperature": 0.5, "top_p": 0.8},
            "balanced": {"temperature": 0.8, "top_p": 0.9},
            "creative": {"temperature": 1.1, "top_p": 0.95}
        }
        
        params = style_params.get(style, style_params["balanced"])
        
        result = self.generator(
            prompt,
            max_length=150,
            do_sample=True,
            pad_token_id=self.generator.tokenizer.eos_token_id,
            **params
        )
        return result[0]['generated_text']
    
    def business_content(self, prompt: str) -> str:
        """Generate professional business content."""
        result = self.generator(
            prompt,
            max_length=120,
            temperature=0.6,  # More conservative for professional content
            do_sample=True,
            top_p=0.85,
            pad_token_id=self.generator.tokenizer.eos_token_id
        )
        return result[0]['generated_text']
    
    def educational_content(self, prompt: str) -> str:
        """Generate educational content."""
        result = self.generator(
            prompt,
            max_length=140,
            temperature=0.5,  # Conservative for accuracy
            do_sample=True,
            top_k=50,
            pad_token_id=self.generator.tokenizer.eos_token_id
        )
        return result[0]['generated_text']

# Create helper instance
helper = TextGenerationHelper(demo_generator)

# Test different applications
applications = {
    "Creative Writing": {
        "prompt": "The old lighthouse stood alone on the cliff, its beam cutting through the fog",
        "method": helper.creative_writing
    },
    "Business Content": {
        "prompt": "Our company's strategic vision for the next quarter focuses on",
        "method": helper.business_content
    },
    "Educational Content": {
        "prompt": "Machine learning is a subset of artificial intelligence that",
        "method": helper.educational_content
    }
}

print("🎨 Real-world Application Examples")
print("=" * 40)

for app_name, config in applications.items():
    print(f"\n📋 {app_name}:")
    print(f"   Prompt: \"{config['prompt']}\"")
    
    try:
        result = config['method'](config['prompt'])
        print(f"   📝 Generated: {result}")
    except Exception as e:
        print(f"   ❌ Error: {e}")

## Part 6: Performance Analysis and Visualization

In [None]:
# Performance analysis
if len(generation_times) > 1:  # Only if we have multiple models
    # Create performance visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Model loading times
    model_names = list(load_times.keys())
    load_time_values = list(load_times.values())
    
    ax1.bar(model_names, load_time_values, color='skyblue', alpha=0.7)
    ax1.set_title('Model Loading Times', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Time (seconds)')
    ax1.tick_params(axis='x', rotation=45)
    
    # Add values on bars
    for i, v in enumerate(load_time_values):
        ax1.text(i, v + 0.1, f'{v:.2f}s', ha='center', va='bottom')
    
    # Generation times
    gen_model_names = list(generation_times.keys())
    gen_time_values = list(generation_times.values())
    
    ax2.bar(gen_model_names, gen_time_values, color='lightcoral', alpha=0.7)
    ax2.set_title('Text Generation Times', fontsize=14, fontweight='bold')
    ax2.set_ylabel('Time (seconds)')
    ax2.tick_params(axis='x', rotation=45)
    
    # Add values on bars
    for i, v in enumerate(gen_time_values):
        if v > 0:  # Only show if not error
            ax2.text(i, v + 0.01, f'{v:.2f}s', ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()
    
    # Performance summary
    print("\n📊 Performance Summary:")
    print("=" * 30)
    for model in model_names:
        if model in load_times and model in generation_times:
            print(f"{model}:")
            print(f"  Loading: {load_times[model]:.2f}s")
            print(f"  Generation: {generation_times[model]:.2f}s")
            print(f"  Total: {load_times[model] + generation_times[model]:.2f}s")
            print()
else:
    print("📊 Performance analysis requires multiple models to compare")
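
The timings above come from a single run per model, so they can be noisy (the first call often includes one-off setup costs). One way to get steadier numbers is to repeat the call and report the median. This is a sketch only: `benchmark` is a hypothetical helper, and the lambda is a stand-in workload rather than a real generation call.

```python
import statistics
import time
from typing import Callable, Dict

def benchmark(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time fn() several times and report the median, min, and max in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are discarded (caches, lazy initialization)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }

# Stand-in workload; in this notebook it could be
# lambda: gen(test_prompt, max_length=100, pad_token_id=gen.tokenizer.eos_token_id)
stats = benchmark(lambda: sum(i * i for i in range(50_000)), repeats=5)
print({key: round(value, 4) for key, value in stats.items()})
```

`time.perf_counter()` is used instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring short intervals.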

## Part 7: Best Practices and Troubleshooting

### 💡 Best Practices

1. **Model Selection**
   - Use DistilGPT-2 for speed, GPT-2 medium/large for quality
   - Consider memory constraints when choosing model size

2. **Parameter Tuning**
   - Start with temperature=0.7 for balanced output
   - Use `top_p=0.9` for nucleus sampling
   - Set `max_length` (prompt plus generated tokens) or `max_new_tokens` high enough that outputs are not cut off mid-sentence

3. **Performance Optimization**
   - Use GPU acceleration when available
   - Batch multiple generations for efficiency
   - Cache models to avoid repeated loading
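
As an illustration of the caching advice, the sketch below memoizes a loader with `functools.lru_cache` so each model name is loaded at most once per session. The names `get_generator` and `LOAD_CALLS` are hypothetical, and the returned dictionary is a placeholder for the real, expensive `pipeline("text-generation", model=model_name)` call.

```python
from functools import lru_cache

LOAD_CALLS = []  # records each real load so the caching effect is visible

@lru_cache(maxsize=4)
def get_generator(model_name: str):
    """Return a cached generator for model_name, loading it only on first use."""
    LOAD_CALLS.append(model_name)
    # Placeholder for the expensive step; in this notebook it would be
    # pipeline("text-generation", model=model_name).
    return {"model": model_name}

first = get_generator("distilgpt2")
second = get_generator("distilgpt2")  # served from the cache, no reload
print(first is second, LOAD_CALLS)
```

Keep `maxsize` small when caching real models, since every cached entry holds its weights in memory until it is evicted.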

### ⚠️ Common Pitfalls

1. **Memory Issues**: Large models may cause out-of-memory errors
2. **Repetitive Output**: A very low temperature or greedy decoding can cause repetitive loops; `repetition_penalty` and `no_repeat_ngram_size` can help
3. **Inconsistent Quality**: Very high temperature leads to nonsensical output
4. **Padding Token Warnings**: GPT-2 has no padding token, so set `pad_token_id` (for example to the tokenizer's `eos_token_id`) to silence the warning
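
The repetition pitfall can be checked automatically. The sketch below computes the share of word trigrams that repeat an earlier trigram; `repeated_ngram_ratio` is a hypothetical helper and the two sample strings are invented, so treat any threshold you pick as a heuristic rather than a standard metric.

```python
from typing import List, Tuple

def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram (0.0 = no repetition)."""
    words = text.split()
    ngrams: List[Tuple[str, ...]] = [
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    ]
    if not ngrams:
        return 0.0  # text shorter than n words
    return 1.0 - len(set(ngrams)) / len(ngrams)

looping = "the model is good and the model is good and the model is good"
varied = "the model writes a short story about a lighthouse in heavy fog"

print(f"looping: {repeated_ngram_ratio(looping):.2f}")
print(f"varied:  {repeated_ngram_ratio(varied):.2f}")
```

A high ratio suggests the output is looping, in which case raising the temperature or setting `no_repeat_ngram_size` is worth trying.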

In [None]:
def safe_generate_text(generator, prompt: str, **kwargs):
    """
    Safe text generation with error handling and best practices.
    
    Args:
        generator: HuggingFace pipeline
        prompt: Input text prompt
        **kwargs: Generation parameters
    
    Returns:
        Generated text or error message
    """
    try:
        # Set default parameters if not provided
        default_params = {
            'max_length': 100,
            'temperature': 0.7,
            'do_sample': True,
            'pad_token_id': generator.tokenizer.eos_token_id,
            'num_return_sequences': 1
        }
        
        # Update with user parameters
        params = {**default_params, **kwargs}
        
        # Validate parameters
        if params['max_length'] > 512:
            print("⚠️  Warning: max_length > 512 may cause memory issues")
        
        if params['temperature'] < 0.1 or params['temperature'] > 2.0:
            print(f"⚠️  Warning: temperature {params['temperature']} is outside recommended range (0.1-2.0)")
        
        # Generate text
        start_time = time.time()
        result = generator(prompt, **params)
        generation_time = time.time() - start_time
        
        print(f"✅ Generated in {generation_time:.2f}s")
        return result
        
    except torch.cuda.OutOfMemoryError:
        print("❌ GPU out of memory. Try reducing max_length or using a smaller model.")
        return None
    except Exception as e:
        print(f"❌ Generation error: {e}")
        return None

# Demonstrate safe generation
print("🛡️  Safe Text Generation Example:")
print("=" * 40)

result = safe_generate_text(
    demo_generator,
    "The key to successful machine learning projects is",
    max_length=120,
    temperature=0.8,
    top_p=0.9
)

if result:
    print(f"📝 Generated text: {result[0]['generated_text']}")
else:
    print("❌ Generation failed")

## Part 8: Memory Management and Cleanup

In [None]:
def print_memory_usage():
    """Print current GPU memory usage."""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024**3
        cached = torch.cuda.memory_reserved() / 1024**3
        print(f"🔋 GPU Memory - Allocated: {allocated:.2f}GB, Cached: {cached:.2f}GB")
    else:
        print("💻 Running on CPU - no GPU memory to track")

# Check memory usage
print("📊 Current Memory Usage:")
print_memory_usage()

# Optional: release unused cached GPU memory (memory held by loaded models stays allocated)
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("\n🧹 GPU cache cleared")
    print_memory_usage()
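
A rough way to anticipate memory needs before loading a model is to multiply its parameter count by the bytes used per parameter. The sketch below does that arithmetic; `estimate_weight_memory_gb` is a hypothetical helper, the parameter counts are approximate, and the estimate covers weights only (activations and the generation cache add more).

```python
def estimate_weight_memory_gb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Approximate memory for model weights alone, in GiB (float32 = 4 bytes)."""
    return num_parameters * bytes_per_parameter / 1024**3

# Approximate parameter counts; use model.num_parameters() for exact values
approx_params = {
    "distilgpt2": 82_000_000,
    "gpt2": 124_000_000,
    "gpt2-medium": 355_000_000,
}

for name, count in approx_params.items():
    fp32 = estimate_weight_memory_gb(count)
    fp16 = estimate_weight_memory_gb(count, bytes_per_parameter=2)
    print(f"{name}: ~{fp32:.2f} GiB in float32, ~{fp16:.2f} GiB in float16")
```

For GPT-2 small this gives a little under 0.5 GiB in float32, in line with the download size mentioned in Part 1.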

---

## 📋 Summary

### 🔑 Key Concepts Mastered
- **Basic Text Generation**: Used `pipeline("text-generation")` for simple text generation
- **Model Selection**: Compared different models (DistilGPT-2, GPT-2, GPT-2 Medium) for various use cases
- **Parameter Control**: Learned how temperature, sampling, and other parameters affect output quality
- **Advanced Techniques**: Explored top-k, nucleus sampling, and multiple sequence generation
- **Real-world Applications**: Applied text generation to creative writing, business, and educational content

### 📈 Best Practices Learned
- **Device Awareness**: Automatically detect and use optimal compute device (CUDA/MPS/CPU)
- **Parameter Optimization**: Balance creativity and coherence through proper parameter tuning
- **Error Handling**: Implement robust error handling for production applications
- **Memory Management**: Monitor and manage GPU memory usage effectively
- **Performance Monitoring**: Track loading and generation times for optimization

### 🚀 Next Steps
- **Notebook 04**: Mini Project - building complete NLP applications
- **Notebook 05**: Fine-tuning models for specific domains
- **Advanced Topics**: Explore larger models like GPT-3.5 and GPT-4 via APIs
- **Documentation**: [Hugging Face Transformers Text Generation Guide](https://huggingface.co/docs/transformers/task_summary#text-generation)

---

## About the Author

**Vu Hung Nguyen** - AI Engineer & Researcher

Connect with me:
- 🌐 **Website**: [vuhung16au.github.io](https://vuhung16au.github.io/)
- 💼 **LinkedIn**: [linkedin.com/in/nguyenvuhung](https://www.linkedin.com/in/nguyenvuhung/)
- 💻 **GitHub**: [github.com/vuhung16au](https://github.com/vuhung16au/)

*This notebook is part of the [HF Transformer Trove](https://github.com/vuhung16au/hf-transformer-trove) educational series.*