# Text Generation in Natural Language Processing

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/nlp-learning-journey/blob/main/examples/text-generation.ipynb)

## Overview

Text generation is the task of automatically producing coherent and contextually relevant text. It ranges from simple template-based approaches to sophisticated neural language models that can generate human-like text.

## What You'll Learn

- N-gram based text generation
- Markov chain approaches
- Neural language models
- Transformer-based generation
- Conditional text generation
- Evaluation techniques
- Real-world applications

## Prerequisites

Basic understanding of Python, probability, NLP concepts, and neural networks.

In [None]:
# Environment Detection and Setup
import sys
import subprocess

# Detect the runtime environment
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules
IS_LOCAL = not (IS_COLAB or IS_KAGGLE)

print("Environment detected:")
print(f"  - Local: {IS_LOCAL}")
print(f"  - Google Colab: {IS_COLAB}")
print(f"  - Kaggle: {IS_KAGGLE}")

# Platform-specific system setup
if IS_COLAB:
    print("\nSetting up Google Colab environment...")
    !apt update -qq
    !apt install -y -qq libpq-dev
elif IS_KAGGLE:
    print("\nSetting up Kaggle environment...")
    # Kaggle usually has most packages pre-installed
else:
    print("\nSetting up local environment...")

# Install required packages for this notebook
required_packages = [
    "transformers",
    "torch",
    "nltk",
    "pandas",
    "matplotlib",
    "seaborn",
    "wordcloud"
]

print("\nInstalling required packages...")
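for package in required_packages:
    if IS_COLAB or IS_KAGGLE:
        !pip install -q {package}
        print(f"✓ {package}")
    else:
        result = subprocess.run(
            [sys.executable, "-m", "pip", "install", "-q", package],
            capture_output=True, text=True
        )
        # Report failures instead of printing a check mark unconditionally
        if result.returncode == 0:
            print(f"✓ {package}")
        else:
            print(f"⚠️  Failed to install {package}: {result.stderr.strip()[:200]}")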

print("\n🎉 Environment setup complete!")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict, Counter
import random
import re

# NLP libraries
import nltk

# Download NLTK data with error handling
nltk_datasets = ['punkt', 'punkt_tab', 'gutenberg']
print("Downloading NLTK datasets...")
for dataset in nltk_datasets:
    try:
        nltk.download(dataset, quiet=True)
        print(f"✓ {dataset}")
    except Exception as e:
        print(f"⚠️  Failed to download {dataset}: {e}")

from nltk.tokenize import sent_tokenize, word_tokenize
try:
    from nltk.corpus import gutenberg
    print("✓ NLTK corpus loaded")
except ImportError:
    print("⚠️  NLTK corpus not available")
    gutenberg = None

# Set random seeds for reproducibility
random.seed(42)
np.random.seed(42)

plt.style.use('default')
sns.set_palette("husl")

# Note: Transformers models will be loaded when needed due to potential network requirements
print("\n⚠️  Note: Transformer models (GPT-2, etc.) require internet access and will be loaded when needed.")

## Sample Text Corpus

Let's create a sample corpus for training our text generation models.

In [None]:
def create_sample_corpus():
    """Create a sample text corpus for training"""
    
    # Technology-related sentences
    tech_sentences = [
        "Artificial intelligence is transforming the way we work and live.",
        "Machine learning algorithms can learn patterns from data automatically.",
        "Deep learning models use neural networks with multiple layers.",
        "Natural language processing enables computers to understand human language.",
        "Computer vision systems can recognize objects in images and videos.",
        "Cloud computing provides scalable infrastructure for modern applications.",
        "Data science combines statistics, programming, and domain expertise.",
        "Software development requires careful planning and systematic implementation.",
        "Cybersecurity protects digital systems from malicious attacks.",
        "The internet connects billions of devices around the world."
    ]
    
    # Science-related sentences
    science_sentences = [
        "Scientists conduct experiments to test their hypotheses.",
        "Research findings contribute to our understanding of the natural world.",
        "The scientific method involves observation, hypothesis, and experimentation.",
        "Peer review ensures the quality and validity of scientific publications.",
        "Innovation often emerges from interdisciplinary collaboration.",
        "Laboratory equipment enables precise measurement and analysis.",
        "Mathematical models help scientists predict natural phenomena.",
        "Evidence-based conclusions form the foundation of scientific knowledge.",
        "Technology transfer brings scientific discoveries to practical applications.",
        "Scientific literacy is essential for informed decision making."
    ]
    
    # Australia-related sentences
    australia_sentences = [
        "Australia is known for its unique wildlife including kangaroos and koalas.",
        "The Great Barrier Reef is one of Australia's most famous natural wonders.",
        "Sydney Opera House is an iconic architectural landmark in Australia.",
        "Australia has diverse landscapes from deserts to tropical rainforests.",
        "Melbourne is renowned for its vibrant arts and coffee culture.",
        "The Australian Outback covers vast areas of the continent's interior.",
        "Australia's indigenous Aboriginal culture spans over 65,000 years.",
        "The country is famous for its beautiful beaches and coastal cities.",
        "Australia is home to many venomous snakes and dangerous wildlife.",
        "The Australian economy relies heavily on mining and agriculture exports."
    ]
    
    # Combine all sentences
    corpus_sentences = tech_sentences + science_sentences + australia_sentences
    
    # Create a larger corpus by adding some variety
    extended_corpus = []
    for sentence in corpus_sentences:
        extended_corpus.append(sentence)
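        # Add some variations (whole-word matches only, so the "is" inside
        # words such as "Scientists" or "statistics" is left untouched)
        if re.search(r'\bcan\b', sentence):
            extended_corpus.append(re.sub(r'\bcan\b', 'may', sentence))
        if re.search(r'\bis\b', sentence):
            extended_corpus.append(re.sub(r'\bis\b', 'was', sentence))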
    
    return extended_corpus

# Create corpus
corpus = create_sample_corpus()
print(f"Corpus size: {len(corpus)} sentences")
print("\nSample sentences:")
for i in range(5):
    print(f"{i+1}. {corpus[i]}")

# Create combined text
corpus_text = ' '.join(corpus)
print(f"\nTotal characters: {len(corpus_text)}")
print(f"Total words: {len(corpus_text.split())}")

## N-gram Based Text Generation

An n-gram model predicts each word from the previous n-1 words, using counts collected from the training text.
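
Before the full generator class, a minimal sketch of the underlying estimate may help: the probability of a word given the previous word is its relative frequency among that word's observed followers. The function name `bigram_probabilities` and the toy sentence are illustrative assumptions, not part of the notebook's code.

```python
from collections import Counter, defaultdict

def bigram_probabilities(tokens):
    """Estimate P(next | previous) by relative frequency (hypothetical helper)."""
    counts = defaultdict(Counter)
    # Count how often each word follows each previous word
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    # Normalize each row of counts into a probability distribution
    return {
        prev: {word: c / sum(followers.values()) for word, c in followers.items()}
        for prev, followers in counts.items()
    }

toy_tokens = "the cat sat on the mat the cat ran".split()
probs = bigram_probabilities(toy_tokens)
print(probs["the"])  # "cat" follows "the" twice, "mat" once
```

In this toy text "cat" is twice as likely as "mat" after "the", which is exactly the kind of distribution the generator samples from at each step.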

In [None]:
class NGramGenerator:
    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = defaultdict(Counter)
        self.vocab = set()
    
    def train(self, text):
        """Train the n-gram model on text"""
        # Process one sentence at a time so that <START> and <END> mark
        # real sentence boundaries rather than only the ends of the corpus
        for sentence in sent_tokenize(text):
            # Tokenize text
            words = word_tokenize(sentence.lower())
            words = [word for word in words if word.isalnum()]
            if not words:
                continue
            
            # Add special tokens
            words = ['<START>'] * (self.n - 1) + words + ['<END>']
            self.vocab.update(words)
            
            # Count n-grams
            for i in range(len(words) - self.n + 1):
                context = tuple(words[i:i + self.n - 1])
                next_word = words[i + self.n - 1]
                self.ngram_counts[context][next_word] += 1
    
    def generate_text(self, max_length=50, temperature=1.0):
        """Generate text using the trained model"""
        # Start with context
        context = ['<START>'] * (self.n - 1)
        generated = []
        
        for _ in range(max_length):
            context_tuple = tuple(context[-(self.n-1):])
            
            if context_tuple not in self.ngram_counts:
                break
            
            # Get possible next words
            candidates = self.ngram_counts[context_tuple]
            
            if not candidates:
                break
            
            # Apply temperature to probabilities
            words = list(candidates.keys())
            counts = np.array(list(candidates.values()), dtype=float)
            
            if temperature != 1.0:
                counts = counts ** (1.0 / temperature)
            
            probabilities = counts / counts.sum()
            
            # Sample next word
            next_word = np.random.choice(words, p=probabilities)
            
            if next_word == '<END>':
                break
            
            generated.append(next_word)
            context.append(next_word)
        
        return ' '.join(generated)
    
    def get_statistics(self):
        """Get model statistics"""
        total_ngrams = sum(len(counts) for counts in self.ngram_counts.values())
        total_contexts = len(self.ngram_counts)
        
        return {
            'vocabulary_size': len(self.vocab),
            'total_ngrams': total_ngrams,
            'unique_contexts': total_contexts,
            'avg_choices_per_context': total_ngrams / total_contexts if total_contexts > 0 else 0
        }

# Train n-gram models with different n values
print("N-gram Text Generation:")
print("=" * 40)

models = {}
for n in [2, 3, 4]:
    model = NGramGenerator(n=n)
    model.train(corpus_text)
    models[n] = model
    
    stats = model.get_statistics()
    print(f"\n{n}-gram model statistics:")
    for key, value in stats.items():
        print(f"  {key}: {value:.2f}")

# Generate text with different models
print("\nGenerated Text Examples:")
for n in [2, 3, 4]:
    print(f"\n{n}-gram model:")
    for temp in [0.5, 1.0, 1.5]:
        generated = models[n].generate_text(max_length=20, temperature=temp)
        print(f"  Temp {temp}: {generated}")

## Markov Chain Text Generation

A character-level Markov chain predicts the next character from the previous `order` characters.
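
As a rough illustration of the transition table such a model builds, the sketch below counts which character follows each two-character state in a toy string. The helper name `build_char_transitions` and the sample text are assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_char_transitions(text, order=2):
    """Map each `order`-character state to a Counter of following characters."""
    transitions = defaultdict(Counter)
    for i in range(len(text) - order):
        state = text[i:i + order]          # current window of characters
        next_char = text[i + order]        # character that follows the window
        transitions[state][next_char] += 1
    return transitions

toy_text = "banana bandana"
table = build_char_transitions(toy_text, order=2)
print(dict(table["an"]))  # characters observed after the state "an"
```

Here the state "an" is followed by "a" three times and by "d" once, so a generator sampling from this table would continue "an" with "a" most of the time.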

In [None]:
class MarkovTextGenerator:
    def __init__(self, order=2):
        self.order = order
        self.transitions = defaultdict(Counter)
    
    def train(self, text):
        """Train Markov chain on character sequences"""
        # Clean text
        text = re.sub(r'[^a-zA-Z\s.,!?]', '', text)
        text = text.lower()
        
        # Build transition table
        for i in range(len(text) - self.order):
            state = text[i:i + self.order]
            next_char = text[i + self.order]
            self.transitions[state][next_char] += 1
    
    def generate_text(self, length=200, seed=None):
        """Generate text using Markov chain"""
        if not self.transitions:
            return "Model not trained"
        
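        # Choose starting state; use the seed only if the model has seen it
        seed_state = seed[-self.order:].lower() if seed and len(seed) >= self.order else None
        if seed_state in self.transitions:
            current_state = seed_state
        else:
            # Training text is lowercased, so uppercase checks never match;
            # start at a word boundary (a state beginning with a space) instead
            sentence_starts = [state for state in self.transitions.keys() 
                             if state.startswith(' ')]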
            if sentence_starts:
                current_state = random.choice(sentence_starts)
            else:
                current_state = random.choice(list(self.transitions.keys()))
        
        result = current_state
        
        for _ in range(length - self.order):
            if current_state not in self.transitions:
                break
            
            # Get possible next characters
            candidates = self.transitions[current_state]
            
            if not candidates:
                break
            
            # Choose next character based on probabilities
            chars = list(candidates.keys())
            weights = list(candidates.values())
            next_char = random.choices(chars, weights=weights)[0]
            
            result += next_char
            current_state = current_state[1:] + next_char
        
        return result
    
    def get_statistics(self):
        """Get model statistics"""
        total_transitions = sum(sum(counter.values()) for counter in self.transitions.values())
        avg_choices = np.mean([len(counter) for counter in self.transitions.values()])
        
        return {
            'unique_states': len(self.transitions),
            'total_transitions': total_transitions,
            'avg_choices_per_state': avg_choices
        }

# Train Markov models with different orders
print("Markov Chain Text Generation:")
print("=" * 40)

markov_models = {}
for order in [2, 3, 4]:
    model = MarkovTextGenerator(order=order)
    model.train(corpus_text)
    markov_models[order] = model
    
    stats = model.get_statistics()
    print(f"\nOrder {order} Markov model:")
    for key, value in stats.items():
        print(f"  {key}: {value:.2f}")

# Generate text examples
print("\nMarkov Chain Generated Text:")
for order in [2, 3, 4]:
    generated = markov_models[order].generate_text(length=150)
    print(f"\nOrder {order}: {generated}")

## Transformer-Based Text Generation

Using pre-trained language models for high-quality text generation.

In [None]:
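# Initialize text generation pipeline
try:
    # Import here so a missing transformers install is reported, not silently ignored
    from transformers import pipeline
except ImportError as e:
    pipeline = None
    print(f"transformers is not installed: {e}")

generator = None
if pipeline is not None:
    # Try GPT-2 first, then fall back to the smaller DistilGPT-2
    for model_name in ['gpt2', 'distilgpt2']:
        try:
            generator = pipeline('text-generation', model=model_name)
            print(f"Loaded {model_name} model for text generation")
            break
        except Exception as e:
            print(f"Could not load {model_name}: {e}")

if generator is None:
    print("Could not load transformer generation model")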

def transformer_text_generation(prompt, max_length=100, num_return_sequences=3, 
                               temperature=0.7, top_p=0.9):
    """Generate text using transformer model"""
    if generator is None:
        return ["Transformer model not available"]
    
    try:
        results = generator(
            prompt,
            max_length=max_length,
            num_return_sequences=num_return_sequences,
            temperature=temperature,
            top_p=top_p,
            do_sample=True,
            pad_token_id=generator.tokenizer.eos_token_id
        )
        
        return [result['generated_text'] for result in results]
    except Exception as e:
        return [f"Error in generation: {str(e)}"]

if generator:
    print("\nTransformer Text Generation Examples:")
    print("=" * 50)
    
    # Test different prompts
    prompts = [
        "Artificial intelligence will",
        "The future of technology",
        "Scientists have discovered"
    ]
    
    for prompt in prompts:
        print(f"\nPrompt: '{prompt}'")
        print("-" * 30)
        
        generations = transformer_text_generation(
            prompt, max_length=80, num_return_sequences=2
        )
        
        for i, gen_text in enumerate(generations, 1):
            # The pipeline output already contains the prompt followed by its
            # continuation, so print it as-is to keep the original spacing
            print(f"  {i}. {gen_text.strip()}")
else:
    print("Transformer generation not available")

## Conditional Text Generation

Generating text based on specific conditions or constraints.
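
One simple form of conditional generation is template filling: a sentence pattern fixes the structure and randomly chosen fillers vary the content. The sketch below is a minimal version under that assumption; `fill_template` and `example_fillers` are hypothetical names, and it relies only on the standard library's `string.Formatter` to find the placeholders.

```python
import random
from string import Formatter

def fill_template(template, fillers, rng=random):
    """Replace each {placeholder} with a random option from `fillers`."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    fields = [name for _, name, _, _ in Formatter().parse(template) if name]
    choices = {name: rng.choice(fillers[name]) for name in fields}
    return template.format(**choices)

example_fillers = {
    "tech_term": ["AI", "machine learning"],
    "benefit": ["better accuracy", "cost reduction"],
}
sentence = fill_template(
    "The latest {tech_term} technology enables {benefit}.",
    example_fillers,
    random.Random(0),  # seeded generator for repeatable output
)
print(sentence)
```

Passing a seeded `random.Random` instance keeps the output repeatable without touching the global random state.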

In [None]:
class ConditionalGenerator:
    def __init__(self):
        self.topic_keywords = {
            'technology': ['artificial', 'intelligence', 'computer', 'software', 'data', 'algorithm'],
            'science': ['research', 'experiment', 'discovery', 'scientific', 'laboratory', 'hypothesis'],
            'health': ['medical', 'treatment', 'patient', 'doctor', 'medicine', 'therapy'],
            'education': ['learning', 'student', 'teacher', 'school', 'knowledge', 'study']
        }
        
        self.sentence_templates = {
            'technology': [
                "The latest {tech_term} technology enables {benefit}.",
                "Researchers are developing {tech_term} systems for {application}.",
                "Advanced {tech_term} algorithms can {capability}."
            ],
            'science': [
                "Scientists have conducted {study_type} to investigate {phenomenon}.",
                "The research findings suggest that {discovery}.",
                "Laboratory experiments demonstrate {result}."
            ]
        }
        
        self.template_fillers = {
            'tech_term': ['AI', 'machine learning', 'blockchain', 'quantum computing'],
            'benefit': ['improved efficiency', 'better accuracy', 'cost reduction'],
            'application': ['healthcare', 'finance', 'transportation', 'education'],
            'capability': ['process large datasets', 'recognize patterns', 'make predictions'],
            'study_type': ['comprehensive studies', 'controlled experiments', 'longitudinal research'],
            'phenomenon': ['climate change effects', 'cellular behavior', 'quantum mechanics'],
            'discovery': ['new mechanisms exist', 'correlations are significant', 'theories need revision'],
            'result': ['promising outcomes', 'unexpected findings', 'measurable improvements']
        }
    
    def generate_by_topic(self, topic, num_sentences=3):
        """Generate text focused on a specific topic"""
        if topic not in self.topic_keywords:
            return f"Topic '{topic}' not supported"
        
        sentences = []
        
        if topic in self.sentence_templates:
            templates = self.sentence_templates[topic]
            
            for _ in range(num_sentences):
                template = random.choice(templates)
                
                # Fill template placeholders
                filled_template = template
                for placeholder, options in self.template_fillers.items():
                    if '{' + placeholder + '}' in filled_template:
                        filled_template = filled_template.replace(
                            '{' + placeholder + '}', random.choice(options)
                        )
                
                sentences.append(filled_template)
        else:
            # Fallback: generate sentences with topic keywords
            keywords = self.topic_keywords[topic]
            base_sentences = [
                f"Recent advances in {random.choice(keywords)} are promising.",
                f"The field of {random.choice(keywords)} continues to evolve.",
                f"New developments in {random.choice(keywords)} show potential."
            ]
            sentences = random.sample(base_sentences, min(num_sentences, len(base_sentences)))
        
        return ' '.join(sentences)
    
    def generate_with_length_constraint(self, topic, target_length=100):
        """Generate text with specific length constraint"""
        generated_text = ""
        
        while len(generated_text.split()) < target_length:
            sentence = self.generate_by_topic(topic, num_sentences=1)
            generated_text += sentence + " "
            
            # Prevent infinite loop
            if len(generated_text.split()) > target_length * 1.5:
                break
        
        # Trim to approximate target length
        words = generated_text.split()[:target_length]
        return ' '.join(words)
    
    def generate_with_keywords(self, required_keywords, num_sentences=3):
        """Generate text that includes specific keywords"""
        sentences = []
        
        for keyword in required_keywords[:num_sentences]:
            sentence_templates = [
                f"The concept of {keyword} is fundamental to understanding the field.",
                f"Recent research on {keyword} has yielded significant insights.",
                f"Applications of {keyword} are expanding across various domains.",
                f"The importance of {keyword} cannot be overstated in modern studies."
            ]
            sentences.append(random.choice(sentence_templates))
        
        # Fill remaining sentences if needed
        while len(sentences) < num_sentences:
            general_sentences = [
                "Further investigation is needed to fully understand these phenomena.",
                "The implications of these findings are far-reaching.",
                "Continued research will likely reveal additional insights."
            ]
            sentences.append(random.choice(general_sentences))
        
        return ' '.join(sentences[:num_sentences])

# Test conditional generation
print("Conditional Text Generation:")
print("=" * 40)

conditional_gen = ConditionalGenerator()

# Topic-based generation
print("\n1. Topic-based Generation:")
for topic in ['technology', 'science']:
    generated = conditional_gen.generate_by_topic(topic, num_sentences=2)
    print(f"\n{topic.title()}: {generated}")

# Length-constrained generation
print("\n2. Length-constrained Generation:")
short_text = conditional_gen.generate_with_length_constraint('technology', target_length=30)
long_text = conditional_gen.generate_with_length_constraint('science', target_length=60)

print(f"\nShort (30 words): {short_text} ({len(short_text.split())} words)")
print(f"\nLong (60 words): {long_text} ({len(long_text.split())} words)")

# Keyword-based generation
print("\n3. Keyword-based Generation:")
required_keywords = ['machine learning', 'neural networks', 'algorithms']
keyword_text = conditional_gen.generate_with_keywords(required_keywords)
print(f"\nWith keywords {required_keywords}:\n{keyword_text}")

## Text Generation Evaluation

Methods to evaluate the quality of generated text.
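
A common quick check for repetitive output is the share of unique n-grams in a text. For n=1 this is the type-token ratio. The sketch below is illustrative only: `distinct_n` is a hypothetical helper and the two token lists are made up for the example.

```python
def distinct_n(tokens, n=1):
    """Share of n-grams that are unique; n=1 gives the type-token ratio."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

repetitive = "the model is the model is the model".split()
varied = "neural language models generate surprisingly fluent text today".split()

print(f"repetitive: distinct-1 = {distinct_n(repetitive, 1):.3f}")
print(f"varied:     distinct-1 = {distinct_n(varied, 1):.3f}")
print(f"repetitive: distinct-2 = {distinct_n(repetitive, 2):.3f}")
```

The repetitive sample has only 3 unique words out of 8, while every word in the varied sample is unique. Note that these ratios tend to fall as texts get longer, so they are best compared across texts of similar length.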

In [None]:
def evaluate_text_quality(texts):
    """Evaluate various aspects of generated text quality"""
    results = []
    
    for text in texts:
        words = word_tokenize(text.lower())
        words = [word for word in words if word.isalnum()]
        
        sentences = sent_tokenize(text)
        
        # Basic metrics
        word_count = len(words)
        sentence_count = len(sentences)
        unique_words = len(set(words))
        
        # Lexical diversity (Type-Token Ratio)
        ttr = unique_words / word_count if word_count > 0 else 0
        
        # Average sentence length
        avg_sentence_length = word_count / sentence_count if sentence_count > 0 else 0
        
        # Repetition analysis
        word_freq = Counter(words)
        repeated_words = sum(1 for count in word_freq.values() if count > 1)
        repetition_ratio = repeated_words / unique_words if unique_words > 0 else 0
        
        # Readability (simple approximation)
        long_words = sum(1 for word in words if len(word) > 6)
        long_word_ratio = long_words / word_count if word_count > 0 else 0
        
        results.append({
            'text': text[:100] + '...' if len(text) > 100 else text,
            'word_count': word_count,
            'sentence_count': sentence_count,
            'unique_words': unique_words,
            'lexical_diversity': ttr,
            'avg_sentence_length': avg_sentence_length,
            'repetition_ratio': repetition_ratio,
            'long_word_ratio': long_word_ratio
        })
    
    return results

def compare_generation_methods(prompt="Artificial intelligence"):
    """Compare different text generation methods"""
    generated_texts = {}
    
    # N-gram generation
    if 2 in models:
        ngram_text = models[2].generate_text(max_length=30)
        generated_texts['2-gram'] = prompt + " " + ngram_text
    
    # Markov chain generation
    if 3 in markov_models:
        markov_text = markov_models[3].generate_text(length=100, seed=prompt)
        generated_texts['Markov'] = markov_text[:150]  # Limit length
    
    # Conditional generation
    conditional_text = conditional_gen.generate_by_topic('technology', num_sentences=2)
    generated_texts['Conditional'] = conditional_text
    
    # Transformer generation (if available)
    if generator:
        transformer_results = transformer_text_generation(prompt, max_length=60, num_return_sequences=1)
        if transformer_results:
            generated_texts['Transformer'] = transformer_results[0]
    
    return generated_texts

# Compare generation methods
print("Text Generation Method Comparison:")
print("=" * 50)

comparison_texts = compare_generation_methods()

# Evaluate all generated texts
all_texts = list(comparison_texts.values())
evaluation_results = evaluate_text_quality(all_texts)

# Display results
methods = list(comparison_texts.keys())
for method, result in zip(methods, evaluation_results):
    print(f"\n{method}:")
    print(f"  Text: {result['text']}")
    print(f"  Words: {result['word_count']}, Sentences: {result['sentence_count']}")
    print(f"  Lexical Diversity: {result['lexical_diversity']:.3f}")
    print(f"  Avg Sentence Length: {result['avg_sentence_length']:.1f}")
    print(f"  Repetition Ratio: {result['repetition_ratio']:.3f}")

# Visualize evaluation metrics
if len(evaluation_results) > 1:
    metrics = ['lexical_diversity', 'avg_sentence_length', 'repetition_ratio', 'long_word_ratio']
    
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    axes = axes.ravel()
    
    for i, metric in enumerate(metrics):
        values = [result[metric] for result in evaluation_results]
        axes[i].bar(methods, values)
        axes[i].set_title(metric.replace('_', ' ').title())
        axes[i].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.show()

## Real-World Applications

In [None]:
# Application 1: Content Creation Assistant
def content_creation_assistant(topic, content_type, length='medium'):
    """Generate content for different purposes"""
    length_settings = {
        'short': 30,
        'medium': 60,
        'long': 100
    }
    
    target_length = length_settings.get(length, 60)
    
    if content_type == 'blog_intro':
        intro_templates = [
            f"In today's rapidly evolving world, {topic} has become increasingly important.",
            f"The field of {topic} is experiencing unprecedented growth and innovation.",
            f"Understanding {topic} is crucial for anyone interested in modern technology."
        ]
        base_text = random.choice(intro_templates)
        
    elif content_type == 'product_description':
        desc_templates = [
            f"Our innovative {topic} solution offers cutting-edge features and reliability.",
            f"Experience the future with our advanced {topic} technology platform.",
            f"Discover how our {topic} product can transform your workflow and productivity."
        ]
        base_text = random.choice(desc_templates)
        
    elif content_type == 'educational':
        edu_templates = [
            f"Learning about {topic} provides valuable insights into modern scientific principles.",
            f"The study of {topic} encompasses various disciplines and methodologies.",
            f"Students exploring {topic} will gain practical knowledge and theoretical understanding."
        ]
        base_text = random.choice(edu_templates)
    
    else:
        base_text = f"The topic of {topic} is worth exploring in detail."
    
    # Extend with conditional generation
    if topic.lower() in ['technology', 'science']:
        extended_text = conditional_gen.generate_with_length_constraint(
            topic.lower(), target_length
        )
        return base_text + " " + extended_text
    else:
        return base_text

# Application 2: Creative Writing Assistant
def creative_writing_prompt(genre, elements):
    """Generate creative writing prompts and story starters"""
    
    genre_settings = {
        'sci-fi': {
            'settings': ['space station', 'alien planet', 'future city', 'research facility'],
            'themes': ['time travel', 'artificial intelligence', 'genetic engineering', 'space exploration']
        },
        'mystery': {
            'settings': ['old mansion', 'small town', 'university campus', 'corporate office'],
            'themes': ['missing person', 'stolen artifact', 'corporate espionage', 'family secret']
        },
        'fantasy': {
            'settings': ['magical forest', 'ancient castle', 'mystical island', 'enchanted city'],
            'themes': ['quest for power', 'ancient prophecy', 'magical artifact', 'hidden realm']
        }
    }
    
    if genre not in genre_settings:
        return "Genre not supported"
    
    setting = random.choice(genre_settings[genre]['settings'])
    theme = random.choice(genre_settings[genre]['themes'])
    
    # Create story starter
    starters = [
        f"In the {setting}, a discovery about {theme} changes everything.",
        f"The {setting} holds secrets related to {theme} that few understand.",
        f"When {theme} becomes a reality in the {setting}, unexpected challenges arise."
    ]
    
    story_starter = random.choice(starters)
    
    # Add character and conflict elements if provided
    if elements:
        element_text = f" The story involves {', '.join(elements)}."
        story_starter += element_text
    
    return story_starter

# Application 3: Email Template Generator
def generate_email_template(email_type, recipient_type, tone='professional'):
    """Generate email templates for different purposes"""
    
    tone_modifiers = {
        'professional': 'formal and respectful',
        'friendly': 'warm and approachable',
        'urgent': 'direct and action-oriented'
    }
    
    templates = {
        'follow_up': {
            'subject': 'Following up on our previous discussion',
            'body': 'I hope this email finds you well. I wanted to follow up on our recent conversation regarding [TOPIC]. Please let me know if you need any additional information.'
        },
        'introduction': {
            'subject': 'Introduction and collaboration opportunity',
            'body': 'I am reaching out to introduce myself and explore potential collaboration opportunities. I believe our organizations share common goals and could benefit from working together.'
        },
        'meeting_request': {
            'subject': 'Meeting request to discuss [TOPIC]',
            'body': 'I would like to schedule a meeting to discuss [TOPIC] in more detail. Please let me know your availability for the coming week.'
        }
    }
    
    if email_type not in templates:
        return "Email type not supported"
    
    template = templates[email_type]
    
    # Customize based on recipient and tone
    greeting = {
        'client': 'Dear valued client',
        'colleague': 'Hello',
        'vendor': 'Dear partner'
    }.get(recipient_type, 'Hello')
    
    closing = {
        'professional': 'Best regards',
        'friendly': 'Best wishes',
        'urgent': 'Thank you for your prompt attention'
    }.get(tone, 'Best regards')
    
    full_email = f"Subject: {template['subject']}\n\n{greeting},\n\n{template['body']}\n\n{closing},\n[Your Name]"
    
    return full_email

# Test applications
print("Real-World Text Generation Applications:")
print("=" * 50)

# Content creation
print("\n1. Content Creation Assistant:")
blog_intro = content_creation_assistant('artificial intelligence', 'blog_intro', 'medium')
product_desc = content_creation_assistant('machine learning', 'product_description', 'short')

print(f"\nBlog Introduction:\n{blog_intro}")
print(f"\nProduct Description:\n{product_desc}")

# Creative writing
print("\n2. Creative Writing Assistant:")
sci_fi_prompt = creative_writing_prompt('sci-fi', ['android scientist', 'quantum computer'])
mystery_prompt = creative_writing_prompt('mystery', ['detective', 'old diary'])

print(f"\nSci-Fi Prompt: {sci_fi_prompt}")
print(f"\nMystery Prompt: {mystery_prompt}")

# Email templates
print("\n3. Email Template Generator:")
follow_up_email = generate_email_template('follow_up', 'client', 'professional')
meeting_email = generate_email_template('meeting_request', 'colleague', 'friendly')

print(f"\nFollow-up Email:\n{follow_up_email}")
print(f"\nMeeting Request Email:\n{meeting_email}")

## Exercises

1. **Style Transfer**: Modify text generation to mimic different writing styles
2. **Dialogue Generation**: Create conversational text between multiple speakers
3. **Poetry Generation**: Implement rhyme and meter constraints
4. **Code Generation**: Generate code comments or simple functions

## Key Takeaways

- **Multiple approaches available**: From simple n-grams to sophisticated transformers
- **Quality vs complexity trade-off**: More complex models generally produce better text but cost more to train and run
- **Conditional generation is powerful**: Controlling output with constraints improves usefulness
- **Evaluation is challenging**: Automatic metrics don't always capture text quality
- **Context length matters**: Longer context generally leads to more coherent text

## Best Practices

1. **Start simple**: Begin with n-gram models to understand basics
2. **Use appropriate models**: Match model complexity to your quality needs
3. **Control generation**: Use temperature, top-p, and other parameters
4. **Evaluate thoroughly**: Use both automatic metrics and human evaluation
5. **Consider ethics**: Be aware of potential biases and harmful content
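
To make the temperature and top-p controls mentioned above more concrete, here is a minimal NumPy sketch of both. It is an illustration under simplifying assumptions, not how any particular library implements them; `apply_temperature` and `top_p_filter` are hypothetical names.

```python
import numpy as np

def apply_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature; lower values sharpen the distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

def top_p_filter(probs, top_p=0.9):
    """Keep the most likely tokens whose cumulative mass reaches top_p, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]               # token indices, most likely first
    cumulative = np.cumsum(probs[order])
    # Smallest number of tokens whose cumulative probability reaches top_p
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(f"temperature {t}: {np.round(apply_temperature(logits, t), 3)}")

filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9)
print("after top-p:", np.round(filtered, 3))
```

Raising counts to the power `1 / temperature`, as the n-gram generator does, is equivalent to dividing log-counts by the temperature before normalizing.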

## Applications

- **Content creation**: Blog posts, product descriptions, marketing copy
- **Creative writing**: Story prompts, poetry, character development
- **Code generation**: Documentation, comments, simple functions
- **Chatbots**: Conversational AI and virtual assistants
- **Data augmentation**: Generate training data for other NLP tasks

## Next Steps

- Learn about fine-tuning pre-trained models
- Explore controllable generation techniques
- Study evaluation metrics like BLEU, ROUGE, and perplexity
- Practice with different domains and writing styles
- Learn about ethical considerations in text generation
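
Of the metrics listed above, perplexity is the easiest to compute by hand: it is the exponential of the average negative log-probability a model assigns to each token. The sketch below assumes the per-token probabilities are already available; the function name and the example values are made up for illustration.

```python
import math

def perplexity(token_probabilities):
    """exp of the mean negative log-probability per token (lower is better)."""
    if not token_probabilities:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(avg_nll)

# A model that spreads probability evenly over 4 options at every step
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
# Hypothetical probabilities from a confident and an uncertain model
confident = perplexity([0.9, 0.8, 0.95])
uncertain = perplexity([0.2, 0.1, 0.3])

print(f"uniform over 4 choices: {uniform:.2f}")
print(f"confident model: {confident:.2f}, uncertain model: {uncertain:.2f}")
```

A perplexity near 4 for the uniform case matches the intuition that the model is, on average, choosing among four equally likely tokens.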

## Resources

- [GPT-2 Paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
- [Hugging Face Text Generation](https://huggingface.co/models?pipeline_tag=text-generation)
- [The Illustrated GPT-2](http://jalammar.github.io/illustrated-gpt2/)
- [OpenAI API Documentation](https://platform.openai.com/docs/)
- [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness)