# Text Translation in Natural Language Processing

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vuhung16au/nlp-learning-journey/blob/main/examples/text-translation.ipynb)

## Overview

Text translation is the task of automatically converting text from one language to another while preserving meaning, style, and context. Modern approaches use neural machine translation with transformer architectures.

## What You'll Learn

- Translation approaches and evolution
- Using pre-trained translation models
- Transformer-based translation
- Evaluation metrics (BLEU, word-overlap precision/recall/F1)
- Multilingual models
- Translation quality assessment
- Real-world applications

## Prerequisites

Basic understanding of Python, NLP concepts, and sequence-to-sequence models.

In [None]:
# Environment Detection and Setup
import sys
import subprocess

# Detect the runtime environment
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules
IS_LOCAL = not (IS_COLAB or IS_KAGGLE)

print(f"Environment detected:")
print(f"  - Local: {IS_LOCAL}")
print(f"  - Google Colab: {IS_COLAB}")
print(f"  - Kaggle: {IS_KAGGLE}")

# Platform-specific system setup
if IS_COLAB:
    print("\nSetting up Google Colab environment...")
    !apt update -qq
    !apt install -y -qq libpq-dev
elif IS_KAGGLE:
    print("\nSetting up Kaggle environment...")
    # Kaggle usually has most packages pre-installed
else:
    print("\nSetting up local environment...")

# Install required packages for this notebook
required_packages = [
    "transformers",
    "torch",
    "sentencepiece",  # tokenizer backend needed by the Marian (OPUS-MT) and M2M100 models
    "sacrebleu",
    "pandas",
    "matplotlib",
    "seaborn",
    "langdetect"
]

print("\nInstalling required packages...")
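for package in required_packages:
    if IS_COLAB or IS_KAGGLE:
        !pip install -q {package}
        print(f"✓ {package}")
    else:
        result = subprocess.run([sys.executable, "-m", "pip", "install", "-q", package],
                                capture_output=True, text=True)
        # Report failures instead of printing a check mark unconditionally
        if result.returncode == 0:
            print(f"✓ {package}")
        else:
            print(f"✗ {package}: {result.stderr.strip()[:200]}")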

print("\n🎉 Environment setup complete!")

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict, Counter
import re

# Translation libraries
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import MarianMTModel, MarianTokenizer
import sacrebleu
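# langdetect's exception class is LangDetectException; alias it to the name used below
from langdetect import detect, LangDetectException as LangDetectError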

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

# Language code mappings
LANGUAGE_CODES = {
    'english': 'en',
    'vietnamese': 'vi',
    'spanish': 'es', 
    'french': 'fr',
    'german': 'de',
    'italian': 'it',
    'portuguese': 'pt',
    'russian': 'ru',
    'chinese': 'zh',
    'japanese': 'ja',
    'arabic': 'ar'
}

LANGUAGE_NAMES = {v: k for k, v in LANGUAGE_CODES.items()}

## Sample Texts for Translation

Let's create sample texts in different languages for translation experiments.

In [None]:
# Sample texts in different languages
sample_texts = {
    'en': [
        "Hello, how are you today?",
        "The weather is beautiful this morning.",
        "Artificial intelligence is changing the world.",
        "I would like to book a table for two people.",
        "Science and technology advance human knowledge."
    ],
    'vi': [
        "Xin chào, bạn khỏe không hôm nay?",
        "Thời tiết đẹp vào sáng nay.",
        "Trí tuệ nhân tạo đang thay đổi thế giới.",
        "Tôi muốn đặt bàn cho hai người.",
        "Khoa học và công nghệ nâng cao kiến thức con người."
    ],
    'es': [
        "Hola, ¿cómo estás hoy?",
        "El clima está hermoso esta mañana.",
        "La inteligencia artificial está cambiando el mundo.",
        "Me gustaría reservar una mesa para dos personas.",
        "La ciencia y la tecnología avanzan el conocimiento humano."
    ],
    'fr': [
        "Bonjour, comment allez-vous aujourd'hui?",
        "Le temps est magnifique ce matin.",
        "L'intelligence artificielle change le monde.",
        "J'aimerais réserver une table pour deux personnes.",
        "La science et la technologie font progresser les connaissances humaines."
    ],
    'de': [
        "Hallo, wie geht es dir heute?",
        "Das Wetter ist heute Morgen schön.",
        "Künstliche Intelligenz verändert die Welt.",
        "Ich möchte einen Tisch für zwei Personen reservieren.",
        "Wissenschaft und Technologie fördern das menschliche Wissen."
    ]
}

print("Sample Texts for Translation:")
print("=" * 50)

for lang_code, texts in sample_texts.items():
    lang_name = LANGUAGE_NAMES.get(lang_code, lang_code).title()
    print(f"\n{lang_name} ({lang_code}):")
    for i, text in enumerate(texts[:3], 1):
        print(f"  {i}. {text}")

## Language Detection

Before translation, we often need to detect the source language.

In [None]:
def detect_language_with_confidence(text):
    """Detect the most likely language of `text` and its probability."""
    # Imported here so the function does not depend on the top-level import names
    from langdetect import detect_langs, DetectorFactory, LangDetectException

    # langdetect is non-deterministic by default; a fixed seed makes results repeatable
    DetectorFactory.seed = 0

    unknown = {'language': 'unknown', 'confidence': 0.0, 'all_detections': []}
    try:
        # detect_langs returns candidates sorted by probability, highest first
        detections = detect_langs(text)
    except LangDetectException:
        # Raised when the text has no usable features (empty, digits only, ...)
        return unknown

    if not detections:
        return unknown

    best_detection = detections[0]
    return {
        'language': best_detection.lang,
        'confidence': best_detection.prob,
        'all_detections': [(d.lang, d.prob) for d in detections]
    }

# Test language detection
print("Language Detection Results:")
print("=" * 40)

test_texts = [
    ("Hello, how are you?", "English"),
    ("Bonjour, comment allez-vous?", "French"),
    ("Hola, ¿cómo estás?", "Spanish"),
    ("Guten Tag, wie geht es Ihnen?", "German"),
    ("Artificial intelligence is the future", "English")
]

detection_results = []
for text, expected_lang in test_texts:
    result = detect_language_with_confidence(text)
    detection_results.append(result)
    
    detected_lang = LANGUAGE_NAMES.get(result['language'], result['language'])
    print(f"\nText: {text}")
    print(f"Expected: {expected_lang}")
    print(f"Detected: {detected_lang.title()} ({result['language']})")
    print(f"Confidence: {result['confidence']:.3f}")
    
    if len(result['all_detections']) > 1:
        alternatives = result['all_detections'][1:3]  # Show top 2 alternatives
        alt_str = ", ".join([f"{lang} ({prob:.2f})" for lang, prob in alternatives])
        print(f"Alternatives: {alt_str}")

# Calculate detection accuracy
correct_detections = 0
total_detections = len(test_texts)

expected_codes = {'english': 'en', 'french': 'fr', 'spanish': 'es', 'german': 'de'}
for i, (_, expected_lang) in enumerate(test_texts):
    expected_code = expected_codes.get(expected_lang.lower(), 'unknown')
    detected_code = detection_results[i]['language']
    
    if expected_code == detected_code:
        correct_detections += 1

accuracy = correct_detections / total_detections
print(f"\nLanguage Detection Accuracy: {accuracy:.1%} ({correct_detections}/{total_detections})")
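
Detection on short messages can be unreliable, so a caller may want to trust a detected language only when its probability clears a threshold. The following is a minimal sketch, assuming detections arrive as the `(language, probability)` pairs stored under `all_detections`; `pick_language`, `min_confidence`, and `fallback` are hypothetical names chosen for illustration, and the default threshold is arbitrary.

```python
def pick_language(detections, min_confidence=0.9, fallback="en"):
    """Return (language, trusted) from a list of (lang, prob) pairs.

    Hypothetical helper: the threshold and fallback are illustrative defaults.
    """
    if not detections:
        return fallback, False
    # Take the most probable candidate regardless of input order
    lang, prob = max(detections, key=lambda pair: pair[1])
    if prob >= min_confidence:
        return lang, True
    return fallback, False


# Hand-written probabilities, not real detector output
print(pick_language([("fr", 0.97), ("it", 0.03)]))
print(pick_language([("es", 0.55), ("pt", 0.45)]))
print(pick_language([]))
```

In the notebook, the list could come from `detect_language_with_confidence(text)['all_detections']`, and an untrusted result could prompt the user to choose the language instead.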

## Transformer-Based Translation

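This section loads pre-trained OPUS-MT (Marian) models from Helsinki-NLP for specific language pairs and falls back to the multilingual M2M100 model for pairs without a dedicated model.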

In [None]:
# Initialize translation pipelines
translation_pipelines = {}

def load_translation_pipeline(source_lang, target_lang):
    """Load translation pipeline for specific language pair"""
    model_name_map = {
        ('en', 'es'): 'Helsinki-NLP/opus-mt-en-es',
        ('en', 'fr'): 'Helsinki-NLP/opus-mt-en-fr', 
        ('en', 'de'): 'Helsinki-NLP/opus-mt-en-de',
        ('es', 'en'): 'Helsinki-NLP/opus-mt-es-en',
        ('fr', 'en'): 'Helsinki-NLP/opus-mt-fr-en',
        ('de', 'en'): 'Helsinki-NLP/opus-mt-de-en'
    }
    
    pair = (source_lang, target_lang)
    if pair in model_name_map:
        try:
            model_name = model_name_map[pair]
            pipeline_obj = pipeline('translation', model=model_name)
            return pipeline_obj
        except Exception as e:
            print(f"Could not load {pair} model: {e}")
            return None
    
    # Try generic multilingual model
    try:
        pipeline_obj = pipeline('translation', 
                               model='facebook/m2m100_418M', 
                               src_lang=source_lang, 
                               tgt_lang=target_lang)
        return pipeline_obj
    except Exception as e:
        print(f"Could not load multilingual model for {pair}: {e}")
        return None

# Load some common translation pipelines
common_pairs = [('en', 'es'), ('en', 'fr'), ('es', 'en'), ('fr', 'en')]

print("Loading Translation Models:")
print("=" * 40)

for source, target in common_pairs:
    print(f"\nLoading {source} -> {target} model...")
    pipeline_obj = load_translation_pipeline(source, target)
    if pipeline_obj:
        translation_pipelines[(source, target)] = pipeline_obj
        print(f"✓ Successfully loaded {source} -> {target}")
    else:
        print(f"✗ Failed to load {source} -> {target}")

print(f"\nLoaded {len(translation_pipelines)} translation models")

def translate_text(text, source_lang, target_lang, max_length=512):
    """Translate text using available models"""
    pair = (source_lang, target_lang)
    
    if pair in translation_pipelines:
        try:
            result = translation_pipelines[pair](text, max_length=max_length)
            if isinstance(result, list) and len(result) > 0:
                return result[0]['translation_text']
            return str(result)
        except Exception as e:
            return f"Translation error: {e}"
    else:
        return f"No model available for {source_lang} -> {target_lang}"

# Test translation
if translation_pipelines:
    print("\nTranslation Examples:")
    print("=" * 30)
    
    test_translations = [
        ("Hello, how are you today?", "en", "es"),
        ("The weather is beautiful.", "en", "fr"),
        ("Hola, ¿cómo estás?", "es", "en"),
        ("Bonjour, comment allez-vous?", "fr", "en")
    ]
    
    for text, src_lang, tgt_lang in test_translations:
        if (src_lang, tgt_lang) in translation_pipelines:
            translation = translate_text(text, src_lang, tgt_lang)
            src_name = LANGUAGE_NAMES.get(src_lang, src_lang)
            tgt_name = LANGUAGE_NAMES.get(tgt_lang, tgt_lang)
            
            print(f"\n{src_name.title()} -> {tgt_name.title()}:")
            print(f"  Original: {text}")
            print(f"  Translation: {translation}")
else:
    print("No translation models available")
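
Translation models accept a limited number of input tokens, so long passages are often split at sentence boundaries and translated chunk by chunk. The sketch below is a rough approximation under stated assumptions: it counts whitespace-separated words rather than model tokens, uses a naive punctuation-based sentence split, and `split_into_chunks` and `max_words` are hypothetical names.

```python
import re


def split_into_chunks(text, max_words=50):
    """Group sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept whole in its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


long_text = "First sentence here. Second one is a bit longer than that. Third!"
for chunk in split_into_chunks(long_text, max_words=10):
    print(chunk)
```

Each chunk could then be passed to `translate_text` and the results joined with spaces. Splitting loses cross-sentence context, so smaller chunks trade quality for safety.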

## Multilingual Translation Models

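M2M100 is a single many-to-many model: the source and target languages are passed as `src_lang` and `tgt_lang` at call time instead of loading a separate model for each language pair.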

In [None]:
class MultilingualTranslator:
    def __init__(self):
        self.pipeline = None
        self.supported_languages = set()
        self.load_model()
    
    def load_model(self):
        """Load multilingual translation model"""
        try:
            # Try to load M2M100 model (many-to-many multilingual)
            self.pipeline = pipeline('translation', model='facebook/m2m100_418M')
            
            # M2M100 supported languages (subset)
            self.supported_languages = {
                'en', 'es', 'fr', 'de', 'it', 'pt', 'ru', 'zh', 'ja', 'ar', 'hi', 'ko'
            }
            
            print("Loaded M2M100 multilingual model")
            
        except Exception as e:
            print(f"Could not load multilingual model: {e}")
            print("Using fallback approach...")
            
            # Fallback: use available bilateral models
            self.supported_languages = {'en', 'es', 'fr', 'de'}
    
    def translate(self, text, source_lang, target_lang):
        """Translate text between any supported language pair"""
        if source_lang not in self.supported_languages:
            return f"Source language '{source_lang}' not supported"
        
        if target_lang not in self.supported_languages:
            return f"Target language '{target_lang}' not supported"
        
        if source_lang == target_lang:
            return text
        
        if self.pipeline:
            try:
                # For M2M100, we need to set source and target languages
                result = self.pipeline(text, src_lang=source_lang, tgt_lang=target_lang)
                if isinstance(result, list) and len(result) > 0:
                    return result[0]['translation_text']
                return str(result)
            except Exception as e:
                return f"Translation error: {e}"
        else:
            # Fallback to bilateral models
            return translate_text(text, source_lang, target_lang)
    
    def get_supported_languages(self):
        """Get list of supported languages"""
        return sorted(list(self.supported_languages))
    
    def translate_to_multiple_languages(self, text, source_lang, target_langs):
        """Translate text to multiple target languages"""
        results = {}
        
        for target_lang in target_langs:
            if target_lang != source_lang:
                translation = self.translate(text, source_lang, target_lang)
                results[target_lang] = translation
        
        return results

# Initialize multilingual translator
multilingual_translator = MultilingualTranslator()

print(f"\nSupported languages: {multilingual_translator.get_supported_languages()}")

# Test multilingual translation
print("\nMultilingual Translation Examples:")
print("=" * 40)

test_text = "Hello, welcome to our restaurant!"
source_language = "en"
target_languages = ["es", "fr", "de"]

print(f"\nOriginal ({source_language}): {test_text}")
print("\nTranslations:")

translations = multilingual_translator.translate_to_multiple_languages(
    test_text, source_language, target_languages
)

for lang, translation in translations.items():
    lang_name = LANGUAGE_NAMES.get(lang, lang)
    print(f"  {lang_name.title()} ({lang}): {translation}")

# Test round-trip translation
print("\nRound-trip Translation Test:")
print("-" * 30)

original_text = "Machine learning is transforming technology."
intermediate_lang = "es"
final_lang = "en"

# Translate EN -> ES -> EN
step1 = multilingual_translator.translate(original_text, "en", intermediate_lang)
step2 = multilingual_translator.translate(step1, intermediate_lang, final_lang)

print(f"Original (EN): {original_text}")
print(f"Intermediate (ES): {step1}")
print(f"Back to English: {step2}")

# Calculate similarity (simple word overlap)
original_words = set(original_text.lower().split())
back_translated_words = set(step2.lower().split())
overlap = len(original_words.intersection(back_translated_words))
similarity = overlap / len(original_words) if original_words else 0

print(f"Word overlap similarity: {similarity:.2%}")
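
The word-overlap check above splits on whitespace only, so `technology.` and `technology` count as different words, and word order is ignored. A slightly more forgiving alternative, sketched here with the standard library's `difflib`, strips punctuation and compares token sequences in order. `tokenize` and `round_trip_similarity` are hypothetical helper names, and the example strings are hand-written rather than real model output.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase and keep only word characters, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def round_trip_similarity(original, back_translated):
    """Order-aware similarity in [0, 1] between two token sequences."""
    a, b = tokenize(original), tokenize(back_translated)
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


# Hand-written strings standing in for real model output
original = "Machine learning is transforming technology."
back = "Machine learning is transforming the technology"
print(f"{round_trip_similarity(original, back):.2f}")
```

A high score only shows that the round trip is stable; two translation errors that cancel out would still score well, so this is a sanity check rather than a quality metric.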

## Translation Quality Evaluation

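This section scores translations against reference translations with automatic metrics: BLEU from `sacrebleu` (reported on a 0-100 scale) and simple word-overlap precision, recall, and F1 (each between 0 and 1).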

In [None]:
def calculate_bleu_score(reference, candidate):
    """Calculate BLEU score for translation quality"""
    try:
        # BLEU expects list of references and single candidate
        bleu_score = sacrebleu.sentence_bleu(candidate, [reference])
        return bleu_score.score
    except Exception as e:
        print(f"BLEU calculation error: {e}")
        return 0.0

def calculate_simple_metrics(reference, candidate):
    """Calculate simple translation metrics"""
    # Tokenize
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    
    # Length ratio
    length_ratio = len(cand_words) / len(ref_words) if ref_words else 0
    
    # Word overlap (precision and recall)
    ref_set = set(ref_words)
    cand_set = set(cand_words)
    overlap = ref_set.intersection(cand_set)
    
    precision = len(overlap) / len(cand_set) if cand_set else 0
    recall = len(overlap) / len(ref_set) if ref_set else 0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0
    
    return {
        'length_ratio': length_ratio,
        'word_precision': precision,
        'word_recall': recall,
        'word_f1': f1
    }

def evaluate_translation_quality(test_cases):
    """Evaluate translation quality on multiple test cases"""
    results = []
    
    for case in test_cases:
        source_text = case['source']
        reference = case['reference']
        candidate = case['candidate']
        
        # Calculate metrics
        bleu = calculate_bleu_score(reference, candidate)
        simple_metrics = calculate_simple_metrics(reference, candidate)
        
        result = {
            'source': source_text,
            'reference': reference,
            'candidate': candidate,
            'bleu_score': bleu,
            **simple_metrics
        }
        
        results.append(result)
    
    return results

# Create test cases for evaluation
test_cases = [
    {
        'source': 'Hello, how are you?',
        'reference': 'Hola, ¿cómo estás?',
        'candidate': multilingual_translator.translate('Hello, how are you?', 'en', 'es')
    },
    {
        'source': 'The weather is nice today.',
        'reference': 'El clima está agradable hoy.',
        'candidate': multilingual_translator.translate('The weather is nice today.', 'en', 'es')
    },
    {
        'source': 'I love reading books.',
        'reference': 'Me encanta leer libros.',
        'candidate': multilingual_translator.translate('I love reading books.', 'en', 'es')
    }
]

# Evaluate translations
evaluation_results = evaluate_translation_quality(test_cases)

print("Translation Quality Evaluation:")
print("=" * 50)

total_bleu = 0
total_f1 = 0
valid_results = 0

for i, result in enumerate(evaluation_results, 1):
    print(f"\nTest Case {i}:")
    print(f"  Source: {result['source']}")
    print(f"  Reference: {result['reference']}")
    print(f"  Translation: {result['candidate']}")
    
    if not result['candidate'].startswith('Translation error'):
        print(f"  BLEU Score: {result['bleu_score']:.2f}")
        print(f"  Word F1: {result['word_f1']:.3f}")
        print(f"  Length Ratio: {result['length_ratio']:.2f}")
        
        total_bleu += result['bleu_score']
        total_f1 += result['word_f1']
        valid_results += 1
    else:
        print(f"  Error: {result['candidate']}")

if valid_results > 0:
    avg_bleu = total_bleu / valid_results
    avg_f1 = total_f1 / valid_results
    
    print(f"\nOverall Performance:")
    print(f"  Average BLEU Score: {avg_bleu:.2f}")
    print(f"  Average Word F1: {avg_f1:.3f}")
    print(f"  Valid Translations: {valid_results}/{len(test_cases)}")

# Visualize evaluation results
if valid_results > 0:
    # Use a separate name so the integer counter above is not overwritten
    valid_cases = [r for r in evaluation_results if not r['candidate'].startswith('Translation error')]
    
    if len(valid_cases) >= 2:
        metrics = ['bleu_score', 'word_f1', 'word_precision', 'word_recall']
        
        fig, axes = plt.subplots(2, 2, figsize=(12, 8))
        axes = axes.ravel()
        
        for i, metric in enumerate(metrics):
            values = [r[metric] for r in valid_cases]
            test_names = [f"Test {j+1}" for j in range(len(valid_cases))]
            
            axes[i].bar(test_names, values)
            axes[i].set_title(metric.replace('_', ' ').title())
            # sacrebleu reports BLEU on a 0-100 scale; the word-overlap metrics are 0-1
            axes[i].set_ylim(0, 100 if metric == 'bleu_score' else 1)
        
        plt.tight_layout()
        plt.show()
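
For intuition about what a BLEU score measures, the sketch below computes a simplified, unsmoothed, single-reference BLEU by hand: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. It is illustrative only. `ngrams` and `simple_bleu` are hypothetical names, tokenization is plain whitespace splitting, and the numbers will generally not match `sacrebleu`, which applies its own tokenization and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU on a 0-100 scale (illustrative only)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


score = simple_bleu("the cat sat on the mat", "the cat sat on mat", max_n=2)
print(f"{score:.1f}")
```

Without smoothing, any sentence shorter than `max_n` words scores zero, which is one reason sentence-level BLEU on short test sentences should be read with caution.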

## Translation Challenges and Limitations

Exploring common challenges in machine translation.

In [None]:
# Examples of challenging translation scenarios
challenging_examples = {
    'idioms': {
        'source': "It's raining cats and dogs outside.",
        'challenge': "Idiomatic expressions don't translate literally",
        'expected_meaning': "It's raining heavily outside."
    },
    'context_dependent': {
        'source': "The bank is closed.",
        'challenge': "'Bank' could mean financial institution or river bank",
        'expected_meaning': "Context determines which meaning is intended"
    },
    'cultural_references': {
        'source': "He's a real Einstein.",
        'challenge': "Cultural references may not translate across cultures",
        'expected_meaning': "He's very intelligent."
    },
    'wordplay': {
        'source': "Time flies like an arrow; fruit flies like a banana.",
        'challenge': "Puns and wordplay are difficult to preserve",
        'expected_meaning': "Humorous play on the word 'flies'"
    },
    'formal_informal': {
        'source': "What's up, dude?",
        'challenge': "Register and formality levels vary across languages",
        'expected_meaning': "Informal greeting between friends"
    }
}

print("Translation Challenges Analysis:")
print("=" * 50)

for challenge_type, example in challenging_examples.items():
    source_text = example['source']
    
    print(f"\n{challenge_type.replace('_', ' ').title()}:")
    print(f"  Original: {source_text}")
    print(f"  Challenge: {example['challenge']}")
    
    # Try translating to Spanish and back
    translation_es = multilingual_translator.translate(source_text, 'en', 'es')
    back_translation = multilingual_translator.translate(translation_es, 'es', 'en')
    
    if not translation_es.startswith('Translation error'):
        print(f"  Spanish: {translation_es}")
        if not back_translation.startswith('Translation error'):
            print(f"  Back to English: {back_translation}")
            
            # Crude heuristic: only an exact round trip is treated as preserved
            meaning_preserved = "likely (exact round trip)" if source_text.lower() in back_translation.lower() else "unclear (needs human review)"
            print(f"  Meaning preservation: {meaning_preserved}")
    else:
        print(f"  Translation failed: {translation_es}")

# Domain-specific translation challenges
print("\n\nDomain-Specific Challenges:")
print("=" * 40)

domain_examples = {
    'Technical': "The API endpoint returns a JSON response with nested objects.",
    'Medical': "The patient presents with acute myocardial infarction symptoms.",
    'Legal': "The party hereby agrees to the terms and conditions stipulated herein.",
    'Literary': "The crimson sunset painted the sky with ethereal beauty."
}

for domain, text in domain_examples.items():
    translation = multilingual_translator.translate(text, 'en', 'fr')
    
    print(f"\n{domain}:")
    print(f"  English: {text}")
    if not translation.startswith('Translation error'):
        print(f"  French: {translation}")
    else:
        print(f"  Translation issue: {translation}")

# Translation quality factors
quality_factors = {
    'Fluency': 'How natural and grammatically correct is the translation?',
    'Adequacy': 'How much of the source meaning is preserved?',
    'Consistency': 'Are similar phrases translated consistently?',
    'Style': 'Is the appropriate register and style maintained?',
    'Terminology': 'Are domain-specific terms translated correctly?'
}

print("\n\nTranslation Quality Factors:")
print("=" * 35)
for factor, description in quality_factors.items():
    print(f"\n{factor}: {description}")
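
One common way to keep domain terminology consistent is to replace glossary terms with placeholders before translation and restore approved target-language terms afterwards. The sketch below illustrates the idea under stated assumptions: `protect_terms`, `restore_terms`, the placeholder format, and the glossary entries are hypothetical, and `fake_translate` is a stand-in for a real model call. Real models may alter or drop placeholders, so the output would need checking.

```python
import re


def protect_terms(text, glossary):
    """Swap glossary terms for numbered placeholders before translation."""
    mapping = {}
    for i, (term, target_term) in enumerate(glossary.items()):
        placeholder = f"__TERM{i}__"
        pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = target_term
    return text, mapping


def restore_terms(text, mapping):
    """Replace placeholders with the approved target-language terms."""
    for placeholder, target_term in mapping.items():
        text = text.replace(placeholder, target_term)
    return text


def fake_translate(text):
    # Stand-in for a real model call; it only rewrites two words
    return text.replace("The", "Le").replace("returns", "renvoie")


# Hypothetical English-to-French glossary
glossary = {"API endpoint": "point de terminaison d'API", "JSON": "JSON"}
masked, mapping = protect_terms("The API endpoint returns a JSON response.", glossary)
print(masked)
print(restore_terms(fake_translate(masked), mapping))
```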

## Real-World Applications

In [None]:
# Application 1: Document Translation Service
class DocumentTranslationService:
    def __init__(self, translator):
        self.translator = translator
        self.supported_formats = ['txt', 'simple_html']
    
    def translate_document(self, content, source_lang, target_lang, format_type='txt'):
        """Translate document content while preserving structure"""
        if format_type == 'txt':
            return self._translate_plain_text(content, source_lang, target_lang)
        elif format_type == 'simple_html':
            return self._translate_html(content, source_lang, target_lang)
        else:
            return "Unsupported format"
    
    def _translate_plain_text(self, content, source_lang, target_lang):
        """Translate plain text paragraph by paragraph"""
        paragraphs = content.split('\n\n')
        translated_paragraphs = []
        
        for paragraph in paragraphs:
            if paragraph.strip():
                translated = self.translator.translate(paragraph.strip(), source_lang, target_lang)
                translated_paragraphs.append(translated)
            else:
                translated_paragraphs.append('')
        
        return '\n\n'.join(translated_paragraphs)
    
    def _translate_html(self, content, source_lang, target_lang):
        """Translate HTML content while preserving tags"""
        # Simple approach: extract text between tags and translate
        import re
        
        def translate_match(match):
            text = match.group(1)
            if text.strip():
                return self.translator.translate(text, source_lang, target_lang)
            return text
        
        # Pattern to match text between HTML tags (simplified)
        pattern = r'>([^<]+)<'
        translated_content = re.sub(pattern, lambda m: f'>{translate_match(m)}<', content)
        
        return translated_content
    
    def get_translation_summary(self, original_content, translated_content):
        """Generate summary of translation"""
        original_words = len(original_content.split())
        translated_words = len(translated_content.split())
        
        return {
            'original_word_count': original_words,
            'translated_word_count': translated_words,
            'length_ratio': translated_words / original_words if original_words > 0 else 0,
            'character_count_original': len(original_content),
            'character_count_translated': len(translated_content)
        }

# Application 2: Real-time Chat Translation
class ChatTranslationBot:
    def __init__(self, translator):
        self.translator = translator
        self.conversation_history = []
        self.user_languages = {}
    
    def add_user_language(self, user_id, language):
        """Set preferred language for a user"""
        self.user_languages[user_id] = language
    
    def process_message(self, message, sender_id, sender_lang=None):
        """Process and translate message for all participants"""
        # Detect language if not provided
        if not sender_lang:
            detection = detect_language_with_confidence(message)
            sender_lang = detection['language']
        
        # Store original message
        conversation_entry = {
            'sender_id': sender_id,
            'original_message': message,
            'original_language': sender_lang,
            'translations': {}
        }
        
        # Translate for each user
        for user_id, user_lang in self.user_languages.items():
            if user_id != sender_id and user_lang != sender_lang:
                translation = self.translator.translate(message, sender_lang, user_lang)
                conversation_entry['translations'][user_id] = {
                    'language': user_lang,
                    'text': translation
                }
        
        self.conversation_history.append(conversation_entry)
        return conversation_entry
    
    def get_message_for_user(self, message_entry, user_id):
        """Get appropriate message version for specific user"""
        if user_id in message_entry['translations']:
            translation_info = message_entry['translations'][user_id]
            return f"[{translation_info['language']}] {translation_info['text']}"
        else:
            return f"[{message_entry['original_language']}] {message_entry['original_message']}"

# Application 3: Website Localization Helper
def website_localization_demo(content, target_languages):
    """Demo of website content localization.

    `content` maps section names to strings or lists of strings;
    pass None to use the built-in sample page.
    """
    
    website_content = content or {
        'title': 'Welcome to Our Website',
        'navigation': ['Home', 'About', 'Services', 'Contact'],
        'main_text': 'We provide innovative solutions for your business needs.',
        'call_to_action': 'Get started today!',
        'footer': 'Copyright 2024. All rights reserved.'
    }
    
    localized_content = {}
    
    for lang in target_languages:
        localized_content[lang] = {}
        
        for key, value in website_content.items():
            if isinstance(value, list):
                # Translate list items
                translated_list = []
                for item in value:
                    translation = multilingual_translator.translate(item, 'en', lang)
                    translated_list.append(translation)
                localized_content[lang][key] = translated_list
            else:
                # Translate string
                translation = multilingual_translator.translate(value, 'en', lang)
                localized_content[lang][key] = translation
    
    return localized_content

# Test applications
print("Real-World Translation Applications:")
print("=" * 50)

# Test document translation
print("\n1. Document Translation Service:")
doc_service = DocumentTranslationService(multilingual_translator)

sample_document = """
Introduction

This document contains important information about our services. We offer comprehensive solutions for businesses of all sizes.

Our team has extensive experience in the industry. We are committed to providing excellent customer service.
"""

translated_doc = doc_service.translate_document(sample_document, 'en', 'es')
translation_summary = doc_service.get_translation_summary(sample_document, translated_doc)

print(f"\nOriginal Document (first 100 chars): {sample_document[:100]}...")
print(f"Translated Document (first 100 chars): {translated_doc[:100]}...")
print(f"\nTranslation Summary:")
for key, value in translation_summary.items():
    print(f"  {key}: {value:.2f}" if isinstance(value, float) else f"  {key}: {value}")

# Test chat translation
print("\n2. Chat Translation Bot:")
chat_bot = ChatTranslationBot(multilingual_translator)

# Set up users with different languages
chat_bot.add_user_language('user1', 'en')
chat_bot.add_user_language('user2', 'es')
chat_bot.add_user_language('user3', 'fr')

# Simulate conversation
messages = [
    ('user1', 'Hello everyone!', 'en'),
    ('user2', 'Hola, ¿cómo están?', 'es'),
    ('user3', 'Bonjour! Comment allez-vous?', 'fr')
]

print("\nMultilingual Chat Simulation:")
for sender, message, lang in messages:
    entry = chat_bot.process_message(message, sender, lang)
    
    print(f"\n{sender} ({lang}): {message}")
    
    # Show how each user sees the message
    for user_id in chat_bot.user_languages:
        if user_id != sender:
            user_view = chat_bot.get_message_for_user(entry, user_id)
            print(f"  {user_id} sees: {user_view}")

# Test website localization
print("\n3. Website Localization:")
target_langs = ['es', 'fr']
localized_site = website_localization_demo(None, target_langs)

for lang, content in localized_site.items():
    lang_name = LANGUAGE_NAMES.get(lang, lang)
    print(f"\n{lang_name.title()} ({lang}):")
    print(f"  Title: {content['title']}")
    print(f"  Navigation: {content['navigation']}")
    print(f"  Main Text: {content['main_text']}")
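
The regex in `_translate_html` only sees text that sits between a `>` and a `<`, and it would also translate the contents of `<script>` or `<style>` elements. A somewhat more careful alternative, sketched below with the standard library's `html.parser`, rebuilds the markup and passes only visible text nodes to a translation callable. `TextNodeTranslator` and `translate_html` are hypothetical names, and `str.upper` stands in for a real translator.

```python
from html import escape
from html.parser import HTMLParser


class TextNodeTranslator(HTMLParser):
    """Rebuild HTML, passing only visible text nodes through translate_fn."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self, translate_fn):
        super().__init__(convert_charrefs=True)
        self.translate_fn = translate_fn
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        self.parts.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        self.parts.append(f"</{tag}>")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            self.parts.append(data)  # keep code and whitespace untouched
            return
        # Preserve leading and trailing whitespace around the translated text
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        translated = self.translate_fn(data.strip())
        self.parts.append(lead + escape(translated, quote=False) + trail)


def translate_html(html_text, translate_fn):
    parser = TextNodeTranslator(translate_fn)
    parser.feed(html_text)
    parser.close()  # flush any trailing text
    return "".join(parser.parts)


# str.upper stands in for a real translation function
print(translate_html("<p class='x'>Hello <b>world</b></p>", str.upper))
```

With the notebook's translator, the callable could be `lambda t: multilingual_translator.translate(t, 'en', 'es')`. Inline tags still split a sentence into separately translated fragments, which loses context; that limitation is shared with the regex approach.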

## Exercises

1. **Custom Translation Pipeline**: Build a pipeline that handles specific document formats
2. **Translation Memory**: Implement a system that reuses previous translations
3. **Quality Estimation**: Create a model to predict translation quality without references
4. **Domain Adaptation**: Fine-tune models for specific domains (medical, legal, etc.)
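
As a possible starting point for the translation memory exercise, the sketch below caches results keyed by language pair and normalized source text, so repeated strings are translated only once. `TranslationMemory` and `fake_translate` are hypothetical names, and the lowercase normalization is an assumption that may be too aggressive for case-sensitive content.

```python
class TranslationMemory:
    """Cache translations keyed by (source, target, normalized text)."""

    def __init__(self, translate_fn):
        self.translate_fn = translate_fn  # callable(text, src, tgt) -> str
        self.store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(text):
        # Collapse whitespace and case so trivial variants share one entry
        return " ".join(text.split()).lower()

    def translate(self, text, source_lang, target_lang):
        key = (source_lang, target_lang, self._normalize(text))
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        result = self.translate_fn(text, source_lang, target_lang)
        self.store[key] = result
        return result


calls = []


def fake_translate(text, src, tgt):
    # Stand-in for a real model call; records how often it is invoked
    calls.append(text)
    return f"[{tgt}] {text}"


tm = TranslationMemory(fake_translate)
tm.translate("Get started today!", "en", "es")
tm.translate("get started   today!", "en", "es")  # served from memory
print(tm.hits, tm.misses, len(calls))
```

In the notebook, `TranslationMemory(multilingual_translator.translate)` would wrap the existing translator without changing it.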

## Key Takeaways

- **Neural MT is dominant**: Transformer-based models provide state-of-the-art quality
- **Context matters**: Longer context generally improves translation quality
- **Language detection is crucial**: Accurate source language identification improves results
- **Evaluation is challenging**: Automatic metrics don't capture all aspects of quality
- **Domain specialization helps**: Models perform better on familiar text types

## Best Practices

1. **Preprocess carefully**: Clean and normalize text before translation
2. **Use appropriate models**: Match model capabilities to your language pairs
3. **Consider context**: Provide sufficient context for ambiguous terms
4. **Evaluate thoroughly**: Use both automatic metrics and human evaluation
5. **Handle errors gracefully**: Plan for translation failures and edge cases
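
The first practice can be as simple as a conservative cleanup pass. The sketch below, using only the standard library, applies Unicode NFC normalization, removes control characters, and collapses stray whitespace. `normalize_for_translation` is a hypothetical helper, and which steps are appropriate depends on the data.

```python
import re
import unicodedata


def normalize_for_translation(text):
    """Apply conservative cleanup before sending text to a model."""
    # NFC composes sequences such as 'e' + combining accent into a single 'é'
    text = unicodedata.normalize("NFC", text)
    # Drop control characters, but keep newlines and tabs
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs, then trim each line
    text = re.sub(r"[ \t]+", " ", text)
    return "\n".join(line.strip() for line in text.split("\n")).strip()


print(normalize_for_translation("Cafe\u0301   au  lait\x00 "))
```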

## Common Challenges

- **Idioms and cultural references**: Rarely survive literal translation across cultures
- **Technical terminology**: May require domain-specific models or dictionaries
- **Ambiguous words**: Context is crucial for correct translation
- **Formality levels**: Different languages express formality differently
- **Word order differences**: Some language pairs require significant restructuring

## Applications

- **Document translation**: Legal, technical, and business documents
- **Website localization**: Multilingual websites and applications
- **Real-time communication**: Chat applications and video calls
- **Content creation**: Multilingual marketing and educational materials
- **Accessibility**: Making content available in multiple languages

## Next Steps

- Learn about fine-tuning translation models for specific domains
- Explore multilingual and zero-shot translation approaches
- Study translation evaluation metrics and human evaluation
- Practice with real-world translation datasets
- Learn about computer-aided translation (CAT) tools

## Resources

- [Hugging Face Translation Models](https://huggingface.co/models?pipeline_tag=translation)
- [OPUS-MT Models](https://github.com/Helsinki-NLP/Opus-MT)
- [M2M-100 Paper](https://arxiv.org/abs/2010.11125)
- [BLEU Score](https://en.wikipedia.org/wiki/BLEU)
- [WMT Translation Shared Tasks](http://www.statmt.org/wmt21/)
- [Google Translate API](https://cloud.google.com/translate)
- [Microsoft Translator](https://www.microsoft.com/en-us/translator/)