# Inglish Translator - Quick Demo

This notebook demonstrates the Inglish translation framework, which translates technical English into code-mixed Indian-language text.

## What is Inglish?

Inglish is a code-mixed translation style that:
- **Preserves technical terms** in English (as professionals actually use them)
- **Translates the surrounding context** into the target language for accessibility
- **Produces natural output** reflecting authentic communication patterns

**Example:**
```
English:  "The for loop iterates over the array."
Hinglish: "for loop array ke upar iterate karta hai"
          (फॉर लूप ऐरे के ऊपर iterate करता है)
```
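
One common way to get this behaviour is to guard the technical terms with placeholders, translate the rest, and then restore the terms. The sketch below illustrates that idea only; it is not the framework's implementation. `TOY_GLOSSARY`, `guard_with_placeholders`, `restore_terms`, and `stand_in_translate` are hypothetical names, and the "translator" is a hard-coded lookup for the single example sentence above.

```python
import re

# Hypothetical mini-glossary; a real domain glossary would be much larger.
TOY_GLOSSARY = ["for loop", "array"]

def guard_with_placeholders(text, glossary):
    """Swap each glossary term for a numbered placeholder and remember it."""
    mapping = {}

    def substitute(match):
        key = f"__TERM{len(mapping)}__"
        mapping[key] = match.group(0)
        return key

    # Longer terms first, so multi-word terms are claimed before their parts.
    for term in sorted(glossary, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(term)}\b", substitute, text, flags=re.IGNORECASE)
    return text, mapping

def restore_terms(text, mapping):
    """Put the original English terms back in place of the placeholders."""
    for key, term in mapping.items():
        text = text.replace(key, term)
    return text

def stand_in_translate(guarded):
    """Stand-in for a real translator: a lookup covering one sentence only."""
    table = {
        "The __TERM0__ iterates over the __TERM1__.":
            "__TERM0__ __TERM1__ ke upar iterate karta hai",
    }
    return table.get(guarded, guarded)

guarded, mapping = guard_with_placeholders(
    "The for loop iterates over the array.", TOY_GLOSSARY
)
hinglish = restore_terms(stand_in_translate(guarded), mapping)
print(guarded)
print(hinglish)
```

Because the translator only ever sees placeholders, it cannot alter or transliterate the guarded terms, whatever it does to the rest of the sentence.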

## Setup

In [None]:
import sys
from pathlib import Path

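# Add the project's src/ directory to the import path.
# This assumes the notebook runs from a directory that sits next to src/.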
sys.path.insert(0, str(Path.cwd().parent / "src"))

from pipeline import InglishtranslationPipeline, TranslationConfig
from term_extractor import TermExtractor

print("✓ Imports successful!")

## Example 1: Simple Translation

In [None]:
# Configure the pipeline
config = TranslationConfig(
    domain="programming",
    target_language="hi",  # Hindi
    translator_type="baseline",  # Simple rule-based
    output_format="both"  # Both Roman and Devanagari
)

# Create pipeline
pipeline = InglishtranslationPipeline(config)

# Translate
text = "The for loop iterates over the array of integers."
result = pipeline.translate(text, verbose=True)

print("\n" + "="*60)
print("RESULTS")
print("="*60)
print(f"Original:    {result['original_english']}")
print(f"Roman:       {result['hinglish_roman']}")
print(f"Devanagari:  {result['hinglish_devanagari']}")
print(f"Terms:       {result['metadata']['technical_terms']}")
print("="*60)
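
The cell above reads four fields from the result: `original_english`, `hinglish_roman`, `hinglish_devanagari`, and `metadata['technical_terms']`. These are the only keys this notebook shows. If other `output_format` settings omit one of the scripts (an assumption, not verified here), defensive access avoids a `KeyError`. The helper below is a hypothetical convenience function, and `sample_result` is a hand-written dictionary, not real pipeline output.

```python
def summarize_result(result):
    """Collect the fields used in this notebook, tolerating missing keys."""
    metadata = result.get("metadata", {})
    return {
        "english": result.get("original_english", ""),
        "roman": result.get("hinglish_roman", ""),
        "devanagari": result.get("hinglish_devanagari", ""),
        "terms": list(metadata.get("technical_terms", [])),
    }

# Hand-written stand-in for a pipeline result (Roman script only).
sample_result = {
    "original_english": "The for loop iterates over the array.",
    "hinglish_roman": "for loop array ke upar iterate karta hai",
    "metadata": {"technical_terms": ["for loop", "array"]},
}

summary = summarize_result(sample_result)
for field, value in summary.items():
    print(f"{field:<12}{value}")
```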

## Example 2: Term Extraction

Let's see how technical terms are identified:

In [None]:
# Create term extractor
extractor = TermExtractor("programming")

# Sample texts
samples = [
    "This class has four member variables.",
    "The function returns a boolean value.",
    "Each object has its own instance variables.",
]

for text in samples:
    terms = extractor.extract_terms(text)
    guarded = extractor.guard_terms(text, terms)
    
    print(f"\nOriginal: {text}")
    print(f"Terms:    {[t[0] for t in terms]}")
    print(f"Guarded:  {guarded}")
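
The loop above relies only on each item in `terms` carrying the term text at index 0; the extractor's internals are not shown in this notebook. A minimal sketch of a glossary-based extractor that fits that shape follows, assuming `(term, start, end)` tuples and a longest-match rule for overlapping terms. `TOY_TERMS` and `toy_extract_terms` are hypothetical and may differ from what `TermExtractor` does.

```python
import re

# Hypothetical glossary for illustration only.
TOY_TERMS = ["class", "member variables", "variables", "function", "boolean"]

def toy_extract_terms(text, glossary=TOY_TERMS):
    """Return (term, start, end) tuples, keeping the longest match on overlap."""
    candidates = []
    for term in glossary:
        pattern = rf"\b{re.escape(term)}\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            candidates.append((match.group(0), match.start(), match.end()))

    # Longest spans first, so "member variables" is chosen before "variables".
    candidates.sort(key=lambda c: (-(c[2] - c[1]), c[1]))

    chosen = []
    for cand in candidates:
        overlaps = any(cand[1] < end and start < cand[2] for _, start, end in chosen)
        if not overlaps:
            chosen.append(cand)

    # Report terms in the order they appear in the sentence.
    return sorted(chosen, key=lambda c: c[1])

toy_terms = toy_extract_terms("This class has four member variables.")
print([t[0] for t in toy_terms])
```

Keeping character offsets alongside each term makes a later guarding step straightforward, since each span can be replaced without searching the text again.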

## Example 3: Batch Translation

Translate multiple sentences at once:

In [None]:
import pandas as pd

# Multiple texts
texts = [
    "The for loop iterates over the array.",
    "This class has four member variables.",
    "The function returns a boolean value.",
    "Each object has its own instance variables.",
    "The while loop continues until the condition is false.",
]

# Translate all
results = pipeline.translate_batch(texts)

# Create DataFrame for display
df = pd.DataFrame([
    {
        "English": r['original_english'],
        "Hinglish (Roman)": r['hinglish_roman'],
        "Terms": len(r['metadata']['technical_terms'])
    }
    for r in results
])

print("\nBatch Translation Results:")
print(df.to_string(index=False))

## Example 4: Quality Evaluation

In [None]:
# Original and reference
english = "The while loop continues until the condition becomes false."
reference = "while loop tab tak continue karta hai jab tak condition false nahi ho jati"

# Translate
result = pipeline.translate(english)
predicted = result['hinglish_roman']

# Evaluate
metrics = pipeline.evaluate_quality(english, predicted, reference)

print("\n" + "="*60)
print("QUALITY EVALUATION")
print("="*60)
print(f"English:    {english}")
print(f"Reference:  {reference}")
print(f"Predicted:  {predicted}")
print("\nMetrics:")
print(f"  Terminology Preservation: {metrics['terminology_preservation']*100:.1f}%")
print(f"  Length Ratio:             {metrics['length_ratio']:.2f}")
if 'word_overlap' in metrics:
    print(f"  Word Overlap:             {metrics['word_overlap']*100:.1f}%")
print("="*60)
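
This notebook does not show how `evaluate_quality` computes its metrics. To make the three names concrete, here is one plausible set of definitions, all of them assumptions: terminology preservation as the share of source terms that appear verbatim in the output, length ratio as output tokens divided by source tokens, and word overlap as the Jaccard similarity between output and reference token sets. `toy_metrics` is a hypothetical function, and the truncated prediction is made up to produce scores below 100%.

```python
def toy_metrics(english, predicted, reference=None, terms=()):
    """Illustrative metric definitions; the real pipeline may differ."""
    english_tokens = english.lower().split()
    predicted_tokens = predicted.lower().split()
    predicted_text = predicted.lower()

    # Share of technical terms that survive verbatim in the output.
    kept = [term for term in terms if term.lower() in predicted_text]
    metrics = {
        "terminology_preservation": len(kept) / len(terms) if terms else 1.0,
        "length_ratio": (
            len(predicted_tokens) / len(english_tokens) if english_tokens else 0.0
        ),
    }

    # Word overlap is only defined when a reference translation is given.
    if reference is not None:
        reference_set = set(reference.lower().split())
        predicted_set = set(predicted_tokens)
        union = reference_set | predicted_set
        metrics["word_overlap"] = (
            len(reference_set & predicted_set) / len(union) if union else 0.0
        )
    return metrics

toy_scores = toy_metrics(
    english="The while loop continues until the condition becomes false.",
    predicted="while loop tab tak continue karta hai",  # deliberately truncated
    reference="while loop tab tak continue karta hai jab tak condition false nahi ho jati",
    terms=["while loop", "condition", "false"],
)
for name, value in toy_scores.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions the optional reference only affects `word_overlap`, which matches the `if 'word_overlap' in metrics` check in the cell above.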

## Example 5: Visualizing Results

In [None]:
import matplotlib.pyplot as plt

# Analyze term extraction across multiple samples
samples = [
    "The for loop iterates over the array.",
    "This class has four member variables.",
    "The function returns a boolean value.",
    "Inheritance allows code reuse in object-oriented programming.",
    "Arrays store multiple values of the same data type.",
]

results = pipeline.translate_batch(samples)

# Count technical terms per sample, using the same metadata key as the earlier examples
term_counts = [len(r['metadata']['technical_terms']) for r in results]
labels = [f"Sample {i+1}" for i in range(len(samples))]

# Plot
plt.figure(figsize=(10, 5))
plt.bar(labels, term_counts, color='steelblue')
plt.xlabel('Sample')
plt.ylabel('Number of Technical Terms Extracted')
plt.title('Term Extraction Analysis')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

print(f"\nAverage terms per sample: {sum(term_counts)/len(term_counts):.1f}")

## Example 6: Interactive Translation

Try your own sentences!

In [None]:
# Try your own sentence
your_text = "The array stores multiple integer values in memory."  # Change this!

result = pipeline.translate(your_text)

print("\n" + "="*60)
print("YOUR TRANSLATION")
print("="*60)
print(f"English:     {your_text}")
print(f"Hinglish:    {result['hinglish_roman']}")
print(f"Devanagari:  {result['hinglish_devanagari']}")
print(f"Terms:       {result['metadata']['technical_terms']}")
print("="*60)

## Summary

This demo showed:

1. ✅ **Simple Translation** - Basic English to Hinglish conversion
2. ✅ **Term Extraction** - Automatic identification of technical terms
3. ✅ **Batch Processing** - Translating multiple texts efficiently
4. ✅ **Quality Metrics** - Evaluating translation quality
5. ✅ **Visualization** - Analyzing extraction patterns
6. ✅ **Interactive Use** - Translating your own sentences

### Next Steps

- Explore different domains (physics, finance)
- Run comprehensive benchmarks
- Try LLM-based translation (requires API key)
- Create custom glossaries for your domain

See `docs/API_REFERENCE.md` for the full documentation.