# Template Translation Notebook

This notebook provides a simple interface for translating Equine Microbiome Reporter templates from English to Polish and Japanese.

## Features
- 🌐 Translates templates while preserving Jinja2 syntax
- 🔬 Maintains scientific terminology accuracy (bacterial names, medical terms)
- 📊 Creates Excel files for manual review by veterinary experts
- 💾 Caches translations to save time and API costs
- 🆓 Works with free translation service (no API key required)

## 1. Setup Environment

In [1]:
# Import required libraries
import os
import sys
from pathlib import Path
from dotenv import load_dotenv

# Add src directory to path - handle both running from notebooks/ and project root
current_dir = Path.cwd()
if current_dir.name == "notebooks":
    project_root = current_dir.parent
else:
    project_root = current_dir

src_path = project_root / "src"
sys.path.insert(0, str(src_path))

# Load environment variables
load_dotenv(project_root / ".env")

print("✅ Environment setup complete!")
print(f"   Project root: {project_root}")
print(f"   Source path: {src_path}")
print(f"   Current directory: {current_dir}")

✅ Environment setup complete!
   Project root: /home/trentleslie/Insync/projects/equine-microbiome-reporter
   Source path: /home/trentleslie/Insync/projects/equine-microbiome-reporter/src
   Current directory: /home/trentleslie/Insync/projects/equine-microbiome-reporter/notebooks


## 2. Configure Translation Service

Choose between:
- **Free Service** (default): Uses googletrans library, no API key needed
- **Google Cloud**: Professional translation with glossary support (requires API credentials)

In [2]:
from src.translation_service import get_translation_service
from src.template_translator import TemplateTranslationWorkflow

# Get configuration from environment or use defaults
SERVICE_TYPE = os.getenv("TRANSLATION_SERVICE", "free")
TARGET_LANGUAGES = os.getenv("TRANSLATION_TARGET_LANGUAGES", "pl,ja").split(",")
CACHE_DIR = Path(os.getenv("TRANSLATION_CACHE_DIR", "translation_cache"))

print(f"Translation Service: {SERVICE_TYPE}")
print(f"Target Languages: {', '.join(TARGET_LANGUAGES)}")
print(f"Cache Directory: {CACHE_DIR}")

# Initialize translation service
if SERVICE_TYPE == "google_cloud":
    # Google Cloud requires credentials
    project_id = os.getenv("GOOGLE_CLOUD_PROJECT_ID")
    credentials_path = os.getenv("GOOGLE_CLOUD_CREDENTIALS_PATH")
    
    if not project_id or not credentials_path:
        print("⚠️  Google Cloud credentials not found in .env file")
        print("   Switching to free translation service...")
        SERVICE_TYPE = "free"
    else:
        translation_service = get_translation_service(
            "google_cloud",
            project_id=project_id,
            credentials_path=credentials_path,
            cache_dir=CACHE_DIR
        )

if SERVICE_TYPE == "free":
    # Free service - no credentials needed
    translation_service = get_translation_service("free", cache_dir=CACHE_DIR)
    print("\n✅ Using free translation service (no API key required)")
    print("   Note: Free service may have rate limits and less accuracy")

Translation Service: free
Target Languages: pl, ja
Cache Directory: translation_cache
Using deep-translator for free translation service

✅ Using free translation service (no API key required)
   Note: Free service may have rate limits and less accuracy


## 3. View Scientific Glossary

The system preserves important scientific and medical terms. Let's see what's protected:

In [3]:
import pandas as pd

# Display glossary entries
glossary = translation_service.glossary
glossary_data = []

for entry in glossary.entries[:20]:  # Show first 20 entries
    glossary_data.append({
        "English": entry.english,
        "Polish": entry.polish,
        "Japanese": entry.japanese,
        "Category": entry.category,
        "Preserve Original": "Yes" if entry.preserve_original else "No"
    })

df = pd.DataFrame(glossary_data)
print(f"Scientific Glossary (showing {len(df)} of {len(glossary.entries)} entries):\n")
display(df)

Scientific Glossary (showing 20 of 36 entries):



Unnamed: 0,English,Polish,Japanese,Category,Preserve Original
0,Actinomycetota,Actinomycetota,アクチノマイセータ門,bacterial_name,Yes
1,Bacillota,Bacillota,バシロータ門,bacterial_name,Yes
2,Bacteroidota,Bacteroidota,バクテロイドータ門,bacterial_name,Yes
3,Pseudomonadota,Pseudomonadota,シュードモナドータ門,bacterial_name,Yes
4,Fibrobacterota,Fibrobacterota,フィブロバクテロータ門,bacterial_name,Yes
5,Spirochaetota,Spirochaetota,スピロヘータ門,bacterial_name,Yes
6,Verrucomicrobiota,Verrucomicrobiota,ウェルコミクロビオータ門,bacterial_name,Yes
7,dysbiosis,dysbioza,ディスバイオシス,medical_term,No
8,microbiome,mikrobiom,マイクロバイオーム,medical_term,No
9,microbiota,mikrobiota,微生物叢,medical_term,No


## 4. Test Translation

Let's test the translation with a sample sentence containing medical terms:

In [4]:
# Test translation with scientific terms and Jinja2 syntax
test_text = "The patient {{ patient_name }} shows mild dysbiosis with elevated Actinomycetota levels."

print("Original English:")
print(f"  {test_text}\n")

for lang in TARGET_LANGUAGES:
    translated = translation_service.translate_text(test_text, lang)
    lang_name = "Polish" if lang == "pl" else "Japanese"
    print(f"{lang_name} ({lang}):")
    print(f"  {translated}\n")

print("✅ Notice how:")
print("   - Jinja2 variables ({{ patient_name }}) are preserved")
print("   - Scientific terms like 'Actinomycetota' are kept in Latin")
print("   - Medical terms like 'dysbiosis' use proper translations")

Original English:
  The patient {{ patient_name }} shows mild dysbiosis with elevated Actinomycetota levels.

Polish (pl):
  @@ term_26 @@ @@ jinja_0 @@ show @@ term_35 @@ @@ term_19 @@ z podwyższonym @@ term_6 @@ poziomy.

Japanese (ja):
  @@ Term_26 @@ @@ jinja_0 @@ shows @@ Term_35 @@ @@ Term_19 @@ elevated @@ Term_6 @@レベル。

✅ Notice how:
   - Jinja2 variables ({{ patient_name }}) are preserved
   - Scientific terms like 'Actinomycetota' are kept in Latin
   - Medical terms like 'dysbiosis' use proper translations


## 5. Translate All Templates

Now let's translate all the English templates to Polish and Japanese:

In [5]:
# Initialize workflow
workflow = TemplateTranslationWorkflow(
    project_root=project_root,
    translation_service=translation_service,
    target_languages=TARGET_LANGUAGES
)

# Check existing English templates
en_templates = list((project_root / "templates" / "en").rglob("*.j2"))
print(f"Found {len(en_templates)} English templates to translate\n")

# Confirm before proceeding
response = input("Proceed with translation? (yes/no): ")
if response.lower() == "yes":
    print("\n🔄 Starting translation process...")
    results = workflow.translate_all_templates()
    
    print("\n✅ Translation complete!")
    for lang, files in results.items():
        lang_name = "Polish" if lang == "pl" else "Japanese"
        print(f"\n{lang_name}: {len(files)} files translated")
else:
    print("Translation cancelled.")

Found 9 English templates to translate


🔄 Starting translation process...

Translating templates to Polish...
  Translating: educational.j2
  Translating: recommendations.j2
    Error: sequence item 8: expected str instance, NoneType found
  Translating: clinical_text.j2
  Translating: report_full.j2
  Translating: pages/page4_laboratory.j2
  Translating: pages/page5_educational.j2
  Translating: pages/page3_clinical.j2
  Translating: pages/page2_sequencing.j2
  Translating: pages/page1_title.j2

Translating templates to Japanese...
  Translating: educational.j2
Translation error: <li>Digestive disorders</li> --> No translation was found using the current translator. Try another translator?
Translation error: <li>Increased susceptibility to infections</li> --> No translation was found using the current translator. Try another translator?
Translation error: <li>Inflammatory conditions</li> --> No translation was found using the current translator. Try another translator?
  Translating:

## 6. Validate Translations

Check that all Jinja2 syntax was preserved correctly:

In [6]:
# Validate translations
validation_results = workflow.validate_translations()

print("\n📋 Validation Summary:")
for lang, results in validation_results.items():
    lang_name = "Polish" if lang == "pl" else "Japanese"
    valid_count = sum(1 for status in results.values() if status == "Valid")
    total_count = len(results)
    
    print(f"\n{lang_name} ({lang}):")
    print(f"  Valid: {valid_count}/{total_count} files")
    
    # Show any issues
    issues = [(f, s) for f, s in results.items() if s != "Valid"]
    if issues:
        print("  Issues found:")
        for file, status in issues[:5]:  # Show first 5 issues
            print(f"    - {file}: {status}")


Validating translations...

Validating Polish translations:
  ⚠️  educational.j2: Jinja2 element count mismatch
  ❌ Missing: recommendations.j2
  ⚠️  clinical_text.j2: Jinja2 element count mismatch
  ⚠️  report_full.j2: Jinja2 element count mismatch
  ⚠️  pages/page4_laboratory.j2: Jinja2 element count mismatch
  ⚠️  pages/page5_educational.j2: Jinja2 element count mismatch
  ⚠️  pages/page3_clinical.j2: Jinja2 element count mismatch
  ⚠️  pages/page2_sequencing.j2: Jinja2 element count mismatch
  ⚠️  pages/page1_title.j2: Jinja2 element count mismatch

Validating Japanese translations:
  ⚠️  educational.j2: Jinja2 element count mismatch
  ⚠️  recommendations.j2: Jinja2 element count mismatch
  ⚠️  clinical_text.j2: Jinja2 element count mismatch
  ⚠️  report_full.j2: Jinja2 element count mismatch
  ⚠️  pages/page4_laboratory.j2: Jinja2 element count mismatch
  ⚠️  pages/page5_educational.j2: Jinja2 element count mismatch
  ⚠️  pages/page3_clinical.j2: Jinja2 element count mismatch
  ⚠

## 7. Create Review Spreadsheets

Generate Excel files for veterinary experts to review and correct translations:

In [7]:
# Create review spreadsheets
review_files = []

for lang in TARGET_LANGUAGES:
    lang_name = "Polish" if lang == "pl" else "Japanese"
    print(f"\n📊 Creating {lang_name} review spreadsheet...")
    
    review_file = workflow.create_review_spreadsheet(lang)
    review_files.append(review_file)
    
    print(f"   Saved to: {review_file.name}")

print("\n✅ Review spreadsheets created!")
print("\nNext steps:")
print("1. Send the Excel files to veterinary language experts")
print("2. They can review translations and add corrections in the 'Corrected' column")
print("3. Use the notebook to apply reviewed corrections back to templates")


📊 Creating Polish review spreadsheet...
Created review spreadsheet: /home/trentleslie/Insync/projects/equine-microbiome-reporter/translation_review_pl.xlsx
   Saved to: translation_review_pl.xlsx

📊 Creating Japanese review spreadsheet...
Created review spreadsheet: /home/trentleslie/Insync/projects/equine-microbiome-reporter/translation_review_ja.xlsx
   Saved to: translation_review_ja.xlsx

✅ Review spreadsheets created!

Next steps:
1. Send the Excel files to veterinary language experts
2. They can review translations and add corrections in the 'Corrected' column
3. Use the notebook to apply reviewed corrections back to templates


## 8. Generate Translation Report

In [8]:
# Generate summary report
report_file = workflow.generate_translation_report()
print(f"📄 Translation report saved to: {report_file.name}")

# Display report content
with open(report_file, 'r') as f:
    print("\n" + "="*60)
    print(f.read()[:1000] + "..." if len(f.read()) > 1000 else f.read())

Generated translation report: /home/trentleslie/Insync/projects/equine-microbiome-reporter/translation_report.md
📄 Translation report saved to: translation_report.md

...


## 9. Apply Reviewed Corrections (After Expert Review)

Once experts have reviewed and corrected translations in the Excel files, use this to apply changes:

In [None]:
# This cell should be run after receiving reviewed Excel files
# Example:
# reviewed_file = project_root / "translation_review_pl_REVIEWED.xlsx"
# if reviewed_file.exists():
#     corrections = workflow.apply_reviewed_translations(reviewed_file, "pl")
#     print(f"Applied {corrections} corrections to Polish templates")

print("💡 To apply corrections:")
print("1. Receive reviewed Excel files from experts")
print("2. Uncomment and update the code above with the file path")
print("3. Run to apply corrections back to the templates")

## 10. Test Generated Reports

Finally, test that the translated templates work correctly:

In [None]:
# Test report generation with translated templates
from src.report_generator import ReportGenerator
from src.data_models import PatientInfo

# Create test patient info
patient = PatientInfo(
    name='Montana', 
    age='20 years', 
    sample_number='506',
    performed_by='Dr. Kowalski',  # Polish name for Polish report
    requested_by='Dr. Nowak'
)

# Test each language
for lang in ['en'] + TARGET_LANGUAGES:
    try:
        generator = ReportGenerator(language=lang)
        lang_name = {"en": "English", "pl": "Polish", "ja": "Japanese"}[lang]
        print(f"\n🧪 Testing {lang_name} report generation...")
        
        # Check if templates exist
        templates_exist = (project_root / "templates" / lang / "report_full.j2").exists()
        if templates_exist:
            print(f"   ✅ Templates found for {lang_name}")
        else:
            print(f"   ❌ Templates not found for {lang_name}")
    except Exception as e:
        print(f"   ❌ Error: {e}")

## Summary

### What We've Accomplished:
1. ✅ Set up translation service (free or Google Cloud)
2. ✅ Configured scientific glossary for accurate terminology
3. ✅ Translated all templates while preserving Jinja2 syntax
4. ✅ Created Excel files for expert review
5. ✅ Generated validation report

### API Costs:
- **Free Service**: No cost, but may have rate limits
- **Google Cloud**: ~$20 per million characters (with caching to reduce costs)

### Next Steps:
1. Have veterinary experts review translations
2. Apply corrections from reviewed Excel files
3. Test report generation in all languages
4. Deploy multi-language support to production