A standalone translation utility for multilingual web projects. This tool can be integrated into any project to manage the automatic translation of content using AI-powered translation services.
- Clone this repository into your project:
git clone https://github.com/your-org/translator.git
cd translator
chmod +x bin/*The translator uses three main configuration files, all located in your project root:
| File | Purpose | Default Template |
|---|---|---|
translator.config.yaml |
Project-specific settings | translator/config/translator.config.template.yaml |
translator.models.yaml |
Translation model configuration | translator/config/translator.models.template.yaml |
translator.role.tpl |
Translation role template | translator/config/translator.role.template.tpl |
If any of these files are not found in your project root, the default templates from the translator directory will be used.
- Setup: Clone the translator repository into your project
- Configure: Create the configuration files in your project root
- Run: Execute the translation scripts
The translator will automatically:
- Load project configuration
- Load model definitions
- Find or create translation roles
- Process your content
Create a translator.config.yaml file in your project root directory:
# Project translation configuration
# Source and target directories
source_directory: content/english # Path to your source content
target_directory: content # Parent directory containing all languages
# Translation parameters
translation_chunk_size: 6144 # Maximum size of text chunks for translation
translation_parallel_chunks: 50 # Number of chunks to process in parallel within a file
translation_parallel_files: 1 # Number of files to process in parallel (set to 1 to disable)
# Cache directory for translations (per-document cache structure)
cache_directory: .translation-cache # Directory for per-document cache files
# Role template location (project-specific)
role_template: translator.role.tpl # Template for translation roles
# YAML front matter validation
check_yaml_keys: false # If true, validates that YAML keys contain only valid characters (a-z, A-Z, 0-9, _, ., -)
# Rejects translations that would corrupt YAML structure. Set to false for Hugo sites
# where front matter keys should remain in English.A template file is available at translator/config/translator.config.template.yaml.
Create a translator.models.yaml file in your project root directory to configure the AI models used for translation:
# Translation models configuration
# Models are sorted by token price in ascending order (input/output)
models:
- name: openai:gpt-4o-mini
priority: 1
price_notes: $0.15 / $0.6
- name: claude:claude-3-5-haiku-latest
priority: 2
price_notes: $0.8 / $4
# Add more models as neededA template file is available at translator/config/translator.models.template.yaml.
If either of these configuration files are not found, default values will be used.
The translation role template (translator.role.tpl) defines the instructions given to the AI model for translation tasks. This is a critical component that affects translation quality and accuracy.
- Primary location:
<project_root>/translator.role.tpl - Default template:
translator/config/translator.role.template.tpl
Create a translator.role.tpl file in your project root with instructions for the translation model. Here's a template:
You are a professional translator to $LANGUAGE language. You are specialized in translating markdown files with precise line-by-line translation. Your task is to:
1. Translate the entire document provided
2. Preserve the original document's structure exactly: all original formatting, spacing, empty lines and special characters
3. DO NOT translate:
- comment blocks in <!-- ... -->
- Any code blocks
- File paths
- Variables
- HTML tags
4. Do not ask questions or request continuations
5. ENSURE each line in the original corresponds to the same line in the translated version, even EMPTY line follows EMPTY line, very important to make translation LINE perfect same as original
6. Do not translate lines with the following strings: CODE_BLOCK_0 where 0 can be any number, this is a special string that indicates the start of a code block
Translate the following document exactly as instructed. Reply with just the translation WITHOUT adding anything additional from your side.The template uses the following environment variables for substitution:
$LANGUAGE: Replaced with the target language name during role creation
You can customize the role template to provide more specific instructions for your content types. For example:
- Add specific terminology to maintain across translations
- Include special instructions for domain-specific content
- Define rules for handling particular content elements
During translation:
- The translator checks if the role already exists in the aichat roles directory
- If not, it creates the role using your template or the default template
- When translating, files are sent to the AI model with the appropriate role instruction
The role files are created in the aichat roles directory as translate-to-<language>.md.
To automatically translate all content:
./translator/bin/auto-translate [project_directory]If project_directory is not specified, the current directory will be used as the project directory.
The auto-translate script will:
- Detect which files need translation by comparing line counts and checking the cache
- Use cached translations when available to avoid retranslating unchanged content
- Preserve code blocks and HTML comments exactly as they appear
- Maintain the exact line structure between source and translation
- Automatically clean up deleted source files from translations
The translator tool requires the following dependencies:
aichat- CLI tool for interacting with AI modelsjq- Command-line JSON processoryq- Command-line YAML processorenvsubst- Substitutes environment variables in shell format strings
To add a new language, create a directory with the language name inside your target_directory. The auto-translate tool will automatically detect and translate content for all language directories found.
The translator expects the following directory structure:
project/
├── content/ # target_directory
│ ├── english/ # source_directory
│ │ ├── file1.md
│ │ └── folder/
│ │ └── file2.md
│ ├── spanish/ # target language
│ └── french/ # target language
└── translator/ # translator directory
├── bin/
└── config/
The translation process works as follows:
- Scan for markdown files in the source directory
- Check if translation is needed by:
- Comparing line counts between source and target files
- Checking if all content chunks are already in the cache
- Extract code blocks and comments - These are preserved exactly as-is and cached separately
- Chunk the content - Split the document into manageable chunks (default: 6144 bytes)
- Check cache first - For each chunk, check if a translation already exists in the cache
- Translate missing chunks - Only translate chunks that aren't in the cache, using AI models in priority order
- Preserve structure - Ensure the translated file has the exact same line structure as the source (same line numbers for code blocks, comments, and empty lines)
- Update cache - Store all translated chunks in the cache for future use
- Clean up - Remove translation files for deleted source files
The translator uses a cache-based approach to optimize performance:
- Cache directory: Defined by
cache_directoryin config (default:.translation-cache) - Per-document cache: Each source document has its own cache file, following the same directory structure as the source files
- Uncompressed storage: Cache files are stored as plain JSON files (not compressed) for easy inspection and debugging
- Block-level caching: Each content block is hashed and cached independently
- Code block preservation: Code blocks and HTML comments are cached with their original content (not translated)
- Incremental updates: Only new or changed chunks are translated, existing cached translations are reused
- Multi-language support: The cache stores translations for each language separately
Cache Structure Example:
project/
├── content/
│ └── english/
│ ├── docs/
│ │ └── guide.md
│ └── api.md
└── .translation-cache/
├── docs/
│ └── guide.md.json
└── api.md.json
This approach ensures:
- Efficiency: Unchanged content is never retranslated
- Consistency: Code blocks and comments are always preserved exactly
- Speed: Large documents with small changes translate quickly
- Cost savings: Reduces API calls to translation services
- Maintainability: Per-document cache files are easy to inspect, debug, and manage
A comprehensive test suite is available to verify the translation system:
./run-all-tests.shThis script tests:
- New document translation
- Line changes (single and multiple)
- Empty line handling (addition, removal, at various positions)
- Whitespace-only line preservation
- HTML comment preservation
- Code block preservation
- File deletion handling
- Cache reuse
- Line structure matching (line counts, code block positions, comment positions, empty line positions)
- Translation failures: Check the output for specific error messages. The tool will attempt multiple models based on priority until a good translation is found.
- API keys: Ensure your AI service API keys are properly configured for the
aichattool. - Line count mismatches: The tool automatically retries with different models if line counts don't match. Check that your role template emphasizes line-by-line preservation.
- Cache issues: If translations seem stale, you can delete the cache directory (
.translation-cache) or specific cache files to force a full retranslation. Cache files are stored as plain JSON for easy inspection. - Structure preservation: The system validates that code blocks, HTML comments, and empty lines appear on the same line numbers in source and translation. If this fails, the translation is retried.
This project is licensed under the MIT License - see the LICENSE file for details.