This project translates from one language into another with the support of one or more third (pivot) languages. It covers both the approaches to pivoting and the techniques for managing the available resources.
The first component, and the most central to the project, is
tmtriangulate - a tool for phrase table triangulation
This program handles the triangulation of Moses phrase tables, with 6 different options.
The script requires Python >= 2.7.
The script has not yet been run on Windows.
TmTriangulate merges two phrase tables into one phrase table.
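As background for the merge, here is a minimal sketch of reading one line of a Moses phrase table, whose fields are separated by `|||`. The function name `parse_phrase_table_line` and the example line are hypothetical and are not taken from tmtriangulate itself.

```python
def parse_phrase_table_line(line):
    """Split one Moses phrase-table line into its '|||'-separated fields."""
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    source, target = fields[0], fields[1]
    # The third field holds the feature scores, separated by spaces.
    scores = [float(x) for x in fields[2].split()]
    # Alignment and count fields are optional in a phrase table.
    alignment = fields[3] if len(fields) > 3 else ""
    counts = fields[4] if len(fields) > 4 else ""
    return source, target, scores, alignment, counts


# Made-up example line, for illustration only.
example = "das haus ||| the house ||| 0.8 0.5 0.7 0.4 ||| 0-0 1-1 ||| 10 12 8"
print(parse_phrase_table_line(example))
```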
A command example:
./tmtriangulate.py features_based -m pspt -s test/model1 -t test/model1
This command will merge model1 with itself and estimate the feature values based on posterior probabilities.
The basic command line:
./tmtriangulate.py [action] -m [sppt] -s source-phrase-table -t target-phrase-table
So far, there are two actions, each associated with one approach to estimating the feature values of the source-target phrase table:
features_based: computes the new probabilities from the component probabilities; see "Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora" (Cohn et al. 2007).
counts_based: computes the new probabilities by approximating new co-occurrence counts; see "Improving Pivot-Based Statistical Machine Translation by Pivoting the Co-occurrence Count of Phrase Pairs" (Zhu et al. 2014).
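The idea behind the features_based action can be sketched as a marginalization over pivot phrases: p(t|s) is approximated by the sum over pivot phrases p of p(p|s) * p(t|p). The snippet below is only an illustration under that assumption; the function name `triangulate_probabilities`, the data layout and the toy numbers are hypothetical and do not reflect how tmtriangulate is implemented internally.

```python
from collections import defaultdict


def triangulate_probabilities(src_pivot, pivot_tgt):
    """Approximate p(t|s) as the sum over pivots p of p(p|s) * p(t|p).

    src_pivot maps (source, pivot) -> p(pivot | source).
    pivot_tgt maps pivot -> {target: p(target | pivot)}.
    """
    src_tgt = defaultdict(float)
    for (src, pivot), p_pivot_given_src in src_pivot.items():
        # Only pivot phrases present in both tables connect source to target.
        for tgt, p_tgt_given_pivot in pivot_tgt.get(pivot, {}).items():
            src_tgt[(src, tgt)] += p_pivot_given_src * p_tgt_given_pivot
    return dict(src_tgt)


# Toy tables with made-up probabilities.
src_pivot = {("maison", "house"): 0.75, ("maison", "home"): 0.25}
pivot_tgt = {"house": {"Haus": 1.0}, "home": {"Haus": 0.5, "Heim": 0.5}}

for pair, prob in sorted(triangulate_probabilities(src_pivot, pivot_tgt).items()):
    print(pair, prob)
```

In this toy example the pair (maison, Haus) is reachable through both pivot phrases, so its probability is the sum of the two paths.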
Each action defaults to its best options. Typically, you only have to specify a few parameters:
-m: indicates the direction of the input phrase tables, i.e. source-pivot or pivot-source.
-co: specifies the scenario used to triangulate the co-occurrence counts.
-w: specifies the scenario used to combine the weights of identical phrase pairs.
source PT (-s): specifies the source phrase table or its directory with a given structure (dir/model/phrase-table)
target PT (-t): specifies the target phrase table or its directory with a given structure (dir/model/phrase-table)
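To make the co-occurrence scenario more concrete, here is a minimal sketch of how counts from the two tables might be combined through the pivot and then turned into probabilities. It assumes the combination is one of minimum, maximum, arithmetic mean or geometric mean; the keys of `COMBINERS`, the function names and the toy counts are hypothetical and may not match the values that the -co option actually accepts.

```python
import math
from collections import defaultdict

# Hypothetical names for the combination scenarios (assumption, not the real CLI values).
COMBINERS = {
    "min": min,
    "max": max,
    "a-mean": lambda a, b: (a + b) / 2.0,
    "g-mean": lambda a, b: math.sqrt(a * b),
}


def triangulate_counts(src_pivot_counts, pivot_tgt_counts, how="min"):
    """Approximate c(s, t) by summing combined counts over shared pivot phrases."""
    combine = COMBINERS[how]
    src_tgt = defaultdict(float)
    for (src, pivot), c_sp in src_pivot_counts.items():
        for tgt, c_pt in pivot_tgt_counts.get(pivot, {}).items():
            src_tgt[(src, tgt)] += combine(c_sp, c_pt)
    return dict(src_tgt)


def conditional_probabilities(pair_counts):
    """Relative-frequency estimate p(t|s) = c(s, t) / sum over t' of c(s, t')."""
    totals = defaultdict(float)
    for (src, _), count in pair_counts.items():
        totals[src] += count
    return {pair: count / totals[pair[0]] for pair, count in pair_counts.items()}


# Toy counts, made up for illustration.
src_pivot_counts = {("maison", "house"): 6, ("maison", "home"): 2}
pivot_tgt_counts = {"house": {"Haus": 4}, "home": {"Haus": 2, "Heim": 2}}

counts = triangulate_counts(src_pivot_counts, pivot_tgt_counts, how="min")
print(conditional_probabilities(counts))
```

Estimating probabilities from approximated counts, rather than multiplying probabilities directly, keeps the new table tied to how often the phrase pairs could plausibly co-occur.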
For further usage information, run
This project is under development!
Python multi-processing is automatically activated. There is no need for any configuration.
Authors: Tam Hoang, Ondřej Bojar
If you have any comments, questions or suggestions, even jokes, feel free to send me an email at
tamhd1990 AT gmail DOT com