Discriminative Language Models as a Tool for Machine Translation Error Analysis

## This software is licensed under GPLv3+ ##

 DLMAnalyzer

  Koichi Akabe <vbkaisetsu@gmail.com>

============================================
 This is a program that lists informative n-grams for MT error analysis
 using a structured perceptron.
============================================
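The idea of learning n-gram weights with a structured perceptron can be sketched as follows. This is only an illustrative sketch, not the implementation in src: the function names, the candidate dictionary layout, the maximum n-gram order of 3, and the choice of the highest evaluation score as the update target are all assumptions made for the example.

```python
from collections import Counter

def ngram_features(tokens, max_order=3):
    """Count all n-grams up to max_order in a token list (max_order is an assumption)."""
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats

def model_score(weights, cand):
    """System score plus the weighted sum of the candidate's n-gram counts."""
    feats = ngram_features(cand["tokens"])
    return cand["system"] + sum(weights.get(f, 0.0) * c for f, c in feats.items())

def train(nbests, eta=1.0, epochs=5):
    """nbests: list of n-best lists; each candidate is a dict with
    'tokens', 'system' (system score) and 'eval' (evaluation score)."""
    weights = {}
    for _ in range(epochs):
        for cands in nbests:
            # Hypothetical target: the candidate with the highest evaluation score.
            oracle = max(cands, key=lambda c: c["eval"])
            guess = max(cands, key=lambda c: model_score(weights, c))
            if guess is oracle:
                continue
            # Promote n-grams of the oracle, demote those of the wrong guess.
            diff = ngram_features(oracle["tokens"])
            diff.subtract(ngram_features(guess["tokens"]))
            for f, d in diff.items():
                if d:
                    weights[f] = weights.get(f, 0.0) + eta * d
    return weights
```

In a sketch like this, n-grams that end up with strongly negative weights are the ones that tend to occur in candidates the system ranks too high, which is the kind of signal that may be useful for error analysis.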

Requirements:
  Boost, gflags, and g++. My development environment is:
    Boost (1.49), gflags (2.0), g++ (4.8.2)

Build:
  $ autoreconf -i
  $ ./configure
  $ make

  Additionally, you can run "(sudo) make install" to install dlm_train on your computer.

Train the discriminative LM and generate the model file:
  $ dlm_train -eta [ETA] -modeldata [MODEL_FILE] -traindata [N-BESTS for training] -testdata [N-BESTS for testing]

Generate the evaluation sheet:
  $ ./scripts/generate_sheet_seed.py [MODEL_FILE] [ONE-BESTS] [NUMBER of n-grams] > SEED
  $ ./scripts/build_analysis_sheet.py [SOURCE] [TARGET REFS] [ORDER MAP] [SEED] > HTML file

============================================

Input data:
A list of translation candidates (an n-best list) for each sentence, one candidate per line,
each with its system score and evaluation score. Fields are separated by "|||":

sentence id ||| translation ||| system score ||| evaluation score
............
......
...

sentence id       ...  ID of original sentence
translation       ...  translation candidate for the original sentence
system score      ...  the score given by the translation system
evaluation score  ...  the score given by the evaluation measure
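A file in this format could be read along the following lines. This is a minimal sketch and not part of the distributed scripts; the function names are hypothetical, and it assumes that sentence ids are integers and that tokens in the translation are separated by whitespace.

```python
from collections import defaultdict

def parse_nbest_line(line):
    """Split one 'id ||| translation ||| system score ||| evaluation score' line."""
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    sent_id, translation, system, evaluation = fields
    # Assumption: integer sentence ids and whitespace-separated tokens.
    return int(sent_id), translation.split(), float(system), float(evaluation)

def read_nbest(lines):
    """Group candidates by sentence id, keeping the input order within each group."""
    nbest = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            sent_id, tokens, system, evaluation = parse_nbest_line(line)
            nbest[sent_id].append((tokens, system, evaluation))
    return dict(nbest)
```

Passing an open file object to read_nbest returns a dictionary that maps each sentence id to its list of (tokens, system score, evaluation score) tuples.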

For example, we have three candidates for the 2nd original sentence:

2 ||| 僕 は 少女 を 望遠鏡 で 見 た 。 ||| -54.24256771686 ||| 0.82328
2 ||| 僕 は 望遠鏡 を 持 っ た 少女 を 見 た 。 ||| -54.26887833166 ||| 0.788141
2 ||| 僕 は 少女 を 望遠鏡 で 見 る 。 ||| -54.27542894284 ||| 0.834369

In this case, the translation system outputs the 1st candidate as the best translation
because it has the highest system score, but the 3rd candidate has the highest
evaluation score, so it is actually the best translation.
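The mismatch between the two rankings can be checked directly. The sketch below hard-codes the three candidates from the example; the helper name best_index is hypothetical and only serves the illustration.

```python
# Candidates for sentence 2: (translation, system score, evaluation score)
candidates = [
    ("僕 は 少女 を 望遠鏡 で 見 た 。", -54.24256771686, 0.82328),
    ("僕 は 望遠鏡 を 持 っ た 少女 を 見 た 。", -54.26887833166, 0.788141),
    ("僕 は 少女 を 望遠鏡 で 見 る 。", -54.27542894284, 0.834369),
]

def best_index(cands, field):
    """Index of the candidate with the highest value in the given tuple field."""
    return max(range(len(cands)), key=lambda i: cands[i][field])

system_best = best_index(candidates, 1)  # ranked by system score
oracle_best = best_index(candidates, 2)  # ranked by evaluation score
print(system_best + 1, oracle_best + 1)  # prints "1 3"
```

Sentences where these two indices differ are the cases in which the system score and the evaluation measure disagree.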