Skip to content

Discriminative Language Models as a Tool for Machine Translation Error Analysis

License

Notifications You must be signed in to change notification settings

vbkaisetsu/dlm-analyzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

## This software is purified with GPLv3+ ##

 DLMAnalyzer

  Koichi Akabe <vbkaisetsu@gmail.com>

============================================
 This is a program that lists informative n-grams for MT error analysis
 using structured perceptron.
============================================

Required:
  My development environment is:
    Boost (1.49), gflags (2.0), g++ (4.8.2)

Build:
  $ autoreconf -i
  $ ./configure
  $ make

  Additionaly, you can run "(sudo) make install" to install dlm_train on your computer.

Train discriminative LM and generate the model file:
  $ dlm_train -eta [ETA] -modeldata [MODEL_FILE] -traindata [N-BESTS for training] -testdata [N-BESTS for testing]

Generate the evaluation sheet:
  $ ./scripts/generate_sheet_seed.py [MODEL_FILE] [ONE-BESTS] [NUMBER of n-grams] > SEED
  $ ./scripts/build_analysis_sheet.py [SOURCE] [TARGET REFS] [ORDER MAP] [SEED] > HTML file

============================================

Input data:
List of translation candidates for each sentence with system scores and evaluation scores.

sentence id ||| translation ||| system score ||| evaluation score
............
......
...

sentence id       ...  ID of original sentence
translation       ...  translation candidate for the original sentence
system score      ...  the score given by the translation system
evaluation score  ...  the score given by the evaluation measure

For example, we have three candidates for the 2nd original sentence:

2 ||| 僕 は 少女 を 望遠鏡 で 見 た 。 ||| -54.24256771686 ||| 0.82328
2 ||| 僕 は 望遠鏡 を 持 っ た 少女 を 見 た 。 ||| -54.26887833166 ||| 0.788141
2 ||| 僕 は 少女 を 望遠鏡 で 見 る 。 ||| -54.27542894284 ||| 0.834369

In this case, translation system outputs the 1st sentence as the best translation,
but actually the 3rd translation is the best translation.

About

Discriminative Language Models as a Tool for Machine Translation Error Analysis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published