## This software is licensed under GPLv3+ ##

DLMAnalyzer
Koichi Akabe <firstname.lastname@example.org>

============================================
This is a program that lists informative n-grams for MT error analysis
using a structured perceptron.
============================================

Required:
  My development environment is:
    Boost (1.49), gflags (2.0), g++ (4.8.2)

Build:
  $ autoreconf -i
  $ ./configure
  $ make

  Additionally, you can run "(sudo) make install" to install dlm_train on
  your computer.

Train the discriminative LM and generate the model file:
  $ dlm_train -eta [ETA] -modeldata [MODEL_FILE] -traindata [N-BESTS for training] -testdata [N-BESTS for testing]

Generate the evaluation sheet:
  $ ./scripts/generate_sheet_seed.py [MODEL_FILE] [ONE-BESTS] [NUMBER of n-grams] > SEED
  $ ./scripts/build_analysis_sheet.py [SOURCE] [TARGET REFS] [ORDER MAP] [SEED] > HTML file

============================================

Input data:
  A list of translation candidates for each sentence, with system scores
  and evaluation scores, one candidate per line:

    sentence id ||| translation ||| system score ||| evaluation score
    ............
    ......
    ...

    sentence id      ... ID of the original sentence
    translation      ... translation candidate for the original sentence
    system score     ... the score given by the translation system
    evaluation score ... the score given by the evaluation measure

  For example, we have three candidates for the 2nd original sentence:

    2 ||| 僕 は 少女 を 望遠鏡 で 見 た 。 ||| -54.24256771686 ||| 0.82328
    2 ||| 僕 は 望遠鏡 を 持 っ た 少女 を 見 た 。 ||| -54.26887833166 ||| 0.788141
    2 ||| 僕 は 少女 を 望遠鏡 で 見 る 。 ||| -54.27542894284 ||| 0.834369

  In this case, the translation system outputs the 1st candidate as the
  best translation (it has the highest system score), but the 3rd
  candidate has the highest evaluation score and is actually the best
  translation.
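
The n-best format described above can be illustrated with a short Python sketch. It is not part of DLMAnalyzer and does not reproduce how dlm_train reads its input; the function names (parse_nbest_line, group_by_sentence, system_and_oracle_best) are hypothetical. It assumes that fields are separated by "|||" and that a higher value is better for both scores, which matches the example above.

```python
from collections import defaultdict


def parse_nbest_line(line):
    """Split one 'id ||| translation ||| system score ||| evaluation score' line."""
    sent_id, translation, sys_score, eval_score = [
        field.strip() for field in line.split("|||")
    ]
    # The sentence id is kept as a string; the scores are parsed as floats.
    return sent_id, translation, float(sys_score), float(eval_score)


def group_by_sentence(lines):
    """Collect (translation, system score, evaluation score) tuples per sentence id."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        sent_id, translation, sys_score, eval_score = parse_nbest_line(line)
        groups[sent_id].append((translation, sys_score, eval_score))
    return groups


def system_and_oracle_best(candidates):
    """Return the candidate preferred by the system and the one preferred by the metric.

    Assumption: higher is better for both scores.
    """
    system_best = max(candidates, key=lambda c: c[1])
    oracle_best = max(candidates, key=lambda c: c[2])
    return system_best, oracle_best


# The three candidates for sentence 2 from the example above.
example = [
    "2 ||| 僕 は 少女 を 望遠鏡 で 見 た 。 ||| -54.24256771686 ||| 0.82328",
    "2 ||| 僕 は 望遠鏡 を 持 っ た 少女 を 見 た 。 ||| -54.26887833166 ||| 0.788141",
    "2 ||| 僕 は 少女 を 望遠鏡 で 見 る 。 ||| -54.27542894284 ||| 0.834369",
]

groups = group_by_sentence(example)
system_best, oracle_best = system_and_oracle_best(groups["2"])
print("system best:", system_best[0])
print("oracle best:", oracle_best[0])
```

When the two candidates differ, as they do here, the sentence is one where the system's ranking disagrees with the evaluation measure.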