Context-aware beam search for unsupervised word-by-word translation

yunsukim86/wbw-lm

Context-aware Beam Search for Unsupervised Word-by-Word Translation

This code implements a simple beam search that combines cross-lingual word embeddings with a language model. It is compatible with MUSE embeddings and kenlm language models. The output translation can then be fed to a denoising autoencoder to improve reordering.
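To illustrate the idea, here is a minimal sketch of a word-by-word beam search that adds a lexicon score and a language model score for each candidate. It is not the code in translate.py: the function `beam_search`, its parameters, the toy lexicon `CANDIDATES`, and the bigram table are all hypothetical, and the actual scoring in this repository may differ. In practice, the lexicon scores would come from cross-lingual embedding similarities and the language model scores from kenlm.

```python
import math

def beam_search(source, candidates, lm_logprob, beam_size=3, lm_weight=1.0):
    """Translate word by word, keeping the beam_size best partial hypotheses.

    source:     list of source words
    candidates: dict, source word -> list of (target word, lexicon score in (0, 1])
    lm_logprob: callable (history tuple, next word) -> log probability
    """
    beam = [((), 0.0)]  # (target words so far, accumulated log score)
    for word in source:
        # Assumption: words without candidates are copied with a neutral score.
        options = candidates.get(word, [(word, 1.0)])
        expanded = []
        for history, score in beam:
            for target, lex in options:
                new_score = (score + math.log(lex)
                             + lm_weight * lm_logprob(history, target))
                expanded.append((history + (target,), new_score))
        # Keep only the best hypotheses for the next source position.
        expanded.sort(key=lambda hyp: hyp[1], reverse=True)
        beam = expanded[:beam_size]
    return list(beam[0][0])

# Toy data (hypothetical): "Bank" is ambiguous between "bench" and "bank".
CANDIDATES = {
    "Geld": [("money", 1.0)],
    "Bank": [("bench", 0.6), ("bank", 0.4)],
}
BIGRAMS = {("money", "bank"): 0.5, ("money", "bench"): 0.05}

def toy_lm(history, word):
    """Toy bigram model; unseen bigrams get probability 0.1."""
    prev = history[-1] if history else "<s>"
    return math.log(BIGRAMS.get((prev, word), 0.1))

with_lm = beam_search(["Geld", "Bank"], CANDIDATES, toy_lm)
without_lm = beam_search(["Geld", "Bank"], CANDIDATES, toy_lm, lm_weight=0.0)
```

With the language model, the context word "money" makes the hypothesis ending in "bank" score higher. With `lm_weight=0.0`, the search falls back to the lexicon alone and picks "bench".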

If you use this code, please cite:

If you are looking for the denoising autoencoder, please go to sockeye-noise.

Installation

First, please install all dependencies:

Then clone this repository.

Usage

Here is a simple example for translation:

> cat {input_corpus} | python translate.py --src_emb {source_embedding} \
                                           --tgt_emb {target_embedding} \
                                           --emb_dim {embedding_dimension} \
                                           --lm {language_model} > {output_translation}

Please refer to the help message (-h) for the other options.
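For orientation, the sketch below shows one way the embedding files passed via --src_emb and --tgt_emb could be read and queried for translation candidates. It assumes the word2vec-style text format that MUSE uses (an optional "count dim" header line, then one word and its vector per line). The helpers `read_embeddings`, `cosine`, and `top_candidates` are hypothetical names for illustration and are not taken from translate.py.

```python
import io
import math

def read_embeddings(handle, emb_dim):
    """Read text embeddings: one 'word v1 ... vN' row per line."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != emb_dim + 1:
            # Skips malformed rows, and the 'count dim' header when emb_dim > 1.
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_candidates(word, src, tgt, k=2):
    """Return the k target words closest to a source word, best first."""
    query = src[word]
    scored = [(target, cosine(query, vec)) for target, vec in tgt.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# In-memory stand-ins for the two embedding files (toy 2-dimensional vectors).
src = read_embeddings(io.StringIO("1 2\nHund 1.0 0.0\n"), emb_dim=2)
tgt = read_embeddings(
    io.StringIO("3 2\ndog 0.9 0.1\ncat 0.5 0.5\ncar 0.0 1.0\n"), emb_dim=2
)
best = [target for target, _ in top_candidates("Hund", src, tgt)]
```

A brute-force scan like this is only practical for small vocabularies. Real embedding files would call for a vectorized or approximate nearest-neighbor search.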
