bilingual dictionary extractor from parallel corpora
Python
comparable
README
bicorpus.py
corpus.py
dictionary.py
hundict.py
interactive.py
preprocess.py
sentence.py

README

hundict is an experimental Python project that creates bilingual dictionaries
from parallel corpora.
Features (planned or done):
- easy to use (see hundict -h)
- fast (by Python standards, of course, not C fast)
- unigram pairs
  - A - B
- ngram-ngram extraction, not only unigram-unigram
  - ABC - DE
- multiple choice pairs
  - (A or B) - C
- stopword removal
- printing the remaining corpora
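
As an illustration of the unigram pair and stopword removal features, here is a
minimal sketch of co-occurrence based extraction. It is not taken from the
hundict sources: the function name extract_unigram_pairs, its parameters, and
the choice of the Dice coefficient as the association score are all assumptions
made for this example.

```python
from collections import Counter
from itertools import product


def extract_unigram_pairs(bitext, stopwords=frozenset(), min_score=0.5):
    """Score source-target word pairs with the Dice coefficient.

    Hypothetical helper, not part of hundict.
    bitext: iterable of (source_sentence, target_sentence) strings.
    Returns a list of (source_word, target_word, score) tuples.
    """
    src_count = Counter()
    tgt_count = Counter()
    pair_count = Counter()
    for src_sen, tgt_sen in bitext:
        # Count each word type once per sentence pair, skipping stopwords.
        src_words = set(src_sen.lower().split()) - stopwords
        tgt_words = set(tgt_sen.lower().split()) - stopwords
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    scored = []
    for (src, tgt), both in pair_count.items():
        # Dice = 2 * joint count / (source count + target count)
        dice = 2.0 * both / (src_count[src] + tgt_count[tgt])
        if dice >= min_score:
            scored.append((src, tgt, dice))
    # Best pairs first; ties are broken alphabetically.
    return sorted(scored, key=lambda item: (-item[2], item[0], item[1]))


if __name__ == "__main__":
    toy_bitext = [
        ("a kutya ugat", "the dog barks"),
        ("a macska alszik", "the cat sleeps"),
        ("a kutya alszik", "the dog sleeps"),
    ]
    for src, tgt, score in extract_unigram_pairs(
        toy_bitext, stopwords=frozenset({"a", "the"}), min_score=0.9
    ):
        print(src, tgt, score)
```

On real corpora a single pass like this would produce many spurious pairs, so a
threshold such as min_score has to be tuned to the data.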