wrznr/timur

Finite-state morphology for German

Join the chat at https://gitter.im/timur-morph/community

This package started as a migration of a set of finite-state grammars for the morphological analysis of German words, which ship with SFST, a finite-state transducer (FST) toolkit by Helmut Schmid, to Pynini, another FST toolkit. Pynini has the advantage of being implemented as a Python library, which allows seamless interaction with many other useful Python packages. Since then, a number of morphological operations have been added and some analysis strategies adjusted relative to the original rule set.

Installation

timur is implemented in Python 3. In the following, we assume a working Python 3 installation (tested with versions 3.5 and 3.6) as well as a working C++ compiler that supports C++11.
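The two prerequisites above can be checked before starting. The following is a minimal sketch, not part of timur: the function name `check_prerequisites` and the list of compiler executables are assumptions made for illustration, and the check only confirms that a compiler is on the `PATH`, not that it actually supports C++11.

```python
import shutil
import sys


def check_prerequisites(compilers=("c++", "g++", "clang++")):
    """Return (python_ok, compiler_path) for the current environment.

    python_ok is True if the running interpreter is at least Python 3.5.
    compiler_path is the first compiler executable found on PATH, or None.
    The compiler names are illustrative; adjust them to your system.
    """
    python_ok = sys.version_info >= (3, 5)
    found = (shutil.which(name) for name in compilers)
    compiler_path = next((path for path in found if path), None)
    return python_ok, compiler_path


if __name__ == "__main__":
    python_ok, compiler_path = check_prerequisites()
    print("Python >= 3.5:", python_ok)
    print("C++ compiler:", compiler_path or "not found")
```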

OpenFST

The underlying FST toolkit Pynini is itself based on OpenFST, a C++ library for constructing, combining, optimizing, and searching weighted FSTs. Get the latest version of OpenFST that works with the current version of Pynini. Finding a working combination can be a little tricky since Pynini usually lags a bit behind OpenFST; comparing the release dates helps. Then unpack the archive, build, and install via

$ ./configure --enable-grm
$ make
$ [sudo] make install && [sudo] ldconfig
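After installation, it can be useful to confirm that the dynamic linker finds the OpenFST shared library. The sketch below uses the standard-library function `ctypes.util.find_library`; the helper name `find_openfst` is hypothetical, and the library name `fst` (i.e. `libfst`) is an assumption about how OpenFST names its main shared library on your platform.

```python
from ctypes.util import find_library


def find_openfst(library_name="fst"):
    """Return the name of the OpenFST shared library, or None if not found.

    library_name="fst" assumes the library is installed as libfst; this is
    an assumption and may differ on your platform.
    """
    return find_library(library_name)


if __name__ == "__main__":
    location = find_openfst()
    if location is None:
        print("OpenFST library not found; did you run ldconfig?")
    else:
        print("OpenFST library found:", location)
```

A result of `None` does not necessarily mean the build failed; it may only mean that the install prefix is not on the linker's search path.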

re2

TODO

virtualenv

Using virtualenv is highly recommended, although not strictly necessary for installing timur. It may be installed via:

$ [sudo] pip install virtualenv

Create a virtual environment in a subdirectory of your choice (e.g. env) using

$ virtualenv -p python3 env

and activate it.

$ . env/bin/activate

Python requirements

timur uses various third-party Python packages (including Pynini), which are best installed using pip:

(env) $ pip install -r requirements.txt

Finally, timur itself can be installed via pip:

(env) $ pip install .
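To verify the result of the two pip steps, one can check from within the activated environment that the packages are importable. This is a minimal sketch: the helper `missing_modules` is hypothetical, and the import names `pynini` and `timur` are assumed to match the package names used above.

```python
import importlib.util


def missing_modules(names=("pynini", "timur")):
    """Return the subset of top-level module names that cannot be found.

    The default names are assumptions about the import names of the
    installed packages; nothing is actually imported.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Because `find_spec` only locates a module without importing it, this check does not detect a Pynini build that was compiled against an incompatible OpenFST version; an actual `import pynini` is needed for that.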
