Repository for the ru-syntax command-line tool.


This is the repository for the ru-syntax command-line tool, which performs syntactic parsing of Russian texts: plain text in, annotated CoNLL out. You can visit the project's web page to parse some text online and find out more about the pipeline.

The work is done as a project at the Higher School of Economics, Moscow, Faculty of Humanities, master's programme in Computational Linguistics.



To get everything up and running, make sure you have all the requirements, then clone this repository (or download it as a zip file and unpack it). After that, you have two options.

1. Use the sample config and move Mystem, MaltParser, and TreeTagger to match it

  1. Create 'config.ini' file in the folder containing
  2. Copy the sample config given below to config.ini.
  3. Replace the path in APP_ROOT = /home/nm/repos/ru-syntax line with the full path to the folder containing
  4. Create 'bin' folder in your folder containing
  5. Put the Mystem binary, the full TreeTagger folder, and the full MaltParser folder into that bin folder.
  6. Make sure the names in config.ini match the actual file and folder names exactly (e.g., the option MYSTEM_PATH = %(BIN_PATH)s/mystem might need to be replaced by something like MYSTEM_PATH = %(BIN_PATH)s/mystem-3.0-win7-32bit.exe).
  7. Put the MaltParser model downloaded from the ru-syntax website into the same folder as the MaltParser jar file.

2. Put Mystem, MaltParser and TreeTagger anywhere you like and tweak the config

  1. Create 'config.ini' file in the folder containing
  2. Copy the sample config given below to config.ini.
  3. Tweak the paths in the config according to where you have your Mystem, TreeTagger, MaltParser, and MaltParser model.

Please note that constructions like %(SOME_OPTION)s are simply replaced with the value of SOME_OPTION. You don't have to use them; you may specify full paths instead. To use such a construction, SOME_OPTION must be defined either in the [DEFAULT] section or in the same section where %(SOME_OPTION)s is used.
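
The %(SOME_OPTION)s syntax matches the basic interpolation of Python's standard configparser module. Assuming config.ini is read that way, the following minimal sketch shows how a value is resolved. The section name [paths] and the BIN_PATH value are hypothetical and chosen only for illustration; they are not taken from the real sample config.

```python
import configparser

# Hypothetical config text: the [paths] section name and the BIN_PATH value
# are assumptions for illustration only.
SAMPLE = """
[DEFAULT]
APP_ROOT = /home/nm/repos/ru-syntax
BIN_PATH = %(APP_ROOT)s/bin

[paths]
MYSTEM_PATH = %(BIN_PATH)s/mystem
"""

parser = configparser.ConfigParser()  # basic interpolation is the default
parser.read_string(SAMPLE)

# BIN_PATH is found in [DEFAULT], so it can be referenced from [paths].
mystem_path = parser["paths"]["MYSTEM_PATH"]
print(mystem_path)  # /home/nm/repos/ru-syntax/bin/mystem
```

If the referenced option is defined neither in [DEFAULT] nor in the same section, configparser raises an interpolation error when the value is read.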

Also note that regardless of where you place the MaltParser folder, you have to put the model into exactly the same folder as the MaltParser jar file.
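
A quick way to check this requirement before running the tool is sketched below. The helper model_is_next_to_jar and the model file name in the usage line are hypothetical; substitute the folder, jar name, and model name from your own config.ini.

```python
from pathlib import Path


def model_is_next_to_jar(malt_root: str, malt_name: str, model_name: str) -> bool:
    """Return True if the jar and the model both exist directly in malt_root.

    This is a hypothetical helper, not part of ru-syntax itself.
    """
    root = Path(malt_root)
    return (root / malt_name).is_file() and (root / model_name).is_file()


if __name__ == "__main__":
    # "example-model.mco" is a placeholder name, not the real model file name.
    ok = model_is_next_to_jar("bin/maltparser-1.8.1",
                              "maltparser-1.8.1.jar",
                              "example-model.mco")
    print("model found next to jar" if ok else "model or jar is missing")
```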

Sample config

# full path to the folder containing
APP_ROOT = /home/nm/repos/ru-syntax
# path to the folder containing Mystem, TreeTagger, and MaltParser
# path for output folder
# path for temporary files folder

# path to mystem binary

# full path to the folder containing MaltParser
MALT_ROOT = %(BIN_PATH)s/maltparser-1.8.1
# name of MaltParser binary file
MALT_NAME = maltparser-1.8.1.jar
# name of MaltParser model

# path to composites dictionary file
COMP_DICT_PATH = %(APP_ROOT)s/dictionaries/composites.csv

# path to TreeTagger folder
TREETAGGER_BIN = %(BIN_PATH)s/treetagger/bin/tree-tagger
# path to Treetagger model
TREETAGGER_PAR = %(APP_ROOT)s/tree_alltags_model.par


In order to annotate your file, you need to run the wrapper script from the command line:

Usage: python3 [-o OUTPUT_FILE] INPUT_FILE

  -h, --help            show this help message and exit
  -o OUTPUT_FILE        output results to OUTPUT_FILE.

If OUTPUT_FILE is not specified, the output file will have the same name as the input file but with the conll extension.
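
The default naming rule can be sketched as follows. This is an illustration, not the tool's actual code: the function default_output_path is hypothetical, and it assumes the input file's extension is replaced (rather than having ".conll" appended to the full name).

```python
from pathlib import Path


def default_output_path(input_file: str) -> str:
    """Derive the output name: same name as the input, conll extension.

    Assumption: an existing extension is replaced, so sample.txt
    becomes sample.conll. A name without an extension just gains one.
    """
    return str(Path(input_file).with_suffix(".conll"))


print(default_output_path("sample.txt"))  # sample.conll
print(default_output_path("sample"))      # sample.conll
```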