Official releases of the TOROT treebank

The TOROT Treebank

The TOROT Treebank is a dependency treebank with morphosyntactic and information-structure annotation. It includes texts in several stages of Slavic and is freely available under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.

Please cite as

Hanne Martine Eckhoff and Aleksandrs Berdicevskis. 2015. 'Linguistics vs. digital editions: The Tromsø Old Russian and OCS Treebank'. Scripta & e-Scripta 2015 14–15, pp. 9–25.

Releases of the TOROT Treebank are hosted on GitHub.

Contents

The following texts are included in this release of the treebank:

| Text | Language | Filename | Size |
|---|---|---|---|
| Codex Suprasliensis | Old Church Slavonic | supr | 79070 tokens |
| Codex Zographensis | Old Church Slavonic | zogr | 1098 tokens |
| Kiev Missal | Old Church Slavonic | kiev-mis | 370 tokens |
| Psalterium Sinaiticum | Old Church Slavonic | psal-sin | 248 tokens |
| Vita Constantini | Church Slavonic | vit-const | 890 tokens |
| Vita Methodii | Church Slavonic | vit-meth | 331 tokens |
| Afanasij Nikitin’s journey beyond three seas | Old Russian | afnik | 6842 tokens |
| Birchbark letters | Old Russian | birchbark | 1965 tokens |
| Charter of Prince Jurij Svjatoslavich of Smolensk on the alliance with Poland and Lithuania, 1386 | Old Russian | smol-pol-lit | 344 tokens |
| Colophon to Mstislav’s Gospel book | Old Russian | mstislav-col | 259 tokens |
| Colophon to the Ostromir Codex | Old Russian | ostromir-col | 199 tokens |
| Correspondence of Peter the Great | Old Russian | peter | 100 tokens |
| Domostroj | Old Russian | domo | 23459 tokens |
| Life of Sergij of Radonezh | Old Russian | sergrad | 20361 tokens |
| Materials for the history of the schism | Old Russian | schism | 1835 tokens |
| Missive from Prince Ivan of Pskov, 1463–1465 | Old Russian | pskov-ivan | 339 tokens |
| Missive from the Archbishop of Riga to the Prince of Smolensk | Old Russian | rig-smol1281 | 171 tokens |
| Mstislav’s letter | Old Russian | mst | 158 tokens |
| Novgorod service book marginalia | Old Russian | nov-marg | 93 tokens |
| Novgorod’s treaty with Grand Prince Jaroslav Jaroslavich, 1266 | Old Russian | novgorod-jaroslav | 423 tokens |
| Russkaja pravda | Old Russian | rusprav | 4174 tokens |
| Statute of Prince Vladimir | Old Russian | ust-vlad | 495 tokens |
| Testament of Ivan Jurievich Grjaznoj | Old Russian | dux-grjaz | 421 tokens |
| The 1229 Treaty between Smolensk, Riga and Gotland (version A) | Old Russian | riga-goth | 1421 tokens |
| The first Novgorod Chronicle, Synodal manuscript | Old Russian | nov-sin | 17838 tokens |
| The Kiev Chronicle, Codex Hypatianus | Old Russian | kiev-hyp | 544 tokens |
| The Life of Avvakum | Old Russian | avv | 22835 tokens |
| The list of the Novgorodians’ losses | Old Russian | nov-list | 187 tokens |
| The Primary Chronicle, Codex Hypatianus | Old Russian | pvl-hyp | 3610 tokens |
| The Primary Chronicle, Codex Laurentianus | Old Russian | lav | 56725 tokens |
| The Suzdal Chronicle, Codex Laurentianus | Old Russian | suz-lav | 23760 tokens |
| The tale of Dracula | Old Russian | drac | 2487 tokens |
| The tale of Igor’s campaign | Old Russian | spi | 2850 tokens |
| The tale of Luka Kolocskij | Old Russian | luk-koloc | 906 tokens |
| The taking of Pskov | Old Russian | pskov | 2326 tokens |
| The tale of the fall of Constantinople | Old Russian | const | 9258 tokens |
| Uspenskij sbornik | Old Russian | usp-sbor | 25189 tokens |
| Varlaam’s donation charter to the Xutyn monastery | Old Russian | varlaam | 148 tokens |
| Vesti-Kuranty | Old Russian | vest-kur | 1154 tokens |
| Zadonshchina | Old Russian | zadon | 2399 tokens |

(The 'size' column in the table above shows the number of annotated tokens in a text. The number of tokens will be slightly larger than the number of words in the original printed edition as some words have been split into multiple tokens and some tokens have been inserted during annotation.)

Please see the XML files for detailed metadata and a full list of contributors.

Data formats

The texts are available in two formats:

  1. PROIEL XML: These files are the authoritative source files and the only ones that contain all available annotation. They contain the complete morphological, syntactic and information-structure annotation, as well as the complete text, including punctuation, section headers etc. The schema is defined in proiel.xsd.

  2. CoNLL-X format
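
As a rough illustration of working with the CoNLL-X files, the sketch below reads sentences and counts tokens. It assumes the standard CoNLL-X layout (ten tab-separated columns per token, a blank line between sentences); the function names and the sample rows are made up for illustration and are not taken from the treebank. Counts obtained this way are not guaranteed to match the 'size' column above, since only the PROIEL XML files contain all annotation.

```python
import io
from typing import Dict, Iterator, List, TextIO

# Column names of the standard CoNLL-X layout (assumed here, not
# confirmed against the TOROT export itself).
CONLL_FIELDS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_sentences(handle: TextIO) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dicts keyed by column name."""
    sentence: List[Dict[str, str]] = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        sentence.append(dict(zip(CONLL_FIELDS, columns)))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence


def count_tokens(handle: TextIO) -> int:
    """Count token rows across all sentences."""
    return sum(len(sentence) for sentence in read_sentences(handle))


# Made-up placeholder rows (two sentences, three tokens), not real treebank data.
SAMPLE = (
    "1\tw1\tl1\tN\tN\t_\t2\tsub\t_\t_\n"
    "2\tw2\tl2\tV\tV\t_\t0\tpred\t_\t_\n"
    "\n"
    "1\tw3\tl3\tV\tV\t_\t0\tpred\t_\t_\n"
)

if __name__ == "__main__":
    print(count_tokens(io.StringIO(SAMPLE)))
```

To try this on a file from the release, open it with an explicit encoding, for example `open("zadon.conll", encoding="utf-8")`, and pass the handle to `count_tokens`.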