TamilTB

This package contains a morphologically and syntactically (dependency) annotated corpus for Tamil. There are 600 sentences (trees) and 9581 tokens (non-root nodes).

The static copy in /net/data/TamilTB has not been touched since 2013. It apparently corresponds to the release 0.1. Since then, the Universal Dependencies conversion has been the more visible version of the corpus. However, we should be able to fix errors in the original Prague-style annotation.

Loganathan Ramasamy, the author of the original treebank, is no longer a member of ÚFAL. This repository has been started by Dan Zeman, based on the release 0.1.

Contents of this package

data/
    ./tamiltb.treex - Dependency treebank in Treex format
    ./tamiltb.conll - Dependency treebank in CoNLL-X format
    ./train.conll - First 400 sentences / 6329 tokens of the treebank, intended for training of models
    ./dev.conll   - Next   80 sentences / 1263 tokens of the treebank, intended for development and tuning
    ./test.conll  - Last  120 sentences / 1989 tokens of the treebank, intended for testing of models
doc/
    ./index.html - Documentation for the treebank data
    ...
README.md - This file

Treex Format

Dependency trees can be viewed/modified using tree editor TrEd with the EasyTreex extension.

CoNLL Format

CoNLL format is provided mainly to use machine learning tools to train and test tagging and parsing models. It is created automatically from the Treex version, which is the master file.

License

TamilTB.v0.1 by Institute of Formal and Applied Linguistics (ÚFAL) is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

doc

doc

README.md

README.md

Repository files navigation

TamilTB

Contents of this package

Treex Format

CoNLL Format

License

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
data		data
doc		doc
README.md		README.md

ufal/tamiltb

Folders and files

Latest commit

History

Repository files navigation

TamilTB

Contents of this package

Treex Format

CoNLL Format

License

About

Resources

Stars

Watchers

Forks

Languages