Skip to content

timarkh/tsakorpus-additional-tools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tsakorpus-additional-tools

A set of tools for data preprocessing or settings file generation for tsakorpus.

prepare_gloss_settings

A tool that collects glosses and POS tags used in the corpus and generates a report and settings files: conversion_settings.json, categories.json, and grammRules.csv (tab-delimited). All files are generated in the corpus directory. If conversion_settings.json is already present there, it is copied to conversion_settings.json.bak, and then updated. It is presumed that the generated files are further processed manually by a linguist, who should remove erroneously collected glosses, adjust the gloss-to-tag rules, etc.

Usage

Command line examples:

  • python prepare_gloss_settings.py -f tei --dir C:\Work\HZSK\corpora\selkup_tei -l selkup --pos ps --gloss ge
  • python prepare_gloss_settings.py -f exb --dir C:\Work\HZSK\corpora\selkup -l selkup --pos ps --gloss ge

(use python3 if you have also Python 2 installed)

Type python prepare_gloss_settings.py -h for help.

Requirements

The tools require the following software to run:

  • python >= 3.5
  • python modules: lxml (you can use requirements.txt)

License

The software is distributed under MIT license (see LICENSE.md).

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages