Scripts for preprocessing the EstGEC-L2 corpus, which contains Estonian L2 learner texts error-annotated in the M2 format.
Updated Dec 4, 2023 - Python
Estonian TimeML Annotated Corpus \ Eesti keele TimeML märgendatud korpus
An extractor of keywords for Estonian texts.
Turn XML downloads from the Eesti Keele Instituut into usable Kindle dictionaries
Estonian Grammatical Error Correction (GEC) test and development corpus that contains L2 learner texts error-annotated in the M2 format.
Source code for "Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation" published at *SEM 2024