Demos
Here are the links to different demos showcasing some of the site's functionalities.
Link: http://minitorn.tlu.ee/~jaagup/oma/too/20/09/tasemed2.php
This demo application predicts the proficiency level of texts written by learners of Estonian. The evaluation is based on the nationally tested language proficiency levels: A2 - elementary, B1 - intermediate, B2 - upper intermediate, C1 - advanced.
The classification models rely on statistical analysis of writings from the Estonian language proficiency examinations. In cross-validation, they achieved an average accuracy of 91%-97%, i.e., they assigned 91%-97% of the texts to the correct level. The best-performing model is the unified model that combines different types of linguistic features.
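As a rough illustration of what an average cross-validation accuracy means, the sketch below splits labelled texts into k folds, trains on all folds but one, tests on the held-out fold, and averages the per-fold accuracies. Everything here is an assumption made for illustration: the function names, the single numeric feature, and the nearest-centroid classifier are stand-ins, not the models or the evaluation setup used by the demo.

```python
def train_centroids(xs, ys):
    """Toy stand-in classifier: store the mean feature value per level."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_centroid(model, x):
    """Predict the level whose stored mean is closest to the feature value."""
    return min(model, key=lambda y: abs(model[y] - x))


def k_fold_accuracy(samples, labels, train_fn, predict_fn, k=5):
    """Average accuracy over k folds (requires len(samples) >= k)."""
    n = len(samples)
    fold_accuracies = []
    for fold in range(k):
        # Every k-th item, starting at index `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [samples[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        model = train_fn(train_x, train_y)
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / k
```

A call such as `k_fold_accuracy(values, levels, train_centroids, predict_centroid, k=4)` returns a number between 0 and 1; multiplied by 100, it corresponds to the kind of percentage reported above.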
Currently, three feature sets are used:
- surface features describing general complexity of the text (text, word and sentence length);
- lexical features (diversity, sophistication and density of vocabulary, noun abstractness);
- morphological features (part of speech and grammatical form frequencies and diversity).
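To make the first two feature sets more concrete, here is a minimal sketch that computes a few surface features and one common lexical diversity measure (type-token ratio). The function name, the regex-based tokenization, and the choice of type-token ratio are assumptions made for illustration; the demo's actual feature extraction is not described here and may differ. Morphological features are left out because they would require an Estonian morphological analyzer.

```python
import re


def surface_and_lexical_features(text):
    """Compute a few illustrative surface and lexical features of a text."""
    # Split into sentences on terminal punctuation (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w+ is Unicode-aware in Python 3, so Estonian letters such as õ, ä, ö, ü
    # stay inside words (digits and underscores are matched as well).
    words = re.findall(r"\w+", text.lower())
    if not words or not sentences:
        return {
            "word_count": 0,
            "sentence_count": 0,
            "mean_word_length": 0.0,
            "mean_sentence_length": 0.0,
            "type_token_ratio": 0.0,
        }
    return {
        "word_count": len(words),            # text length in words
        "sentence_count": len(sentences),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        # Lexical diversity: distinct word forms divided by all word tokens.
        "type_token_ratio": len(set(words)) / len(words),
    }
```

For example, `surface_and_lexical_features("Mina olen õpilane. Mina õpin eesti keelt.")` counts seven words in two sentences, six of them distinct. Note that the type-token ratio depends on text length, so it is only comparable between texts of similar size.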
In the future, we also plan to take into account syntactic (sentence structure) features as well as spelling and grammar errors in the evaluation.
As the models have been trained on examination writings, the level predictions of other texts, e.g., written homework that has been compiled using reference tools (dictionaries, grammars), are less reliable. We aim to adapt the application for diverse writing situations.
Note that the application does not store the texts you submit for evaluation. However, we advise against entering any sensitive personal data into the text field.
Authors: Kais Allkivi-Metsoja, Kaisa Norak, Jaagup Kippar