paste train and dev data and save as new training data:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/original/dw-bk$ cat train.bk dev.bk > ./tmp/train.bk
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/original/my-rk$ cat train.rk dev.rk > ./tmp/train.rk
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$ wc *
670 6818 62657 test.bk
670 6769 66903 test.dw
1812 23545 221538 test.my
1812 23157 217510 test.rk
5952 61257 560161 train.bk
5952 60474 597185 train.dw
16561 211718 1990762 train.my
16561 208477 1959575 train.rk
49990 602215 5676291 total
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$
I defined the labels as follows:
__label__bk
__label__dw
__label__my
__label__rk
prepared label only files:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$ wc *.label*
670 670 8040 test.label.bk
670 670 8040 test.label.dw
1812 1812 21744 test.label.my
1812 1812 21744 test.label.rk
5952 5952 71424 train.label.bk
5952 5952 71424 train.label.dw
16561 16561 198732 train.label.my
16561 16561 198732 train.label.rk
49990 49990 599880 total
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$ head -n 2 *.label*
==> test.label.bk <==
__label__bk
__label__bk
==> test.label.dw <==
__label__dw
__label__dw
==> test.label.my <==
__label__my
__label__my
==> test.label.rk <==
__label__rk
__label__rk
==> train.label.bk <==
__label__bk
__label__bk
==> train.label.dw <==
__label__dw
__label__dw
==> train.label.my <==
__label__my
__label__my
==> train.label.rk <==
__label__rk
__label__rk
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$
create a new folder and paste with the syllable segmented sentences:
write a shell script:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$ cat paste-all.sh
#!/bin/bash
paste -d " " test.label.bk test.bk > ./fasttext/test.fasttext.bk
paste -d " " test.label.dw test.dw > ./fasttext/test.fasttext.dw
paste -d " " test.label.my test.my > ./fasttext/test.fasttext.my
paste -d " " test.label.rk test.rk > ./fasttext/test.fasttext.rk
paste -d " " train.label.bk train.bk > ./fasttext/train.fasttext.bk
paste -d " " train.label.dw train.dw > ./fasttext/train.fasttext.dw
paste -d " " train.label.my train.my > ./fasttext/train.fasttext.my
paste -d " " train.label.rk train.rk > ./fasttext/train.fasttext.rk
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$
change mode for the shell script:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$ chmod +x paste-all.sh
run paste-all.sh ...
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$ ./paste-all.sh
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess$ cd fasttext/
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ wc *
670 7488 70697 test.fasttext.bk
670 7439 74943 test.fasttext.dw
1812 25357 243282 test.fasttext.my
1812 24969 239254 test.fasttext.rk
5952 67209 631585 train.fasttext.bk
5952 66426 668609 train.fasttext.dw
16561 228279 2189494 train.fasttext.my
16561 225038 2158307 train.fasttext.rk
49990 652205 6276171 total
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$
check the content:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ head -n 3 *.bk
==> test.fasttext.bk <==
__label__bk သူ ခင် ဗျား ကို ဒယ် ဇာ ပေး ဟုတ် ဝ ။
__label__bk နင် ဖယ် သူ့ ဝို စိတ် ညစ် ပေး ခဲ့ လဲ ။
__label__bk သိပ် ပြေ တာ ပေါ့ ။
==> train.fasttext.bk <==
__label__bk နင် ဘာ စီ စဉ် နေ ရယ် ဆို တာ ငါ့ ဝို ပြော သင့် ပေါ့ လန်း ။
__label__bk သူ လို့ စာ အုပ် သုံး ထောင် ကျော် ရောင်း ပီး ဟော ဘီ ။
__label__bk ငယ် ငယ် တည်း က မင်း သား လုပ် ဝို့ ဝါ သ နာ ပါ စ ။
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ head -n 3 *.dw
==> test.fasttext.dw <==
__label__dw အဲ ဝယ် ဟှား ခံ ဗျား ဟှို အဲ ဇာ ပေး ဟှို့ မှု ဝ ။
__label__dw နန် ဟှယ် လူ့ ဟှို ဒုက္ခ ပေး ရစ် ဇာ နူး ။
__label__dw သိပ် ပြေ တာ ပေါ့ ။
==> train.fasttext.dw <==
__label__dw နန် ဟှဲ ဇာ စီ စဉ် နေ ဟှယ် ဆို တာ ငါ့ ကို ပြော သင့် ဟှယ် ။
__label__dw သူး နို့ စာ အုပ် သုံး ထော် ကျော် ရော ပီး ပီ ။
__label__dw ချို့ လူ လေ ဟှာ မွီး ရာ ပါ ဇာတ် မှန်း သား လေ မား ။
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ head -n 3 *.my
==> test.fasttext.my <==
__label__my သူ အ မှန် အ တိုင်း မ ကျိန် ဆို ရဲ ဘူး လား ။
__label__my ကျွန် တော် သာ ဆို ပြန် ပေး လိုက် မှာ ။
__label__my ဆူ ပြီး တဲ့ ရေ ကို သောက် သ င့် တယ် ။
==> train.fasttext.my <==
__label__my မင်း အဲ့ ဒါ ကို အ ခြား တစ် ခု နဲ့ မ ချိတ် ဘူး လား ။
__label__my သူ မ ဘယ် သူ့ ကို မှ မ မှတ် မိ တော့ ဘူး ။
__label__my အဲ့ ဒါ ကျွန် တော် တို့ အ တွက် ခက် ခဲ တယ် ။
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ head -n 3 *.rk
==> test.fasttext.rk <==
__label__rk သူ အ မှန် အ တိုင်း မ ကျိန် ဆို ရဲ ပါ လား ။
__label__rk ကျွန် တော် ဆို ကေ ပြန် ပီး လိုက် ဖို့ ။
__label__rk ဆူ ပြီး ရီ ကို သောက် သ င့် ရေ ။
==> train.fasttext.rk <==
__label__rk မင်း ယင်း ချင့် ကို အ ခြား တစ် ခု နန့် မ ချိတ် ပါ လား ။
__label__rk ထို မ ချေ တစ် ယောက် လေ့ မ မှတ် မိ ပါ ယာ ။
__label__rk ယင်း ချင့် ကျွန် တော် ရို့ အ တွက် ခက် ခ ရေ ။
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$
First, combine all languages into a one big file as follows:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ mkdir final
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ cat train.fasttext.{bk,dw,my,rk} > ./final/train.fasttext.all
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ cat test.fasttext.{bk,dw,my,rk} > ./final/test.fasttext.all
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext$ cd final/
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final$ wc *
4964 65253 628176 test.fasttext.all
45026 586952 5647995 train.fasttext.all
49990 652205 6276171 total
Shuffle for training and testing:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final$ mkdir shuffle
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final$ ls
shuffle test.fasttext.all train.fasttext.all
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final$ shuf ./test.fasttext.all > ./shuffle/test.shuf.all
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final$ shuf ./train.fasttext.all > ./shuffle/train.shuf.all
Check the content of the final test data:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final/shuffle$ head ./test.shuf.all
__label__rk ဖြစ် နိုင် ကေ နောက် ကြာ သ ပ တေး နိ ။
__label__my ပြော ရ မှာ တော့ အား နာ ပါ ရဲ့ ကျွန် တော် ကွန် ပျူ တာ သိပ္ပံ နဲ့ ပတ် သက် လို့ များ များ စား စား မ သိ ဘူး ။
__label__rk ယင်း ချင့် ကို မင်း အာ မ မ ခံ ခ ပါ ။
__label__dw ကျွန် တော် မွန်း လန်း ဇာ စား နေ တူး ဟှ သူ ဖောင်း ပြော နေ ဟှယ် ။
__label__bk အ မှား လုပ် ဝယ့် ကျောင်း သား ဒေ ဝို ဆ ရာ ဂ ရိုက် ရယ် ။
__label__my ခင် ဗျား မှာ ကျွန် တော့် နံ ပါတ် ရှိ တယ် လေ ။
__label__rk ဒေ က လိန့် မေ စွာ စိတ် ရှုပ် လား ဗျာယ် ။
__label__my သူ တို့ က ဝံ ပု လွေ ကြီး ထွက် ပြေး အောင် ပန်း သီး တွေ နဲ့ ဗုံး ကြဲ သ လို ပစ် ပေါက် ကြ တယ် ။
__label__rk ဒေ မာ ငါး မျှား စွာ ကို ခွ င့် မ ပြု ပါ ။
__label__bk အယ့် ဒါ ဘ ဇာ လောက် ထင် ရှား ရိ ။
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final/shuffle$
Check the final content of the training data:
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final/shuffle$ head ./train.shuf.all
__label__dw နန့် ကီး မွန်း တည့် နူး ဟှ ကျန်် နော် တီ ဗီ ကေ့ နေ ဟှယ် ။
__label__my လတ် ဆတ် တဲ့ အ သီး များ နှ င့် ဟင်း သီး ဟင်း ရွက် များ က မင်း အ တွက် ကောင်း တယ် ။
__label__my သူ မ က သူ့ ကို သတ် ခဲ့ တာ လား ။
__label__my ဒါ ဘယ် သူ့ သွား တိုက် ဆေး လဲ ။
__label__my ဘယ် အ ချိန် ငွေ လာ ပေး ရ မ လဲ ဆို တာ ကျွန် တော် စဉ်း စား နေ တယ် ။
__label__rk ယင်း ချင့် ဇာ လောက် တန် ဖိုး ဟိ လေး ။
__label__my ငါ အိပ် ချင် တယ် ဒါ ပေ မ ယ့် မ အိပ် နိုင် ဘူး ။
__label__rk ဂီ တ ဟာ မြူး ကြွ စီ ရေ အ ထိ အ ရှိန် မြ င့် ခ ရေ ။
__label__my ကျွန် တော် တို့ အဲ ဒါ ကို တောင်း ဆို ထား လား ။
__label__dw တ ပည့် ဂန်း များ ရ စာ ရေး တတ် ဝို့ နဲ့ ဖတ် တက် ဝို့ သန် ယူ နေ ရယ် ။
(base) ye@ykt-pro:~/data/ethnic-parallel-data/4dialect-detect/preprocess/fasttext/final/shuffle$
After uploading to the server, I found there are some 2 spaces in training file and thus, I made clean space:
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/data/shuffle/clean$ perl ./clean-space.pl ./test.shuf.all > test.shuf.all.clean
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/data/shuffle/clean$ perl ./clean-space.pl ./train.shuf.all > train.shuf.all.clean
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/data/shuffle/clean$
Check the filesize again:
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/data/shuffle/clean$ wc *.clean
4964 65253 624596 test.shuf.all.clean
45026 586952 5615376 train.shuf.all.clean
49990 652205 6239972 total
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/data/shuffle/clean$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-default
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1311061 lr: 0.000000 avg.loss: 0.178538 ETA: 0h 0m 0s
real 0m0.399s
user 0m2.391s
sys 0m0.048s
Check the output model:
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ ls
model-default.bin model-default.vec
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-default.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.954
R@1 0.954
real 0m0.018s
user 0m0.014s
sys 0m0.004s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-3gram -wordNgrams 3
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874331 lr: 0.000000 avg.loss: 0.143056 ETA: 0h 0m 0s
real 0m5.585s
user 0m4.692s
sys 0m0.681s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
Testing with 3-gram model ...
Wow! Now we got 97 precision and recall! :)
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-3gram.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.97
R@1 0.97
real 0m0.367s
user 0m0.056s
sys 0m0.310s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-3gram-25epoch -wordNgrams 3 -epoch 25
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1010528 lr: 0.000000 avg.loss: 0.053030 ETA: 0h 0m 0s
real 0m6.342s
user 0m16.184s
sys 0m0.721s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-3gram-25epoch.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.365s
user 0m0.056s
sys 0m0.308s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
1-gram training/testing ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 1
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1311308 lr: 0.000000 avg.loss: 0.172516 ETA: 0h 0m 0s
real 0m0.375s
user 0m2.481s
sys 0m0.040s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.955
R@1 0.955
real 0m0.018s
user 0m0.017s
sys 0m0.000s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram training/testing ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 2
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874318 lr: 0.000000 avg.loss: 0.134431 ETA: 0h 0m 0s
real 0m4.999s
user 0m4.171s
sys 0m0.692s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.97
R@1 0.97
real 0m0.343s
user 0m0.052s
sys 0m0.289s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
playing with 3-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 873795 lr: 0.000000 avg.loss: 0.145663 ETA: 0h 0m 0s
real 0m4.870s
user 0m4.760s
sys 0m0.739s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.97
R@1 0.97
real 0m0.356s
user 0m0.052s
sys 0m0.303s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
training/testing with 4-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 4
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 873765 lr: 0.000000 avg.loss: 0.157056 ETA: 0h 0m 0s
real 0m4.808s
user 0m5.186s
sys 0m0.727s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.367s
user 0m0.068s
sys 0m0.299s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
training/testing with 5-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 5
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 656184 lr: 0.000000 avg.loss: 0.182098 ETA: 0h 0m 0s
real 0m5.020s
user 0m5.771s
sys 0m0.726s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.967
R@1 0.967
real 0m0.372s
user 0m0.056s
sys 0m0.315s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
training/testing with 6-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 6
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 655627 lr: 0.000000 avg.loss: 0.193838 ETA: 0h 0m 0s
real 0m4.927s
user 0m5.852s
sys 0m0.725s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.966
R@1 0.966
real 0m0.376s
user 0m0.064s
sys 0m0.310s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
training/testing with 7-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 7
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 524682 lr: 0.000000 avg.loss: 0.211927 ETA: 0h 0m 0s
real 0m5.050s
user 0m6.637s
sys 0m0.744s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.966
R@1 0.966
real 0m0.380s
user 0m0.076s
sys 0m0.304s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
training/testing 1-gram epoch 10 to 50 ...
Here, I will record testing results only ...
for 1-gram, 10 epochs
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.957
R@1 0.957
real 0m0.020s
user 0m0.015s
sys 0m0.005s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
1-gram, 20 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.956
R@1 0.956
real 0m0.019s
user 0m0.014s
sys 0m0.005s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
1-gram, 30 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.955
R@1 0.955
real 0m0.017s
user 0m0.014s
sys 0m0.004s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
1-gram, 40 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.956
R@1 0.956
real 0m0.016s
user 0m0.012s
sys 0m0.004s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
1-gram, 50 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.956
R@1 0.956
real 0m0.018s
user 0m0.017s
sys 0m0.000s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram, 10 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.97
R@1 0.97
real 0m0.342s
user 0m0.040s
sys 0m0.301s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram, 20 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.969
R@1 0.969
real 0m0.345s
user 0m0.048s
sys 0m0.296s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram, 30 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.967
R@1 0.967
real 0m0.346s
user 0m0.048s
sys 0m0.298s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram, 40 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.967
R@1 0.967
real 0m0.347s
user 0m0.036s
sys 0m0.310s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram, 50 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.967
R@1 0.967
real 0m0.347s
user 0m0.044s
sys 0m0.302s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 10 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3 -epoch 10
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1050542 lr: 0.000000 avg.loss: 0.092547 ETA: 0h 0m 0s
real 0m5.091s
user 0m7.475s
sys 0m0.744s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.355s
user 0m0.052s
sys 0m0.302s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 20 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3 -epoch 20
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1050878 lr: 0.000000 avg.loss: 0.067887 ETA: 0h 0m 0s
real 0m5.721s
user 0m13.181s
sys 0m0.748s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.360s
user 0m0.048s
sys 0m0.311s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 30 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3 -epoch 30
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1050917 lr: 0.000000 avg.loss: 0.042155 ETA: 0h 0m 0s
real 0m6.307s
user 0m19.160s
sys 0m0.818s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.969
R@1 0.969
real 0m0.362s
user 0m0.052s
sys 0m0.310s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 40 epochs ...
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1051172 lr: 0.000000 avg.loss: 0.039813 ETA: 0h 0m 0s
real 0m6.750s
user 0m24.929s
sys 0m0.799s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.360s
user 0m0.060s
sys 0m0.299s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 50 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3 -epoch 50
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1051498 lr: 0.000000 avg.loss: 0.041061 ETA: 0h 0m 0s
real 0m7.545s
user 0m30.487s
sys 0m0.843s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.363s
user 0m0.056s
sys 0m0.306s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 15 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3 -epoch 15
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 985152 lr: 0.000000 avg.loss: 0.071952 ETA: 0h 0m 0s
real 0m5.798s
user 0m10.267s
sys 0m0.736s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.357s
user 0m0.056s
sys 0m0.299s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 25 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3 -epoch 25
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1010634 lr: 0.000000 avg.loss: 0.054468 ETA: 0h 0m 0s
real 0m6.280s
user 0m16.097s
sys 0m0.767s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.363s
user 0m0.048s
sys 0m0.314s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram, 35 epochs ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3 -epoch 35
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1082188 lr: 0.000000 avg.loss: 0.046379 ETA: 0h 0m 0s
real 0m7.185s
user 0m21.856s
sys 0m0.777s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.968
R@1 0.968
real 0m0.365s
user 0m0.044s
sys 0m0.320s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
Running with default parameters ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1312172 lr: 0.000000 avg.loss: 0.172135 ETA: 0h 0m 0s
real 0m0.434s
user 0m2.449s
sys 0m0.101s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.954
R@1 0.954
real 0m0.018s
user 0m0.018s
sys 0m0.000s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram and default ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 2
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874454 lr: 0.000000 avg.loss: 0.137124 ETA: 0h 0m 0s
real 0m5.143s
user 0m4.242s
sys 0m0.731s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
N 4964
P@1 0.97
R@1 0.97
real 0m0.345s
user 0m0.032s
sys 0m0.313s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
1-gram with default parameters ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 1
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1310131 lr: 0.000000 avg.loss: 0.166001 ETA: 0h 0m 0s
real 0m0.433s
user 0m2.446s
sys 0m0.100s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean 5
N 4964
P@5 0.25
R@5 1
real 0m0.019s
user 0m0.015s
sys 0m0.004s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
2-gram with default parameters ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 2
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874621 lr: 0.000000 avg.loss: 0.134965 ETA: 0h 0m 0s
real 0m5.505s
user 0m4.218s
sys 0m0.693s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean 5
N 4964
P@5 0.25
R@5 1
real 0m0.343s
user 0m0.044s
sys 0m0.297s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
3-gram with default parameters ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874708 lr: 0.000000 avg.loss: 0.148019 ETA: 0h 0m 0s
real 0m5.078s
user 0m4.682s
sys 0m0.696s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean 5
N 4964
P@5 0.25
R@5 1
real 0m0.355s
user 0m0.049s
sys 0m0.306s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
4-gram with default parameters ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 4
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874269 lr: 0.000000 avg.loss: 0.162514 ETA: 0h 0m 0s
real 0m5.227s
user 0m5.070s
sys 0m0.737s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean 5
N 4964
P@5 0.25
R@5 1
real 0m0.362s
user 0m0.064s
sys 0m0.298s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
5-gram with default parameters ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 5
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 656135 lr: 0.000000 avg.loss: 0.179791 ETA: 0h 0m 0s
real 0m5.216s
user 0m5.393s
sys 0m0.708s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean 5
N 4964
P@5 0.25
R@5 1
real 0m0.371s
user 0m0.052s
sys 0m0.318s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
We got same results for 1-gram to 5-gram when evaluation make with five precision and recall.
for 1-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 1
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 1310701 lr: 0.000000 avg.loss: 0.169431 ETA: 0h 0m 0s
real 0m0.431s
user 0m2.456s
sys 0m0.107s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test-label ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
F1-Score : 0.973937 Precision : 0.968358 Recall : 0.979581 __label__rk
F1-Score : 0.950246 Precision : 0.941495 Recall : 0.959161 __label__my
F1-Score : 0.967449 Precision : 0.981567 Recall : 0.953731 __label__dw
F1-Score : 0.894172 Precision : 0.919558 Recall : 0.870149 __label__bk
N 4964
P@1 0.954
R@1 0.954
real 0m0.018s
user 0m0.014s
sys 0m0.004s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
for 2-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 2
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874358 lr: 0.000000 avg.loss: 0.134418 ETA: 0h 0m 0s
real 0m5.555s
user 0m4.231s
sys 0m0.682s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test-label ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
F1-Score : 0.982726 Precision : 0.976567 Recall : 0.988962 __label__rk
F1-Score : 0.967159 Precision : 0.959283 Recall : 0.975166 __label__my
F1-Score : 0.977307 Precision : 0.990798 Recall : 0.964179 __label__dw
F1-Score : 0.930268 Precision : 0.955906 Recall : 0.905970 __label__bk
N 4964
P@1 0.969
R@1 0.969
real 0m0.343s
user 0m0.048s
sys 0m0.293s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
for 3-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 3
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874590 lr: 0.000000 avg.loss: 0.136901 ETA: 0h 0m 0s
real 0m5.592s
user 0m4.692s
sys 0m0.733s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test-label ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
F1-Score : 0.982398 Precision : 0.979167 Recall : 0.985651 __label__rk
F1-Score : 0.968588 Precision : 0.958897 Recall : 0.978477 __label__my
F1-Score : 0.979592 Precision : 0.992343 Recall : 0.967164 __label__dw
F1-Score : 0.929664 Precision : 0.952978 Recall : 0.907463 __label__bk
N 4964
P@1 0.970
R@1 0.970
real 0m0.351s
user 0m0.048s
sys 0m0.302s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
for 4-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 4
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 874480 lr: 0.000000 avg.loss: 0.161303 ETA: 0h 0m 0s
real 0m5.717s
user 0m5.095s
sys 0m0.732s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test-label ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
F1-Score : 0.982678 Precision : 0.979178 Recall : 0.986203 __label__rk
F1-Score : 0.967213 Precision : 0.957792 Recall : 0.976821 __label__my
F1-Score : 0.975094 Precision : 0.986260 Recall : 0.964179 __label__dw
F1-Score : 0.928025 Precision : 0.952830 Recall : 0.904478 __label__bk
N 4964
P@1 0.969
R@1 0.969
real 0m0.362s
user 0m0.064s
sys 0m0.298s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
for 5-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 5
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 656396 lr: 0.000000 avg.loss: 0.181586 ETA: 0h 0m 0s
real 0m5.010s
user 0m5.848s
sys 0m0.734s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test-label ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
F1-Score : 0.981848 Precision : 0.978618 Recall : 0.985099 __label__rk
F1-Score : 0.966120 Precision : 0.956710 Recall : 0.975717 __label__my
F1-Score : 0.974242 Precision : 0.989231 Recall : 0.959701 __label__dw
F1-Score : 0.923780 Precision : 0.943925 Recall : 0.904478 __label__bk
N 4964
P@1 0.967
R@1 0.967
real 0m0.375s
user 0m0.068s
sys 0m0.307s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
for 6-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 6
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 655983 lr: 0.000000 avg.loss: 0.201369 ETA: 0h 0m 0s
real 0m4.961s
user 0m6.058s
sys 0m0.742s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test-label ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
F1-Score : 0.982128 Precision : 0.978630 Recall : 0.985651 __label__rk
F1-Score : 0.964178 Precision : 0.955556 Recall : 0.972958 __label__my
F1-Score : 0.972686 Precision : 0.989198 Recall : 0.956716 __label__dw
F1-Score : 0.916413 Precision : 0.933437 Recall : 0.900000 __label__bk
N 4964
P@1 0.966
R@1 0.966
real 0m0.377s
user 0m0.060s
sys 0m0.317s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
for 7-gram ...
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext supervised -input ../data/shuffle/clean/train.shuf.all.clean -output model-tmp -wordNgrams 7
Read 0M words
Number of words: 2430
Number of labels: 4
Progress: 100.0% words/sec/thread: 656322 lr: 0.000000 avg.loss: 0.208479 ETA: 0h 0m 0s
real 0m4.961s
user 0m6.401s
sys 0m0.744s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$ time ../../fasttext test-label ./model-tmp.bin ../data/shuffle/clean/test.shuf.all.clean
F1-Score : 0.982649 Precision : 0.980759 Recall : 0.984547 __label__rk
F1-Score : 0.964841 Precision : 0.953150 Recall : 0.976821 __label__my
F1-Score : 0.971125 Precision : 0.989164 Recall : 0.953731 __label__dw
F1-Score : 0.916159 Precision : 0.936137 Recall : 0.897015 __label__bk
N 4964
P@1 0.966
R@1 0.966
real 0m0.381s
user 0m0.060s
sys 0m0.320s
(tabpfn) yekyaw.thu@gpu:~/tool/fastText/dialect-detection/model$
Table.1 Precision and Recall of 1-gram to 7-gram models
n-gram | Precision | Recall | Training Time | Testing Time |
---|---|---|---|---|
1-gram | 0.955 | 0.955 | 0m0.375s | 0m0.018s |
2-gram | 0.97 | 0.97 | 0m4.999s | 0m0.343s |
3-gram | 0.97 | 0.97 | 0m4.870s | 0m0.356s |
4-gram | 0.968 | 0.968 | 0m4.808s | 0m0.367s |
5-gram | 0.967 | 0.967 | 0m5.020s | 0m0.372s |
6-gram | 0.966 | 0.966 | 0m4.927s | 0m0.376s |
7-gram | 0.966 | 0.966 | 0m5.050s | 0m0.380s |
Table.2 Precision and Recall of 1-gram to 3-gram models with various epochs
n-gram | 10 epochs | 20 epochs | 30 epochs | 40 epochs | 50 epochs |
---|---|---|---|---|---|
1-gram | 0.957 | 0.956 | 0.955 | 0.956 | 0.956 |
2-gram | 0.970 | 0.969 | 0.967 | 0.967 | 0.967 |
3-gram | 0.968 | 0.968 | 0.969 | 0.968 | 0.968 |
Table.3 F-1 score, Precision and Recall of each class for 1-gram model
Class-name | F-1 Score | Precision | Recall |
---|---|---|---|
__label__rk | 0.974 | 0.968 | 0.980 |
__label__my | 0.950 | 0.941 | 0.959 |
__label__dw | 0.967 | 0.982 | 0.954 |
__label__bk | 0.894 | 0.920 | 0.870 |
Table.4 F-1 score, Precision and Recall of each class for 2-gram model
Class-name | F-1 Score | Precision | Recall |
---|---|---|---|
__label__rk | 0.983 | 0.977 | 0.989 |
__label__my | 0.967 | 0.959 | 0.975 |
__label__dw | 0.977 | 0.991 | 0.964 |
__label__bk | 0.930 | 0.956 | 0.906 |
Table.5 F-1 score, Precision and Recall of each class for 3-gram model
Class-name | F-1 Score | Precision | Recall |
---|---|---|---|
__label__rk | 0.982 | 0.979 | 0.986 |
__label__my | 0.969 | 0.959 | 0.978 |
__label__dw | 0.980 | 0.992 | 0.967 |
__label__bk | 0.930 | 0.953 | 0.907 |