Introduction

This software performs part-of-speech tagging, word segmentation, and phoneme analysis for Vietnamese (Homepage).

The toolkit solves these tasks by applying conditional random fields (CRFs), as implemented in CRFSuite.

It has been tested on Ubuntu 14.04 LTS.

Dependencies

In order to compile the program, you need to install the following software:

  • boost: sudo apt-get install libboost-all-dev
  • cmake: sudo apt-get install cmake

The script install_depend.sh will automatically install CRFSuite and liblbfgs-1.10.

Installation

  1. Install boost C++
  2. ./install_depend.sh
  3. cd build && cmake ../src && make

Usage

Model downloads

The models and the training scripts can be found in vita-model.

Program: vita

PoS tagging, word segmentation, and dictionary generation:

echo "Hai nghi phạm Nguyễn Hải Dương và Vũ Văn Tiến" | vita -m model_dir

Output:

Hai   M,B-NP,0,h a iz
nghi_phạm   M,I-NP,0_5,ng i_ph a mz
Nguyễn_Hải_Dương    Nu,I-NP,2_3_0,ng w ie nz_h a iz_d wa ngz
và    Cc,B-VP,1,v a
Vũ_Văn_Tiến   V,I-VP,2_0_4,v u_v aw nz_t ie nz

Output format: word PoS,chunking info,tone(s),phoneme(s)
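If you want to consume this output from a script, the format above can be split into its fields. The following is only a minimal sketch in Python, assuming that the word is separated from the rest of the line by whitespace, that the remaining four fields are comma-separated, and that syllables are joined by "_" in both the tone and phoneme fields (as in the sample output). The names Token and parse_vita_line are hypothetical and not part of vita.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One line of vita output (hypothetical helper type, not part of vita)."""
    word: str                  # segmented word, syllables joined by "_"
    pos: str                   # part-of-speech tag
    chunk: str                 # chunking label, e.g. "B-NP"
    tones: List[str]           # one tone value per syllable
    phonemes: List[List[str]]  # one list of phonemes per syllable


def parse_vita_line(line: str) -> Token:
    # Assumption: the word never contains whitespace, so the first
    # whitespace run separates it from the comma-separated fields.
    word, rest = line.strip().split(None, 1)
    # Phonemes contain spaces but (in the sample) no commas, so split
    # at most three times and keep the remainder as the phoneme field.
    pos, chunk, tones, phonemes = rest.split(",", 3)
    return Token(
        word=word,
        pos=pos,
        chunk=chunk,
        tones=tones.split("_"),
        phonemes=[syllable.split() for syllable in phonemes.split("_")],
    )


token = parse_vita_line("nghi_phạm   M,I-NP,0_5,ng i_ph a mz")
print(token.word, token.pos, token.chunk, token.tones, token.phonemes)
```

Lines whose word is itself punctuation may not follow this layout, so check real output before relying on the sketch.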

Run vita -h for more options.

Program: vtalk_ana

Phoneme analysis (mainly used for text-to-speech):

echo "Hai nghi phạm Nguyễn Hải Dương và Vũ Văn Tiến" | vita_ana -m model_dir

Output:

xx^xx-sil+h=a@0-0/A:xx_0/B:xx-1@0-0&0-0/C:0+3/D:xx-0/E:xx-1/F:M-3/G:0-0/H:1=1@0=3/I:18-3/J:30+5-2
xx^sil-h+a=iz@0-2/A:xx_1/B:0-3@0-0&0-2/C:0+2/D:xx-1/E:M-3/F:M-5/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
sil^h-a+iz=ng@1-1/A:xx_1/B:0-3@0-0&0-2/C:0+2/D:xx-1/E:M-3/F:M-5/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
h^a-iz+ng=i@2-0/A:xx_1/B:0-3@0-0&0-2/C:0+2/D:xx-1/E:M-3/F:M-5/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
a^iz-ng+i=ph@0-1/A:0_3/B:0-2@0-1&1-2/C:5+3/D:M-3/E:M-5/F:Nu-10/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
iz^ng-i+ph=a@1-0/A:0_3/B:0-2@0-1&1-2/C:5+3/D:M-3/E:M-5/F:Nu-10/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
ng^i-ph+a=mz@0-2/A:0_2/B:5-3@1-0&2-1/C:2+4/D:M-3/E:M-5/F:Nu-10/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
i^ph-a+mz=ng@1-1/A:0_2/B:5-3@1-0&2-1/C:2+4/D:M-3/E:M-5/F:Nu-10/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
ph^a-mz+ng=w@2-0/A:0_2/B:5-3@1-0&2-1/C:2+4/D:M-3/E:M-5/F:Nu-10/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
a^mz-ng+w=ie@0-3/A:5_3/B:2-4@0-2&2-2/C:3+3/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
mz^ng-w+ie=nz@1-2/A:5_3/B:2-4@0-2&2-2/C:3+3/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
ng^w-ie+nz=h@2-1/A:5_3/B:2-4@0-2&2-2/C:3+3/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
w^ie-nz+h=a@3-0/A:5_3/B:2-4@0-2&2-2/C:3+3/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
ie^nz-h+a=iz@0-2/A:2_4/B:3-3@1-1&3-1/C:0+3/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
nz^h-a+iz=d@1-1/A:2_4/B:3-3@1-1&3-1/C:0+3/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
h^a-iz+d=wa@2-0/A:2_4/B:3-3@1-1&3-1/C:0+3/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
a^iz-d+wa=ngz@0-2/A:3_3/B:0-3@2-0&4-0/C:1+2/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
iz^d-wa+ngz=v@1-1/A:3_3/B:0-3@2-0&4-0/C:1+2/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
d^wa-ngz+v=a@2-0/A:3_3/B:0-3@2-0&4-0/C:1+2/D:M-5/E:Nu-10/F:Cc-2/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2
wa^ngz-v+a=v@0-1/A:0_3/B:1-2@0-0&0-1/C:2+2/D:Nu-10/E:Cc-2/F:V-8/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
ngz^v-a+v=u@1-0/A:0_3/B:1-2@0-0&0-1/C:2+2/D:Nu-10/E:Cc-2/F:V-8/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
v^a-v+u=v@0-1/A:1_2/B:2-2@0-2&1-2/C:0+3/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
a^v-u+v=aw@1-0/A:1_2/B:2-2@0-2&1-2/C:0+3/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
v^u-v+aw=nz@0-2/A:2_2/B:0-3@1-1&2-1/C:4+3/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
u^v-aw+nz=t@1-1/A:2_2/B:0-3@1-1&2-1/C:4+3/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
v^aw-nz+t=ie@2-0/A:2_2/B:0-3@1-1&2-1/C:4+3/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
aw^nz-t+ie=nz@0-2/A:0_3/B:4-3@2-0&3-0/C:xx+1/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
nz^t-ie+nz=sil@1-1/A:0_3/B:4-3@2-0&3-0/C:xx+1/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
t^ie-nz+sil=xx@2-0/A:0_3/B:4-3@2-0&3-0/C:xx+1/D:Cc-2/E:V-8/F:xx-1/G:18-3/H:10=2@2=1/I:1-1/J:30+5-2
ie^nz-sil+xx=xx@0-0/A:4_3/B:xx-1@0-0&0-0/C:xx+0/D:V-8/E:xx-1/F:xx-0/G:10-2/H:1=1@3=0/I:0-0/J:30+5-2

Output format: Please read doc/lab_format.pdf
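For quick inspection, each label line can be split into its phoneme context and its /A: to /J: fields. This is a rough sketch in Python, assuming the layout visible in the sample output: five phonemes written as p1^p2-p3+p4=p5 before the first "@", followed by fields introduced by "/A:" through "/J:". The function name parse_label and the key names are hypothetical. The sketch does not interpret the field values; their meaning is described in doc/lab_format.pdf.

```python
import re

# Assumed layout: p1^p2-p3+p4=p5@... where p3 is the current phoneme.
QUINPHONE = re.compile(
    r"^(?P<ll>[^\^]+)\^(?P<l>[^-]+)-(?P<c>[^+]+)\+(?P<r>[^=]+)=(?P<rr>[^@]+)@"
)
# Each remaining field looks like "/X:value", with X in A..J.
FIELD = re.compile(r"/([A-J]):([^/]*)")


def parse_label(line: str) -> dict:
    """Split one label line into phoneme context and raw fields (hypothetical helper)."""
    line = line.strip()
    match = QUINPHONE.match(line)
    if match is None:
        raise ValueError("unexpected label layout: " + line)
    return {
        "phones": match.groupdict(),          # keys: ll, l, c, r, rr
        "fields": dict(FIELD.findall(line)),  # raw strings keyed by letter
    }


label = parse_label(
    "sil^h-a+iz=ng@1-1/A:xx_1/B:0-3@0-0&0-2/C:0+2/D:xx-1/E:M-3/F:M-5"
    "/G:1-1/H:18=3@1=2/I:10-2/J:30+5-2"
)
print(label["phones"]["c"], label["fields"]["F"])
```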

Citation

Please use the following BibTeX entry to cite vita:

  @misc{truong_vita,
    author = {Quoc Truong Do},
    title = {Vita: A Toolkit for Vietnamese segmentation, chunking, part of speech tagging and morphological analyzer},
    url = {http://truongdo.com/vita/},
    year = {2015}
  }

TODO

Credits

CRFSuite
