Python Vietnamese Toolkit

Pyvi performs word tokenization and part-of-speech (POS) tagging for Vietnamese text in Python.

Algorithm: Conditional Random Field

Vietnamese tokenizer: F1 score = 0.978637686

Vietnamese POS tagging: F1 score = 0.92520656

POS TAGS:

  • A - Adjective
  • C - Coordinating conjunction
  • E - Preposition
  • I - Interjection
  • L - Determiner
  • M - Numeral
  • N - Common noun
  • Nc - Noun classifier
  • Ny - Noun abbreviation
  • Np - Proper noun
  • Nu - Unit noun
  • P - Pronoun
  • R - Adverb
  • S - Subordinating conjunction
  • T - Auxiliary, modal words
  • V - Verb
  • X - Unknown
  • F - Filtered out (punctuation)
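
When post-processing tagger output, it can help to keep the tag set above as a small lookup table. The following is a minimal sketch: `POS_TAG_DESCRIPTIONS` and `describe_tag` are hypothetical helper names and are not part of pyvi.

```python
# Hypothetical helper (not part of pyvi): the POS tags listed above,
# mapped to their descriptions.
POS_TAG_DESCRIPTIONS = {
    "A": "Adjective",
    "C": "Coordinating conjunction",
    "E": "Preposition",
    "I": "Interjection",
    "L": "Determiner",
    "M": "Numeral",
    "N": "Common noun",
    "Nc": "Noun classifier",
    "Ny": "Noun abbreviation",
    "Np": "Proper noun",
    "Nu": "Unit noun",
    "P": "Pronoun",
    "R": "Adverb",
    "S": "Subordinating conjunction",
    "T": "Auxiliary, modal words",
    "V": "Verb",
    "X": "Unknown",
    "F": "Filtered out (punctuation)",
}


def describe_tag(tag):
    """Return the description for a tag; unlisted tags map to 'Unknown'."""
    return POS_TAG_DESCRIPTIONS.get(tag, POS_TAG_DESCRIPTIONS["X"])


print(describe_tag("Np"))
```

Falling back to the description of `X` for unlisted tags is a design choice made for this sketch, so that unexpected tags do not raise an error.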

Installation

Install from the command line with pip:

$ pip install pyvi
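
To confirm that the package is importable after installation, a check along these lines may be useful. It relies only on the standard library; `is_installed` is a hypothetical helper name introduced for this sketch.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether pyvi is visible to the current Python interpreter.
print("pyvi installed:", is_installed("pyvi"))
```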

Uninstall

$ pip uninstall pyvi

Usage

from pyvi import ViTokenizer, ViPosTagger

ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")

ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))
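
The results of the calls above can be post-processed with a few lines of Python. The sketch below rests on two assumptions that should be checked against the installed version: that `ViTokenizer.tokenize` returns a single string in which the syllables of one word are joined by underscores, and that `ViPosTagger.postagging` returns a pair of parallel lists (tokens, tags). The helpers `split_words` and `pair_tokens_with_tags` are hypothetical and not part of pyvi, and the sample values are hand-written for illustration, not actual library output.

```python
def split_words(tokenized):
    """Split a tokenized string into words, restoring spaces inside each word.

    Assumes words are separated by spaces and the syllables of a word
    are joined by underscores.
    """
    return [token.replace("_", " ") for token in tokenized.split()]


def pair_tokens_with_tags(tagged):
    """Turn an assumed (tokens, tags) pair into a list of (token, tag) tuples."""
    tokens, tags = tagged
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return list(zip(tokens, tags))


# Hand-written sample inputs (illustrative only, not real pyvi output).
sample_tokenized = "Trường đại_học Bách_Khoa Hà_Nội"
sample_tagged = (
    ["Trường", "đại_học", "Bách_Khoa", "Hà_Nội"],
    ["N", "N", "Np", "Np"],
)

print(split_words(sample_tokenized))
for token, tag in pair_tokens_with_tags(sample_tagged):
    print(token, tag)
```

With pyvi installed, the sample values would be replaced by the return values of `ViTokenizer.tokenize(...)` and `ViPosTagger.postagging(...)`.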