
wordcut.rb

ASEAN word tokenizer written in Ruby.

Example

Thai

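 # coding: utf-8
 require 'wordcut/dict'
 require 'wordcut/tokenizer'
 require 'pp'

 # Load the Thai word list bundled with the gem.
 tha_dict = Wordcut::BasicDict.from_bundle("tha", "tdict-std.txt")
 # Build a tokenizer backed by that dictionary.
 tokenizer = Wordcut::BasicTokenizer.new(tha_dict)
 # Pretty-print the result of tokenizing a Thai string.
 PP.pp tokenizer.tokenize('กากากา')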
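
For readers new to dictionary-based segmentation, the following is a minimal, self-contained sketch of the general idea. It does not use wordcut and is not a description of how wordcut works internally; `longest_match_tokenize` and `toy_dict` are hypothetical names made up for illustration, and greedy longest matching is just one simple strategy, assumed here for brevity.

```ruby
# coding: utf-8
require 'set'

# Hypothetical helper (not part of wordcut): greedy longest-match segmentation.
# At each position it takes the longest dictionary word that starts there,
# and falls back to a single character when no word matches.
def longest_match_tokenize(text, dict)
  max_len = dict.map(&:length).max || 1
  tokens = []
  i = 0
  while i < text.length
    # Start with the longest possible candidate and shrink until it is a known word.
    len = [max_len, text.length - i].min
    len -= 1 while len > 1 && !dict.include?(text[i, len])
    tokens << text[i, len]
    i += len
  end
  tokens
end

# A toy two-word dictionary, only for this illustration.
toy_dict = Set['กา', 'มา']
tokens = longest_match_tokenize('กามากา', toy_dict)
unknown = longest_match_tokenize('กาxมา', toy_dict)
```

With a real word list in place of `toy_dict`, the same loop would split unspaced text into dictionary words, though greedy matching can pick the wrong boundary when words overlap.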