Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
Introduction ======== Rseg is a Chinese Word Segmentation(中文分词) routine in pure Ruby. The algorithm is based on this article: http://xiecc.blog.163.com/blog/static/14032200671110224190/ Usage ======== Rseg now support two modes: inline and C/S mode. 1. Inline mode > require 'rubygems' > require 'rseg' > Rseg.segment("需要分词的文章") ['需要', '分词', '的', '文章'] The first call to Rseg#segment will need about 30 seconds to load the dictionary, the second call will be very fast, you can also call Rseg#load to load dictionaries manually. 2. C/S mode $ rseg_server == Sinatra/0.9.4 has taken the stage on 4100 This will start rseg server on http://localhost:4100 You can visit it via your browser or the rseg command. $ rseg '需要分词的文章' 需要 分词 的 文章 You can also access server with the Rseg#remote_segment $ irb > require 'rubygems' > require 'rseg' > Rseg.remote_segment("需要分词的文章") # This will be very fast ['需要', '分词', '的', '文章'] Performance ======== About 5M character/s on my Macbook (Intel Core 2 Duo 2GHz/4G mem). License ======== Rseg includes two built-in dictionaries: * CC-CEDICT (http://cc-cedict.org/wiki/) with Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/) * Wikipedia Chinese article title list (http://download.wikimedia.org/zhwiki/) with Creative Commons Attribution-Share Alike 3.0 License（http://creativecommons.org/licenses/by-sa/3.0/) The codes and others in Rseg are licensed under MIT license. Feedback ======== All feedback are welcome, Yuanyi Zhang(zhangyuanyi#gmail.com)