Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

A Chinese Word Segmentation(中文分词) routine in pure Ruby

branch: master

Fetching latest commit…

Octocat-spinner-32-eaf2f5

Cannot retrieve the latest commit at this time

Octocat-spinner-32 bin
Octocat-spinner-32 dict
Octocat-spinner-32 lib
Octocat-spinner-32 public
Octocat-spinner-32 views
Octocat-spinner-32 .gitignore
Octocat-spinner-32 LICENSE
Octocat-spinner-32 README
Octocat-spinner-32 Rakefile
Octocat-spinner-32 VERSION
Octocat-spinner-32 rseg.gemspec
README
Introduction
========
Rseg is a Chinese Word Segmentation(中文分词) routine in pure Ruby.

The algorithm is based on this article: http://xiecc.blog.163.com/blog/static/14032200671110224190/

Usage
========

Rseg now support two modes: inline and C/S mode.

1. Inline mode

> require 'rubygems'
> require 'rseg'
> Rseg.segment("需要分词的文章")
['需要', '分词', '的', '文章']

The first call to Rseg#segment will need about 30 seconds to load the dictionary, the second call will be very fast, you can also call Rseg#load to load dictionaries manually.

2. C/S mode

$ rseg_server
== Sinatra/0.9.4 has taken the stage on 4100

This will start rseg server on http://localhost:4100

You can visit it via your browser or the rseg command.

$ rseg '需要分词的文章'
需要 分词 的 文章

You can also access server with the Rseg#remote_segment

$ irb
> require 'rubygems'
> require 'rseg'
> Rseg.remote_segment("需要分词的文章") # This will be very fast
['需要', '分词', '的', '文章']

Performance
========
About 5M character/s on my Macbook (Intel Core 2 Duo 2GHz/4G mem). 

License
========

Rseg includes two built-in dictionaries:

* CC-CEDICT (http://cc-cedict.org/wiki/) with Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
* Wikipedia Chinese article title list (http://download.wikimedia.org/zhwiki/) with Creative Commons Attribution-Share Alike 3.0 License(http://creativecommons.org/licenses/by-sa/3.0/)

The codes and others in Rseg are licensed under MIT license.

Feedback
========
All feedback are welcome, Yuanyi Zhang(zhangyuanyi#gmail.com)
Something went wrong with that request. Please try again.