Adds the SmartChineseAnalyzer as a plugin.

Adds the SmartChineseAnalyzer (http://code.google.com/p/imdict-chinese-analyzer/) as an easy-to-install plugin.

1) From a clean install, install the plugin as follows:

./plugin -url https://github.com/downloads/thmttch/elasticsearch/elasticsearch-analysis-smartchinese-0.18.0-SNAPSHOT.zip -install analysis-smartchinese

2) Create a new index, and set the default analyzer:

curl -XPUT localhost:9200/test1 -d '
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "SmartChinese"
      }
    }
  }
}'
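
The request body above is easy to break when copying it between editors, because curly quotes are not valid JSON. As a minimal sketch, the body can be generated by a small shell function instead. The function name @make_index_body@ is hypothetical and not part of the plugin; it only prints the same settings document for a given analyzer type.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): prints the index settings
# body used in step 2, with the analyzer type taken from the first argument.
make_index_body() {
  cat <<EOF
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "$1"
      }
    }
  }
}
EOF
}

# Print the body so it can be inspected before sending it anywhere.
make_index_body SmartChinese

# To create the index with it (needs a running node, so not executed here):
# curl -XPUT localhost:9200/test1 -d "$(make_index_body SmartChinese)"
```

The function does no escaping, so it is only suitable for simple analyzer names such as the one used here.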

3) Analyze some text. Notice that the analyzer produces both single-character tokens (我, 说, 好) and a two-character token (世界):

curl -XGET localhost:9200/test1/_analyze -d '{ "text": "我说世界好!" }'
{
  "tokens": [
    {
      "end_offset": 7,
      "position": 3,
      "start_offset": 3,
      "token": "text",
      "type": "word"
    },
    {
      "end_offset": 12,
      "position": 7,
      "start_offset": 11,
      "token": "我",
      "type": "word"
    },
    {
      "end_offset": 13,
      "position": 8,
      "start_offset": 12,
      "token": "说",
      "type": "word"
    },
    {
      "end_offset": 15,
      "position": 9,
      "start_offset": 13,
      "token": "世界",
      "type": "word"
    },
    {
      "end_offset": 16,
      "position": 10,
      "start_offset": 15,
      "token": "好",
      "type": "word"
    }
  ]
}
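
To check the segmentation at a glance, it can help to list only the token strings from the response. The following is a rough sketch that uses @grep@ and @sed@ on a trimmed copy of the output above. The function name @list_tokens@ is hypothetical, and the pattern matching assumes token values contain no escaped quotes; a real JSON parser would be more robust.

```shell
#!/bin/sh
# Sample response, trimmed to three of the tokens shown above.
response='{"tokens": [
  {"end_offset": 12, "position": 7, "start_offset": 11, "token": "我", "type": "word"},
  {"end_offset": 15, "position": 9, "start_offset": 13, "token": "世界", "type": "word"},
  {"end_offset": 16, "position": 10, "start_offset": 15, "token": "好", "type": "word"}
]}'

# Hypothetical helper: prints one token string per line.
# It matches each "token": "..." pair, then strips the key and the quotes.
list_tokens() {
  printf '%s\n' "$1" \
    | grep -o '"token": *"[^"]*"' \
    | sed 's/^"token": *"//; s/"$//'
}

list_tokens "$response"
```

Against a running node, the same function could be fed the live response, for example @list_tokens "$(curl -s -XGET localhost:9200/test1/_analyze -d '...')"@.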