
Chinese Search #3

Closed
hustcc opened this issue Apr 27, 2016 · 4 comments

Comments

@hustcc

hustcc commented Apr 27, 2016

  1. Does it support Chinese search?
  2. Can I index text that is not in a database? All of the texts have IDs.

Thanks for your answer.

@nticaric
Contributor

I would say it even works for Chinese. The stemming step would simply do nothing, since the concept of stemming is not applicable to Chinese.
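
To make the "stemming does nothing" point concrete: a suffix-stripping stemmer only knows endings such as -ing or -ed, so a Chinese token never matches any rule and comes back unchanged. A toy illustration, not the library's actual stemmer:

```php
<?php
mb_internal_encoding('UTF-8');

// Toy suffix-stripping stemmer: it only knows a few English endings, so any
// token without them, including every Chinese token, passes through
// untouched. (Illustration only, not TNTSearch's real stemmer.)
function toyStem(string $word): string
{
    foreach (['ing', 'ed', 'es', 's'] as $suffix) {
        $sLen = mb_strlen($suffix);
        if (mb_strlen($word) > $sLen + 2 && mb_substr($word, -$sLen) === $suffix) {
            return mb_substr($word, 0, mb_strlen($word) - $sLen);
        }
    }
    return $word;
}

echo toyStem('searching'), PHP_EOL; // "search": English suffix stripped
echo toyStem('搜索'), PHP_EOL;      // "搜索": no rule applies, token unchanged
```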

If you take a look at the demo page, try to search for: 指原の乱
I don't know what this means or if this is even Chinese, but it gives me some results.

Regarding your second question, I am not sure what you mean. If you have text in your database, then yes, it can be searched. Where else would the text be, if not in the database?

@hustcc
Author

hustcc commented Apr 27, 2016

After posting this issue, I read the project's code.
I think it may need a Chinese tokenizer/analyzer, and then a Chinese stemmer. If I have time, maybe I can open a pull request. o̖⸜((̵̵́ ̆͒͟˚̩̭ ̆͒)̵̵̀)⸝o̗

As for the second question, I found the answer after reading the code (see the sketch below).

(Typed on an iPad, which is not convenient.)

Thanks for your reply.
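
For the record, the answer to the second question: documents do not have to come from a database query; they can be pushed into an existing index one at a time, each carrying its own ID. A minimal sketch based on the `getIndex()`/`insert()` calls shown in the TNTSearch README; the exact API in the version current at the time of this issue may differ:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig([
    'driver'  => 'sqlite',            // assumption: minimal config, adjust to your setup
    'storage' => __DIR__ . '/index/', // directory holding the *.index files
]);
$tnt->selectIndex('texts.index');     // an index created beforehand

$index = $tnt->getIndex();

// Anything with an ID and some text can go in; no source table is required.
$index->insert(['id' => 1, 'text' => 'standalone document one']);
$index->insert(['id' => 2, 'text' => 'standalone document two']);

$results = $tnt->search('standalone'); // returns the matching document IDs
```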

@nticaric
Contributor

I think the current tokenization process should also work for Chinese. It's a simple regular expression that breaks the text into words, and each word is then stemmed. Stemming applies to the Indo-European group of languages, not to Chinese, so for Chinese the stemming step is simply a no-op and leaves each word unchanged.
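
Worth spelling out what such a regex does to Chinese, though: Han characters count as letters (`\p{L}`) and written Chinese has no spaces, so an unsegmented sentence survives as one long token, and only an identical run will match it. A self-contained illustration; the pattern below is an assumption of roughly what a word tokenizer of this kind does, not the library's exact code:

```php
<?php
// Split on anything that is not a letter or a number, the way a typical
// word tokenizer for European languages does.
function tokenize(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(tokenize('full text search'));
// ["full", "text", "search"]: three tokens, as expected

print_r(tokenize('全文搜索引擎'));
// ["全文搜索引擎"]: one token, because there are no spaces to split on
```

This is why a short query such as 搜索 never matches a document indexed as the single token 全文搜索引擎, which fits the test results reported below.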

@dryyun

dryyun commented Nov 26, 2016

Chinese is a bit more complex, and the test results are not good. I think having a dedicated Chinese tokenizer/analyzer would be better.
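
A common lightweight fix for CJK, short of a full dictionary-based segmenter such as jieba, is character-bigram tokenization: index overlapping two-character tokens so any two-character query can match inside a longer run. A sketch in plain PHP; wiring it into TNTSearch would mean implementing the library's tokenizer interface, whose exact shape is version-dependent, so treat the hook-up as an assumption:

```php
<?php
mb_internal_encoding('UTF-8');

// Character-bigram tokenizer: runs of Han characters are split into
// overlapping two-character tokens, so a query like 搜索 can match inside
// 全文搜索引擎. Non-CJK words are kept whole.
function bigramTokenize(string $text): array
{
    $tokens = [];
    $words  = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        if (!preg_match('/\p{Han}/u', $word)) {
            $tokens[] = $word;                   // plain word, keep as one token
            continue;
        }
        $len = mb_strlen($word);
        if ($len === 1) {
            $tokens[] = $word;                   // single character stays as-is
            continue;
        }
        for ($i = 0; $i < $len - 1; $i++) {
            $tokens[] = mb_substr($word, $i, 2); // overlapping bigrams
        }
    }
    return $tokens;
}

print_r(bigramTokenize('全文搜索引擎'));
// ["全文", "文搜", "搜索", "索引", "引擎"]: the query 搜索 is now an indexed token
```

Bigrams inflate the index and can produce false positives across word boundaries, but they need no dictionary and make short Chinese queries match.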

sleepless pushed a commit to sleepless/tntsearch that referenced this issue Oct 25, 2017
* commit '1e3135846c74efe9818ef5517b8499b24c1f0eb5':
  remove changes
  removed
  removed default
  - order
  - order requests_foreign date de
  - most change from kaidl