Elasticsearch issue when searching Thai language #1887
Comments
Does the Thai language have spaces between the words?
Looks like you need a different tokenizer for the Thai language: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-thai-tokenizer.html
It seems the Thai language does not have spaces between words.
So should I drop the current index? private function getParams()
I think this tokenizer would be the best: https://www.elastic.co/guide/en/elasticsearch/plugins/7.10/analysis-icu-tokenizer.html
So, it would be cool if you could test it: use the ICU Tokenizer, drop the index, re-create it, and re-index the content.
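For reference, the index settings for this would look roughly like the sketch below (based on the ICU tokenizer page linked above; the analyzer name `icu_analyzer` is just a placeholder, and the settings have to be applied when the index is created):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "icu_analyzer": {
          "type": "custom",
          "tokenizer": "icu_tokenizer"
        }
      }
    }
  }
}
```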
Okay, so I just need to change that "standard" to "icu_tokenizer", right? Let me try it.
Yes, seems so :-)
Hi Thorsten, I got this error when I click "create index".
Ah, sorry, looks like you have to install the ICU support via plugins:
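For anyone hitting the same error: the ICU analysis plugin ships separately from Elasticsearch and is installed with the bundled plugin tool (the path below assumes a tarball install from the Elasticsearch home directory; package installs put the tool elsewhere):

```shell
# install the ICU analysis plugin on every node of the cluster
bin/elasticsearch-plugin install analysis-icu

# restart the node afterwards so the plugin is loaded
```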
Done installing the ICU support; creating the index succeeded -> full import.
Could you please try this example? https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-thai-tokenizer.html
I changed the tokenizer to "thai" but the condition is still the same: I can search with the first 5 characters but can't find the FAQ with the last 5 characters.
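A quick way to debug this is the `_analyze` API, which shows exactly which tokens a tokenizer produces for the problem sentence without touching the index (console-style request as used in the Elastic docs; the text is the sentence from this issue):

```json
POST /_analyze
{
  "tokenizer": "thai",
  "text": "ปรับการปฏิบัติงานโดยรวม"
}
```

If the last word of the sentence does not appear as its own token in the response, the search failure comes from the tokenization, not from phpMyFAQ.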
Did you try the example mentioned in the Elasticsearch documentation?
Hi Thorsten, in the Elasticsearch documentation for the Thai tokenizer there is no example configuration, but when I check other tokenizer configurations, the point is to change the tokenizer value to the one we want to use, right? Correct me if I'm wrong, because I'm not really good at Elasticsearch.
I'll try to reproduce it on my test installation.
I tried it with the ICU Tokenizer on my local v3.1 installation using Elasticsearch 7.10. The search worked for
Removing one more character results in an empty search result.
Can you try with only 5-8 characters, counting from the last character? In the latest development version with the latest Elasticsearch it works fine with the full string, and also with 1 or 2 characters removed from the full string. Will it also work like that in the 3.0.8 release later?
I moved it to the next version, as I need to give users the possibility to configure which tokenizer will be used. So we need a new configuration option to handle this.
The v3.1 release will be a drop-in replacement for 3.0, so no need to change the templates.
Thank you, can't wait for the new version :) I also found another issue with MS SQL, but I already fixed it in the code and in the data types in the table. I don't know if it's because of the collation or not; the SQL Server instance I use doesn't have any _utf8 suffix, I just use the default collation. As an example of the issue: we can't insert Chinese, Japanese, or Thai characters in MSSQL, because the data type is "varchar". I changed it to "nvarchar", and in the query I added N before the string. Should I open a new issue for this?
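To illustrate the MSSQL fix described above (the table and column names here are made up for the example; the actual phpMyFAQ schema differs):

```sql
-- switch the column from VARCHAR to NVARCHAR so it can hold Unicode text
ALTER TABLE faq_example ALTER COLUMN answer NVARCHAR(MAX);

-- the N prefix marks the literal as Unicode; without it, SQL Server
-- converts the string through the column collation and Thai/Chinese/
-- Japanese characters are lost
INSERT INTO faq_example (answer) VALUES (N'ปรับการปฏิบัติงานโดยรวม');
```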
@herdianabdillah a new issue would be awesome. I would change the CREATE TABLE statement for MS SQL. |
Describe the bug
Unable to search with some Thai words. This is an example sentence I have in my FAQ body: ปรับการปฏิบัติงานโดยรวม
The FAQ can be found with the first 5 characters, but when I try with the last 5 characters the FAQ is not found.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The FAQ should be found with either the first 5 characters or the last 5 characters, as long as it contains the word.
Screenshots
this is the FAQ
this is when I search with the first 5 characters
this is when I search with the last 5 characters
phpMyFAQ (please complete the following information):