Chinese mess code and mark offset #2

Closed
wolf1860 opened this issue Oct 24, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@wolf1860

wolf1860 commented Oct 24, 2021

Thank you very much!
After running go get -u on the packages, I found that the update only fixed the <mark> tag in the first paragraph. If the text has several paragraphs, the <mark> is sometimes offset, and garbled characters still appear from the second paragraph onward. For example:

var text1 string = `马克思主义认为,管理具有两重性,即既有同生产力相联系的自然属性,又有同生产力相互制约的社会属性。后勤管理是与科学技术的进步、生产力的发展水平紧密联系在一起的。生产力和科学技术水平直接决定着后勤工作中财和物的管理水平以及人员素质,这是后勤管理自然属性的表现。另一方面,后勤管理又是占有生产资料的阶级用来调整阶级关系,维护本阶级利益的一种手段,具有与生产关系相联系的性质,在阶级社会中具有鲜明的阶级性。社会主义制度下的后勤管理不再体现为剥削与被剥削的关系,而体现人与人之间平等互助的客观要求,这是后勤管理的社会属性。`

Bugs in the result:
1. The first <mark> is at the wrong position; it is offset.
2. Garbled characters appear from the second paragraph onward.

Result of: '管理': 1 matches

  1. 1, (0.063785)
    content: …管理具有两重性,<mark>即既</mark>有同生产力相联系的自然属性,又有同生产力相互制约的社会属性。
    后勤管理是与科学技术的进�<mark>�、�</mark>�产力的发展水平紧密联系在一起的。生产力和科学技术水平直接决定着后勤工作中财和物的管理水平
    以及人员素质,这是后勤管理自然属性的表现。另一方面,�<mark>��勤�</mark>��理又是占有生产资料的阶级用来调整阶级关�<mark>��,�</mark>��护本阶级利益的一
    种手段,具有与生产关系相联系<mark>的性</mark>质,在阶级社会中具有鲜明的阶级性。社会主义制度…
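
The output above has the look of a byte/character offset mismatch: every Chinese character here takes 3 bytes in UTF-8, so a <mark> inserted at an offset that is not on a character boundary splits a character and leaves invalid bytes on both sides. The thread does not confirm the actual cause, so the following Go sketch only illustrates that class of bug. `wrapMark` and `runeToByteOffset` are hypothetical helpers written for this example, not part of gse or bleve.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// wrapMark wraps text[start:end] in <mark> tags.
// start and end are interpreted as BYTE offsets into text.
func wrapMark(text string, start, end int) string {
	return text[:start] + "<mark>" + text[start:end] + "</mark>" + text[end:]
}

// runeToByteOffset converts a character (rune) index into a byte offset.
// It returns len(text) when runeIdx is past the last character.
func runeToByteOffset(text string, runeIdx int) int {
	n := 0
	for byteIdx := range text { // range yields the byte index of each rune
		if n == runeIdx {
			return byteIdx
		}
		n++
	}
	return len(text)
}

func main() {
	text := "后勤管理是"

	// "管理" spans characters 2..4, which is bytes 6..12.
	// Passing the character indexes as byte offsets cuts "后" in half.
	bad := wrapMark(text, 2, 4)
	good := wrapMark(text, runeToByteOffset(text, 2), runeToByteOffset(text, 4))

	fmt.Println(utf8.ValidString(bad)) // false: the result contains split characters
	fmt.Println(good)                  // 后勤<mark>管理</mark>是
}
```

Checking the highlighted string with `utf8.ValidString` is a cheap way to detect this kind of damage in a test.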
@wolf1860 wolf1860 changed the title Chinese mess code yet Chinese mess code and mark offset Oct 24, 2021
@vcaesar
Owner

vcaesar commented Oct 24, 2021

I will optimize it tomorrow.

@vcaesar
Owner

vcaesar commented Oct 24, 2021

Also, you should either leave Opt and Trim unset or use Opt: "dag", because the other cut modes delete some characters.
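
To see why deleted characters matter for highlighting, here is a minimal Go sketch. It assumes (the thread does not say so) that token positions are derived by summing token lengths; under that assumption, every character the tokenizer drops shifts all later positions. The `span` type and both functions are hypothetical and are not gse or bleve APIs.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical byte range [Start, End) in the original text.
type span struct{ Start, End int }

// naiveSpans assumes the tokens are contiguous and sums their byte lengths.
// This drifts as soon as the tokenizer has dropped a character.
func naiveSpans(tokens []string) []span {
	spans := make([]span, 0, len(tokens))
	pos := 0
	for _, t := range tokens {
		spans = append(spans, span{pos, pos + len(t)})
		pos += len(t)
	}
	return spans
}

// alignedSpans finds each token in the original text, searching forward
// from the end of the previous token, so dropped characters are skipped.
func alignedSpans(text string, tokens []string) []span {
	spans := make([]span, 0, len(tokens))
	pos := 0
	for _, t := range tokens {
		i := strings.Index(text[pos:], t)
		if i < 0 {
			continue // token does not occur verbatim; ignored in this sketch
		}
		start := pos + i
		spans = append(spans, span{start, start + len(t)})
		pos = start + len(t)
	}
	return spans
}

func main() {
	text := "管理,后勤管理"
	tokens := []string{"管理", "后勤", "管理"} // the comma has been dropped

	fmt.Println(naiveSpans(tokens))         // [{0 6} {6 12} {12 18}]
	fmt.Println(alignedSpans(text, tokens)) // [{0 6} {7 13} {13 19}]
}
```

In the naive result the second span starts on the comma and ends in the middle of "勤", which matches the kind of shifted, broken <mark> shown in the report.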

@vcaesar vcaesar added the enhancement New feature or request label Oct 24, 2021
@wolf1860
Author

wolf1860 commented Oct 25, 2021

That's funny :) When I search for "审计" in "审计工作", I get nothing. This issue bothered me for a long time until I found gse-bleve with Opt: "search-hmm". If I follow your advice, I am back where I started; the same thing happens with both gse-bleve and github.com/leopku/bleve-gse-tokenizer/v2. What should I do, and how?

@vcaesar
Owner

vcaesar commented Oct 26, 2021

Fixed. You can use search and search-hmm mode.

@vcaesar vcaesar closed this as completed Oct 26, 2021
@wolf1860
Author

That's great:) thanks a lot!
