Issue with compound search #49

Closed
nagavardhan1 opened this issue Oct 31, 2018 · 5 comments

Comments

@nagavardhan1

If I search for "whatareyou", it returns "what you".

@wolfgarbe
Owner

wolfgarbe commented Oct 31, 2018

WordSegmentation vs. LookupCompound

LookupCompound can insert only a single space into a token (a string fragment separated by existing spaces). It is intended for spelling correction of word-segmented text but can fix an occasional missing space. Because of the single-space restriction per token, there are fewer variants to generate and evaluate. Therefore it is faster, and the quality of the correction is usually better.

WordSegmentation can insert as many spaces as required into a token. Therefore it is also suitable for long strings without any spaces. The drawback is lower speed and lower correction quality, as many more potential variants exist, which need to be generated, evaluated, and chosen from.
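To make the difference in the number of variants concrete, here is a minimal Python sketch. It is not SymSpell code; the helper names `single_space_variants` and `count_unrestricted_variants` are made up for illustration, and it only counts raw split candidates without any spelling correction.

```python
def single_space_variants(token):
    """The token itself plus every way to insert exactly one space into it."""
    return [token] + [token[:i] + " " + token[i:] for i in range(1, len(token))]


def count_unrestricted_variants(token):
    """Each of the len(token) - 1 gaps may or may not receive a space."""
    return 2 ** (len(token) - 1)


token = "whatareyou"
variants = single_space_variants(token)

print(len(variants))                       # 10 candidates with at most one space
print(count_unrestricted_variants(token))  # 512 candidates with any number of spaces
print("what are you" in variants)          # False: that split needs two spaces
```

With at most one space per token, "what are you" is simply not among the candidates, which is consistent with the behaviour reported above.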

@nagavardhan1
Author

Is there a way I can increase it to 2 spaces instead of one?

@aashish-amber-abz

@wolfgarbe I can't find any Python port of WordSegmentation; all the ports listed on the README page are for LookupCompound. Do you know of any Python port of WordSegmentation?

@wolfgarbe
Owner

@aashish-amber-abz No, I don't know a Python port of SymSpell which includes WordSegmentation.
But there are other word segmentation approaches in Python available (word segmentation only, without spelling correction) e.g.: https://github.com/grantjenks/python-wordsegment

@wolfgarbe
Owner

@nagavardhan1 There is no easy way to increase the number of spaces for LookupCompound (for performance reasons). The algorithm would require significant modification. The modified algorithm, which can deal with an unlimited number of spaces to be inserted, is called WordSegmentation. Please note that it can still correct spelling errors.
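For anyone who wants to see what inserting an unlimited number of spaces can look like, here is a rough Python sketch using dynamic programming over a tiny word-frequency table. This is not the SymSpell WordSegmentation implementation: the `WORD_COUNTS` table and the `segment` function are hypothetical, and unlike WordSegmentation this sketch does no spelling correction at all, it only segments.

```python
import math

# Hypothetical toy frequencies; a real dictionary would be far larger.
WORD_COUNTS = {"what": 500, "are": 800, "you": 900, "a": 1000, "hat": 50}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in WORD_COUNTS)


def segment(text):
    """Return the lowest-cost split of text into known words, or None."""
    # best[i] holds (cost, words) for the best segmentation of text[:i].
    best = [(0.0, [])] + [(math.inf, None)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            if word not in WORD_COUNTS or best[start][1] is None:
                continue
            # Cost is the negative log probability, so frequent words are cheaper.
            cost = best[start][0] - math.log(WORD_COUNTS[word] / TOTAL)
            if cost < best[end][0]:
                best[end] = (cost, best[start][1] + [word])
    return best[len(text)][1]


print(segment("whatareyou"))  # ['what', 'are', 'you']
print(segment("xyz"))         # None: no split into known words exists
```

The table `best` is filled left to right, so the work grows with the text length times the maximum word length rather than with the number of possible splits.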
