Issue with compound search #49
WordSegmentation vs. LookupCompound: LookupCompound can insert only a single space into a token (a string fragment separated by existing spaces). It is intended for spelling correction of already word-segmented text, but it can fix an occasional missing space. Because of the single-space restriction per token, there are fewer variants to generate and evaluate; it is therefore faster, and the quality of the correction is usually better. WordSegmentation can insert as many spaces as required into a token, so it is also suitable for long strings without any spaces. The drawbacks are slower speed and lower correction quality, as many more potential variants exist, which need to be generated, evaluated, and chosen from.
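To illustrate the idea behind unlimited-split segmentation, here is a minimal sketch (not the SymSpell implementation, and without spelling correction): try every split point recursively and keep the segmentation with the highest product of word probabilities. The tiny `FREQ` frequency table is a placeholder assumption.

```python
from functools import lru_cache

# Toy unigram frequencies; a real segmenter would use a large corpus.
FREQ = {"what": 100, "are": 80, "you": 90, "a": 50, "re": 1}
TOTAL = sum(FREQ.values())

def word_prob(word):
    # Unknown words get a tiny probability so known splits are preferred.
    return FREQ.get(word, 0.0001) / TOTAL

def segment(text):
    @lru_cache(maxsize=None)
    def best(i):
        # Best (probability, words) segmentation of text[i:].
        if i == len(text):
            return (1.0, ())
        candidates = []
        for j in range(i + 1, len(text) + 1):
            p, rest = best(j)
            candidates.append((word_prob(text[i:j]) * p, (text[i:j],) + rest))
        return max(candidates)
    return list(best(0)[1])

print(segment("whatareyou"))  # ['what', 'are', 'you']
```

Even with memoization, the number of candidate words per position grows with the input length, which is why unlimited splitting is inherently more expensive than the one-split-per-token case.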
Is there a way I can increase it to 2 spaces instead of one?
@wolfgarbe I can't find any Python port of WordSegmentation; all the ports you list on the readme page are for LookupCompound. Do you know of any Python port of WordSegmentation?
@aashish-amber-abz No, I don't know of a Python port of SymSpell which includes WordSegmentation.
@nagavardhan1 There is no easy way to increase the number of spaces for LookupCompound (for performance reasons). The algorithm would require significant modification. The modified algorithm, which can deal with an unlimited number of spaces to be inserted, is called WordSegmentation. Please note that it can still correct spelling errors.
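The performance reason is easy to see in a sketch: with at most one inserted space, each token yields only a linear number of candidate variants, whereas unlimited splits would be combinatorial. The following toy code (my own illustration with an assumed `DICT` word list, not SymSpell's API) enumerates exactly those single-split variants:

```python
# Toy dictionary; a real implementation would use a frequency dictionary.
DICT = {"what", "are", "you", "hello", "world"}

def compound_variants(token):
    # The token unchanged, plus every way to insert exactly one space:
    # only len(token) variants in total.
    yield (token,)
    for i in range(1, len(token)):
        yield (token[:i], token[i:])

def fix_token(token):
    # Accept the first variant whose parts are all dictionary words.
    for variant in compound_variants(token):
        if all(part in DICT for part in variant):
            return " ".join(variant)
    return token  # no single split works; return unchanged

print(fix_token("helloworld"))  # 'hello world'
print(fix_token("whatareyou"))  # unchanged: it needs two spaces
```

This also explains the "whatareyou" behavior reported above: no single split produces two dictionary words, so a one-space-per-token approach cannot recover "what are you".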
If I search "whatareyou", it gives "what you".