Spanish support? #40
Hello @demian85. I guess you're asking whether the library has a stemmer for the Spanish language. Unfortunately, it does not have one yet. Using the Schinke stemmer on Spanish text will indeed produce only garbage, since the algorithm targets Latin. However, I am currently working on Talisman, a much broader library than this one (written in JavaScript rather than Clojure), and can probably implement some kind of Spanish stemmer there soon (I think the ones used by Lucene). Tell me if this would suit your use case. The stemmers I found for Spanish are Martin Porter's one in Snowball and the UniNe one used by Lucene.
It turns out that what I'm looking for is an inflector: I just want a way to normalize a string. More specifically, I need to singularize nouns in Spanish.
OK. The UniNe stemmer might be of some use to you, then. It performs really simple stemming and will probably strip most plural endings (though it won't turn them into grammatically correct singulars). Here is how it works:
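```
// Excerpt: these checks apply when the word s (of length len) ends in 's';
// the return value is the length of the resulting stem.
if (s[len-2] == 'e' && s[len-3] == 's' && s[len-4] == 'e')
  return len - 2;          // "-eses": drop the final "es"
if (s[len-2] == 'e' && s[len-3] == 'c') {
  s[len-3] = 'z';          // "-ces": turn the 'c' into 'z' and drop "es"
  return len - 2;
}
if (s[len-2] == 'o' || s[len-2] == 'a' || s[len-2] == 'e')
  return len - 2;          // "-os", "-as", "-es": drop both letters
```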
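To see what these rules do to actual words, here is a minimal JavaScript port of just the excerpt above. It is a sketch, not talisman's or Lucene's code: the function name `stemPlural` and the five-letter minimum are assumptions made for this example, and the complete stemmer contains more than this excerpt.

```javascript
// Minimal sketch porting only the plural rules quoted above.
// Assumptions: input is a lowercase word, and words shorter than
// MIN_LENGTH are returned unchanged (threshold chosen for this sketch).
const MIN_LENGTH = 5;

function stemPlural(word) {
  const len = word.length;
  if (len < MIN_LENGTH || !word.endsWith('s')) return word;

  // "-eses": drop the final "es" (meses -> mes)
  if (word.endsWith('eses')) return word.slice(0, len - 2);
  // "-ces": replace "ces" with "z" (luces -> luz)
  if (word.endsWith('ces')) return word.slice(0, len - 3) + 'z';
  // "-os", "-as", "-es": drop both letters (perros -> perr)
  if (/[oae]s$/.test(word)) return word.slice(0, len - 2);
  return word;
}

for (const w of ['meses', 'luces', 'perros', 'casas']) {
  console.log(w, '->', stemPlural(w));
}
```

Here `perros` becomes `perr` and `casas` becomes `cas`: fine as matching keys, but not real singular nouns, which is the limitation mentioned above.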
Otherwise, here is the code of a Python inflector for the Spanish language.
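For contrast with stemming, here is a tiny rule-based singularizer in JavaScript. It is a hypothetical sketch, neither the Python inflector mentioned here nor talisman's code: the name `singularizeNoun` and its four rules are assumptions chosen for illustration, and a usable inflector needs many more rules and exceptions.

```javascript
// Hypothetical minimal singularizer for Spanish nouns (illustration only).
// Rules are tried in order; the first matching suffix wins.
const RULES = [
  [/ces$/, 'z'],                  // luces -> luz, lápices -> lápiz
  [/iones$/, 'ión'],              // canciones -> canción
  [/([^aeiouáéíóú])es$/, '$1'],   // ciudades -> ciudad, árboles -> árbol
  [/([aeiouáéíóú])s$/, '$1'],     // perros -> perro, casas -> casa
];

// Known gaps: invariant nouns ("lunes", "crisis") get wrongly altered,
// and stress shifts (jóvenes -> joven) are not handled.
function singularizeNoun(noun) {
  for (const [pattern, replacement] of RULES) {
    if (pattern.test(noun)) return noun.replace(pattern, replacement);
  }
  return noun;
}
```

Unlike a stemmer, every rule here aims to produce a dictionary form (`perros` becomes `perro`, not a truncated stem), at the cost of needing exception handling for irregular cases.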
What exactly are you trying to achieve here? Fuzzy matching? Clustering?
I'm using MongoDB, but its full-text search is not smart enough to cover edge cases. I can't find a way to match all terms with AND without losing stemming and other features.
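One possible workaround, not something proposed in this thread: normalize words in application code, store the normalized tokens in an array field, and require every query token with MongoDB's `$all` operator. The field name `terms` and the helpers `tokenize` and `buildAndQuery` are hypothetical, and `normalize` can be any stemmer or singularizer.

```javascript
// Hypothetical helpers: documents are assumed to store their normalized
// tokens in an array field named `terms`, computed at write time with the
// same normalize function that is used here at query time.
function tokenize(text) {
  // Lowercase and split on anything that is not a letter or a digit.
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

function buildAndQuery(search, normalize) {
  // Deduplicate normalized tokens so each appears once in the query.
  const terms = [...new Set(tokenize(search).map(normalize))];
  // $all matches documents whose `terms` array contains every listed value,
  // i.e. an AND across all query words.
  return { terms: { $all: terms } };
}

// Example with a deliberately naive normalizer that drops a final "s".
const naive = (w) => w.replace(/s$/, '');
console.log(JSON.stringify(buildAndQuery('Perros  negros', naive)));
```

The returned object is an ordinary filter document that can be passed to a collection's `find`; an index on `terms` (a multikey index, since the field is an array) can help keep such queries fast.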
If you tell me the Python inflector works for you and solves your problem, I can implement it in talisman, but I'll need some time to do so.
OK, I just implemented both the UniNe stemmer and the Python inflector in talisman, @demian85. Here is how to use them:
```js
// The stemmer
const stemmer = require('talisman/stemmers/spanish/unine');
// The inflector
const inflector = require('talisman/inflectors/spanish/noun').singularize;
```
Great! I'll give it a try! Thanks!
I'll close this issue. Open one on talisman if you have any problems.
So, did it work for you?
@demian85 you never told me whether this worked for you or if there were some things to fix.
Yeah, it works. Thanks.
On Fri, Dec 2, 2016, 10:29 Guillaume Plique wrote:
> @demian85 you never told me whether this worked for you or if there were some things to fix.
There is no mention whatsoever of language support.
The Schinke stemmer is supposed to target Latin, but it doesn't work as expected.
Thanks.