Spanish support? #40

demian85 · 2016-10-26T16:55:51Z

There is no mention whatsoever of language support.
The Schinke stemmer is supposed to be for Latin, but it doesn't work as expected.
Thanks.

Yomguithereal · 2016-10-26T19:09:30Z

Hello @demian85. I guess you mean to ask if the library has a stemmer for the Spanish language. Unfortunately it does not have one yet. Using Schinke stemmer on Spanish text will indeed produce only garbage since the algorithm is targeting Latin.

However, I am currently working on Talisman, a much broader library than this one (it is written in JavaScript, not Clojure), and can probably implement some kind of Spanish stemmer there soon (the ones used by Lucene, I think). Tell me if this would suit your use case.

The stemmers I found for Spanish are the Martin Porter one in Snowball & the UniNe one used by Lucene.

demian85 · 2016-10-26T20:18:17Z

Turns out that what I'm looking for is an inflector: I just want a way to normalize a string. More specifically, I need to singularize nouns in Spanish.

Yomguithereal · 2016-10-26T21:16:31Z

Ok. The UniNe stemmer might be of some use to you then. It performs really simple stemming and will probably drop most plural forms (though it won't inflect them in a grammatically correct way).

Here is how it works:

Deburr the string (strip diacritics)
If the string is less than 5 characters long, leave it unchanged
Otherwise drop a final o, a or e
Handle a final s as follows:

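// s is the character buffer, len its length; the returned value is the new length
// "-eses" -> "-es"
if (s[len-2] == 'e' && s[len-3] == 's' && s[len-4] == 'e')
  return len - 2;
// "-ces" -> "-z"
if (s[len-2] == 'e' && s[len-3] == 'c') {
  s[len-3] = 'z';
  return len - 2;
}
// "-os", "-as", "-es": drop the two final characters
if (s[len-2] == 'o' || s[len-2] == 'a' || s[len-2] == 'e')
  return len - 2;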
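
The steps above could be sketched in JavaScript roughly as follows. This only illustrates the description given in this thread; it is not talisman's or Lucene's actual code. The function name `lightSpanishStem` and the NFD-based diacritic stripping are assumptions made for the example.

```javascript
// Illustrative sketch of the light stemming steps described above.
// `lightSpanishStem` is a hypothetical name, not a talisman export.
function lightSpanishStem(word) {
  // Step 1: deburr. Assumption: decompose to NFD and drop combining marks.
  const s = word.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

  // Step 2: words shorter than 5 characters are left untouched.
  if (s.length < 5) return s;

  const last = s[s.length - 1];

  // Step 3: drop a final o, a or e.
  if (last === 'o' || last === 'a' || last === 'e') return s.slice(0, -1);

  // Step 4: handle a final s.
  if (last === 's') {
    if (s.endsWith('eses')) return s.slice(0, -2);      // "-eses" -> "-es"
    if (s.endsWith('ces')) return s.slice(0, -3) + 'z'; // "-ces"  -> "-z"
    if (/[oae]s$/.test(s)) return s.slice(0, -2);       // "-os", "-as", "-es"
  }

  return s;
}

console.log(lightSpanishStem('gatos')); // gat
console.log(lightSpanishStem('luces')); // luz
console.log(lightSpanishStem('casa'));  // casa (too short, unchanged)
```

As the output shows, the result is a stem rather than a grammatical singular: "gatos" becomes "gat", not "gato".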

Yomguithereal · 2016-10-26T21:17:32Z

Otherwise, here is the code of a Python inflector for the Spanish language.

Yomguithereal · 2016-10-26T21:37:22Z

What are you trying to achieve specifically here? Fuzzy matching? Clustering?

demian85 · 2016-10-26T22:13:03Z

I'm using MongoDB, but its full-text search is not smart enough to cover edge cases. I cannot find a way to match all terms using AND without losing stemming and other features.
I'm just planning to store a normalized string and search for equality.
Thanks for the Python version. Do you know of any JS implementation?

Yomguithereal · 2016-10-27T06:45:53Z

If you tell me the Python inflector works for you and solves your problem, I can implement it in talisman, but I'll need some time to do so.

Yomguithereal · 2016-10-27T07:59:16Z

Ok, I just implemented both the UniNe stemmer & the Python inflector in talisman @demian85. Here is how to use them:

npm install talisman

// The stemmer
const stemmer = require('talisman/stemmers/spanish/unine');

// The inflector
const inflector = require('talisman/inflectors/spanish/noun').singularize;
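
For the normalized-string lookup discussed earlier in the thread, a key could be built along these lines. This is a minimal sketch: `normalizeForLookup` and `naiveSingularize` are hypothetical names, and the singularizer is passed in as an argument so the example runs without talisman installed. It assumes the real `singularize` takes one word as a string and returns a string.

```javascript
// Hypothetical helper: turns free text into a normalized key that can be
// stored alongside a document and compared for equality.
// `singularize` is injected; in practice it could be
// require('talisman/inflectors/spanish/noun').singularize (assumed to map
// a single word to its singular form).
function normalizeForLookup(text, singularize) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)               // drop empty tokens from leading/trailing spaces
    .map((word) => singularize(word))
    .join(' ');
}

// Stand-in singularizer for demonstration only: strips one trailing "s".
const naiveSingularize = (word) => word.replace(/s$/, '');

console.log(normalizeForLookup('Perros  y gatos', naiveSingularize)); // perro y gato
```

The same function would be applied both when storing the document and when building the query, so that both sides are compared in normalized form.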

demian85 · 2016-10-27T13:15:04Z

Great! I'll give it a try! Thanks!

Yomguithereal · 2016-10-27T13:16:31Z

I'll close this issue. Open one on talisman if you have any problem.

Yomguithereal · 2016-10-28T19:14:54Z

So did it work for you?

Yomguithereal · 2016-12-02T13:29:56Z

@demian85 you never told me if this worked for you or if there were things to fix.

demian85 · 2016-12-02T14:29:35Z

Yeah it works. Thanks.

Yomguithereal added the question label Oct 26, 2016

Yomguithereal closed this as completed Oct 27, 2016
