Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fastest-levenshtein instead of leven #7

Closed
wants to merge 3 commits into from
Closed

Use fastest-levenshtein instead of leven #7

wants to merge 3 commits into from

Conversation

ka-weihe
Copy link

@ka-weihe ka-weihe commented Jul 22, 2020

fastest-levenshtein is much faster than leven (up to 10x faster). Please consider using it instead.

Benchmark

Higher is better.

Test Target N=4 N=8 N=16 N=32 N=64 N=128 N=256 N=512 N=1024
fastest-levenshtein 44423 23702 10764 4595 1049 291.5 86.64 22.24 5.473
leven 19688 6884 1606 436 117 30.34 7.604 1.929 0.478

@tanhauhau
Copy link
Owner

Thank you for the pull request.

I believe fastest-levenshtein in most cases are faster, in those case, i think the user should just use closest from fastest-levenshtein instead.

I dont see a reason for having this package as a wrapper of fastest-levenshtein which adds no value to it.

@tanhauhau tanhauhau closed this Jul 26, 2020
@tanhauhau
Copy link
Owner

tanhauhau commented Jul 26, 2020

for anyone who is curious of using fastest-levenshtein instead:

// levenary
import levenary from 'levenary';
levenary('cat', ['cow', 'dog', 'pig']);

// fastest-levenshtein
const {closest} = require('fastest-levenshtein')
closest('cat', ['cow', 'dog', 'pig']);

@ka-weihe
Copy link
Author

ka-weihe commented Jul 27, 2020

The reason for this package to be a wrapper of fastest-levenshtein is that this package already has a lot of people using it. Why wouldn't you merge a PR that clearly makes it better for them. It does not matter whether fastest-levenshtein has the same functionality or not. This package has less use if it uses leven, bacause then it is just an intentionally slow alternative. You should reconsider.

@tanhauhau
Copy link
Owner

@JLHwung @nicolo-ribaudo i need some advice over here.

levenary was created as a super tiny library for babel babel/babel#10924, which is

  • slimer and faster
  • and supports flow type.

@ka-weihe here has implemented an even faster levensthein distance library, according to his benchmark.

im a bit hesitant to swap leven with fastest-levenshtein over here, because fastest-levenshtein has the exact same api of levenary.

from what i see the main dependent on levenary is babel, i would suggest to replace levenary with fastest-levenshtein as it is much faster, rather than having deep nested dependencies: babel -> levenary -> fastest-levensthein.

what do you think?

@ka-weihe
Copy link
Author

@JLHwung @nicolo-ribaudo

@JLHwung
Copy link
Contributor

JLHwung commented Aug 24, 2020

@tanhauhau For context: babel/babel#11911 (comment)

Babel does not swap levenary because it is smaller than fastest-levenshtein. It does not necessarily imply that people should always use levenary. Babel uses levenary to pick up a suggestion from a very small array (length ~ 10) and it is only executed when users does not offer a correct option key. In this specific and limited issue scope, I don't feel it necessarily to switch.

@ka-weihe I understand your passion on promoting fastest-levenshtein. Although Babel can offer a significant download, it will be even better if you can seek for other libraries that depends heavily on levenstein algorithm, so it can have greater impact on the communities.

@ka-weihe
Copy link
Author

ka-weihe commented Aug 24, 2020

I get that Babel doesn't need a fast levenshtein implementation since you only use it on a very small array, but I don't get why you would have two dependecies for something so simple! This (2 line - 150 bytes) piece of code has the functionality of Levenary and Leven combined.

o=Math.min,a=((a,e)=>(d=e.map(d=>((d,a)=>(g=(e=>e*n?1+o(g(e,--n),g(--e)
-(d[e]==a[n++]),g(e)):e+n))(d.length,n=a.length))(a,d)),e[d.indexOf(o(...d))]));

If you don't believe that it works... try run it in your browser.

a('cat', ['cow', 'dog', 'pig']);
//=> 'cow'

I suggest people to just write a levenshtein implementation themselves or copy a simple one from google. If performance matters, add a dependency. That is why I think levenary should be fast (not use leven) or at least not deceive people with its benchmarks #8. @tanhauhau

@JLHwung
Copy link
Contributor

JLHwung commented Aug 25, 2020

o=Math.min,a=((a,e)=>(d=e.map(d=>((d,a)=>(g=(e=>e*n?1+o(g(e,--n),g(--e)
-(d[e]==a[n++]),g(e)):e+n))(d.length,n=a.length))(a,d)),e[d.indexOf(o(...d))]));

Thanks for your input. However the recursive version is significantly slow due to the overhead of JS functions. I went with https://rosettacode.org/wiki/Levenshtein_distance#ES5 which is small but not that slow for some 20 character strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants