Use fastest-levenshtein instead of leven #7

ka-weihe · 2020-07-22T12:42:43Z

fastest-levenshtein is much faster than leven (up to 10x faster). Please consider using it instead.

Benchmark

Higher is better.

Test Target	N=4	N=8	N=16	N=32	N=64	N=128	N=256	N=512	N=1024
fastest-levenshtein	44423	23702	10764	4595	1049	291.5	86.64	22.24	5.473
leven	19688	6884	1606	436	117	30.34	7.604	1.929	0.478

tanhauhau · 2020-07-26T09:56:44Z

Thank you for the pull request.

I believe fastest-levenshtein in most cases are faster, in those case, i think the user should just use closest from fastest-levenshtein instead.

I dont see a reason for having this package as a wrapper of fastest-levenshtein which adds no value to it.

tanhauhau · 2020-07-26T09:58:15Z

for anyone who is curious of using fastest-levenshtein instead:

// levenary
import levenary from 'levenary';
levenary('cat', ['cow', 'dog', 'pig']);

// fastest-levenshtein
const {closest} = require('fastest-levenshtein')
closest('cat', ['cow', 'dog', 'pig']);

ka-weihe · 2020-07-27T07:07:45Z

The reason for this package to be a wrapper of fastest-levenshtein is that this package already has a lot of people using it. Why wouldn't you merge a PR that clearly makes it better for them. It does not matter whether fastest-levenshtein has the same functionality or not. This package has less use if it uses leven, bacause then it is just an intentionally slow alternative. You should reconsider.

tanhauhau · 2020-08-12T15:53:32Z

@JLHwung @nicolo-ribaudo i need some advice over here.

levenary was created as a super tiny library for babel babel/babel#10924, which is

slimer and faster
and supports flow type.

@ka-weihe here has implemented an even faster levensthein distance library, according to his benchmark.

im a bit hesitant to swap leven with fastest-levenshtein over here, because fastest-levenshtein has the exact same api of levenary.

from what i see the main dependent on levenary is babel, i would suggest to replace levenary with fastest-levenshtein as it is much faster, rather than having deep nested dependencies: babel -> levenary -> fastest-levensthein.

what do you think?

ka-weihe · 2020-08-22T14:48:59Z

@JLHwung @nicolo-ribaudo

JLHwung · 2020-08-24T15:38:05Z

@tanhauhau For context: babel/babel#11911 (comment)

Babel does not swap levenary because it is smaller than fastest-levenshtein. It does not necessarily imply that people should always use levenary. Babel uses levenary to pick up a suggestion from a very small array (length ~ 10) and it is only executed when users does not offer a correct option key. In this specific and limited issue scope, I don't feel it necessarily to switch.

@ka-weihe I understand your passion on promoting fastest-levenshtein. Although Babel can offer a significant download, it will be even better if you can seek for other libraries that depends heavily on levenstein algorithm, so it can have greater impact on the communities.

ka-weihe · 2020-08-24T16:35:32Z

I get that Babel doesn't need a fast levenshtein implementation since you only use it on a very small array, but I don't get why you would have two dependecies for something so simple! This (2 line - 150 bytes) piece of code has the functionality of Levenary and Leven combined.

o=Math.min,a=((a,e)=>(d=e.map(d=>((d,a)=>(g=(e=>e*n?1+o(g(e,--n),g(--e)
-(d[e]==a[n++]),g(e)):e+n))(d.length,n=a.length))(a,d)),e[d.indexOf(o(...d))]));

If you don't believe that it works... try run it in your browser.

a('cat', ['cow', 'dog', 'pig']);
//=> 'cow'

I suggest people to just write a levenshtein implementation themselves or copy a simple one from google. If performance matters, add a dependency. That is why I think levenary should be fast (not use leven) or at least not deceive people with its benchmarks #8. @tanhauhau

JLHwung · 2020-08-25T22:26:48Z

o=Math.min,a=((a,e)=>(d=e.map(d=>((d,a)=>(g=(e=>e*n?1+o(g(e,--n),g(--e)
-(d[e]==a[n++]),g(e)):e+n))(d.length,n=a.length))(a,d)),e[d.indexOf(o(...d))]));

Thanks for your input. However the recursive version is significantly slow due to the overhead of JS functions. I went with https://rosettacode.org/wiki/Levenshtein_distance#ES5 which is small but not that slow for some 20 character strings.

ka-weihe added 3 commits July 22, 2020 16:38

use fastest-levenshtein instead of leven

573d4be

new version

513ed0b

remove lock

677b92a

tanhauhau closed this Jul 26, 2020

JLHwung mentioned this pull request Aug 25, 2020

refactor: add @babel/helper-validator-option babel/babel#12006

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use fastest-levenshtein instead of leven #7

Use fastest-levenshtein instead of leven #7

ka-weihe commented Jul 22, 2020 •

edited

Loading

tanhauhau commented Jul 26, 2020

tanhauhau commented Jul 26, 2020 •

edited

Loading

ka-weihe commented Jul 27, 2020 •

edited

Loading

tanhauhau commented Aug 12, 2020

ka-weihe commented Aug 22, 2020

JLHwung commented Aug 24, 2020

ka-weihe commented Aug 24, 2020 •

edited

Loading

JLHwung commented Aug 25, 2020

Use fastest-levenshtein instead of leven #7

Use fastest-levenshtein instead of leven #7

Conversation

ka-weihe commented Jul 22, 2020 • edited Loading

Benchmark

tanhauhau commented Jul 26, 2020

tanhauhau commented Jul 26, 2020 • edited Loading

ka-weihe commented Jul 27, 2020 • edited Loading

tanhauhau commented Aug 12, 2020

ka-weihe commented Aug 22, 2020

JLHwung commented Aug 24, 2020

ka-weihe commented Aug 24, 2020 • edited Loading

JLHwung commented Aug 25, 2020

ka-weihe commented Jul 22, 2020 •

edited

Loading

tanhauhau commented Jul 26, 2020 •

edited

Loading

ka-weihe commented Jul 27, 2020 •

edited

Loading

ka-weihe commented Aug 24, 2020 •

edited

Loading