Add Levenshtein Distance for dictionaries #43

MrWook · 2021-03-16T17:53:26Z

It would be nice if the dictionary matching has the ability to use Levenshtein Distance (LD) calculations to match passwords which are non-exact matches to a dictionary entry.
For example that something like this would match:
{someDictionary: ['Herbert', 'Dorothea']} with the "password" Hebert would match the first entry of the dictionary.
The downside is that this would decrease the performance of the library.

The text was updated successfully, but these errors were encountered:

Tostino · 2021-05-05T03:57:52Z

I'd take a look at what I did for the Java "port" I maintain: https://github.com/GoSimpleLLC/nbvcxz

LD calculations were extremely useful, but my god they slow things down. In nbvcxz I have it as a configuration option if we use an LD pass for our dictionary matching algorithm or not.

Here is the meat of it: https://github.com/GoSimpleLLC/nbvcxz/blob/master/src/main/java/me/gosimple/nbvcxz/matching/DictionaryMatcher.java

MrWook · 2021-05-05T06:26:52Z

That's what i wanted to do anyway if i implement it :D

Actually i thought exactly the same thing that this will slow down everything extremely. So i thought about making it optional and it would be a good idea to add a debounce zxcvbn function.

Normally this library is used directly on an input field and the distance would make the input really laggy, i guess. So a debounce function which will trigger the real zxcvbn function only after a certain time after the user typed would improve this.

Tostino · 2021-05-05T17:36:45Z

So I optimized the LD pretty well for the common case in my implementation. For the vast majority of passwords, we don't even bother getting to that code path. Check all the cases in the DictionaryMatcher where I short circuit the LD code path because the password isn't a good candidate for it to matter. I only bother doing the LD calculation on the whole password rather than each individual part of the password, cutting down the amount of work required immensely.

But your mention of how this is usually tied directly to a field makes that less of a difference. For my Java library since it's server side, I generally just get updates from the input box it's tied to every so often, and then send my estimate back to the client to update the "password strength meter" the calculation is tied to. As a user is typing, I don't necessarily get an update every keystroke to recalculate things with my web framework, so it doesn't cause issues with how I use it.

I like the idea of having different code paths, one with very tight timing that will give some reasonably accurate result as the user is typing, and then a slower code path that is able to get things "right" once they have stopped mashing the keyboard.

JonL1 · 2021-12-14T14:50:44Z

I was checking some SO topics on this matter and I found this. It might be worth exploring.

https://github.com/gustf/js-levenshtein

MrWook · 2022-01-08T21:58:19Z

I finally got the time to do a little bit and added the levenshtein distance.
@Tostino i tried your function to measure the distance and ported it to JS, but it took ages to finish. Maybe i did something wrong while porting it to js but it didn't looked to complicated 🤷
But i implemented your other points like using an option and only check the whole password and not every part of it 👍

I don't like to create a dependency but this seems to be the best option for me. The library is super fast and i already saw some changes in the scoring, some comparison cases on the documentation page dropped by one point for the scoring.

MrWook added the enhancement New feature or request label Mar 16, 2021

MrWook added the help wanted Extra attention is needed label Aug 4, 2021

MrWook added this to the 2.0.0 milestone Jan 8, 2022

MrWook mentioned this issue Jan 8, 2022

Add levenshtein distance #96

Merged

MrWook closed this as completed in #96 Feb 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Levenshtein Distance for dictionaries #43

Add Levenshtein Distance for dictionaries #43

MrWook commented Mar 16, 2021

Tostino commented May 5, 2021

MrWook commented May 5, 2021

Tostino commented May 5, 2021 •

edited

Loading

JonL1 commented Dec 14, 2021

MrWook commented Jan 8, 2022

Add Levenshtein Distance for dictionaries #43

Add Levenshtein Distance for dictionaries #43

Comments

MrWook commented Mar 16, 2021

Tostino commented May 5, 2021

MrWook commented May 5, 2021

Tostino commented May 5, 2021 • edited Loading

JonL1 commented Dec 14, 2021

MrWook commented Jan 8, 2022

Tostino commented May 5, 2021 •

edited

Loading