Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Levenshtein Distance for dictionaries #43

Closed
MrWook opened this issue Mar 16, 2021 · 5 comments · Fixed by #96
Closed

Add Levenshtein Distance for dictionaries #43

MrWook opened this issue Mar 16, 2021 · 5 comments · Fixed by #96
Labels
enhancement New feature or request help wanted Extra attention is needed
Milestone

Comments

@MrWook
Copy link
Collaborator

MrWook commented Mar 16, 2021

It would be nice if the dictionary matching has the ability to use Levenshtein Distance (LD) calculations to match passwords which are non-exact matches to a dictionary entry.
For example that something like this would match:
{someDictionary: ['Herbert', 'Dorothea']} with the "password" Hebert would match the first entry of the dictionary.
The downside is that this would decrease the performance of the library.

@MrWook MrWook added the enhancement New feature or request label Mar 16, 2021
@Tostino
Copy link
Contributor

Tostino commented May 5, 2021

I'd take a look at what I did for the Java "port" I maintain: https://github.com/GoSimpleLLC/nbvcxz

LD calculations were extremely useful, but my god they slow things down. In nbvcxz I have it as a configuration option if we use an LD pass for our dictionary matching algorithm or not.

Here is the meat of it: https://github.com/GoSimpleLLC/nbvcxz/blob/master/src/main/java/me/gosimple/nbvcxz/matching/DictionaryMatcher.java

@MrWook
Copy link
Collaborator Author

MrWook commented May 5, 2021

That's what i wanted to do anyway if i implement it :D

Actually i thought exactly the same thing that this will slow down everything extremely. So i thought about making it optional and it would be a good idea to add a debounce zxcvbn function.

Normally this library is used directly on an input field and the distance would make the input really laggy, i guess. So a debounce function which will trigger the real zxcvbn function only after a certain time after the user typed would improve this.

@Tostino
Copy link
Contributor

Tostino commented May 5, 2021

So I optimized the LD pretty well for the common case in my implementation. For the vast majority of passwords, we don't even bother getting to that code path. Check all the cases in the DictionaryMatcher where I short circuit the LD code path because the password isn't a good candidate for it to matter. I only bother doing the LD calculation on the whole password rather than each individual part of the password, cutting down the amount of work required immensely.

But your mention of how this is usually tied directly to a field makes that less of a difference. For my Java library since it's server side, I generally just get updates from the input box it's tied to every so often, and then send my estimate back to the client to update the "password strength meter" the calculation is tied to. As a user is typing, I don't necessarily get an update every keystroke to recalculate things with my web framework, so it doesn't cause issues with how I use it.

I like the idea of having different code paths, one with very tight timing that will give some reasonably accurate result as the user is typing, and then a slower code path that is able to get things "right" once they have stopped mashing the keyboard.

@MrWook MrWook added the help wanted Extra attention is needed label Aug 4, 2021
@JonL1
Copy link
Contributor

JonL1 commented Dec 14, 2021

I was checking some SO topics on this matter and I found this. It might be worth exploring.

https://github.com/gustf/js-levenshtein

@MrWook MrWook added this to the 2.0.0 milestone Jan 8, 2022
@MrWook
Copy link
Collaborator Author

MrWook commented Jan 8, 2022

I finally got the time to do a little bit and added the levenshtein distance.
@Tostino i tried your function to measure the distance and ported it to JS, but it took ages to finish. Maybe i did something wrong while porting it to js but it didn't looked to complicated 🤷
But i implemented your other points like using an option and only check the whole password and not every part of it 👍

I don't like to create a dependency but this seems to be the best option for me. The library is super fast and i already saw some changes in the scoring, some comparison cases on the documentation page dropped by one point for the scoring.

@MrWook MrWook closed this as completed in #96 Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants