For randomly generated inputs, the value of guessesLog10 is incorrectly equal to the length of the string. #216

Closed
aaronstanek opened this issue Jun 20, 2023 · 2 comments

Comments

@aaronstanek

For a string of random characters, the interactive demo at https://zxcvbn-ts.github.io/zxcvbn/demo/ provides values for guessesLog10 that are equal to the length of the string. The correct values for guessesLog10 are larger than those given by the demo.

Assumptions

The equation that I am using to compute the expected value of guessesLog10 for a password composed of randomly selected characters is:

guessesLog10 = log10( alphabetSize ^ passwordLength / 2 )

Where alphabetSize is the number of distinct characters that we select from when generating the password, and passwordLength is the number of characters in the password.

Using the equation above, I would expect a string of 20 random ASCII alphanumeric characters (alphabetSize=52) to have a guessesLog10 around 34.

Steps to Reproduce

Example 1

  • Generate a random string of 20 ASCII alphanumeric characters. I used TJsPwSNxryMg9eQmDr6G.
  • Enter this string into the Password field of the demo website.
  • Record the value for guessesLog10. I got 20 when I expected 34 using the equation above.

Example 2

  • Generate a random string of 100 ASCII alphanumeric characters. I used HBdFQgfp38AdhzF0QP6KD5U33qE2nIEeNQ7cmw9ZYgZARO4HFcKAdQRBgIgvXxjm0Ws3JUgiVcXuCzAIgNDccAQ3XBKur68nMbAm.
  • Enter this string into the Password field of the demo website.
  • Record the value for guessesLog10. I got 100 when I expected 171 using the equation above.

Example 3

  • Generate a random string of 40 printable ASCII characters. (alphabetSize=94) I used {m3@JtR&'BNMo,NI7K3oWF8Ug36+ie7<_Z,&!h*T
  • Enter this string into the Password field of the demo website.
  • Record the value for guessesLog10. I got 40 when I expected 79 using the equation above.
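
The expected values in the three examples can be checked with a short sketch of the equation from the Assumptions section. The function name `expectedGuessesLog10` is hypothetical and is not part of zxcvbn-ts. The alphabet sizes are the ones stated in this report; note that an alphabet of upper-case letters, lower-case letters, and digits has 62 symbols (52 covers letters only), which would raise the first two estimates slightly, to roughly 36 and 179.

```typescript
// Hypothetical helper, not part of zxcvbn-ts.
// Implements log10(alphabetSize ^ passwordLength / 2) in log space,
// so that long passwords do not overflow a double.
function expectedGuessesLog10(alphabetSize: number, passwordLength: number): number {
  return passwordLength * Math.log10(alphabetSize) - Math.log10(2);
}

// The three cases from Steps to Reproduce, with the alphabet sizes used in the report.
const reportedCases = [
  { alphabetSize: 52, passwordLength: 20 },
  { alphabetSize: 52, passwordLength: 100 },
  { alphabetSize: 94, passwordLength: 40 },
];

for (const { alphabetSize, passwordLength } of reportedCases) {
  const value = expectedGuessesLog10(alphabetSize, passwordLength);
  // Rounded values: 34, 171, 79
  console.log(`alphabetSize=${alphabetSize}, length=${passwordLength}: ${Math.round(value)}`);
}
```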
@MrWook
Collaborator

MrWook commented Jun 20, 2023

Hey thank you for your issue.

I think it's good as it is, as I'd rather have a lower score 🤔
But to elaborate a little:

In general, password entropy is not calculated here the way you did it. In the old repository, entropy was explicitly removed: https://github.com/dropbox/zxcvbn/releases/tag/4.0.1

In zxcvbn, a guess value is created, which is then used to determine guessesLog10 and the score.
In the bruteforce matcher, which is made for such random character strings, this guess value is calculated as BRUTEFORCE_CARDINALITY ** token.length, where the constant is 10. Unfortunately, there is no explanation in the original repository of why 10 was chosen, but I think it has something to do with getting a decent value that is neither too high nor too low.
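
As a minimal sketch of why the demo reports guessesLog10 equal to the string length: with a base of 10, the base-10 logarithm of the guess value is simply the token length. The function name `bruteforceGuesses` below is hypothetical, and the real matcher may apply additional adjustments (such as minimum guess counts) that are not shown here.

```typescript
// Value quoted in the comment above; everything else in this sketch is illustrative.
const BRUTEFORCE_CARDINALITY = 10;

// Hypothetical stand-in for the bruteforce guess calculation described above.
function bruteforceGuesses(token: string): number {
  return BRUTEFORCE_CARDINALITY ** token.length;
}

const token = "TJsPwSNxryMg9eQmDr6G"; // the 20-character string from Example 1
const guesses = bruteforceGuesses(token);

// log10(10 ** n) is n (up to floating-point rounding), so this is about 20
// regardless of which characters the token contains.
console.log(token.length, Math.log10(guesses));
```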

For example here is a little comparison:
zxcvbn-ts:
A random password with 11 characters is at the edge of dropping from a 4/4 score to a 3/4 score, with an entropy of 11.

Your calculation
A random password with 7 characters is at the edge of dropping from a 4/4 score to a 3/4 score, with an entropy of 12.

As you can see, your calculation would score the password way too high for such a short password.
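
The comparison can be reproduced with a small sketch. It assumes that a 4/4 score requires guessesLog10 strictly above 10; this cutoff is an assumption chosen to be consistent with the 11-character figure above, not a value taken from the library source. All names below are hypothetical, and alphabetSize=52 is the value from the original report.

```typescript
// Assumed cutoff: a password scores 4/4 only if guessesLog10 is strictly above this.
const TOP_SCORE_LOG10_CUTOFF = 10;

// Smallest length whose estimated guessesLog10 exceeds the assumed cutoff.
function minLengthForTopScore(guessesLog10ForLength: (length: number) => number): number {
  let length = 1;
  while (guessesLog10ForLength(length) <= TOP_SCORE_LOG10_CUTOFF) {
    length += 1;
  }
  return length;
}

// Fixed base of 10: guessesLog10 equals the length.
const fixedBaseLog10 = (length: number): number => length;

// Formula from the report: log10(52 ^ length / 2).
const entropyBasedLog10 = (length: number): number =>
  length * Math.log10(52) - Math.log10(2);

console.log(minLengthForTopScore(fixedBaseLog10));    // 11
console.log(minLengthForTopScore(entropyBasedLog10)); // 7
```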

Moreover, I have no idea what all the possible values for alphabetSize would be, as this library is not intended for Latin characters only.
If a password field is correctly implemented, it should be possible to enter all kinds of Unicode characters. For example, the Polish language package already includes some, like ł. And if we were to merge language packages like the Persian one (#136), we would have a completely different set of possible characters.

@aaronstanek
Author

Thank you for your reply. Your explanation was really informative.

Using 10 as a fixed base makes a lot of sense. I can see why the original repository would want to use a good-enough value instead of trying to handle every single Unicode character.
