For randomly generated inputs, the value of guessesLog10 is incorrectly equal to the length of the string. #216

aaronstanek · 2023-06-20T00:35:23Z

For a string of random characters, the interactive demo at https://zxcvbn-ts.github.io/zxcvbn/demo/ provides values for guessesLog10 that are equal to the length of the string. The correct values for guessesLog10 are larger than those given by the demo.

Assumptions

The equation that I am using to compute the expected value of guessesLog10 for a password composed of randomly selected characters is:

guessesLog10 = log10( (alphabetSize ^ passwordLength) / 2 )

where alphabetSize is the number of distinct characters that we select from when generating the password, and passwordLength is the number of characters in the password.

Using the equation above, I would expect a string of 20 random ASCII alphanumeric characters (alphabetSize=52) to have a guessesLog10 around 34.

Steps to Reproduce

Example 1

Generate a random string of 20 ASCII alphanumeric characters. I used TJsPwSNxryMg9eQmDr6G.
Enter this string into the Password field of the demo website.
Record the value for guessesLog10. I got 20 when I expected 34 using the equation above.

Example 2

Generate a random string of 100 ASCII alphanumeric characters. I used HBdFQgfp38AdhzF0QP6KD5U33qE2nIEeNQ7cmw9ZYgZARO4HFcKAdQRBgIgvXxjm0Ws3JUgiVcXuCzAIgNDccAQ3XBKur68nMbAm.
Enter this string into the Password field of the demo website.
Record the value for guessesLog10. I got 100 when I expected 171 using the equation above.

Example 3

Generate a random string of 40 printable ASCII characters (alphabetSize=94). I used {m3@JtR&'BNMo,NI7K3oWF8Ug36+ie7<_Z,&!h*T
Enter this string into the Password field of the demo website.
Record the value for guessesLog10. I got 40 when I expected 79 using the equation above.
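
For reference, the expected values quoted in the three examples can be reproduced with a short sketch of the equation from the Assumptions section. The function name expectedGuessesLog10 is hypothetical and is not part of zxcvbn-ts. The alphabetSize values are the ones stated in this post (52 and 94); a full ASCII letters-plus-digits alphabet has 62 symbols, which would give roughly 35.5 instead of 34 for Example 1. The computation is done in log space so that long passwords do not overflow a floating-point number.

```typescript
// Hypothetical helper: evaluates log10(alphabetSize ^ passwordLength / 2)
// in log space, i.e. passwordLength * log10(alphabetSize) - log10(2).
function expectedGuessesLog10(alphabetSize: number, passwordLength: number): number {
  return passwordLength * Math.log10(alphabetSize) - Math.log10(2);
}

// The three examples, using the alphabet sizes stated in the post.
const examples = [
  { label: "Example 1", alphabetSize: 52, passwordLength: 20 },
  { label: "Example 2", alphabetSize: 52, passwordLength: 100 },
  { label: "Example 3", alphabetSize: 94, passwordLength: 40 },
];

for (const example of examples) {
  const value = expectedGuessesLog10(example.alphabetSize, example.passwordLength);
  console.log(`${example.label}: ${value.toFixed(2)}`);
}
// Example 1: 34.02
// Example 2: 171.30
// Example 3: 78.62
```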


MrWook · 2023-06-20T04:09:23Z

Hey, thank you for your issue.

I think it's good as it is, as I would rather have a lower score 🤔
But to elaborate a little:

In general, password entropy is not calculated here the way you calculated it. In the old repository, entropy was explicitly removed: https://github.com/dropbox/zxcvbn/releases/tag/4.0.1

In zxcvbn, a guess value is created, which is then used to determine guessesLog10 and the score.
In the bruteforce matcher, which is made for such random character strings, this guess value is calculated with BRUTEFORCE_CARDINALITY ** token.length, where the constant is 10. Taking log10 of that value gives the token length, which is why guessesLog10 matches the length of the string. Unfortunately, there is no explanation in the original repository of why 10 was chosen, but I think it has something to do with getting a decent value that is neither too high nor too low.
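
As a rough illustration of the bruteforce calculation described above, here is a minimal sketch. Only the constant name BRUTEFORCE_CARDINALITY, its value of 10, and the expression BRUTEFORCE_CARDINALITY ** token.length come from this thread; the function name bruteforceGuessesLog10 is hypothetical, and any further adjustments the library applies to the guess value are not modeled here.

```typescript
// Value stated in this thread; the surrounding function is a hypothetical sketch.
const BRUTEFORCE_CARDINALITY = 10;

function bruteforceGuessesLog10(token: string): number {
  // Guess value for a token that only the bruteforce matcher covers.
  const guesses = BRUTEFORCE_CARDINALITY ** token.length;
  // log10(10 ** length) is the length itself, up to floating-point rounding.
  return Math.log10(guesses);
}

// The 20-character string from Example 1 yields a value of about 20.
console.log(bruteforceGuessesLog10("TJsPwSNxryMg9eQmDr6G"));
```

Because the base is fixed at 10, the result depends only on the token length and not on which characters the token contains.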

For example, here is a little comparison:
zxcvbn-ts:
A random password with 11 characters is at the edge of dropping from a 4/4 score to a 3/4 score, with a guessesLog10 of 11.

Your calculation:
A random password with 7 characters is at the edge of dropping from a 4/4 score to a 3/4 score, with a guessesLog10 of 12.

As you can see, your calculation would score the password way too high for such a short password.
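
The comparison above can be checked with a small sketch. It assumes that a 4/4 score requires guessesLog10 to be strictly above 10, which is inferred from the 11-character edge quoted above and is not taken from the library source. It also uses alphabetSize=52 from the original post. All names are hypothetical.

```typescript
// Assumption: a 4/4 score needs guessesLog10 strictly above this value.
const TOP_SCORE_THRESHOLD_LOG10 = 10;

// Fixed base of 10: log10(10 ** length) is simply the length.
const fixedBaseLog10 = (length: number): number => length;

// Alphabet-based estimate from the original post, with alphabetSize = 52.
const alphabetBasedLog10 = (length: number): number =>
  length * Math.log10(52) - Math.log10(2);

// Shortest password length whose estimate clears the assumed threshold.
function minLengthForTopScore(estimate: (length: number) => number): number {
  let length = 1;
  while (estimate(length) <= TOP_SCORE_THRESHOLD_LOG10) {
    length += 1;
  }
  return length;
}

console.log(minLengthForTopScore(fixedBaseLog10)); // 11
console.log(minLengthForTopScore(alphabetBasedLog10)); // 7
```

Under these assumptions, the alphabet-based estimate lets a 7-character random password reach the top score, while the fixed base of 10 requires 11 characters.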

Moreover, I have no idea what all the possible values for alphabetSize would be, as this library is not intended for Latin characters only.
If a password field is correctly implemented, it should be possible to enter all kinds of Unicode characters. For example, the Polish language package already includes some, like ł. And if we were to merge language packages like the Persian one (#136), we would have a completely different set of possible characters.
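
To illustrate the problem, here is a sketch of a naive alphabet estimator based on ASCII character classes. It is purely hypothetical and is not how zxcvbn-ts works. It shows that a character such as ł falls outside every class, so the estimator has no principled size to assign to it.

```typescript
// Hypothetical estimator: sums the sizes of the ASCII character classes
// that appear in the password and collects everything it cannot classify.
function naiveAlphabetSize(password: string): { alphabetSize: number; unclassified: string[] } {
  let hasLower = false;
  let hasUpper = false;
  let hasDigit = false;
  let hasSymbol = false;
  const unclassified: string[] = [];

  // for...of iterates by Unicode code point rather than UTF-16 code unit.
  for (const ch of password) {
    if (/[a-z]/.test(ch)) hasLower = true;
    else if (/[A-Z]/.test(ch)) hasUpper = true;
    else if (/[0-9]/.test(ch)) hasDigit = true;
    else if (ch >= "!" && ch <= "~") hasSymbol = true; // remaining printable ASCII
    else unclassified.push(ch); // space, non-ASCII letters, other scripts
  }

  const alphabetSize =
    (hasLower ? 26 : 0) + (hasUpper ? 26 : 0) + (hasDigit ? 10 : 0) + (hasSymbol ? 32 : 0);
  return { alphabetSize, unclassified };
}

// "hasło" is Polish for "password"; the ł is not covered by any class.
console.log(naiveAlphabetSize("hasło"));
```

Any size chosen for the unclassified characters would be a guess that depends on the language and script, which is the difficulty described above.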

aaronstanek · 2023-07-19T23:17:02Z

Thank you for your reply. Your explanation was really informative.

Using 10 as a fixed base makes a lot of sense. I can see why the original repository would want to use a good-enough value instead of trying to handle every single Unicode character.

aaronstanek closed this as completed Jul 19, 2023
