Separator Matcher doesn't catch first separator #193

MrWook · 2023-05-06T09:04:42Z

@domosapien I wanted to publish the new major version and tested everything beforehand, and it seems like some of your later changes broke the separator matcher a little bit. Your video from #115 doesn't seem to be the source of truth anymore, as the string `buy by beer` splits into `buy by` as bruteforce, the following space as separator, and `beer` as dictionary.
I think the first approach was a better idea: to have specific chars that act as separators 🤔 What do you think?


domosapien · 2023-05-06T13:42:24Z

It's up to you. It made a little more sense to me to allow for any special character. I also didn't like how much I was mangling the bruteforce and repeating checks, so I loosened those. Sorry, I should have brought that up and not just left it for you to discover. Although I don't know why it prefers the bruteforce over the dictionary + separator. The limited character set shouldn't matter, I could revert back to only the handful I had. I would assume most people expect only a small handful to act as separators, but to me the repeating special char between words makes sense to mark as a separator regardless of what is used. The bigger thing would be to ensure that the bruteforce algorithm doesn't get to select as broad ranges any more. That was the bigger problem that seemed to eat more inputs when I was testing that I relaxed a bit in later versions (I let the separator match and then the algorithm determine if bruteforce was better or not). I have some time today in a few hours and I can put out another PR with changes. Let me know what you think would be best.


MrWook · 2023-05-06T17:49:49Z

It appears reasonable to check for a specific set of special characters, as the Java port may also be utilizing any special character. However, when adding other languages such as Persian (#136), every character could be considered a special character. Therefore, I propose that we define a fixed set of separators, and trigger them if at least one is present. The suggested separators include:

[
' ',
',',
';',
':',
'|',
'/',
'\\',
'-',
'_',
'.',
]

Naturally, this list should be customizable, allowing users to define their own set of separators or consider all special characters as separators if they prefer.

I also had the opportunity to explore the earlier implementation where you made adjustments to the repeat and brute-force matchers. That version had a significant flaw, as it considered `by by` a strong password. Consequently, the current implementation appears to be an improvement in this aspect.
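The fixed but customizable separator set proposed above could be sketched roughly as follows. This is only an illustration of the idea, not the zxcvbn-ts implementation: `SeparatorMatch`, `DEFAULT_SEPARATORS`, and `matchSeparators` are hypothetical names, and the match shape only mirrors the `pattern`, `token`, `i`, and `j` fields visible in the outputs in this thread.

```typescript
// Hypothetical sketch of a separator matcher with a fixed, overridable character set.
// None of these names are taken from the library.
interface SeparatorMatch {
  pattern: 'separator'
  token: string
  i: number // start index of the separator in the password
  j: number // end index (same as i, since each separator is one character)
}

// The separator list suggested in this thread.
const DEFAULT_SEPARATORS: readonly string[] = [
  ' ', ',', ';', ':', '|', '/', '\\', '-', '_', '.',
]

function matchSeparators(
  password: string,
  separators: readonly string[] = DEFAULT_SEPARATORS,
): SeparatorMatch[] {
  const matches: SeparatorMatch[] = []
  for (let i = 0; i < password.length; i += 1) {
    const char = password.charAt(i)
    // Only characters from the configured list count as separators.
    if (separators.indexOf(char) !== -1) {
      matches.push({ pattern: 'separator', token: char, i, j: i })
    }
  }
  return matches
}

// Both spaces in 'buy by beer' are reported, at indices 3 and 6.
console.log(matchSeparators('buy by beer').map((match) => match.i))
// A caller-supplied list replaces the defaults entirely.
console.log(matchSeparators('buy+by beer', ['+']).map((match) => match.i))
```

Passing the list as a parameter is one way to let users define their own separators; whether a match is actually used would still be up to the scoring step.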

domosapien · 2023-05-06T20:54:00Z

Ok, I've made some changes but they aren't 100% done and I have to run again. I will try to finish this up later today (I'm in eastern US). I wasn't aware a regex of `\W` or `[^\w]` wouldn't match on other locales. I've reverted the broader range of allowed chars and am matching only the ones you give. I also fixed a bug in changes to the bruteforce matcher that I added (if the password changed, for example when we were checking for repeats it uses the same container to track patterns and I look up the regex, we wouldn't reset on password change).

Now, `buy by beer` gives:

```
{
  calcTime: 212,
  password: 'buy by beer',
  guesses: 130000000,
  guessesLog10: 8.113943352306837,
  sequence: [
    { pattern: 'bruteforce', token: 'bu', i: 0, j: 1, guesses: 100, guessesLog10: 2 },
    {
      pattern: 'repeat',
      i: 2,
      j: 7,
      token: 'y by b',
      baseToken: 'y b',
      baseGuesses: 23,
      repeatCount: 2,
      guesses: 50,
      guessesLog10: 1.6989700043360185
    },
    { pattern: 'bruteforce', token: 'eer', i: 8, j: 10, guesses: 1000, guessesLog10: 2.9999999999999996 }
  ],
  crackTimesSeconds: {
    onlineThrottling100PerHour: 4680000000,
    onlineNoThrottling10PerSecond: 13000000,
    offlineSlowHashing1e4PerSecond: 13000,
    offlineFastHashing1e10PerSecond: 0.013
  },
  crackTimesDisplay: {
    onlineThrottling100PerHour: 'centuries',
    onlineNoThrottling10PerSecond: '5 months',
    offlineSlowHashing1e4PerSecond: '4 hours',
    offlineFastHashing1e10PerSecond: 'less than a second'
  },
  score: 3,
  feedback: { warning: null, suggestions: [] }
}
```

Removing the changes I made to bruteforce matching, I get

```
{
  calcTime: 209,
  password: 'buy by beer',
  guesses: 100000000,
  guessesLog10: 8,
  sequence: [
    { pattern: 'bruteforce', token: 'buy by', i: 0, j: 5, guesses: 1000000, guessesLog10: 5.999999999999999 },
    { pattern: 'separator', token: ' ', i: 6, j: 6, guesses: 0, guessesLog10: 0 },
    {
      pattern: 'dictionary',
      i: 7,
      j: 10,
      token: 'beer',
      matchedWord: 'beer',
      rank: 514,
      dictionaryName: 'passwords',
      reversed: false,
      l33t: false,
      baseGuesses: 514,
      uppercaseVariations: 1,
      l33tVariations: 1,
      guesses: 514,
      guessesLog10: 2.7109631189952754
    }
  ],
  crackTimesSeconds: {
    onlineThrottling100PerHour: 3600000000,
    onlineNoThrottling10PerSecond: 10000000,
    offlineSlowHashing1e4PerSecond: 10000,
    offlineFastHashing1e10PerSecond: 0.01
  },
  crackTimesDisplay: {
    onlineThrottling100PerHour: 'centuries',
    onlineNoThrottling10PerSecond: '4 months',
    offlineSlowHashing1e4PerSecond: '3 hours',
    offlineFastHashing1e10PerSecond: 'less than a second'
  },
  score: 2,
  feedback: { warning: null, suggestions: [ 'Add more words that are less common.' ] }
}
```

So it seems like I should probably remove that or change the repeat to not include separators (which I had before, and what the original demo showed). Like I said, I'll mess around with it some more later, unless you have an opinion on what I should or should not do.

* Let separator matches just match and let the algorithm pick
* Remove separator matching from brute force options (bruteforce will be used more often than separator + dictionary or other match)
* Remove separator matching from repeat (this is less likely, but with separators a repeat pattern can happen)
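On the `\W` point above: in JavaScript, `\w` only covers `[A-Za-z0-9_]`, so `\W` matches every letter of a non-Latin script, which is why a `\W`-based separator check would treat a whole Persian word as separators. A minimal sketch of that behavior; the sample word and the `\p{L}`-based alternative are assumptions added for illustration and are not what the library does.

```typescript
// Illustrative only: why `\W` is too broad a separator test for non-Latin scripts.
// The Persian word "salam", written with escapes to avoid bidirectional-text rendering issues.
const persianWord = '\u0633\u0644\u0627\u0645'

// `\w` is ASCII-only in JavaScript, so every Persian letter counts as "non-word".
const asciiNonWord = /\W/
const everyLetterLooksSpecial = persianWord
  .split('')
  .every((char) => asciiNonWord.test(char))

// Assumed alternative: Unicode property escapes treat letters of any script as letters.
// Built with the RegExp constructor so the pattern is only parsed at runtime.
const unicodeNonWord = new RegExp('[^\\p{L}\\p{N}_]', 'u')
const persianHasSpecial = unicodeNonWord.test(persianWord)
const spaceIsSpecial = unicodeNonWord.test('buy by beer')

console.log(everyLetterLooksSpecial) // true
console.log(persianHasSpecial) // false
console.log(spaceIsSpecial) // true
```

A fixed separator list sidesteps the question entirely, since it never depends on how a regex engine classifies letters.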


domosapien · 2023-05-07T03:11:51Z

With the repeat matcher turned off:

```
{
  calcTime: 146,
  password: 'buy by beer',
  guesses: 10000000000000000,
  guessesLog10: 16,
  sequence: [
    {
      pattern: 'dictionary',
      i: 0,
      j: 2,
      token: 'buy',
      matchedWord: 'buy',
      rank: 509,
      dictionaryName: 'commonWords',
      reversed: false,
      l33t: false,
      baseGuesses: 509,
      uppercaseVariations: 1,
      l33tVariations: 1,
      guesses: 509,
      guessesLog10: 2.7067177823367583
    },
    { pattern: 'separator', token: ' ', i: 3, j: 3, guesses: 10, guessesLog10: 1 },
    {
      pattern: 'dictionary',
      i: 4,
      j: 5,
      token: 'by',
      matchedWord: 'by',
      rank: 11,
      dictionaryName: 'wikipedia',
      reversed: false,
      l33t: false,
      baseGuesses: 11,
      uppercaseVariations: 1,
      l33tVariations: 1,
      guesses: 50,
      guessesLog10: 1.6989700043360185
    },
    { pattern: 'separator', token: ' ', i: 6, j: 6, guesses: 0, guessesLog10: 0 },
    {
      pattern: 'dictionary',
      i: 7,
      j: 10,
      token: 'beer',
      matchedWord: 'beer',
      rank: 514,
      dictionaryName: 'passwords',
      reversed: false,
      l33t: false,
      baseGuesses: 514,
      uppercaseVariations: 1,
      l33tVariations: 1,
      guesses: 514,
      guessesLog10: 2.7109631189952754
    }
  ],
  crackTimesSeconds: {
    onlineThrottling100PerHour: 360000000000000000,
    onlineNoThrottling10PerSecond: 1000000000000000,
    offlineSlowHashing1e4PerSecond: 1000000000000,
    offlineFastHashing1e10PerSecond: 1000000
  },
  crackTimesDisplay: {
    onlineThrottling100PerHour: 'centuries',
    onlineNoThrottling10PerSecond: 'centuries',
    offlineSlowHashing1e4PerSecond: 'centuries',
    offlineFastHashing1e10PerSecond: '12 days'
  },
  score: 4,
  feedback: { warning: null, suggestions: [] }
}
```

So the score is definitely ballooning. So it seems like, at least for now, I will just match separators and remove the changes in the bruteforce / repeater matchers so the algorithm can make whatever guess it wants (but this may not use separators as often as a result).
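The `crackTimesSeconds` values in the outputs posted in this thread are consistent with dividing `guesses` by the attack rate named in each field. A minimal sketch of that arithmetic, assuming the rates can be read straight off the field names; `ATTACK_RATES` and `estimateCrackTimesSeconds` are hypothetical names and this is not the library's code.

```typescript
// Hypothetical reconstruction: attack rates are inferred from the field names only.
interface AttackRate {
  scenario: string
  guesses: number // guesses the attacker can make...
  perSeconds: number // ...within this many seconds
}

const ATTACK_RATES: AttackRate[] = [
  { scenario: 'onlineThrottling100PerHour', guesses: 100, perSeconds: 3600 },
  { scenario: 'onlineNoThrottling10PerSecond', guesses: 10, perSeconds: 1 },
  { scenario: 'offlineSlowHashing1e4PerSecond', guesses: 1e4, perSeconds: 1 },
  { scenario: 'offlineFastHashing1e10PerSecond', guesses: 1e10, perSeconds: 1 },
]

function estimateCrackTimesSeconds(guesses: number): Record<string, number> {
  const result: Record<string, number> = {}
  for (const rate of ATTACK_RATES) {
    // seconds = guesses needed / (guesses per second)
    result[rate.scenario] = (guesses * rate.perSeconds) / rate.guesses
  }
  return result
}

// guesses: 130000000 reproduces 4680000000, 13000000, 13000 and 0.013.
console.log(estimateCrackTimesSeconds(130000000))
// guesses: 10000000000000000 reproduces the figures with the repeat matcher off.
console.log(estimateCrackTimesSeconds(10000000000000000))
```

Under this reading, the jump in `guesses` from 130000000 to 10000000000000000 is what moves the offline fast-hashing estimate from 0.013 seconds to 1000000 seconds, displayed as '12 days'.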


MrWook added the bug label May 6, 2023

domosapien mentioned this issue May 7, 2023

Separator matching cleanup #195

Merged

MrWook linked a pull request May 7, 2023 that will close this issue

Separator matching cleanup #195

Merged

MrWook closed this as completed May 7, 2023
