
Specially crafted string on normalize function returns an abnormally long list #32

Closed
akari-dogman opened this issue Jan 10, 2022 · 1 comment

Comments

@akari-dogman

If you run normalize on this string:

abing🪀|C-01 |🍏inv100+ Lv16推赞1300

then it will return an EXTREMELY long list (3,981,312 entries).

It must be getting caught on something, because a 3+ million entry list from normalizing a single string is absurd.

To reproduce:

import confusables

foo = "abing🪀|C-01 |🍏inv100+ Lv16推赞1300"
x = confusables.normalize(foo)

print(len(x))
print(x)

ThioJoe commented Jan 10, 2022

I mean, when you print out the number of confusable characters for each character in the string, you get the following:

a : 115
b : 63
i : 155
n : 72
g : 57
🪀 : 1
| : 141
C : 65
- : 13
0 : 193
1 : 141
  : 18
🍏 : 1
v : 64
+ : 4
L : 147
6 : 17
推 : 1
赞 : 1
3 : 127
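
For reference, here's a quick sketch of how you could print those counts yourself with the library's confusable_characters function (the dedup over repeated characters is my own addition):

import confusables

foo = "abing🪀|C-01 |🍏inv100+ Lv16推赞1300"

seen = set()
for c in foo:
    if c not in seen:  # report each unique character once
        seen.add(c)
        print(c, ":", len(confusables.confusable_characters(c)))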

So if it is trying to come up with all the different combinations of confusables, yeah, it could easily get to 3 million depending on how it does it. Though the prioritize_alpha option doesn't seem to make a difference here, which it should.
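
Just to show the math: if every character could independently be swapped for any of its confusables, the number of combinations is the product of the per-character counts, which blows past 3 million almost immediately. This is hypothetical arithmetic on the counts above, not the library's actual algorithm:

from math import prod

# Per-character confusable counts from the table above
counts = [115, 63, 155, 72, 57, 1, 141, 65, 13, 193,
          141, 18, 1, 64, 4, 147, 17, 1, 1, 127]

# Even the first three characters alone allow over a million combinations:
print(prod(counts[:3]))  # 115 * 63 * 155 = 1,122,975

# The uncapped upper bound over the whole string is astronomically larger,
# so normalize() presumably caps the expansion somewhere to land at ~3.9M.
print(prod(counts))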

If you happen to be trying to find words in strings by normalizing them first and then searching, the much better way seems to be to just use the confusable_regex function instead.
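
Something like this, for example (a sketch, assuming confusable_regex(word) returns a regex pattern string you can compile and search with):

import re
import confusables

# Build a regex matching confusable renderings of the target word,
# then search the raw string directly instead of normalizing it.
pattern = re.compile(confusables.confusable_regex("inv"))
text = "abing🪀|C-01 |🍏inv100+ Lv16推赞1300"
match = pattern.search(text)
print(match.group(0) if match else "no match")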
