
docs(frontend): adding a use-case for Levenshtein distance #902

Open
wants to merge 15 commits into
base: main
from

Conversation

bcm-at-zama
Contributor

Adding a use-case for Levenshtein distance

closes https://github.com/zama-ai/concrete-internal/issues/750

@cla-bot cla-bot bot added the cla-signed label Jun 19, 2024
@bcm-at-zama bcm-at-zama marked this pull request as draft June 19, 2024 12:41
@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch 3 times, most recently from f61fa55 to 122de8c Compare June 19, 2024 13:49
@bcm-at-zama
Contributor Author

Works quite well but fails from time to time:

Computations in FHE

    Computing Levenshtein between strings '' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'a' - OK in 0.01 seconds
    Computing Levenshtein between strings 'b' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'a' and 'a' - OK in 10.76 seconds
    Computing Levenshtein between strings 'a' and 'b' - OK in 11.03 seconds
    Computing Levenshtein between strings '' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'a' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'tv' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'gag' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'wywd' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'oezbr' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'ctuugd' - OK in 0.01 seconds
    Computing Levenshtein between strings 'n' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'o' and 'g' - OK in 11.78 seconds
    Computing Levenshtein between strings 'p' and 'sl' - OK in 21.41 seconds
    Computing Levenshtein between strings 'r' and 'qbd' - OK in 30.90 seconds
    Computing Levenshtein between strings 't' and 'vbej' - OK in 43.54 seconds
    Computing Levenshtein between strings 'b' and 'srvxs' - OK in 52.11 seconds

    Computing Levenshtein between strings 'n' and 'fkuftz' - OK in 63.67 seconds
    Computing Levenshtein between strings 'ey' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'kd' and 'f' - OK in 20.81 seconds
    Computing Levenshtein between strings 'fv' and 'xh' - OK in 60.51 seconds
    Computing Levenshtein between strings 'mv' and 'dnr' - OK in 120.93 seconds
    Computing Levenshtein between strings 'db' and 'msvl' - OK in 199.30 seconds

    Computing Levenshtein between strings 'hn' and 'whoql'
Traceback (most recent call last):
  File "frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py", line 189, in <module>
    assert l1_fhe == l1_clear, f"    {l1_fhe=} and {l1_clear=} are different"
AssertionError:     l1_fhe=5 and l1_clear=4 are different
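
For reference, the clear-text value in the assertion can be cross-checked with a plain dynamic-programming Levenshtein implementation. This is a minimal sketch, not the code used in the example script; the function name `levenshtein_clear` is hypothetical.

```python
def levenshtein_clear(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row in the clear."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


print(levenshtein_clear("hn", "whoql"))
```

For 'hn' and 'whoql' this gives 4, which agrees with `l1_clear` in the traceback, so the mismatch is on the FHE side.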

I have p_error=10**-8 which is already quite low
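
To put that value in perspective, here is a rough estimate. It assumes that `p_error` is the failure probability of a single table lookup and that lookups fail independently (both are assumptions, not confirmed in this thread); the lookup counts are hypothetical.

```python
import math


def failure_probability(p_error: float, n_lookups: int) -> float:
    """Probability that at least one of n_lookups independent lookups fails."""
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_lookups * math.log1p(-p_error))


# Hypothetical lookup counts; the real count depends on the compiled circuit.
for n_lookups in (100, 10_000, 1_000_000):
    print(n_lookups, failure_probability(1e-8, n_lookups))
```

Under these assumptions even a million lookups would fail in roughly 1% of runs, so failures seen after a few dozen runs may have another cause.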

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch 2 times, most recently from c9b3afb to d6720e5 Compare June 20, 2024 09:04
@bcm-at-zama
Contributor Author

Hey @umut-sahin, would you have a look to see if we can make it faster, please?

@bcm-at-zama bcm-at-zama force-pushed the levenshtein_distance_750 branch 3 times, most recently from 0ee22ef to e2dceee Compare June 28, 2024 16:39
@bcm-at-zama
Contributor Author

I have now added support for alphabets: e.g., one can run

python frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py --autotest --alphabet ACTG --show_mlir

and you'll see that it generates random strings of A, C, T and G.

@bcm-at-zama
Contributor Author

It's very easy to add a new alphabet
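
As an illustration of what "adding an alphabet" could look like, here is a minimal sketch. The `ALPHABETS` table and `random_string` helper are hypothetical, and the letter sets are guessed from the alphabet names and the sample strings in this thread; the example script may be organised differently.

```python
import random
import string

# Hypothetical registry: alphabet name -> allowed letters.
ALPHABETS = {
    "ACTG": "ACTG",
    "string": string.ascii_lowercase,
    "STRING": string.ascii_uppercase,
    "StRiNg": string.ascii_letters,
}


def random_string(alphabet_name: str, max_length: int, rng: random.Random) -> str:
    """Draw a random string of length 0..max_length from the named alphabet."""
    letters = ALPHABETS[alphabet_name]
    length = rng.randint(0, max_length)
    return "".join(rng.choice(letters) for _ in range(length))


# Adding a new alphabet is then a single entry (hypothetical example).
ALPHABETS["digits"] = string.digits

rng = random.Random(0)
print(random_string("ACTG", 4, rng), random_string("digits", 4, rng))
```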

@bcm-at-zama
Contributor Author

And there is a new option to measure performance on all alphabets:

python frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py --autoperf

Typical performances for alphabet ACTG, with string of maximal length:

    Computing Levenshtein between strings 'AACG' and 'GATT' - OK in 5.24 seconds
    Computing Levenshtein between strings 'ACCG' and 'GTGG' - OK in 4.71 seconds
    Computing Levenshtein between strings 'TTTC' and 'TTTA' - OK in 4.87 seconds

Typical performances for alphabet string, with string of maximal length:

    Computing Levenshtein between strings 'skon' and 'iisi' - OK in 14.67 seconds
    Computing Levenshtein between strings 'qukm' and 'vufu' - OK in 15.01 seconds
    Computing Levenshtein between strings 'afoe' and 'kbwh' - OK in 14.62 seconds

Typical performances for alphabet STRING, with string of maximal length:

    Computing Levenshtein between strings 'RPJX' and 'MLZU' - OK in 14.33 seconds
    Computing Levenshtein between strings 'GYAQ' and 'IEWC' - OK in 13.73 seconds
    Computing Levenshtein between strings 'GEUC' and 'CLUI' - OK in 15.11 seconds

Typical performances for alphabet StRiNg, with string of maximal length:

    Computing Levenshtein between strings 'oXcM' and 'Igjh' - OK in 30.11 seconds
    Computing Levenshtein between strings 'pgBk' and 'GuOp' - OK in 28.20 seconds
    Computing Levenshtein between strings 'jScn' and 'yRRN' - OK in 30.81 seconds

Successful end
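
One way to read these timings is through the number of bits needed to encode one symbol of each alphabet. The sketch below assumes alphabet sizes of 4, 26, 26 and 52 (inferred from the sample strings) and a plain index encoding; whether the script encodes symbols this way is an assumption.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest bit width able to hold every index in 0..alphabet_size-1."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    return max(1, (alphabet_size - 1).bit_length())


# Assumed sizes: ACTG has 4 letters, one case has 26, both cases have 52.
for name, size in [("ACTG", 4), ("string", 26), ("STRING", 26), ("StRiNg", 52)]:
    print(f"{name}: {size} symbols, {bits_per_symbol(size)} bits per symbol")
```

Under these assumptions ACTG needs 2 bits, the single-case alphabets 5 bits and the mixed-case alphabet 6 bits, which follows the same ordering as the measured times.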

@bcm-at-zama bcm-at-zama changed the title docs(frontend): adding a use-case for Levenshtein distance [WIP] docs(frontend): adding a use-case for Levenshtein distance Jun 28, 2024
@bcm-at-zama
Contributor Author

What do you think, guys? Once you are happy with the code, I'll add a small .md and measure it on AWS.

@bcm-at-zama bcm-at-zama marked this pull request as ready for review June 28, 2024 17:07
@bcm-at-zama
Contributor Author

No longer blocked by 794

print("")

if args.autoperf:
    for alphabet in ["ACTG", "string", "STRING", "StRiNg"]:
Contributor

I'd really prefer to have a class for the alphabet with encode, decode, check, ... methods. It'd make the example much clearer IMO.

Also you would be able to do:

for alphabet in [Alphabet.lowercase(), Alphabet.uppercase(), Alphabet.dna(), ...]:

Which is less error-prone and clearer IMO.
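
A minimal sketch of what such a class could look like. Only the names `Alphabet`, `encode`, `decode`, `check`, `lowercase`, `uppercase` and `dna` come from the comment above; the fields, signatures and index-based encoding are assumptions made for illustration.

```python
import string
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alphabet:
    """An ordered set of letters; each letter is encoded as its index."""

    letters: str

    def __post_init__(self) -> None:
        if len(set(self.letters)) != len(self.letters):
            raise ValueError("alphabet letters must be unique")

    @staticmethod
    def lowercase() -> "Alphabet":
        return Alphabet(string.ascii_lowercase)

    @staticmethod
    def uppercase() -> "Alphabet":
        return Alphabet(string.ascii_uppercase)

    @staticmethod
    def dna() -> "Alphabet":
        return Alphabet("ACTG")

    def check(self, text: str) -> bool:
        """Return True if every character of text belongs to the alphabet."""
        return all(char in self.letters for char in text)

    def encode(self, text: str) -> List[int]:
        """Map each character to its index; reject foreign characters."""
        if not self.check(text):
            raise ValueError(f"{text!r} contains letters outside the alphabet")
        return [self.letters.index(char) for char in text]

    def decode(self, codes: List[int]) -> str:
        """Inverse of encode."""
        return "".join(self.letters[code] for code in codes)


for alphabet in [Alphabet.lowercase(), Alphabet.uppercase(), Alphabet.dna()]:
    print(len(alphabet.letters), alphabet.check("GATT"))
```

The factory methods replace the string names, so a typo in an alphabet name becomes an `AttributeError` at the call site rather than a silent wrong lookup.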

Contributor Author

I'm not a big fan of OOP, especially when, like here, it doesn't seem needed. But if you really want it, I'll do it. (And for sure, I don't have the right habits for OOP, so I am going to write something non-idiomatic; you'll correct me.)

Contributor

I don't think this is OOP. Sure it has a bit of encapsulation and maybe some abstraction, but there is no inheritance or polymorphism. It's just a way to make alphabets more type safe and understandable.

It would be better in my view; wdyt @BourgerieQuentin?

Member

@BourgerieQuentin BourgerieQuentin left a comment

Minor comment; feel free to include it or not. Thanks.


4 participants