docs(frontend): adding a use-case for Levenshtein distance #902

bcm-at-zama · 2024-06-19T12:41:04Z

Adding a use-case for Levenshtein distance

closes https://github.com/zama-ai/concrete-internal/issues/750

bcm-at-zama · 2024-06-19T13:53:09Z

Works quite well but fails from time to time:

Computations in FHE

    Computing Levenshtein between strings '' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'a' - OK in 0.01 seconds
    Computing Levenshtein between strings 'b' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'a' and 'a' - OK in 10.76 seconds
    Computing Levenshtein between strings 'a' and 'b' - OK in 11.03 seconds
    Computing Levenshtein between strings '' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'a' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'tv' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'gag' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'wywd' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'oezbr' - OK in 0.01 seconds
    Computing Levenshtein between strings '' and 'ctuugd' - OK in 0.01 seconds
    Computing Levenshtein between strings 'n' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'o' and 'g' - OK in 11.78 seconds
    Computing Levenshtein between strings 'p' and 'sl' - OK in 21.41 seconds
    Computing Levenshtein between strings 'r' and 'qbd' - OK in 30.90 seconds
    Computing Levenshtein between strings 't' and 'vbej' - OK in 43.54 seconds
    Computing Levenshtein between strings 'b' and 'srvxs' - OK in 52.11 seconds

    Computing Levenshtein between strings 'n' and 'fkuftz' - OK in 63.67 seconds
    Computing Levenshtein between strings 'ey' and '' - OK in 0.01 seconds
    Computing Levenshtein between strings 'kd' and 'f' - OK in 20.81 seconds
    Computing Levenshtein between strings 'fv' and 'xh' - OK in 60.51 seconds
    Computing Levenshtein between strings 'mv' and 'dnr' - OK in 120.93 seconds
    Computing Levenshtein between strings 'db' and 'msvl' - OK in 199.30 seconds

    Computing Levenshtein between strings 'hn' and 'whoql'
    Traceback (most recent call last):
      File "frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py", line 189, in <module>
        assert l1_fhe == l1_clear, f"    {l1_fhe=} and {l1_clear=} are different"
    AssertionError:     l1_fhe=5 and l1_clear=4 are different

I have p_error=10**-8, which is already quite low.
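For reference, the clear-text value in the assertion can be reproduced with a plain dynamic-programming Levenshtein. This is only a minimal sketch, not the code from the example script; the function name `levenshtein_clear` is hypothetical.

```python
def levenshtein_clear(a: str, b: str) -> int:
    """Two-row dynamic-programming Levenshtein distance on clear strings."""
    # previous[j] is the distance between the current prefix of `a` and b[:j];
    # it starts as the distance between the empty string and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# The failing pair from the log; this agrees with l1_clear=4 in the assertion.
print(levenshtein_clear("hn", "whoql"))  # prints 4
```

Comparing the FHE output against such a reference on many random pairs is one way to estimate how often the mismatch occurs.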

bcm-at-zama · 2024-06-19T14:02:18Z

Blocked by https://github.com/zama-ai/concrete-internal/issues/754, and a bit by https://github.com/zama-ai/concrete-internal/issues/752 and https://github.com/zama-ai/concrete-internal/issues/753

bcm-at-zama · 2024-06-21T16:55:32Z

Now:

https://github.com/zama-ai/concrete-internal/issues/754: no more semantic issue (with wires + @rudy-6-4 has a fix for the issue)
https://github.com/zama-ai/concrete-internal/issues/752: good now
https://github.com/zama-ai/concrete-internal/issues/753: closed

bcm-at-zama · 2024-06-24T08:39:51Z

Hey @umut-sahin, could you have a look to see if we can make it faster, please?

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py

bcm-at-zama · 2024-06-28T16:40:13Z

I have now added support for alphabets: e.g., one can run

python frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py --autotest --alphabet ACTG --show_mlir

and you'll see it generates random strings of A, C, T and G.
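As a rough illustration of what an alphabet mapping can look like, the sketch below maps each symbol to a small integer and draws random test strings from the alphabet. The helper names (`make_mapping`, `random_string`, `encode`) are hypothetical and are not taken from the example script.

```python
import random


def make_mapping(alphabet: str) -> dict:
    """Map each symbol of the alphabet to its index (a small integer)."""
    return {symbol: index for index, symbol in enumerate(alphabet)}


def random_string(alphabet: str, length: int, rng: random.Random) -> str:
    """Draw `length` symbols uniformly at random from the alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))


def encode(text: str, mapping: dict) -> list:
    """Turn a string into the list of integers that would be encrypted."""
    return [mapping[symbol] for symbol in text]


rng = random.Random(0)  # fixed seed so the run is reproducible
mapping = make_mapping("ACTG")
text = random_string("ACTG", 4, rng)
print(text, encode(text, mapping))
```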

bcm-at-zama · 2024-06-28T16:40:58Z

It's very easy to add a new alphabet.

bcm-at-zama · 2024-06-28T17:07:04Z

And there is a new option to run performance measurements on all alphabets:

python frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py --autoperf

Typical performances for alphabet ACTG, with string of maximal length:

    Computing Levenshtein between strings 'AACG' and 'GATT' - OK in 5.24 seconds
    Computing Levenshtein between strings 'ACCG' and 'GTGG' - OK in 4.71 seconds
    Computing Levenshtein between strings 'TTTC' and 'TTTA' - OK in 4.87 seconds

Typical performances for alphabet string, with string of maximal length:

    Computing Levenshtein between strings 'skon' and 'iisi' - OK in 14.67 seconds
    Computing Levenshtein between strings 'qukm' and 'vufu' - OK in 15.01 seconds
    Computing Levenshtein between strings 'afoe' and 'kbwh' - OK in 14.62 seconds

Typical performances for alphabet STRING, with string of maximal length:

    Computing Levenshtein between strings 'RPJX' and 'MLZU' - OK in 14.33 seconds
    Computing Levenshtein between strings 'GYAQ' and 'IEWC' - OK in 13.73 seconds
    Computing Levenshtein between strings 'GEUC' and 'CLUI' - OK in 15.11 seconds

Typical performances for alphabet StRiNg, with string of maximal length:

    Computing Levenshtein between strings 'oXcM' and 'Igjh' - OK in 30.11 seconds
    Computing Levenshtein between strings 'pgBk' and 'GuOp' - OK in 28.20 seconds
    Computing Levenshtein between strings 'jScn' and 'yRRN' - OK in 30.81 seconds

Successful end
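One possible factor behind the gap between alphabets is the number of bits needed to represent a symbol. This is an assumption, not something measured in this thread. The alphabet sizes below are also assumed from the sample strings (4 symbols for ACTG, 26 for lowercase or uppercase letters, 52 for mixed case).

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest bit width able to index every symbol of the alphabet."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    # Indices go from 0 to alphabet_size - 1; a one-symbol alphabet still uses 1 bit.
    return max(1, (alphabet_size - 1).bit_length())


# Assumed sizes, inferred from the sample strings in the output above.
assumed_sizes = {"ACTG": 4, "string": 26, "STRING": 26, "StRiNg": 52}
for name, size in assumed_sizes.items():
    print(f"{name}: {size} symbols, {bits_per_symbol(size)} bits per symbol")
```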

bcm-at-zama · 2024-06-28T17:07:52Z

What do you think, guys? Once you're happy with the code, I'll add a small .md and measure this on AWS.

bcm-at-zama · 2024-07-01T10:57:25Z

Blocked by https://github.com/zama-ai/concrete-internal/issues/794

closes #zama-ai/concrete-internal#750

closes #zama-ai/concrete-internal#755

and fixing a few issues closes #zama-ai/concrete-internal#753

bcm-at-zama · 2024-07-04T11:45:22Z

No longer blocked by 794.

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.md

umut-sahin · 2024-07-05T08:54:24Z

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py

+        print("")
+
+    if args.autoperf:
+        for alphabet in ["ACTG", "string", "STRING", "StRiNg"]:


I'd really prefer to have a class for the alphabet with encode, decode, check, ... methods. It'd make the example much clearer IMO.

Also you would be able to do:

for alphabet in [Alphabet.lowercase(), Alphabet.uppercase(), Alphabet.dna(), ...]:

which is less error-prone and clearer IMO.

I'm not a big fan of OOP, especially when, like here, it doesn't seem needed. But if you really want it, I'll do it. (And for sure, I don't have the right habits for OOP, so I'm going to write something non-idiomatic; you'll correct me.)

I don't think this is OOP. Sure it has a bit of encapsulation and maybe some abstraction, but there is no inheritance or polymorphism. It's just a way to make alphabets more type safe and understandable.
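To make the suggestion concrete, here is a minimal sketch of what such a class could look like. Only the method names `encode`, `decode`, `check` and the constructors `lowercase()`, `uppercase()`, `dna()` come from the comments above. Everything else (the signatures, the index-based integer mapping, the `letters` attribute) is an assumption, not the code that was merged.

```python
import string


class Alphabet:
    """Hypothetical sketch: a set of symbols with an integer encoding."""

    def __init__(self, letters: str):
        if len(set(letters)) != len(letters):
            raise ValueError("alphabet contains duplicate symbols")
        self.letters = letters
        self._index = {letter: i for i, letter in enumerate(letters)}

    @classmethod
    def lowercase(cls) -> "Alphabet":
        return cls(string.ascii_lowercase)

    @classmethod
    def uppercase(cls) -> "Alphabet":
        return cls(string.ascii_uppercase)

    @classmethod
    def dna(cls) -> "Alphabet":
        return cls("ACTG")

    def check(self, text: str) -> bool:
        """Return True if every character of `text` is in the alphabet."""
        return all(char in self._index for char in text)

    def encode(self, text: str) -> list:
        """Map a string to the list of symbol indices."""
        if not self.check(text):
            raise ValueError(f"{text!r} contains symbols outside the alphabet")
        return [self._index[char] for char in text]

    def decode(self, codes: list) -> str:
        """Inverse of `encode`."""
        return "".join(self.letters[code] for code in codes)


for alphabet in [Alphabet.lowercase(), Alphabet.uppercase(), Alphabet.dna()]:
    print(len(alphabet.letters), alphabet.letters[:4])
```

With named constructors, a typo in an alphabet name fails immediately as an `AttributeError` instead of silently selecting the wrong string.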

This would be better IMO, wdyt @BourgerieQuentin?

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.md

BourgerieQuentin

Minor comment; feel free to include it or not. Thanks.

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.md

cla-bot bot added the cla-signed label Jun 19, 2024

bcm-at-zama marked this pull request as draft June 19, 2024 12:41

bcm-at-zama force-pushed the levenshtein_distance_750 branch 3 times, most recently from f61fa55 to 122de8c Compare June 19, 2024 13:49

bcm-at-zama force-pushed the levenshtein_distance_750 branch 2 times, most recently from c9b3afb to d6720e5 Compare June 20, 2024 09:04

bcm-at-zama requested a review from umut-sahin June 24, 2024 08:39

umut-sahin requested changes Jun 24, 2024

bcm-at-zama force-pushed the levenshtein_distance_750 branch 3 times, most recently from 0ee22ef to e2dceee Compare June 28, 2024 16:39

bcm-at-zama changed the title ~~docs(frontend): adding a use-case for Levenshtein distance [WIP]~~ docs(frontend): adding a use-case for Levenshtein distance Jun 28, 2024

bcm-at-zama requested review from umut-sahin and aPere3 June 28, 2024 17:07

bcm-at-zama marked this pull request as ready for review June 28, 2024 17:07

bcm-at-zama force-pushed the levenshtein_distance_750 branch from 63c858e to 1247eda Compare July 1, 2024 10:47

bcm-at-zama added 4 commits July 1, 2024 13:01

docs(frontend): adding a use-case for Levenshtein distance

6a0ec92

closes #zama-ai/concrete-internal#750

docs(frontend): adding wires in the Levenshtein example

5381ce2

closes #zama-ai/concrete-internal#755

docs(frontend): using if_then_else, using keygen

ca27d83

and fixing a few issues closes #zama-ai/concrete-internal#753

docs(frontend): faster

5d1462d

bcm-at-zama added 7 commits July 1, 2024 13:01

docs(frontend): format

24d92f9

docs(frontend): review

a6b3290

docs(frontend): make functions and argparse

e3fff34

docs(frontend): arguments

38757c1

docs(frontend): adding alphabet

ecc4eeb

docs(frontend): adding perfs

6a6d5ee

docs(frontend): adding a --distance option

2771fed

bcm-at-zama force-pushed the levenshtein_distance_750 branch from 1247eda to 2771fed Compare July 1, 2024 11:01

bcm-at-zama added 2 commits July 1, 2024 13:02

docs(frontend): adding the tuto in the docs

5f93318

docs(frontend): check the input lengths

3c84a77

BourgerieQuentin reviewed Jul 4, 2024

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.py (outdated)

BourgerieQuentin requested changes Jul 4, 2024

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.md

umut-sahin reviewed Jul 5, 2024

docs(frontend): review

e2d1b47

bcm-at-zama commented Jul 5, 2024

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.md

docs(frontend): review

46f1cd5

BourgerieQuentin approved these changes Jul 5, 2024

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.md

frontends/concrete-python/examples/levenshtein_distance/levenshtein_distance.md
