# Comparing encrypted IBAN names

When a transfer is made from Bank A to Bank B, Bank B is obliged to check that the IBAN and the name of the recipient match. This check is essential to combat fraud (a fraudster impersonating someone else) and to avoid misdirected payments. Bank B would usually not reject a transfer if the name is close to, but does not exactly match, the recipient's actual name. This tolerance leaves room for small spelling mistakes, which matters given the impact of a rejected transfer (days or weeks of delay that can harm a business or a buyer, extra costs to handle the error, …). It is therefore important for Bank A to pre-check the name and warn the sender that it likely does not match, before initiating the transfer. For privacy reasons, however, it is better to perform this pre-check over encrypted names.

In this short tutorial, we show how to use our TFHE Levenshtein distance computations to perform such a privacy-preserving check, simply and directly in Python. The tutorial is easy to configure, for example to change the way strings are normalized before encryption and comparison.

## Importing our FHE Levenshtein computations

You can have a look at this file to see how the FHE computations are handled.

In [1]:
from levenshtein_distance import Alphabet, LevenshteinDistance
from time import time

## Define the comparison functions

The FHE computation happens in `calculate` when `fhe_or_simulate` is set to `"fhe"`.

In [2]:
def normalized_string(st):
    """Normalize a string so that the distance between non-normalized strings such as
    'John Doe' and 'doe john' is small. This function can be configured depending
    on the needs.
    """

    # Force lower case
    st = st.lower()

    # Replace - and . by spaces
    st = st.replace("-", " ")
    st = st.replace(".", " ")

    # Sort the words and join
    words = st.split()
    st = "".join(sorted(words))

    return st


# N802 is for names in capital, like IBAN
def compare_IBAN_names(string0: str, string1: str, fhe_or_simulate: str):  # noqa: N802
    """Compare two IBAN names: first, normalize the strings, then compute in FHE (look in
    calculate for FHE details)."""
    # Normalize strings
    string0 = normalized_string(string0)
    string1 = normalized_string(string1)
    max_string_length = max(len(string0), len(string1))

    alphabet = Alphabet.init_by_name("name")
    levenshtein_distance = LevenshteinDistance(
        alphabet, max_string_length, show_mlir=False, show_optimizer=False
    )
    time_begin = time()
    distance = levenshtein_distance.calculate(string0, string1, mode=fhe_or_simulate)
    time_end = time()

    # Turn the edit distance into a similarity in [0, 1], reusing the length computed above
    similarity = (max_string_length - distance) / max_string_length

    print(
        f"Similarity between the two strings is {similarity:.4f}, "
        f"computed in {time_end - time_begin: .2f} seconds"
    )
    return similarity

Set this option to "fhe" to run the computations in FHE. If you set it to "simulate", only a simulation is run, which is sufficient to debug what happens but should not be used in production settings. Note that computations in FHE can take a long time, especially if the strings are long: each comparison in this tutorial took roughly 130 to 170 seconds.

In [3]:
fhe_or_simulate = "fhe"
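
Because each FHE run is slow, it can help to know in advance which similarity values to expect. The sketch below is not part of the `levenshtein_distance` module: it is a plain-Python, cleartext re-implementation of the same normalization and similarity formula, with the illustrative names `normalize`, `levenshtein` and `clear_similarity`. It assumes that `calculate` returns the standard Levenshtein distance.

```python
def normalize(name: str) -> str:
    """Mirror `normalized_string`: lower-case, map '-' and '.' to spaces, sort words, join."""
    name = name.lower().replace("-", " ").replace(".", " ")
    return "".join(sorted(name.split()))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, computed in the clear."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def clear_similarity(name0: str, name1: str) -> float:
    """Cleartext counterpart of the similarity formula used in this tutorial."""
    s0, s1 = normalize(name0), normalize(name1)
    max_len = max(len(s0), len(s1))
    if max_len == 0:
        return 1.0  # assumption: two empty names count as identical
    return (max_len - levenshtein(s0, s1)) / max_len


print(f"{clear_similarity('John Doe', 'John Do'):.4f}")  # 0.8571
```

Such a cleartext check is only a debugging aid: it sees both names in the clear, so it does not replace the encrypted comparison.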

## Make a few comparisons in a private setting

First, with equal strings, the match is perfect.

In [4]:
string0 = "John Doe"
string1 = "John Doe"

assert compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate) == 1.0

Similarity between the two strings is 1.0000, computed in  149.59 seconds


With reversed names, the match is also perfect, thanks to our definition of `normalized_string`. If this property is not desired, the function can be changed.

In [5]:
string0 = "John Doe"
string1 = "Doe John"

assert compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate) == 1.0

Similarity between the two strings is 1.0000, computed in  154.02 seconds


With a typo, similarity is smaller, but still quite high.

In [6]:
string0 = "John Doe"
string1 = "John Do"

assert round(compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate), 2) == 0.86

Similarity between the two strings is 0.8571, computed in  133.71 seconds


With an added letter, it is also high.

In [7]:
string0 = "John Doe"
string1 = "John W Doe"

assert round(compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate), 2) == 0.88

Similarity between the two strings is 0.8750, computed in  166.83 seconds


With the way we have normalized strings, we consider '-' and ' ' as equal.

In [8]:
string0 = "John Doe"
string1 = "John-Doe"

assert round(compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate), 2) == 1.0

Similarity between the two strings is 1.0000, computed in  150.00 seconds


Finally, with totally different names, we can see a very low similarity.

In [9]:
string0 = "John Doe"
string1 = "Gill Cot"

assert round(compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate), 2) == 0.14

Similarity between the two strings is 0.1429, computed in  148.66 seconds
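
To turn such similarity values into the warning that Bank A shows the sender, one option is a simple threshold rule. The sketch below is only an illustration: the function name `precheck_decision`, the labels and both thresholds are hypothetical and would need to be chosen by each bank.

```python
def precheck_decision(similarity: float, accept_at: float = 0.95, warn_at: float = 0.80) -> str:
    """Map a similarity in [0, 1] to a pre-check outcome (thresholds are hypothetical)."""
    if similarity >= accept_at:
        return "match"
    if similarity >= warn_at:
        return "close match"  # e.g. ask the sender to double-check the name
    return "no match"


for value in (1.0, 0.8571, 0.1429):
    print(value, "->", precheck_decision(value))
```

With these example thresholds, the identical and reordered names are accepted, the names with a missing or added letter are flagged as close matches, and the unrelated name is reported as not matching.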


Note that, because we sort words in `normalized_string`, a typo in the first letter of a word can have a large impact: it can change the word order, so that 'John Poe' normalizes to 'johnpoe' while 'John Doe' normalizes to 'doejohn'. It is not obvious how to find a function that accepts word reordering while not being too affected by mistakes in the first letters of words. Banks can make these choices according to their preferences.

In [10]:
# One typo in the first letter
string0 = "John Doe"
string1 = "John Poe"

assert round(compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate), 2) == 0.14

# One typo in the last letter
string0 = "John Doe"
string1 = "John Doy"

assert round(compare_IBAN_names(string0, string1, fhe_or_simulate=fhe_or_simulate), 2) == 0.86

Similarity between the two strings is 0.1429, computed in  155.03 seconds
Similarity between the two strings is 0.8571, computed in  148.72 seconds
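
One possible mitigation, shown here in cleartext only, is to compare the names twice, once with the original word order and once with sorted words, and keep the higher similarity. This is a sketch under assumptions, not the tutorial's method: `edit_distance`, `similarity`, `clean_words` and `best_similarity` are illustrative names, and in FHE this approach would require two distance computations per comparison.

```python
def edit_distance(a: str, b: str) -> int:
    """Cleartext Levenshtein distance (dynamic programming, one row at a time)."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        new_row = [i]
        for j, cb in enumerate(b, start=1):
            new_row.append(min(row[j] + 1, new_row[j - 1] + 1, row[j - 1] + (ca != cb)))
        row = new_row
    return row[-1]


def similarity(s0: str, s1: str) -> float:
    """Same formula as in this tutorial: (max_len - distance) / max_len."""
    max_len = max(len(s0), len(s1))
    return 1.0 if max_len == 0 else (max_len - edit_distance(s0, s1)) / max_len


def clean_words(name: str) -> list:
    """Lower-case, map '-' and '.' to spaces, and split into words (no sorting)."""
    return name.lower().replace("-", " ").replace(".", " ").split()


def best_similarity(name0: str, name1: str) -> float:
    """Keep the better of the in-order and the sorted-words comparisons."""
    words0, words1 = clean_words(name0), clean_words(name1)
    in_order = similarity("".join(words0), "".join(words1))
    reordered = similarity("".join(sorted(words0)), "".join(sorted(words1)))
    return max(in_order, reordered)


print(f"{best_similarity('John Doe', 'John Poe'):.4f}")  # 0.8571: first-letter typo
print(f"{best_similarity('John Doe', 'Doe John'):.4f}")  # 1.0000: reordered words
```

This variant still scores low when a name is both reordered and has a first-letter typo (for example 'Doe John' against 'John Poe'), so it narrows the problem rather than solving it.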


## Conclusion

We have shown how to use Levenshtein distances in FHE to perform IBAN name checks in a private way. Since the code is open-source and written in Python, it is easy for developers to modify it and fine-tune it to their specific needs, e.g. in terms of string normalization.