### Analyze computational cost of 4 algorithms for detecting anagrams

<font size = "4">

An "anagram" is a permutation of a string. For example, "ytonhp" is an anagram of "python".

<font size = "4">

**Algorithm 1**

- Check lengths of `s1` and `s2`. If they have different lengths, they are not anagrams of each other. Terminate the algorithm.

- Convert `s2` to a list of single characters

- For character 0 in `s1`, loop over the list and compare each item to see if it matches character 0. If no match is found, we conclude that the strings are not anagrams of each other, and terminate the algorithm.

- If we do find a match, replace that item in the list with `None`. This is so we don't "double count" any matches.

- If we found a match, repeat for character 1 in `s1`. If no match is found, terminate. Otherwise repeat for character 2 in `s1`, and so on until we reach the final character.

- If a match was found every time (i.e. we never terminated), then they are anagrams and we return `True`. Otherwise we return `False`.

In [3]:
def anagram_solution_1(s1, s2):
    still_ok = True
    if len(s1) != len(s2):
        still_ok = False

    a_list = list(s2)
    pos_1 = 0

    while pos_1 < len(s1) and still_ok:
        pos_2 = 0
        found = False
        while pos_2 < len(a_list) and not found:
            if s1[pos_1] == a_list[pos_2]:
                found = True
            else:
                pos_2 = pos_2 + 1
        if found:
            a_list[pos_2] = None
        else:
            still_ok = False
        pos_1 = pos_1 + 1

    return still_ok

In [4]:
is_anagram = anagram_solution_1("apple", "pleap")
print(is_anagram)

True


<font size = "4">

**Computational cost**: Think about how many times we "visit" an entry in the second list. We'll consider the case above where `s1` is "apple" and `s2` is "pleap".

1. For "a", we will visit 4 elements before finding the matching "a" (and replace it with `None`).

2. For the first "p", we will visit 1 element.

3. For the second "p" we will visit 5 elements.

4. For the "l" we will visit 2 elements.

5. For the "e" we will visit 3 elements.

Total number of "visits" is $4 + 1 + 5 + 2 + 3 = 1 + 2 + 3 + 4 + 5 = \frac{5\times 6}{2} = 15$.

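In general, when the two strings are anagrams with $n$ characters, each position in the list is matched exactly once, and matching the item at position $k$ (counting from 1) takes $k$ visits. The total number of visits is therefore

$$1 + 2 + 3 + \cdots + n = \sum_{i=1}^n i = \frac{n(n+1)}{2}$$

So the cost is proportional to $\frac{1}{2}n^2 + \frac{1}{2}n = \mathcal{O}(n^2)$.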
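The visit count can be checked empirically. The sketch below is a minimal illustration, not part of the original notebook: `count_visits` is a hypothetical helper that repeats the Algorithm 1 search and adds a counter for every list item inspected.

```python
def count_visits(s1, s2):
    """Run the Algorithm 1 search and count visits to list items.

    Returns (is_anagram, visits). Hypothetical helper, for illustration only.
    """
    if len(s1) != len(s2):
        return False, 0

    a_list = list(s2)
    visits = 0
    for ch in s1:
        found = False
        for pos, item in enumerate(a_list):
            visits += 1  # one "visit" per list item inspected
            if item == ch:
                a_list[pos] = None  # mark as used so it is not matched twice
                found = True
                break
        if not found:
            return False, visits
    return True, visits


print(count_visits("apple", "pleap"))  # (True, 15)

n = 8
print(count_visits("abcdefgh", "hgfedcba")[1], n * (n + 1) // 2)  # 36 36
```

For any pair of anagrams the count should equal $\frac{n(n+1)}{2}$; for non-anagrams the search stops early, so the count is smaller.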

<font size = "4">

**Algorithm 2**

- Check lengths of `s1` and `s2`. If they have different lengths, they are not anagrams of each other. Terminate the algorithm.

- Convert both strings to lists.

- Sort both lists alphabetically.

- Loop over the indices and compare the corresponding elements of each list. If they are equal for each index, return `True`. Otherwise, return `False`.

In [7]:
def anagram_solution_2(s1, s2):

    if len(s1) != len(s2):
        return False

    a_list_1 = list(s1)
    a_list_2 = list(s2)

    a_list_1.sort()
    a_list_2.sort()

    pos = 0
    matches = True

    while pos < len(s1) and matches:
        if a_list_1[pos] == a_list_2[pos]:
            pos = pos + 1
        else:
            matches = False

    return matches


print(anagram_solution_2("apple", "pleap"))  # expected: True
print(anagram_solution_2("abcd", "dcba"))  # expected: True
print(anagram_solution_2("abcd", "dcda"))  # expected: False
print(anagram_solution_2("apple", "pleapz")) # expected: False

True
True
False
False


**Computational cost**: The `while` loop has $\mathcal{O}(n)$ cost, but that ignores the cost of sorting. Depending on the algorithm used, the cost of sorting could be $\mathcal{O}(n\log n)$ or $\mathcal{O}(n^2)$. Python's built-in `list.sort` runs in $\mathcal{O}(n\log n)$ in the worst case.

Thus, the cost of the algorithm has the "same order" as the cost of the sorting algorithm.
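Since sorting dominates the cost, the explicit comparison loop can be replaced by a single list comparison. The following is a compact sketch of the same sort-and-compare idea; `anagram_sorted` is a hypothetical name, not a function from the notebook.

```python
def anagram_sorted(s1, s2):
    # Hypothetical compact variant of Algorithm 2 (sort, then compare).
    if len(s1) != len(s2):
        return False
    # sorted() returns a new sorted list of characters;
    # list equality then compares the two lists element by element.
    return sorted(s1) == sorted(s2)


print(anagram_sorted("apple", "pleap"))   # True
print(anagram_sorted("abcd", "dcda"))     # False
print(anagram_sorted("apple", "pleapz"))  # False
```

The two `sorted` calls cost $\mathcal{O}(n\log n)$ each and the comparison costs $\mathcal{O}(n)$, so the overall order is unchanged.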

<font size = "4">

**Algorithm 3 (Brute Force)**

- Check lengths of `s1` and `s2`. If they have different lengths, they are not anagrams of each other. Terminate the algorithm.

- Loop over every possible permutation of the characters making up `s1`. For each permutation, check if it coincides with `s2`. 

- If a match is found for a permutation, terminate the algorithm, returning `True`.

- If we exhaust all permutations without finding a match, return `False`.

<font size = "4">

**Computational cost**

- If there are $n$ characters in a string, then there are $n!$ different permutations.

- The cost will be $\mathcal{O}(n!)$, which grows faster than even $2^n$.

- For strings with $n = 20$ characters, there are $20! \approx 2.4 \times 10^{18}$ permutations. Generating them at a rate of one per second would take about 77 billion years.

- We won't even implement this one...
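For very short strings the brute-force idea can still be tried out. The sketch below is only an illustration: `anagram_brute_force` is a hypothetical name, it relies on `itertools.permutations` from the standard library, and the time estimate assumes a rate of one permutation per second and a 365-day year.

```python
import math
from itertools import permutations


def anagram_brute_force(s1, s2):
    """Hypothetical sketch of Algorithm 3; only practical for very short strings."""
    if len(s1) != len(s2):
        return False
    # permutations(s1) yields every ordering of the characters as a tuple
    for perm in permutations(s1):
        if "".join(perm) == s2:
            return True
    return False


print(anagram_brute_force("abcd", "dcba"))  # True
print(anagram_brute_force("abcd", "dcda"))  # False

# Rough estimate for n = 20, ASSUMING one permutation per second
seconds_per_year = 365 * 24 * 60 * 60
years = math.factorial(20) / seconds_per_year
print(f"{years:.3g} years")  # on the order of 77 billion years
```

Even at a much faster rate than one permutation per second, the factorial growth makes this approach unusable beyond small $n$.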

<font size = "4">

**Algorithm 4 (counting letters)**

- Create two lists, `c1` and `c2`, that consist of 26 zeros, one for each letter of the alphabet. (This assumes the strings contain only the lowercase letters "a" to "z".)

- Loop over the characters of `s1`. For each character, determine which letter it is. Increment the value of `c1[ind]` by $+1$, where `ind` is the index corresponding to that letter.

- Repeat this process for `s2`, using the list `c2` to keep count of the appearance of letters.

- Loop over the indices $0, 1, 2, \dots, 25$ and compare the corresponding elements of `c1` and `c2`. If the values match for all 26, return `True`. If we find an index where the counts don't match, terminate and return `False`.

In [8]:
def anagram_solution_4(s1, s2):
    c1 = [0] * 26
    c2 = [0] * 26

    for i in range(len(s1)):
        # ord returns the Unicode code point of a character;
        # subtracting ord("a") maps "a".."z" to the indices 0..25
        pos = ord(s1[i]) - ord("a")
        c1[pos] = c1[pos] + 1

    for i in range(len(s2)):
        pos = ord(s2[i]) - ord("a")
        c2[pos] = c2[pos] + 1

    j = 0
    still_ok = True
    while j < 26 and still_ok:
        if c1[j] == c2[j]:
            j = j + 1
        else:
            still_ok = False

    return still_ok

In [9]:
print(anagram_solution_4("apple", "pleap"))  # expected: True
print(anagram_solution_4("abcd", "dcba"))  # expected: True
print(anagram_solution_4("abcd", "dcda"))  # expected: False
print(anagram_solution_4("apple", "pleapz"))  # expected: False

True
True
False
False


<font size = "4">

**Computational cost**

- The last loop takes $26$ steps. The first two loops take $n$ steps each. Total cost:

$$\mathcal{O}(2n + 26) = \mathcal{O}(n)$$

- This is the fastest of the four algorithms!

- We have introduced **space** requirements: two extra lists of size 26. This is a trade-off between using more time and using more space.
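The fixed 26-entry lists only work for the lowercase letters "a" to "z". As a hedged alternative, the same counting idea can be sketched with `collections.Counter` from the standard library, which stores one count per distinct character. `anagram_counter` is a hypothetical name used here for illustration.

```python
from collections import Counter


def anagram_counter(s1, s2):
    """Hypothetical variant of Algorithm 4 using a dictionary of counts."""
    if len(s1) != len(s2):
        return False
    # Counter(s) maps each distinct character of s to its number of occurrences
    return Counter(s1) == Counter(s2)


print(anagram_counter("apple", "pleap"))  # True
print(anagram_counter("abcd", "dcda"))    # False
print(anagram_counter("Apple", "pleAp"))  # True (uppercase is handled too)
```

Building each `Counter` takes $n$ steps, so the cost stays $\mathcal{O}(n)$, while the extra space grows with the number of distinct characters instead of being fixed at 26.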