Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2.

You have the following 3 operations permitted on a word:

- Insert a character
- Delete a character
- Replace a character

I had previously read the solution to this problem, but I worked out the answer below without referring to it.

Assuming the solution to this problem is similar to others like it, let's define D(i, j) as the minimum number of operations to convert word1[:i] to word2[:j].

While I think one often starts from the operations to derive the transitions between states, in this case it was easier for me to think about the states themselves and how one might move between adjacent ones.

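How can we go from i,j to i,j+1? The target prefix gains a letter, word2[j], so we need an extra letter. We can get it with an insertion, which costs one more operation.

How can we go from i,j to i+1,j? The source prefix gains a letter, word1[i], but the target prefix is unchanged, so we cannot actually use it. We have to delete that character, which costs one more operation.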

How can we go from i,j to i+1,j+1? Both segments have an extra letter, word1[i] and word2[j]. If they're the same then we don't do anything (the minimum number of operations remains the same). If they're different then we can use replace (the minimum number of operations increases by one).
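
The three transitions above can be written directly as a recurrence. The following is a minimal top-down sketch of it, not the solution developed below; the function name `edit_distance_top_down` is made up for illustration, and the recursion depth grows with len(word1) + len(word2), so it is only meant for short words.

```python
from functools import lru_cache


def edit_distance_top_down(word1: str, word2: str) -> int:
    """Minimum number of insert/delete/replace operations to turn word1 into word2."""

    @lru_cache(maxsize=None)
    def D(i: int, j: int) -> int:
        # Base cases: one prefix is empty, so only insertions or deletions remain.
        if i == 0:
            return j
        if j == 0:
            return i
        insertion = D(i, j - 1) + 1   # reach (i, j) from (i, j-1): insert word2[j-1]
        deletion = D(i - 1, j) + 1    # reach (i, j) from (i-1, j): delete word1[i-1]
        # Reach (i, j) from (i-1, j-1): free if the new letters match, else one replace.
        replacement = D(i - 1, j - 1) + (word1[i - 1] != word2[j - 1])
        return min(insertion, deletion, replacement)

    return D(len(word1), len(word2))
```

Calling `edit_distance_top_down("ros", "horsey")` evaluates D(3, 6), the same quantity the table-based solutions below return from their last cell.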

In [62]:
try:
    from helperfunctions import printmatrix  # optional local pretty-printer
except ImportError:
    pass

class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
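        m = len(word1)
        n = len(word2)
        # D[i][j] = minimum operations to convert word1[:i] to word2[:j].
        # Column 0: deleting all i characters of word1[:i] gives the empty string.
        D = [[i] + [0] * n for i in range(m + 1)]
        # Row 0: building word2[:j] from the empty string takes j insertions.
        D[0] = list(range(n + 1))

        for i in range(m):
            for j in range(n):
                min_by_insertion = D[i+1][j] + 1
                min_by_deletion = D[i][j+1] + 1
                min_by_replacement = D[i][j] + (0 if word1[i] == word2[j] else 1)
                D[i+1][j+1] = min(min_by_insertion, min_by_deletion, min_by_replacement)
                # Debug trace of the three candidates for each cell.
                print(i+1, j+1, min_by_insertion, min_by_deletion, min_by_replacement)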
        
        try:
            printmatrix(D)
        except NameError:
            # printmatrix was not imported; fall back to a plain print.
            print(D)
            
        return D[-1][-1]

As seen in my other solutions to DP problems, one can also store just two rows at a time instead of populating the entire (m+1) by (n+1) matrix.

In [68]:
class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        # D holds the previous row: D[j] = distance from word1[:i] to word2[:j].
        D = list(range(len(word2) + 1))

        for i, word1_char in enumerate(word1):
            # First entry of the new row: delete all i+1 characters of word1[:i+1].
            D_new = [i+1]
            for j, word2_char in enumerate(word2):
                min_by_insertion = D_new[j] + 1
                min_by_deletion = D[j+1] + 1
                min_by_replacement = D[j] + (1 if word1_char != word2_char else 0)
                D_new.append(min(min_by_insertion, min_by_deletion, min_by_replacement))
            D = D_new
            
        return D[-1]

# Comments

One can put an upper bound on the number of insertions, deletions, and replacements used by an optimal solution.

Without loss of generality, assume word1 is no longer than word2 (going from word1 to word2 takes the same number of steps as going from word2 to word1: just reverse the operations).

So if their lengths are m and n, respectively, then we could make n-m insertions and m replacements. So an upper bound for the minimum number of operations is n.

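Replacements are reduced either when letters line up thanks to the insertions or, in some cases, by using a delete-and-insert pair. An optimal solution with d deletions must make d + (n-m) insertions, so with r replacements its cost is 2d + (n-m) + r. Since that cost is at most n, we get 2d + r <= m. Thus the number of pairs is bounded by half of m.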

Summarizing: number_of_replacements <= m, number_of_deletions <= m//2, and number_of_insertions <= (n-m) + m//2.

Thus in row i we would essentially only be concerned with the columns from i - m//2 to i + (n-m) + m//2, truncated to fit between existing indices, i.e., from max(0, i - m//2) to min(n, i + (n-m) + m//2).
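
Here is a rough sketch of how that band could be used with the two-row approach. It is an illustration under the bounds derived above, not a tuned implementation: the function name `edit_distance_banded` and the sentinel `INF` are my own, and each row is still allocated in full for simplicity, so only the work per row is reduced, not the memory.

```python
def edit_distance_banded(word1: str, word2: str) -> int:
    """Edit distance that only fills the cells inside the band described above."""
    # The distance is symmetric, so make word1 the shorter word (m <= n).
    if len(word1) > len(word2):
        word1, word2 = word2, word1
    m, n = len(word1), len(word2)
    below = m // 2             # bound on deletions: how far j may fall behind i
    above = (n - m) + m // 2   # bound on insertions: how far j may run ahead of i
    INF = m + n + 1            # sentinel larger than any possible distance

    # Row 0: converting "" to word2[:j] takes j insertions (inside the band only).
    prev = [j if j <= above else INF for j in range(n + 1)]
    for i in range(1, m + 1):
        lo = max(0, i - below)
        hi = min(n, i + above)
        curr = [INF] * (n + 1)  # cells outside the band stay at INF
        for j in range(lo, hi + 1):
            if j == 0:
                curr[j] = i     # delete all i characters of word1[:i]
                continue
            insertion = curr[j - 1] + 1
            deletion = prev[j] + 1
            replacement = prev[j - 1] + (word1[i - 1] != word2[j - 1])
            curr[j] = min(insertion, deletion, replacement)
        prev = curr
    return prev[n]
```

Cells outside the band are treated as unreachable, which is safe only because an optimal sequence of operations never leaves the band under the bounds above.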

Separately, what are the special cases? If word1 has length 0, then we know the answer is n. If word1 has length 1, then the answer is n-1 if its letter appears in word2, and otherwise n (one replacement plus n-1 insertions).
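
These special cases could be checked up front before running the DP. The helper below is a hypothetical sketch (the name `short_word_distance` is made up); it returns None when neither shortcut applies.

```python
from typing import Optional


def short_word_distance(word1: str, word2: str) -> Optional[int]:
    """Return the edit distance when the shorter word has length 0 or 1, else None."""
    # The distance is symmetric, so make word1 the shorter word.
    if len(word1) > len(word2):
        word1, word2 = word2, word1
    m, n = len(word1), len(word2)
    if m == 0:
        return n  # n insertions
    if m == 1:
        # Keep the letter if it occurs in word2 (n-1 insertions);
        # otherwise replace it and insert the remaining n-1 letters.
        return n - 1 if word1 in word2 else n
    return None  # no shortcut applies; fall back to the full DP
```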

# Testcases

In [69]:
s = Solution()

In [70]:
s.minDistance("ros", "horsey")

4

In [71]:
s.minDistance("", "")

0

In [72]:
s.minDistance("horse", "horse")

0

In [73]:
s.minDistance("", "ros")

3

In [74]:
s.minDistance("horse", "")

5

In [75]:
s.minDistance("aexcellent", "excellenta")

2

In [76]:
s.minDistance("canexcellent", "excellent")

3

In [77]:
s.minDistance("abwbebsbobmbe", "awesomef")

7