Finding the Levenshtein distance in Python

Posted on 17 Jan 2008.

I discovered today that Moneygement won’t accept unicode characters when someone adds transactions by email because of the editdist module I used to check it. Since I don’t really need a fast function to do it (it’s all eight-letter words on average), I decided to write my own Python version of the function and am sharing it here if anyone needs it (because I haven’t found a Python implementation anywhere). It’s released under a BSD license with attribution, meaning that I’d like it if my name was mentioned where it is used :)

def levenshtein_distance(first, second): """Find the Levenshtein distance between two strings.""" if len(first) > len(second): first, second = second, first if len(second) == 0: return len(first) first_length = len(first) + 1 second_length = len(second) + 1 distance_matrix = [[0] * second_length for x in range(first_length)] for i in range(first_length): distance_matrix[i][0] = i for j in range(second_length): distance_matrix[0][j]=j for i in xrange(1, first_length): for j in range(1, second_length): deletion = distance_matrix[i-1][j] + 1 insertion = distance_matrix[i][j-1] + 1 substitution = distance_matrix[i-1][j-1] if first[i-1] != second[j-1]: substitution += 1 distance_matrix[i][j] = min(insertion, deletion, substitution) return distance_matrix[first_length-1][second_length-1]