Jaro-Winkler and Levenshtein string distance algorithms for Common Lisp
Latest commit f2e4500 Jan 19, 2016 @vsedach Merge pull request #2 from rudolfochrist/soerensen-dice
Add Soerensen-Dice coefficient.
LICENSE
README
jaro-winkler.lisp
levenshtein.lisp
package.lisp
soerensen-dice.lisp
test.lisp
test.vas-string-metrics.asd
vas-string-metrics.asd

README

vas-string-metrics provides the Jaro, Jaro-Winkler, Soerensen-Dice,
Levenshtein, and normalized Levenshtein string distance/similarity
metrics.

The Jaro (function jaro-distance), Jaro-Winkler (function
jaro-winkler-distance), Soerensen-Dice (function
soerensen-dice-coefficient) and normalized Levenshtein
(function normalized-levenshtein-distance) algorithms return a
number in the range 0 to 1 indicating how similar two given strings
are - where 0 indicates no similarity, and 1 indicates a perfect match.

The Jaro-Winkler metric is a heuristic suited to short strings (such
as place and personal names). The Levenshtein distance (function
levenshtein-distance) is the minimum number of single-character
insertions, deletions, or substitutions needed to transform one
string into the other.
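
The edit-distance definition above can be illustrated with a minimal
sketch. This is not the code in levenshtein.lisp: the names
sketch-levenshtein and sketch-normalized-levenshtein are hypothetical,
and the normalization shown (1 minus the distance divided by the
longer string's length) is one common choice that is assumed here, not
confirmed to be what normalized-levenshtein-distance does.

```lisp
;; Illustrative sketch only; function names are hypothetical and are
;; not part of vas-string-metrics.
(defun sketch-levenshtein (a b)
  "Return the minimum number of single-character insertions, deletions,
or substitutions needed to turn string A into string B."
  (let* ((n (length b))
         ;; PREV and CURR are two adjacent rows of the edit-distance table.
         (prev (make-array (1+ n)))
         (curr (make-array (1+ n))))
    ;; Row for the empty prefix of A: j insertions build B's first j chars.
    (dotimes (j (1+ n))
      (setf (aref prev j) j))
    (dotimes (i (length a))
      ;; Reducing the first i+1 characters of A to "" takes i+1 deletions.
      (setf (aref curr 0) (1+ i))
      (dotimes (j n)
        (setf (aref curr (1+ j))
              (min (1+ (aref prev (1+ j)))   ; delete a character of A
                   (1+ (aref curr j))        ; insert a character of B
                   (+ (aref prev j)          ; substitute, or free match
                      (if (char= (char a i) (char b j)) 0 1)))))
      ;; The row just filled becomes the previous row for the next pass.
      (rotatef prev curr))
    (aref prev n)))

(defun sketch-normalized-levenshtein (a b)
  "Map the edit distance onto 0..1, where 1 means identical strings.
ASSUMPTION: normalizes by the length of the longer string."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1                                    ; two empty strings match
        (- 1 (/ (sketch-levenshtein a b) longest)))))
```

With these definitions, (sketch-levenshtein "kitten" "sitting")
evaluates to 3 (two substitutions and one insertion), and
(sketch-normalized-levenshtein "kitten" "sitting") evaluates to the
rational 4/7.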

The Soerensen-Dice coefficient is a statistic suitable for heterogeneous
data sets and gives less weight to outliers[1].
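
As a rough illustration of the coefficient, the sketch below applies
the usual formula 2 * |shared| / (|X| + |Y|) to the character bigrams
of two strings. Comparing bigrams, counting repeated bigrams
separately, and the handling of strings too short to have any bigram
are all assumptions made for this example; soerensen-dice.lisp may
differ on each point. The names sketch-bigrams and sketch-dice are
hypothetical.

```lisp
;; Illustrative sketch only; function names are hypothetical and are
;; not part of vas-string-metrics.
(defun sketch-bigrams (s)
  "Return a list of the adjacent character pairs in string S."
  (loop for i from 0 below (1- (length s))
        collect (subseq s i (+ i 2))))

(defun sketch-dice (a b)
  "Return a value in 0..1: twice the number of shared bigrams divided
by the total number of bigrams in A and B."
  (let* ((xs (sketch-bigrams a))
         (ys (sketch-bigrams b))
         (total (+ (length xs) (length ys)))
         (shared 0))
    ;; Pair each bigram of A with at most one unused bigram of B.
    (dolist (x xs)
      (when (member x ys :test #'string=)
        (incf shared)
        ;; Consume the matched bigram so it is not counted twice.
        (setf ys (remove x ys :test #'string= :count 1))))
    (if (zerop total)
        ;; ASSUMPTION: with no bigrams at all, fall back to equality.
        (if (string= a b) 1 0)
        (/ (* 2 shared) total))))
```

For example, "night" and "nacht" share only the bigram "ht" out of
four bigrams each, so (sketch-dice "night" "nacht") evaluates to 1/4.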

The code is distributed under the terms of the LLGPLv3 (see LICENSE
for details), except for the unit tests, which are in the public
domain.

[1] https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient#Applications