This is a PostgreSQL extension package that provides functions for calculating the similarity between two strings.
similarity(string1, string2, limit)
string1, string2: strings to compare
limit: minimum similarity of two strings
Calculates the similarity of two strings, string1 and string2. A value of 0 means that the strings are entirely different. A value of 1 means that the strings are identical. Everything else lies between 0 and 1 and describes the amount of similarity between the strings. The limit argument gives the minimum similarity the two strings must satisfy. similarity stops analyzing the strings as soon as the result drops below the given limit, in which case the returned value is not a valid similarity but is lower than the given limit. You can use this to speed up the common case of searching for the most similar string in a set by specifying the maximum similarity found so far.
similarity(string1, string2)
string1, string2: strings to compare
The same as similarity(string1, string2, 0).
To install the postgresql-similarity extension, run the following commands as root (where X.X is the package version):
tar -xvf postgresql-similarity-X.X.tar.xz
cd postgresql-similarity-X.X
make && make install
Pre-built binary packages for openSUSE, Fedora, Mageia, RHEL, CentOS, Debian and Ubuntu are available from the openSUSE Build Service.
To install the extension into the database dbname, run the following command as root:
su postgres -c 'psql dbname -c "CREATE EXTENSION similarity"'
Test the extension:
psql dbname -c "SELECT similarity('similarity', 'distinction')"
similarity
-------------------
0.285714285714286
(1 row)
To uninstall the extension, run:
su postgres -c 'psql dbname -c "DROP EXTENSION similarity"'
The basic algorithm is described in: “An O(ND) Difference Algorithm and its Variations”, Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, pp. 251-266; see especially section 4.2, which describes the variation used here.
The basic algorithm was independently discovered as described in: “Algorithms for Approximate String Matching”, E. Ukkonen, Information and Control Vol. 64, 1985, pp. 100-118.
This software and documentation are released under the GPL-2.0 license. The underlying fstrcmp function was taken from GNU diffutils and modified by Peter Miller and Marc Lehmann.