PostgreSQL extension package which provides functions that calculate the similarity between two strings

urbic/postgresql-similarity

postgresql-similarity

This is a PostgreSQL extension package that provides functions for calculating the similarity between two strings.

Synopsis

similarity(text string1, text string2, float limit)

  • string1, string2 — strings to compare
  • limit — minimum similarity of two strings

Calculates the similarity of two strings, string1 and string2. A value of 0 means that the strings are entirely different, and a value of 1 means that they are identical. Any other result lies between 0 and 1 and describes how similar the strings are. The limit argument gives the minimum similarity the two strings must satisfy. similarity stops analyzing the strings as soon as the result drops below the given limit; in that case the returned value is not the true similarity, but it is guaranteed to be lower than the limit. You can use this to speed up the common case of searching a set for the most similar string by specifying the maximum similarity found so far as the limit.
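The behaviour described above can be illustrated outside the database with a minimal Python sketch. It assumes the score is twice the length of the longest common subsequence divided by the combined length of both strings, which is consistent with the example output shown in this README but is not stated by it. The function names and the length-based shortcut are hypothetical and are not the extension's implementation.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, start=1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str, limit: float = 0.0) -> float:
    """Assumed measure: 2 * LCS(a, b) / (len(a) + len(b)), in the range 0..1."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings count as identical
    # Cheap upper bound: at most min(len(a), len(b)) characters can match.
    upper = 2 * min(len(a), len(b)) / total
    if upper < limit:
        # Not the true score, but guaranteed to be below the limit.
        return upper
    return 2 * lcs_length(a, b) / total


def best_match(needle, candidates):
    """Return the most similar candidate, passing the best score so far as the limit."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(needle, candidate, best_score)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    print(best_match("similarity", ["distinction", "similar", "x"]))
```

Passing the running best score as the limit lets hopeless candidates be rejected without a full comparison, which is the search pattern the paragraph above recommends.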

similarity(text string1, text string2)

  • string1, string2 — strings to compare

The same as similarity(string1, string2, 0).

Installation

Build and install the extension package from source tarball

To install the postgresql-similarity extension, type the following commands as root (where X.X is the package version):

tar -xvf postgresql-similarity-X.X.tar.xz
cd postgresql-similarity-X.X
make && make install

Install pre-built binary packages

Pre-built binary packages for openSUSE, Fedora, Mageia, RHEL, CentOS, Debian and Ubuntu are available from the openSUSE Build Service.

Install the extension into a database

To install the extension into the database dbname, type the following command as root:

su postgres -c 'psql dbname -c "CREATE EXTENSION similarity"'

Test the extension:

psql dbname -c "SELECT similarity('similarity', 'distinction')"
    similarity     
-------------------
 0.285714285714286
(1 row)

To uninstall the extension, type:

su postgres -c 'psql dbname -c "DROP EXTENSION similarity"'

See also

The basic algorithm is described in: “An O(ND) Difference Algorithm and its Variations”, Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, pp. 251–266; see especially section 4.2, which describes the variation used in this extension.

The basic algorithm was independently discovered as described in: “Algorithms for Approximate String Matching”, E. Ukkonen, Information and Control Vol. 64, 1985, pp. 100–118.
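For readers unfamiliar with the cited papers, here is a minimal Python sketch of the basic greedy O(ND) search. It counts the fewest single-character insertions and deletions needed to turn one string into the other. This is the plain form of the algorithm, not the variation from section 4.2 and not the extension's C code. The conversion of the edit count into a 0..1 score, the edit budget derived from the limit, and all function names are assumptions made for illustration.

```python
import math
from typing import Optional


def edit_count(a: str, b: str, max_edits: Optional[int] = None) -> Optional[int]:
    """Fewest insertions plus deletions turning a into b (greedy O(ND) search).

    Returns None if more than max_edits edits would be needed.
    """
    n, m = len(a), len(b)
    if max_edits is None:
        max_edits = n + m
    # furthest[k] is the largest x reached so far on diagonal k = x - y.
    furthest = {1: 0}
    for d in range(max_edits + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]      # step down: insert a character of b
            else:
                x = furthest[k - 1] + 1  # step right: delete a character of a
            y = x - k
            # Follow the diagonal while characters match (these steps are free).
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return None


def myers_similarity(a: str, b: str, limit: float = 0.0) -> float:
    """Assumed score: (total - edits) / total, abandoning the search below limit."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings count as identical
    # similarity >= limit is equivalent to edits <= (1 - limit) * total.
    budget = math.floor((1.0 - limit) * total + 1e-9)
    edits = edit_count(a, b, budget)
    if edits is None:
        return 0.0  # not the true score, only a value below the limit
    return (total - edits) / total


if __name__ == "__main__":
    print(edit_count("similarity", "distinction"))
    print(myers_similarity("similarity", "distinction"))
```

The search cost grows with the number of edits rather than with the product of the string lengths, so a tight limit translates directly into a small edit budget and an early exit.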

License

This software and documentation are released under the GPL-2.0 license.

Author

Anton Shvetz

(The underlying fstrcmp function was taken from GNU diffutils and modified by Peter Miller and Marc Lehmann.)
