SimHashPhp is a PHP library that port the SimHash algorithm created by Moses Charikar. This algorithm provide an efficient way to compare two texts or two data sets.
See "SimHash or the way to compare quickly two datasets" for more informations.
The library use PSR-0 naming convention so you can easily use it with an autoloader. However, basically, you can use a simple require:
<?php require 'Leg/SimHash/SimHash.php'; require 'Leg/SimHash/SimHashFactory.php'; use Leg\SimHash\SimHash; use Leg\SimHash\SimHashFactory; $factory = new SimHashFactory(); $fingerprint1 = $factory->run('Mary-jane is very tall. She was in the 9th grade.'); $fingerprint2 = $factory->run('John is in high school. He is not so tall.'); // This method return a float between 0 and 1 // where 0 is completely different strings and // 1 is completely equals strings. $fingerprint1->compareWith($fingerprint2);
This library is under the MIT license (see LICENSE.md)
The SimHashPhp is mainly developed by Titouan Galopin.
Issues and feature requests are tracked in the Github issue tracker.