Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
src
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Nilsimsa for Java

Build Status

Nilsimsa is a hashing function which belongs to the family of locality sensitive hashes (LSH). This library provides functions to compute and compare the Nilsimsa string similarity hash for a given string.

The code is a port of the Python Nilsimsa implementation by Michael Itz to the Java language: http://code.google.com/p/py-nilsimsa.

Original C nilsimsa-0.2.4 implementation by cmeclax: http://ixazon.dynip.com/~cmeclax/nilsimsa.html

Requirements

  • Java 8+

Maven Dependency

 <dependency>
     <groupId>com.weblyzard.lib.string</groupId>
     <artifactId>nilsimsa</artifactId>
     <version>0.0.5</version>
 </dependency>

Examples

Compute and output the Nilsimsa hash

String text = "A short test message"; 
Nilsimsa n = Nilsimsa.getHash(text);
System.out.println("Nilsimsa hash for message '" + text + "': " + n.hexdigest());

Text similarity

Nilsimsa first = Nilsimsa.getHash("A short test message");
Nilsimsa second = Nilsimsa.getHash("A short test message!");
Nilsimsa third = Nilsimsa.getHash("Something completely different");

System.out.println(first.bitwiseDifference(first));    // 0
System.out.println(first.bitwiseDifference(second));   // 3
System.out.println(first.bitwiseDifference(third));    // 133

Determine whether two strings are significantly different

List<String> testList = Arrays.asList("A short test message", 
                                      "A short test message!", 
                                      "Something completely different");

for (String firstString: testList) {
    for (String secondString: testList) {
        Nilsimsa firstHash = Nilsimsa.getHash(firstString);
        Nilsimsa secondHash = Nilsimsa.getHash(secondString);

        System.out.println("The hash value of text '" + firstString + "' and '" 
            + secondString + "' differ in " + firstHash.bitwiseDifference(secondHash) + " bits.");
    }
}

Changelog

Please refer to the releases page.

About

A Java library for computing and comparing Nilsimsa string similarity hashes.

Resources

License

Packages

No packages published

Languages

You can’t perform that action at this time.