
Hierarchical clustering of images using phash and Hamming distance


wolny/phash-hierarchical-clustering


phash-hierarchical-clustering

This app clusters a given set of images and displays the results in a simple JavaFX GUI. First, perceptual hashing (pHash) maps each image to a binary feature vector. Then agglomerative hierarchical clustering, with the Hamming distance (the number of bit positions in which two vectors differ) as the distance measure, groups similar binary vectors together.
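The two steps above can be illustrated with a minimal Scala sketch. It assumes a common DCT-based pHash variant, which is not necessarily the one this app uses: the image has already been converted to grayscale and downscaled to 32x32 pixels, only the 8x8 lowest-frequency DCT coefficients are kept, and each of the resulting 64 bits records whether a coefficient lies above their median. The object PHashSketch and its methods are hypothetical names chosen for illustration.

```scala
object PHashSketch {
  val N = 32 // side length of the downscaled grayscale image (assumption)
  val K = 8  // keep the K x K lowest-frequency DCT coefficients -> K * K = 64 bits

  /** Unnormalised 2-D DCT-II, computed only for the K x K lowest frequencies. */
  def lowFreqDct(gray: Array[Array[Double]]): Array[Array[Double]] =
    Array.tabulate(K, K) { (u, v) =>
      var sum = 0.0
      for (x <- 0 until N; y <- 0 until N)
        sum += gray(x)(y) *
          math.cos((2 * x + 1) * u * math.Pi / (2 * N)) *
          math.cos((2 * y + 1) * v * math.Pi / (2 * N))
      sum
    }

  /** 64-bit hash: bit i is set when the i-th coefficient (row-major) exceeds the median. */
  def phash(gray: Array[Array[Double]]): Long = {
    require(gray.length == N && gray.forall(_.length == N), s"expected a ${N}x$N grayscale image")
    val coeffs: Array[Double] = lowFreqDct(gray).flatten
    val sorted = coeffs.clone()
    java.util.Arrays.sort(sorted)
    val median = (sorted(K * K / 2 - 1) + sorted(K * K / 2)) / 2
    var hash = 0L
    for (i <- coeffs.indices if coeffs(i) > median) hash |= 1L << i
    hash
  }

  /** Hamming distance: number of bit positions in which two hashes differ. */
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)
}
```

Thresholding at the median sets roughly half of the bits, and since a uniform brightness change only affects the DC coefficient of this unnormalised DCT, a uniformly brightened copy of an image should produce the same (or nearly the same) hash.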

Note: we use a low, hard-coded cutHeight value of 8.0 to cut the dendrogram into small clusters with a low number of outliers. Depending on the size of your dataset and the required 'quality' of the clustering, you might experiment with different cutHeight values in HCluster.
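To make the role of cutHeight concrete, here is a naive Scala sketch of agglomerative clustering with complete linkage (the agglomeration method mentioned under Sample results). It repeatedly merges the two closest clusters and stops once the next merge would lie above cutHeight; because complete-linkage merge heights never decrease, this is equivalent to cutting the dendrogram at that height. The app itself presumably relies on the Smile library rather than code like this; HClusterSketch and its cluster method are hypothetical, and keeping merges that land exactly at cutHeight is an assumption about tie handling.

```scala
object HClusterSketch {
  def hamming(a: Long, b: Long): Int = java.lang.Long.bitCount(a ^ b)

  /** Complete linkage: distance between two clusters = largest pairwise Hamming distance. */
  private def linkage(hashes: IndexedSeq[Long], a: Vector[Int], b: Vector[Int]): Int =
    (for (i <- a; j <- b) yield hamming(hashes(i), hashes(j))).max

  /** Returns one cluster label per hash; naive, roughly O(n^3), for illustration only. */
  def cluster(hashes: IndexedSeq[Long], cutHeight: Double): IndexedSeq[Int] = {
    // Start with every hash in its own singleton cluster (clusters store indices into `hashes`).
    var clusters: Vector[Vector[Int]] = hashes.indices.map(Vector(_)).toVector
    var done = clusters.size < 2
    while (!done) {
      // Find the closest pair of clusters under complete linkage.
      val pairs = for {
        p <- clusters.indices
        q <- (p + 1) until clusters.size
      } yield (linkage(hashes, clusters(p), clusters(q)), p, q)
      val (height, bp, bq) = pairs.minBy(_._1)
      if (height > cutHeight) done = true // the next merge would lie above the cut
      else {
        val rest = clusters.zipWithIndex.collect { case (c, k) if k != bp && k != bq => c }
        clusters = rest :+ (clusters(bp) ++ clusters(bq))
        done = clusters.size < 2
      }
    }
    // Assign the index of each remaining cluster as the label of its members.
    val labels = Array.fill(hashes.size)(-1)
    for ((members, label) <- clusters.zipWithIndex; m <- members) labels(m) = label
    labels.toIndexedSeq
  }
}
```

For example, with the hashes 0, 1, 3, -1 and -2 (as Long values) and cutHeight = 8.0, the first three end up in one cluster and the last two in another, since the two groups are 62 to 64 bits apart. Lowering cutHeight below 1 leaves every hash in its own cluster.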

Running

Build the project with sbt assembly. This will generate a phash-hierarchical-clustering-assembly-<version>.jar uberjar file in the target/scala-<scalaVersion> subdirectory (where <version> is the current version defined in build.sbt).

Run the application from the .jar with the java -jar command, e.g.:

  • java -jar target/scala-2.12/phash-hierarchical-clustering-assembly-1.0.jar <imageDirectory>

    The first run might take a while, since the app needs to compute the phash value of every image in <imageDirectory>.

<imageDirectory> is the folder where the images are stored (use as many images as possible for better results).

Sample results

  • Sample clusters from a dataset of 5K images containing the Apple logo: [images: Cluster 1, Cluster 2, Cluster 3]

  • A dendrogram illustrating the result of hierarchical clustering with the complete agglomeration method (see the Smile docs for more details): [image: Dendrogram]
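For reference, the two distances involved can be written as follows, where x ⊕ y is the bitwise XOR of two hashes, popcount counts the set bits, and A, B are clusters:

```latex
d_H(x, y) = \operatorname{popcount}(x \oplus y),
\qquad
D_{\text{complete}}(A, B) = \max_{a \in A,\; b \in B} d_H(a, b)
```

Because complete-linkage merge heights never decrease, the diameter of a cluster (its largest pairwise distance) equals the height at which it was formed. Assuming the app feeds raw Hamming distances to the clustering, cutting the dendrogram at a height h yields clusters C with max over a, b in C of d_H(a, b) ≤ h, so with cutHeight = 8.0 any two images in the same cluster differ in at most 8 bits of their hashes.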
