Deduplication : minhash w/ LSH
-
Updated
Oct 8, 2023 - Python
Deduplication : minhash w/ LSH
similarity of the texts (Jaccard Similarity, Minhash, LSH)
Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.
Implementing Locality Sensitive Hashing for DNA Sequences.
Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitve Hashing
Recommendation systems for Yelp (collaborative filtering & content-based)
An easy-to-use script for fast similarity search in the textual data (and embedding space) with GPU & Multi-core support.
There are Python 2.7 codes and learning notes for Spark 2.1.1
ETH Zurich Fall 2017
Scalable Data Mining - Assignment submissions
insight data engineering fellow project
Add a description, image, and links to the minhash-lsh-algorithm topic page so that developers can more easily learn about it.
To associate your repository with the minhash-lsh-algorithm topic, visit your repo's landing page and select "manage topics."