ydengGitHub/Implementation-of-Large-Data-Set-Algorithm-in-Text-Analysis

Implementation-of-Large-Data-Set-Algorithm-in-Text-Analysis

Implement a MinHash signature matrix and Locality-Sensitive Hashing (LSH) to estimate the Jaccard similarity between documents and to identify near-duplicate documents.
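This page does not include the repository's code, so the following is a minimal Python sketch of the pipeline the description names, not the author's implementation. It rests on the standard facts that, for a random permutation h, P[min h(A) = min h(B)] equals the Jaccard similarity |A ∩ B| / |A ∪ B|, and that LSH banding turns signature agreement into candidate pairs. Everything else is an assumption chosen for illustration: character 3-shingles, the hash family h(x) = (a·x + b) mod p with p = 2^61 - 1, 100 hash functions split into 20 bands of 5 rows, a 0.5 similarity threshold, and the hypothetical function names `shingles`, `minhash_signatures`, `lsh_candidates` and `near_duplicates`.

```python
import random
import re
import zlib
from collections import defaultdict
from itertools import combinations

# Mersenne prime used as the modulus of the (assumed) hash family h(x) = (a*x + b) mod P.
P = (1 << 61) - 1


def shingles(text: str, k: int = 3) -> set[str]:
    """Set of character k-shingles of a lowercased, whitespace-normalized text."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def minhash_signatures(shingle_sets, num_hashes=100, seed=0):
    """One signature per document; each signature is one column of the MinHash
    matrix (num_hashes rows x number-of-documents columns)."""
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(num_hashes)]
    signatures = []
    for s in shingle_sets:
        # crc32 gives a stable integer id per shingle (the built-in hash() is salted per run).
        ids = [zlib.crc32(sh.encode("utf-8")) for sh in s]
        # Row i of the column: the minimum of h_i over the document's shingles.
        signatures.append([min(((a * x + b) % P for x in ids), default=P) for a, b in coeffs])
    return signatures


def estimate_jaccard(sig1, sig2) -> float:
    """Fraction of hash functions on which two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidates(signatures, bands=20, rows=5):
    """Banding: each band of `rows` signature values is a bucket key; documents
    sharing a bucket in at least one band become candidate pairs (i < j)."""
    assert bands * rows == len(signatures[0]), "bands * rows must equal num_hashes"
    buckets = defaultdict(list)
    for doc_id, sig in enumerate(signatures):
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].append(doc_id)
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(members, 2))
    return candidates


def near_duplicates(texts, threshold=0.5, bands=20, rows=5, k=3, seed=0):
    """LSH candidate pairs, kept only if their estimated Jaccard >= threshold."""
    sigs = minhash_signatures([shingles(t, k) for t in texts], bands * rows, seed)
    result = []
    for i, j in sorted(lsh_candidates(sigs, bands, rows)):
        est = estimate_jaccard(sigs[i], sigs[j])
        if est >= threshold:
            result.append((i, j, est))
    return result


docs = [
    "The quick brown fox jumps over the lazy dog",
    "The quick brown fox jumped over the lazy dog",
    "Locality sensitive hashing finds similar items quickly",
]
for i, j, est in near_duplicates(docs):
    print(f"docs {i} and {j}: estimated Jaccard {est:.2f}")
```

Only candidate pairs are compared, so the quadratic all-pairs scan is avoided. With 20 bands of 5 rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^5)^20, an S-curve that crosses 0.5 near s ≈ 0.5; changing the band/row split moves that effective threshold.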
