Bachelor's Thesis at FER, University of Zagreb, 2018.
Updated Jan 24, 2022 - Java
A document clustering project that uses the K-Means algorithm. It requires Stanford CoreNLP as a dependency. From my undergraduate Predictive Analytics course, taken with Anasse Bari at NYU.
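For readers unfamiliar with K-Means, the following is a minimal sketch of the algorithm on small numeric vectors. It is not taken from the project above. The class name `KMeansSketch`, the method names, and the choice to seed centroids with the first k points are all assumptions made for illustration; a real document clusterer would first turn each document into a feature vector (for example, term weights) and would usually pick initial centroids at random.

```java
import java.util.Arrays;

public class KMeansSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Returns the cluster index assigned to each point.
     * Assumption: the first k points seed the centroids, which keeps the
     * sketch deterministic. Requires points.length >= k.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // Converged: no point switched clusters.
            }

            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int i = 0; i < dim; i++) {
                    sums[assignment[p]][i] += points[p][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int i = 0; i < dim; i++) {
                        centroids[c][i] = sums[c][i] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Four toy "documents" described by two hypothetical term weights.
        double[][] docs = { {1.0, 0.0}, {0.0, 1.0}, {0.9, 0.1}, {0.1, 0.9} };
        System.out.println(Arrays.toString(cluster(docs, 2, 10)));
    }
}
```

With the toy input in `main`, the first and third vectors end up in one cluster and the second and fourth in the other.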
mvn -Dhadoop2.version=2.5.0 -Dlucene.version=xxx -DskipTests clean install
DocClusterizer is a Java desktop application that analyzes and clusters documents by content similarity. It uses the Lucene and Tika libraries to process file formats such as txt, pdf, docx, and pptx.
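One common way to measure content similarity between two texts is cosine similarity over term-frequency vectors. The sketch below illustrates that idea in plain Java. It does not use Lucene or Tika and is not DocClusterizer's actual implementation; the class name `CosineSimilaritySketch`, the method names, and the simple tokenizer that splits on non-alphanumeric characters are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CosineSimilaritySketch {

    /** Counts how often each lowercase alphanumeric token occurs in the text. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between two term-frequency vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> first = termFrequencies("Clustering groups similar documents");
        Map<String, Integer> second = termFrequencies("Similar documents form one cluster");
        System.out.println(cosine(first, second));
    }
}
```

Texts that share no tokens score 0.0, and texts with identical token counts score approximately 1.0, so the value can serve as a similarity input to a clustering step.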