Contains work done for NLP Specialization courses from DeepLearning.AI
Updated Jan 5, 2022 - Jupyter Notebook
Lab solutions for Analysis of Massive Datasets ("Analiza velikih skupova podataka") course at FER 2020/21
Hola, amigos! Welcome to the first assignment of TIPR-2019.
ETH Zurich Fall 2017
Big data computing homework
Explores the MovieLens dataset (1M version) to uncover valuable insights into user behavior, demographics, movie popularity, and community structures. Various tasks, including data preprocessing, clustering, community detection, and recommendation systems, provide a holistic understanding of the dataset.
Code for Scalable Product Duplicate Detection (2021)
This repository contains my coursework and projects completed during the Natural Language Processing Specialization offered by DeepLearning.AI.
Code developed for CSE 515 Multimedia Web Databases
Locality-sensitive hashing algorithm to identify similar messages. Designed for a range of security and digital forensic applications.
Renowned data mining algorithms implemented in PySpark
Long Reads Mapping Algorithms
Massive Data Analysis
Implementing Locality Sensitive Hashing for DNA Sequences.
Assesses MinHash LSH for text similarity, comparing its results against kNN over BART embeddings as ground truth. Covers data preprocessing, shingle creation, and LSH experiments that measure LSH's efficiency on document similarity tasks.
k-means implementation using locality-sensitive hashing
Nearest neighbor search (NNS)
Simple standalone multi-threaded locality sensitive hashing implementation in Rust
An implementation of the MinHashing algorithm in C using POSIX threads.
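Several of the projects above are built on shingling, MinHash signatures, and LSH banding. The following is a minimal sketch of that pipeline and is not taken from any listed repository. The function names (`shingles`, `minhash_signature`, `estimate_jaccard`, `candidate_pairs`), the word-level shingle size, and the use of seeded BLAKE2b as the hash family are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def _seeded_hash(shingle, seed):
    """64-bit hash of a shingle; each seed acts as a separate hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_seeded_hash(s, seed) for s in shingle_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=32):
    """LSH banding: documents that agree on every row of some band share a bucket."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


docs = {
    "a": "locality sensitive hashing finds similar documents quickly",
    "b": "locality sensitive hashing finds similar documents quickly",
    "c": "an unrelated sentence about cooking pasta at home tonight",
}
signatures = {name: minhash_signature(shingles(text)) for name, text in docs.items()}
pairs = candidate_pairs(signatures)
```

With more bands and fewer rows per band, less similar documents become candidates; with fewer bands and more rows, only near-duplicates collide. Candidate pairs would normally be verified afterwards with an exact similarity computation.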