A UI application for File Deduplication using Hashing
Updated Jan 18, 2018 - Java
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
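As a rough illustration of linking records that share no common identifier, the sketch below groups records by a normalized comparison key. The `Person` shape, the `matchKey` rule (accent-stripped, lowercased, order-independent name tokens plus birth year), and the sample values are all assumptions made up for this example. Real record linkage tools usually add blocking and fuzzy or probabilistic matching on top of exact keys like this.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class RecordLinkageSketch {

    /** Hypothetical record shape: a source-local id, a free-text name, and a birth year. */
    static final class Person {
        final String sourceId;
        final String name;
        final int birthYear;

        Person(String sourceId, String name, int birthYear) {
            this.sourceId = sourceId;
            this.name = name;
            this.birthYear = birthYear;
        }
    }

    /** Builds a comparison key: strip accents, lowercase, drop punctuation, sort name tokens. */
    static String matchKey(Person p) {
        // NFD splits accented letters into base letter + combining mark; \p{M} removes the marks.
        String ascii = Normalizer.normalize(p.name, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        String[] tokens = ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")
                .trim()
                .split("\\s+");
        Arrays.sort(tokens); // makes "Last, First" and "First Last" produce the same key
        return String.join(" ", tokens) + "|" + p.birthYear;
    }

    /** Groups records from any number of sources by their comparison key. */
    static Map<String, List<Person>> link(List<Person> records) {
        Map<String, List<Person>> clusters = new LinkedHashMap<>();
        for (Person p : records) {
            clusters.computeIfAbsent(matchKey(p), k -> new ArrayList<>()).add(p);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Made-up sample data; \u00e9 and \u00ed are accented e and i.
        List<Person> records = Arrays.asList(
                new Person("a:1", "Jos\u00e9 Garc\u00eda", 1980),
                new Person("b:7", "Garcia, Jose", 1980),
                new Person("b:8", "Jose Garcia", 1975));
        link(records).forEach((key, group) ->
                System.out.println(key + " -> " + group.size() + " record(s)"));
    }
}
```

Exact-key grouping like this only catches differences in formatting. A misspelled name or a wrong birth year would still split one entity into two clusters, which is the gap that similarity-based matching aims to close.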
Project to help my brother find duplicates in his photo directory.
A web application for two-phase deduplication that combines intra-user and inter-user deduplication techniques by introducing deduplication proxies (DPs) between the clients and the storage server (SS).
Data bus based on Apache Kafka and consisting of separate components [copied from own private repos]
Java (Hadoop) implementation of the Nubeam-dedup deduplication algorithm from the paper by Hang Dai and Yongtao Guan, "Nubeam-dedup: a fast and RAM-efficient tool to de-duplicate sequencing reads without mapping".
This repository contains the source code of two applications. The Crime Ingestion App extracts, geolocates, and deduplicates crime-related news articles from online newspapers; the Crime Visualization App displays crime-related data in a web application.
Utility for automatic Git repository deduplication
A Java-based, database-driven backup tool with multi-storage support and other nice things
PRIMAT - Private Matching Toolbox
Mirror of https://bitbucket.org/resteorts/smered
Record Linkage tool used by https://cidacs.bahia.fiocruz.br/
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
A general purpose deduplication framework
Java DSL for (online) deduplication
Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI). Heavily optimized for scalability and orders of magnitude faster than a previous tool.
Idempotent, deduplicating RocketMQ message consumer; supports MySQL or Redis as the idempotency table and works out of the box
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Created by Halbert L. Dunn
Released 1946