Wrangler Transform: A DMD system for transforming Big Data
Updated May 21, 2024 - Java
Java DSL for (online) deduplication
Implements the DMI algorithm for imputing missing values in a dataset, from Rahman, M. G., and Islam, M. Z. (2013), "Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques"
Implements the CAIRAD technique for detecting noisy values in a dataset, for Weka
kDMI employs two levels of horizontal partitioning of a data set (based on a decision tree and the k-NN algorithm) in order to find the records that are very similar to a record with missing values. Additionally, it uses a novel approach to automatically find the value of k for each record.
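The k-NN level of that idea can be illustrated with a minimal sketch. This is not the kDMI implementation: the class and method names (`KnnImputeSketch`, `distance`, `impute`) are hypothetical, the decision-tree partitioning is omitted, k is fixed by the caller rather than found automatically per record, and the imputed value is assumed to be a plain mean over the neighbours. Missing values are assumed to be encoded as `Double.NaN` in a purely numeric data set.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of k-NN based imputation; not taken from the kDMI code. */
final class KnnImputeSketch {

    /** Euclidean distance over the attributes that are present (non-NaN) in both records. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        int shared = 0;
        for (int i = 0; i < a.length; i++) {
            if (!Double.isNaN(a[i]) && !Double.isNaN(b[i])) {
                double d = a[i] - b[i];
                sum += d * d;
                shared++;
            }
        }
        // Records with no attribute in common are treated as infinitely far apart.
        return shared == 0 ? Double.POSITIVE_INFINITY : Math.sqrt(sum);
    }

    /**
     * Estimates data[row][attr] as the mean of that attribute over the k records
     * closest to data[row], considering only records where the attribute is present.
     * Returns NaN if no other record has a value for the attribute.
     */
    static double impute(double[][] data, int row, int attr, int k) {
        double[] target = data[row];
        List<double[]> candidates = new ArrayList<>();
        for (int r = 0; r < data.length; r++) {
            if (r != row && !Double.isNaN(data[r][attr])) {
                candidates.add(data[r]);
            }
        }
        // Sort candidate records by their distance to the target, nearest first.
        candidates.sort(Comparator.comparingDouble(c -> distance(target, c)));
        int n = Math.min(k, candidates.size());
        if (n == 0) {
            return Double.NaN;
        }
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += candidates.get(i)[attr];
        }
        return total / n;
    }
}
```

Calling `KnnImputeSketch.impute(data, row, attr, k)` with the index of a record whose `attr` value is `Double.NaN` returns an estimate drawn only from the k most similar records, which is the motivation for partitioning horizontally before imputing instead of averaging over the whole data set.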