
data-cleaning

Here are 17 public repositories matching this topic...

LFD is a data-driven discretization technique that requires no user input. It uses low-frequency values as cut points, which reduces the information loss caused by discretization. When discretizing an attribute, it uses all other categorical attributes and any numerical attributes that have already been categorized.

  • Updated Mar 25, 2023
  • Java
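
The cut-point idea in the LFD description can be illustrated with a minimal Python sketch. It is not the repository's Java code: the function names `low_frequency_cut_points` and `discretize` and the `num_cuts` parameter are hypothetical, and the sketch ignores the step in which LFD consults the other attributes. It only shows what "low-frequency values as cut points" could look like on a single numeric column.

```python
from bisect import bisect_right
from collections import Counter


def low_frequency_cut_points(values, num_cuts):
    """Pick the num_cuts least frequent interior values as cut points.

    Simplifying assumption: the number of cut points is given up front,
    whereas LFD itself is described as needing no user input.
    """
    counts = Counter(values)
    # Exclude the minimum and maximum; cutting there would create an empty bin.
    interior = sorted(counts)[1:-1]
    # Rank by frequency (ascending), breaking ties by value.
    chosen = sorted(interior, key=lambda v: (counts[v], v))[:num_cuts]
    return sorted(chosen)


def discretize(values, cut_points):
    """Map each value to a bin index; a value equal to a cut point goes to the upper bin."""
    return [bisect_right(cut_points, v) for v in values]


if __name__ == "__main__":
    column = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 6, 6, 6]
    cuts = low_frequency_cut_points(column, num_cuts=2)
    print(cuts)
    print(discretize(column, cuts))
```

Placing cuts at rarely occurring values means few records sit near a bin boundary, which is one way to read the claim that this choice reduces information loss.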

SiMI imputes missing numerical and categorical values by making an educated guess based on records that are similar to the record with the missing value, using the similarities and correlations among those records. To improve imputation quality, some segments are merged using a novel approach.

  • Updated Mar 24, 2023
  • Java
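
The general idea of imputing from similar records can be sketched in Python as follows. This is a plain nearest-neighbour imputation, swapped in for illustration; it is not the SiMI algorithm and does not include its segment merging. The names `distance` and `impute` and the parameter `k` are hypothetical. The sketch assumes that missing values are represented as `None`, that numeric attributes are already on comparable scales, and that at least one complete record exists.

```python
from collections import Counter
from statistics import mean


def distance(a, b):
    """Mixed-type distance over the attributes present in both records."""
    total = 0.0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip attributes that are missing in either record
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)  # numeric: absolute difference
        else:
            total += 0.0 if x == y else 1.0  # categorical: mismatch costs 1
    return total


def impute(records, k=2):
    """Fill each None using the k most similar complete records."""
    complete = [r for r in records if None not in r]
    result = []
    for r in records:
        filled = list(r)
        if None in r:
            neighbours = sorted(complete, key=lambda c: distance(r, c))[:k]
            for i, v in enumerate(r):
                if v is None:
                    column = [n[i] for n in neighbours]
                    if all(isinstance(c, (int, float)) for c in column):
                        filled[i] = mean(column)  # numeric: neighbour mean
                    else:
                        # categorical: most common neighbour value
                        filled[i] = Counter(column).most_common(1)[0][0]
        result.append(filled)
    return result


if __name__ == "__main__":
    data = [
        (1.0, "a", 10.0),
        (1.2, "a", 12.0),
        (5.0, "b", 50.0),
        (1.1, "a", None),
        (5.1, None, 49.0),
    ]
    for row in impute(data):
        print(row)
```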

The DMI class implements the DMI algorithm for imputing missing values in a dataset, as described in Rahman, M. G., and Islam, M. Z. (2013): Missing Value Imputation Using Decision Trees and Decision Forests by Splitting and Merging Records: Two Novel Techniques.

  • Updated Mar 24, 2023
  • Java

Value Normalizer is a microservice for normalizing the values in a column of a CSV file. It lets you upload a CSV file to the server, select a column, and then normalize it based on user feedback. This repository contains both the backend code (normalizer) and the UI code (normalizer-ui), which can be hosted together or separatel…

  • Updated Jan 5, 2023
  • Java
