datasets

A collection of public datasets for supervised machine learning research. The datasets follow these conventions:

  1. All datasets are in CSV format.
  2. All datasets have header rows.
  3. The target variable is always the last column.
  4. All nominal features with numeric values have been encoded as strings.
  5. Any constant columns have been removed.
  6. Any row ID-like columns have been removed.
  7. Descriptive features may contain missing values, so check for them before modeling.

A sample Python script named "prepare_dataset_for_modeling_github.py" has also been included for loading these datasets and preparing them for model fitting.
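
The following is a minimal sketch of how the conventions above might be used when loading a file. It is not the contents of the included script. The function names, the sample column names, and the set of missing-value markers are all assumptions made for illustration, and the in-memory sample stands in for a real CSV file.

```python
import csv
import io


def load_dataset(csv_file):
    """Split a CSV file object into feature names, target name, features, and target.

    Relies on the conventions listed above: the first row is a header and
    the target variable is the last column.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    feature_names, target_name = header[:-1], header[-1]
    features = [row[:-1] for row in rows]
    target = [row[-1] for row in rows]
    return feature_names, target_name, features, target


def count_missing(feature_names, features, missing_markers=("", "?", "NA", "NaN")):
    """Count cells per descriptive feature that match a missing-value marker.

    The markers are an assumption; the actual files may use others.
    """
    counts = {name: 0 for name in feature_names}
    for row in features:
        for name, value in zip(feature_names, row):
            if value.strip() in missing_markers:
                counts[name] += 1
    return counts


# Hypothetical data standing in for one of the CSV files.
sample = io.StringIO(
    "age,zip_code,income,target\n"
    "34,02139,52000,yes\n"
    "51,10001,,no\n"
    "29,94105,61000,yes\n"
)

feature_names, target_name, features, target = load_dataset(sample)
missing = count_missing(feature_names, features)
print(target_name, target)
print(missing)
```

The csv module reads every value as a string, so a nominal code such as a zip code keeps its leading zeros. Columns that are truly numeric need an explicit conversion before model fitting.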

####################################################

Descriptions of these datasets can be found in the "github_datasets_desc" notebook file:

https://github.com/vaksakalli/datasets/blob/master/github_dataset_descriptions.ipynb
