uwgraphics/BoxerData

Public Datasets of Boxer

This repo contains all the datasets used in the Boxer demo. They are listed below with a short description, their related use case, and their source:

| Dataset Name | Description | Related Use Case | Source |
|---|---|---|---|
| Iris | The Boxer demo uses the Iris dataset by default. The dataset contains 3 classes of 50 instances each. | Boxer demo | https://archive.ics.uci.edu/ml/datasets/iris |
| Imputation | The Mushroom dataset, with a testing set of 2,000 randomly selected instances (out of 8,124). | Model Selection | https://archive.ics.uci.edu/ml/datasets/mushroom |
| IMDB Confidence | The data consists of 5,044 movies with 27 features; 25% is sequestered for final assessment. A stratified sample of 200 movies per class is held out from the 3,756 movies for testing. | Model Selection and Tuning | https://www.kaggle.com/suchitgupta60/imdb-data |
| recid | Used for fair learning: the Broward County recidivism dataset, popularized by ProPublica. It contains 6,172 instances and 14 numeric features (created by one-hot encoding the categorical features of the original seven-feature dataset). 20% is held out for testing. | Fairness Assessment | https://github.com/algofairness/fairness-comparison |
| date-12000-strat | The TCP collection of historical documents. We took a random sample of 12,000 documents and held out 30% using stratified sampling. While the testing set is balanced (1,800 per class), the training set is highly skewed (only 15% of its documents date from before 1642). | Bias and Data Discovery | ... |
| fuzz-mod-5-02 | A collection of 554 plays written in the Early Modern Period (1470-1660), described by five linguistic features. It contains four kinds of plays: Comedy, History, Tragedy, and Tragicomedy. | Feature Sensitivity Testing | ... |
| tcp-tree-select-9-10 | A corpus of 59,989 documents from a historical literary collection; the data counts the 500 most common English words in each document. | Model Selection and Data Discovery | ... |
| (continuous) wine quality | Used for wine quality classification, which requires classifying the quality of a wine from its properties. | (Continuous) Hyperparameter Tuning | https://archive.ics.uci.edu/ml/datasets/wine+quality |
| (continuous) heart disease | A standard dataset used in machine learning education. Classifiers are trained to predict whether a patient is likely to develop a disease (binary decision). | (Continuous) Model Selection and Calibration Analysis | https://archive.ics.uci.edu/ml/datasets/heart+disease |
| (continuous) income | A downsampled version of an income classification benchmark dataset. Classifiers determine whether an individual’s income is above a certain level. | (Continuous) Model Selection | https://archive.ics.uci.edu/ml/datasets/census+income |
| (continuous) cifar-sampled-scaling | Derived from the CIFAR-100 computer vision benchmark using TensorFlow. CIFAR-100 has 100 classes, and the trained classifier produces a distribution over these classes as its decision. A binary classifier has been created for a "meta-class" that combines 5 of the original classes: the task is to classify flowers, which can be any one of those 5 classes. Because the test set contains all 100 classes, it is quite imbalanced: flowers are only 5% of the total instances. | (Continuous) Model Selection and Detail Examination | https://github.com/mattdutson/ml-vis |
| (continuous) cdate-2500 | A corpus of 59,989 documents from a historical literary collection: Text Creation Partnership (TCP) transcriptions of Early English Books Online (EEBO). The data counts the 500 most common English words in each document. For the experiment, we took a random sample of 2,500 documents and held out 30% using stratified sampling. | (Continuous) Data Examination | ... |
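
Several entries above (IMDB Confidence, date-12000-strat, cdate-2500) hold out a test set using stratified sampling, but the sampling code is not part of this repo. The sketch below is a hypothetical illustration only. It assumes, as the date-12000-strat description suggests (a balanced test set of 1,800 per class next to a skewed training set), that the same number of instances is drawn from every class. The helper name `balanced_holdout` and the class counts in the example are made up.

```python
import random
from collections import defaultdict


def balanced_holdout(labels, n_per_class, seed=0):
    """Hypothetical helper: hold out n_per_class random instances of each class.

    Returns (train_indices, test_indices). The test set is balanced by
    construction; the training set keeps whatever class skew remains.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label, idxs in sorted(by_class.items()):
        if len(idxs) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(idxs)} instances")
        rng.shuffle(idxs)
        test.extend(idxs[:n_per_class])   # first n_per_class go to the test set
        train.extend(idxs[n_per_class:])  # the rest stay in the training set
    return sorted(train), sorted(test)


# Hypothetical class counts for a 12,000-document, two-class sample.
labels = ["before 1642"] * 3000 + ["1642 or later"] * 9000
train, test = balanced_holdout(labels, n_per_class=1800)
early_share = sum(labels[i] == "before 1642" for i in train) / len(train)
print(len(test), len(train), round(early_share, 3))  # 3600 8400 0.143
```

Drawing a fixed count per class, rather than a fixed fraction per class, is what keeps the test set balanced while the training set stays skewed. For a proportional stratified split instead, scikit-learn's `train_test_split(..., stratify=labels)` is the usual choice.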

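The tcp-tree-select-9-10 and cdate-2500 entries describe each document by counts of the 500 most common English words. This README does not say how that word list was chosen or how the texts were tokenized, so the sketch below assumes the list is the corpus's own 500 most frequent lower-cased alphabetic tokens. `tokenize`, `top_k_vocabulary`, and `count_features` are hypothetical names, not part of the Boxer data pipeline.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters after lower-casing.
TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens."""
    return TOKEN.findall(text.lower())


def top_k_vocabulary(docs, k=500):
    """Hypothetical helper: the k most frequent tokens across the whole corpus."""
    totals = Counter()
    for doc in docs:
        totals.update(tokenize(doc))
    return [word for word, _ in totals.most_common(k)]


def count_features(doc, vocabulary):
    """Map one document to a count vector with one entry per vocabulary word."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocabulary]  # missing words count as 0


docs = ["the king and the queen", "the play and the stage and the king"]
vocab = top_k_vocabulary(docs, k=2)
print(vocab)                                     # ['the', 'and']
print([count_features(d, vocab) for d in docs])  # [[2, 1], [3, 2]]
```

With `k=500`, each document becomes a 500-dimensional vector of raw counts, which matches the shape of the features described for these two datasets under the stated assumptions.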