Public Datasets of Boxer

This repo contains all datasets used in the Boxer demo. The datasets, their related use cases, and their sources are listed below:

Dataset Name	Description	Related Use case	Source
Iris	The Boxer demo uses the Iris dataset by default. The dataset contains 3 classes of 50 instances each.	Boxer demo	https://archive.ics.uci.edu/ml/datasets/iris
Imputation	The Mushroom dataset, with a testing set of 2,000 (of 8,124) randomly selected instances.	Model Selection	https://archive.ics.uci.edu/ml/datasets/mushroom
IMDB Confidence	The data consists of 5,044 movies with 27 features; however, 25% is sequestered for final assessment. A stratified sample of 200 movies per class is held out from the remaining 3,756 for testing.	Model Selection and Tuning	https://www.kaggle.com/suchitgupta60/imdb-data
recid	This dataset is used for fair learning: the Broward County recidivism dataset, popularized by ProPublica. The dataset contains 6,172 instances and 14 numeric features (created by one-hot encoding the categorical features in the initial seven-feature dataset). 20% are held out for testing.	Fairness Assessment	https://github.com/algofairness/fairness-comparison
date-12000-strat	The dataset is the TCP collection of historical documents. A random sample of 12,000 documents was taken, and 30% was held out using stratified sampling. While the testing set is balanced (1,800 per class), the training set is highly skewed (only 15% before 1642).	Bias and Data Discovery	...
fuzz-mod-5-02	The dataset is a collection of 554 plays written in the Early Modern Period (1470-1660). Five linguistic features are used. It contains four kinds of plays: Comedy, History, Tragedy and Tragicomedy.	Feature Sensitivity Testing	...
tcp-tree-select-9-10	This dataset considers a corpus of 59,989 documents from a historical literary collection and the data counts the 500 most common English words in each document.	Model Selection and Data Discovery	...
(continuous) wine quality	The dataset is used for wine quality classification, which requires classifying the quality of a wine from its properties.	(Continuous) Hyperparameter Tuning	https://archive.ics.uci.edu/ml/datasets/wine+quality
(continuous) heart disease	The dataset is a standard dataset used in machine learning education. Classifiers are trained to predict if a patient is likely to develop a disease (binary decision).	(Continuous) Model Selection and Calibration Analysis	https://archive.ics.uci.edu/ml/datasets/heart+disease
(continuous) income	The dataset comes from an income classification benchmark dataset that has been downsampled. Classifiers determine whether an individual’s income is above a certain level.	(Continuous) Model Selection	https://archive.ics.uci.edu/ml/datasets/census+income
(continuous) cifar-sampled-scaling	The dataset is based on the CIFAR 100 computer vision benchmark and was created using TensorFlow. The dataset has 100 classes, and the trained classifier produces a distribution over these classes as its decision. A binary classifier has been created for a “meta-class” that combines 5 of the main classes. This dataset aims to classify flowers, which can be any one of 5 of the original classes. Because the test set contains all 100 classes, it is quite imbalanced: flowers are only 5% of the total instances.	(Continuous) Model Selection and Detail Examination	https://github.com/mattdutson/ml-vis
(continuous) cdate-2500	This dataset considers a corpus of 59,989 documents from a historical literary collection: Text Creation Partnership (TCP) transcriptions of the Early English Books Online (EEBO). The data counts the 500 most common English words in each document. For the experiment, we took a random sample of 2500 documents, and held out 30% using stratified sampling.	(Continuous) Data Examination	...
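
Several descriptions above mention holding out a testing set by stratified sampling, sometimes with a fixed number of instances per class (for example, 200 movies per class for IMDB Confidence). The following is a minimal sketch of that idea using only the Python standard library. It is not the procedure used to build these datasets: the function name `holdout_per_class`, the toy labels, and the seed are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def holdout_per_class(labels, n_per_class, seed=0):
    """Split indices into (train, test), holding out n_per_class items per class.

    Hypothetical helper for illustration; not part of the Boxer code base.
    """
    rng = random.Random(seed)

    # Group instance indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < n_per_class:
            raise ValueError(f"class {label!r} has fewer than {n_per_class} instances")
        # Shuffle within the class, then take the first n_per_class for testing.
        rng.shuffle(indices)
        test.extend(indices[:n_per_class])
        train.extend(indices[n_per_class:])
    return sorted(train), sorted(test)


# Toy example: a skewed label set with 30 "before" and 170 "after" instances.
labels = ["before"] * 30 + ["after"] * 170
train_idx, test_idx = holdout_per_class(labels, n_per_class=20, seed=42)
```

With a fixed count per class, the testing set is balanced while the training set keeps (and slightly amplifies) the skew of the original sample, which is the situation described for date-12000-strat. A proportional variant would instead hold out the same fraction of each class.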

uwgraphics/BoxerData

Folders and files

Latest commit

History

Repository files navigation

Public Datasets of Boxer

About

Resources

License

Stars

Watchers

Forks

Languages