The goal of this project is to predict whether a person would have lived or died on the Titanic.
This project contains a dataset of passengers on the Titanic. It can be used to assess different predictive models of survival.
The dataset has been obtained from http://biostat.mc.vanderbilt.edu/DataSets.
The full dataset is located in `original.csv`. It has been randomly split into training and validation data (`train.csv`) and test data (`test.csv`).
Predictive models should be trained and validated on the training data before being tested on the test data.
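As an illustration of the validation step, the sketch below holds out part of the training rows with a seeded shuffle. It is a minimal sketch, not the project's own code: the class `SplitSketch`, the method `split`, the 80/20 fraction and the seed are all assumptions made for the example, and the rows are stand-in integers rather than parsed lines of `train.csv`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the project.
class SplitSketch {

    // Shuffles a copy of the rows and cuts it in two.
    // Returns a list of two parts: the first holds about `fraction` of the rows.
    static <T> List<List<T>> split(List<T> rows, double fraction, long seed) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed)); // seeded so the split is repeatable
        int cut = (int) Math.round(shuffled.size() * fraction);

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in row ids; real code would use rows read from train.csv.
        List<Integer> rowIds = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rowIds.add(i);
        }
        List<List<Integer>> parts = split(rowIds, 0.8, 42L);
        System.out.println("training rows:   " + parts.get(0).size());
        System.out.println("validation rows: " + parts.get(1).size());
    }
}
```

Fixing the seed keeps the validation rows the same between runs, so different models are compared on the same held-out rows.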
This problem is stochastic in nature, so the data is inconsistent: passengers with identical attributes can have different outcomes. We cannot expect to find a model which correctly predicts all of the training points.
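To make the inconsistency concrete, the sketch below computes an upper bound on training accuracy for any deterministic model: rows with identical features must all receive the same prediction, so at best the majority outcome in each group is predicted correctly. This is a sketch under stated assumptions: `ConsistencySketch` and `bestTrainingAccuracy` are hypothetical names, each row's features are assumed to be encoded as a single string key, and the rows in `main` are made-up toy values, not taken from the dataset.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the project.
class ConsistencySketch {

    // features[i] is a key describing row i; survived[i] is its outcome.
    // Returns the highest training accuracy any deterministic model
    // could reach when it sees only these feature keys.
    static double bestTrainingAccuracy(String[] features, boolean[] survived) {
        // Per feature key: counts[0] = died, counts[1] = survived.
        Map<String, int[]> counts = new HashMap<>();
        for (int i = 0; i < features.length; i++) {
            int[] c = counts.computeIfAbsent(features[i], k -> new int[2]);
            c[survived[i] ? 1 : 0]++;
        }
        // The best a model can do per group is to predict the majority outcome.
        int correct = 0;
        for (int[] c : counts.values()) {
            correct += Math.max(c[0], c[1]);
        }
        return (double) correct / features.length;
    }

    public static void main(String[] args) {
        // Made-up toy rows: two feature groups, each with mixed outcomes.
        String[] features = {"female,1", "female,1", "male,3", "male,3", "male,3"};
        boolean[] survived = {true, false, false, false, true};
        System.out.println(bestTrainingAccuracy(features, survived));
    }
}
```

A bound below 1.0 means at least one group of identical rows contains both outcomes, so no model restricted to those features can fit every training point.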
The project currently contains four hypothesis classes:
- Everyone Dies
- Females survive
- Greedy decision trees
- Pruned decision trees
These hypotheses can be run using `titanic.app.HypothesisRunner`.
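The two simplest hypothesis classes in the list can be sketched as fixed rules and scored by accuracy. The code below is illustrative only and does not reflect how the project implements them: `BaselineSketch` and its members are hypothetical names, "Females survive" is assumed to mean that women are predicted to survive and everyone else to die, sex is assumed to be encoded as the string `female`, and the rows in `main` are made-up toy values.

```java
import java.util.function.Predicate;

// Hypothetical sketch, not the project's implementation.
class BaselineSketch {

    // A hypothesis maps a passenger's sex to a predicted outcome (true = survived).
    static final Predicate<String> EVERYONE_DIES = sex -> false;
    static final Predicate<String> FEMALES_SURVIVE = sex -> "female".equalsIgnoreCase(sex);

    // Fraction of rows where the prediction matches the recorded outcome.
    static double accuracy(Predicate<String> hypothesis, String[] sex, boolean[] survived) {
        int correct = 0;
        for (int i = 0; i < sex.length; i++) {
            if (hypothesis.test(sex[i]) == survived[i]) {
                correct++;
            }
        }
        return (double) correct / sex.length;
    }

    public static void main(String[] args) {
        // Made-up toy rows, not taken from the dataset.
        String[] sex = {"female", "male", "male", "female"};
        boolean[] survived = {true, false, true, true};
        System.out.println("Everyone dies:   " + accuracy(EVERYONE_DIES, sex, survived));
        System.out.println("Females survive: " + accuracy(FEMALES_SURVIVE, sex, survived));
    }
}
```

Rules like these need no training, which makes them useful reference points: a decision tree is only worth its complexity if it beats them on held-out data.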