# Speed Dating Analysis - Tristan Langley

**This project looks at speed dating survey results. The goal is to find the features that best predict whether someone
will decide "yes" on their partner (i.e. want to match with them). Then, we will create a model based on these features
and test how accurately it predicts "yes" decisions.**

**1. Data preprocessing:** The speed dating data is in a CSV file. There are 195 columns; we will keep only the most
"straightforward" ones, e.g. attractiveness rating, correlation in interests, and age difference.

In [24]:
import pandas as pd

# List of columns I want to keep
# TODO decide if I want to put partner's ranking on each individual interest (e.g. 'exercise', 'dining', etc.)
colnames = ['gender', 'int_corr', 'samerace', 'age_o', 'dec_o', 'attr_o', 'sinc_o',
            'intel_o', 'fun_o', 'amb_o', 'shar_o', 'prob_o', 'met_o', 'age', 'race', 'dec']

# Load the columns I want into a pandas dataframe
full_df = pd.read_csv('SpeedDatingData.csv', usecols=colnames)

# Drop any rows with NaN value(s)
full_df.dropna(inplace=True)

# Create a new column for difference in age (partner's age minus my age). Negative means partner is younger than me
full_df['d_age'] = full_df['age'] - full_df['age_o']

# Now drop my age from the dataframe -- not going to use this as a predictor, because it does not describe my partner
full_df = full_df.drop(['age_o'], axis=1)

# Separate data (possible predictors) from targets (what we are trying to predict: the decision yes/no)
data = full_df.drop(['gender', 'dec_o', 'dec'], axis=1)
targets = full_df['dec_o']


Description of data columns:
- gender: F=0, M=1
- int_corr: correlation between interests (polled from 1-10 on interests like exercise, dining, museums, gaming, etc.)
- samerace: same race=1, different races=0
- dec_o: my decision, no=0 yes=1
- attr_o: my rating of partner's attractiveness (1-10)
- sinc_o: my rating of partner's sincerity (1-10)
- intel_o: my rating of partner's intelligence (1-10)
- fun_o: my rating of partner's funniness (1-10)
- amb_o: my rating of partner's ambition (1-10)
- shar_o: my rating of partner on our shared interests (1-10)
- prob_o: my estimate of how likely it is that my partner decided "yes" on me (1-10)
- met_o: have I met my partner before, no=1 yes=2
- age: partner's age (years)
- race: partner's race (integer)
- dec: partner's decision on me, no=0 yes=1
- d_age: difference in age (negative means partner is younger than me)

**2. Find best predictors:** Use the calculated correlations and the top layers of a decision tree to see which qualities seem to be the best predictors
of "yes" decisions.
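A minimal sketch of what this step could look like, assuming scikit-learn is available. So that it runs on its own, it uses a small synthetic stand-in for `data` and `targets`: the column names match some of the ones kept above, but the values and the rule that generates `dec_o` are made up purely for illustration. The tree depth of 2 is also an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the preprocessed survey data (all values are invented)
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    'attr_o': rng.integers(1, 11, n),    # ratings from 1 to 10
    'fun_o': rng.integers(1, 11, n),
    'samerace': rng.integers(0, 2, n),   # 0 or 1
    'd_age': rng.integers(-5, 6, n),     # age difference in years
})
# Made-up decision rule: "yes" when attractiveness plus fun is high
targets = ((data['attr_o'] + data['fun_o']) > 11).astype(int).rename('dec_o')

# Correlation of each candidate predictor with the decision, strongest first
correlations = data.corrwith(targets)
order = correlations.abs().sort_values(ascending=False).index
correlations = correlations.reindex(order)
print(correlations)

# Fit a shallow tree: the features used in its top splits are the ones
# the tree finds most informative
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data, targets)
print(export_text(tree, feature_names=list(data.columns)))

# Impurity-based feature importances, highest first
importances = pd.Series(tree.feature_importances_, index=data.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

On the real data, `data` and `targets` from the preprocessing cell would replace the synthetic frame, and features that rank highly in both the correlation list and the tree would be natural candidates to keep.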

**3. Create KNN model**

**4. Create Decision Tree model**

**5. Create Linear Regression model**

**6. Compare accuracies of each prediction model**
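
Steps 3 to 6 could be sketched roughly as follows, assuming scikit-learn is available. This is not the final implementation: the data is a synthetic stand-in (invented values and an invented decision rule), and the hyperparameters (`n_neighbors=5`, `max_depth=4`), the 75/25 train/test split, and the 0.5 cutoff used to turn the linear regression output into a yes/no prediction are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed survey data (all values are invented)
rng = np.random.default_rng(0)
n = 600
data = pd.DataFrame({
    'attr_o': rng.integers(1, 11, n),
    'fun_o': rng.integers(1, 11, n),
    'samerace': rng.integers(0, 2, n),
    'd_age': rng.integers(-5, 6, n),
})
# Made-up decision rule: "yes" when attractiveness plus fun is high
targets = ((data['attr_o'] + data['fun_o']) > 11).astype(int)

# Hold out a test set so accuracy is measured on unseen rows
X_train, X_test, y_train, y_test = train_test_split(
    data, targets, test_size=0.25, random_state=0)

models = {
    'knn': KNeighborsClassifier(n_neighbors=5),
    'decision_tree': DecisionTreeClassifier(max_depth=4, random_state=0),
    'linear_regression': LinearRegression(),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    if name == 'linear_regression':
        # Linear regression returns a continuous score, so threshold it
        # at 0.5 (an assumption) to get a 0/1 decision
        predictions = (predictions >= 0.5).astype(int)
    accuracies[name] = accuracy_score(y_test, predictions)

# Report the models from most to least accurate
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {acc:.3f}')
```

Fitting all three models on the same split keeps the comparison fair, since every accuracy is computed on identical test rows.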