In [46]:
print(__doc__)

Automatically created module for IPython interactive environment


# gender_classification_challenge

## Overview

This is the code for the gender classification challenge for 'Learn Python for Data Science #1' by @Sirajology on [YouTube](https://youtu.be/T5pRlIbr6gg). The code uses the [scikit-learn](http://scikit-learn.org/) machine learning library to train a [decision tree](https://en.wikipedia.org/wiki/Decision_tree) on a small dataset of body metrics (height, weight, and shoe size) labeled male or female. The trained tree can then predict the gender of someone given a new set of body metrics.

## Dependencies

* Scikit-learn (http://scikit-learn.org/stable/install.html)
* numpy (pip install numpy)
* scipy (pip install scipy)

Install missing dependencies using [pip](https://pip.pypa.io/en/stable/installing/)

## Usage

Once you have installed the dependencies via pip, run the script in a terminal:

```
python demo.py
```

## Challenge

Find 3 more classifiers in the scikit-learn [documentation](http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) and add them to the demo.py code. Train them on the same dataset and [compare their results](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html). You can determine accuracy by having your trained classifier predict samples from the training data and checking whether it classifies them correctly. Push your code repository to [GitHub](https://help.github.com/articles/set-up-git/), then post it in the comments. I'll give the winner a shoutout a week from now!
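The accuracy check described above can be sketched as follows. This is a minimal illustration, assuming the same decision tree and the same samples that appear later in this notebook; the names `predictions`, `manual_accuracy`, and `library_accuracy` are made up for the example and are not part of demo.py.

```python
from sklearn import tree
from sklearn.metrics import accuracy_score

# [height, weight, shoe_size], the same samples used in the notebook
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40],
     [190, 90, 47], [175, 64, 39],
     [177, 70, 40], [159, 55, 37], [171, 75, 42], [181, 85, 43]]
Y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female',
     'female', 'male', 'male']

clf = tree.DecisionTreeClassifier().fit(X, Y)

# Predict the very samples the classifier was trained on
predictions = clf.predict(X)

# Accuracy is the fraction of samples whose predicted label matches the true label
manual_accuracy = sum(p == t for p, t in zip(predictions, Y)) / len(Y)

# accuracy_score computes the same fraction
library_accuracy = accuracy_score(Y, predictions)

print('Training accuracy:', library_accuracy)
```

Scoring on the training samples only measures how well the model reproduces data it has already seen, so treat the number as a sanity check rather than an estimate of performance on new people.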

## Credits

Credits for some of the code go to [chribsen](https://github.com/chribsen). I've merely created a wrapper to get people started easily.

In [39]:
from sklearn import tree

clf = tree.DecisionTreeClassifier()

# CHALLENGE - create 3 more classifiers...
# 1
# 2
# 3

# [height, weight, shoe_size]
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40],
     [190, 90, 47], [175, 64, 39],
     [177, 70, 40], [159, 55, 37], [171, 75, 42], [181, 85, 43]]

Y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female',
     'female', 'male', 'male']


# CHALLENGE - ...and train them on our data
clf = clf.fit(X, Y)

prediction = clf.predict([[190, 70, 43]])

# CHALLENGE - compare their results and print the best one!

print('Challenge: '+ str(prediction))

Challenge: ['male']


 > Author: Nestor Hernandez <zensualito@gmail.com>
 
 > License: BSD 3 clause

In [40]:
from sklearn.gaussian_process import GaussianProcessClassifier as gau
from sklearn.gaussian_process.kernels import RBF
from sklearn.neighbors import KNeighborsClassifier as knb
from sklearn.svm import SVC

In [41]:
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40],
     [190, 90, 47], [175, 64, 39],
     [177, 70, 40], [159, 55, 37], [171, 75, 42], [181, 85, 43]]
y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female',
     'female', 'male', 'male']

In [42]:
prediction = None
# http://scikit-learn.org/stable/modules/neighbors.html#classification
print('Neighbours Classifier to fit the data...')
clf = knb(3, weights='distance')
clf.fit(X, y)
prediction = clf.predict([[190, 70, 43]])
print('Prediction: '+ str(prediction))

Neighbours Classifier to fit the data...
Prediction: ['male']


In [43]:
prediction = None
# http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessClassifier.html#sklearn.gaussian_process.GaussianProcessClassifier
print('Gaussian Classifier to fit the data...')
clf = gau(1.0 * RBF(1.0), warm_start=True).fit(X, y)
prediction = clf.predict([[190, 70, 43]])
print('Prediction: '+ str(prediction) + ' <-- Gaussian')    

Gaussian Classifier to fit the data...
Prediction: ['male'] <-- Gaussian


In [38]:
prediction = None
# http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC
print('C-Support Vector Classification to fit the data...')
clf = SVC(gamma=2, C=1).fit(X, y)
prediction = clf.predict([[190, 70, 43]])
print('Prediction: '+ str(prediction) + ' <-- SVC')

C-Support Vector Classification to fit the data...
Prediction: ['male'] <-- SVC


# Metrics
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html

In [48]:
from sklearn.metrics import accuracy_score
y_pred = [1, 1, 1, 1]  # labels predicted by a classifier
y_true = [1, 1, 1, 1]  # expected (true) labels
accuracy_score(y_true, y_pred)

1.0
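To finish the challenge, the same metric can be applied to every classifier tried above. The following is a rough sketch, assuming the hyperparameters from the earlier cells and scoring on the training data as the challenge suggests; the `classifiers` and `scores` dictionaries and the display names are illustrative choices, not part of the original demo.

```python
from sklearn import tree
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# [height, weight, shoe_size]
X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37], [166, 65, 40],
     [190, 90, 47], [175, 64, 39],
     [177, 70, 40], [159, 55, 37], [171, 75, 42], [181, 85, 43]]
y = ['male', 'male', 'female', 'female', 'male', 'male', 'female', 'female',
     'female', 'male', 'male']

# Same models and settings as in the cells above (names are illustrative)
classifiers = {
    'DecisionTree': tree.DecisionTreeClassifier(),
    'KNeighbors': KNeighborsClassifier(n_neighbors=3, weights='distance'),
    'GaussianProcess': GaussianProcessClassifier(kernel=1.0 * RBF(1.0),
                                                 warm_start=True),
    'SVC': SVC(gamma=2, C=1),
}

# Fit each model, then score it on the samples it was trained on
scores = {}
for name, clf in classifiers.items():
    clf.fit(X, y)
    scores[name] = accuracy_score(y, clf.predict(X))
    print(name + ': ' + str(scores[name]))

# Ties are resolved in favour of the first classifier in the dictionary
best = max(scores, key=scores.get)
print('Best on training data: ' + best)
```

With only eleven samples and no held-out data, several models may tie at a perfect score, so the "best" label here says little about how they would generalize.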