<a href="https://github.com/theonaunheim">
    <img style="border-radius: 100%; float: right;" src="static/strawberry_thief_square.png" width=10% alt="Theo Naunheim's Github">
</a>

<br style="clear: both">
<hr>
<br>

<h1 align='center'>Modeling</h1>

<br>

<div style="display: table; width: 100%">
    <div style="display: table-row; width: 100%;">
        <div style="display: table-cell; width: 50%; vertical-align: middle;">
            <img src="static/svm_hyperplanes.svg" width="300">
        </div>
        <div style="display: table-cell; width: 10%">
        </div>
        <div style="display: table-cell; width: 40%; vertical-align: top;">
            <blockquote>
                <p style="font-style: italic;">"Essentially, all models are wrong, but some are useful"</p>
                <br>
                <p>- George E. P. Box</p>
            </blockquote>
        </div>
    </div>
</div>

<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Svm_separating_hyperplanes_(SVG).svg'>ZackWeinberg</a> under the <a href='https://creativecommons.org/licenses/by-sa/3.0/deed.en'>CC BY-SA 3.0</a>
</div>

<hr>

In [None]:
from sklearn.decomposition import PCA

from sklearn.ensemble import RandomForestRegressor

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import RobustScaler
from sklearn.preprocessing import StandardScaler

from sklearn.pipeline import Pipeline

from sklearn.svm import SVR

import numpy as np
import pandas as pd

import matplotlib.pyplot as plot
import matplotlib
%matplotlib inline

---

## Generally

Modeling is the stage most people think of when they describe ML. Conceptually, we are translating a complex process into a simplified model of that process, one we can more easily use and manipulate. In the case of ML, we take observed data and feed it to an ML algorithm. The algorithm fits our observed data and gives us a model to use.

Reposting for reference:
<img src='static/supervised_ml_flowchart_annotated.png'>

#### Step 1: Prep our data

Usually we will want to load and process our data so that our algorithm can work well. To reiterate, we use **X for our feature matrix** (the data we will use to predict), and **y for our target vector** (the data we will try to predict). In the example below, we limit the input data to 10,000 rows for the sake of time. Do not do this in real work: more data will generally give you better models.

In [None]:
# Load data. 
rdf = pd.read_csv('data/diamonds.csv', nrows=10000)

# Encode cut category info as binary
cut_df = pd.get_dummies(rdf['cut'])
df = pd.concat([rdf, cut_df], axis=1)

# Map colors to rank (Best D -> Worst Z)
le = LabelEncoder()
le.fit(sorted(df['color'].unique()))
df['color_rank'] = le.transform(df['color'])

# Get our features
X = df[['carat', 'color_rank', 'depth', 'x', 'y',
        'z', 'Ideal', 'Premium', 'Very Good', 'Good',
        'Fair',
]]

# And our targets
y = df[['price']]

# Scale X and y.
x_scaler = StandardScaler()
X_scaled = x_scaler.fit_transform(X)
y_scaler = StandardScaler()
y_scaled = y_scaler.fit_transform(y).ravel()

# Train and test set
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled)

# Show unscaled.
X.head(5)
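To make the two encoding steps above concrete, here is a minimal sketch on a tiny, hypothetical frame (`toy_df` is made up for illustration and is not the diamonds data). It shows `pd.get_dummies` producing one column per category and `LabelEncoder` assigning ranks in sorted order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for data/diamonds.csv.
toy_df = pd.DataFrame({
    'cut':   ['Ideal', 'Good', 'Ideal', 'Fair'],
    'color': ['E', 'D', 'J', 'E'],
})

# One indicator column per cut category, in sorted order.
toy_cut = pd.get_dummies(toy_df['cut'])

# LabelEncoder ranks the labels in sorted order: D -> 0, E -> 1, J -> 2.
toy_le = LabelEncoder()
toy_df['color_rank'] = toy_le.fit_transform(toy_df['color'])

print(list(toy_cut.columns))
print(toy_df['color_rank'].tolist())
```

Because diamond color grades happen to be alphabetical from best to worst, the sorted order doubles as a quality rank here. That would not hold for arbitrary category labels.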

#### Step 2: Create ML Algorithm

Next, we create our algorithm and supply the arguments, or [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)), that it needs to work.

In [None]:
clf = RandomForestRegressor(
    n_estimators=25,
)

#### Step 3: Feed algorithm feature data and target data via the .fit() method.

Depending on how computationally heavy your algorithm is, this could take a while.

In [None]:
# Fit on the training split only, so the test split stays unseen.
clf.fit(X_train, y_train)

#### Step 4: Predict using our newly fitted model.

You can use predict() or, for classifiers that support it, predict_proba(), depending on your model.

In [None]:
y_predicted = clf.predict(X_test)
y_actual    = y_test

result_df = pd.DataFrame({
    'z_score_prediction': y_predicted,
    'z_score_actual'    : y_actual,
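    # inverse_transform expects a 2D array, so reshape to a column and flatten the result.
    'price_prediction'  : y_scaler.inverse_transform(y_predicted.reshape(-1, 1)).ravel(),
    'price_actual'      : y_scaler.inverse_transform(y_actual.reshape(-1, 1)).ravel(),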
})

result_df['price_diff'] = result_df['price_prediction'] - result_df['price_actual']
result_df.round(2).head(5)
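The difference between predict() and predict_proba() is easiest to see on a classifier. The sketch below uses a small, hypothetical two-class dataset (`toy_X`, `toy_y`) rather than the diamonds data, since the regressor above only offers predict().

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: one feature, two well-separated classes.
toy_X = np.array([[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

toy_clf = RandomForestClassifier(n_estimators=10, random_state=0)
toy_clf.fit(toy_X, toy_y)

# predict() returns one label per row.
labels = toy_clf.predict(toy_X)

# predict_proba() returns one column per class; each row sums to 1.
probabilities = toy_clf.predict_proba(toy_X)

print(labels.shape)
print(probabilities.shape)
```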

## But all those separate fit()s and transform()s are exhausting and I'm lazy. Isn't there an easier way to do this?

Yup! Because all sklearn algorithms have roughly the same API, you can chain them and operate on them as a group. This is what pipelines are for.

* [Pipeline](http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html)
* [Pipeline User Guide](http://scikit-learn.org/stable/modules/pipeline.html#pipeline)

Note: most of the main pages for models like the first bullet above have links to the Sklearn User Guide, which gives concrete examples. The documentation is good. Use it.

A pipeline is made up of a series of steps that are done sequentially. In practice, this is basically just a list of tuples, with the first element being the step's identifier, and the second element being the model or transformer you want to add to the pipeline. 

For example, it's pretty common to want to scale something, do PCA, and then fit a model. Let's do this via a pipeline.

In [None]:
# Let's make a scaler, PCA, and support vector regressor.
pipe = Pipeline([
    ('scale', RobustScaler()),
    ('pca'  , PCA()),
    ('clf'  , SVR(kernel='linear'))
])

# This means we can just fit and predict with a single object.
pipe.fit(X, y.values.ravel())
pipe.predict(X.iloc[:10,:])
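To show what the pipeline is doing for us, here is a rough sketch comparing the manual chain of fit/transform calls with the equivalent pipeline. The data (`toy_X`, `toy_y`) is synthetic and hypothetical, and `n_components=2` is chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

# Hypothetical synthetic data standing in for the diamonds features.
rng = np.random.RandomState(0)
toy_X = rng.normal(size=(50, 3))
toy_y = toy_X @ np.array([1.0, -2.0, 0.5])

# The manual way: fit and transform each step in order.
scaler = RobustScaler()
pca = PCA(n_components=2)
svr = SVR(kernel='linear')
X_step1 = scaler.fit_transform(toy_X)
X_step2 = pca.fit_transform(X_step1)
svr.fit(X_step2, toy_y)
manual_pred = svr.predict(pca.transform(scaler.transform(toy_X)))

# The pipeline way: the same steps, run for us in sequence.
toy_pipe = Pipeline([
    ('scale', RobustScaler()),
    ('pca', PCA(n_components=2)),
    ('clf', SVR(kernel='linear')),
])
toy_pipe.fit(toy_X, toy_y)
pipe_pred = toy_pipe.predict(toy_X)

# Both routes should give the same predictions.
print(np.allclose(manual_pred, pipe_pred))

# Fitted steps remain reachable by the name given in the tuple.
print(toy_pipe.named_steps['pca'].n_components_)
```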

#### Isn't that easier? We can also add more complex steps into the pipeline using the FunctionTransformers we discussed before.

From the sklearn documentation, we get a function that basically chops off the first column of data.

In [None]:
# Define function
def all_but_first_column(input_X):
    return input_X[:, 1:]

# Which we can wrap in a Function Transformer, and then wrap in a tuple before inserting it
# as a pipeline step.
('func_trans', FunctionTransformer(all_but_first_column))
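As a minimal sketch of that step in action, the pipeline below drops a made-up ID column before fitting. The data and the choice of `LinearRegression` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def all_but_first_column(input_X):
    return input_X[:, 1:]

# Hypothetical data: column 0 is an uninformative row ID, column 1 is the signal.
toy_X = np.array([[7.0, 1.0], [3.0, 2.0], [9.0, 3.0], [1.0, 4.0]])
toy_y = np.array([2.0, 4.0, 6.0, 8.0])

# On its own, the transformer just applies the function.
trimmed = FunctionTransformer(all_but_first_column).fit_transform(toy_X)
print(trimmed.shape)

# Inside a pipeline, the model never sees the ID column.
toy_pipe = Pipeline([
    ('func_trans', FunctionTransformer(all_but_first_column)),
    ('reg', LinearRegression()),
])
toy_pipe.fit(toy_X, toy_y)
prediction = toy_pipe.predict(np.array([[999.0, 5.0]]))
print(prediction)
```

Note that the `[:, 1:]` slice assumes a NumPy array. If you pass a pandas DataFrame into the pipeline, the function would need `.iloc[:, 1:]` instead.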

#### Where this is the most useful is hyperparameter searches.

We can use these pipelines to help us exhaustively run through hyperparameters and choose the best outcome and parameters. We simply use the following notation, with the step name ('stepname') and keyword argument ('kwarg') separated by a double underscore and followed by a list of values, like this:

    {
        'stepname1__kwarg1': [value1, value2],
        'stepname2__kwarg2': [value3, value4]
    }

An example might be helpful. Say we want to test our pipeline above, but we're not sure if a linear kernel or a radial basis function kernel would be preferable for our model. Say we also want to tweak the scaler's quantile range to see if that helps.

Note: this tests every possible combination, which can take a really, really long time. For example, the grid below has 4 parameter combinations instead of one, and each combination is fit once per cross-validation fold.
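To get a feel for how quickly a grid grows, sklearn's `ParameterGrid` can enumerate the combinations without fitting anything. This is a small sketch; `toy_grid` simply mirrors the two settings described above.

```python
from sklearn.model_selection import ParameterGrid

# Two values for each of two hyperparameters.
toy_grid = {
    'scale__quantile_range': [(10.0, 90.0), (25.0, 75.0)],
    'clf__kernel': ['linear', 'rbf'],
}

# Every combination that a grid search would try.
combos = list(ParameterGrid(toy_grid))
print(len(combos))
for combo in combos:
    print(combo)
```

The count is the product of the list lengths, so adding a third hyperparameter with three values would triple it to 12.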

In [None]:
# Create the parameter grid we want to search.
grid = {
    'scale__quantile_range': [(10.0, 90.0), (25.0, 75.0)],
    'clf__kernel'          : ['linear', 'rbf'],
}


# Define the parameters for our search.
best_model = GridSearchCV(
    pipe, 
    n_jobs=2, 
    param_grid=grid, 
    scoring='neg_mean_squared_error'
)

# Fit the model ... we are chopping down the data for the sake of time.
best_model.fit(X.iloc[:1000,:], y.values.ravel()[:1000])

# The model will now have the optimal parameters.
print('The best parameters are: {}'.format(best_model.best_params_))

# And we can predict with the model.
best_model.predict(X.iloc[:10,:])
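Beyond `best_params_`, a fitted search also keeps the scores for every combination it tried. The sketch below runs a small search on hypothetical synthetic data (`toy_X`, `toy_y`) and tabulates `cv_results_`. The `cv=3` setting is an arbitrary choice to keep it fast.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

# Hypothetical synthetic regression data.
rng = np.random.RandomState(0)
toy_X = rng.normal(size=(60, 2))
toy_y = 3.0 * toy_X[:, 0] - toy_X[:, 1] + rng.normal(scale=0.1, size=60)

toy_pipe = Pipeline([
    ('scale', RobustScaler()),
    ('clf', SVR()),
])

toy_search = GridSearchCV(
    toy_pipe,
    param_grid={'clf__kernel': ['linear', 'rbf']},
    scoring='neg_mean_squared_error',
    cv=3,
)
toy_search.fit(toy_X, toy_y)

# One row per parameter combination, with its mean cross-validated score.
summary = pd.DataFrame(toy_search.cv_results_)[
    ['param_clf__kernel', 'mean_test_score', 'rank_test_score']
]
print(summary)

# Scores are negated errors, so values closer to zero are better.
print(toy_search.best_score_)
```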

---

## So what algorithm should I use for what?

#### TL;DR: 

¯\\\_(ツ)\_/¯

#### Seriously though:

Entire books have been devoted to this topic (see Notebook 7, Resources). You really should ask someone who knows what they are doing (and if you know what you're doing, you should put together a monthly session that examines a particular algorithm, its strengths, and how to best use it). That said, some of the algorithms available to you are below.

Note: if a class states that it is a "Classifier" (e.g. Support Vector Classifier), it will usually have an associated regressor. As previously mentioned, you will use classifiers for discrete targets (e.g. "setosa") and regressors for continuous targets (e.g. 6.25).
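As a quick illustration of a classifier/regressor pair, here is a sketch using K-Nearest Neighbors on made-up data. The feature values, labels, and targets are all hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical single-feature data with two obvious groups.
toy_X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])

# Discrete targets go to the classifier.
toy_labels = np.array(['setosa'] * 3 + ['virginica'] * 3)
knn_clf = KNeighborsClassifier(n_neighbors=3).fit(toy_X, toy_labels)
class_pred = knn_clf.predict(np.array([[1.5]]))

# Continuous targets go to the regressor.
toy_values = np.array([5.0, 5.1, 5.2, 6.2, 6.25, 6.3])
knn_reg = KNeighborsRegressor(n_neighbors=3).fit(toy_X, toy_values)
value_pred = knn_reg.predict(np.array([[11.0]]))

print(class_pred)
print(value_pred)
```

The classifier returns a label voted on by the three nearest neighbors, while the regressor returns the average of their target values.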

#### Algorithms:

* **Lasso Regression**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/linear_model.html#lasso)
    * <a href="https://en.wikipedia.org/wiki/Lasso_(statistics)">Wikipedia</a>
    
    
* **Elastic Net**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html#sklearn.linear_model.ElasticNet)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/linear_model.html#elastic-net)
    * [Wikipedia](https://en.wikipedia.org/wiki/Elastic_net_regularization)


* **Logistic Regression**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)
    * [Wikipedia](https://en.wikipedia.org/wiki/Logistic_regression)
  
  
* **Stochastic Gradient Descent (SVM-based)**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/linear_model.html#stochastic-gradient-descent-sgd)
    * [Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
  
  
* **Perceptron**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html#sklearn.linear_model.Perceptron)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/linear_model.html#perceptron)
    * [Wikipedia](https://en.wikipedia.org/wiki/Perceptron)
  
  
* **Support Vector Machines**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/svm.html)
    * [Wikipedia](https://en.wikipedia.org/wiki/Support_vector_machine)
  
  
* **K-Nearest Neighbors**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/neighbors.html)
    * [Wikipedia](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)


* **Naive Bayes**
    * [Sklearn Class Documentation]()
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/naive_bayes.html)
    * [Wikipedia](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
  
  
* **Random Forests**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/ensemble.html#random-forests)
    * [Wikipedia](https://en.wikipedia.org/wiki/Random_forest)


* **Neural Networks / Multi-Layer Perceptrons**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/neural_networks_supervised.html)
    * [Wikipedia](https://en.wikipedia.org/wiki/Neural_network)
  
  
* **Gradient Boosted Trees**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting)
    * [Wikipedia](https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting)


* **Clustering**
    * [Sklearn Class Documentation](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)
    * [Sklearn Algorithm Info](http://scikit-learn.org/stable/modules/clustering.html)
    * [Wikipedia](https://en.wikipedia.org/wiki/Cluster_analysis)

If you still have no idea, just use this picture:

<img src='static/ml_map.png'>

# Additional Learning Resources

* ### [Sklearn Supervised Learning Model Guide](http://scikit-learn.org/stable/supervised_learning.html)
* ### [Sklearn Unsupervised Learning Model Guide](http://scikit-learn.org/stable/unsupervised_learning.html)

---

# Next Up: [Validating](5_validating.ipynb)

<br>

<img style="margin-left: 0;" src="static/roc_curve.svg" width="20%">

<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:ROC_curves_colors.svg'>נדב ס</a> under the <a href='https://creativecommons.org/licenses/by-sa/4.0/deed.en'>CC BY-SA 4.0</a>
</div>

---