# Dev Week 2017 - Machine Learning & Recommender Systems

This talk presents some basic Machine Learning concepts and how we develop projects with them and with related technologies (IMHO).

## Fundamentals of Machine Learning

### Some important points that we must keep in mind for this talk

In terms of scope:
- Data Science and AI are broader concepts than Machine Learning;
- Machine Learning is a broader concept than Deep Learning;

### What is learning?

- Using past experience to acquire expertise in executing a specific task;
- Want an example? Pretend to grab a stone from the ground near a street dog!

### What is machine learning?

- When you program a machine so that it learns how to execute a specific task;
- The old-but-gold example: spam filtering;

### Types of learning

#### Supervised versus Unsupervised

*Supervised:* you learn from data labeled with targets how to predict the targets of new, unlabeled data;

*Unsupervised:* the data has no targets; you look for structure in it;
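
A minimal sketch of the difference, using made-up one-dimensional data (all names and values are illustrative assumptions). The supervised learner uses the targets to compute one centroid per class; the unsupervised one sees only the values and splits them into two groups with a tiny 2-means loop.

```python
from statistics import mean

# Supervised: every training value comes with a target (label).
labeled = [(1.0, "small"), (1.2, "small"), (0.8, "small"),
           (5.0, "large"), (5.5, "large"), (4.7, "large")]

def fit_centroids(samples):
    """Learn one centroid per label from (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

centroids = fit_centroids(labeled)
supervised_guess = predict(centroids, 4.9)

# Unsupervised: same values, no targets; we can only discover groups.
unlabeled = [value for value, _ in labeled]

def two_means(values, iterations=10):
    """Split values into two clusters (1-D k-means with k=2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low, near_high = [], []
        for v in values:
            if abs(v - low) <= abs(v - high):
                near_low.append(v)
            else:
                near_high.append(v)
        low, high = mean(near_low), mean(near_high)
    return near_low, near_high

clusters = two_means(unlabeled)
print(supervised_guess)
print(clusters)
```

The clusters come back without names: giving them a meaning is left to whoever reads the result.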

### Classification, regression, multiclass, ranking and complex predictions

*Simple (binary) classification:* there are two classes and each new sample must be assigned to one of them;

*Regression:* you must predict a continuous value for each new sample;

*Multiclass classification:* there are more than two classes;

*Ranking:* you must order a set of instances by relevance;

*Complex prediction:* for example, cost-sensitive classification;
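
To make cost-sensitive classification concrete, here is a small sketch. The scores, labels and cost values are hypothetical; the only point is that when errors have different costs, the best decision threshold is not necessarily 0.5.

```python
# Hypothetical model scores (probability of the positive class) and true labels.
scores = [0.05, 0.20, 0.30, 0.45, 0.60, 0.90]
labels = [0, 0, 1, 1, 0, 1]

COST_FALSE_NEGATIVE = 10  # assumption: missing a positive is expensive
COST_FALSE_POSITIVE = 1   # assumption: a false alarm is cheap

def total_cost(threshold):
    """Sum the cost of every mistake made at the given decision threshold."""
    cost = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 0 and label == 1:
            cost += COST_FALSE_NEGATIVE
        elif predicted == 1 and label == 0:
            cost += COST_FALSE_POSITIVE
    return cost

candidates = [0.1, 0.25, 0.5, 0.75]
best = min(candidates, key=total_cost)
print(best, total_cost(best), total_cost(0.5))
```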

#### Active versus Passive learners

This is about how the learner interacts with the environment at training time: an active learner performs interactions itself (A/B tests, queries, etc.), while a passive learner waits for another actor to supply what it learns from.

#### Online versus batch learning

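This is about how often, and at which point in its lifecycle, your learner incorporates new knowledge: an online learner updates as each observation arrives, while a batch learner fits only after a whole dataset has been collected.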
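
A toy sketch of the two modes, where the "model" is just an estimate of the mean of a stream (the data and class name are assumptions for illustration):

```python
stream = [4.0, 7.0, 1.0, 8.0]

# Batch: wait for the whole dataset, then fit once.
batch_estimate = sum(stream) / len(stream)

# Online: update the estimate after every single observation.
class OnlineMean:
    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        """Incorporate one observation and return the new estimate."""
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value

learner = OnlineMean()
history = [learner.update(x) for x in stream]
print(batch_estimate, history)
```

Both end at the same estimate here; the online learner simply had a usable (if rougher) answer after every step.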

### Overfitting

- When your predictor's performance on the training set is excellent, yet its performance on data outside the training set is very poor;
- In other words, it occurs when your hypothesis fits the training data "too well";
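
A hand-made sketch of this effect (the data is invented for illustration). The training labels follow the rule "1 when x >= 5" with two noisy labels; a memorizing hypothesis fits the noise perfectly, while a simple threshold does not.

```python
# Training data: true rule is "label is 1 when x >= 5", with two noisy labels.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
# Held-out data labelled by the true rule (no noise).
test = [(0.5, 0), (1.9, 0), (2.2, 0), (4.4, 0),
        (5.6, 1), (6.8, 1), (7.1, 1), (9.5, 1)]

def memorizer(x):
    """Overfit hypothesis: copy the label of the nearest training point."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def fit_threshold(samples):
    """Simple hypothesis: pick the cut-off with the fewest training errors."""
    cuts = [x + 0.5 for x, _ in samples]
    def errors(cut):
        return sum((x >= cut) != bool(y) for x, y in samples)
    return min(cuts, key=errors)

cut = fit_threshold(train)

def threshold_rule(x):
    return int(x >= cut)

def accuracy(model, samples):
    return sum(model(x) == y for x, y in samples) / len(samples)

for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

On this data the memorizer scores perfectly on the training set and much worse on the held-out set, while the threshold rule accepts a few training errors and generalizes better.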

### Bias

- When you steer your learner toward a specific "conclusion";
- Be CAREFUL with assumptions;
> "In the face of ambiguity, refuse the temptation to guess." (The Zen of Python, by Tim Peters)

### Algorithm types

- Linear regression;
- Support Vector Machines;
- Decision trees;
- Ensembles;
- Neural networks;

## Connecting Machine Learning to a business goal

### Goals and metrics

In successful Machine Learning projects, the intelligence being developed is strongly coupled to a very specific goal and measured by a very specific metric. So:

- Always define the goal and metric before the development kick-off
- Plan and architect your Machine Learning strategy over the goal and metric previously defined
- Use goal and metric to measure the performance of your intelligence
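
A small sketch of why the metric has to follow the goal. The scenario is hypothetical: 100 transactions, 5 of them fraudulent, and a goal of catching fraud. Accuracy and recall rank the two made-up models in opposite orders.

```python
# Hypothetical ground truth: 5 fraudulent transactions out of 100.
truth = [1] * 5 + [0] * 95
model_a = [0] * 100                              # never flags anything
model_b = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85  # 4 hits, 10 false alarms

def accuracy(truth, predicted):
    """Share of transactions classified correctly."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

def recall(truth, predicted):
    """Share of the actual frauds that were flagged."""
    hits = sum(t == 1 and p == 1 for t, p in zip(truth, predicted))
    return hits / sum(truth)

for name, model in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(truth, model), recall(truth, model))
```

Model A wins on accuracy and is useless for the stated goal; under recall, model B is clearly the better one.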

### Data domains

- A data domain is the group of datasets that make up the available description of your context, its players and its events;
- For example, if you run an e-commerce business, your data domain is all the **available** data that describes your customers, the products being sold, financial transactions, etc.
- Two different data domains can define the same player or event differently. Want an example? CHURN!
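
To illustrate the churn point, here is a sketch with two hypothetical definitions applied to the same made-up customers. The field names, the reference date and the 90-day window are all assumptions.

```python
from datetime import date

TODAY = date(2017, 6, 1)  # assumed reference date

# Hypothetical customer records.
customers = [
    {"id": "a", "subscription_active": True,  "last_purchase": date(2017, 1, 10)},
    {"id": "b", "subscription_active": False, "last_purchase": date(2017, 5, 20)},
    {"id": "c", "subscription_active": True,  "last_purchase": date(2017, 5, 25)},
]

def churned_subscription(customer):
    """Subscription-domain definition: the contract was cancelled."""
    return not customer["subscription_active"]

def churned_retail(customer, max_idle_days=90):
    """Retail-domain definition: no purchase in the last `max_idle_days` days."""
    return (TODAY - customer["last_purchase"]).days > max_idle_days

by_subscription = [c["id"] for c in customers if churned_subscription(c)]
by_retail = [c["id"] for c in customers if churned_retail(c)]
print(by_subscription, by_retail)
```

The two definitions flag different customers, so a churn model trained in one domain answers a different question in the other.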

In [1]:
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

from sklearn import datasets

In [2]:
bc = datasets.load_breast_cancer()
features = pd.DataFrame(bc.data, columns=bc.feature_names)
targets = bc.target

features.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


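Feature descriptions for the Boston housing dataset (a regression dataset; these are not the breast cancer columns shown above):

- CRIM: per capita crime rate by town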
- ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per \$10,000
- PTRATIO: pupil-teacher ratio by town
- B: $ 1000(Bk - 0.63)^2 $ where Bk is the proportion of blacks by town
- LSTAT: percentage lower status of the population
- MEDV: Median value of owner-occupied homes in \$1000's

### Offline evaluation framework

## Feature engineering

Standardization

In [19]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(features, targets, train_size=0.8, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = pd.DataFrame(scaler.transform(X_train), index=X_train.index.values, columns=X_train.columns.values)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), index=X_test.index.values, columns=X_test.columns.values)
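
A plain-Python sketch of what the standardization step does, on made-up numbers: compute the mean and standard deviation on the training data only, then apply the same two numbers to both training and test data. `StandardScaler` uses the population standard deviation, which is why `pstdev` is used here; everything else is an illustrative assumption.

```python
from statistics import mean, pstdev

train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# Statistics are learned from the training data only.
mu = mean(train)
sigma = pstdev(train)  # population standard deviation (ddof=0)

def standardize(values):
    """Apply z = (x - mu) / sigma with the training statistics."""
    return [(v - mu) / sigma for v in values]

train_scaled = standardize(train)
test_scaled = standardize(test)
print(train_scaled, test_scaled)
```

The scaled training data has mean 0 and standard deviation 1 by construction; the test data generally does not, and that is expected.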

Dimensionality analysis

In [22]:
from sklearn.decomposition import PCA

# PCA on the raw features
pca = PCA()
pca.fit(X_train)
cpts = pd.DataFrame(pca.transform(X_train))
x_axis = np.arange(1, pca.n_components_+1)

# PCA on the standardized features; transform with the model fitted on them
pca_scaled = PCA()
pca_scaled.fit(X_train_scaled)
cpts_scaled = pd.DataFrame(pca_scaled.transform(X_train_scaled))
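
A two-feature sketch of why the analysis is run both with and without scaling. It uses the `mean area` and `mean texture` values from the five rows of `features.head()` shown earlier, and computes the explained variance ratio by hand from the 2x2 covariance matrix. This is a simplified stand-in for illustration, not how `PCA` is implemented.

```python
from math import sqrt
from statistics import mean, pstdev

# "mean area" and "mean texture" from the five rows shown earlier.
area = [1001.0, 1326.0, 1203.0, 386.1, 1297.0]
texture = [10.38, 17.77, 21.25, 20.38, 14.34]

def explained_variance_ratio(x, y):
    """Share of total variance on each principal axis of two features."""
    mx, my, n = mean(x), mean(y), len(x)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Eigenvalues of the covariance matrix [[var_x, cov], [cov, var_y]].
    trace = var_x + var_y
    gap = sqrt((var_x - var_y) ** 2 + 4 * cov ** 2)
    return (trace + gap) / (2 * trace), (trace - gap) / (2 * trace)

def zscore(values):
    """Standardize a feature to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

raw = explained_variance_ratio(area, texture)
scaled = explained_variance_ratio(zscore(area), zscore(texture))
print(raw, scaled)
```

Without scaling, the first component is almost entirely `mean area`, simply because its numbers are larger; after scaling, both features contribute.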

## Algorithm engineering

Choosing your algorithm

In [2]:
from sklearn import linear_model, tree, discriminant_analysis, svm

Tools for tuning

In [3]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
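
Conceptually, a grid search tries every combination in a parameter grid and keeps the one with the best cross-validated score. The sketch below hand-rolls that idea in plain Python on invented data with a toy k-nearest-neighbours model. It is not `GridSearchCV` itself, and the data, grid and fold scheme are assumptions.

```python
from itertools import product

# Invented 1-D data: (value, label).
data = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
        (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def cross_val_accuracy(samples, k, folds=5):
    """Average accuracy over `folds` interleaved hold-out splits."""
    scores = []
    for fold in range(folds):
        held_out = samples[fold::folds]
        train = [s for i, s in enumerate(samples) if i % folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / folds

param_grid = {"k": [1, 3, 5]}  # hypothetical grid
results = {}
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    results[values] = cross_val_accuracy(data, **params)

best_values = max(results, key=results.get)
best_params = dict(zip(param_grid.keys(), best_values))
print(results, best_params)
```

A randomized search follows the same loop but samples a fixed number of combinations instead of enumerating all of them.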



In [23]:
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_split=1e-07, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           n_estimators=500, n_jobs=1, oob_score=True, random_state=0,
           verbose=0, warm_start=False)

In [30]:
from sklearn.metrics import r2_score
from scipy.stats import spearmanr, pearsonr

predicted_train = rf.predict(X_train)
predicted_test = rf.predict(X_test)
test_score = r2_score(y_test, predicted_test)
spearman = spearmanr(y_test, predicted_test)
pearson = pearsonr(y_test, predicted_test)

print("Out-of-bag R-2 score estimate: {0}".format(rf.oob_score_))
print("Test data R-2 score: {0}".format(test_score))
print("Test data Spearman correlation: {0}".format(spearman[0]))
print("Test data Pearson correlation: {0}".format(pearson[0]))

Out-of-bag R-2 score estimate: 0.841012707889
Test data R-2 score: 0.885868268962
Test data Spearman correlation: 0.903501805135
Test data Pearson correlation: 0.941888050264
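
A hand-computed sketch of why it can be worth reporting both R² and a correlation. The numbers are made up: the predictions are always 0.5 too high, which the Pearson correlation does not see at all, while R² penalizes it.

```python
from math import sqrt

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def pearson(x, y):
    """Pearson correlation: covariance scaled by both spreads."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x)
    spread_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(spread_x * spread_y)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]  # hypothetical predictions, always 0.5 too high

print(r2(y_true, y_pred), pearson(y_true, y_pred))
```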


## Evaluating model performance

What to do if learning fails?

- Get a larger sample
- Change the hypothesis class by
    - Enlarging it;
    - Reducing it;
    - Completely changing it;
    - Changing the parameters you consider;
- Change the feature representation of the data;
- Change the optimization algorithm used to apply your learning rule;
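
One common way to choose among these options is to compare the error on the training set with the error on a validation set. The sketch below encodes that rule of thumb; the function name and both thresholds are assumptions for illustration, not fixed rules.

```python
def diagnose(train_error, validation_error, acceptable=0.10, gap=0.05):
    """Suggest a next step from two error rates (thresholds are assumptions)."""
    if train_error > acceptable:
        # Even the training data is not fit well: likely underfitting.
        return "enlarge or change the hypothesis class, or the features"
    if validation_error - train_error > gap:
        # Fits the training data but does not generalize: likely overfitting.
        return "get a larger sample or reduce the hypothesis class"
    return "no failure detected at these thresholds"

print(diagnose(0.30, 0.32))
print(diagnose(0.01, 0.25))
print(diagnose(0.05, 0.07))
```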

## Evaluating (offline) business gains 

## Evaluating model lifecycle