<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span><ul class="toc-item"><li><span><a href="#Scikit-Learn-Design-Principles" data-toc-modified-id="Scikit-Learn-Design-Principles-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Scikit-Learn Design Principles</a></span></li></ul></li><li><span><a href="#Custom-Transformers" data-toc-modified-id="Custom-Transformers-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Custom Transformers</a></span></li><li><span><a href="#Transformation-Pipelines-pipeline:Pipeline" data-toc-modified-id="Transformation-Pipelines-pipeline:Pipeline-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Transformation Pipelines <code>pipeline:Pipeline</code></a></span><ul class="toc-item"><li><span><a href="#Combine-the-pipeline-output-pipeline:FeatureUnion" data-toc-modified-id="Combine-the-pipeline-output-pipeline:FeatureUnion-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Combine the pipeline output <code>pipeline:FeatureUnion</code></a></span></li></ul></li><li><span><a href="#Store-Model-externals:joblib" data-toc-modified-id="Store-Model-externals:joblib-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Store Model <code>externals:joblib</code></a></span></li></ul></div>

# Introduction

## Scikit-Learn Design Principles

* Consistency. All objects share a consistent and simple interface.
    * Estimators. `.fit()`
    * Transformers. `.transform()`, `.fit_transform()`
    * Predictors. `.predict()`, `.score()`
* Inspection. All of an estimator's hyperparameters are accessible directly via public instance variables, like `imputer.strategy`. Its learned parameters are also accessible via public instance variables with an underscore suffix, like `imputer.statistics_`.

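* Nonproliferation of classes. Only `numpy` arrays, `scipy` sparse matrices, or built-in Python types are used, rather than homemade classes.
* Composition. Existing building blocks are reused as much as possible, e.g. a `Pipeline` estimator is built from a sequence of transformers followed by a final estimator.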
* Sensible defaults. Scikit-learn provides reasonable default values for most parameters.
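
The consistency and inspection principles above can be sketched with `SimpleImputer`. This is only an illustration: the toy array is made up, and `SimpleImputer` from `sklearn.impute` is assumed to be the imputer the notes refer to.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data with one missing value per column (made up for illustration).
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan],
              [5.0, 30.0]])

imputer = SimpleImputer(strategy="median")

# Hyperparameter: set in the constructor, readable as a public attribute.
print(imputer.strategy)

# Estimator interface: fit() learns the per-column medians.
imputer.fit(X)

# Learned parameter: available after fit(), with a trailing underscore.
print(imputer.statistics_)

# Transformer interface: transform() fills in the missing values.
X_filled = imputer.transform(X)
```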

# Custom Transformers
Scikit-Learn relies on duck typing (not inheritance).

All you need is to create a class and implement three methods:
1. `fit()`, which returns `self`
2. `transform()`
3. `fit_transform()`

You can get the last one for free by simply adding `TransformerMixin` as a base class.

If you add `BaseEstimator` as a base class, you also get two extra methods, `get_params()` and `set_params()`, which are useful for automatic hyperparameter tuning. For these to work, avoid `*args` and `**kwargs` in the `__init__()` signature.

In [None]:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

rooms_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True): # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room
    
    def fit(self, X, y=None):
        return self # nothing else to do
    def transform(self, X, y=None):
        rooms_per_household = X[:, rooms_ix] / X[:, household_ix] 
        population_per_household = X[:, population_ix] / X[:, household_ix] 
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household, bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

# `housing` is assumed to be a pandas DataFrame loaded earlier (not shown here)
attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attribs = attr_adder.transform(housing.values)
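
To see what the two base classes contribute, here is a minimal self-contained sketch. `RatioAdder` and its `num_ix` / `den_ix` parameters are hypothetical names invented for this example, and the input array is made up.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: appends column `num_ix` / column `den_ix`."""

    def __init__(self, num_ix=0, den_ix=1):  # explicit keyword arguments only
        self.num_ix = num_ix
        self.den_ix = den_ix

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        ratio = X[:, self.num_ix] / X[:, self.den_ix]
        return np.c_[X, ratio]

X = np.array([[2.0, 4.0], [9.0, 3.0]])
adder = RatioAdder()

# fit_transform() is inherited from TransformerMixin.
X_out = adder.fit_transform(X)

# get_params() and set_params() are inherited from BaseEstimator.
params = adder.get_params()
adder.set_params(num_ix=1, den_ix=0)
X_swapped = adder.transform(X)
```

Because the hyperparameters are plain constructor arguments, tools such as grid search can read and change them through `get_params()` and `set_params()` without knowing anything about the class.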


# Transformation Pipelines `pipeline:Pipeline`

In [None]:
from sklearn.pipeline import Pipeline

pipeline_obj = Pipeline([
    ('name1', <Transformer1>(hyper=...)),
    ('name2', <Transformer2>()),
    ...,
])

output_ = pipeline_obj.fit_transform(input_)
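
As a concrete instance of the template above, the following sketch chains an imputer and a scaler. The choice of `SimpleImputer` and `StandardScaler`, the step names, and the toy data are assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy numeric data with one missing value (made up for illustration).
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, 30.0]])

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # step 1: fill missing values
    ("scaler", StandardScaler()),                   # step 2: standardize columns
])

# fit_transform() runs fit_transform() on each step in order,
# feeding the output of one step into the next.
X_prepared = num_pipeline.fit_transform(X)

# Individual steps stay reachable by name, so learned parameters can be inspected.
learned_medians = num_pipeline.named_steps["imputer"].statistics_
```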

## Combine the pipeline output `pipeline:FeatureUnion` 

In [None]:
from sklearn.pipeline import FeatureUnion

featureUnion_obj = FeatureUnion([
    ('name1', pipeline_obj1),
    ('name2', pipeline_obj2),
])
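
A `FeatureUnion` applies each of its transformers to the same input and concatenates their outputs column-wise. The sketch below assumes two single-step pipelines chosen only for illustration (`StandardScaler` and `MinMaxScaler`) and made-up data.

```python
import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Two illustrative pipelines that each see the full input.
standard_pipeline = Pipeline([("scaler", StandardScaler())])
minmax_pipeline = Pipeline([("scaler", MinMaxScaler())])

union = FeatureUnion([
    ("standard", standard_pipeline),
    ("minmax", minmax_pipeline),
])

# Output columns: 2 standardized columns followed by 2 min-max scaled columns.
X_combined = union.fit_transform(X)
```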

# Store Model `externals:joblib`

In [None]:
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
joblib.dump(my_model, "my_model.pkl")
# and later...
my_model_loaded = joblib.load("my_model.pkl")
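
A minimal round-trip check, assuming the standalone `joblib` package is installed (recent scikit-learn releases no longer ship `sklearn.externals.joblib`). The `LinearRegression` model, the toy data, and the temporary directory are stand-ins chosen for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1 (made up for illustration).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
my_model = LinearRegression().fit(X, y)

# Save to a temporary directory and load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_model.pkl")
    joblib.dump(my_model, path)
    my_model_loaded = joblib.load(path)

# The reloaded model should reproduce the original predictions.
same_predictions = np.allclose(my_model.predict(X), my_model_loaded.predict(X))
```

Files written this way are pickles, so they should only be loaded from trusted sources and ideally with the same library versions used to save them.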