<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Custom-classes-in-scikit-learn" data-toc-modified-id="Custom-classes-in-scikit-learn-1">Custom classes in scikit-learn</a></span></li><li><span><a href="#What-is-the-best-part-of-ice-cream?" data-toc-modified-id="What-is-the-best-part-of-ice-cream?-2">What is the best part of ice cream?</a></span></li><li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes-3">Learning Outcomes</a></span></li><li><span><a href="#Object-oriented-programming-(OOP)-Review" data-toc-modified-id="Object-oriented-programming-(OOP)-Review-4">Object-oriented programming (OOP) Review</a></span></li><li><span><a href="#Custom-Column-Selector" data-toc-modified-id="Custom-Column-Selector-5">Custom Column Selector</a></span></li><li><span><a href="#Debugging-Pipelines-with-Mixins" data-toc-modified-id="Debugging-Pipelines-with-Mixins-6">Debugging Pipelines with Mixins</a></span></li><li><span><a href="#Takeaways" data-toc-modified-id="Takeaways-7">Takeaways</a></span></li><li><span><a href="#Sources-of-Inspiration" data-toc-modified-id="Sources-of-Inspiration-8">Sources of Inspiration</a></span></li></ul></div>

<center><h2>Custom classes in scikit-learn</h2></center>

<center><h2>What is the best part of ice cream?</h2></center>

<center><img src="../images/cold_stone_mixin.jpg" width="75%"/></center>

The mixins!

The regular flavors are great. But being able to extend them with other options is special.

Classes in scikit-learn are just like ice cream: scikit-learn provides the regular classes, and those can be extended with mixins.

<center><h2>Learning Outcomes</h2></center>

__By the end of this session, you should be able to__:

- Explain inheritance in your own words.
- Write new classes that extend the functionality of scikit-learn.

<center><h2>Object-oriented programming (OOP) Review</h2></center>

- Classes are a way to combine nouns (attributes) and verbs (methods).
- Classes create specific examples (instances) from a template (class).
- Each instance has its own state that can be updated.
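Here is a small sketch of those three ideas, using a hypothetical `IceCream` class (not part of any library): one template, two instances, and state that changes independently in each.

```python
class IceCream:
    """Hypothetical example class: a cup of ice cream."""

    def __init__(self, flavor):
        self.flavor = flavor   # attribute (noun)
        self.toppings = []     # state that can be updated later

    def add_mixin(self, topping):  # method (verb)
        self.toppings.append(topping)
        return self


# Two instances created from the same template (class)
vanilla = IceCream("vanilla")
chocolate = IceCream("chocolate")

vanilla.add_mixin("sprinkles")
print(vanilla.toppings)    # ['sprinkles']
print(chocolate.toppings)  # [] (each instance keeps its own state)
```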

In [99]:
reset -fs

In [100]:
from warnings import filterwarnings
filterwarnings('ignore')

palette = "Dark2"
%matplotlib inline

In [101]:
# Example dict

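d = dict(brian='blue', lambda_dog='brown')
# d.<tab>  (in Jupyter, type `d.` and press Tab to list the methods of a dict)
d['alex'] = 'red'  # Update the state of this instance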

One advantage of classes is that they let you define hierarchical relationships: a child class inherits the attributes and methods of its parent class and can add to or change them.

<center><img src="../images/class_inheritance.png" width="75%"/></center>

In [102]:
# Example of class inheritance
from collections import Counter

issubclass(Counter, dict)

# A Counter is a special type of dictionary. It does all the things a dictionary does and more.

True
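A quick way to see inheritance in action: a `Counter` supports the usual dict operations and also adds behavior of its own. The `DefaultZeroDict` class below is a hypothetical example of writing a subclass yourself, overriding a single hook that `dict` calls on subclasses (`__missing__`) while inheriting everything else.

```python
from collections import Counter

counts = Counter("banana")

# Inherited from dict: isinstance checks, key lookup, .keys()
print(isinstance(counts, dict))  # True
print(sorted(counts.keys()))     # ['a', 'b', 'n']

# Added or changed by Counter
print(counts.most_common(1))     # [('a', 3)]
print(counts["z"])               # 0 (a plain dict would raise KeyError)


class DefaultZeroDict(dict):
    """Hypothetical subclass: behaves like dict, but missing keys read as 0."""

    def __missing__(self, key):
        # dict.__getitem__ calls this hook on subclasses when a key is absent
        return 0


zeros = DefaultZeroDict(brian=1)
print(zeros["brian"], zeros["alex"])      # 1 0
print(issubclass(DefaultZeroDict, dict))  # True
```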

<center><h2>Custom Column Selector</h2></center>

In [113]:
from sklearn.base import BaseEstimator

# help(BaseEstimator)

In [114]:
from sklearn.base import TransformerMixin

# help(TransformerMixin)
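To see what each parent contributes, here is a sketch with a hypothetical `Scale` transformer. It only defines `__init__`, `fit`, and `transform`; `fit_transform` comes from `TransformerMixin`, while `get_params`, `set_params`, and compatibility with `clone` come from `BaseEstimator`. This relies on the scikit-learn convention that `__init__` stores each argument under the same attribute name.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone


class Scale(BaseEstimator, TransformerMixin):
    """Hypothetical example transformer: multiply every value by `factor`."""

    def __init__(self, factor=2):
        self.factor = factor  # same name as the argument, so get_params finds it

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return np.asarray(X) * self.factor


scaler = Scale(factor=3)

# fit_transform is not defined in Scale; it is inherited from TransformerMixin
print("fit_transform" in Scale.__dict__)  # False
print(scaler.fit_transform([[1, 2]]))     # [[3 6]]

# get_params / set_params / clone come from BaseEstimator
print(scaler.get_params())                          # {'factor': 3}
print(clone(scaler).set_params(factor=10).factor)   # 10

# The method resolution order shows where Python looks up each method
print([cls.__name__ for cls in Scale.__mro__])
```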

In [105]:
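class ColumnSelector(BaseEstimator, TransformerMixin):
    """Selects columns in the pipeline based on column names,
       so later steps only see those columns."""

    def __init__(self, cols):
        # Store the parameter under the same name as the argument so that
        # BaseEstimator's get_params(), set_params(), and clone() can find it.
        self.cols = cols

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X, y=None):
        # A single column name returns a pandas Series;
        # a list of names returns a (2D) DataFrame.
        return X[self.cols]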

In [106]:
# Toy data example
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 
                   'b': [True, True, False], 
                   'c': [7, 8, 9]})

# Find the boolean column. This is a simple example of selecting columns;
# you can imagine more complex functions that find specific column names,
# e.g., all columns whose values look uniformly distributed.
boolean_col_name = df.select_dtypes(include='bool').columns.values[0]

In [108]:
from sklearn.pipeline      import Pipeline

pipe = Pipeline([('select_boolean', ColumnSelector(cols=boolean_col_name))])
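The pipeline above is only constructed. A sketch of actually running it follows; it passes a list of column names (an assumption on our part) so the result stays a 2D DataFrame, which is what most downstream estimators expect. Because `ColumnSelector` inherits from `BaseEstimator`, its `cols` argument also becomes a pipeline parameter addressed as `<step name>__<parameter name>`.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Select DataFrame columns by name."""

    def __init__(self, cols):
        self.cols = cols  # same name as the __init__ argument

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.cols]


df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [True, True, False],
                   "c": [7, 8, 9]})
boolean_cols = df.select_dtypes(include="bool").columns.tolist()  # ['b']

pipe = Pipeline([("select_boolean", ColumnSelector(cols=boolean_cols))])
selected = pipe.fit_transform(df)  # fit_transform comes from TransformerMixin
print(selected)

# Tunable like any built-in step (useful for grid search)
print(pipe.get_params()["select_boolean__cols"])  # ['b']
pipe.set_params(select_boolean__cols=["a", "c"])
print(pipe.fit_transform(df).columns.tolist())    # ['a', 'c']
```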

<center><h2>Debugging Pipelines with Mixins</h2></center>


Every abstraction has trade-offs. Each one takes a while to learn. When it works, it is awesome. When it breaks, fixing it can be even more work than not using the abstraction at all.

Pipelines are awesome, but they can be opaque and hard to debug.

Below is a helper class that prints the data as it passes through a pipeline.

In [109]:
class Debug(BaseEstimator, TransformerMixin):  # Note this class inherits from two different parents
    """Allow introspection of the data in the middle of a pipeline."""

    def fit(self, X, y=None, **fit_params):
        # Nothing to learn; pass through.
        return self

    def transform(self, X):
        # To keep state instead, store the attribute you care about, e.g.:
        # self.shape = X.shape
        print(X)  # Display the intermediate data on screen
        return X

    

In [110]:
import numpy as np

# Make toy data 
X = np.array([[0, 0], 
              [1, 1]])
y = np.array([[0], 
              [1]])

In [111]:
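# Define a function to change the data.
# Return a new array: an in-place update (X += 10) would silently
# modify the caller's X every time the pipeline runs.
def add_10(X):
    return X + 10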

# Convert to transformer so it can be used in a pipeline
from sklearn.preprocessing import FunctionTransformer

add_10_transformer = FunctionTransformer(add_10)
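Why return `X + 10` instead of writing `X += 10`? With its default settings, `FunctionTransformer` hands your function the input array as-is, so an in-place update changes the caller's data. A minimal sketch comparing the two versions (`add_10_in_place` is a hypothetical name used only for this comparison):

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def add_10_in_place(X):
    X += 10  # mutates the array that was passed in
    return X


def add_10(X):
    return X + 10  # builds and returns a new array


X_a = np.array([[0, 0], [1, 1]])
FunctionTransformer(add_10_in_place).fit_transform(X_a)
print(X_a.tolist())  # [[10, 10], [11, 11]]: the original data changed

X_b = np.array([[0, 0], [1, 1]])
FunctionTransformer(add_10).fit_transform(X_b)
print(X_b.tolist())  # [[0, 0], [1, 1]]: the original data is untouched
```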

In [112]:
from sklearn.pipeline      import Pipeline
from sklearn.tree          import DecisionTreeClassifier

pipe = Pipeline([('col_trans',  add_10_transformer),
                 ('debug',      Debug()), # Put debugger into pipeline
                 ('clf',        DecisionTreeClassifier()),])
pipe.fit(X, y)

[[10 10]
 [11 11]]


Pipeline(steps=[('col_trans',
                 FunctionTransformer(func=<function add_10 at 0x7ff1bb8fbca0>)),
                ('debug', Debug()), ('clf', DecisionTreeClassifier())])
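Printing works, but the commented-out `self.shape` line hints at a second style: store state on the debug step and read it back after fitting. A sketch, assuming a hypothetical variant of `Debug` that records `shape_` (the trailing underscore follows scikit-learn's naming convention for attributes set during fitting). It also uses pipeline slicing (`pipe[:-1]`), available in recent scikit-learn versions, to run every step except the classifier. Here `y` is 1D, the shape scikit-learn classifiers usually expect.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeClassifier


class Debug(BaseEstimator, TransformerMixin):
    """Pass data through unchanged, recording its shape for later inspection."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        self.shape_ = X.shape  # remember the most recent shape seen
        return X


def add_10(X):
    return X + 10


X = np.array([[0, 0], [1, 1]])
y = np.array([0, 1])

pipe = Pipeline([("col_trans", FunctionTransformer(add_10)),
                 ("debug", Debug()),
                 ("clf", DecisionTreeClassifier(random_state=0))])
pipe.fit(X, y)

# Read the state stored by the debug step
print(pipe.named_steps["debug"].shape_)  # (2, 2)

# A sliced Pipeline runs only the selected steps
print(pipe[:-1].transform(X).tolist())   # [[10, 10], [11, 11]]
```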

<center><h2>Takeaways</h2></center>


- Python and scikit-learn both support object-oriented programming (OOP).
- You can extend existing classes by using inheritance.
- Inheritance allows customization and creativity in modeling while minimizing simple errors, because most of the behavior is reused from the parent classes.

<center><h2>Sources of Inspiration</h2></center>

- https://realpython.com/inheritance-composition-python/

<br>
<br> 
<br>

----