# Testing
### *and more on modular code*

- In our coverage of modular code, we talked about abstracting reusable code chunks into their own **functions**
    - And, in turn, grouping those functions together into separate **modules**
    - We created a function that splits a dataset into its features (a DataFrame) and target (a Series)
    
- In our discussion of feature engineering, we showed how one might make a "preprocessor": a column transformer that one-hot encodes categorical features and applies standard scaling to numeric columns
    - We then chained this preprocessor together with a logistic regression model in order to form a scikit-learn **pipeline**

- We might use the same approach in preprocessing other datasets, so **let's move that logic to its own function and add it to our personal module**
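
As a reminder of what that logic looks like end to end, here is a minimal sketch of a preprocessor chained with a logistic regression model. The tiny DataFrame and its column names (`carrier`, `distance`, `delayed`) are invented for illustration and are not part of the course data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one categorical feature, one numeric feature, one target
toy = pd.DataFrame({
    "carrier": ["AA", "UA", "AA", "DL", "UA", "DL"],
    "distance": [200.0, 1500.0, 350.0, 900.0, 1200.0, 400.0],
    "delayed": [0, 1, 0, 1, 1, 0],
})
toy_features = toy.drop(columns="delayed")
toy_target = toy["delayed"]

# One-hot encode the categorical column and scale the numeric one
toy_preprocessor = ColumnTransformer([
    ("one-hot-encoder", OneHotEncoder(handle_unknown="ignore"), ["carrier"]),
    ("standard_scaler", StandardScaler(), ["distance"]),
])

# Chain preprocessing and the model so they are fit together
toy_pipeline = Pipeline([
    ("preprocessor", toy_preprocessor),
    ("classifier", LogisticRegression()),
])
toy_pipeline.fit(toy_features, toy_target)
toy_predictions = toy_pipeline.predict(toy_features)
```

Here the column lists are typed out by hand, which is exactly the part we would like a reusable function to work out for us.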

## Writing a Preprocessor Function

Sometimes it's easiest to write a function's *signature* (its name and parameters) before actually writing the code in its body.

Our function is going to give us a column transformer that we can use in pipelines.
The only parameter will be the features DataFrame (at least, for right now).

One possible function signature looks like this:

```python
def make_preprocessor(features):
    ...
```

Now that we have our signature, we can add code to it.
In this case, we can reuse the code we wrote in the feature engineering section.

```python
from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer([
    ('one-hot-encoder', categorical_preprocessor, categorical_columns),
    ('standard_scaler', numeric_preprocessor, numeric_columns)
])
```

Can we just put all of that code into our function without any changes?

In [1]:
def make_preprocessor(features):
    from sklearn.compose import ColumnTransformer

    preprocessor = ColumnTransformer([
        ('one-hot-encoder', categorical_preprocessor, categorical_columns),
        ('standard_scaler', numeric_preprocessor, numeric_columns)
    ])

<div class="admonition note alert alert-info">
    <b><p class="first admonition-title" style="font-weight: bold;">Discussion</p></b>
    Does anyone see any issues with this?
</div>

In [2]:
import pandas as pd
fake_features = pd.read_csv('../data/planes.csv')

In [3]:
preprocessor = make_preprocessor(fake_features)

NameError: name 'categorical_preprocessor' is not defined

Our code is missing some context.
`categorical_preprocessor`, `categorical_columns`, `numeric_preprocessor`, and `numeric_columns` aren't defined yet.
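
Notice that defining the function raised no error. The `NameError` only appeared when we called it, because Python looks up the names used in a function body when that body runs. A small sketch with made-up names (`build_greeting`, `greeting_template`) shows the same behavior in isolation:

```python
def build_greeting(name):
    # greeting_template is not defined anywhere, yet this `def` still succeeds
    return greeting_template.format(name=name)

try:
    build_greeting("Ada")
    error_message = None
except NameError as err:
    # The missing name is only discovered when the body actually runs
    error_message = str(err)

print(error_message)
```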

Here's an updated version in which we assign to those variables before using them.

In [14]:
def make_preprocessor(features):
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    
    categorical_preprocessor = OneHotEncoder(handle_unknown="ignore")
    numeric_preprocessor = StandardScaler()
    
    numeric_columns = features.select_dtypes(exclude=object)
    categorical_columns = features.select_dtypes(include=object)

    preprocessor = ColumnTransformer([
        ('one-hot-encoder', categorical_preprocessor, categorical_columns),
        ('standard_scaler', numeric_preprocessor, numeric_columns)
    ])

Things run without error now!

In [15]:
preprocessor = make_preprocessor(fake_features)

But there are a couple of other issues.

What does our resulting preprocessor object look like?

In [16]:
preprocessor

In [17]:
type(preprocessor)

NoneType

- We need to remember to *return a value* -- otherwise we can't get anything useful out of the function.

- Generally, Python best practice is to import libraries *outside* functions.
All imports, even if they're to be used in different functions, are usually placed at the top of the Python module.
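
The first point is easy to reproduce in isolation: a function whose body never reaches a `return` statement hands back `None`. The function names below are made up for illustration.

```python
def add_without_return(a, b):
    total = a + b  # computed, then discarded when the function ends

def add_with_return(a, b):
    total = a + b
    return total  # hands the result back to the caller

missing = add_without_return(2, 3)
present = add_with_return(2, 3)

print(type(missing).__name__)  # NoneType
print(present)  # 5
```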

Let's make those changes...

In [18]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_preprocessor(features):
    categorical_preprocessor = OneHotEncoder(handle_unknown="ignore")
    numeric_preprocessor = StandardScaler()
    
    numeric_columns = features.select_dtypes(exclude=object)
    categorical_columns = features.select_dtypes(include=object)

    preprocessor = ColumnTransformer([
        ('one-hot-encoder', categorical_preprocessor, categorical_columns),
        ('standard_scaler', numeric_preprocessor, numeric_columns)
    ])
    
    return preprocessor

And then make sure it works...

In [19]:
preprocessor = make_preprocessor(fake_features)
preprocessor

ColumnTransformer(transformers=[('one-hot-encoder',
                                 OneHotEncoder(handle_unknown='ignore'),
                                      tailnum                     type                   manufacturer  \
0     N10156  Fixed wing multi engine                        EMBRAER   
1     N102UW  Fixed wing multi engine               AIRBUS INDUSTRIE   
2     N103US  Fixed wing multi engine               AIRBUS INDUSTRIE   
3     N104UW  Fixed wing multi engine               AIRBUS INDUSTRIE   
4     N10575  Fixed wing multi engine                        EMBRAER   
...      ...                      ...                            ...   
3317  N997AT  Fixed wing multi engine...
3317    717-200  Turbo-fan  
3318      MD-88  Turbo-fan  
3319    717-200  Turbo-fan  
3320      MD-88  Turbo-jet  
3321      MD-88  Turbo-jet  

[3322 rows x 5 columns]),
                                ('standard_scaler', StandardScaler(),
                                         year  engines 

In [20]:
type(preprocessor)

sklearn.compose._column_transformer.ColumnTransformer
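
One thing worth noticing in the repr above is that whole DataFrames appear where the column selectors should be: `select_dtypes` returns a DataFrame, while `ColumnTransformer` expects column names (or positions, a mask, or a callable) in the third slot of each tuple. A possible adjustment, sketched here on an invented toy DataFrame and under the hypothetical name `make_preprocessor_from_names`, is to pass `.columns` instead. This is one option under those assumptions, not necessarily the fix the course settles on.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_preprocessor_from_names(features):
    # Hypothetical variant: select column *names*, not DataFrames
    numeric_columns = features.select_dtypes(exclude=object).columns
    categorical_columns = features.select_dtypes(include=object).columns

    return ColumnTransformer([
        ("one-hot-encoder", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
        ("standard_scaler", StandardScaler(), numeric_columns),
    ])

# Invented toy data standing in for the planes dataset.
# dtype=object is set explicitly so the dtype-based selection behaves
# the same regardless of how a given pandas version stores strings.
toy_features = pd.DataFrame({
    "engine": pd.Series(["Turbo-fan", "Turbo-jet", "Turbo-fan"], dtype=object),
    "year": [1998.0, 2004.0, 2011.0],
    "seats": [55.0, 182.0, 142.0],
})

toy_preprocessor = make_preprocessor_from_names(toy_features)

# Fitting is the real check: two one-hot columns plus two scaled numeric columns
transformed = toy_preprocessor.fit_transform(toy_features)
print(transformed.shape)
```

Looking at the repr tells us the object was created; fitting it on data is what tells us whether it can actually be used.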