# Pipelines

In this notebook, we will learn how to chain preprocessing steps and a model into a single estimator using pipelines in scikit-learn.

<a href="https://colab.research.google.com/github/thomasjpfan/ml-workshop-intro/blob/master/notebooks/04-pipelines.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

In [None]:
# Install dependencies for Google Colab
import sys
if 'google.colab' in sys.modules:
    %pip install -r https://raw.githubusercontent.com/thomasjpfan/ml-workshop-intro/master/requirements.txt

In [None]:
import sklearn
sklearn.set_config(display='diagram')

Load the data from the previous notebook.

In [None]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

boston = fetch_openml(data_id=531, as_frame=True)
X, y = boston.data, boston.target
X = X.select_dtypes(include='number')

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

## Make pipeline!

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor

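# Chain the scaler and the regressor into a single estimator
knr = make_pipeline(
    StandardScaler(), KNeighborsRegressor()
)
# fit() fits the scaler on X_train, then fits the regressor on the scaled data
knr.fit(X_train, y_train)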

In [None]:
knr.score(X_test, y_test)
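To see what the pipeline is doing for us, here is a minimal sketch that compares it with the manual version: fit the scaler on the training data only, transform both splits, then fit and score the regressor. It uses a small synthetic dataset from `make_regression` so that it runs on its own; the `*_demo` names and the dataset are assumptions for illustration, not part of the workshop data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, used only to keep the sketch self-contained)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=42
)

# Manual version: the scaler only ever sees the training data during fit
scaler = StandardScaler().fit(Xd_train)
manual_model = KNeighborsRegressor().fit(scaler.transform(Xd_train), yd_train)
manual_score = manual_model.score(scaler.transform(Xd_test), yd_test)

# Pipeline version: the same steps wrapped in one estimator
pipe_demo = make_pipeline(StandardScaler(), KNeighborsRegressor())
pipe_demo.fit(Xd_train, yd_train)
pipe_score = pipe_demo.score(Xd_test, yd_test)

# Both approaches should give the same R^2 score
print(np.isclose(manual_score, pipe_score))
```

The pipeline saves us from having to remember to apply the fitted scaler to the test data, which is an easy step to forget in the manual version.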

In [None]:
from sklearn.preprocessing import PolynomialFeatures

# Add degree-2 polynomial features (the default) between scaling and the regressor
knr_select = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(),
    KNeighborsRegressor()
)
knr_select.fit(X_train, y_train)

In [None]:
knr_select.score(X_test, y_test)
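The individual steps of a fitted pipeline can still be inspected. The sketch below, again on synthetic stand-in data (an assumption, as are the `*_demo` names), looks up the steps through `named_steps`, where `make_pipeline` names each step after its lowercased class name, and checks how many features `PolynomialFeatures` produces.

```python
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data with 5 features (an assumption for illustration)
X_demo, y_demo = make_regression(n_samples=100, n_features=5, random_state=0)

poly_demo = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(),
    KNeighborsRegressor()
)
poly_demo.fit(X_demo, y_demo)

# make_pipeline names each step after its lowercased class name
step_names = list(poly_demo.named_steps)
print(step_names)

# Degree-2 expansion of 5 features: 1 bias + 5 linear + 15 quadratic terms
n_out = poly_demo.named_steps["polynomialfeatures"].n_output_features_
print(n_out)
```

The number of features grows quickly with the polynomial degree, which is worth keeping in mind for a distance-based model such as `KNeighborsRegressor`.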

## Exercise 1

1. Load the breast cancer dataset from `sklearn.datasets` using the `load_breast_cancer` function.
2. Is this a classification or regression problem?
3. Split the data into training and test datasets.
4. Create a pipeline with a `StandardScaler` and a `sklearn.linear_model.LogisticRegression`.
5. Evaluate the performance of this pipeline on the test dataset.
6. **Extra**: Add a `sklearn.preprocessing.PolynomialFeatures` to the pipeline and see if the performance improves.

In [None]:
# %load solutions/04-ex01-solution.py