# How to create a pipeline in sklearn?
1. A `Pipeline` chains a sequence of transforms with a final estimator.
2. Its purpose is to assemble several steps that can be cross-validated together while setting different parameters.
3. The pipeline applies a list of transforms followed by a final estimator. Intermediate steps must be transforms, i.e. objects that implement the `fit` and `transform` methods.
4. The final estimator only needs to implement `fit`. The fitted transformers in the pipeline can be cached using the `memory` argument.
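The points above (cross-validating the steps together, setting step parameters, and caching with `memory`) can be sketched in one short script. This is a minimal illustration, assuming a throwaway temporary directory as the cache location; `cachedir` is a hypothetical name, and `C=10` is just an example value.

```python
import shutil
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)

# Cache fitted transformers in a temporary directory (the "memory" argument)
cachedir = mkdtemp()
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())], memory=cachedir)

# Parameters of any step are set as <step name>__<parameter>
pipe.set_params(svc__C=10)

# The whole pipeline (scaler + SVC) is refit inside every CV fold
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())

# Remove the cache directory once it is no longer needed
shutil.rmtree(cachedir)
```

Because the scaler is part of the estimator passed to `cross_val_score`, each fold's validation data never influences the scaling statistics used to train that fold's SVC.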

## Step:1 Import libraries

In [3]:
# !pip install -U scikit-learn

In [4]:
from sklearn.svm import SVC
# StandardScaler subtracts the mean from each feature and then scales it to unit variance.
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
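The comment on `StandardScaler` can be checked numerically. This small sketch, on a made-up 3x2 array, compares the scaler's output with the hand-computed z-score `(X - mean) / std`, where `std` is the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: 3 samples, 2 features on very different scales
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

scaled = StandardScaler().fit_transform(X)

# Same computation by hand: subtract the column mean and divide by the
# population standard deviation (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))  # True
print(scaled.mean(axis=0))          # each column mean is (numerically) 0
print(scaled.std(axis=0))           # each column std is (numerically) 1
```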

## Step:2 Data Preparation

In [5]:
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
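With their default arguments, `make_classification` generates 100 samples with 20 features, and `train_test_split` holds out 25% of them for testing. A quick shape check makes the split sizes explicit:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X.shape)        # (100, 20)
print(X_train.shape)  # (75, 20)
print(X_test.shape)   # (25, 20)
```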

## Step:3 Create Pipeline

In [6]:
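# Each step is a (name, estimator) pair; the names are used to address step parameters
steps = [('scaler', StandardScaler()), ('svc', SVC())]

# Build the pipeline object from the list of steps
pipe = Pipeline(steps)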
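As an alternative to writing the `(name, estimator)` pairs by hand, `sklearn.pipeline.make_pipeline` builds the same kind of pipeline and derives each step name from the lowercased class name. The sketch below also shows two ways of looking up a step afterwards.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# make_pipeline names the steps automatically
pipe = make_pipeline(StandardScaler(), SVC())

print([name for name, _ in pipe.steps])  # ['standardscaler', 'svc']

# Individual steps can be retrieved by name or by position
svc_step = pipe.named_steps["svc"]
scaler_step = pipe[0]
print(svc_step, scaler_step)
```

With auto-generated names, grid-search parameters become `svc__C`, `svc__kernel`, and so on, matching the explicit `'svc'` name used above.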

In [7]:
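# The pipeline can be used like any other estimator: fit() fits the scaler
# on X_train only, and score() reuses those training statistics on X_test,
# so no information from the test set leaks into training
print(pipe.fit(X_train, y_train))
# Mean accuracy on the held-out test set
print(pipe.score(X_test, y_test))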

Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])
0.88
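To see what the pipeline does under the hood, the sketch below reproduces `pipe.fit(...).score(...)` by hand: fit the scaler on the training data only, transform both splits with it, then fit and score the SVC. Since `SVC` training is deterministic here, both routes should give the same accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
pipe_score = pipe.fit(X_train, y_train).score(X_test, y_test)

# Manual equivalent: the scaler only ever sees the training data,
# and the test data is transformed with the training mean/std
scaler = StandardScaler().fit(X_train)
svc = SVC().fit(scaler.transform(X_train), y_train)
manual_score = svc.score(scaler.transform(X_test), y_test)

print(pipe_score, manual_score)
```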


In [8]:
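# This cell scales the data outside the pipeline and grid-searches the SVC alone.
# The scaler is fit on all of X_train, so within each CV split the validation
# fold has already influenced the scaling statistics (a mild form of leakage).
scale = StandardScaler().fit(X_train)
X_train_scaled = scale.transform(X_train)
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
grid = GridSearchCV(SVC(), param_grid=parameters, cv=5)
grid.fit(X_train_scaled, y_train)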
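The same grid search can instead be run over the whole pipeline, so that the scaler is refit on the training part of every CV split. Parameters of a step are addressed as `<step name>__<parameter>`; the grid values below are simply those of the previous cell with the `svc__` prefix added.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Step parameters are prefixed with the step name and a double underscore
parameters = {"svc__kernel": ("linear", "rbf"), "svc__C": [1, 10]}

# Each CV split refits the scaler on its own training folds only
grid = GridSearchCV(pipe, param_grid=parameters, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)

# The refit best pipeline scales X_test with the full-training-set statistics
test_score = grid.score(X_test, y_test)
print(test_score)
```

This keeps preprocessing and model selection in one estimator, which is the main reason to reach for `Pipeline` in the first place.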