[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/justmarkham/scikit-learn-tips/master?filepath=notebooks%2F48_pipeline_slicing.ipynb)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/justmarkham/scikit-learn-tips/blob/master/notebooks/48_pipeline_slicing.ipynb)

# 🤖⚡ scikit-learn tip #48 ([video](https://www.youtube.com/watch?v=sMlsd2CnIf4&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=48))

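Want to operate on part of a Pipeline (instead of the whole thing)? Slice it using Python's slicing notation! An integer index returns a single step, and a slice returns a new Pipeline containing just those steps.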

See example 👇

In [1]:
import pandas as pd
df = pd.read_csv('http://bit.ly/kaggletrain')

In [2]:
cols = ['Sex', 'Name', 'Age']
X = df[cols]
y = df['Survived']

In [3]:
from sklearn import set_config
set_config(display='diagram')

In [4]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

In [5]:
ct = ColumnTransformer(
    [('ohe', OneHotEncoder(), ['Sex']),
     ('vectorizer', CountVectorizer(), 'Name'),
     ('imputer', SimpleImputer(), ['Age'])])

In [6]:
fs = SelectPercentile(chi2, percentile=50)

In [7]:
clf = LogisticRegression(solver='liblinear', random_state=1)

In [8]:
# create Pipeline
pipe = Pipeline([('preprocessor', ct), ('feature selector', fs), ('classifier', clf)])
pipe

In [9]:
# access step 0 (preprocessor)
pipe[0].fit_transform(X)

<891x1512 sparse matrix of type '<class 'numpy.float64'>'
	with 5348 stored elements in Compressed Sparse Row format>
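A minimal sketch of what integer indexing returns, using a small made-up DataFrame instead of the Titanic data so it runs without a download. The names `toy_X`, `toy_ct`, and `toy_pipe` are hypothetical and not part of the original notebook. It suggests that indexing by position or by step name hands back the step itself rather than a copy:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# hypothetical toy data standing in for two of the Titanic columns
toy_X = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male'],
                      'Age': [22.0, None, 26.0, 35.0]})

toy_ct = ColumnTransformer(
    [('ohe', OneHotEncoder(), ['Sex']),
     ('imputer', SimpleImputer(), ['Age'])])
toy_pipe = Pipeline([('preprocessor', toy_ct),
                     ('classifier', LogisticRegression(solver='liblinear', random_state=1))])

# an integer index and a step name both return the step itself, not a copy
by_position = toy_pipe[0]
by_name = toy_pipe['preprocessor']
same_object = by_position is by_name and by_position is toy_ct

# the step can be used on its own, just like the original estimator
toy_features = by_position.fit_transform(toy_X)
```

Because the step is the same object, fitting `toy_pipe[0]` also fits `toy_ct`.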

In [10]:
# slice to get a Pipeline of steps 0 and 1 (preprocessor and feature selector)
# y is passed because chi2 feature selection is supervised
pipe[0:2].fit_transform(X, y)

<891x756 sparse matrix of type '<class 'numpy.float64'>'
	with 4128 stored elements in Compressed Sparse Row format>
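A short sketch of how a slice behaves, on random toy data (the names `toy_X`, `toy_y`, and `toy_pipe` are assumptions made for illustration, and `MinMaxScaler` is swapped in only to keep the example small). It shows that a slice such as `[:-1]` is itself a Pipeline and that it is a shallow copy, so fitting the slice also fits the steps of the full Pipeline:

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# hypothetical toy data: 6 samples, 4 features
rng = np.random.RandomState(1)
toy_X = rng.rand(6, 4)
toy_y = np.array([0, 1, 0, 1, 0, 1])

toy_pipe = Pipeline([('scaler', MinMaxScaler()),
                     ('selector', SelectPercentile(chi2, percentile=50)),
                     ('classifier', LogisticRegression(solver='liblinear', random_state=1))])

# a slice returns a new Pipeline holding the selected steps
transformers_only = toy_pipe[:-1]
is_pipeline = isinstance(transformers_only, Pipeline)
step_names = [name for name, _ in transformers_only.steps]

# the slice is a shallow copy: it holds the same estimator objects
shares_steps = transformers_only[1] is toy_pipe[1]

# fitting the slice therefore fits the steps of the full Pipeline too
toy_selected = transformers_only.fit_transform(toy_X, toy_y)
n_kept = toy_pipe[1].get_support().sum()
```

The `[:-1]` form is handy for checking what the final estimator actually receives as input.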

In [11]:
# access step 1 (feature selector), which was fitted by the slice in the previous cell
pipe[1].get_support()

array([ True,  True,  True, ...,  True, False,  True])
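As a rough illustration of how the mask from `get_support()` can be used, the sketch below fits a two-step Pipeline on random toy data (`toy_X`, `toy_y`, and `toy_pipe` are hypothetical names, not from the original notebook). It turns the mask into column positions and checks it against the classifier reached with a negative index:

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# hypothetical toy data: 8 samples, 6 non-negative features (chi2 needs non-negative values)
rng = np.random.RandomState(1)
toy_X = rng.rand(8, 6)
toy_y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

toy_pipe = Pipeline([('selector', SelectPercentile(chi2, percentile=50)),
                     ('classifier', LogisticRegression(solver='liblinear', random_state=1))])
toy_pipe.fit(toy_X, toy_y)

# boolean mask over the selector's input columns: True means the column was kept
mask = toy_pipe[0].get_support()
kept_columns = np.flatnonzero(mask)

# negative indices work too: the last step is the fitted classifier
n_coefficients = toy_pipe[-1].coef_.shape[1]
```

The classifier ends up with one coefficient per kept column, so `n_coefficients` matches `mask.sum()`.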

### Want more tips? [View all tips on GitHub](https://github.com/justmarkham/scikit-learn-tips) or [Sign up to receive 2 tips by email every week](https://scikit-learn.tips) 💌

© 2020 [Data School](https://www.dataschool.io). All rights reserved.