[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/justmarkham/scikit-learn-tips/master?filepath=notebooks%2F44_parallel_processing.ipynb)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/justmarkham/scikit-learn-tips/blob/master/notebooks/44_parallel_processing.ipynb)

# 🤖⚡ scikit-learn tip #44 ([video](https://www.youtube.com/watch?v=QqFGKVieywY&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=44))

Want your grid search to run faster? Set `n_jobs=-1` in `GridSearchCV` to run the fits in parallel on all available CPU cores!

See example 👇

In [1]:
import pandas as pd
df = pd.read_csv('http://bit.ly/kaggletrain')

In [2]:
cols = ['Sex', 'Name', 'Age']
X = df[cols]
y = df['Survived']

In [3]:
from sklearn import set_config
set_config(display='diagram')

In [4]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

In [5]:
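# Apply a different preprocessing step to each column
ct = ColumnTransformer(
    [('ohe', OneHotEncoder(), ['Sex']),          # one-hot encode the categorical column
     ('vectorizer', CountVectorizer(), 'Name'),  # bag-of-words on the text column (passed as a string, not a list)
     ('imputer', SimpleImputer(), ['Age'])])     # fill missing ages with the column mean (the default strategy)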

In [6]:
clf = LogisticRegression(solver='liblinear', random_state=1)

In [7]:
pipe = Pipeline([('preprocessor', ct), ('classifier', clf)])

In [8]:
# Keys follow the pattern <pipeline step>__<transformer name>__<parameter>
params = {}
params['preprocessor__ohe__drop'] = [None, 'first']
params['preprocessor__vectorizer__min_df'] = [1, 2, 3]
params['preprocessor__vectorizer__ngram_range'] = [(1, 1), (1, 2)]
params['classifier__C'] = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
params['classifier__penalty'] = ['l1', 'l2']
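
To get a feel for why this search is worth parallelizing, it helps to count the fits it triggers. The sketch below multiplies out the grid defined above. It assumes 5-fold cross-validation, which is the `GridSearchCV` default in scikit-learn 0.22 and later; `n_candidates`, `n_folds`, and `n_fits` are illustrative names, not part of the original notebook.

```python
from math import prod

# Same grid as in the cell above, repeated so this snippet runs on its own.
params = {
    'preprocessor__ohe__drop': [None, 'first'],
    'preprocessor__vectorizer__min_df': [1, 2, 3],
    'preprocessor__vectorizer__ngram_range': [(1, 1), (1, 2)],
    'classifier__C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    'classifier__penalty': ['l1', 'l2'],
}

# Number of parameter combinations: the product of the list lengths.
n_candidates = prod(len(values) for values in params.values())

# Assumption: default cv=5 (scikit-learn 0.22+).
n_folds = 5
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)  # 168 840
```

Each of those fits is independent of the others, which is what lets the search spread them across cores.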

In [9]:
# Baseline: n_jobs is left at its default, so the fits run one at a time
grid = GridSearchCV(pipe, params)
%time grid.fit(X, y)

CPU times: user 18.1 s, sys: 35.4 ms, total: 18.1 s
Wall time: 18.1 s


In [10]:
# n_jobs=-1 runs the fits in parallel on all available CPU cores
grid = GridSearchCV(pipe, params, n_jobs=-1)
%time grid.fit(X, y)

CPU times: user 734 ms, sys: 52.9 ms, total: 787 ms
Wall time: 6.34 s
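
When comparing the two runs, wall time is the number to look at. The "CPU times" line of the parallel run is small most likely because `%time` reports CPU time for the notebook process only, while the fits run in separate worker processes. As a rough sketch, the speedup can be computed from the wall times reported above (`serial_wall`, `parallel_wall`, and `speedup` are illustrative names), and the number of CPUs the operating system reports gives a loose upper bound on what to expect:

```python
import os

# Wall times copied from the two %time outputs above, in seconds.
serial_wall = 18.1    # GridSearchCV(pipe, params)
parallel_wall = 6.34  # GridSearchCV(pipe, params, n_jobs=-1)

speedup = serial_wall / parallel_wall
print(f"speedup: {speedup:.2f}x")  # speedup: 2.85x

# Logical CPUs visible to the OS; the value depends on the machine.
print("logical CPUs:", os.cpu_count())
```

The speedup on another machine will differ, since it depends on the core count and on the overhead of starting workers and passing data to them.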


### Want more tips? [View all tips on GitHub](https://github.com/justmarkham/scikit-learn-tips) or [Sign up to receive 2 tips by email every week](https://scikit-learn.tips) 💌

© 2020 [Data School](https://www.dataschool.io). All rights reserved.