[![Open in Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/justmarkham/scikit-learn-tips/master?filepath=notebooks%2F41_drop_if_binary.ipynb)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/justmarkham/scikit-learn-tips/blob/master/notebooks/41_drop_if_binary.ipynb)

# 🤖⚡ scikit-learn tip #41 ([video](https://www.youtube.com/watch?v=6EtfLjKhIec&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=41))

New in version 0.23: Use drop='if_binary' with OneHotEncoder to drop the first category ONLY if it's a binary feature (meaning it has exactly two categories).

See example 👇

In [1]:
import pandas as pd
X = pd.DataFrame({'Shape':['circle', 'oval', 'square', 'square'],
                  'Color': ['pink', 'yellow', 'pink', 'yellow']})

In [2]:
from sklearn.preprocessing import OneHotEncoder

In [3]:
# Shape has 3 categories, Color has 2 categories
X

|   | Shape  | Color  |
|---|--------|--------|
| 0 | circle | pink   |
| 1 | oval   | yellow |
| 2 | square | pink   |
| 3 | square | yellow |


In [4]:
# drop=None (default) creates one feature column per category
# (in scikit-learn 1.2+, the sparse parameter used here and below is named sparse_output)
ohe = OneHotEncoder(sparse=False, drop=None)
ohe.fit_transform(X)

array([[1., 0., 0., 1., 0.],
       [0., 1., 0., 0., 1.],
       [0., 0., 1., 1., 0.],
       [0., 0., 1., 0., 1.]])

In [5]:
# drop='first' drops the first category in each feature
ohe = OneHotEncoder(sparse=False, drop='first')
ohe.fit_transform(X)

array([[0., 0., 0.],
       [1., 0., 1.],
       [0., 1., 0.],
       [0., 1., 1.]])

In [6]:
# drop='if_binary' drops the first category of binary features only (here: Color)
ohe = OneHotEncoder(sparse=False, drop='if_binary')
ohe.fit_transform(X)

array([[1., 0., 0., 0.],
       [0., 1., 0., 1.],
       [0., 0., 1., 0.],
       [0., 0., 1., 1.]])
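
To make it easier to see which category each output column stands for, the three settings can be compared side by side. The following is a minimal sketch, not part of the original tip: `kept_columns` is a hypothetical helper written for this illustration, and it relies on the encoder's fitted `categories_` and `drop_idx_` attributes. It also calls `.toarray()` on the default sparse output instead of passing `sparse=False`, on the assumption that this avoids depending on a parameter name that differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame({'Shape': ['circle', 'oval', 'square', 'square'],
                  'Color': ['pink', 'yellow', 'pink', 'yellow']})


def kept_columns(ohe, feature_names):
    """Hypothetical helper: label each output column as 'feature=category'."""
    labels = []
    for i, (name, cats) in enumerate(zip(feature_names, ohe.categories_)):
        # drop_idx_ is None when nothing is dropped at all; otherwise it holds
        # one entry per feature (None if that feature keeps every category)
        dropped = None if ohe.drop_idx_ is None else ohe.drop_idx_[i]
        for j, cat in enumerate(cats):
            if dropped is None or j != dropped:
                labels.append(f'{name}={cat}')
    return labels


results = {}
for drop in [None, 'first', 'if_binary']:
    ohe = OneHotEncoder(drop=drop)
    encoded = ohe.fit_transform(X).toarray()  # convert sparse output to dense
    results[drop] = kept_columns(ohe, X.columns)
    print(drop, encoded.shape, results[drop])
```

With `drop='if_binary'`, only `Color=pink` should be missing from the labels, since Color is the only feature with exactly two categories.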

### Want more tips? [View all tips on GitHub](https://github.com/justmarkham/scikit-learn-tips) or [Sign up to receive 2 tips by email every week](https://scikit-learn.tips) 💌

© 2020 [Data School](https://www.dataschool.io). All rights reserved.