# Dummy encoding

- Dummy encoding converts categorical variables into numerical columns so that they can be used in mathematical models, which is a common step in data analysis and machine learning. In Python, the Pandas library provides a convenient function for it. The sections below walk through the process step by step.

### Understanding Categorical Variables:

- Categorical variables are those that represent categories or labels, not numerical values.
- Examples include "Color" with categories like "Red," "Blue," and "Green," or "City" with categories like "New York," "London," and "Tokyo."
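As a small illustration of the point above, the sketch below builds a DataFrame with two categorical columns and lists the distinct labels in each. The sample values are made up for this example.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame({
    "Color": ["Red", "Blue", "Green", "Red"],
    "City": ["New York", "London", "Tokyo", "London"],
})

# List the distinct labels found in each categorical column
color_labels = sorted(df["Color"].unique())
city_labels = sorted(df["City"].unique())

print(color_labels)
print(city_labels)
```

The labels are plain strings with no numeric meaning, which is why they need encoding before most models can use them.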

### Dummy Encoding Basics:

- Dummy encoding creates one binary (0 or 1) column for each category in a categorical variable.
- For each unique category, a new column is added. If a data point belongs to that category, the corresponding column gets a 1; otherwise, it gets a 0.
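The rule described above can be sketched by hand, without any dedicated encoding function. This is only an illustration of the idea, assuming a small made-up "Color" Series; the `Color_` column prefix is a naming choice for this example.

```python
import pandas as pd

# Hypothetical sample values for illustration
colors = pd.Series(["Red", "Blue", "Green", "Red", "Green"], name="Color")

# For each unique category, compare every value against it:
# a match becomes 1, anything else becomes 0
manual = pd.DataFrame({
    f"Color_{category}": (colors == category).astype(int)
    for category in sorted(colors.unique())
})

print(manual)
```

Each row ends up with exactly one 1, in the column that matches its original category.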

### Pandas get_dummies Function:

- Pandas provides a function called pd.get_dummies for dummy encoding.
- It accepts either a single column (a Series) or a whole DataFrame, in which case the optional columns argument selects the column(s) to encode. It returns a new DataFrame with the dummy-encoded columns and leaves the input unchanged.
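A minimal sketch of calling `pd.get_dummies` on a whole DataFrame follows. The "City" and "Price" columns are hypothetical additions for this example, and `dtype=int` is passed so the result holds 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical sample data with two categorical columns and one numeric column
df = pd.DataFrame({
    "Color": ["Red", "Blue", "Green"],
    "City": ["Tokyo", "London", "Tokyo"],
    "Price": [10, 20, 15],
})

# Encode only the listed columns; other columns are passed through untouched
encoded = pd.get_dummies(df, columns=["Color", "City"], dtype=int)

print(encoded.columns.tolist())
print(df.columns.tolist())  # the input DataFrame still has its original columns
```

When `columns` is given, the encoded columns replace the originals in the returned DataFrame, so no separate concatenation step is needed.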

### Example:

- Let's say you have a DataFrame with a "Color" column containing the categorical values "Red," "Blue," and "Green."
- Applying pd.get_dummies(df['Color'], prefix='Color') creates three new columns, ordered alphabetically by category: "Color_Blue," "Color_Green," and "Color_Red." Without the prefix argument, the columns would simply be named "Blue," "Green," and "Red."
- If a row originally had "Blue" in the "Color" column, the new columns would hold the values 1, 0, and 0, respectively.

In [3]:
import pandas as pd

# Sample DataFrame
data = {'Color': ['Red', 'Blue', 'Green', 'Red', 'Green']}
df = pd.DataFrame(data)

df

|   | Color |
|---|-------|
| 0 | Red   |
| 1 | Blue  |
| 2 | Green |
| 3 | Red   |
| 4 | Green |


In [5]:
# Dummy encoding; dtype=int gives 0/1 values (recent pandas versions default to True/False)
dummy_df = pd.get_dummies(df['Color'], prefix='Color', dtype=int)

dummy_df

|   | Color_Blue | Color_Green | Color_Red |
|---|------------|-------------|-----------|
| 0 | 0          | 0           | 1         |
| 1 | 1          | 0           | 0         |
| 2 | 0          | 1           | 0         |
| 3 | 0          | 0           | 1         |
| 4 | 0          | 1           | 0         |


In [6]:
# Concatenate dummy columns with the original DataFrame
df_encoded = pd.concat([df, dummy_df], axis=1)

df_encoded

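|   | Color | Color_Blue | Color_Green | Color_Red |
|---|-------|------------|-------------|-----------|
| 0 | Red   | 0          | 0           | 1         |
| 1 | Blue  | 1          | 0           | 0         |
| 2 | Green | 0          | 1           | 0         |
| 3 | Red   | 0          | 0           | 1         |
| 4 | Green | 0          | 1           | 0         |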
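After concatenation, the original "Color" column still holds text, which most models cannot consume directly. A possible follow-up step, sketched here under the assumption that only the numeric indicator columns are needed, is to check the encoding and then drop the original column. The name `df_model` is hypothetical.

```python
import pandas as pd

# Rebuild the example data so this sketch runs on its own
df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Red", "Green"]})
dummy_df = pd.get_dummies(df["Color"], prefix="Color", dtype=int)
df_encoded = pd.concat([df, dummy_df], axis=1)

# Sanity check: every row should have exactly one indicator set to 1
assert (dummy_df.sum(axis=1) == 1).all()

# Drop the original text column so only numeric columns remain
df_model = df_encoded.drop(columns=["Color"])

print(df_model)
```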
