#Demo 3: Encoding Categorical Variables Using One-Hot Encoding with Scikit-learn

##**Scenario: Customer Segmentation for Marketing Campaigns**
A company wants to personalize marketing campaigns by analyzing customer demographics and purchase behaviors. The dataset includes categorical variables like customer location, preferred product category, and membership type that must be converted into numerical format for machine learning models.

Since most machine learning models require numerical input, One-Hot Encoding is used to transform each categorical feature into a set of binary indicator columns that predictive models can use directly.

##Objective

* Convert categorical variables (e.g., region, product category, membership type) into numerical format using One-Hot Encoding.

* Ensure proper encoding without introducing bias or unnecessary dimensionality.

* Prepare the data for machine learning models like classification or clustering.
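
To make the objective concrete, here is a minimal sketch of what One-Hot Encoding produces, using `pd.get_dummies` on a hypothetical toy column. The values are illustrative assumptions and are not taken from `customer_segmentation.csv`.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration only
toy = pd.DataFrame({"MembershipType": ["Gold", "Silver", "Basic", "Gold"]})

# One binary indicator column is created per distinct category
one_hot = pd.get_dummies(toy, columns=["MembershipType"], dtype=int)
print(one_hot)

# A column with k distinct values expands to k indicator columns,
# which is why high-cardinality columns inflate dimensionality
print(one_hot.shape[1], "columns from", toy["MembershipType"].nunique(), "categories")
```

Each row has exactly one indicator set to 1, so no artificial ordering is imposed on the categories.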

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv("customer_segmentation.csv")

In [None]:
# Display the first few rows to inspect the dataset
print("Initial Dataset:\n", df.head())

# Check the data types to identify categorical variables
print("\nData Types:\n", df.dtypes)
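
One possible way to turn the data type check into a list of candidate columns is sketched below. It assumes a hypothetical toy frame in place of `df`; the `CustomerID` and `AnnualSpend` columns and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df; only the column names Region and MembershipType come from the demo
toy = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Region": ["North", "South", "North"],
    "MembershipType": ["Gold", "Basic", "Gold"],
    "AnnualSpend": [1200.0, 300.5, 980.0],
})

# Non-numeric columns are the usual candidates for One-Hot Encoding
candidate_cols = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]
print(candidate_cols)

# Checking the number of distinct values per column shows how many
# indicator columns each one would add after encoding
print(toy[candidate_cols].nunique())
```

Numeric identifier columns can still be categorical in meaning, so a list produced this way is a starting point to review rather than a final answer.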

In [None]:
# Define categorical columns that require encoding
categorical_cols = ["Region", "ProductCategory", "MembershipType"]


In [None]:
from sklearn.preprocessing import OneHotEncoder

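# Initialize OneHotEncoder; the sparse-output argument is left at its default, so transform returns a sparse matrix
# drop="first" avoids multicollinearity (dummy variable trap)
# handle_unknown="ignore" encodes categories not seen during fit as all zeros instead of raising an error
encoder = OneHotEncoder(drop="first", handle_unknown="ignore")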

In [None]:
# Apply encoding on categorical columns
encoded_data = encoder.fit_transform(df[categorical_cols]).toarray()

In [None]:
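# Convert encoded data to a DataFrame with meaningful column names
# Reusing df.index keeps the rows aligned with the original data when the two are combined
encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(categorical_cols), index=df.index)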

In [None]:
# Combine encoded columns with the original dataset (excluding the original categorical columns)
df_encoded = pd.concat([df.drop(columns=categorical_cols), encoded_df], axis=1)

# Display the transformed dataset
print("\nDataset After One-Hot Encoding:\n", df_encoded.head())

In [None]:
# Save the final dataset after encoding
df_encoded.to_csv("encoded_customer_segmentation.csv", index=False)

print("\nEncoded dataset saved as 'encoded_customer_segmentation.csv'")