**Given a set of training examples stored in a .CSV file, implement and demonstrate the conversion of categorical data to numeric form for the Tips.csv file using Python libraries.**

**Dataset: https://www.kaggle.com/datasets/hnazari8665/tipscsv** 

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


In [2]:
# Load the dataset
df = pd.read_csv('/kaggle/input/categorical-data/tips.csv')

# Display the first few rows of the dataframe to understand its structure
print(df.head())

   total_bill   tip     sex smoker  day    time  size  price_per_person  \
0       16.99  1.01  Female     No  Sun  Dinner     2              8.49   
1       10.34  1.66    Male     No  Sun  Dinner     3              3.45   
2       21.01  3.50    Male     No  Sun  Dinner     3              7.00   
3       23.68  3.31    Male     No  Sun  Dinner     2             11.84   
4       24.59  3.61  Female     No  Sun  Dinner     4              6.15   

           Payer Name         CC Number Payment ID  
0  Christy Cunningham  3560325168603410    Sun2959  
1      Douglas Tucker  4478071379779230    Sun4608  
2      Travis Walters  6011812112971322    Sun4458  
3    Nathaniel Harris  4676137647685994    Sun5260  
4        Tonya Carter  4832732618637221    Sun2251  


# Identify Categorical Columns

**To proceed, you need to identify which columns are categorical. Categorical data is usually stored as strings, which pandas reports as the `object` dtype.**

In [3]:
# Display column types to find which ones are categorical
print(df.dtypes)

# Alternatively, you can list the columns manually
categorical_columns = ['sex', 'smoker', 'day', 'time']  # Categorical columns of the tips dataset

total_bill          float64
tip                 float64
sex                  object
smoker               object
day                  object
time                 object
size                  int64
price_per_person    float64
Payer Name           object
CC Number             int64
Payment ID           object
dtype: object
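
The dtype listing above also reports `Payer Name` and `Payment ID` as `object`, although they are identifiers rather than categories. As a minimal sketch of selecting candidate columns programmatically, the example below uses a small toy frame (illustrative values, not the real file) and a hypothetical `max_unique` threshold to keep only non-numeric columns with few distinct values.

```python
import pandas as pd

# Toy frame mimicking a few tips.csv columns (values are illustrative only).
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "sex": ["Female", "Male", "Male", "Male"],
    "day": ["Sun", "Sun", "Sat", "Thur"],
    "Payer Name": ["Christy Cunningham", "Douglas Tucker",
                   "Travis Walters", "Nathaniel Harris"],
})

# Non-numeric columns are candidates for encoding.
non_numeric = sample.select_dtypes(exclude="number").columns.tolist()

# Keep only columns with few distinct values.
# max_unique is a hypothetical threshold chosen for this toy data.
max_unique = 3
candidate_columns = [c for c in non_numeric if sample[c].nunique() <= max_unique]
print(candidate_columns)
```

On the full dataset the threshold would need tuning, and the result is worth checking by eye before encoding.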


# Convert Categorical Data to Numeric

**There are two common methods for converting data to numeric:**

**Label Encoding: converts categories into integers (useful for ordinal categories). One-Hot Encoding: converts each category into its own binary column (useful for nominal categories).**

# Label Encoding for Ordinal Categories

**Label Encoding is useful when the categorical values have a natural order (for example, low, medium, high).**

In [4]:
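# Apply Label Encoding
# LabelEncoder assigns integer codes in alphabetical order of the labels
# (e.g. Female=0, Male=1); each fit_transform call refits the same encoder
label_encoder = LabelEncoder()

df['sex'] = label_encoder.fit_transform(df['sex'])
df['smoker'] = label_encoder.fit_transform(df['smoker'])
df['day'] = label_encoder.fit_transform(df['day'])
df['time'] = label_encoder.fit_transform(df['time'])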

# Display the transformed dataframe
print(df.head())


   total_bill   tip  sex  smoker  day  time  size  price_per_person  \
0       16.99  1.01    0       0    2     0     2              8.49   
1       10.34  1.66    1       0    2     0     3              3.45   
2       21.01  3.50    1       0    2     0     3              7.00   
3       23.68  3.31    1       0    2     0     2             11.84   
4       24.59  3.61    0       0    2     0     4              6.15   

           Payer Name         CC Number Payment ID  
0  Christy Cunningham  3560325168603410    Sun2959  
1      Douglas Tucker  4478071379779230    Sun4608  
2      Travis Walters  6011812112971322    Sun4458  
3    Nathaniel Harris  4676137647685994    Sun5260  
4        Tonya Carter  4832732618637221    Sun2251  
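
`LabelEncoder` orders labels alphabetically, which does not necessarily match a natural order such as low, medium, high. The sketch below uses a hypothetical ordinal column (not part of tips.csv) to contrast the alphabetical codes with an explicitly ordered pandas categorical, and shows how a fitted encoder can map codes back to labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical ordinal values; this column does not exist in tips.csv.
levels = ["low", "high", "medium", "low"]

# LabelEncoder sorts labels alphabetically: high=0, low=1, medium=2.
encoder = LabelEncoder()
alphabetical_codes = encoder.fit_transform(levels)
print(list(encoder.classes_))

# An ordered categorical keeps the intended order: low=0, medium=1, high=2.
ordered = pd.Categorical(levels, categories=["low", "medium", "high"], ordered=True)
ordered_codes = ordered.codes
print(list(ordered_codes))

# inverse_transform recovers the original labels from the fitted encoder.
restored = encoder.inverse_transform(alphabetical_codes)
print(list(restored))
```

Because the notebook reuses one `label_encoder` for every column, only the mapping of the last fitted column remains available afterwards; keeping one encoder per column would preserve each mapping.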


# One-Hot Encoding for Nominal Categories

**One-Hot Encoding is useful when the categorical values don't have any natural order (for example, 'male', 'female').**

In [5]:
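# Apply One-Hot Encoding
# df already holds the integer codes from the label-encoding step, so the
# dummy columns are named after those codes (sex_1, day_2, ...)
df_encoded = pd.get_dummies(df, columns=['sex', 'smoker', 'day', 'time'], drop_first=True)

# Display the transformed dataframe
print(df_encoded.head())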

   total_bill   tip  size  price_per_person          Payer Name  \
0       16.99  1.01     2              8.49  Christy Cunningham   
1       10.34  1.66     3              3.45      Douglas Tucker   
2       21.01  3.50     3              7.00      Travis Walters   
3       23.68  3.31     2             11.84    Nathaniel Harris   
4       24.59  3.61     4              6.15        Tonya Carter   

          CC Number Payment ID  sex_1  smoker_1  day_1  day_2  day_3  time_1  
0  3560325168603410    Sun2959  False     False  False   True  False   False  
1  4478071379779230    Sun4608   True     False  False   True  False   False  
2  6011812112971322    Sun4458   True     False  False   True  False   False  
3  4676137647685994    Sun5260   True     False  False   True  False   False  
4  4832732618637221    Sun2251  False     False  False   True  False   False  
