# **Predicting Click-Through Rate with XGBoost**

This section walks through predicting the click-through rate with the XGBoost algorithm. The dataset has 10 columns: 9 of them describe the visitor and the ad shown and serve as candidate features, and the 10th, "Clicked on Ad", is the target we will predict. It records whether the visitor clicked the ad.

Follow the steps given below to create a click-through rate prediction model using the XGBoost algorithm:

**Step 1: Import Necessary Libraries**

* In this step, you import the libraries needed to predict the click-through rate.
* These include the data manipulation libraries pandas and numpy, `LabelEncoder` to encode categorical features as numerical values, `train_test_split` to split the dataset into training and test sets, `XGBClassifier` from the XGBoost library to predict the clicks, and `accuracy_score` to measure how well the model performs.

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier as xgb
from sklearn.metrics import accuracy_score
```

**Step 2: Reading Data and Performing Basic Analysis**

```python
data = pd.read_csv("/content/sample_data/ad_10000records.csv")
print(data.head())
```

```
   Daily Time Spent on Site   Age  Area Income  Daily Internet Usage  \
0                     62.26  32.0     69481.85                172.83   
1                     41.73  31.0     61840.26                207.17   
2                     44.40  30.0     57877.15                172.83   
3                     59.88  28.0     56180.93                207.17   
4                     49.21  30.0     54324.73                201.58   

                         Ad Topic Line             City  Gender  \
0      Decentralized real-time circuit         Lisafort    Male   
1       Optional full-range projection  West Angelabury    Male   
2  Total 5thgeneration standardization        Reyesfurt  Female   
3          Balanced empowering success      New Michael  Female   
4  Total 5thgeneration standardization     West Richard  Female   

                        Country            Timestamp  Clicked on Ad  
0  Svalbard & Jan Mayen Islands  2016-06-09 21:43:05              0  
1                     Si
```
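Step 2 only prints the first rows. A minimal sketch of a few more basic checks (shape, missing values per column, and column types) follows; the small frame is made up for illustration and its values are not from the dataset. On the real data you would call the same methods on `data`.

```python
import pandas as pd

# Hypothetical stand-in with a few of the ad dataset's column names;
# one value is missing so the check has something to report.
sample = pd.DataFrame({
    "Daily Time Spent on Site": [62.26, 41.73, None],
    "Age": [32.0, 31.0, 30.0],
    "Gender": ["Male", "Male", "Female"],
    "Clicked on Ad": [0, 1, 0],
})

# (rows, columns); the real file should report 10 columns
print(sample.shape)

# Number of missing values in each column
missing = sample.isnull().sum()
print(missing)

# Text columns show up with an object (or string) dtype and need encoding or dropping
print(sample.dtypes)
```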

In the "Clicked on Ad" column of the output above, 0 means the visitor did not click the ad and 1 means the visitor clicked it. Counting the 0s and 1s gives the click-through rate: the share of records that ended in a click.


```python
clicks = data["Clicked on Ad"].value_counts()
print(clicks)

# Click-through rate: records with a click divided by all records, as a percentage
click_through_rate = clicks.loc[1] / clicks.sum() * 100
print(f"The click through rate is: {click_through_rate:.2f}%")
```

```
Clicked on Ad
0    5083
1    4917
Name: count, dtype: int64
The click through rate is: 49.17%
```
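Because the target is a 0/1 column, the click-through rate, clicks / impressions × 100, is simply the column's mean times 100. A minimal sketch as a small helper; the function name `click_through_rate` and the sample column are illustrative, not part of the original code.

```python
import pandas as pd

def click_through_rate(clicks: pd.Series) -> float:
    """Return the percentage of rows where a 0/1 click column equals 1."""
    # The mean of a 0/1 column is the fraction of 1s, i.e. clicks / impressions
    return clicks.mean() * 100

# Illustrative column: 2 clicks out of 5 impressions
clicks = pd.Series([1, 0, 0, 1, 0])
rate = click_through_rate(clicks)
print(f"The click through rate is: {rate:.2f}%")
```

Applied to `data["Clicked on Ad"]`, the same helper gives 4917 / 10000 × 100 = 49.17% without hard-coding the counts.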


**Step 3: Data Preprocessing**

`LabelEncoder` converts the text labels in the "Gender" column into integers so that XGBoost can use the column as a numeric feature.

```python
le = LabelEncoder()

data["Gender"] = le.fit_transform(data["Gender"])
```
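`LabelEncoder` assigns integers to the sorted class labels, so with only "Female" and "Male" present, "Female" becomes 0 and "Male" becomes 1. A short sketch on a few illustrative values (not taken from the dataset) that shows the mapping and how to reverse it:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative Gender values
genders = ["Male", "Female", "Female", "Male"]

le = LabelEncoder()
encoded = le.fit_transform(genders)

# classes_ holds the sorted labels; each label's position is its integer code
print(le.classes_)
print(encoded)

# inverse_transform maps integer codes back to the original labels
print(le.inverse_transform([0, 1]))
```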

**Step 4: Data Splitting**
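
The features are the first seven columns minus the two free-text columns, "Ad Topic Line" and "City", which leaves Daily Time Spent on Site, Age, Area Income, Daily Internet Usage, and Gender. "Country" and "Timestamp" fall outside the first seven columns and are not used. The target is the last column, "Clicked on Ad", and 20% of the records are held out as a test set.

```python
# Take the first seven columns of the dataset as candidate features
x = data.iloc[:, 0:7]
# Drop the free-text categorical columns, which were not encoded
x = x.drop(["Ad Topic Line", "City"], axis=1)

# The last column (index 9), "Clicked on Ad", is the target variable
y = data.iloc[:, 9]

# Hold out 20% of the records for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
```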
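The positional selection above depends on the column order in the CSV. A sketch of an alternative under stated assumptions: select the same five features by name, and optionally pass `stratify=y` so the click/no-click ratio is identical in both splits (the original split does not stratify). The frame below is synthetic, with random values and a hypothetical exact 50/50 target, only to show the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic frame with the ad dataset's feature names; values are random
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "Daily Time Spent on Site": rng.uniform(30, 90, n),
    "Age": rng.integers(18, 60, n),
    "Area Income": rng.uniform(20000, 80000, n),
    "Daily Internet Usage": rng.uniform(100, 270, n),
    "Gender": rng.integers(0, 2, n),
    "Clicked on Ad": np.tile([0, 1], n // 2),  # exactly 50 clicks and 50 non-clicks
})

# Selecting features by name does not depend on column order
feature_cols = ["Daily Time Spent on Site", "Age", "Area Income",
                "Daily Internet Usage", "Gender"]
x = df[feature_cols]
y = df["Clicked on Ad"]

# stratify=y keeps the class proportions of y in both the train and test splits
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)
print(y_test.value_counts())
```

With a roughly balanced target like this dataset's (49.17% clicks), stratification changes little, but it matters more when clicks are rare.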

**Step 5: Defining & Training an XGBoost Classifier Model**

```python
# XGBClassifier model (imported as xgb); random_state makes results reproducible
model = xgb(random_state=42)

# Training the model
model.fit(x_train, y_train)
```

**Step 6: Predicting & Checking Accuracy of the Model**

```python
# Predicting test dataset values with the model
y_pred = model.predict(x_test)

# Accuracy check of the model prediction
print("The model accuracy is", accuracy_score(y_test,y_pred))
```

```
The model accuracy is 0.793
```


So, the model correctly classifies about 79% (0.793) of the test records.
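Accuracy is easier to judge next to a baseline that always predicts the most frequent class. A sketch using scikit-learn's `DummyClassifier` and a confusion matrix; the label arrays are hypothetical, not the model's real output.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels and predictions for illustration only
y_train = np.array([0, 0, 1, 0, 1, 0])
y_test = np.array([0, 0, 0, 1, 1, 1, 0, 1, 0, 0])
y_pred = np.array([0, 0, 1, 1, 1, 0, 0, 1, 0, 0])

# Baseline: always predict the most frequent class seen in training (features are ignored)
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(np.zeros((len(y_train), 1)), y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(np.zeros((len(y_test), 1))))

model_acc = accuracy_score(y_test, y_pred)
print(f"Baseline accuracy: {baseline_acc:.2f}")
print(f"Model accuracy:    {model_acc:.2f}")

# Rows are true classes (0, 1); columns are predicted classes (0, 1)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

On the full dataset, 5083 of 10000 records are non-clicks, so an always-"no click" baseline would score roughly 51% (the exact figure depends on the test split), which puts the model's 79% in context.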