# A/B Testing for Advertising Campaign

**Author:** Walid Shehzad Ali  
**Dataset:** [Marketing A/B Testing Dataset](https://www.kaggle.com/datasets)  
**Contact:** [LinkedIn Profile](https://www.linkedin.com/in/walid-shehzad-ali/)

---

## Introduction
Suppose you work as a data analyst at a company, and your manager asks you to determine how effective advertising was on Audience 1, which saw the ads, compared with Audience 2, which did not.
You need to answer the following questions:

1. **Determine if the campaign succeeded.** 🤔
2. **Quantify how much of that success can be attributed to the advertisements.** 💰
3. **How can we optimize our campaign to get the most out of it?** 💰


### A/B testing 
A/B testing, also known as split testing, is a method for comparing two versions of a variable to determine which one performs better. It is commonly used in marketing, product development, and web design to optimize user experience and increase conversion rates.

#### Example Use Cases
- Testing different versions of a landing page to see which one generates more leads.
- Comparing two email subject lines to determine which one results in higher open rates.
- Evaluating different advertisement formats to see which drives more sales.

# About Dataset 📊

This section provides a description of the dataset used for the A/B testing analysis.

| Feature            | Description                                                                                   |
|--------------------|-----------------------------------------------------------------------------------------------|
| **Index**          | Row index                                                                                     |
| **User ID**       | User ID (unique) 👤                                                                           |
| **Test Group**     | Indicates if the user saw the advertisement ("ad") or only the public service announcement ("psa") 📢 |
| **Converted**      | Indicates if the user purchased the product (True) or not (False) 💸                           |
| **Total Ads**      | Total number of ads seen by the user 📈                                                       |
| **Most Ads Day**   | The day the user saw the highest number of ads 📅                                             |
| **Most Ads Hour**  | The hour of the day the user saw the highest number of ads ⏰                                   |

---

- In this case, Audience 1 = users in the "ad" group
- and Audience 2 = users in the "psa" group
- converted = True (succeeded), converted = False (not succeeded)

Now let's start our analysis to investigate the questions asked by our manager!

Let's load the data first.

In [1]:
import pandas as pd
import plotly.express as px

In [2]:
df = pd.read_csv('/kaggle/input/marketing-ab-testing/marketing_AB.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,user id,test group,converted,total ads,most ads day,most ads hour
0,0,1069124,ad,False,130,Monday,20
1,1,1119715,ad,False,93,Tuesday,22
2,2,1144181,ad,False,21,Tuesday,18
3,3,1435133,ad,False,355,Tuesday,10
4,4,1015700,ad,False,276,Friday,14


In [3]:
df.shape         # Check the shape 

(588101, 7)

In [4]:
df.isnull().sum()          # Check null values

Unnamed: 0       0
user id          0
test group       0
converted        0
total ads        0
most ads day     0
most ads hour    0
dtype: int64

In [5]:
df.info()       # Check each column's data type and non-null count

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 588101 entries, 0 to 588100
Data columns (total 7 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Unnamed: 0     588101 non-null  int64 
 1   user id        588101 non-null  int64 
 2   test group     588101 non-null  object
 3   converted      588101 non-null  bool  
 4   total ads      588101 non-null  int64 
 5   most ads day   588101 non-null  object
 6   most ads hour  588101 non-null  int64 
dtypes: bool(1), int64(4), object(2)
memory usage: 27.5+ MB
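
The `Unnamed: 0` column listed above is only a copy of the row index left over from the CSV export. A minimal sketch of tidying it up is shown here on a three-row frame called `sample` (a hypothetical name) built from the first rows of `df.head()`. The snake_case column names are an assumption for illustration; the rest of this notebook keeps the original names.

```python
import pandas as pd

# Three rows copied from the df.head() output; `sample` is a hypothetical stand-in for df.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "user id": [1069124, 1119715, 1144181],
    "test group": ["ad", "ad", "ad"],
    "converted": [False, False, False],
    "total ads": [130, 93, 21],
    "most ads day": ["Monday", "Tuesday", "Tuesday"],
    "most ads hour": [20, 22, 18],
})

# Drop the leftover index column and replace spaces in the column names.
clean = sample.drop(columns=["Unnamed: 0"])
clean.columns = clean.columns.str.replace(" ", "_")

# A repeated user id would mean one user is counted more than once.
assert clean["user_id"].is_unique
print(clean.columns.tolist())
```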


### Now we want to find the conversion rate through ads
We need to break down the "test group" column, since it holds each user's group.

In [6]:
print(df['test group'].value_counts())

test group
ad     564577
psa     23524
Name: count, dtype: int64


## Observations_1 🔍

We have two types of users in the test group:
- 564577 users in the "ad" group and
- 23524 users in the "psa" group.

Now we need to know how many users in each group converted
- (False means the user did not convert,
- True means the user converted)

In [7]:
print(df.groupby('test group')['converted'].value_counts())

test group  converted
ad          False        550154
            True          14423
psa         False         23104
            True            420
Name: count, dtype: int64


## Observations_2 🔍

- In the "ad" group, 14423 of 564577 users converted.
- In the "psa" group, 420 of 23524 users converted.

In [8]:
# Work out what percentage of each group converted.
# Counts are copied from the groupby output above: 14423 of 564577 "ad" users
# and 420 of 23524 "psa" users converted.
# Given data
total_ads = 564577
converted_ads = 14423
total_psa = 23524
converted_psa = 420

# Calculate percentages
conversion_percentage_ads = (converted_ads / total_ads) * 100
conversion_percentage_psa = (converted_psa / total_psa) * 100

# Display results
print(f"Conversion Percentage for Total Ads: {conversion_percentage_ads:.2f}%")
print(f"Conversion Percentage for Total PSA: {conversion_percentage_psa:.2f}%")


Conversion Percentage for Total Ads: 2.55%
Conversion Percentage for Total PSA: 1.79%
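
The counts in the cell above were typed in by hand. Because `converted` is a boolean column, its mean within each group is already the conversion rate, so the same numbers can be derived from the data. This is a sketch of the pattern on a made-up frame called `toy` (illustrative values, not taken from the dataset); with the real data, `df` would take its place.

```python
import pandas as pd

# Made-up rows, only to show the pattern.
toy = pd.DataFrame({
    "test group": ["ad", "ad", "ad", "ad", "psa", "psa"],
    "converted": [True, False, False, True, False, True],
})

# count = users, sum = converted users, mean = share of True values.
summary = toy.groupby("test group")["converted"].agg(
    users="count", conversions="sum", rate="mean"
)
summary["rate_pct"] = summary["rate"] * 100
print(summary)
```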


## Conclusion 📌
The ad group converts approximately 43% better than the PSA group in relative terms (14423 / 564577 ≈ 2.55% versus 420 / 23524 ≈ 1.79%, a difference of about 0.77 percentage points).
This analysis answers two of the manager's questions 🎉

- The campaign is successful: the ad group converts at a higher rate than the psa group.
- The online advertisement performs about 43% better than the PSA (usually delivered through television, radio, etc.).
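
A higher rate alone does not show that the difference is more than chance, especially with groups this unequal in size. One common check is a two-proportion z-test. The sketch below uses only the standard library, takes its counts from the `groupby` output earlier, and uses a hypothetical helper name `two_proportion_z_test`. It assumes users were assigned to the groups at random.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal tail
    return z, p_value

# "ad" group: 14423 of 564577 converted; "psa" group: 420 of 23524 converted.
z, p_value = two_proportion_z_test(14423, 564577, 420, 23524)
relative_lift = (14423 / 564577) / (420 / 23524) - 1

print(f"z = {z:.2f}, p-value = {p_value:.2e}")
print(f"relative lift = {relative_lift:.1%}")
```

A small p-value suggests the gap between the two groups is unlikely to be noise. It says nothing about why the ads worked.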

Next we filter the data for the "ad" group and the "psa" group separately,
so that we can work with the results for each group on its own.

# Optimization ⚙️
We need to find the steps required to optimize the campaign, which answers the third question.

First we filter all the users who belong to the "ad" group and whose conversion is True.

In [9]:
ad_group = df[(df['converted'] == True) & (df['test group'] == 'ad')] # filtering rows based on our conditon

In [10]:
most_ads_day = ad_group.groupby('most ads day').size().reset_index(name='count').sort_values(by='count', ascending=False)
print(most_ads_day)

  most ads day  count
1       Monday   2778
5      Tuesday   2270
3       Sunday   2027
0       Friday   1995
6    Wednesday   1963
4     Thursday   1711
2     Saturday   1679


## Observation_3 🔍 
Here are the days ranked by the number of converted users (these are counts, not conversion rates):
- Monday       2778
- Tuesday      2270
- Sunday       2027
- Friday       1995
- Wednesday    1963
- Thursday     1711
- Saturday     1679
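
These counts depend on how many users had each day as their busiest ad day, so a day can rank first simply because more users fall on it. Dividing by all ad-group users for that day gives a rate that is easier to compare. This is a sketch on a made-up frame `toy` (illustrative values only); with the real data, `df` would take its place.

```python
import pandas as pd

# Made-up rows, only to show the pattern.
toy = pd.DataFrame({
    "test group": ["ad"] * 6,
    "most ads day": ["Monday", "Monday", "Monday", "Monday", "Saturday", "Saturday"],
    "converted": [True, False, False, False, True, False],
})

# Keep converted and non-converted users so the denominator is available.
ad_users = toy[toy["test group"] == "ad"]
by_day = (
    ad_users.groupby("most ads day")["converted"]
    .agg(users="count", conversions="sum", rate="mean")
    .sort_values("rate", ascending=False)
)
print(by_day)
```

In this toy example both days have one conversion, yet Saturday has the higher rate because it has fewer users.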

# 📢📢 Note
Always use visualizations to support your statements.

In [11]:
fig = px.bar(most_ads_day, 
              x='most ads day', 
              y='count', 
              title='How Many Users Converted on Every Single Day',
              labels={'most ads day': 'Weekdays', 'count': 'Users Converted'},
              text='count',  # Display the count on the bars
              color='count',  # Use 'count' for color mapping
              color_continuous_scale=px.colors.sequential.GnBu)  # Apply the GnBu color palette

# Show the figure
fig.show()

Now we can check which hours have the most conversions.

In [12]:
most_ads_hour = ad_group.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
print(most_ads_hour)

    most ads hour  count
15             15   1279
14             14   1251
13             13   1140
16             16   1111
12             12   1092
11             11    992
17             17    959
18             18    853
21             21    843
20             20    843
10             10    818
19             19    782
22             22    675
9               9    582
23             23    449
8               8    337
7               7    114
0               0    102
1               1     62
6               6     46
2               2     39
3               3     27
5               5     16
4               4     11


In [13]:
fig = px.bar(most_ads_hour, 
              x='most ads hour', 
              y='count', 
              title='How Many Users Converted on an Hourly Basis',
              labels={'most ads hour': '24 Hours Distribution', 'count': 'Users Converted'},
              text='count',  # Display the count on the bars
              color='count',  # Use 'count' for color mapping
              color_continuous_scale=px.colors.sequential.GnBu)  # Apply the GnBu color palette

# Update layout for better readability
fig.update_layout(
    xaxis_tickvals=most_ads_hour['most ads hour'],
    xaxis_ticktext=[f"{hour % 12 or 12} {'AM' if hour < 12 else 'PM'}" for hour in most_ads_hour['most ads hour']]
)

# Show the figure
fig.show()

## Observation_4 🔍
From the chart above we can easily pick out the hours with the most conversions. Note that it combines timings and conversions for all days; next we will analyze each day separately.

- from 8am to 11am
- and 12pm to 11pm

But what if we want to check the most converted hours for every single day?
Let's do it!
For each day, starting with Monday, we need to break down the most converted hours.

In [14]:
Monday=df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day']=='Monday')]
Tuesday=df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day']=='Tuesday')]
Wednesday=df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day']=='Wednesday')]
Thursday=df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day']=='Thursday')]
Friday=df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day']=='Friday')]
Saturday=df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day']=='Saturday')]
Sunday=df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day']=='Sunday')]


In [15]:
monday_ads=Monday.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
Tuesday_ads=Tuesday.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
Wednesday_ads=Wednesday.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
Thursday_ads=Thursday.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
Friday_ads=Friday.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
Saturday_ads=Saturday.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
Sunday_ads=Sunday.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
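
The seven per-day variables above can also be built in one step by grouping on day and hour together. This is a sketch on a made-up frame `toy` (illustrative values only) that stands in for the converted ad-group rows held in `ad_group`.

```python
import pandas as pd

# Made-up rows standing in for converted ad-group users.
toy = pd.DataFrame({
    "most ads day": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday"],
    "most ads hour": [14, 14, 15, 10, 14],
})

# One row per day, one column per hour; missing combinations become 0.
hourly_by_day = (
    toy.groupby(["most ads day", "most ads hour"])
    .size()
    .unstack(fill_value=0)
)
print(hourly_by_day)

# The hour with the most conversions for a given day.
print(hourly_by_day.loc["Monday"].idxmax())
```

A single table like this keeps all days side by side, and `hourly_by_day.loc[day]` returns the hourly counts for one day.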


In [16]:
import pandas as pd
import plotly.express as px


# List of days
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Initialize a dictionary to hold the figures for each day
figures = {}

color_palette = px.colors.sequential.GnBu  

for day in days:
    # Filter the DataFrame for the specific day
    day_data = df[(df['converted'] == True) & (df['test group'] == 'ad') & (df['most ads day'] == day)]
    
    # Group by 'most ads hour' and count
    day_ads = day_data.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)
    
    # Create a bar figure for the current day
    fig = px.bar(day_ads, 
                 x='most ads hour', 
                 y='count', 
                 title=f'User Conversions for {day} in ad group on Hourly Basis',
                 labels={'most ads hour': 'Most Ads Hour', 'count': 'Users Converted'},
                 text='count',
                 color='count',  # Optional: use 'count' for a color scale based on the count
                 color_continuous_scale=color_palette)  # Apply the color palette
    
    # Update layout for better readability
    fig.update_layout(
        xaxis_tickvals=day_ads['most ads hour'],
        xaxis_ticktext=[f"{hour % 12 or 12} {'AM' if hour < 12 else 'PM'}" for hour in day_ads['most ads hour']]
    )
    
    # Store the figure in the dictionary
    figures[day] = fig

for day in days:
    figures[day].show()  

## Observation_5 🔍
From the charts above we get the hours in which most conversions happen. Here is the breakdown:
- **Monday**: Ad should show from 9 AM to 10 PM
- **Tuesday**: Ad should show from 9 AM to 6 PM
- **Wednesday**: Ad should show from 10 AM to 6 PM
- **Thursday**: Ad should show from 11 AM to 8 PM
- **Friday**: Ad should show from 10 AM to 10 PM
- **Saturday**: Ad should show from 12 PM to 9 PM
- **Sunday**: Ad should show from 10 AM to 10 PM


# PSA Group

In [17]:
psa_group = df[(df['converted'] == True) & (df['test group'] == 'psa')]


In [18]:
most_ads_day = psa_group.groupby('most ads day').size().reset_index(name='count').sort_values(by='count', ascending=False)
print(most_ads_day)

  most ads day  count
1       Monday     79
4     Thursday     79
3       Sunday     63
0       Friday     62
6    Wednesday     55
5      Tuesday     42
2     Saturday     40


- Above is the user conversion breakdown across the days of the week.

As I said, always support your statements with visualization.

In [19]:
fig = px.bar(most_ads_day, 
              x='most ads day', 
              y='count', 
              title='How Many Users Converted in psa on Every Single Day',
              labels={'most ads day': 'Weekdays', 'count': 'Users Converted'},
              text='count',  # Display the count on the bars
              color='count',  # Use 'count' for color mapping
              color_continuous_scale=px.colors.sequential.GnBu)  # Apply the GnBu color palette

# Show the figure
fig.show()

In [20]:
most_ads_hour = psa_group.groupby('most ads hour').size().reset_index(name='count').sort_values(by='count', ascending=False)


In [21]:
fig = px.bar(most_ads_hour, 
              x='most ads hour', 
              y='count', 
              title='How Many Users Converted in psa group on an Hourly Basis',
              labels={'most ads hour': '24 Hours Distribution', 'count': 'Users Converted'},
              text='count',  # Display the count on the bars
              color='count',  # Use 'count' for color mapping
              color_continuous_scale=px.colors.sequential.GnBu)  # Apply the GnBu color palette

# Update layout for better readability
fig.update_layout(
    xaxis_tickvals=most_ads_hour['most ads hour'],
    xaxis_ticktext=[f"{hour % 12 or 12} {'AM' if hour < 12 else 'PM'}" for hour in most_ads_hour['most ads hour']]
)

# Show the figure
fig.show()

## Observation_6 🔍 
- The PSA should be shown from 3 PM to 4 PM.
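
The PSA group has only 420 conversions in total, so any hourly pattern rests on small counts. One way to gauge how precisely a rate is known is a Wilson score interval. Here is a minimal sketch applied to the overall group counts from cell 7. The helper `wilson_interval` is hypothetical, and the 1.96 default corresponds to an approximate 95% interval.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Return (low, high) Wilson score bounds for a proportion successes / n."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Overall counts from cell 7: the PSA group is far smaller than the ad group.
psa_low, psa_high = wilson_interval(420, 23524)
ad_low, ad_high = wilson_interval(14423, 564577)

print(f"psa: {psa_low:.4%} to {psa_high:.4%}")
print(f"ad:  {ad_low:.4%} to {ad_high:.4%}")
```

The same helper can be applied to a single hour's conversions and users, where the interval will be much wider.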

## Recommendations 💡

These are the steps to optimize this campaign according to the ad group:

- The ad should run every day of the week.
- The marketing team should schedule the campaign timings as follows:
  
  - **Monday**: Ad should show from 9 AM to 10 PM
  - **Tuesday**: Ad should show from 9 AM to 6 PM
  - **Wednesday**: Ad should show from 10 AM to 6 PM
  - **Thursday**: Ad should show from 11 AM to 8 PM
  - **Friday**: Ad should show from 10 AM to 10 PM
  - **Saturday**: Ad should show from 12 PM to 9 PM
  - **Sunday**: Ad should show from 10 AM to 10 PM

- For the Public Service Announcement (PSA), it should be scheduled between 3 PM and 4 PM (PSA is a marketing medium and can be done through television, radio, etc.).

## Final Words ✨

Do you have any questions or need suggestions for improvements in my notebook? Feel free to leave your comments! If you enjoyed this notebook, please upvote it, as your encouragement motivates me to create more!

Thank you! 🙏