# A/B Testing

An A/B test compares two marketing strategies that pursue the same goal to see which one performs better, so that the result can shape future campaigns. As a personal example, I once ran two Instagram ad campaigns to promote a post, each aimed at a different audience. The audience of the second campaign consistently delivered better reach and higher follower engagement than the audience of the first.

In essence, A/B testing becomes a strategic compass, guiding decisions to boost sales, gain followers, or enhance website traffic based on the lessons gleaned from prior marketing endeavors.

Now, let's transition from theory to practice. To bring this A/B testing project to life using Python, we need a dataset comparing the outcomes of two diverse marketing approaches aligned with the same goal. I've curated a dataset precisely for this purpose, and you can access it here.

In the forthcoming section, join me as we navigate the intricacies of A/B Testing using Python, turning data into actionable insights for our marketing strategies.

## Importing datasets and libraries

In [1]:
import pandas as pd
import datetime
from datetime import date, timedelta
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"

control_data = pd.read_csv("control_group.csv", sep = ";")
test_data = pd.read_csv("test_group.csv", sep = ";")

In [2]:
# Let us look at the control data table.
print(control_data.head())

      Campaign Name       Date  Spend [USD]  # of Impressions     Reach  \
0  Control Campaign  1.08.2019         2280           82702.0   56930.0   
1  Control Campaign  2.08.2019         1757          121040.0  102513.0   
2  Control Campaign  3.08.2019         2343          131711.0  110862.0   
3  Control Campaign  4.08.2019         1940           72878.0   61235.0   
4  Control Campaign  5.08.2019         1835               NaN       NaN   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0               7016.0         2290.0             2159.0            1819.0   
1               8110.0         2033.0             1841.0            1219.0   
2               6508.0         1737.0             1549.0            1134.0   
3               3065.0         1042.0              982.0            1183.0   
4                  NaN            NaN                NaN               NaN   

   # of Purchase  
0          618.0  
1          511.0  
2          372.0  
3   

In [3]:
# Let us also look at the test data table.
print(test_data.head())

   Campaign Name       Date  Spend [USD]  # of Impressions  Reach  \
0  Test Campaign  1.08.2019         3008             39550  35820   
1  Test Campaign  2.08.2019         2542            100719  91236   
2  Test Campaign  3.08.2019         2365             70263  45198   
3  Test Campaign  4.08.2019         2710             78451  25937   
4  Test Campaign  5.08.2019         2297            114295  95138   

   # of Website Clicks  # of Searches  # of View Content  # of Add to Cart  \
0                 3038           1946               1069               894   
1                 4657           2359               1548               879   
2                 7885           2572               2367              1268   
3                 4216           2216               1437               566   
4                 5863           2106                858               956   

   # of Purchase  
0            255  
1            677  
2            578  
3            340  
4            768  


## Data Preparation

In [4]:
control_data.columns = ["Campaign Name", "Date", "Amount Spent",
                        "Number of Impressions", "Reach", "Website Clicks",
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]

test_data.columns = ["Campaign Name", "Date", "Amount Spent",
                        "Number of Impressions", "Reach", "Website Clicks",
                        "Searches Received", "Content Viewed", "Added to Cart",
                        "Purchases"]

Let us check for missing values:

In [5]:
print(control_data.isnull().sum())

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    1
Reach                    1
Website Clicks           1
Searches Received        1
Content Viewed           1
Added to Cart            1
Purchases                1
dtype: int64


In [6]:
print(test_data.isnull().sum())

Campaign Name            0
Date                     0
Amount Spent             0
Number of Impressions    0
Reach                    0
Website Clicks           0
Searches Received        0
Content Viewed           0
Added to Cart            0
Purchases                0
dtype: int64


The control campaign dataset has one row with missing values. Let’s fill these missing values with the mean of each column:

In [7]:
# Fill each column's missing values with that column's mean.
# The result is assigned back rather than using fillna(inplace=True) on a
# selected column, which recent pandas versions warn about and which may
# leave control_data unchanged.
columns_with_gaps = ["Number of Impressions", "Reach", "Website Clicks",
                     "Searches Received", "Content Viewed", "Added to Cart",
                     "Purchases"]
for column in columns_with_gaps:
    control_data[column] = control_data[column].fillna(control_data[column].mean())

## Creating A/B dataset

We will now create a new table by merging the two tables.

In [8]:
ab_data = control_data.merge(test_data,
                             how="outer").sort_values(["Date"])
ab_data = ab_data.reset_index(drop=True)
print(ab_data.head())

      Campaign Name        Date  Amount Spent  Number of Impressions    Reach  \
0  Control Campaign   1.08.2019          2280                82702.0  56930.0   
1     Test Campaign   1.08.2019          3008                39550.0  35820.0   
2     Test Campaign  10.08.2019          2790                95054.0  79632.0   
3  Control Campaign  10.08.2019          2149               117624.0  91257.0   
4     Test Campaign  11.08.2019          2420                83633.0  71286.0   

   Website Clicks  Searches Received  Content Viewed  Added to Cart  Purchases  
0          7016.0             2290.0          2159.0         1819.0      618.0  
1          3038.0             1946.0          1069.0          894.0      255.0  
2          8125.0             2312.0          1804.0          424.0      275.0  
3          2277.0             2475.0          1984.0         1629.0      734.0  
4          3750.0             2893.0          2617.0         1075.0      668.0  
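Note that `Date` is still a string column at this point, so `sort_values` orders it lexicographically. That is why `10.08.2019` appears directly after `1.08.2019` in the output above. If chronological order is wanted, one option is to parse the column before sorting. Below is a minimal sketch on a toy frame (not the real dataset), assuming the day.month.year format seen in the data:

```python
import pandas as pd

# Toy stand-in for the Date column; only the format mirrors the dataset.
toy = pd.DataFrame({"Date": ["1.08.2019", "10.08.2019", "2.08.2019"]})

# String sort: "10.08.2019" lands before "2.08.2019".
print(toy.sort_values("Date")["Date"].tolist())

# Parse as day.month.year, then sort chronologically.
toy["Date"] = pd.to_datetime(toy["Date"], format="%d.%m.%Y")
chronological = toy.sort_values("Date").reset_index(drop=True)
print(chronological["Date"].dt.strftime("%d.%m.%Y").tolist())
```

The row order does not affect the charts that follow, so this only matters if the merged table is read or plotted over time.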



Let us check whether the merged table contains the same number of rows from each campaign.

In [9]:
print(ab_data["Campaign Name"].value_counts())

Control Campaign    30
Test Campaign       30
Name: Campaign Name, dtype: int64


The dataset has 30 samples for each campaign. Now let’s start with A/B testing to find the best marketing strategy.

## A/B Testing

In [10]:
figure = px.scatter(data_frame = ab_data,
                    x="Number of Impressions",
                    y="Amount Spent",
                    size="Amount Spent",
                    color= "Campaign Name",
                    trendline="ols")
figure.show()

Relative to the amount spent, the control campaign generated more impressions than the test campaign. Now let’s have a look at the number of searches performed on the website from both campaigns:

In [17]:
def create_pie_plot(column):
    """Draw a pie chart comparing one metric's total across both campaigns.

    `column` is a column name shared by control_data and test_data.
    """
    labels = [f"{column} from Control Campaign",
              f"{column} from Test Campaign"]
    # Round the totals because the mean-filled control values are not integers.
    counts = [round(control_data[column].sum()),
              round(test_data[column].sum())]
    colors = ['gold', 'lightgreen']
    fig = go.Figure(data=[go.Pie(labels=labels, values=counts)])
    fig.update_layout(title_text=f'Control Vs Test: {column}')
    fig.update_traces(hoverinfo='label+percent', textinfo='value',
                      textfont_size=30,
                      marker=dict(colors=colors,
                                  line=dict(color='black', width=3)))
    fig.show()

In [20]:
create_pie_plot("Searches Received")

The test campaign resulted in more searches on the website. Now let’s have a look at the number of website clicks from both campaigns:

In [18]:
create_pie_plot("Website Clicks")

The test campaign wins in the number of website clicks. Now let’s have a look at the amount of content viewed after reaching the website from both campaigns:

In [19]:
create_pie_plot("Content Viewed")

The audience of the control campaign viewed slightly more content than the audience of the test campaign. The difference is small, but because the control campaign drew fewer website clicks, its visitors viewed more content per click, so its on-site engagement is higher.

Now let’s have a look at the number of products added to the cart from both campaigns:

In [21]:
create_pie_plot("Added to Cart")

Despite fewer website clicks, more products were added to the cart from the control campaign. Now let’s have a look at the amount spent on both campaigns:

In [22]:
create_pie_plot("Amount Spent")

The amount spent on the test campaign is higher than the amount spent on the control campaign. Since the control campaign still produced more content views and more products added to the cart, it is the more cost-efficient of the two.

Now let’s have a look at the purchases made by both campaigns:

In [23]:
create_pie_plot("Purchases")

There’s only a difference of around 1% in the purchases made from the two ad campaigns. As the control campaign produced more sales with less money spent on marketing, the control campaign wins here!
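The comparison here rests on spend per sale rather than on raw purchase counts. Below is a minimal sketch of that calculation, where `cost_per_purchase` is a hypothetical helper and the toy values are made up for illustration (they are not taken from the dataset):

```python
import pandas as pd

def cost_per_purchase(frame: pd.DataFrame) -> float:
    """Total spend divided by total purchases for one campaign."""
    return frame["Amount Spent"].sum() / frame["Purchases"].sum()

# Toy values for illustration only.
toy_control = pd.DataFrame({"Amount Spent": [100, 200], "Purchases": [10, 20]})
toy_test = pd.DataFrame({"Amount Spent": [200, 250], "Purchases": [10, 30]})

control_cost = cost_per_purchase(toy_control)  # 300 / 30
test_cost = cost_per_purchase(toy_test)        # 450 / 40
print(control_cost, test_cost)
```

With the tables from this notebook, the same helper could be called as `cost_per_purchase(control_data)` and `cost_per_purchase(test_data)`; a lower value means each sale cost less to obtain.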

Now let’s analyze some metrics to find which ad campaign converts more. I will first look at the relationship between the number of website clicks and content viewed from both campaigns:

In [None]:
figure = px.scatter(data_frame = ab_data,
                    x="Content Viewed",
                    y="Website Clicks",
                    size="Website Clicks",
                    color= "Campaign Name",
                    trendline="ols")
figure.show()

The website clicks are higher in the test campaign, but the engagement from website clicks is higher in the control campaign. So the control campaign wins!

Now I will analyze the relationship between the amount of content viewed and the number of products added to the cart from both campaigns:

In [24]:
figure = px.scatter(data_frame = ab_data,
                    x="Added to Cart",
                    y="Content Viewed",
                    size="Added to Cart",
                    color= "Campaign Name",
                    trendline="ols")
figure.show()

Again, the control campaign wins! Now let’s have a look at the relationship between the number of products added to the cart and the number of sales from both campaigns:

In [25]:
figure = px.scatter(data_frame = ab_data,
                    x="Purchases",
                    y="Added to Cart",
                    size="Purchases",
                    color= "Campaign Name",
                    trendline="ols")
figure.show()

Although the control campaign resulted in more sales and more products in the cart, the conversion rate of the test campaign is higher.
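The scatter plots show these relationships visually; the same step-to-step ratios can also be computed directly from column totals. Below is a minimal sketch, assuming the rate at each step is defined as the total of one stage divided by the total of the previous stage. `funnel_rates` is a hypothetical helper, and the toy numbers are invented for illustration:

```python
import pandas as pd

def funnel_rates(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-campaign step-to-step ratios computed from column totals."""
    totals = frame.groupby("Campaign Name")[
        ["Website Clicks", "Content Viewed", "Added to Cart", "Purchases"]
    ].sum()
    return pd.DataFrame({
        "Views per Click": totals["Content Viewed"] / totals["Website Clicks"],
        "Carts per View": totals["Added to Cart"] / totals["Content Viewed"],
        "Purchases per Cart": totals["Purchases"] / totals["Added to Cart"],
    })

# Toy values for illustration only; column names match ab_data.
toy = pd.DataFrame({
    "Campaign Name": ["Control Campaign", "Test Campaign"],
    "Website Clicks": [1000, 2000],
    "Content Viewed": [500, 800],
    "Added to Cart": [200, 200],
    "Purchases": [50, 100],
})
rates = funnel_rates(toy)
print(rates)
```

Calling `funnel_rates(ab_data)` on the merged table would give one row per campaign, which makes the per-step comparison explicit instead of reading it off the trendlines.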

## Conclusion

Based on the A/B tests above, the control campaign performed better in both sales and visitor engagement. Its products were viewed more often, more of them were added to the cart, and more of them were sold. However, the test campaign converted a higher share of the products in the cart into purchases.

While the test campaign generated more sales relative to the products viewed and added to the cart, the overall sales figure favored the control campaign. This suggests a strategy: the test campaign can be used to market a specific product to a particular audience, while the control campaign can be used to market multiple products to a broader audience. Which one to choose depends on the campaign objective and the target audience.