
In [1]:
import pandas as pd
from scipy.stats import ttest_ind

df = pd.read_csv("/content/website_ab_test.csv")
print(df.head())

         Theme  Click Through Rate  Conversion Rate  Bounce Rate  \
0  Light Theme            0.054920         0.282367     0.405085   
1  Light Theme            0.113932         0.032973     0.732759   
2   Dark Theme            0.323352         0.178763     0.296543   
3  Light Theme            0.485836         0.325225     0.245001   
4  Light Theme            0.034783         0.196766     0.765100   

   Scroll_Depth  Age   Location  Session_Duration Purchases Added_to_Cart  
0     72.489458   25    Chennai              1535        No           Yes  
1     61.858568   19       Pune               303        No           Yes  
2     45.737376   47    Chennai               563       Yes           Yes  
3     76.305298   58       Pune               385       Yes            No  
4     48.927407   25  New Delhi              1437        No            No  


In [8]:
df.shape

(1000, 10)

In [7]:
summary = {
    'Number of Records' : df.shape[0],
    'Number of Columns' : df.shape[1],
    'Missing Values' : df.isnull().sum(),
    'Numerical Columns Summary' : df.describe()
}
summary

{'Number of Records': 1000,
 'Number of Columns': 10,
 'Missing Values': Theme                 0
 Click Through Rate    0
 Conversion Rate       0
 Bounce Rate           0
 Scroll_Depth          0
 Age                   0
 Location              0
 Session_Duration      0
 Purchases             0
 Added_to_Cart         0
 dtype: int64,
 'Numerical Columns Summary':        Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  \
 count         1000.000000      1000.000000  1000.000000   1000.000000   
 mean             0.256048         0.253312     0.505758     50.319494   
 std              0.139265         0.139092     0.172195     16.895269   
 min              0.010767         0.010881     0.200720     20.011738   
 25%              0.140794         0.131564     0.353609     35.655167   
 50%              0.253715         0.252823     0.514049     51.130712   
 75%              0.370674         0.373040     0.648557     64.666258   
 max              0.499989         0.49891

In [25]:
unique_locations = df['Location'].unique()
unique_locations

array(['Chennai', 'Pune', 'New Delhi', 'Kolkata', 'Bangalore'],
      dtype=object)

In [36]:
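# encoding the Yes/No columns as 1/0
df['Purchases'] = df['Purchases'].map({'Yes': 1, 'No': 0})
df['Added_to_Cart'] = df['Added_to_Cart'].map({'Yes': 1, 'No': 0})
# keeping Theme plus the numeric metrics that are averaged per theme below
df1 = df.drop(columns = ['Location','Purchases','Added_to_Cart'])
df1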

           Theme  Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  Age  Session_Duration
0    Light Theme            0.054920         0.282367     0.405085     72.489458   25              1535
1    Light Theme            0.113932         0.032973     0.732759     61.858568   19               303
2     Dark Theme            0.323352         0.178763     0.296543     45.737376   47               563
3    Light Theme            0.485836         0.325225     0.245001     76.305298   58               385
4    Light Theme            0.034783         0.196766     0.765100     48.927407   25              1437
...          ...                 ...              ...          ...           ...  ...               ...
995   Dark Theme            0.282792         0.401605     0.200720     68.478822   25               321
996   Dark Theme            0.299917         0.026372     0.762641     73.019821   38              1635
997  Light Theme            0.370254         0.019838     0.607136     33.963298   32              1237
998  Light Theme            0.095815         0.137953     0.458898     37.429284   24               893


In [37]:
# grouping data by theme and calculating mean values for the metrics
theme_performance = df1.groupby('Theme').mean()
# sorting the data by conversion rate for a better comparison
theme_performance_sorted = theme_performance.sort_values(by='Conversion Rate', ascending = False)
print(theme_performance_sorted)

             Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  \
Theme                                                                         
Light Theme            0.247109         0.255459     0.499035     50.735232   
Dark Theme             0.264501         0.251282     0.512115     49.926404   

                   Age  Session_Duration  
Theme                                     
Light Theme  41.734568        930.833333  
Dark Theme   41.332685        919.482490  
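The table above reports only means. A two-sample t-test also depends on how spread out each metric is and on how many sessions each theme received, so it can help to look at all three per theme. The sketch below runs on its own because it uses a small synthetic `demo` frame of random values, not the real `website_ab_test.csv` data. On the notebook's data the equivalent call would be `df1.groupby('Theme').agg(['mean', 'std', 'count'])`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: random values, NOT the real A/B test data.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Theme": rng.choice(["Light Theme", "Dark Theme"], size=n),
    "Click Through Rate": rng.uniform(0.01, 0.5, size=n),
    "Conversion Rate": rng.uniform(0.01, 0.5, size=n),
})

# Mean, sample standard deviation (ddof=1) and group size for each metric and theme.
theme_stats = demo.groupby("Theme").agg(["mean", "std", "count"])
print(theme_stats)
```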


# Hypothesis Testing
We start by testing whether the Conversion Rate differs between the Light Theme and the Dark Theme. Our hypotheses are as follows:

**Null Hypothesis (H0):** There is no difference in mean Conversion Rate between the Light Theme and the Dark Theme.


**Alternative Hypothesis (Ha):** There is a difference in mean Conversion Rate between the Light Theme and the Dark Theme.


We’ll use a two-sample t-test (`ttest_ind`, which by default assumes equal variances in the two groups) to compare the means of the two independent samples. P-values below 0.05 are treated as statistically significant. Let’s proceed with the test:
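For reference, `ttest_ind` by default computes the Student (pooled-variance) statistic `t = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2))`. The pooled variance is `s_p**2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)`, and the test has `n1 + n2 - 2` degrees of freedom. The two-sided p-value is the probability, under H0, of observing a |t| at least this large. The sketch below computes both quantities by hand and checks them against `ttest_ind`. The arrays `a` and `b` are illustrative synthetic samples, not the notebook's data.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

# Illustrative synthetic samples (not the real conversion rates).
rng = np.random.default_rng(1)
a = rng.uniform(0.01, 0.5, size=120)
b = rng.uniform(0.01, 0.5, size=150)

n1, n2 = len(a), len(b)
# Pooled variance: weighted average of the two sample variances (ddof=1).
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
dof = n1 + n2 - 2
# Two-sided p-value from the t distribution's survival function.
p_manual = 2 * t_dist.sf(abs(t_manual), dof)

t_scipy, p_scipy = ttest_ind(a, b)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```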

In [38]:
# extracting conversion rates for both themes
conversion_rates_light = df[df['Theme'] == 'Light Theme']['Conversion Rate']
conversion_rates_dark = df[df['Theme'] == 'Dark Theme']['Conversion Rate']

# performing a two-sample t-test
t_stat, p_value = ttest_ind(conversion_rates_light, conversion_rates_dark)
t_stat, p_value

(0.4744928265361651, 0.6352523154387317)

Next, we’ll perform the same two-sample t-test on the Click Through Rate (CTR) for both themes:

In [39]:
# extracting click-through rates for both themes
ctr_light = df[df['Theme'] == 'Light Theme']['Click Through Rate']
ctr_dark = df[df['Theme'] == 'Dark Theme']['Click Through Rate']

# performing a two-sample t-test
t_stat_ctr, p_value_ctr = ttest_ind(ctr_light, ctr_dark)
t_stat_ctr, p_value_ctr

(-1.9767016530706143, 0.04835031140582486)
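Two details help when reading this output.

First, `ttest_ind(a, b)` bases its statistic on the mean of `a` minus the mean of `b`. Because `ctr_light` is passed first, the negative t here means the Light Theme has the lower mean CTR, which matches the group means (0.247 vs. 0.265).

Second, `ttest_ind` assumes equal variances unless `equal_var=False` is passed, which switches to Welch's t-test. The sketch below shows both variants on synthetic `light` and `dark` arrays, which are illustrative values and not the real CTR columns. Whether the CTR result stays below 0.05 under Welch's test would have to be checked on the actual data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for ctr_light and ctr_dark (not the real data).
rng = np.random.default_rng(2)
light = rng.normal(0.25, 0.14, size=400)
dark = rng.normal(0.27, 0.10, size=600)

# Student's t-test (default, equal_var=True) vs Welch's t-test (equal_var=False).
student = ttest_ind(light, dark)
welch = ttest_ind(light, dark, equal_var=False)
print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)

# Sign convention: statistic > 0 when the first sample has the higher mean.
print("light mean - dark mean:", light.mean() - dark.mean())
```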

In [40]:
# extracting bounce rates for both themes
bounce_rates_light = df[df['Theme'] == 'Light Theme']['Bounce Rate']
bounce_rates_dark = df[df['Theme'] == 'Dark Theme']['Bounce Rate']

# performing a two-sample t-test
t_stat_bounce, p_value_bounce = ttest_ind(bounce_rates_light, bounce_rates_dark)

In [42]:
# extracting scroll depths for both themes
scroll_depths_light = df[df['Theme'] == 'Light Theme']['Scroll_Depth']
scroll_depths_dark = df[df['Theme'] == 'Dark Theme']['Scroll_Depth']

# performing a two-sample t-test for scroll depth
t_stat_scroll, p_value_scroll = ttest_ind(scroll_depths_light, scroll_depths_dark)
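The four test cells above repeat the same extract-and-test pattern. One way to avoid that repetition is a small helper that loops over the metric columns and collects the results into a table. `compare_themes` is a hypothetical helper name that is not part of the original notebook, and the `demo` frame is synthetic; treat this as a sketch rather than the author's method.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def compare_themes(data, metrics, group_col="Theme",
                   first="Light Theme", second="Dark Theme"):
    """Hypothetical helper: two-sample t-test of `first` vs `second` for each metric."""
    rows = []
    for metric in metrics:
        a = data.loc[data[group_col] == first, metric]
        b = data.loc[data[group_col] == second, metric]
        result = ttest_ind(a, b)  # Student's t-test, as in the cells above
        rows.append({"Metric": metric,
                     "T-Statistic": result.statistic,
                     "P-Value": result.pvalue})
    return pd.DataFrame(rows)


# Synthetic demo frame (random values, not the real data): 100 sessions per theme.
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "Theme": np.repeat(["Light Theme", "Dark Theme"], 100),
    "Conversion Rate": rng.uniform(0.01, 0.5, size=200),
    "Bounce Rate": rng.uniform(0.2, 0.8, size=200),
})
table = compare_themes(demo, ["Conversion Rate", "Bounce Rate"])
print(table)
```

On the notebook's data, `compare_themes(df, ['Conversion Rate', 'Click Through Rate', 'Bounce Rate', 'Scroll_Depth'])` should give the same statistics as the comparison table. The one difference is the label: `Scroll_Depth` keeps its column name as the metric label.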

In [45]:
# Creating a table for comparison
comparison_table = pd.DataFrame({
    'Metric': ['Conversion Rate', 'Click Through Rate', 'Bounce Rate', 'Scroll Depth'],
    'T-Statistic': [t_stat, t_stat_ctr, t_stat_bounce, t_stat_scroll],
    'P-Value': [p_value, p_value_ctr, p_value_bounce, p_value_scroll]
})

comparison_table

               Metric  T-Statistic   P-Value
0     Conversion Rate     0.474493  0.635252
1  Click Through Rate    -1.976702  0.048350
2         Bounce Rate    -1.200821  0.230106
3        Scroll Depth     0.756480  0.449540
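Four metrics were tested, which raises the chance that at least one p-value falls below 0.05 by chance alone. A simple, conservative way to account for this is the Bonferroni correction: multiply each p-value by the number of tests (capping the result at 1) and compare it with 0.05 as before. The sketch below applies the correction to the p-values reported in the table above. `reported` is a hypothetical copy of those values, so the block runs without the data file. Under this correction the Click Through Rate p-value becomes about 0.19 (0.04835 × 4), which is no longer below 0.05.

```python
import pandas as pd

# Hypothetical copy of the p-values reported in the comparison table.
reported = pd.DataFrame({
    "Metric": ["Conversion Rate", "Click Through Rate", "Bounce Rate", "Scroll Depth"],
    "P-Value": [0.635252, 0.04835, 0.230106, 0.44954],
})

alpha = 0.05
m = len(reported)  # number of tests run
# Bonferroni: multiply each p-value by the number of tests, capped at 1.
reported["Bonferroni P"] = (reported["P-Value"] * m).clip(upper=1.0)
reported["Significant (raw)"] = reported["P-Value"] < alpha
reported["Significant (Bonferroni)"] = reported["Bonferroni P"] < alpha
print(reported)
```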


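* Click Through Rate: The difference is statistically significant at the 0.05 level (P-Value = 0.048). The Dark Theme shows the higher mean CTR (0.265 vs. 0.247 for the Light Theme). Because the p-value is only just below the threshold, this result is borderline.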


* Conversion Rate: No statistically significant difference was found (P-Value = 0.635).


* Bounce Rate: There’s no statistically significant difference in Bounce Rates between the themes (P-Value = 0.230).


* Scroll Depth: Similarly, no statistically significant difference is observed in Scroll Depths (P-Value = 0.450).


In summary, the two themes perform similarly on most metrics, and the Dark Theme has a slight edge only in getting users to click through. For Conversion Rate, Bounce Rate, and Scroll Depth, the tests detect no statistically significant difference between the Light Theme and the Dark Theme in the data provided. This does not show that the themes affect user behaviour identically. It only shows that any difference is too small to detect with this sample.
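The "slight edge" in CTR can be quantified with an effect size. Cohen's d divides the difference in means by the pooled standard deviation. For Student's t-test it can also be recovered from the t statistic as `d = t * sqrt(1/n1 + 1/n2)`. By Cohen's common rule of thumb, |d| around 0.2 counts as a small effect. The notebook output does not show the number of sessions per theme, so the sketch below illustrates the calculation on synthetic arrays rather than computing the real value. `cohens_d` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import ttest_ind


def cohens_d(a, b):
    """Hypothetical helper: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


# Synthetic samples standing in for the two themes (not the real CTR data).
rng = np.random.default_rng(4)
a = rng.normal(0.25, 0.14, size=450)
b = rng.normal(0.26, 0.14, size=550)

d = cohens_d(a, b)
# Same quantity recovered from Student's t statistic and the group sizes.
t_value = ttest_ind(a, b).statistic
d_from_t = t_value * np.sqrt(1 / len(a) + 1 / len(b))
print(d, d_from_t)
```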
