
# *Importing the Python Libraries and Dataset*

In [2]:
import pandas as pd
from scipy.stats import ttest_ind

df = pd.read_csv('/content/website_ab_test.csv')

print(df.head())

         Theme  Click Through Rate  Conversion Rate  Bounce Rate  \
0  Light Theme            0.054920         0.282367     0.405085   
1  Light Theme            0.113932         0.032973     0.732759   
2   Dark Theme            0.323352         0.178763     0.296543   
3  Light Theme            0.485836         0.325225     0.245001   
4  Light Theme            0.034783         0.196766     0.765100   

   Scroll_Depth  Age   Location  Session_Duration Purchases Added_to_Cart  
0     72.489458   25    Chennai              1535        No           Yes  
1     61.858568   19       Pune               303        No           Yes  
2     45.737376   47    Chennai               563       Yes           Yes  
3     76.305298   58       Pune               385       Yes            No  
4     48.927407   25  New Delhi              1437        No            No  


# *Dataset Summary*

In [3]:
summary = {
    'Number of Records': df.shape[0],
    'Number of Columns': df.shape[1],
    'Missing Values': df.isnull().sum(),
    'Numerical Columns Summary': df.describe()
}

summary

{'Number of Records': 1000,
 'Number of Columns': 10,
 'Missing Values': Theme                 0
 Click Through Rate    0
 Conversion Rate       0
 Bounce Rate           0
 Scroll_Depth          0
 Age                   0
 Location              0
 Session_Duration      0
 Purchases             0
 Added_to_Cart         0
 dtype: int64,
 'Numerical Columns Summary':        Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  \
 count         1000.000000      1000.000000  1000.000000   1000.000000   
 mean             0.256048         0.253312     0.505758     50.319494   
 std              0.139265         0.139092     0.172195     16.895269   
 min              0.010767         0.010881     0.200720     20.011738   
 25%              0.140794         0.131564     0.353609     35.655167   
 50%              0.253715         0.252823     0.514049     51.130712   
 75%              0.370674         0.373040     0.648557     64.666258   
 max              0.499989         0.49891

In [4]:
# grouping data by theme and calculating mean values for the numeric metrics
# (numeric_only=True skips text columns such as Location and Purchases)
theme_performance = df.groupby('Theme').mean(numeric_only=True)

# sorting the data by conversion rate for a better comparison
theme_performance_sorted = theme_performance.sort_values(by='Conversion Rate', ascending=False)

print(theme_performance_sorted)

             Click Through Rate  Conversion Rate  Bounce Rate  Scroll_Depth  \
Theme                                                                         
Light Theme            0.247109         0.255459     0.499035     50.735232   
Dark Theme             0.264501         0.251282     0.512115     49.926404   

                   Age  Session_Duration  
Theme                                     
Light Theme  41.734568        930.833333  
Dark Theme   41.332685        919.482490  




# *Getting Started with Hypothesis Testing*

1. Setting the stage for hypothesis testing: We'll set a significance level (alpha) of 0.05. This means we'll consider a result statistically significant if the p-value from our test is less than 0.05.

2. Formulating the hypotheses: We'll compare the Conversion Rate between the Light Theme and the Dark Theme.

- Null Hypothesis (H0): There's no difference in mean Conversion Rate between the Light and Dark Themes.

- Alternative Hypothesis (Ha): There's a difference in mean Conversion Rate between the Light and Dark Themes.

3. Choosing the statistical test: We'll use a two-sample t-test to compare the means of the two independent samples (Light Theme conversion rates and Dark Theme conversion rates). The code passes `equal_var=False`, which selects Welch's version of the test and does not assume equal variances in the two groups.
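
To make the three steps above concrete, here is a minimal sketch of the same procedure on synthetic data. The arrays `group_a` and `group_b` are hypothetical stand-ins generated at random, not the values from `website_ab_test.csv`. The sketch also recomputes the t-statistic by hand to show what `equal_var=False` (Welch's t-test) calculates.

```python
import numpy as np
from scipy.stats import ttest_ind

ALPHA = 0.05  # significance level chosen in step 1

# Hypothetical stand-ins for the two groups (random data, not the real CSV).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.25, scale=0.14, size=500)
group_b = rng.normal(loc=0.25, scale=0.14, size=500)

# equal_var=False requests Welch's t-test (no equal-variance assumption).
t_stat_demo, p_value_demo = ttest_ind(group_a, group_b, equal_var=False)

# Welch's t-statistic by hand: difference in sample means divided by the
# standard error built from each group's own sample variance.
standard_error = np.sqrt(
    group_a.var(ddof=1) / group_a.size + group_b.var(ddof=1) / group_b.size
)
t_manual = (group_a.mean() - group_b.mean()) / standard_error

# Decision rule from step 1: reject H0 only when the p-value is below alpha.
reject_null = p_value_demo < ALPHA

print(f"t (scipy) = {t_stat_demo:.4f}, t (manual) = {t_manual:.4f}")
print(f"p-value = {p_value_demo:.4f}, reject H0: {reject_null}")
```

The manual value should agree with SciPy's up to floating-point error; only the p-value needs the t-distribution, which is why the sketch leaves that part to `ttest_ind`.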

In [12]:
# extracting conversion rates for both themes
conversion_rates_light = df[df['Theme'] == 'Light Theme']['Conversion Rate']
conversion_rates_dark = df[df['Theme'] == 'Dark Theme']['Conversion Rate']

# performing a two-sample t-test
t_stat, p_value = ttest_ind(conversion_rates_light, conversion_rates_dark, equal_var=False)

t_stat, p_value


(0.4748494462782632, 0.6349982678451778)

# *No significant difference in conversion rates found*

The two-sample t-test resulted in a p-value of approximately 0.635. This value is considerably higher than our chosen significance level of 0.05. Because the p-value is greater than 0.05, we fail to reject the null hypothesis. In other words, based on this data, we don't have statistically significant evidence to suggest a difference in conversion rates between the light theme and dark theme.

# *Shifting focus to Click-Through Rate (CTR)*

While the conversion rate analysis didn't reveal a significant difference, let's now investigate Click-Through Rate (CTR). This metric tells us how often users who are shown a link or other clickable element actually click on it.

We'll maintain the same structure for our hypotheses:

- Null Hypothesis (H0): There's no difference in CTR between Light and Dark Theme.

- Alternative Hypothesis (Ha): There's a difference in CTR between Light and Dark Theme.

We'll employ a two-sample t-test again to compare the CTR means for the two themes.

In [13]:
# extracting click through rates for both themes
ctr_light = df[df['Theme'] == 'Light Theme']['Click Through Rate']
ctr_dark = df[df['Theme'] == 'Dark Theme']['Click Through Rate']

# performing a two-sample t-test
t_stat_ctr, p_value_ctr = ttest_ind(ctr_light, ctr_dark, equal_var=False)

t_stat_ctr, p_value_ctr


(-1.9781708664172253, 0.04818435371010704)

# *Click-Through Rate (CTR) reveals a potential difference*

The two-sample t-test for CTR between the Light and Dark Themes yielded a p-value of approximately 0.048, which is just below our significance level of 0.05.

Since the p-value is less than 0.05, we reject the null hypothesis: the difference in mean CTR between the themes is statistically significant at the 5% level, although only narrowly. The negative t-statistic reflects that the Light Theme's mean CTR (about 0.247) is lower than the Dark Theme's (about 0.265).
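
The p-value alone does not say which theme has the higher CTR or by how much. As a rough illustration, the sketch below reuses the two group means printed in the `theme_performance_sorted` output earlier (copied by hand, so the variable names here are assumptions) to show the direction and size of the gap.

```python
# Group means copied from the theme_performance_sorted output above.
mean_ctr_light = 0.247109
mean_ctr_dark = 0.264501

# ttest_ind(ctr_light, ctr_dark) takes the Light sample first, so the
# t-statistic has the same sign as (light mean - dark mean).
difference = mean_ctr_light - mean_ctr_dark

# Relative difference of the Dark Theme over the Light Theme (illustrative only).
relative_lift = (mean_ctr_dark - mean_ctr_light) / mean_ctr_light

print(f"Light - Dark mean CTR: {difference:.6f}")
print(f"Dark Theme relative lift: {relative_lift:.1%}")
```

A negative difference matches the negative t-statistic reported for CTR: the Dark Theme's mean is the larger of the two, by roughly 7% in relative terms.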


# *Bounce Rate and Scroll Depth*

So far, we've investigated conversion rate and click-through rate. Now, let's delve into two additional metrics crucial for website theme or design evaluation: bounce rate and scroll depth.

- Bounce Rate: This metric tells us the percentage of visitors who leave the site after viewing only one page. A high bounce rate can indicate that users aren't finding the information they need or that the design isn't engaging.
- Scroll Depth: This metric measures how far down a webpage a user scrolls. A higher scroll depth suggests that users are interested in the content and potentially engaging further.

We'll conduct hypothesis tests for both bounce rate and scroll depth to gain a more comprehensive understanding of user behavior with the Light and Dark Themes. Following these tests, we'll create a summary table to consolidate all our findings.

In [15]:
# extracting bounce rates for both themes
bounce_rates_light = df[df['Theme'] == 'Light Theme']['Bounce Rate']
bounce_rates_dark = df[df['Theme'] == 'Dark Theme']['Bounce Rate']

# performing a two-sample t-test for bounce rate
t_stat_bounce, p_value_bounce = ttest_ind(bounce_rates_light, bounce_rates_dark, equal_var=False)

# extracting scroll depths for both themes
scroll_depth_light = df[df['Theme'] == 'Light Theme']['Scroll_Depth']
scroll_depth_dark = df[df['Theme'] == 'Dark Theme']['Scroll_Depth']

# performing a two-sample t-test for scroll depth
t_stat_scroll, p_value_scroll = ttest_ind(scroll_depth_light, scroll_depth_dark, equal_var=False)

# creating a table for comparison
comparison_table = pd.DataFrame({
    'Metric': ['Click Through Rate', 'Conversion Rate', 'Bounce Rate', 'Scroll Depth'],
    'T-Statistic': [t_stat_ctr, t_stat, t_stat_bounce, t_stat_scroll],
    'P-Value': [p_value_ctr, p_value, p_value_bounce, p_value_scroll]
})
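
The cell above repeats the same extract-and-test pattern for each metric. One possible way to avoid the repetition is a small helper that loops over metric names. The function `welch_by_theme` and the synthetic `demo` frame below are hypothetical and exist only for illustration; with the real data you would pass `df` and the four metric column names instead.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def welch_by_theme(data, metrics, group_col="Theme",
                   first="Light Theme", second="Dark Theme"):
    """Run Welch's t-test (first vs. second group) for each metric column."""
    rows = []
    for metric in metrics:
        sample_first = data.loc[data[group_col] == first, metric]
        sample_second = data.loc[data[group_col] == second, metric]
        t_stat, p_value = ttest_ind(sample_first, sample_second, equal_var=False)
        rows.append({"Metric": metric, "T-Statistic": t_stat, "P-Value": p_value})
    return pd.DataFrame(rows)


# Synthetic demo frame (random values, not the real dataset).
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Theme": rng.choice(["Light Theme", "Dark Theme"], size=200),
    "Click Through Rate": rng.uniform(0.01, 0.5, size=200),
    "Bounce Rate": rng.uniform(0.2, 0.8, size=200),
})

demo_table = welch_by_theme(demo, ["Click Through Rate", "Bounce Rate"])
print(demo_table)
```

Keeping the group order fixed (Light first, Dark second) matters because it determines the sign of every t-statistic in the resulting table.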



# *Table comparing the performance of the Light Theme and Dark Theme across various metrics based on hypothesis testing*

In [16]:
comparison_table

               Metric  T-Statistic   P-Value
0  Click Through Rate    -1.978171  0.048184
1     Conversion Rate     0.474849  0.634998
2         Bounce Rate    -1.201888  0.229692
3        Scroll Depth     0.756228  0.449692
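
To read the table against the chosen significance level at a glance, a boolean column can be added. The sketch below rebuilds the table from the rounded values shown above rather than from the original `comparison_table` object; the names `results` and `Significant at 0.05` are assumptions made for this example.

```python
import pandas as pd

ALPHA = 0.05  # significance level used throughout the analysis

# Values copied (rounded) from the comparison table above.
results = pd.DataFrame({
    "Metric": ["Click Through Rate", "Conversion Rate", "Bounce Rate", "Scroll Depth"],
    "T-Statistic": [-1.978171, 0.474849, -1.201888, 0.756228],
    "P-Value": [0.048184, 0.634998, 0.229692, 0.449692],
})

# Flag each metric whose p-value falls below the significance level.
results["Significant at 0.05"] = results["P-Value"] < ALPHA

print(results)
```

Only the Click Through Rate row is flagged, and its p-value sits very close to the threshold.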


# *Summary*

The analysis reveals a few key insights:

- Click-Through Rate (CTR): The Dark Theme has a higher mean CTR than the Light Theme (about 0.265 vs. 0.247). The difference is statistically significant at the 0.05 level (p ≈ 0.048), though only marginally.

- Conversion Rate, Bounce Rate, and Scroll Depth: These metrics show no statistically significant difference between Light and Dark Themes. This suggests user behavior is similar for both themes in terms of completing desired actions, leaving the webpage quickly, and scrolling through content.

Overall, while the Dark Theme appears to have a slight edge in CTR, user behavior seems comparable for other key website performance metrics.