# Understanding Hypothesis Testing - Python example

by Alfonso Cervantes Barragán  
Mathematician & Data Scientist,  
PhD Candidate in Natural Sciences



### Introduction
Hypothesis testing is a cornerstone of statistical analysis, enabling researchers, analysts, and decision-makers to draw meaningful conclusions from sample data. This powerful technique is widely used across diverse fields, from medicine and psychology to business and public policy. However, the technical jargon and mathematical complexity often make hypothesis testing seem daunting, especially for those without a strong statistical background.

This article aims to demystify the concept of hypothesis testing, breaking down the key principles and steps in a clear and accessible manner. By the end, you'll have a solid understanding of how hypothesis testing works, its practical applications, and the steps involved in conducting a hypothesis test. Whether you're a student, a professional, or simply someone with a curious mind, this guide will equip you with the knowledge and confidence to navigate the world of data-driven insights and make more informed decisions based on rigorous statistical analysis.


### What is a Hypothesis?
A hypothesis is a statement that can be tested. It’s an assumption or claim about a population parameter (like a mean or proportion). 



### Business Analytics Example
In business analytics, hypothesis testing can help answer questions such as:

**Marketing Campaigns**: Companies can use hypothesis testing to determine if a new marketing campaign had a significant effect on sales. The null hypothesis would be that the campaign had no impact, while the alternative is that it increased sales. By collecting sales data before and after the campaign and performing a statistical test like a t-test, the company can assess whether the difference in mean sales is statistically significant, allowing them to conclude if the campaign had a meaningful positive impact.

**Customer Satisfaction**: When launching a new product, businesses can use hypothesis testing to understand if the average customer satisfaction rating is different from their previous offerings. The null hypothesis would be that the mean satisfaction is the same as past products, while the alternative is that it is higher or lower. The company can survey customers, calculate the mean satisfaction score, and then use a one-sample t-test to compare this to the historical average, enabling them to determine if the new product has a significantly different customer satisfaction rating.

**Operational Efficiency**: Hypothesis testing can also help companies evaluate the impact of process improvements on operational efficiency. If a manufacturing company implements a new production process aimed at increasing efficiency, they can test the null hypothesis that the mean production time per unit is unchanged against the alternative that the mean time has decreased. By collecting production time data before and after the change and performing a two-sample t-test, the company can assess whether the difference in mean times is statistically significant, allowing them to conclude if the new process improved operational efficiency.
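As a rough illustration of the operational-efficiency scenario, the sketch below runs a two-sample t-test on made-up production times. The numbers are hypothetical and invented only for this example. The choice of Welch's version of the test (`equal_var=False`) and of a one-sided alternative (`alternative="greater"`, available in recent SciPy releases) is an assumption, not something the scenario dictates.

```python
from scipy import stats

# Hypothetical production times in minutes per unit (illustrative only)
times_before = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
times_after = [11.5, 11.2, 11.9, 11.4, 11.6, 11.3, 11.8, 11.1]

# H0: the mean time is unchanged
# H1: the mean time before the change is greater than the mean time after it
result = stats.ttest_ind(times_before, times_after,
                         equal_var=False, alternative="greater")
t_stat, p_value = result.statistic, result.pvalue

print("T-statistic:", t_stat)
print("One-sided p-value:", p_value)
```

A small p-value here would support the claim that production time decreased. With real data you would also check whether the two samples are independent or paired before choosing the test.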



### Types of Hypotheses

There are two types of hypotheses in hypothesis testing:
1. **Null Hypothesis (H0)**: This is the default assumption that there is no effect or no difference. It represents the status quo.
2. **Alternative Hypothesis (H1 or Ha)**: This is the claim you are looking for evidence to support. It states that there is an effect or a difference.


### Example to Illustrate
Imagine you run a coffee shop and claim that your average customer satisfaction rating is 4.5 out of 5. A competitor doubts this and wants to test your claim.

- **Null Hypothesis (H0)**: The average customer satisfaction rating is 4.5.
- **Alternative Hypothesis (H1)**: The average customer satisfaction rating is not 4.5.

#### Steps in Hypothesis Testing
1. **State the Hypotheses**: Clearly define your null and alternative hypotheses.

2. **Collect Data**: Gather a sample of data relevant to your hypothesis. For our example, you could collect satisfaction ratings from a sample of customers.

3. **Choose a Significance Level (α)**: This is the probability of rejecting the null hypothesis when it is actually true. Common values are 0.05 (5%) or 0.01 (1%).

4. **Calculate a Test Statistic**: This is a standardized value derived from your sample data, which is used to determine whether to reject the null hypothesis. Examples include the t-statistic, z-statistic, or chi-square statistic.

5. **Determine the P-Value**: The p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.

6. **Make a Decision**:
   - If the p-value is less than or equal to the significance level (α), reject the null hypothesis (Rumsey, 2003).
   - If the p-value is greater than α, do not reject the null hypothesis.
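The decision rule in the last step can be sketched as a small helper. The function name `decide` and its return strings are hypothetical, chosen only for this illustration.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Reject H0 when the p-value is less than or equal to alpha
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.03))  # p-value below alpha
print(decide(0.40))  # p-value above alpha
```

Note that "fail to reject" is not the same as accepting the null hypothesis. It only says the sample did not provide enough evidence against it.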



### Example: Customer Satisfaction

Let's say you own a coffee shop and you believe your average customer satisfaction rating is 4.5 out of 5. You decide to test this hypothesis by surveying 30 customers and collecting their satisfaction ratings.

1. **State the Hypotheses**:
   - Null Hypothesis (H0): The average customer satisfaction rating is 4.5.
   - Alternative Hypothesis (H1): The average customer satisfaction rating is not 4.5.   

2. **Collect Data**: After surveying 30 customers, you obtain the following ratings (out of 5): 4.6, 4.7, 4.5, 4.8, 4.2, 4.5, 4.7, 4.9, 4.3, 4.4, 4.6, 4.7, 4.8, 4.9, 4.2, 4.5, 4.3, 4.1, 4.6, 4.7, 4.8, 4.9, 4.2, 4.4, 4.5, 4.6, 4.7, 4.3, 4.2, 4.5.   


3. **Choose a Significance Level (α)**: You choose a significance level of 0.05 (5%).   


4. **Calculate a Test Statistic**: Using statistical software or formulas, you calculate the test statistic. In this case, let's use a one-sample t-test to determine if the mean rating is different from 4.5.   


### Understanding the T-Statistic
The t-statistic is a ratio that compares the difference between the sample mean and the hypothesized population mean relative to the variability of the sample data. It helps determine how far your sample mean is from the hypothesized mean, in units of standard error. A larger absolute value of the t-statistic indicates a larger difference between the sample mean and the hypothesized mean.

The formula for the t-statistic in a one-sample t-test is:

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

where:
- $\bar{x}$ is the sample mean.
- $\mu_0$ is the hypothesized population mean (4.5 in this example).
- $s$ is the sample standard deviation.
- $n$ is the sample size.
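The formula above translates almost directly into code. The following is a minimal sketch using only the Python standard library. The function name `one_sample_t_statistic` and its parameter names are hypothetical, and the data are the 30 coffee shop ratings from this article.

```python
import math
import statistics


def one_sample_t_statistic(sample, hypothesized_mean):
    """Compute t = (x_bar - hypothesized_mean) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    return (x_bar - hypothesized_mean) / standard_error


ratings = [4.6, 4.7, 4.5, 4.8, 4.2, 4.5, 4.7, 4.9, 4.3, 4.4,
           4.6, 4.7, 4.8, 4.9, 4.2, 4.5, 4.3, 4.1, 4.6, 4.7,
           4.8, 4.9, 4.2, 4.4, 4.5, 4.6, 4.7, 4.3, 4.2, 4.5]

t = one_sample_t_statistic(ratings, 4.5)
print("T-statistic:", t)
```

Writing the calculation out by hand like this is mainly useful for understanding. In practice a library routine does the same arithmetic and also returns the p-value.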

#### Step-by-Step Calculation
1. **Calculate the sample mean $\bar{x}$**:
   $$
   \bar{x} = \frac{\sum{x_i}}{n}
   $$
   For the given ratings:
   $$
   \bar{x} = \frac{4.6 + 4.7 + 4.5 + \cdots + 4.5}{30} = \frac{136.1}{30} \approx 4.5367
   $$

2. **Calculate the sample standard deviation $s$**:
   $$
   s = \sqrt{\frac{\sum{(x_i - \bar{x})^2}}{n-1}}
   $$
   For the given ratings:
   $$
   s \approx 0.2356
   $$

3. **Calculate the t-statistic**:
   $$
   t = \frac{4.5367 - 4.5}{0.2356 / \sqrt{30}} \approx 0.85
   $$

4. **Interpretation**:

The t-statistic value indicates how far the sample mean is from the hypothesized population mean in units of the standard error. In this example, a t-statistic of about 0.85 means that the sample mean is 0.85 standard errors above the hypothesized mean of 4.5.

A t-statistic closer to 0 suggests that the sample mean is close to the hypothesized mean, while a larger absolute value indicates a greater difference. Whether this difference is statistically significant depends on the corresponding p-value.
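One complementary way to judge whether a t-statistic is "large" is to compare its absolute value with a critical value from the t-distribution. This is equivalent to comparing the p-value with α. The sketch below assumes a two-tailed test at α = 0.05 and recomputes the t-statistic from the coffee shop ratings. The variable names are illustrative.

```python
from scipy import stats

ratings = [4.6, 4.7, 4.5, 4.8, 4.2, 4.5, 4.7, 4.9, 4.3, 4.4,
           4.6, 4.7, 4.8, 4.9, 4.2, 4.5, 4.3, 4.1, 4.6, 4.7,
           4.8, 4.9, 4.2, 4.4, 4.5, 4.6, 4.7, 4.3, 4.2, 4.5]
alpha = 0.05
n = len(ratings)

# Observed t-statistic for H0: mean = 4.5
t_observed, _ = stats.ttest_1samp(ratings, 4.5)

# Two-tailed critical value: the point that leaves alpha / 2 in the upper tail
t_critical = stats.t.ppf(1 - alpha / 2, df=n - 1)
reject_h0 = abs(t_observed) > t_critical

print("Observed t:", t_observed)
print("Critical value:", t_critical)
print("Reject H0:", reject_h0)
```

For 29 degrees of freedom the critical value is a little above 2, so a t-statistic has to be roughly two standard errors away from zero before the difference counts as significant at the 5% level.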


### Python Example
Here's a simple example in Python to illustrate hypothesis testing. We'll use a one-sample t-test to test whether the average customer satisfaction rating is different from 4.5.

The same test applies in other settings. For instance, imagine you are a teacher and you believe that your class's average score on a math test is 75. To test this, you randomly select 10 students and find their average score to be 78 with a standard deviation of 4. You would then use a one-sample t-test to see whether the average score of these 10 students is significantly different from 75.
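For the classroom scenario only summary statistics are given, so the test has to be computed from the formula instead of from raw scores. The sketch below assumes that the standard deviation of 4 is the sample standard deviation and that the test is two-tailed. The variable names are illustrative.

```python
import math
from scipy import stats

# Summary statistics from the classroom scenario
n = 10            # number of students sampled
sample_mean = 78  # average score in the sample
sample_sd = 4     # standard deviation (assumed to be the sample value)
mu_0 = 75         # hypothesized class average

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

# Two-tailed p-value from the t-distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print("T-statistic:", t_stat)
print("P-value:", p_value)
```

The t-statistic is about 2.37, which exceeds the two-tailed critical value of roughly 2.26 for 9 degrees of freedom, so at α = 0.05 the null hypothesis of a class average of 75 would be rejected. The step-by-step walkthrough that follows returns to the coffee shop ratings.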


1. **Import Libraries**: 

NumPy is a fundamental library for scientific computing in Python. It provides multi-dimensional arrays and matrices, along with a large collection of mathematical functions that operate on them efficiently through vectorized operations.

SciPy builds on NumPy and provides numerical routines for scientific and technical computing, including integration, interpolation, optimization, linear algebra, and statistics. Its `stats` module contains a large number of probability distributions and statistical functions, including the statistical tests used here.

In [18]:
import numpy as np
from scipy import stats

2. **Sample Data**:
  

This list contains the customer satisfaction ratings collected from 30 customers.

In [19]:
ratings = [4.6, 4.7, 4.5, 4.8, 4.2, 4.5, 4.7, 4.9, 4.3, 4.4,
           4.6, 4.7, 4.8, 4.9, 4.2, 4.5, 4.3, 4.1, 4.6, 4.7,
           4.8, 4.9, 4.2, 4.4, 4.5, 4.6, 4.7, 4.3, 4.2, 4.5]

3. **Hypotheses**:
   - Null hypothesis: The mean rating is 4.5.
   - Alternative hypothesis: The mean rating is not 4.5.


4. **Calculate Sample Mean and Standard Deviation**:


In [20]:
mean_rating = np.mean(ratings)
std_dev = np.std(ratings, ddof=1)  # Use ddof=1 for sample standard deviation
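The `ddof=1` argument matters because `np.std` divides by $n$ by default, which gives the population standard deviation, whereas the t-test formula uses the sample standard deviation with $n-1$ in the denominator. A short sketch of the difference, using the same ratings:

```python
import numpy as np

ratings = [4.6, 4.7, 4.5, 4.8, 4.2, 4.5, 4.7, 4.9, 4.3, 4.4,
           4.6, 4.7, 4.8, 4.9, 4.2, 4.5, 4.3, 4.1, 4.6, 4.7,
           4.8, 4.9, 4.2, 4.4, 4.5, 4.6, 4.7, 4.3, 4.2, 4.5]
n = len(ratings)

population_sd = np.std(ratings)      # ddof=0 (default): divides by n
sample_sd = np.std(ratings, ddof=1)  # ddof=1: divides by n - 1

# The two estimates differ by the factor sqrt(n / (n - 1))
matches = np.isclose(sample_sd, population_sd * np.sqrt(n / (n - 1)))

print("Population SD:", population_sd)
print("Sample SD:", sample_sd)
print("Related by sqrt(n / (n - 1)):", matches)
```

With 30 observations the gap is small, but it grows as the sample gets smaller, so it is worth being explicit about which version you use.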

5. **Perform the t-test**:

The `stats.ttest_1samp` function from the SciPy `stats` module performs a one-sample t-test.

The formula for the one-sample t-test is:

$$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}$$

Where:
- $\bar{x}$ is the sample mean
- $\mu_0$ is the hypothesized population mean
- $s$ is the sample standard deviation
- $n$ is the sample size

The t-statistic calculated by this formula follows a t-distribution with $n-1$ degrees of freedom.
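Because the statistic follows a t-distribution with $n-1$ degrees of freedom under the null hypothesis, the two-tailed p-value can be read from that distribution's tails. The sketch below computes it by hand with `stats.t.sf` (the survival function, that is, the upper-tail probability) and cross-checks it against `stats.ttest_1samp`. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

ratings = [4.6, 4.7, 4.5, 4.8, 4.2, 4.5, 4.7, 4.9, 4.3, 4.4,
           4.6, 4.7, 4.8, 4.9, 4.2, 4.5, 4.3, 4.1, 4.6, 4.7,
           4.8, 4.9, 4.2, 4.4, 4.5, 4.6, 4.7, 4.3, 4.2, 4.5]
n = len(ratings)

# t-statistic from the formula
t_stat = (np.mean(ratings) - 4.5) / (np.std(ratings, ddof=1) / np.sqrt(n))

# Probability of a value at least as extreme as |t| in either tail
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Cross-check against SciPy's built-in test
_, p_scipy = stats.ttest_1samp(ratings, 4.5)

print("Manual p-value:", p_manual)
print("SciPy p-value:", p_scipy)
```

The factor of 2 reflects the two-tailed alternative: deviations above and below the hypothesized mean both count as evidence against the null hypothesis.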

The `stats.ttest_1samp` function takes, among others, the following arguments:

- `a`: the sample data (array-like)
- `popmean`: the hypothesized population mean
- `axis`: the axis along which to compute the test (default is 0; for a 1D sample like ours, this means the whole sample)
- `nan_policy`: how to handle NaN values in the input (default is 'propagate', which returns NaN when the input contains NaN; 'omit' ignores NaN values and 'raise' throws an error)

The function returns a result that can be unpacked into two values:

1. The t-statistic
2. The two-tailed p-value
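A brief sketch of how these pieces behave in practice. The result can be unpacked as a tuple or read through its `statistic` and `pvalue` attributes, and `nan_policy` changes what happens when a value is missing. The five ratings are the first five from the survey. The NaN entry is a hypothetical missing response added only for this illustration.

```python
import math
from scipy import stats

ratings = [4.6, 4.7, 4.5, 4.8, 4.2]      # first five ratings from the survey
with_missing = ratings + [float("nan")]  # hypothetical missing response

# The result exposes the same two values by attribute name
result = stats.ttest_1samp(ratings, popmean=4.5)
print("Statistic:", result.statistic, "P-value:", result.pvalue)

# Default 'propagate': a NaN in the input yields NaN in the output
propagated = stats.ttest_1samp(with_missing, popmean=4.5)
print("P-value is NaN:", math.isnan(float(propagated.pvalue)))

# 'omit': NaN values are dropped before the test is computed
omitted = stats.ttest_1samp(with_missing, popmean=4.5, nan_policy="omit")
print("P-value with NaN omitted:", float(omitted.pvalue))
```

Choosing 'omit' is convenient, but it silently reduces the sample size, so it is worth checking how many values were dropped.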



In [28]:
t_statistic, p_value = stats.ttest_1samp(ratings, 4.5)

6. **Significance Level**:
  


In [29]:
alpha = 0.05

 We set the significance level (α) to 0.05.
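To see what α means in practice, the following simulation sketch repeatedly draws samples for which the null hypothesis is true by construction and counts how often the test rejects it anyway. The normal distribution, the standard deviation of 0.25, the number of trials, and the random seed are all assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
alpha = 0.05
n_trials = 2000
sample_size = 30

# Each row is one simulated survey whose true mean really is 4.5
samples = rng.normal(loc=4.5, scale=0.25, size=(n_trials, sample_size))

# One t-test per row (axis=1), all against the true mean
_, p_values = stats.ttest_1samp(samples, 4.5, axis=1)

# Fraction of simulated surveys in which a true H0 is rejected
false_positive_rate = np.mean(p_values <= alpha)
print("False positive rate:", false_positive_rate)
```

The observed rate should land close to 0.05, which is the sense in which α is the probability of rejecting the null hypothesis when it is actually true.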

7. **Print the Results**:
 


In [30]:
print("Sample Mean:", mean_rating)
print("Standard Deviation:", std_dev)
print("T-statistic:", t_statistic)
print("P-value:", p_value)

Sample Mean: 4.536666666666666
Standard Deviation: 0.23559657706859155
T-statistic: 0.8524385494789603
P-value: 0.4009561791598829


**Observations** 

The sample mean of about 4.54 is not statistically different from the hypothesized mean of 4.5: the t-statistic is 0.85 and the two-tailed p-value is 0.40, well above the significance level of 0.05. If the true mean were 4.5, a difference at least this large would arise from random sampling alone about 40% of the time, given the sample standard deviation of about 0.24 and the sample size of 30. There is therefore not enough statistical evidence to conclude that the population mean differs from 4.5. This does not prove that the mean is exactly 4.5; it only means the data are consistent with that value.


8. **Conclusion**:
  

If the p-value is less than or equal to the significance level (0.05), we reject the null hypothesis, indicating that the average rating is different from 4.5. Otherwise, we do not reject the null hypothesis.

In [32]:
if p_value <= alpha:
    print("Reject the null hypothesis: The average rating is different from 4.5.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence to say the average rating is different from 4.5.")

Fail to reject the null hypothesis: There is not enough evidence to say the average rating is different from 4.5.


### Glossary
- **Null Hypothesis (H0)**: The default assumption that there is no effect or difference.
- **Alternative Hypothesis (H1)**: The hypothesis that there is an effect or difference.
- **Significance Level (α)**: The threshold for rejecting the null hypothesis.
- **P-Value**: The probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true.
- **Test Statistic**: A standardized value used in hypothesis testing to determine whether to reject the null hypothesis.
- **T-Statistic**: A ratio that compares the difference between the sample mean and the hypothesized population mean relative to the variability of the sample data.


### References

The following references provide additional details and examples on hypothesis testing in business analytics:

- Devore, J. L. (2011). Probability and Statistics for Engineering and the Sciences. Boston, MA: Cengage Learning (pp. 508-510). ISBN 978-0-538-73352-6.


Additionally, the following online resources provide practical guidance and case studies on using hypothesis testing in business analytics:

[1] https://u-next.com/blogs/business-analytics/hypothesis-testing-in-business-analytics-a-beginners-guide/  
[2] https://www.youtube.com/watch?v=7xM6RQ2RRV8  
[3] https://bookdown.org/cuborican/RE_STAT/hypothesis-testing.html  
[4] https://online.hbs.edu/blog/post/hypothesis-testing  
[5] https://openstax.org/books/introductory-business-statistics-2e/pages/9-4-full-hypothesis-test-examples  

