# Hypothesis Testing Notebook

This notebook covers the ideas, methods, and an example application of hypothesis testing.

Hypothesis testing is a statistical method that uses a sample of data to judge whether an assumption (hypothesis) about a population is supported by the evidence, or whether the observed result could plausibly be due to random chance.

This method is used to test theories, avoid bias, and make inferences about a population.

## The basic steps involve:

    1. State your assumption where:
    
        a. Null Hypothesis (H0) is the default assumption -> Meaning no significant relationship, effect, or difference.
        
        b. Alternative Hypothesis (Ha) is the opposite of the null -> Suggests there is a relationship or effect.

        
    2. Collect and analyze data


    3. Perform a statistical test (t-test, z-test)
    

    4. Make a decision (compare the p-value to the significance level)
    
        a. Reject H0
        
        b. Fail to reject H0


    
## An Example:

    Imagine a pharmaceutical company is testing a new medication to see if it lowers blood pressure. 
    
        Null Hypothesis (H0): The new medication does not lower blood pressure. 
    
        Alternative Hypothesis (Ha): The new medication does lower blood pressure. 
    
    They would then conduct a study, give some people the new medication and others a placebo, and collect blood pressure data. If the statistical test shows a significant difference, they could conclude that the medication is effective for the broader population, not just the sample group.
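
The study described above compares two separate groups, so a two-sample test is one natural fit. The sketch below is an illustration only, not the company's actual analysis: the readings in `treated` and `placebo` are made-up numbers, and Welch's t-test (`stats.ttest_ind` with `equal_var=False`) with a one-sided alternative is an assumed choice that matches Ha ("the medication does lower blood pressure"). The `alternative` argument needs SciPy 1.6 or newer.

```python
import numpy as np
from scipy import stats

# Hypothetical systolic readings (mmHg) at the end of the trial; invented for illustration.
treated = np.array([118, 121, 115, 124, 119, 117, 122, 116])
placebo = np.array([126, 129, 123, 131, 125, 128, 124, 130])

# Welch's t-test does not assume the two groups share a variance.
# alternative="less" tests Ha: mean(treated) < mean(placebo).
t_ind, p_ind = stats.ttest_ind(treated, placebo, equal_var=False, alternative="less")

alpha_ind = 0.05
decision_ind = "Reject" if p_ind <= alpha_ind else "Fail to reject"
print(f"t = {t_ind:.3f}, one-sided p = {p_ind:.3g}")
print(f"Decision: {decision_ind} H0 at alpha = {alpha_ind}")
```

The rest of this notebook uses a paired test instead, which fits a design where the same subjects are measured twice.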


More info -> [GeeksforGeeks](https://www.geeksforgeeks.org/software-testing/understanding-hypothesis-testing/)



In [1]:
import numpy as np
from scipy import stats

First, import numpy and the stats module from scipy.

NumPy holds the data in arrays, and scipy.stats provides the ttest_rel() method used to run the paired t-test.

In [2]:
b = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
a = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

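Then, load some dummy data into two NumPy arrays, b and a, with ten values each. ttest_rel() treats the values at the same position in the two arrays as a matched pair, so the arrays must have the same length and order.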

In [3]:
alpha = 0.05

Set alpha -> the significance level. This is the threshold the p-value is compared against: a p-value at or below alpha is treated as statistically significant.
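
One way to read the significance level: when H0 is actually true, a test run at level alpha should reject in roughly an alpha fraction of repeated experiments. The sketch below checks this by simulation. The seed, the number of simulated experiments, and the normal noise model are assumptions chosen for illustration, and all `sim_*` names are hypothetical and not part of the original analysis.

```python
import numpy as np
from scipy import stats

sim_alpha = 0.05
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# 2000 simulated experiments, each with 10 paired differences drawn from a
# distribution whose true mean is 0, so H0 ("no difference") holds by construction.
sim_diffs = rng.normal(loc=0.0, scale=2.0, size=(2000, 10))

# One-sample t-test on each row of differences (equivalent to a paired t-test).
_, sim_p = stats.ttest_1samp(sim_diffs, popmean=0.0, axis=1)

# Fraction of experiments that wrongly reject H0; expected to land near sim_alpha.
false_positive_rate = np.mean(sim_p <= sim_alpha)
print("False positive rate:", false_positive_rate)
```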

In [4]:
t_stat, p_val = stats.ttest_rel(a, b)

Run the ttest_rel method, passing in the arrays. It returns the t-statistic and the p-value (two-sided by default), unpacked here into t_stat and p_val.
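
A paired t-test is a one-sample t-test on the pairwise differences, which the sketch below confirms with the same arrays. It also shows a one-sided variant via `alternative="less"` (SciPy 1.6 or newer), under the assumption that the question of interest is whether the values in a are lower than those in b; the notebook itself uses the default two-sided test.

```python
import numpy as np
from scipy import stats

b = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
a = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Paired test on the two arrays vs. one-sample test on their differences.
t_rel, p_rel = stats.ttest_rel(a, b)
t_one, p_one = stats.ttest_1samp(a - b, popmean=0.0)
print("Same t:", np.isclose(t_rel, t_one), "| same p:", np.isclose(p_rel, p_one))

# One-sided version: Ha is mean(a - b) < 0.
t_less, p_less = stats.ttest_rel(a, b, alternative="less")
print("One-sided p:", p_less)
```

Because the observed t-statistic is negative, the one-sided p-value here is half of the two-sided one.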

In [5]:
m = np.mean(a - b)
s = np.std(a - b, ddof=1)
n = len(b)
t_manual = m / (s / np.sqrt(n))

Calculate the mean and the sample standard deviation (ddof=1) of the paired differences a - b, get the number of pairs n, then compute the t-stat manually as the mean difference divided by its standard error, s / sqrt(n).
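
The p-value can be recovered by hand as well. The sketch below is one way to do it, assuming a two-sided test with n - 1 degrees of freedom: it takes the tail area of Student's t distribution beyond |t| and doubles it. The names `diff`, `n_pairs`, `t_by_hand`, and `p_by_hand` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

b = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
a = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

diff = a - b
n_pairs = len(diff)

# t = mean difference / standard error of the mean difference
t_by_hand = diff.mean() / (diff.std(ddof=1) / np.sqrt(n_pairs))

# Two-sided p-value: twice the upper-tail area beyond |t| with n - 1 degrees of freedom.
p_by_hand = 2 * stats.t.sf(abs(t_by_hand), n_pairs - 1)

t_ref, p_ref = stats.ttest_rel(a, b)
print("t matches:", np.isclose(t_by_hand, t_ref))
print("p matches:", np.isclose(p_by_hand, p_ref))
```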

In [7]:
decision = "Reject" if p_val <= alpha else "Fail to reject"
concl = "Significant difference." if decision == "Reject" else "No significant difference."

Set up the logic to reject or fail to reject H0 based on whether the returned p_val is at or below the alpha threshold.

The conclusion text then follows from the decision.
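
The same decision can be reached by comparing the t-statistic to a critical value instead of comparing the p-value to alpha. The sketch below assumes a two-sided test at a 0.05 level and uses `stats.t.ppf` to find the cutoff; `alpha_cv`, `t_crit`, `reject_by_t`, and `reject_by_p` are hypothetical names used only here.

```python
import numpy as np
from scipy import stats

b = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
a = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

alpha_cv = 0.05
t_obs, p_obs = stats.ttest_rel(a, b)

# Two-sided critical value: the point that leaves alpha/2 in each tail
# of the t distribution with n - 1 degrees of freedom.
t_crit = stats.t.ppf(1 - alpha_cv / 2, len(a) - 1)

reject_by_t = abs(t_obs) > t_crit   # critical-value rule
reject_by_p = p_obs <= alpha_cv     # p-value rule
print(f"|t| = {abs(t_obs):.3f}, critical value = {t_crit:.3f}")
print("Rules agree:", reject_by_t == reject_by_p)
```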

In [8]:
print("T:", t_stat)
print("P:", p_val)
print("T manual:", t_manual)
print(f"Decision: {decision} H0 at α={alpha}")
print("Conclusion:", concl)

T: -9.0
P: 8.538051223166285e-06
T manual: -9.0
Decision: Reject H0 at α=0.05
Conclusion: Significant difference.


Print the results to view and confirm the values and the outcome. The manual t-stat matches the one returned by ttest_rel.