
# 1. Introduction

## Definition of the t-test
Student's t-test is a statistical test used to assess whether the mean of a single group differs from a hypothesized value, or whether the means of two groups differ significantly from each other. It is based on the t-statistic, which expresses the observed difference in means in units of its estimated standard error. The test is particularly useful when the sample size is small (a common rule of thumb is n < 30) or when the population variance is unknown, which is the usual situation in practice. Although it is often associated with small samples, the t-test is valid for any sample size, large or small, as long as its underlying assumptions are met (more on this below). It is a useful alternative to the z-test, which requires the population variance to be known.

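To make the contrast with the z-test concrete, the sketch below compares two-sided 95% critical values from the standard normal distribution (z-test, variance known) with those from Student's t distribution with n - 1 degrees of freedom (one-sample t-test, variance estimated). It only assumes `scipy.stats`; the sample sizes are arbitrary illustrative choices. For small n the t critical value is noticeably larger, reflecting the extra uncertainty of estimating the variance, and it approaches the normal value of about 1.96 as n grows.

```python
import scipy.stats as stats

alpha = 0.05  # two-sided significance level

# Critical value of the standard normal distribution (used by the z-test).
z_crit = stats.norm.ppf(1 - alpha / 2)

# Critical values of Student's t distribution for a one-sample test (df = n - 1).
for n in (5, 10, 30, 100, 1000):
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    print(f"n = {n:>4}: t critical = {t_crit:.3f}, z critical = {z_crit:.3f}")
```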
## Types of t-test

There are different types of t-tests:

1. **One-Sample t-test:** Used to compare the mean of a single sample to a known or hypothesized value (e.g., a population mean or a target value).

2. **Independent Samples t-Test:** Used to compare the means of two independent groups. There are two versions of this test:
  1. **Pooled-variance t-test:** assumes equal variances in the two groups.
  2. **Welch's t-test:** does not assume equal variances in the two groups.

  The choice between the two depends on whether the two groups can reasonably be assumed to have equal variances. Because Welch's test remains valid when the variances happen to be equal, it is often recommended as the default.

3. **Paired Samples t-Test (Dependent t-Test):** Used to compare the means of two related groups, such as the same subjects measured at two different times (e.g., before and after an intervention). This test analyzes the differences between the paired measurements.
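The test variants listed above correspond to SciPy functions roughly as follows. This is a minimal sketch on randomly generated, purely hypothetical data (the group names, means, spreads, and sizes are illustrative assumptions, not real measurements). The `equal_var` argument switches `stats.ttest_ind` between the pooled-variance and Welch versions. The last lines check a useful identity: a paired t-test is the same as a one-sample t-test of the paired differences against 0.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data, generated only for illustration.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=70, scale=5, size=20)  # one group
group_b = rng.normal(loc=74, scale=9, size=25)  # an independent group with a larger spread
before = rng.normal(loc=60, scale=8, size=15)   # the same subjects measured twice
after = before + rng.normal(loc=2, scale=3, size=15)

# 1. One-sample t-test: compare one mean to a hypothesized value.
one_sample = stats.ttest_1samp(group_a, popmean=72)

# 2a. Independent samples, pooled variance (assumes equal variances).
pooled = stats.ttest_ind(group_a, group_b, equal_var=True)

# 2b. Independent samples, Welch's t-test (no equal-variance assumption).
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# 3. Paired samples: tests whether the mean of (after - before) is zero.
paired = stats.ttest_rel(after, before)

for name, res in [("one-sample", one_sample), ("pooled", pooled),
                  ("Welch", welch), ("paired", paired)]:
    print(f"{name:>10}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")

# A paired test is a one-sample test on the differences against 0.
diff_test = stats.ttest_1samp(after - before, popmean=0)
print("Paired == one-sample on differences:",
      np.isclose(paired.statistic, diff_test.statistic))
```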

# 2. One-sample t-test implementation with SciPy

```python
import numpy as np
import scipy.stats as stats

# 1. Your class's test scores (replace with your actual data)
class_scores = np.array([78, 82, 90, 70, 75, 85, 68, 92, 77, 80, 88, 73, 79, 81, 83, 76, 87, 72, 95, 65, 89, 74, 91, 69, 84])

# 2. National average (hypothesized population mean)
national_average = 75

# H0 (null hypothesis): the class's mean score equals the national average.
# H1 (alternative hypothesis): the class's mean score differs from the national average (two-tailed).

# 3. Perform the one-sample t-test (two-sided by default)
t_statistic, p_value = stats.ttest_1samp(class_scores, national_average)

# 4. Print the results
print(f"T-statistic: {t_statistic:.2f}")
print(f"P-value: {p_value:.3f}")

# 5. Interpret the results
alpha = 0.05  # significance level

print("Two-tailed test:")
if p_value < alpha:
    print("The class performed significantly differently from the national average.")
    # The two-sided test only says the means differ; the sign of the difference gives the direction.
    if np.mean(class_scores) > national_average:
        print("The class performed significantly better than the national average.")
    else:
        print("The class performed significantly worse than the national average.")
else:
    print("There is no significant difference between the class's performance and the national average.")
```

```text
T-statistic: 3.10
P-value: 0.005
Two-tailed test:
The class performed significantly differently from the national average.
The class performed significantly better than the national average.
```

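The code above runs a two-tailed test and then reads the direction from the sample mean. If the question, fixed before looking at the data, is specifically whether the class scored *higher* than the national average, a one-sided test expresses that directly. A minimal sketch, assuming SciPy 1.6 or later (where `stats.ttest_1samp` accepts the `alternative` argument):

```python
import numpy as np
import scipy.stats as stats

class_scores = np.array([78, 82, 90, 70, 75, 85, 68, 92, 77, 80, 88, 73, 79,
                         81, 83, 76, 87, 72, 95, 65, 89, 74, 91, 69, 84])
national_average = 75

# Two-sided: H1 is "mean != 75".
two_sided = stats.ttest_1samp(class_scores, national_average)

# One-sided: H0 is "mean <= 75", H1 is "mean > 75".
greater = stats.ttest_1samp(class_scores, national_average, alternative="greater")

print(f"Two-sided p-value:           {two_sided.pvalue:.4f}")
print(f"One-sided (greater) p-value: {greater.pvalue:.4f}")
```

When the t-statistic is positive, as here, the one-sided p-value is half the two-sided one. The direction of a one-sided test has to be chosen in advance; picking it after seeing which way the data point makes the reported p-value misleadingly small.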

Explanation:

**Data**: `class_scores` is a NumPy array containing your students' test scores. Replace these with your actual data.

**Population mean**: `national_average` stores the hypothesized population mean (75) that the class is compared against.

**`stats.ttest_1samp()`**: This function from `scipy.stats` performs the one-sample t-test. It takes the sample data and the hypothesized population mean as arguments and runs a two-sided test by default.

**Results**: The function returns the t-statistic and the p-value.

**Interpretation**: We compare the p-value to the significance level (alpha, commonly 0.05). If the p-value is less than alpha, we reject the null hypothesis and conclude that there is a statistically significant difference between the class's performance and the national average. Because the test is two-tailed, the direction of the difference (better or worse) is read from the sign of the t-statistic, which is positive exactly when the sample mean exceeds the national average.
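To see what `stats.ttest_1samp()` computes, the statistic can be reproduced by hand as `t = (mean - mu0) / (s / sqrt(n))`, where `mean` is the sample mean, `s` the sample standard deviation (with an n - 1 denominator), and `mu0` the hypothesized mean; it is compared against a t distribution with n - 1 degrees of freedom. The sketch below reuses the class scores from the example and also shows the equivalent critical-value rule (reject H0 when |t| exceeds the critical value); it should reproduce the T-statistic and P-value printed above.

```python
import numpy as np
import scipy.stats as stats

class_scores = np.array([78, 82, 90, 70, 75, 85, 68, 92, 77, 80, 88, 73, 79,
                         81, 83, 76, 87, 72, 95, 65, 89, 74, 91, 69, 84])
national_average = 75
alpha = 0.05

n = class_scores.size
mean = class_scores.mean()
sd = class_scores.std(ddof=1)  # sample standard deviation (n - 1 denominator)
se = sd / np.sqrt(n)           # standard error of the mean
t_manual = (mean - national_average) / se
df = n - 1

# Two-sided p-value: P(|T| >= |t|) for T ~ t distribution with df degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Equivalent decision rule based on the critical value.
t_crit = stats.t.ppf(1 - alpha / 2, df)

print(f"t = {t_manual:.2f}, p = {p_manual:.3f}, critical value = {t_crit:.3f}")
print("Reject H0" if abs(t_manual) > t_crit else "Fail to reject H0")

# Cross-check against SciPy.
res = stats.ttest_1samp(class_scores, national_average)
print("Matches SciPy:", np.isclose(t_manual, res.statistic), np.isclose(p_manual, res.pvalue))
```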

