In [1]:
# Pandas library for data manipulation.
import pandas as pd

# Scipy's stats module for statistical functions and tests.
from scipy import stats

# Statsmodels' weightstats module for weighted statistical tests.
from statsmodels.stats import weightstats as stests

## T-Test

A T-Test is a statistical method used to compare means of two groups and determine if their differences are statistically significant.

In [2]:
# Read 'BP.csv' from the 'Dataset' folder into a pandas DataFrame
df = pd.read_csv('Dataset/BP.csv')
df.head()

Unnamed: 0,patient,sex,agegrp,bp_before,bp_after
0,1,Male,30-45,143,153
1,2,Male,30-45,163,170
2,3,Male,30-45,153,168
3,4,Male,30-45,153,142
4,5,Male,30-45,146,141


In [3]:
df[['bp_before', 'bp_after']].describe()

Unnamed: 0,bp_before,bp_after
count,120.0,120.0
mean,156.45,151.358333
std,11.389845,14.177622
min,138.0,125.0
25%,147.0,140.75
50%,154.5,149.5
75%,164.0,161.0
max,185.0,185.0


In [4]:
df.shape

(120, 5)

## P-value

The p-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming the null hypothesis is true.

If the p-value is below the chosen significance level (0.05 here), we reject H0 in favor of H1.
This means we have evidence of a significant change.
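As a rough illustration of the definition above, the two-sided p-value can be computed directly from a t statistic and its degrees of freedom using the t distribution in `scipy.stats`. The helper name `two_sided_p_value` and the input values are assumptions made for this sketch; they are not taken from the datasets in this notebook.

```python
from scipy import stats


def two_sided_p_value(t_stat, dof):
    """Two-sided p-value for a t statistic with `dof` degrees of freedom."""
    # sf is the survival function (1 - cdf). Doubling the upper tail of |t|
    # gives the probability of a statistic at least as extreme in either direction.
    return 2 * stats.t.sf(abs(t_stat), dof)


# Illustrative values only.
alpha = 0.05
p = two_sided_p_value(t_stat=2.5, dof=20)
print(p)
print("reject H0" if p < alpha else "fail to reject H0")
```

The sign of the statistic does not matter for a two-sided test, which is why the helper takes the absolute value first.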

In [5]:
# Paired t-test on the before/after blood pressure readings of the same patients
ttest, pval = stats.ttest_rel(df['bp_before'], df['bp_after'])
print(pval)

0.0011297914644840823


## Paired sample t-test

The paired sample t-test, also called the dependent sample t-test, is a univariate test for a significant difference between two related variables. An example is blood pressure measured for the same individual before and after some treatment, condition, or time point.

- Null Hypothesis (H0): The true mean difference between the paired measurements is zero, meaning there is no effect or difference between them.

- Alternative Hypothesis (H1): The true mean difference between the paired measurements is not zero, meaning there is an effect or difference between them.
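Because both hypotheses are statements about the mean of the pairwise differences, the paired test can be reproduced by hand as a one-sample test on those differences, with t = mean(d) / (std(d) / sqrt(n)) and n - 1 degrees of freedom. The sketch below checks this against `stats.ttest_rel`. It uses only the five rows shown by `df.head()` above, so its numbers will not match the full 120-row result.

```python
import numpy as np
from scipy import stats

# The five rows displayed by df.head() above (not the full dataset).
before = np.array([143, 163, 153, 153, 146], dtype=float)
after = np.array([153, 170, 168, 142, 141], dtype=float)

# Pairwise differences for each patient.
diff = before - after
n = diff.size

# t = mean(d) / (std(d) / sqrt(n)), using the sample standard deviation (ddof=1).
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)

# SciPy's paired test should give the same statistic and p-value.
t_scipy, p_scipy = stats.ttest_rel(before, after)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```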

In [6]:
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

reject null hypothesis


The code prints "reject null hypothesis": the p-value (about 0.0011) is below 0.05, so there is **enough** statistical evidence to conclude that mean blood pressure differs **significantly** between the before and after measurements.

# Another Example

## Golf

In [7]:
# Read 'Golf.csv' from the 'Dataset' folder into a pandas DataFrame
df = pd.read_csv('Dataset/Golf.csv')
df.head()

Unnamed: 0,Current,New
0,264,277
1,261,269
2,267,263
3,272,266
4,258,262


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Current  40 non-null     int64
 1   New      40 non-null     int64
dtypes: int64(2)
memory usage: 772.0 bytes


In [9]:
df[['Current', 'New']].describe()

Unnamed: 0,Current,New
count,40.0,40.0
mean,270.275,267.5
std,8.752985,9.896904
min,255.0,250.0
25%,263.0,262.0
50%,270.0,265.0
75%,275.25,274.5
max,289.0,289.0


In [11]:
# Paired t-test on the paired 'Current' and 'New' measurements
ttest, pval = stats.ttest_rel(df['Current'], df['New'])
print(pval)

0.20916361823147053


In [12]:
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

fail to reject null hypothesis


The code prints "fail to reject null hypothesis": the p-value (about 0.209) is above 0.05, so there is **not enough** statistical evidence to conclude that the mean differs between the Current and New measurements. This does not show that the two are equal; it only means the data do not demonstrate a difference.
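Both examples repeat the same test-then-decide pattern, so it could be wrapped in a small helper. The function name `paired_ttest_decision` and its `alpha` parameter are assumptions made for this sketch, not part of SciPy. It words the non-significant outcome as "fail to reject", since a large p-value does not prove the null hypothesis. The usage lines rely only on the five Golf rows shown by `df.head()`, so the p-value differs from the full 40-row result.

```python
from scipy import stats


def paired_ttest_decision(before, after, alpha=0.05):
    """Run a paired t-test and return (t statistic, p-value, decision string)."""
    t_stat, p_value = stats.ttest_rel(before, after)
    if p_value < alpha:
        decision = "reject null hypothesis"
    else:
        decision = "fail to reject null hypothesis"
    return t_stat, p_value, decision


# The five rows displayed by df.head() for the Golf data (not the full dataset).
current = [264, 261, 267, 272, 258]
new = [277, 269, 263, 266, 262]

t_stat, p_value, decision = paired_ttest_decision(current, new)
print(t_stat, p_value, decision)
```

Keeping `alpha` as a parameter makes the significance level explicit instead of hard-coding 0.05 in each cell.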