# One-sided t-Test (different sample variances)

In [3]:
import numpy as np
import pandas as pd
import math
import scipy.stats as stats
import altair as alt
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

When the two samples have different variances, the t-distribution is a poorer approximation to the sampling distribution of the test statistic.

**Approach:** reduce the degrees of freedom.

Since the sample variances are not equal, a pooled variance estimate is not appropriate.

### Test statistic

$$ t = \frac{\left(\overline{x}_1 - \overline{x}_2\right) - \left(\mu_1 - \mu_2\right)}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}} $$
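
The reduced degrees of freedom for this statistic are commonly obtained from the Welch-Satterthwaite approximation. It is written here with the same symbols as the test statistic; the symbol $\nu$ is introduced only for illustration and does not appear in the original notebook:

$$ \nu \approx \frac{\left(\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}\right)^2}{\frac{\left(s_1^2 / N_1\right)^2}{N_1 - 1} + \frac{\left(s_2^2 / N_2\right)^2}{N_2 - 1}} $$

The value of $\nu$ lies between $\min(N_1, N_2) - 1$ and $N_1 + N_2 - 2$, so it never exceeds the degrees of freedom of the pooled test.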

### Data

In [33]:
machine_1 = np.array([24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83])
machine_2 = np.array([21.61, 19.06, 20.72, 15.77, 19.0, 25.88, 21.44, 17.85, 20.86, 17.77])
df = pd.DataFrame({'m1_prod_time': machine_1, 'm2_prod_time': machine_2})
df.index.name = 'day'
df

day,m1_prod_time,m2_prod_time
0,24.58,21.61
1,22.09,19.06
2,23.7,20.72
3,18.89,15.77
4,22.02,19.0
5,28.71,25.88
6,24.44,21.44
7,20.91,17.85
8,23.83,20.86
9,20.83,17.77


In [5]:
statistic, pvalue = stats.ttest_ind(df['m1_prod_time'], df['m2_prod_time'], equal_var=False)

In [6]:
statistic

2.4438973586184982

In [8]:
pvalue

0.025065978041644801
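
The call above relies on SciPy's default two-sided alternative. A minimal sketch of a one-sided version follows, assuming SciPy 1.6 or later (where `ttest_ind` accepts the `alternative` argument). It also assumes the hypothesis of interest is that machine 1 has the larger mean production time; that direction is an assumption, since the notebook does not state it.

```python
import numpy as np
import scipy.stats as stats

machine_1 = np.array([24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83])
machine_2 = np.array([21.61, 19.06, 20.72, 15.77, 19.0, 25.88, 21.44, 17.85, 20.86, 17.77])

# Two-sided Welch test (the default alternative), as in the cell above
t_two, p_two = stats.ttest_ind(machine_1, machine_2, equal_var=False)

# One-sided Welch test; H1: mean(machine_1) > mean(machine_2) is an assumed direction
t_one, p_one = stats.ttest_ind(machine_1, machine_2, equal_var=False,
                               alternative='greater')

# The statistic is unchanged; only the p-value depends on the alternative
print(t_two, p_two)
print(t_one, p_one)
```

Because the statistic is positive, the `'greater'` p-value is half of the two-sided one, so on older SciPy versions the same result can be obtained by halving `p_two`.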

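What are the degrees of freedom?
The SciPy call above was unpacked into the statistic and the p-value only. The `ttest_ind` function in statsmodels also returns the degrees of freedom.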

In [30]:
from statsmodels.stats.weightstats import ttest_ind

In [38]:
tstat, pvalue, dof_unequal = ttest_ind(df['m1_prod_time'], df['m2_prod_time'],
                                       usevar='unequal')

In [39]:
dof_unequal

17.986253632973302
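
As a cross-check, the statistic and the reduced degrees of freedom can be computed by hand with NumPy. This is a sketch under the assumption that the library results come from Welch's test with the Welch-Satterthwaite approximation; the names `t_manual` and `dof_manual` are illustrative and not part of the original notebook.

```python
import numpy as np

machine_1 = np.array([24.58, 22.09, 23.70, 18.89, 22.02, 28.71, 24.44, 20.91, 23.83, 20.83])
machine_2 = np.array([21.61, 19.06, 20.72, 15.77, 19.0, 25.88, 21.44, 17.85, 20.86, 17.77])

n1, n2 = len(machine_1), len(machine_2)

# Unbiased sample variances (ddof=1 divides by N - 1)
var_1 = machine_1.var(ddof=1)
var_2 = machine_2.var(ddof=1)

# Each sample's contribution to the squared standard error of the mean difference
se1_sq = var_1 / n1
se2_sq = var_2 / n2

# Welch t statistic under H0: mu_1 - mu_2 = 0
t_manual = (machine_1.mean() - machine_2.mean()) / np.sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite approximation of the degrees of freedom
dof_manual = (se1_sq + se2_sq) ** 2 / (
    se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
)

print(t_manual, dof_manual)
```

Both values should agree with the library outputs above up to floating-point rounding. The degrees of freedom come out just below 18 because the two sample variances are nearly equal.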

If the variances are assumed to be equal, the pooled test applies, and its degrees of freedom are $N_1 + N_2 - 2$:

In [40]:
tstat, pvalue, dof_pooled = ttest_ind(df['m1_prod_time'], df['m2_prod_time'],
                                      usevar='pooled')

In [41]:
dof_pooled

18.0