<center> <img src="res/ds3000.png"> </center>

<center> <h1> Week 9 - Day 2 </h1> </center>

<center> <h2> Part 1: Paired-Samples t Test </h2></center>

## Outline
1. <a href='#1'>Dataset from a Within-Subjects Experiment</a>
2. <a href='#2'>Exploratory Data Analysis</a>
3. <a href='#3'>Paired-Samples t Test</a>
4. <a href='#4'>Assumption Checks</a>
5. <a href='#5'>Reporting the Results</a>

<a id="1"></a>

## 1. Dataset from a Within-Subjects Experiment

In [2]:
from scipy import stats

In [3]:
import pandas as pd

In [4]:
data = pd.read_csv("res/wand_candles_data_paired.csv")

In [5]:
data.head(10)  # preview the first 10 of the 20 participants

Unnamed: 0,Participant,Elder_Wand,Regular_Wand
0,P01,17,16
1,P02,18,12
2,P03,20,14
3,P04,22,17
4,P05,22,19
5,P06,21,17
6,P07,23,16
7,P08,19,14
8,P09,19,15
9,P10,20,16


<a id="2"></a>

## 2. Exploratory Data Analysis
* Involves checking the descriptive stats and visualizing the data before conducting the test

### 2.1. Descriptive Statistics

In [39]:
# Summarize only the two numeric score columns, then transpose so each wand is a row
descriptives = data[["Elder_Wand", "Regular_Wand"]].agg(["count", "mean", "std", "sem"]).T
descriptives

Unnamed: 0,count,mean,std,sem
Elder_Wand,20.0,21.1,2.174009,0.486123
Regular_Wand,20.0,15.6,1.788854,0.4

### 2.2. Visualizing the Data

In [45]:
# Move the wand names from the index into a column named "index" for plotting
descriptives.reset_index(inplace=True)
descriptives

Unnamed: 0,index,count,mean,std,sem
0,Elder_Wand,20.0,21.1,2.174009,0.486123
1,Regular_Wand,20.0,15.6,1.788854,0.4


In [46]:
import plotly.express as px

# Vertical bars only need vertical (y) error bars; here they show the standard error of the mean
graph = px.bar(descriptives, x="index", y="mean", error_y="sem", template="none", width=500,
               labels={"mean": "Number of Candles", "index": "Wand Used"})

graph.update_traces(marker_color="#FFF")
graph.update_traces(marker= dict(line={"width":3,"color":"#000000"}))

graph.update_xaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})
graph.update_yaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})


graph.show()

<a id="3"></a>

## 3. Paired-Samples t Test
* Use the **ttest_rel()** function available in SciPy's stats module
* **ttest_rel()** accepts two sequence-like objects (lists, Series, etc.) of equal length holding the scores from each condition, in the same participant order
    * **ttest_rel(condition_a, condition_b)**
* **ttest_rel()** returns a tuple-like result containing the calculated t statistic and p-value
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html#scipy.stats.ttest_rel
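
As a minimal sketch of the call described above, the example below runs the test on the first five score pairs from the dataset (P01 to P05). The variable names `elder` and `regular` are illustrative only and are not part of the original notebook.

```python
from scipy import stats

# First five score pairs from the dataset shown earlier (P01 to P05).
# Both lists must be the same length and in the same participant order.
elder = [17, 18, 20, 22, 22]
regular = [16, 12, 14, 17, 19]

result = stats.ttest_rel(elder, regular)

# The result unpacks like a tuple: (t statistic, p-value)
t_value, p_value = result

# The same values are also available as named attributes
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

Swapping the order of the two arguments flips the sign of the t statistic but leaves the p-value unchanged.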

In [6]:
stats.ttest_rel?

In [7]:
data.head()

Unnamed: 0,Participant,Elder_Wand,Regular_Wand
0,P01,17,16
1,P02,18,12
2,P03,20,14
3,P04,22,17
4,P05,22,19


In [8]:
stats.ttest_rel(data["Elder_Wand"], data["Regular_Wand"])

Ttest_relResult(statistic=9.093835369447739, pvalue=2.3758447924868284e-08)

In [10]:
results = stats.ttest_rel(data["Elder_Wand"], data["Regular_Wand"])

In [11]:
#t value
tstatistic = results[0]
tstatistic

9.093835369447739

In [15]:
# p value in fixed-point notation (10 decimal places)
pvalue = results[1]
format(pvalue, '.10f')

'0.0000000238'

### 3.1. Degrees of Freedom
* Unfortunately, the ttest_rel() result shown above does not include the degrees of freedom (df) value.
* We can calculate it ourselves though!
* For a paired-samples t test, df is calculated as follows:
    * df = n - 1, where n is the number of pairs (participants)

In [17]:
df = len(data) - 1
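
To see where df enters the test, here is a rough sketch that rebuilds the t statistic and two-tailed p-value by hand from the difference scores and compares them with `ttest_rel()`. It reuses the first five score pairs from the dataset; the names `differences`, `t_manual`, and `p_manual` are assumptions made for this illustration.

```python
import math
from scipy import stats

# First five score pairs from the dataset shown earlier (P01 to P05)
elder = [17, 18, 20, 22, 22]
regular = [16, 12, 14, 17, 19]

# One difference score per participant
differences = [e - r for e, r in zip(elder, regular)]

n = len(differences)   # number of pairs
df = n - 1             # degrees of freedom for a paired-samples t test

# Mean and sample standard deviation (n - 1 denominator) of the differences
mean_diff = sum(differences) / n
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in differences) / df)

# t = mean difference divided by its standard error
t_manual = mean_diff / (sd_diff / math.sqrt(n))

# Two-tailed p-value from the t distribution with df degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Compare against SciPy's paired-samples t test
t_scipy, p_scipy = stats.ttest_rel(elder, regular)
print(math.isclose(t_manual, t_scipy), math.isclose(p_manual, p_scipy))
```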

<a id="4"></a>

## 4. Assumption Checks
* The paired-samples t test makes one assumption:
    * Assumption of normality (the differences between the paired scores are normally distributed)


### 4.1. Checking for Normality
* Shapiro-Wilk Test of Normality
    * Use the **shapiro()** function in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html#scipy.stats.shapiro

* **shapiro()** returns a tuple-like result containing the test statistic (W) and the p-value
    * You want non-significant results from assumption checks (p > .05)

* Pass in the entire distribution of differences between the paired scores (one difference per participant)

In [6]:
# One difference score per participant (Regular_Wand minus Elder_Wand)
mean_difference = data["Regular_Wand"] - data["Elder_Wand"]
mean_difference

0     -1
1     -6
2     -6
3     -5
4     -3
5     -4
6     -7
7     -5
8     -4
9     -4
10    -6
11    -4
12    -8
13    -7
14   -13
15    -5
16    -1
17    -7
18    -5
19    -9
dtype: int64

In [7]:
shapiro_results = stats.shapiro(mean_difference)
shapiro_results

(0.9318200349807739, 0.1673773229122162)
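
The following sketch shows one way to turn the Shapiro-Wilk result into a decision. The difference scores are copied from the output above; the 0.05 cutoff follows the p > .05 rule stated earlier, and the names `w_statistic`, `alpha`, and `normality_ok` are illustrative assumptions.

```python
from scipy import stats

# Difference scores (Regular_Wand minus Elder_Wand) for all 20 participants,
# copied from the notebook output above
differences = [-1, -6, -6, -5, -3, -4, -7, -5, -4, -4,
               -6, -4, -8, -7, -13, -5, -1, -7, -5, -9]

# shapiro() returns the W statistic and the p-value
w_statistic, p_value = stats.shapiro(differences)

alpha = 0.05  # significance cutoff assumed for this check
normality_ok = p_value > alpha

if normality_ok:
    print(f"W = {w_statistic:.3f}, p = {p_value:.3f}: no evidence against normality")
else:
    print(f"W = {w_statistic:.3f}, p = {p_value:.3f}: normality assumption is questionable")
```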

<a id="5"></a>

  
## 5. Reporting the Results
* Report
    * descriptives
    * assumption checks
    * t statistic, degrees of freedom, and p-value
    * a bar graph
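
As one possible way to assemble the test statistics into a single line, the sketch below formats them in an APA-like style. The style choices (two decimals for t, no leading zero on p, a floor of p < .001) and the helper names `format_p` and `format_t_report` are assumptions for illustration, not something prescribed by this notebook.

```python
def format_p(p):
    """Format a p-value without a leading zero; very small values become 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_t_report(t, df, p):
    """Combine the t statistic, degrees of freedom, and p-value into one string."""
    return f"t({df}) = {t:.2f}, {format_p(p)}"


# Values taken from the paired-samples t test output earlier in the notebook,
# with df = 20 - 1 = 19
report = format_t_report(9.093835369447739, 19, 2.3758447924868284e-08)
print(report)
```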

In [176]:
descriptives

Unnamed: 0,Group,count,mean,std,sem
0,Elder Wand,20,21.1,2.174009,0.486123
1,Regular Wand,20,15.6,1.788854,0.4


In [83]:
tstatistic, pvalue, df

(9.093835369447739, 2.3758447924868284e-08, 19)

In [86]:
shapiro_results

(0.9318200349807739, 0.1673773229122162)

In [177]:
graph.show()