<center> <img src="res/ds3000.png"> </center>

<center> <h1> Week 9 - Day 2 </h1> </center>

<center> <h2> Part 1: Paired-Samples t Test </h2></center>

## Outline
1. <a href='#1'>Dataset from a Within-Subjects Experiment</a>
2. <a href='#2'>Exploratory Data Analysis</a>
3. <a href='#3'>Paired-Samples t Test</a>
4. <a href='#4'>Assumption Checks</a>
5. <a href='#5'>Reporting the Results</a>

<a id="1"></a>

## 1. Dataset from a Within-Subjects Experiment

In [1]:
from scipy import stats

In [2]:
import pandas as pd

In [3]:
data = pd.read_csv("Day 2_res_wand_candles_data_paired.csv")


In [None]:
data

<a id="2"></a>

## 2. Exploratory Data Analysis
* Involves checking the descriptive stats and visualizing the data before conducting the test

### 2.1. Descriptive Statistics

In [None]:
descriptives = data.agg(["count", "mean", "std", "sem"])
descriptives

In [None]:
descriptives = descriptives.T.drop("Participant")
descriptives

### 2.2. Visualizing the Data

In [None]:
descriptives

In [None]:
descriptives.reset_index(inplace=True)
descriptives

In [None]:
import plotly.express as px
# Bar chart of the condition means with vertical error bars of +/- 1 SEM
graph = px.bar(descriptives, x="index", y="mean", error_y="sem", template="none", width=500,
               labels={"mean": "Number of Candles", "index": "Wand Used"})

graph.update_traces(marker_color="#FFF")
graph.update_traces(marker= dict(line={"width":3,"color":"#000000"}))

graph.update_xaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})
graph.update_yaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})


graph.show()

<a id="3"></a>

## 3. Paired-Samples t Test
* Use the **ttest_rel()** function available in SciPy's stats module
* **ttest_rel()** accepts two equal-length sequence-like objects (lists, Series, etc.) holding each participant's scores in the two conditions being compared, in the same participant order
    * **ttest_rel(condition_a, condition_b)**
* **ttest_rel()** returns a result that can be indexed or unpacked like a tuple: the t statistic first, then the two-tailed p-value
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html#scipy.stats.ttest_rel
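
To make the call pattern above concrete, here is a minimal sketch on made-up scores. The arrays `condition_a` and `condition_b` are hypothetical and are not the wand dataset. It also shows that a paired-samples t test gives the same result as a one-sample t test on the pairwise differences against zero.

```python
import numpy as np
from scipy import stats

# Hypothetical scores: six participants, each measured in both conditions
condition_a = np.array([12, 15, 11, 14, 13, 16])
condition_b = np.array([10, 12, 11, 11, 12, 13])

# Paired-samples t test on the two sets of scores
paired = stats.ttest_rel(condition_a, condition_b)

# Equivalent one-sample t test on the pairwise differences against zero
differences = condition_a - condition_b
one_sample = stats.ttest_1samp(differences, 0)

# The result unpacks into the t statistic and the p-value
t_value, p_value = paired
print(t_value, p_value)
print(one_sample.statistic, one_sample.pvalue)
```

Both print statements should show matching values, which is why the order of rows (participants) must line up across the two inputs.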

In [None]:
stats.ttest_rel?

In [None]:
data.head()

In [None]:
stats.ttest_rel(data["Elder_Wand"], data["Regular_Wand"])

In [None]:
results = stats.ttest_rel(data["Elder_Wand"], data["Regular_Wand"])

In [None]:
#t value
tstatistic = results[0]
tstatistic

In [None]:
#p value in fixed-point notation with 10 decimal places (avoids scientific notation)
pvalue = results[1]
format(pvalue, '.10f')

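### 3.1. Degrees of Freedom
* Older versions of SciPy do not report the degrees of freedom (df) in the result of ttest_rel(); SciPy 1.10 and later expose it as a `df` attribute on the result.
* Either way, we can calculate it ourselves!
* For a paired-samples t test, df is calculated as follows:
    * df = n - 1, where n is the number of pairs (participants)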
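
As a rough illustration of where df enters the calculation, the sketch below recomputes the t statistic and two-tailed p-value by hand and compares them with `ttest_rel()`. The scores are hypothetical, and the variable names (`t_manual`, `p_manual`, `reference`) are chosen only for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores (not the wand dataset)
condition_a = np.array([12, 15, 11, 14, 13, 16])
condition_b = np.array([10, 12, 11, 11, 12, 13])

differences = condition_a - condition_b
n = len(differences)   # number of pairs
df = n - 1             # degrees of freedom for a paired-samples t test

# t = mean difference / standard error of the differences
standard_error = differences.std(ddof=1) / np.sqrt(n)
t_manual = differences.mean() / standard_error

# Two-tailed p-value from the t distribution with df degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# Compare against SciPy's own result
reference = stats.ttest_rel(condition_a, condition_b)
print(df, t_manual, p_manual)
print(reference.statistic, reference.pvalue)
```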

In [None]:
df = len(data) - 1

<a id="4"></a>

## 4. Assumption Checks
* The paired-samples t test makes one distributional assumption:
    * Assumption of normality: the differences between the paired scores are normally distributed


### 4.1. Checking for Normality
* Shapiro-Wilk Test of Normality
    * Use the **shapiro()** function in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html#scipy.stats.shapiro

* **shapiro()** returns the W test statistic and the p-value
    * You want a non-significant result from an assumption check (p > .05), meaning there is no evidence that the assumption is violated
    
* Need to pass in the full set of difference scores, one per participant (one condition's score minus the other's)
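
The check described above can be sketched as follows, using hypothetical scores rather than the wand dataset. The 0.05 cutoff (`alpha`) follows the p > .05 rule stated above; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical paired scores for eight participants
condition_a = np.array([12, 15, 11, 14, 13, 16, 12, 14])
condition_b = np.array([10, 12, 11, 11, 12, 13, 11, 10])

# One difference score per participant
differences = condition_a - condition_b

# Shapiro-Wilk test on the difference scores, not on the raw columns
w_statistic, p_value = stats.shapiro(differences)

# A p-value above alpha gives no evidence against normality
alpha = 0.05
normality_plausible = bool(p_value > alpha)
print(w_statistic, p_value, normality_plausible)
```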

In [None]:
# Difference score for each participant (same order as in the t test)
differences = data["Elder_Wand"] - data["Regular_Wand"]
differences

In [None]:
shapiro_results = stats.shapiro(differences)
shapiro_results

<a id="5"></a>

  
## 5. Reporting the Results
* Report
    * descriptives
    * assumption checks
    * t statistic, degrees of freedom, and p-value
    * a bar graph
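
One possible way to turn the t statistic, df, and p-value into a compact report string is sketched below. The helper `format_paired_t` is hypothetical, and the numbers passed to it are illustrative values, not results from the wand dataset.

```python
def format_paired_t(t_value, df, p_value):
    """Hypothetical helper: build a string such as 't(19) = 2.50, p = .022'."""
    if p_value < 0.001:
        # Very small p-values are reported as an inequality
        p_text = "p < .001"
    else:
        # Drop the leading zero, as is conventional when reporting p-values
        p_text = "p = " + f"{p_value:.3f}".lstrip("0")
    return f"t({df}) = {t_value:.2f}, {p_text}"


# Illustrative values only
print(format_paired_t(2.5, 19, 0.0218))  # prints: t(19) = 2.50, p = .022
```

In the notebook, the same helper could be called with `tstatistic`, `df`, and `pvalue` once those cells have been run.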

In [None]:
descriptives

In [None]:
tstatistic, pvalue, df

In [None]:
shapiro_results

In [None]:
graph.show()