<center> <img src="res/ds3000.png"> </center>

<center> <h1> Week 9 - Day 2 </h1> </center>

<center> <h2> Part 1: Independent-Samples t Test </h2></center>

## Outline
1. <a href='#1'>SciPy</a>
2. <a href='#2'>Exploratory Data Analysis</a>
3. <a href='#3'>Independent-Samples t Test</a>
4. <a href='#4'>Assumption Checks</a>
5. <a href='#5'>Reporting the Results</a>

<a id="1"></a>

## 1. SciPy
* Fundamental library for scientific computing
    * https://docs.scipy.org/doc/scipy/reference/
* SciPy has a special module, stats, dedicated to common statistical tests used in data analysis
    * https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html

In [88]:
from scipy import stats

In [4]:
import pandas as pd

### 1.1. Dataset from a Between-Subjects Experiment

In [120]:
data = pd.read_csv("res/wand_candles_data.csv")

In [121]:
data

| | Participant | Group | Candles |
|---|---|---|---|
| 0 | P01 | Elder Wand | 17 |
| 1 | P02 | Elder Wand | 18 |
| 2 | P03 | Elder Wand | 20 |
| 3 | P04 | Elder Wand | 22 |
| 4 | P05 | Regular Wand | 16 |
| 5 | P06 | Regular Wand | 12 |
| 6 | P07 | Elder Wand | 22 |
| 7 | P08 | Elder Wand | 21 |
| 8 | P09 | Elder Wand | 23 |
| 9 | P10 | Elder Wand | 19 |


<a id="2"></a>

## 2. Exploratory Data Analysis
* Involves checking the descriptive stats and visualizing the data before conducting the test

### 2.1. Descriptive Statistics

In [169]:
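# Select the numeric Candles column so that text columns (e.g. Participant) are not aggregated
descriptives = data.groupby("Group")[["Candles"]].agg(["count", "mean", "std", "sem"])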
descriptives

| Group | Candles (count) | Candles (mean) | Candles (std) | Candles (sem) |
|---|---|---|---|---|
| Elder Wand | 20 | 21.1 | 2.174009 | 0.486123 |
| Regular Wand | 20 | 15.6 | 1.788854 | 0.4 |
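
The `sem` column is the standard error of the mean: the sample standard deviation divided by the square root of the group size. A quick check using the values from the table above (a minimal sketch; the variable names are illustrative and not part of the original notebook):

```python
import math

# Values copied from the descriptives table above
n = 20
std_elder, std_regular = 2.174009, 1.788854

# Standard error of the mean: SEM = std / sqrt(n)
sem_elder = std_elder / math.sqrt(n)
sem_regular = std_regular / math.sqrt(n)

print(sem_elder, sem_regular)
```

Both values should agree with the `sem` column up to rounding.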


### 2.2. Visualizing the Data

In [170]:
descriptives = descriptives["Candles"]
descriptives

| Group | count | mean | std | sem |
|---|---|---|---|---|
| Elder Wand | 20 | 21.1 | 2.174009 | 0.486123 |
| Regular Wand | 20 | 15.6 | 1.788854 | 0.4 |


In [171]:
descriptives.reset_index(inplace=True)

In [175]:
import plotly.express as px

# Bar chart of the group means; vertical error bars show the standard error of the mean
graph = px.bar(descriptives, x="Group", y="mean", error_y="sem", template="none", width=500,
               labels={"mean": "Number of Candles", "Group": "Wand Used"})

graph.update_traces(marker_color="#FFF")
graph.update_traces(marker= dict(line={"width":3,"color":"#000000"}))

graph.update_xaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})
graph.update_yaxes(title_font={"size":16}, tickfont = {"size":14, "color":"gray"})


graph.show()

<a id="3"></a>

## 3. Independent-Samples t Test
* Use the **ttest_ind()** function in SciPy's stats module
* **ttest_ind()** accepts two sequence-like objects (lists, Series, etc.), each holding the scores of one of the groups being compared
    * **ttest_ind(group_a, group_b)**
* **ttest_ind()** returns a tuple-like result containing the calculated t statistic and p-value
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
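
Because the result behaves like a tuple, it can be unpacked directly into two variables. A minimal, self-contained sketch, assuming two small made-up samples (`group_a` and `group_b` are hypothetical and are not the course dataset):

```python
from scipy import stats

# Hypothetical scores for two independent groups (not the course dataset)
group_a = [17, 18, 20, 22, 22, 21, 23, 19]
group_b = [16, 12, 15, 14, 17, 13, 15, 16]

# The result can be unpacked like a tuple: (t statistic, p-value)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
```

A positive t statistic here means that the first group has the higher mean.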

In [14]:
stats.ttest_ind?

In [17]:
elder_wand = data[data["Group"] == "Elder Wand"]["Candles"]
elder_wand

In [19]:
regular_wand = data[data["Group"] == "Regular Wand"]["Candles"]
regular_wand

In [45]:
stats.ttest_ind(elder_wand, regular_wand)

Ttest_indResult(statistic=8.736590939739596, pvalue=1.2666542976485636e-10)

### 3.1. t Test Results
* The result returned by **ttest_ind()** can be indexed like a tuple
    * the first element is the t statistic
    * the second element is the p-value

In [31]:
results = stats.ttest_ind(elder_wand, regular_wand)

In [52]:
#t value
tstatistic = results[0]
tstatistic

8.736590939739596

In [54]:
#p value in scientific notation
pvalue = results[1]
pvalue

1.2666542976485636e-10

### 3.2. Formatting the p-value
* When the p-value is extremely small, Python displays it in scientific notation.
* The p-value is already a float; the built-in **format()** function converts it to a string in fixed-point notation

In [35]:
format(1.243345543049043, ".2f")

'1.24'

In [57]:
format(pvalue, '.10f')

'0.0000000001'
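
The same format specifiers also work in f-strings, and the `e` specifier keeps scientific notation with a fixed number of decimals. A short sketch using the p-value from the t test above (the variable name `p_value` is chosen here for illustration):

```python
p_value = 1.2666542976485636e-10  # p-value from the t test above

sci = format(p_value, ".2e")     # scientific notation with two decimals
fixed = format(p_value, ".12f")  # fixed-point notation with twelve decimals
short = f"{p_value:.3f}"         # f-strings accept the same format specifiers

print(sci, fixed, short)
```

With only three decimals the value is displayed as `0.000`, which is why very small p-values are usually reported as p < .001 instead.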

### 3.3. Degrees of Freedom
* Unfortunately, the result returned by ttest_ind() above does not include the degrees of freedom (df).
* We can calculate it ourselves though!
* For an independent-samples t test, df is calculated as follows:
    * df = n1 + n2 - 2

In [49]:
df = len(elder_wand) + len(regular_wand) - 2
df
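
As a cross-check, the t statistic can be recomputed from the descriptive statistics alone with the standard pooled-variance formula, which is what `ttest_ind()` uses by default (`equal_var=True`). The sketch below assumes the rounded values from Section 2.1; names such as `pooled_var` and `t_manual` are illustrative.

```python
import math
from scipy import stats

# Descriptive statistics copied from Section 2.1
n1, mean1, std1 = 20, 21.1, 2.174009   # Elder Wand
n2, mean2, std2 = 20, 15.6, 1.788854   # Regular Wand

df = n1 + n2 - 2

# Pooled variance: each group's variance weighted by its own degrees of freedom
pooled_var = ((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / df

# Standard error of the difference between the two means
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (mean1 - mean2) / se_diff

# Two-tailed p-value from the t distribution with df degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df)

# SciPy offers the same computation from summary statistics
t_check, p_check = stats.ttest_ind_from_stats(mean1, std1, n1, mean2, std2, n2)

print(df, t_manual, p_manual)
```

Up to rounding of the inputs, `t_manual` should match the t statistic returned by `ttest_ind()` on the raw scores.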

In [70]:
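def report_independent_t(t, p, df):
    # A p-value below .001 is reported as "p<.001" rather than the misleading "p=0.000"
    p_text = "p<.001" if p < 0.001 else "p=%.3f" % p
    print("t(%d)=%.2f, %s" % (df, t, p_text))

In [71]:
report_independent_t(tstatistic, pvalue, df)

t(38)=8.74, p<.001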


<a id="4"></a>

## 4. Assumption Checks
* The independent-samples t test makes two assumptions:
    * Assumption of equality of variances
    * Assumption of normality


### 4.1. Checking for Equality of Variances
* Levene’s Test of Equality of Variances
    * Use the **levene()** function in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/stats.html

* **levene()** returns a tuple-like result containing the test statistic and p-value
    * You want a non-significant result from an assumption check (p > .05), which means there is no evidence that the assumption is violated

In [80]:
levene_results = stats.levene(elder_wand, regular_wand)
levene_results

LeveneResult(statistic=0.9296636085626906, pvalue=0.34104653698915954)
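
Here p = .341 is above .05, so there is no evidence that the variances differ and the standard t test from Section 3 applies. If Levene's test were significant, one commonly used alternative is Welch's t test, which `ttest_ind()` runs when `equal_var=False` is passed. The sketch below is an assumption about how that decision could be automated; it is not part of the original analysis and uses hypothetical data.

```python
from scipy import stats

# Hypothetical scores for two independent groups (not the course dataset)
group_a = [17, 18, 20, 22, 22, 21, 23, 19]
group_b = [16, 12, 15, 14, 17, 13, 15, 16]

levene_stat, levene_p = stats.levene(group_a, group_b)

# Non-significant Levene's test: keep the standard (pooled-variance) t test.
# Significant Levene's test: fall back to Welch's t test via equal_var=False.
equal_variances = bool(levene_p > 0.05)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_variances)

print(equal_variances, t_stat, p_value)
```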

### 4.2. Checking for Normality
* Shapiro-Wilk Test of Normality
    * Use the **shapiro()** function in SciPy's stats module
    * https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.shapiro.html#scipy.stats.shapiro

* **shapiro()** returns a tuple containing the test statistic (W) and the p-value
    * You want a non-significant result from an assumption check (p > .05), which means there is no evidence that the assumption is violated

* **shapiro()** needs the entire distribution of scores (the raw data) as input

In [82]:
shapiro_results = stats.shapiro(data["Candles"])
shapiro_results

(0.9628366827964783, 0.20917081832885742)
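
The cell above tests all scores together. A variant that some analysts prefer is to run the test on each group separately, since the groups may have different means. The following is a minimal sketch of that variant on hypothetical data; it is an alternative for illustration, not the procedure used in this notebook.

```python
from scipy import stats

# Hypothetical scores for two independent groups (not the course dataset)
groups = {
    "group_a": [17, 18, 20, 22, 22, 21, 23, 19],
    "group_b": [16, 12, 15, 14, 17, 13, 15, 16],
}

shapiro_by_group = {}
for name, scores in groups.items():
    # shapiro() returns the W statistic and the p-value
    w_stat, p_value = stats.shapiro(scores)
    shapiro_by_group[name] = (w_stat, p_value)
    verdict = "no evidence against normality" if p_value > 0.05 else "normality is questionable"
    print(name, w_stat, p_value, verdict)
```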

<a id="5"></a>

  
## 5. Reporting the Results
* Report
    * descriptives
    * assumption checks
    * t statistic, degrees of freedom, and p-value
    * a bar graph

In [176]:
descriptives

| | Group | count | mean | std | sem |
|---|---|---|---|---|---|
| 0 | Elder Wand | 20 | 21.1 | 2.174009 | 0.486123 |
| 1 | Regular Wand | 20 | 15.6 | 1.788854 | 0.4 |


In [83]:
tstatistic, pvalue, df

(8.736590939739596, 1.2666542976485636e-10, 38)

In [85]:
levene_results

LeveneResult(statistic=0.9296636085626906, pvalue=0.34104653698915954)

In [86]:
shapiro_results

(0.9628366827964783, 0.20917081832885742)

In [177]:
graph.show()