### Analyzing the Stroop Effect
Perform the analysis in the space below. Remember to follow [the instructions](https://docs.google.com/document/d/1-OkpZLjG_kX9J6LIQ5IltsqMzVWjh36QpnP2RYpVdPU/pub?embedded=True) and review the [project rubric](https://review.udacity.com/#!/rubrics/71/view) before submitting. Once you've completed the analysis and write up, download this file as a PDF or HTML file and submit in the next section.


(1) What is the independent variable? What is the dependent variable?

The dependent variable, the one we expect to change, is:
- the time it takes to name the ink colours

The independent variable, manipulated by the experimenter, is:
- the word condition: congruent (the colour word matches the ink colour it is printed in) or incongruent (it does not)

Other factors controlled by the experimenter include the number of words displayed and the number of different colours used.


(2) What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.

H0: There is no difference between the population mean response times under the congruent and incongruent conditions:
$H_0: \mu_C = \mu_I$

Ha: The population mean response times under the congruent and incongruent conditions differ:
$H_a: \mu_C \neq \mu_I$

Here $\mu_C$ and $\mu_I$ denote the population mean response times for the congruent and incongruent conditions, respectively.

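We do not know the population standard deviation; we only have 24 paired observations from the congruent and incongruent tests, so a z-test is not appropriate. A t-test, which estimates the standard error from the sample itself, suits this situation best, especially with such a small sample.
I will perform a two-tailed paired-samples (dependent) t-test, because both means come from the same group of people: each participant completed both conditions, so the two sets of measurements are not independent. The test is two-tailed because $H_a$ only states that the means differ, and it assumes the per-participant differences are approximately normally distributed.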
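Written out, the paired t-test is a one-sample t-test on the per-participant differences. Using $C_i$ and $I_i$ for participant $i$'s congruent and incongruent response times (notation introduced here for illustration), the statistic computed in step (5) is:

$$D_i = C_i - I_i, \qquad \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad s_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(D_i - \bar{D}\right)^2}$$

$$t = \frac{\bar{D}}{s_D/\sqrt{n}}, \qquad \text{df} = n - 1 = 23$$

Under $H_0$ the mean difference $\mu_D = \mu_C - \mu_I$ is zero, and at $\alpha = .05$ the null hypothesis is rejected when $|t| > t_{0.025,\,23}$.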

(3) Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability. The name of the data file is 'stroopdata.csv'.

In [1]:
import pandas as pd
import numpy as np
import plotly.offline as py
import plotly.graph_objs as go
py.init_notebook_mode(connected=True) #https://plot.ly/python/getting-started/
#z-table https://s3.amazonaws.com/udacity-hosted-downloads/ZTable.jpg
#https://stackoverflow.com/questions/20864847/probability-to-z-score-and-vice-versa-in-python
import scipy.stats as st

url = 'stroopdata.csv'  # local path to the data file
df = pd.read_csv(url)
print(df)

# Measures of central tendency (computed per column)
print("Mean:", df.mean())
print("Median congruent:", np.percentile(df.Congruent, 50))
print("Median incongruent:", np.percentile(df.Incongruent, 50))
# Measure of variability: sample standard deviation with Bessel's correction (ddof=1)
print("Standard deviation of the mean sample:", df.std(ddof=1))

# Tukey outlier fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are potential outliers
def outlier_fences(Q1, Q3):
    IQR = Q3 - Q1  # interquartile range
    outlier_min = Q1 - 1.5*IQR
    outlier_max = Q3 + 1.5*IQR
    return [outlier_min, outlier_max]

Q1, Q3 = np.percentile(df.Congruent, [25, 75])
print("IQR Congruent min - max:", outlier_fences(Q1, Q3))

Q1, Q3 = np.percentile(df.Incongruent, [25, 75])
print("IQR Incongruent min - max:", outlier_fences(Q1, Q3))



    Congruent  Incongruent
0      12.079       19.278
1      16.791       18.741
2       9.564       21.214
3       8.630       15.687
4      14.669       22.803
5      12.238       20.878
6      14.692       24.572
7       8.987       17.394
8       9.401       20.762
9      14.480       26.282
10     22.328       24.524
11     15.298       18.644
12     15.073       17.510
13     16.929       20.330
14     18.200       35.255
15     12.130       22.158
16     18.495       25.139
17     10.639       20.429
18     11.344       17.425
19     12.369       34.288
20     12.944       23.894
21     14.233       17.960
22     19.710       22.058
23     16.004       21.157
Mean: Congruent      14.051125
Incongruent    22.015917
dtype: float64
Median congruent: 14.3565
Median incongruent: 21.0175
Standard deviation of the mean sample: Congruent      3.559358
Incongruent    4.797057
dtype: float64
IQR Congruent min - max: [5.4370000000000029, 22.658999999999999]
IQR Incongruent min - max: [10.7

Measures of central tendency:

Mean: 
Congruent      14.051125
Incongruent    22.015917

Median:
Congruent: 14.3565
Incongruent: 21.0175

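Measures of variability:
Sample standard deviation (using Bessel's correction, ddof=1):
Congruent      3.559358
Incongruent    4.797057

Interquartile range (Q3 - Q1):
Congruent: 4.31
Incongruent: 5.33

Tukey outlier fences [Q1 - 1.5 IQR, Q3 + 1.5 IQR], printed above as "IQR ... min - max":
Congruent: [5.437, 22.659]
Incongruent: [10.715, 32.054]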
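The fences above only become useful once the data are checked against them. As a minimal sketch, the snippet below recomputes the quartiles with NumPy's default linear interpolation and lists any response times outside the fences. The arrays are copied from the printed `stroopdata.csv` output; the helper name `tukey_fences` and the conventional multiplier `k=1.5` are illustrative choices, not part of the original analysis.

```python
import numpy as np

# Response times copied from the stroopdata.csv output above (24 participants)
congruent = np.array([12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
                      9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
                      18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004])
incongruent = np.array([19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                        20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
                        25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157])

def tukey_fences(x, k=1.5):
    """Return (IQR, lower fence, upper fence) using linear-interpolated quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

for name, x in [("Congruent", congruent), ("Incongruent", incongruent)]:
    iqr, lo, hi = tukey_fences(x)
    # Boolean mask selects values below the lower or above the upper fence
    outliers = sorted(x[(x < lo) | (x > hi)].tolist())
    print(f"{name}: IQR={iqr:.2f}, fences=[{lo:.3f}, {hi:.3f}], outliers={outliers}")
```

With this data the sketch flags no congruent times and two incongruent times (34.288 and 35.255), which are the values that stretch the right tail of the incongruent distribution.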


(4) Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.

In [2]:
# Histogram of the congruent response times
data = df.Congruent
h_data = [go.Histogram(x=data)]
py.iplot(h_data)

# Histogram of the incongruent response times
data = df.Incongruent
h_data = [go.Histogram(x=data)]
py.iplot(h_data)

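The congruent response times are roughly normally distributed, peaking near the mean and median of approximately 14.
The incongruent response times are positively (right) skewed: the mean (22.02) lies to the right of the median (21.02), pulled up by two long response times (34.288 and 35.255) that lie above the upper outlier fence of 32.054.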
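To back the visual impression with a number, the sketch below computes the sample skewness of each condition with `scipy.stats.skew` (passing `bias=False` for the sample-adjusted estimator) alongside the mean-minus-median gap. The arrays are copied from the printed `stroopdata.csv` output; a positive skewness indicates a longer right tail.

```python
import numpy as np
import scipy.stats as st

# Response times copied from the stroopdata.csv output above (24 participants)
congruent = np.array([12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
                      9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
                      18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004])
incongruent = np.array([19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                        20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
                        25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157])

# Adjusted Fisher-Pearson sample skewness for each condition
skew_c = st.skew(congruent, bias=False)
skew_i = st.skew(incongruent, bias=False)

# Mean minus median: a positive gap also points to a right-skewed distribution
gap_c = congruent.mean() - np.median(congruent)
gap_i = incongruent.mean() - np.median(incongruent)

print(f"Congruent:   skewness={skew_c:.3f}, mean - median={gap_c:.3f}")
print(f"Incongruent: skewness={skew_i:.3f}, mean - median={gap_i:.3f}")
```

Histogram shape depends on bin width, so a numeric summary like this is a useful complement to the plots.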

(5) Now, perform the statistical test and report the results. What is the confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?

In [12]:
# Paired-samples t-test, two-tailed

# 1. Per-participant differences (Congruent - Incongruent)
df_D = df.Congruent - df.Incongruent
n = len(df_D)
degf = n - 1  # degrees of freedom
mn_D = np.mean(df_D)  # mean difference

# 2.-5. Sample standard deviation of the differences: square each deviation from the
# mean difference, sum them, divide by n - 1 (sample variance), then take the square root
S_D = np.std(df_D, ddof=1)

# t-statistic: mean difference divided by its standard error
t = mn_D / (S_D / n**.5)
print("t-value:", t)

# Lower critical value of a two-tailed test at alpha = .05 (95% confidence level)
alpha = .05
t_c = st.t.ppf(alpha/2, degf)
print("t-critical value:", t_c)

# Cross-check with SciPy's paired t-test
st.ttest_rel(df.Congruent, df.Incongruent)


t-value: -8.020706944109957
t-critical value: -2.06865761042


Ttest_relResult(statistic=-8.020706944109957, pvalue=4.1030005857111781e-08)

At a 95% confidence level (α = .05, two-tailed, df = 23), the t-statistic of -8.0207 is far below the lower critical value of -2.0687.
The p-value (about 4.1e-08) is far below the alpha level of .05.
There is a statistically significant difference between the two conditions.
Therefore I reject the null hypothesis.
My conclusion is that naming the ink colours takes significantly longer for incongruent words than for congruent words, which matches my expectations.
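A confidence interval for the mean difference complements the reject/fail-to-reject decision by showing how large the slowdown plausibly is. The sketch below reuses the 95% confidence level and two-tailed critical value from the test above; the arrays are copied from the printed `stroopdata.csv` output, and the variable names are illustrative.

```python
import numpy as np
import scipy.stats as st

# Response times copied from the stroopdata.csv output above (24 participants)
congruent = np.array([12.079, 16.791, 9.564, 8.630, 14.669, 12.238, 14.692, 8.987,
                      9.401, 14.480, 22.328, 15.298, 15.073, 16.929, 18.200, 12.130,
                      18.495, 10.639, 11.344, 12.369, 12.944, 14.233, 19.710, 16.004])
incongruent = np.array([19.278, 18.741, 21.214, 15.687, 22.803, 20.878, 24.572, 17.394,
                        20.762, 26.282, 24.524, 18.644, 17.510, 20.330, 35.255, 22.158,
                        25.139, 20.429, 17.425, 34.288, 23.894, 17.960, 22.058, 21.157])

d = congruent - incongruent          # paired differences (Congruent - Incongruent)
n = len(d)
mean_d = d.mean()
se_d = d.std(ddof=1) / np.sqrt(n)    # standard error of the mean difference
t_crit = st.t.ppf(0.975, n - 1)      # upper critical value, two-tailed, alpha = .05

ci = (mean_d - t_crit * se_d, mean_d + t_crit * se_d)
print(f"mean difference = {mean_d:.4f}, t = {mean_d / se_d:.4f}")
print(f"95% CI for the mean difference: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Because the whole interval lies below zero, it agrees with the decision to reject $H_0$ at α = .05.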