# P1: The Stroop effect experiment

The Stroop effect is a phenomenon reported in the 1930s by J. Ridley Stroop. It is a demonstration of interference in the reaction time of a task. He observed that naming the font color of a printed word is easier and quicker when the word's meaning and its font color are congruent. If both words are printed in red, the average time to say "RED" in response to the word 'RED' is shorter than the time to say "RED" in response to the word 'BLUE'. (*from [Wikipedia](https://en.wikipedia.org/wiki/Stroop_effect)*)


![stroop effect](img/stroop_effect.gif)

## The project

This is project 1 of Udacity's Data Analyst Nanodegree. The objective is to apply descriptive and inferential statistics knowledge to assess the Stroop effect experiment on 24 subjects. Using inferential statistics concepts, such as t-tests and hypothesis testing, it will be possible to confirm (or not) this effect in the given dataset.

## The experiment

Each participant in the experiment was given two lists, one congruent and one incongruent. The time taken to read each list out loud was recorded and saved in the dataset. So we have the performance (in seconds) of the 24 participants on the two types of list.

## Questions

#### 1. What is the independent variable in the experiment? What is the dependent variable?

* _The independent variable in this experiment is the test condition, represented by the list. Two types of lists were used: congruent and incongruent. Congruent lists have matching word and color, like the word 'RED' in red ink, while incongruent lists have a mismatch, for example the word 'RED' in blue ink._
* _The dependent variable is the time, in seconds, that the subject took to read the list._

#### 2. What could be the null and alternative hypothesis for this experiment? What statistical test will you use?

### Hypothesis testing

   * *The **null hypothesis ($H_0$)** is that the type of list (congruent or incongruent) does not interfere with the reaction time of the task: $\mu_c = \mu_i$ ($\mu_c$ is the population mean reaction time for reading the congruent list and $\mu_i$ is the population mean reaction time for reading the incongruent list).*
   * *The **alternative hypothesis ($H_a$)** is that incongruent lists take more time to read than congruent ones. Therefore, it is a one-tailed test, since it only checks whether $\mu_c < \mu_i$.*
   
       * $H_0: \mu_c = \mu_i$
       * $H_a: \mu_c < \mu_i$
        

    
### Statistical test
   * *The design of the experiment suggests that the best approach to evaluate the result is a **dependent t-test for paired samples**, because each subject was given both conditions (congruent and incongruent list) and their performance under the two is compared.*
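
Because a paired test works on each subject's difference between the two conditions, the same hypotheses can be restated in terms of a mean difference. The symbol $\mu_d$ is introduced here only for this restatement and is not part of the original notation. It is the population counterpart of $\bar{x_d}$, and the value $0$ under $H_0$ is the "$- 0$" term in the t formula used later.

$$\mu_d = \mu_i - \mu_c, \qquad H_0: \mu_d = 0, \qquad H_a: \mu_d > 0$$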
    
_To run a t-test, some criteria must be met, as verified below._
       
* **[t-test assumptions](http://www.statisticssolutions.com/manova-analysis-paired-sample-t-test/):**
    1. The dependent variable must be continuous (interval/ratio)
         * _this criterion is met because the time (in seconds) is on a continuous scale_
    2. The observations are independent of one another and the sample was randomly selected
         * _it's reasonable to assume that the subjects were randomly selected in this experiment and that they are independent of each other_
    3. The dependent variable should be approximately normally distributed
         * _from the graphs below it's possible to see that the histogram of congruent reading times looks approximately normal, while the histogram of incongruent reading times is bell-shaped but positively skewed_
            ![histogram](img/histogram.png)
    4. The dependent variable should not contain any outliers
         * _from the boxplot of the test conditions it's possible to identify outliers in the incongruent condition. This should be highlighted, since outliers can bias the results and potentially lead to incorrect conclusions if not handled properly. However, removing data points can introduce other types of bias into the results, and potentially result in losing critical information (from [statistics solutions](http://www.statisticssolutions.com/manova-analysis-paired-sample-t-test/)). Therefore, it will be assumed that these outliers do not have much influence on the t-test results_
            ![boxplot](img/boxplot.png)
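
The outlier check in item 4 can be sketched with the usual boxplot rule, which flags values more than 1.5 × IQR beyond the quartiles. This is only an illustration: the helper name `iqr_outliers` is hypothetical, the quartile method is an assumption (plotting libraries may compute quartiles slightly differently), and the sample data are just the ten incongruent times displayed in the dataset section, not the full 24, so the result need not match the boxplot above.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Incongruent times for the ten rows displayed in this document
# (assumption: illustration only, the full sample has 24 subjects).
incongruent_shown = [19.278, 18.741, 21.214, 15.687, 22.803,
                     20.878, 24.572, 17.394, 20.762, 26.282]

print(iqr_outliers(incongruent_shown))
```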

#### 3. Report some descriptive statistics about the dataset. At least one measure of central tendency and one measure of variability

### The dataset

    import pandas as pd

    df = pd.read_csv('../documents/stroop_effect.csv')
    df

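| Unnamed: 0 | Congruent | Incongruent |
|---|---|---|
| 0 | 12.079 | 19.278 |
| 1 | 16.791 | 18.741 |
| 2 | 9.564 | 21.214 |
| 3 | 8.63 | 15.687 |
| 4 | 14.669 | 22.803 |
| 5 | 12.238 | 20.878 |
| 6 | 14.692 | 24.572 |
| 7 | 8.987 | 17.394 |
| 8 | 9.401 | 20.762 |
| 9 | 14.48 | 26.282 |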


### Descriptive Statistics

* **Congruent list**
    * _Measures of central tendency_
        * _mean:_ $\bar{x_c} = 14.05$
        * _median:_ $Md_c = 14.36$
        * _mode: no value occurs more than once, but from the histogram it's possible to see that most occurrences are between 14.00 and 17.00_
        <p>
    * _Measures of variability_
        * _variance:_ $S_c^2 = 12.67$
        * _standard deviation:_ $S_c = 3.56$
        
        <p>
* **Incongruent list**
    * _Measures of central tendency_
        * _mean:_ $\bar{x_i} = 22.02$
        * _median:_ $Md_i = 21.02$
        * _mode: no value occurs more than once, but from the histogram it's possible to see that most occurrences are between 18.69 and 21.69_
        <p>
    * _Measures of variability_
        * _variance:_ $S_i^2 = 23.01$
        * _standard deviation:_ $S_i = 4.80$
        <p>
* **Difference between Incongruent and Congruent**
    * _Measures of central tendency_
        * _mean:_ $\bar{x_d} = 7.96$
        <p>
    * _Measures of variability_
        * _standard deviation:_ $S_d = 4.86$
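
As a rough sketch of how the measures above are obtained, the snippet below computes the mean, median, sample variance and sample standard deviation (with an $n - 1$ denominator) using only the standard library. The helper name `describe` is hypothetical, and the data are only the ten rows displayed in the dataset section, so the printed values will differ from the ones reported for all 24 subjects.

```python
import statistics

# The ten rows displayed in this document (assumption: illustration only,
# the reported statistics are based on all 24 subjects).
congruent = [12.079, 16.791, 9.564, 8.63, 14.669,
             12.238, 14.692, 8.987, 9.401, 14.48]
incongruent = [19.278, 18.741, 21.214, 15.687, 22.803,
               20.878, 24.572, 17.394, 20.762, 26.282]


def describe(values):
    """Mean, median, sample variance and sample standard deviation (n - 1 denominator)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "std": statistics.stdev(values),
    }


# Paired design: one difference (incongruent - congruent) per subject.
differences = [i - c for c, i in zip(congruent, incongruent)]

for label, values in [("congruent", congruent),
                      ("incongruent", incongruent),
                      ("difference", differences)]:
    summary = describe(values)
    print(label, {name: round(value, 3) for name, value in summary.items()})
```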

#### 4. Show one or two visualizations that show the distribution of the data. Share your thoughts about it

_line chart comparing the congruent and incongruent times_

![line chart](img/line_chart.png)

_From this line chart it is possible to see that all the subjects of the experiment took more time to read the incongruent list than the congruent one. This can be interpreted as a good indicator for the alternative hypothesis_ ($H_a$) _but it is necessary to confirm whether the longer time taken is statistically significant._

#### 5. Present your statistic test results

### Inputs for the t-test

- $\bar{x_d} = \bar{x_i} - \bar{x_c} = 22.016 - 14.051 = 7.965$
    * $\bar{x_i}$ _as the mean of the times to read the incongruent list_
    * $\bar{x_c}$ _as the mean of the times to read the congruent list_

- $n = 24; df = 23$
    * $n$ _representing the 24 subjects that participated in the experiment_

- $S_d = 4.865$
    * $S_d$ _as the sample standard deviation of the difference of the times to read the incongruent and congruent lists_

- $\alpha = .05$ _and Confidence level = 95%_

- [t-critical value](https://s3.amazonaws.com/udacity-hosted-downloads/t-table.jpg): $t_c(\alpha = .05, df = 23) = 1.714$

### Running t-test

- **t-test**

###  $t = \frac{\bar{x_d} - 0}{\frac{S_d}{\sqrt{n}}} = \frac{{7.965} - 0}{\frac{4.865}{\sqrt{24}}} = 8.021$
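
The arithmetic above can be reproduced from the summary statistics alone. This is a minimal sketch: the function name `paired_t_statistic` is hypothetical, and the inputs are the rounded values reported in this document rather than the raw data.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n, mu0=0.0):
    """t statistic for a paired-samples t-test, computed from summary statistics."""
    standard_error = sd_diff / math.sqrt(n)
    return (mean_diff - mu0) / standard_error


# Rounded values reported in this document.
t = paired_t_statistic(mean_diff=7.965, sd_diff=4.865, n=24)
t_critical = 1.714  # one-tailed, alpha = .05, df = 23, from the t-table cited above

print(f"t = {t:.3f}")
print("reject H0" if t > t_critical else "fail to reject H0")
```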

- **[p-value](http://www.graphpad.com/quickcalcs/pValue1/)**

   ### $p < 0.0001$

- **Cohen's d**

 ### $d = \frac{\bar{x_d} - 0}{S_d} = \frac{7.965}{4.865} = 1.637$

- **Coefficient of determination**

 ### $r^2 = \frac{t^2}{t^2 + df} = \frac{8.021^2}{8.021^2 + 23} = 0.737$
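
Both effect-size measures follow directly from the numbers already computed. The sketch below uses hypothetical helper names (`cohens_d`, `r_squared`) and the rounded values from this document as inputs.

```python
def cohens_d(mean_diff, sd_diff):
    """Cohen's d for paired samples: mean difference in units of its standard deviation."""
    return mean_diff / sd_diff


def r_squared(t, df):
    """Coefficient of determination derived from a t statistic and its degrees of freedom."""
    return t ** 2 / (t ** 2 + df)


# Rounded values reported in this document.
d = cohens_d(mean_diff=7.965, sd_diff=4.865)
r2 = r_squared(t=8.021, df=23)

print(f"d = {d:.3f}, r^2 = {r2:.3f}")
```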

- **Confidence Interval (built with $t_c = 1.714$)**

 ### $CI = (\bar{x_d} - t_c * \frac{S_d}{\sqrt{n}}) , (\bar{x_d} + t_c * \frac{S_d}{\sqrt{n}}) = (7.965 - 1.714 * \frac{4.865}{\sqrt{24}}) , (7.965 + 1.714 * \frac{4.865}{\sqrt{24}})$
 
 ### $CI = (6.263 , 9.667)$

   _Because $t_c = 1.714$ is the one-tailed critical value for $\alpha = .05$, using it on both sides leaves 5% in each tail, so this two-sided interval has a 90% confidence level._
 

### Reporting results ([APA Style](http://my.ilstu.edu/~jhkahn/apastats.html))

$t(23) = 8.02, p < 0.0001$, one-tailed

$CI .90 = (6.26, 9.67)$

$d = 1.64$

$r^2 = .74$

[calculations on the spreadsheet](P1.xlsx)
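
As a cross-check alongside the spreadsheet, the interval bounds reported above can be recomputed from the summary statistics. This is a sketch under stated assumptions: the function name `mean_diff_interval` is hypothetical, and the critical value is passed in explicitly so that the effect of choosing a different one is easy to see.

```python
import math


def mean_diff_interval(mean_diff, sd_diff, n, t_critical):
    """Interval mean_diff +/- t_critical * (sd_diff / sqrt(n))."""
    margin = t_critical * sd_diff / math.sqrt(n)
    return mean_diff - margin, mean_diff + margin


# Rounded values reported in this document. 1.714 is the one-tailed critical
# value used in the text; a larger critical value would give a wider interval.
low, high = mean_diff_interval(mean_diff=7.965, sd_diff=4.865, n=24, t_critical=1.714)

print(f"({low:.3f}, {high:.3f})")
```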

### Conclusions

_To reject_ $H_0$ _we need evidence that the difference between the times taken to read the congruent and incongruent lists is not due to chance (i.e. it is caused by the type of list). The probability of obtaining t = 8.021 or larger with df = 23 if the null hypothesis were true is smaller than 0.0001, much lower than our alpha level of 0.05. Therefore, we have statistically significant evidence that the type of list influences the time taken to read it, and we reject the null hypothesis_ $H_0$.

_Furthermore, 74% of the variation in the time taken to read a list can be explained by the type of list (congruent or incongruent)._

_Visually, we can compare our t-critical value (for $\alpha = .05$ and $df = 23$) with the t-statistic from the dependent t-test for paired samples on the graph below. It shows how much less likely it was to get the t-statistic by chance than to get the t-critical value for 95% confidence._

![t-test hypothesis](img/t_test_hypothesis.png)

#### 6. What can explain this effect?

_There are several theories that explain the Stroop effect. The main idea is that the brain processes colors and words differently, and that this affects the time taken to read the list when the words and colors are mismatched. Variations on the mismatched lists are suggested on the University of Washington Stroop effect [page](https://faculty.washington.edu/chudler/words.html#seffect), such as rotating the words, using non-color words, nonsense words, etc._

_If we assume that the Stroop effect shows competition for attention in the brain, performing tasks in parallel should degrade performance, like driving while texting, studying with the TV on, and so on._

<p>

_But... does it apply to us all?_

![juggler](img/juggler.jpg)

## References

* [Stroop effect on University of Washington](https://faculty.washington.edu/chudler/words.html#seffect)
* [Stroop effect on Wikipedia](https://en.wikipedia.org/wiki/Stroop_effect)
* [t-test assumptions on Investopedia](http://www.investopedia.com/ask/answers/073115/what-assumptions-are-made-when-conducting-ttest.asp)
* [Paired sample t-tests on statistics solutions](http://www.statisticssolutions.com/manova-analysis-paired-sample-t-test/)
* [t-table](https://s3.amazonaws.com/udacity-hosted-downloads/t-table.jpg)
* [Calculating p-value on Graphpad](http://www.graphpad.com/quickcalcs/pValue1/)
* [APA Style for statistics](http://my.ilstu.edu/~jhkahn/apastats.html)