# Day 3 Exercises (Pandas + Seaborn)

In this set of exercises, you will perform part of the analysis from [Widge et al. (2019)](https://www.nature.com/articles/s41467-019-09557-4), *Deep brain stimulation of the internal capsule enhances human cognitive control and prefrontal cortex function*. In this paper, the authors found that deep brain stimulation (DBS) of the ventral striatum improved performance on a task of cognitive flexibility in a sample of 14 patients with severe depression. Specifically, response times on the task decreased (i.e., patients responded faster) when their DBS devices were turned on.

To recreate some of the analyses and figures, you will use Pandas, Seaborn, and `scipy.stats`.

### Methods
To probe cognitive flexibility, the authors employed a modified version of the Multi-Source Interference Task (MSIT). The MSIT requires subjects to identify which of a set of three numbers differs from its neighbors. Subjects must keep three fingers of their right hand positioned over response keys corresponding to the digits 1-3. In **Control** (non-interference) trials, the target is in the same spatial position as its corresponding response key, and the flanking digits are not valid responses (i.e., they are 0s). In **Interference** trials, the target is out of position relative to its corresponding response key and is flanked by other viable targets.

Each block of trials contained 72 Control and 72 Interference trials. To prevent response sets or habituation, trial sequence in each block was pseudo-randomized so that subjects never had more than two trials in a row that shared the same interference level or desired response finger. This highly interleaved trial design was expected to place greater demands on cognitive control systems by reducing predictability of the stimuli. 

Patients performed this task twice: first with their DBS device turned **ON**, and then with their DBS device turned **OFF**.

### Data

The file *dbs.csv* contains the raw behavioral data from the 14 patients. The data is organized into six columns:

- *Subject:* the unique subject identifier.
- *Trial:* the trial number per block.
- *DBS:* the status of the DBS device (ON = 1, OFF = 0).
- *Interference:* the type of trial (Control = 0, Interference = 1).
- *Accuracy:* the accuracy of the response of the trial (Correct = 1, Incorrect = 0)
- *RT:* the response time on that trial (in seconds).

## Section 1: Preprocessing

a) Read in the DataFrame from the CSV file, *dbs.csv*.
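
As a minimal sketch of the loading step, the example below reads a few made-up rows from an in-memory buffer instead of the real file. The rows and the name `toy_csv` are hypothetical and only mimic the six-column layout described under Data; with the actual file, the path *dbs.csv* would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical rows that mimic the six-column layout described above.
# The values are made up for illustration and are NOT from dbs.csv.
toy_csv = io.StringIO(
    "Subject,Trial,DBS,Interference,Accuracy,RT\n"
    "1,1,1,0,1,0.612\n"
    "1,2,1,1,1,0.845\n"
    "2,1,0,0,0,0.530\n"
)

# With the real data this would be: data = pd.read_csv("dbs.csv")
data = pd.read_csv(toy_csv)

# Quick sanity checks on what was loaded.
print(data.head())
print(data.dtypes)
```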

b) Using `DataFrame.value_counts`, confirm there are 14 patients with 288 trials each.
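
One way to approach this check is sketched below on a toy DataFrame with 3 hypothetical subjects and 4 trials each (not the real data). It uses the column-level `Series.value_counts`, which counts how many rows carry each subject identifier; since each row is one trial, the counts are trials per subject.

```python
import pandas as pd

# Toy data: 3 hypothetical subjects with 4 trials each (not the real dataset).
toy = pd.DataFrame({
    "Subject": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Trial": [1, 2, 3, 4] * 3,
})

# Count rows per subject; each row corresponds to one trial.
trials_per_subject = toy["Subject"].value_counts()

# Number of distinct subjects, and whether every subject has the expected count.
n_subjects = len(trials_per_subject)
all_equal = (trials_per_subject == 4).all()
print(n_subjects, all_equal)
```

For the real data, the expected values would be 14 subjects and 288 trials in place of 3 and 4.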

c) Using `DataFrame.groupby`, compute the average accuracy per participant. Should any participant be excluded (e.g. lower than 70% accuracy)?
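
The groupby pattern can be sketched on toy data as follows. The two subjects and their accuracy values are invented for illustration; because accuracy is coded 0/1, the per-subject mean is the proportion of correct trials.

```python
import pandas as pd

# Toy data: two hypothetical subjects with 4 trials each.
toy = pd.DataFrame({
    "Subject": [1, 1, 1, 1, 2, 2, 2, 2],
    "Accuracy": [1, 1, 1, 0, 1, 0, 0, 1],
})

# Mean of a 0/1 column per subject = proportion of correct responses.
mean_accuracy = toy.groupby("Subject")["Accuracy"].mean()

# Subjects falling below the 70% threshold mentioned in the exercise.
low = mean_accuracy[mean_accuracy < 0.70]
print(mean_accuracy)
print(low.index.tolist())
```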

d) Response times for trials with incorrect responses are typically biased. In other words, they tend to be systematically faster than the average response (i.e. fast errors) or slower than the average response (i.e. slow errors).

Remove all rows corresponding to trials with incorrect responses. Confirm no incorrect responses remain.

e) Similarly, participants occasionally make unrealistically fast responses (e.g. < 300 ms). These typically correspond to accidental button presses or slips of the finger.

Remove all rows corresponding to trials with response times faster than 300 ms. Confirm the minimum response time per participant is above this threshold.
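
Both filtering steps (incorrect trials, then too-fast responses) rely on boolean masks. The sketch below applies them to invented toy rows; note that RT is stored in seconds, so the 300 ms threshold becomes 0.3.

```python
import pandas as pd

# Toy data (made up): one error trial per subject and one very fast response.
toy = pd.DataFrame({
    "Subject": [1, 1, 1, 2, 2, 2],
    "Accuracy": [1, 0, 1, 1, 1, 0],
    "RT": [0.65, 0.40, 0.25, 0.72, 0.58, 0.90],
})

# Step 1: keep only trials with correct responses.
clean = toy[toy["Accuracy"] == 1]

# Step 2: drop responses faster than 300 ms (RT is in seconds).
clean = clean[clean["RT"] >= 0.3]

# Confirm: no incorrect trials remain, and each subject's fastest RT is >= 0.3 s.
print((clean["Accuracy"] == 0).sum())
print(clean.groupby("Subject")["RT"].min())
```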

f) Reaction times are typically right-skewed. That is, on average, a distribution of reaction times is asymmetric with more responses falling on the slower side.

Using `sns.FacetGrid` (or any other plotting method of your choosing), plot the RT distribution per participant. Confirm that all (or most) RT distributions are right-skewed.
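
Alongside the plots, a numerical check can help: sample skewness is positive for right-skewed distributions. The sketch below is an optional complement, not a replacement for the plotting exercise. It uses simulated log-normal RTs for three hypothetical subjects (the distribution and its parameters are assumptions chosen only to produce right-skewed toy data).

```python
import numpy as np
import pandas as pd

# Simulated, right-skewed toy RTs for 3 hypothetical subjects (not real data).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Subject": np.repeat([1, 2, 3], 500),
    "RT": rng.lognormal(mean=-0.5, sigma=0.4, size=1500),
})

# Sample skewness per subject; values above 0 indicate a longer right tail.
skew_per_subject = toy.groupby("Subject")["RT"].skew()
print(skew_per_subject)
```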

g) To correct for right-skew, some analysts apply a log transform to reaction time data. 

Make a new column in the DataFrame, **logRT**, that contains the log-transformed RT data.
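
A minimal sketch of adding a transformed column is shown below on three made-up RT values. It assumes the natural logarithm (`np.log`); the exercise does not specify a base, so treat that choice as an assumption.

```python
import numpy as np
import pandas as pd

# Toy RTs in seconds (made up for illustration).
toy = pd.DataFrame({"RT": [0.5, 1.0, 2.0]})

# Natural log, applied element-wise to the whole column.
toy["logRT"] = np.log(toy["RT"])
print(toy)
```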

h) Write a z-score function. The z-score is defined as:

$$ Z = \frac{X - \text{mean}(X)}{\text{sd}(X)} $$
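
One possible implementation of this formula is sketched below. The function name `zscore` is an illustrative choice, and the sketch assumes the population standard deviation (NumPy's default, `ddof=0`); the formula above does not say which convention is intended, and pandas' own `.std()` defaults to the sample version (`ddof=1`).

```python
import numpy as np


def zscore(x):
    """Return (x - mean(x)) / sd(x), using the population sd (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


# After z-scoring, the values have mean 0 and standard deviation 1.
z = zscore([1.0, 2.0, 3.0])
print(z, z.mean(), z.std())
```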

i) Apply the z-score transformation to the log reaction times (logRT) *per subject*. This has the effect of normalizing all participants' data to be in a similar range (e.g. removes baseline differences in RTs). Store the z-scored values in a new DataFrame column, **zRT**.



Hint: Remember `DataFrame.groupby` and `DataFrame.apply`.
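
The per-subject pattern can be sketched on toy data as follows. This sketch uses `groupby(...).transform`, which returns values aligned with the original rows so they can be assigned directly to a new column; the `apply` route from the hint is another option. The helper `zscore` and the toy values are illustrative, and the population standard deviation (`ddof=0`) is an assumption.

```python
import numpy as np
import pandas as pd


def zscore(x):
    """Z-score a pandas Series using the population sd (ddof=0)."""
    return (x - x.mean()) / x.std(ddof=0)


# Toy data: subject 2 is slower overall than subject 1 (made-up values).
toy = pd.DataFrame({
    "Subject": [1, 1, 1, 2, 2, 2],
    "logRT": [-0.5, -0.3, -0.1, 0.2, 0.4, 0.6],
})

# Z-score within each subject; the result lines up with the original rows.
toy["zRT"] = toy.groupby("Subject")["logRT"].transform(zscore)

# Each subject's zRT now has mean 0, so baseline differences are removed.
print(toy.groupby("Subject")["zRT"].mean())
```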

## Section 2: Visualization

a) Using Seaborn, visualize the difference in (z-scored) reaction times as a function of interference. Try out different plotting styles and see what you like best.

b) Using Seaborn, visualize the difference in (z-scored) reaction times as a function of DBS. Try out different plotting styles and see what you like best. Which contrast (interference vs. DBS) seems to have the larger effect?

c) As an optional challenge, try to plot both contrasts (ON vs. OFF, Control vs. Interference) simultaneously.

Hint: Read up on the `hue` argument in Seaborn.
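
To illustrate what `hue` does, here is a small sketch on invented toy values. It uses `sns.barplot` as one of several possible plot types, and the `matplotlib.use("Agg")` line is only there so the sketch runs without a display; it is not needed inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; unnecessary in a notebook

import pandas as pd
import seaborn as sns

# Toy z-scored RTs (made up): a large interference effect, a small DBS effect.
toy = pd.DataFrame({
    "Interference": [0, 0, 1, 1, 0, 0, 1, 1],
    "DBS": [0, 0, 0, 0, 1, 1, 1, 1],
    "zRT": [-0.4, -0.6, 0.7, 0.5, -0.7, -0.5, 0.4, 0.6],
})

# x splits the bars by trial type; hue splits each group again by DBS status.
ax = sns.barplot(data=toy, x="Interference", y="zRT", hue="DBS")
```

Swapping `x` and `hue` puts the other contrast on the axis, which can make one comparison easier to read than the other.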

## Section 3: Statistics

a) Use `DataFrame.groupby` to calculate the average z-scored response time difference (collapsing across patients) for both contrasts (i.e. Interference - Control, DBS ON - DBS OFF).
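
The computation can be sketched on toy data like this. The values are made up so that the arithmetic is easy to follow; the variable names are illustrative.

```python
import pandas as pd

# Toy z-scored RTs (made up for illustration).
toy = pd.DataFrame({
    "Interference": [0, 0, 1, 1, 0, 0, 1, 1],
    "DBS": [0, 0, 0, 0, 1, 1, 1, 1],
    "zRT": [-0.4, -0.6, 0.7, 0.5, -0.7, -0.5, 0.4, 0.6],
})

# Mean zRT per trial type, then Interference (1) minus Control (0).
by_interference = toy.groupby("Interference")["zRT"].mean()
interference_effect = by_interference.loc[1] - by_interference.loc[0]

# Mean zRT per DBS status, then ON (1) minus OFF (0).
by_dbs = toy.groupby("DBS")["zRT"].mean()
dbs_effect = by_dbs.loc[1] - by_dbs.loc[0]

print(interference_effect, dbs_effect)
```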

b) Perform an independent samples t-test to test for significant differences in the two contrasts. Which effect is larger?
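
A minimal sketch of the test call is shown below using `scipy.stats.ttest_ind` on simulated data. The two samples are drawn from normal distributions with invented means, standing in for z-scored RTs in the two trial types; the same call applies to the DBS contrast.

```python
import numpy as np
from scipy import stats

# Simulated stand-ins for z-scored RTs in each condition (not real data).
rng = np.random.default_rng(0)
control = rng.normal(loc=-0.5, scale=1.0, size=200)
interference = rng.normal(loc=0.5, scale=1.0, size=200)

# Independent samples t-test: Interference vs. Control.
result = stats.ttest_ind(interference, control)
print(result.statistic, result.pvalue)
```

Because the same patients contribute trials to every condition, some analysts would instead compare per-subject condition means with a paired test such as `scipy.stats.ttest_rel`; that is mentioned here only as an alternative to consider.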