# Question 2 Solution: Predicting Cognitive Performance from EEG Data

In this exercise, we will investigate the relationship between brainwave band dominance and reaction times.
The dataset consists of reaction time measurements for three groups, one per dominant brainwave frequency band: **theta** (power in the 4-8 Hz band), **alpha** (power in the 8-12 Hz band), and **beta** (power in the 12-30 Hz band).
Our goal is to determine whether the mean reaction times differ significantly across these three brainwave frequency bands using statistical methods, specifically the one-way ANOVA test.

Our approach to analyzing the data is structured as follows:

1. **One-Way ANOVA:** We will perform one-way ANOVA to assess whether there are significant differences in mean reaction times between the three brainwave groups (theta, alpha, and beta). This will help us understand if the type of brainwave activity influences reaction time performance.
2. **F-test:** The one-way ANOVA relies on the F-test, which evaluates whether the variation in reaction times between the groups is large relative to the variation within each group. The F-statistic is the ratio of the between-group mean square to the within-group mean square, so it measures the size of the between-group differences relative to the within-group variability.

By completing this exercise, we will gain a deeper understanding of how to conduct a one-way ANOVA, interpret the F-statistic, and use the p-value to decide whether to reject the null hypothesis. The analysis tells us whether mean reaction times differ significantly across the three brainwave frequency bands, shedding light on the potential influence of brainwave activity on cognitive performance.
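For reference, the hypotheses and the test statistic can be written out explicitly. The symbols $\mu_{\theta}$, $\mu_{\alpha}$, and $\mu_{\beta}$ for the population mean reaction times of the three groups are notation introduced here for illustration; $K$ is the number of groups and $N$ the total number of observations.

$$
H_0: \mu_{\theta} = \mu_{\alpha} = \mu_{\beta}
\qquad \text{vs.} \qquad
H_1: \text{at least one group mean differs}
$$

$$
F = \frac{SS_{\text{between}} / (K - 1)}{SS_{\text{within}} / (N - K)}
$$

Under $H_0$, and under the usual ANOVA assumptions, $F$ follows an F-distribution with $K - 1$ and $N - K$ degrees of freedom.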


In [12]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

We are using pandas to create the dataset based on the problem statement.

In [13]:
# Data setup
data = {
    "ReactionTime": [250, 245, 260, 255, 250, 230, 235, 220, 240, 225, 210, 215, 205, 220, 215],
    "Group": (
        ["Theta"] * 5
        + ["Alpha"] * 5
        + ["Beta"] * 5
    ),
}

# Convert to DataFrame
df = pd.DataFrame(data)

In [14]:
df

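| | ReactionTime | Group |
|---|---|---|
| 0 | 250 | Theta |
| 1 | 245 | Theta |
| 2 | 260 | Theta |
| 3 | 255 | Theta |
| 4 | 250 | Theta |
| 5 | 230 | Alpha |
| 6 | 235 | Alpha |
| 7 | 220 | Alpha |
| 8 | 240 | Alpha |
| 9 | 225 | Alpha |
| 10 | 210 | Beta |
| 11 | 215 | Beta |
| 12 | 205 | Beta |
| 13 | 220 | Beta |
| 14 | 215 | Beta |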
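Before running the ANOVA, it can help to look at the per-group summary statistics. The following is a minimal sketch that rebuilds the same data so it runs on its own; the name `group_summary` is introduced here for illustration.

```python
import pandas as pd

# Same data as in the data setup cell, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    "ReactionTime": [250, 245, 260, 255, 250, 230, 235, 220, 240, 225, 210, 215, 205, 220, 215],
    "Group": ["Theta"] * 5 + ["Alpha"] * 5 + ["Beta"] * 5,
})

# Per-group sample size, mean, and sample standard deviation
group_summary = df.groupby("Group")["ReactionTime"].agg(["count", "mean", "std"])
print(group_summary)
```

The group means are 252 (Theta), 230 (Alpha), and 213 (Beta), which already suggests that the groups differ; the ANOVA tests whether these differences are larger than within-group variability alone would explain.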


## Perform ANOVA

In [7]:
# Using the formula API: Response ~ C(Factor)
model = ols("ReactionTime ~ C(Group)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)


We then display the ANOVA table:

In [8]:
# Display the ANOVA table
print("ANOVA Table:")
print(anova_table)

ANOVA Table:
               sum_sq    df          F    PR(>F)
C(Group)  3823.333333   2.0  44.980392  0.000003
Residual   510.000000  12.0        NaN       NaN


### ANOVA table explanation

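1. **sum_sq:** Sum of squares (SS) for each source of variation. The C(Group) row is the between-group SS and the Residual row is the within-group SS.
2. **df:** Degrees of freedom for each source: $df_{1} = K - 1 = 2$ for the groups ($K = 3$ groups) and $df_{2} = N - K = 12$ for the residuals ($N = 15$ observations).
3. **F:** F-statistic for testing group differences, i.e. the between-group mean square divided by the within-group mean square.
4. **PR(>F):** p-value for the F-statistic, i.e. the probability of observing an F value at least this large if all group means were equal.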

You can find more details in the following link: [Statsmodels Example Formulas Documentation](https://www.statsmodels.org/dev/example_formulas.html)
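The entries of the table can also be reproduced by hand from the definitions above. The sketch below uses NumPy and SciPy rather than statsmodels; the variable names (`ss_between`, `ss_within`, `f_manual`, `p_manual`) are chosen here for illustration.

```python
import numpy as np
from scipy import stats

# Reaction times per group, copied from the data setup cell
groups = {
    "Theta": np.array([250, 245, 260, 255, 250]),
    "Alpha": np.array([230, 235, 220, 240, 225]),
    "Beta": np.array([210, 215, 205, 220, 215]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
K = len(groups)      # number of groups
N = all_values.size  # total number of observations

# Between-group SS: squared distance of each group mean from the grand mean,
# weighted by the group size
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())

# Within-group SS: squared distance of each observation from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

df_between = K - 1
df_within = N - K

# F is the ratio of the two mean squares; the p-value is the upper tail
# of the F-distribution with (df_between, df_within) degrees of freedom
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

print(f"SS between: {ss_between:.4f} (df = {df_between})")
print(f"SS within:  {ss_within:.4f} (df = {df_within})")
print(f"F = {f_manual:.4f}, p = {p_manual:.8f}")
```

These values should match the sum_sq, df, F, and PR(>F) columns printed by `anova_lm`.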


## Interpret the results

In [11]:
f_stat = anova_table["F"].iloc[0]
p_value = anova_table["PR(>F)"].iloc[0]

print("\nResults:")
print(f"F-statistic: {f_stat:.2f}")
print(f"p-value: {p_value:.8f}")


Results:
F-statistic: 44.98
p-value: 0.00000266


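The F-statistic is the ratio of the variance explained by group differences to the unexplained (within-group) variance, and the p-value is the probability of observing an F-statistic at least this large if all group means were equal. Here F = 44.98 and p ≈ 0.0000027, which is far below the 0.05 significance level, so we reject the null hypothesis and conclude that mean reaction time differs across the theta, alpha, and beta groups. The ANOVA by itself does not indicate which specific groups differ from each other.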
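As a cross-check, the same test can be run with `scipy.stats.f_oneway`, which takes one sequence per group. This is a sketch of an alternative route, not the method used in the cells above; the names `theta_rt`, `alpha_rt`, `beta_rt`, and `significance_level` are introduced here for illustration.

```python
from scipy.stats import f_oneway

# Reaction times per group, copied from the data setup cell
theta_rt = [250, 245, 260, 255, 250]
alpha_rt = [230, 235, 220, 240, 225]
beta_rt = [210, 215, 205, 220, 215]

# One-way ANOVA across the three groups; returns the F-statistic and p-value
f_check, p_check = f_oneway(theta_rt, alpha_rt, beta_rt)

# Decision rule at the 0.05 significance level used in this exercise
significance_level = 0.05
reject_null = p_check < significance_level

print(f"F-statistic: {f_check:.2f}")
print(f"p-value: {p_check:.8f}")
print("Reject the null hypothesis" if reject_null else "Fail to reject the null hypothesis")
```

Both routes should give the same F-statistic and p-value, since they compute the same one-way ANOVA on the same data.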