# Homework 6: Mixed effects

This homework assignment is designed to give you practice fitting and interpreting mixed effects models. 

We will be using the **LexicalData.csv** and **Items.csv** files from the *Homework/lexDat* folder in the class GitHub repository again. 

This data is a subset of the [English Lexicon Project database](https://elexicon.wustl.edu/). It provides the reaction times (in milliseconds) of many subjects as they are presented with letter strings and asked to decide, as quickly and as accurately as possible, whether each letter string is a word or not. The **Items.csv** file provides characteristics of the words used, namely frequency (how common is this word?) and length (how many letters?). Unlike in the previous homework, there isn't any missing data in the **LexicalData.csv** file.

*Data courtesy of Balota, D.A., Yap, M.J., Cortese, M.J., Hutchison, K.A., Kessler, B., Loftis, B., Neely, J.H., Nelson, D.L., Simpson, G.B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445-459.*

---
## 1. Loading and formatting the data (1 point)

Load in data from the **LexicalData.csv** and **Items.csv** files. As in the previous homeworks, remove the commas from the reaction times and convert them from strings to numbers. Use `left_join` to add the word characteristics `Length` and `Log_Freq_HAL` from **Items** to **LexicalData**.

*Note: the `Freq_HAL` variable in **Items.csv** has a similar formatting issue, using string values with commas. We're not going to worry about fixing this since we're only using `Log_Freq_HAL`, which is the natural log transformation of `Freq_HAL`, in this homework.*

In [2]:
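library(tidyverse)

# Point R at the local copy of the data folder (adjust this path for your machine)
setwd("C:/Users/Alice/Documents/GitHub/DataSciencePsychNeuro/Homeworks/lexDat")
lex <- read.csv("LexicalData.csv")
item <- read.csv("Items.csv")
head(lex)
head(item)

lex_clean <- lex %>%
  mutate(D_RT = as.numeric(gsub(",", "", D_RT))) %>%  # strip commas, then convert to numeric
  mutate(Word = D_Word) %>%                           # copy the key so its name matches the column in item
  left_join(item) %>%                                 # joins on the shared column, Word
  select(-Occurrences, -Freq_HAL, -Word)              # drop columns not needed in this homework
head(lex_clean)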


"package 'tidyverse' was built under R version 4.0.5"

| | Sub_ID | Trial | Type | D_RT | D_Word | Outlier | D_Zscore |
|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 1 | 157 | 1 | 1 | 710 | browse | False | -0.437 |
| 2 | 67 | 1 | 1 | 1094 | refrigerant | False | 0.825 |
| 3 | 120 | 1 | 1 | 587 | gaining | False | -0.645 |
| 4 | 21 | 1 | 1 | 984 | cheerless | False | 0.025 |
| 5 | 236 | 1 | 1 | 577 | pattered | False | -0.763 |
| 6 | 236 | 2 | 1 | 715 | conjures | False | -0.364 |


| | Occurrences | Word | Length | Freq_HAL | Log_Freq_HAL |
|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<int>` | `<chr>` | `<dbl>` |
| 1 | 1 | synergistic | 11 | 284 | 5.649 |
| 2 | 1 | synonymous | 10 | 951 | 6.858 |
| 3 | 1 | syntactical | 11 | 114 | 4.736 |
| 4 | 1 | synthesis | 9 | 6742 | 8.816 |
| 5 | 1 | synthesized | 11 | 2709 | 7.904 |
| 6 | 1 | synthesizer | 11 | 1390 | 7.237 |


Joining, by = "Word"


| | Sub_ID | Trial | Type | D_RT | D_Word | Outlier | D_Zscore | Length | Log_Freq_HAL |
|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<dbl>` |
| 1 | 157 | 1 | 1 | 710 | browse | False | -0.437 | 6 | 8.856 |
| 2 | 67 | 1 | 1 | 1094 | refrigerant | False | 0.825 | 11 | 4.644 |
| 3 | 120 | 1 | 1 | 587 | gaining | False | -0.645 | 7 | 8.304 |
| 4 | 21 | 1 | 1 | 984 | cheerless | False | 0.025 | 9 | 2.639 |
| 5 | 236 | 1 | 1 | 577 | pattered | False | -0.763 | 8 | 1.386 |
| 6 | 236 | 2 | 1 | 715 | conjures | False | -0.364 | 8 | 5.268 |
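
As an aside, the join can also be written without copying the key column, by telling `left_join` which columns correspond. The sketch below is only an illustration on two small hand-made data frames (`toy_lex` and `toy_item` are hypothetical stand-ins for the real files, and the `"1,094"` string is made up to show the comma issue); it assumes `dplyr` is installed.

```r
library(dplyr)

# Toy stand-ins for LexicalData.csv and Items.csv (illustrative values only)
toy_lex <- data.frame(
  Sub_ID = c(1, 1, 2),
  D_RT   = c("710", "1,094", "587"),   # reaction times stored as strings, one with a comma
  D_Word = c("browse", "gaining", "browse")
)
toy_item <- data.frame(
  Word         = c("browse", "gaining"),
  Length       = c(6L, 7L),
  Log_Freq_HAL = c(8.856, 8.304)
)

toy_clean <- toy_lex %>%
  mutate(D_RT = as.numeric(gsub(",", "", D_RT))) %>%  # "1,094" -> 1094
  left_join(toy_item, by = c("D_Word" = "Word"))      # match D_Word in toy_lex to Word in toy_item

toy_clean
```

With an explicit `by`, the key keeps its left-hand name (`D_Word`), so there is no temporary `Word` column to drop afterwards.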


---
## 2. Model fitting (4 points)

First, fit a linear model with `Log_Freq_HAL` and `Length` as predictors, and `D_RT` as the outcome. Include an interaction term. Use `summary()` to look at the model output.

In [None]:
# Name the fit lm.fit so it does not mask the lm() function
lm.fit = lm(D_RT ~ Log_Freq_HAL + Length + Log_Freq_HAL:Length, data=lex_clean)
summary(lm.fit)
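
For reference, R's formula syntax treats `a * b` as shorthand for `a + b + a:b`, so the interaction model can be written either way. The sketch below checks this on simulated data; the variable names (`freq`, `len`, `rt`) and the coefficients used to generate the data are made up for illustration and are not taken from the lexical data set.

```r
set.seed(1)
n <- 200

# Simulated predictors and outcome (arbitrary values, for illustration only)
sim <- data.frame(
  freq = rnorm(n, mean = 8, sd = 2),
  len  = sample(3:12, n, replace = TRUE)
)
sim$rt <- 700 - 20 * sim$freq + 15 * sim$len - 2 * sim$freq * sim$len +
  rnorm(n, sd = 50)

# Two ways of writing the same model: main effects plus their interaction
fit_long  <- lm(rt ~ freq + len + freq:len, data = sim)
fit_short <- lm(rt ~ freq * len, data = sim)

all.equal(coef(fit_long), coef(fit_short))  # the two specifications give the same estimates
coef(summary(fit_long))[, "t value"]        # t-values for the intercept and the three predictors
```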


Now, install `lme4` using `install.packages()` and then load the library. 

In [10]:
install.packages("lme4")
library(lme4)


"package 'lme4' is in use and will not be installed"


Now fit a mixed effects model that includes the same predictors as the linear model above, as well as random intercepts for `Sub_ID` (i.e., each subject gets their own baseline RT, shifted up or down from the overall mean). Use `summary()` to look at the model output.

In [None]:
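# Same fixed effects as the linear model (including the interaction), plus a random intercept per subject
me.fit = lmer(D_RT ~ Log_Freq_HAL + Length + Log_Freq_HAL:Length + (1 | Sub_ID), data=lex_clean)
summary(me.fit)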
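
To make the idea of a random intercept concrete, here is a minimal sketch on the `sleepstudy` data set that ships with `lme4` (not the lexical data). It fits one shared slope for `Days` and lets each `Subject` have its own intercept; the name `fit_ri` is just chosen for this example.

```r
library(lme4)

# sleepstudy: reaction times of 18 subjects measured over 10 days
fit_ri <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

fixef(fit_ri)                # population-level intercept and Days slope
head(ranef(fit_ri)$Subject)  # each subject's offset from the population intercept
head(coef(fit_ri)$Subject)   # per-subject intercepts (fixed + offset); the Days slope is the same for everyone
```

The `(1 | Subject)` term plays the same role here as `(1 | Sub_ID)` does in the homework model: it absorbs stable between-subject differences in overall speed.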

---
## 3. Model assessment (4 points)

Compare the t-values of the three predictors (`Log_Freq_HAL`, `Length`, and their interaction) in the linear (fixed effects only) model with those in the mixed effects model. How do they differ, and why?

> *Write your response here* 
> 

Use the Akaike Information Criterion (AIC) to compare these two models. Which one is better?
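
As a rough sketch of how such a comparison can look, the example below uses the `sleepstudy` data from `lme4` rather than the homework data. One point to keep in mind: `lmer()` fits by REML by default, whereas `lm()` corresponds to maximum likelihood, so this sketch refits the mixed model with `REML = FALSE` before comparing. The object names are chosen for illustration only.

```r
library(lme4)

# Linear model vs. random-intercept model on the sleepstudy data
fit_lm <- lm(Reaction ~ Days, data = sleepstudy)
fit_me <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)

# Lower AIC indicates the better trade-off between fit and complexity
aics <- c(lm = AIC(fit_lm), lmer = AIC(fit_me))
aics
names(which.min(aics))  # name of the model with the lower AIC
```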

In [None]:
# WRITE YOUR CODE HERE



> *Write your response here* 
> 

---
## 4. Reflection (1 point)

What other random effects could be controlled for in this data set? 

> *Write your response here* 
> 

**DUE:** 5pm EST, March 25, 2022

**IMPORTANT** Did you collaborate with anyone on this assignment? If so, list their names here. 
> *Someone's Name*