<a href="https://colab.research.google.com/github/tuomaseerola/emr/blob/master/Chapter7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build a Regression Model

To run the code in your browser, open the file in Colab (click the icon "Open in Colab"). Alternatively, you can download the notebook and run it locally.

This notebook demonstrates how to build and evaluate a linear regression model in R.

File `build_regression_model.ipynb` | Version `2/3/2023` |

---

## Preliminaries
Load the libraries and install the `MusicScienceData` package, where the example data are stored.

In [None]:
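library(ggplot2, quietly = TRUE)    # plotting
library(tidyverse, quietly = TRUE)  # data wrangling (dplyr, tidyr, ...)
# Install devtools only if it is missing, then install the data package from GitHub
if (!require(devtools)) install.packages("devtools", quiet = TRUE)
devtools::install_github("tuomaseerola/MusicScienceData@main", quiet = TRUE)
library(MusicScienceData, quietly = TRUE)  # example datasets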

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1
✔ readr   2.1.3      ✔ forcats 0.5.2
✔ purrr   0.3.5
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Loading required package: devtools
Loading required package: usethis

## Grab a dataset
These are the raw ratings of emotions for the film soundtracks (Eerola & Vuoskoski, 2011).

In [None]:
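# Assumption: lmerTest supplies lmer() with p-values (column 5 of the coefficient
# table used below); run install.packages("lmerTest") first if it is missing
library(lmerTest)
d <- read.csv('https://raw.githubusercontent.com/tuomaseerola/emr/main/data/raw_ratings.csv')
d2 <- dplyr::filter(d, Emotion == 'Dimensional')  # keep rows where Emotion is 'Dimensional'
# Keep the five emotion categories of interest
d3 <- dplyr::filter(d2, Category %in% c('Anger', 'Fear', 'Happy', 'Sad', 'Tender'))
# Linear mixed model: Category x Gender fixed effects, random intercepts for id and Track
m1 <- lmer(Valence ~ Category * Gender + (1|id) + (1|Track), data = d3)
s <- summary(m1, corr = FALSE)
# Round the coefficient table and format the p-values for printing
S <- s$coefficients; S <- round(S, 2); S[,5] <- scales::pvalue(S[,5])
print(knitr::kable(S, format = 'simple',
                   caption = 'LMM results of Valence ratings.'))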




## Code 7.7

## Explore how features and ratings correlate

In [None]:
library(MusicScienceData)               # loads library w data
d1 <- MusicScienceData::soundtrack      # get ratings
d2 <- MusicScienceData::soundtrack_features[,c(2:3,5:6)] # select four acoustic features
d1[,17:21] <- as.data.frame(scale(d2))  # normalise

tmp <- cor(d1[,c(3,17:20)]) # get correlations
print(round(tmp[2:5,1],2))  # display first column: each feature's correlation with the rating in column 3
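
The `scale()` call above z-scores each feature so that the regression coefficients later become comparable across features. As a minimal sketch, assuming R's built-in `mtcars` data as a stand-in (its columns `wt`, `hp`, `disp` and `mpg` are not part of the soundtrack data), the following checks that standardising leaves the correlations with the outcome unchanged:

```r
# Illustration only: mtcars stands in for the soundtrack data
feats <- mtcars[, c("wt", "hp", "disp")]   # three numeric predictors
z <- as.data.frame(scale(feats))           # z-score: mean 0, sd 1 per column

col_means <- colMeans(z)                   # approximately 0 for every column
col_sds <- apply(z, 2, sd)                 # 1 for every column

# Correlations with the outcome before and after scaling
r_raw <- cor(feats, mtcars$mpg)[, 1]
r_std <- cor(z, mtcars$mpg)[, 1]
print(round(r_std, 2))
print(all.equal(r_raw, r_std))             # scaling does not change correlations
```

Because Pearson correlation is invariant to shifting and rescaling a variable, normalising matters for interpreting the regression coefficients, not for the correlations themselves.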


## Construct a model

In [None]:
model.reg <- lm(Energy ~ RMS + sp_centr + spec_rolloff +
  spec_zcr, data = d1)
s <- summary(model.reg) # R2adj = 0.424 (Energy)
print(s)
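
The adjusted $R^2$ quoted in the comment can be read directly from the summary object rather than from the printed output. The sketch below is an illustration on R's built-in `mtcars` data (an assumption made so that it runs without the soundtrack data; the names `fit`, `s_fit` and `r2_adj_manual` are hypothetical). It also recomputes the adjustment by hand using $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors:

```r
# Illustration only: mtcars stands in for the soundtrack data
fit <- lm(mpg ~ wt + hp, data = mtcars)
s_fit <- summary(fit)

r2 <- s_fit$r.squared          # proportion of variance explained
r2_adj <- s_fit$adj.r.squared  # penalised for the number of predictors

n <- nrow(mtcars)              # number of observations
p <- length(coef(fit)) - 1     # number of predictors (intercept excluded)
r2_adj_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(coef(s_fit), 3))   # estimates, standard errors, t and p values
print(c(r2 = r2, r2_adj = r2_adj, manual = r2_adj_manual))
```

The same fields (`$r.squared`, `$adj.r.squared`) should be available on the summary of `model.reg`.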


## Prediction rate vs correlation?
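$R^2$ and the correlation coefficient ($r$) are directly related: in a regression with a single predictor, $R^2$ equals $r^2$, the squared Pearson correlation between predictor and outcome.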

In [None]:
r <- cor(d1$Energy, d1$RMS)
print( r^2 )    # print the squared correlation

summary(lm(Energy ~ RMS,data=d1)) # Summarise regression
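
The equality between $r^2$ and $R^2$ can also be checked numerically instead of by comparing printed values. This is a minimal sketch on simulated data (the variables `x` and `y` and the coefficient 0.5 are arbitrary assumptions, not taken from the soundtrack data). It also shows the generalisation to several predictors, where $R^2$ equals the squared correlation between the observed and the fitted values:

```r
set.seed(1)                          # reproducible simulated data
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)            # y depends linearly on x plus noise

r <- cor(x, y)                       # Pearson correlation
r2_lm <- summary(lm(y ~ x))$r.squared
print(all.equal(r^2, r2_lm))         # TRUE up to floating-point error

# With more than one predictor, compare R^2 with cor(observed, fitted)^2
fit2 <- lm(y ~ x + I(x^2))
r2_two <- summary(fit2)$r.squared
print(all.equal(cor(y, fitted(fit2))^2, r2_two))
```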
