<a href="https://colab.research.google.com/github/tuomaseerola/emr/blob/master/Chapter7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 7 Statistical Analysis

A Jupyter notebook with **code examples in R** for _Chapter 7_ of the _Routledge_ book **[How to Conduct Empirical Music Research](https://github.com/tuomaseerola/emr)** by [Tuomas Eerola](https://www.durham.ac.uk/staff/tuomas-eerola/), [Music and Science Lab](https://musicscience.net) at [Durham University](https://www.durham.ac.uk), scheduled to be published in 2023.

To run the code in your browser, open the file in Colab (click the icon "Open in Colab"). Alternatively, you can download the notebook and run it locally.

This notebook demonstrates running inferential statistical tests in R.

File `Chapter7.ipynb` | Version `29/9/2022` | [Back to Index](https://github.com/tuomaseerola/emr)

---

## Preliminaries
Load the libraries and install the `MusicScienceData` package, which contains the example data.

In [1]:
library(ggplot2,quietly = TRUE)
library(tidyverse,quietly = TRUE)
if (!require(devtools)) install.packages("devtools",quiet=TRUE)
devtools::install_github("tuomaseerola/MusicScienceData@main",quiet=TRUE)
library(MusicScienceData,quietly = TRUE)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
✔ purrr   0.3.5      
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Loading required package: devtools

Loading required package: usethis



## Code 7.1
See text for the explanation.

In [4]:
df <- MusicScienceData::sadness         # define data
t <- t.test(ASM20 ~ gender, data=df)    # t test
print(t$statistic)                      # show the t value

print(scales::pvalue(t$p.value))
dplyr::summarise(dplyr::group_by(df, gender), # means and SDs
                 M=mean(ASM20,na.rm=TRUE),
                 SD=sd(ASM20,na.rm=TRUE))


        t 
-5.054596 
[1] "<0.001"


gender  M         SD
<fct>   <dbl>     <dbl>
Female  4.587983  1.369222
Male    4.960494  1.244163
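
As a rough illustration of what `t.test()` computes by default (Welch's unequal-variance test), the sketch below reproduces the t statistic by hand from the group means, variances and sizes. It uses simulated ratings rather than the `sadness` data; the data frame `sim`, its columns `group` and `rating`, and the simulated values are assumptions made purely for illustration.

```r
# Simulated ratings for two groups (hypothetical data, not from the sadness survey)
set.seed(1)
sim <- data.frame(
  group  = factor(rep(c("A", "B"), times = c(40, 35))),
  rating = c(rnorm(40, mean = 4.6, sd = 1.4), rnorm(35, mean = 5.0, sd = 1.2))
)

res <- t.test(rating ~ group, data = sim)   # Welch t test (R's default)

# Manual computation: difference in means divided by its standard error
m <- tapply(sim$rating, sim$group, mean)
v <- tapply(sim$rating, sim$group, var)
n <- tapply(sim$rating, sim$group, length)
t_manual <- unname((m["A"] - m["B"]) / sqrt(v["A"] / n["A"] + v["B"] / n["B"]))

# The two values should agree
print(c(t_test = unname(res$statistic), manual = t_manual))
```

The sign of t follows the order of the factor levels (first level minus second), which is why the value in Code 7.1 is negative: the mean for `Female` is lower than the mean for `Male`.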


## Code 7.2

In [5]:
df <- MusicScienceData::sadness         # define data
model.aov <- aov(ASM20 ~ age, data=df)  # run anova
model.sum <- summary(model.aov)         # summarise (avoid the name F, R's shorthand for FALSE)
print(model.sum)


              Df Sum Sq Mean Sq F value  Pr(>F)   
age            5   29.9   5.986   3.321 0.00548 **
Residuals   1564 2819.4   1.803                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7 observations deleted due to missingness
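
The F value in an ANOVA table is the ratio of the two mean squares, and each mean square is a sum of squares divided by its degrees of freedom; in the output above, 5.986 / 1.803 ≈ 3.32. The following sketch recomputes F and its p-value from the table columns. It uses `PlantGrowth`, a dataset shipped with base R, as a stand-in so that it runs without the survey data; the object names are illustrative.

```r
# PlantGrowth is a dataset shipped with base R (used here only as a stand-in)
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]                      # ANOVA table as a data frame

ms <- tab[["Sum Sq"]] / tab[["Df"]]           # mean squares: between, within
f_manual <- ms[1] / ms[2]                     # F ratio
p_manual <- pf(f_manual, tab[["Df"]][1], tab[["Df"]][2], lower.tail = FALSE)

# Compare with the "F value" and "Pr(>F)" columns of the table
print(c(F = f_manual, p = p_manual))
```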


## Code 7.3

In [6]:
TABLE <- TukeyHSD(model.aov, conf.level = 0.95)   # pairwise post-hoc comparisons
print(knitr::kable(TABLE$age, digits = 3,
                   caption = 'Comparison of age groups for Item 20 in ASM survey.',
                   format = 'simple'))




Table: Comparison of age groups for Item 20 in ASM survey.

                       diff      lwr     upr   p adj
------------------  -------  -------  ------  ------
25 to 34-18 to 24     0.133   -0.133   0.399   0.713
35 to 44-18 to 24     0.232   -0.062   0.525   0.214
45 to 54-18 to 24     0.244   -0.088   0.576   0.289
55 to 64-18 to 24     0.493    0.107   0.879   0.004
65 to 74-18 to 24     0.418   -0.221   1.057   0.423
35 to 44-25 to 34     0.099   -0.174   0.371   0.906
45 to 54-25 to 34     0.111   -0.202   0.425   0.914
55 to 64-25 to 34     0.360   -0.011   0.731   0.063
65 to 74-25 to 34     0.285   -0.344   0.915   0.789
45 to 54-35 to 44     0.013   -0.324   0.349   1.000
55 to 64-35 to 44     0.261   -0.129   0.652   0.396
65 to 74-35 to 44     0.186   -0.455   0.828   0.962
55 to 64-45 to 54     0.249   -0.172   0.669   0.540
65 to 74-45 to 54     0.174   -0.486   0.834   0.975
65 to 74-55 to 64    -0.075   -0.764   0.614   1.000
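
In the table above, only the comparison between the 55 to 64 and 18 to 24 groups has an adjusted p-value below .05 (0.004), and it is also the only one whose confidence interval (0.107 to 0.879) excludes zero. With many pairs it can be convenient to filter the `TukeyHSD()` result programmatically. A minimal sketch, again using the built-in `PlantGrowth` data as a stand-in (the object names `fit`, `tuk` and `sig` are illustrative):

```r
fit <- aov(weight ~ group, data = PlantGrowth)   # built-in stand-in data
tuk <- TukeyHSD(fit, conf.level = 0.95)$group    # matrix: diff, lwr, upr, p adj

# Keep only the pairwise comparisons with adjusted p below .05
sig <- tuk[tuk[, "p adj"] < 0.05, , drop = FALSE]
print(round(sig, 3))
```

The `drop = FALSE` argument keeps the result a matrix even when only one comparison (or none) passes the threshold.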


## Code 7.4

In [7]:
df <- MusicScienceData::sadness                   # define data
model2.aov <- aov(ASM20 ~ age * gender, data=df)  # run anova
F2 <- summary(model2.aov)
print(F2)


              Df Sum Sq Mean Sq F value  Pr(>F)    
age            5   29.9    5.99   3.377 0.00488 ** 
gender         1   45.7   45.69  25.773 4.3e-07 ***
age:gender     5   11.5    2.31   1.303 0.25997    
Residuals   1558 2762.1    1.77                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7 observations deleted due to missingness


## Code 7.5

In [8]:
# Read the raw CSV (the github.com/.../blob/... page serves HTML, not the data)
d <- read.csv('https://raw.githubusercontent.com/tuomaseerola/emr/main/data/raw_ratings.csv')
d2 <- dplyr::filter(d, Emotion == 'Dimensional')  # rows where Emotion is 'Dimensional'
d3 <- dplyr::filter(d2, Category %in%             # keep five emotion categories
  c('Anger', 'Fear', 'Happy', 'Sad', 'Tender'))
library(lme4)
library(lmerTest)                                 # adds p-values to lmer summaries
# random intercepts for id and Track
m1 <- lmer(Valence ~ Category * Gender + (1|id) + (1|Track), data = d3)
s <- summary(m1, corr = FALSE)
S <- s$coefficients
p <- scales::pvalue(S[,5])     # format p-values before rounding
S <- round(S, 2); S[,5] <- p   # rounding first would turn e.g. 0.004 into "<0.001"
print(knitr::kable(S, format = 'simple',
                   caption = 'LMM results of Valence ratings.'))


## Code 7.6

In [None]:
S <- d %>%
  filter(Category=='Sad') %>%
  group_by(Category,Gender) %>%
  summarise(M=mean(Valence,na.rm=TRUE),SD=sd(Valence,na.rm=TRUE),
            .groups = 'drop')
print(S)
