# Permutation analysis

---
## 0. Before starting
#### 0-1. Update R
1. Download the Windows installer (R-3.X.X-win.exe) from https://www.r-project.org/.
2. Run R-3.X.X-win.exe.
3. Set the system path so that `R` can be started from a command prompt.
    - Type "env" in the Start menu search box and select "Edit the system environment variables".
    - Click the "Environment Variables..." button.
    - Edit the "Path" variable.
4. Install IRkernel for Jupyter.
    - In Anaconda Prompt (run as administrator):
    ```
    > R
    > install.packages('IRkernel')
    > IRkernel::installspec()
    ```
    
#### 0-2. Install required packages (ggpubr)
If ggpubr is not installed, open a command window.<BR>

**\<on Win10 as regular user\>**
```
C:\Users\User>R
> install.packages("ggpubr")
```
If this is the first time you install a package, R asks whether to create a personal library folder. Answer yes. For later installations you do not need the command window; run the `install.packages` command in a notebook cell as below.

**\<on Ubuntu\>**
```
$ sudo R
> install.packages("ggpubr")
```

**\<using Jupyter\>**

In [None]:
install.packages("magrittr")

#### 0-3. Check R version

In [None]:
version

---
## 1. Load data from CSV file

#### (Optional) Install packages if not installed.
- **magrittr**: Provides the pipe operator `%>%`.<BR>
    used as `t1 %>% convert(lgl(single_animal))`.
- **stringr**: Simple, Consistent Wrappers for Common String Operations<BR>
https://www.rdocumentation.org/packages/stringr/versions/1.4.0<BR>
    used as `for (colname in str_subset(names(df), rex)){` in the function conv_str2list().
- **hablar**: Simple tools for converting columns to new data types. Intuitive functions for columns with missing values.<BR>
https://cran.r-project.org/web/packages/hablar/<BR>
    used as `convert(lgl(single_animal))`.
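
`conv_str2list()` is defined in `synchro_freeze.R`, which is not shown in this notebook. As a rough idea of what such a helper might do, the sketch below turns comma-separated strings into integer list-columns for every column whose name matches a pattern. The name `conv_str2list_sketch` and the demo data are hypothetical, and base `grep()` stands in for `str_subset()` so the block runs without stringr.

```r
# Hypothetical stand-in for conv_str2list(); the real function lives in synchro_freeze.R.
conv_str2list_sketch <- function(df, rex) {
  # Loop over the column names that match the regular expression
  for (colname in grep(rex, names(df), value = TRUE)) {
    pieces <- strsplit(as.character(df[[colname]]), ",")
    # Each cell becomes an integer vector, so the column becomes a list-column
    df[[colname]] <- lapply(pieces, function(x) as.integer(trimws(x)))
  }
  df
}

# Made-up demo data with one string column of onset times
demo <- data.frame(id = c("a", "b"),
                   fz_start_sub1 = c("249, 255, 267", "267, 298"),
                   stringsAsFactors = FALSE)
demo <- conv_str2list_sketch(demo, "^fz_start")
demo$fz_start_sub1[[1]]
```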

In [100]:
# Install packages

install.packages("magrittr")
install.packages("stringr")
install.packages("hablar")

Installing package into 'C:/Users/User/Documents/R/win-library/3.6'
(as 'lib' is unspecified)

"package 'magrittr' is in use and will not be installed"
Installing package into 'C:/Users/User/Documents/R/win-library/3.6'
(as 'lib' is unspecified)

"package 'stringr' is in use and will not be installed"
Installing package into 'C:/Users/User/Documents/R/win-library/3.6'
(as 'lib' is unspecified)

"package 'hablar' is in use and will not be installed"
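
The "is in use and will not be installed" messages above appear because `install.packages()` was called for packages that are already loaded. One way to avoid this is to install only what is missing. The helper name `missing_packages` below is an assumption made for this sketch, not part of any package.

```r
# Return the subset of `pkgs` that is not found in any library on .libPaths().
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(c("magrittr", "stringr", "hablar"))
print(to_install)

# Install only what is missing; uncomment to run (needs network access).
# if (length(to_install) > 0) install.packages(to_install)
```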


#### 1-1. Load libraries

In [101]:
# Load required library
library(magrittr)
library(hablar)

library(ggplot2)
library(ggpubr) # ggplot2 based publication ready plots

source("synchro_freeze.R")

#### 1-2. Load data

In [103]:
#######################################
# Load the big table into t1
# R accepts both forward slashes and escaped backslashes in paths.
# filename <- "C:\\Users\\User\\Desktop\\project\\summary3.csv"
filename = "Z:/videos_synchrony/summary3.csv"

# Load big table
t1 = read.table(file=filename,header=TRUE, sep=",")
t1 = t1 %>% convert(lgl(single_animal))

#######################################
# Load the groups table into t2
filename = "Z:/videos_synchrony/IVs.csv"

# Load groups table
t2 = read.table(file=filename,header=TRUE, sep=",")
t2 = t2 %>% convert(lgl(single_animal))

#######################################
# merge two data frames by ID
df <- merge(t2, t1, by=c("folder_videoname","single_animal"))
#names(df)[names(df) == "single_animal.x"] <- "single_animal"

#######################################
# Post process
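# Convert comma-separated strings to integer lists
# The pattern is a regular expression matched against column names. A trailing "*" would
# mean "zero or more of the previous character", not a wildcard, so prefixes are anchored instead.
rex = "^(fz_start|fz_end|lagt_)"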
df = conv_str2list(df, rex)

# Adjust dtype
df = df %>% convert(chr(folder_videoname,sex,familiarity,lighting,stress,comment,infusion_hpc,infusion_pfc))

# Set NA for empty cell
df[df==""]<-NA
# df[is.nan(df)] <- NA

#######################################
# Display summary
dim(df)
#str(df)
sapply(df, class)
#sapply(df, typeof)
head(df,2)

# if(which(df$single_animal.x != df$single_animal.y) == FALSE){
#     print("folder_videoname and single_animal are consistent.")
# } else {
#     print("something is wrong")
# }


folder_videoname,single_animal,sex,age,infusion_hpc,infusion_pfc,familiarity,lighting,partition,stress,...,fz_overlap,cohen_d,fz_start_sub1,fz_end_sub1,fz_start_sub2,fz_end_sub2,lagt_start_s1_s2,lagt_start_s2_s1,lagt_end_s1_s2,lagt_end_s2_s1
<chr>,<lgl>,<chr>,<int>,<chr>,<chr>,<chr>,<chr>,<lgl>,<chr>,...,<dbl>,<dbl>,<list>,<list>,<list>,<list>,<list>,<list>,<list>,<list>
20190408_testing_1_f10ab,False,female,75,,,familiar,visible,False,no_stress,...,18.95833,0.7139131,"249, 255, 267, 294, 331, 353, 364, 375, 403, 424, 430, 519, 550, 563, 577, 599, 634, 652, 689","253, 262, 272, 299, 338, 359, 368, 383, 409, 429, 435, 525, 558, 568, 594, 615, 640, 657, 694","264, 295, 303, 313, 321, 343, 398, 410, 425, 438, 482, 494, 517, 533, 557, 637, 652, 675, 685, 708","273, 302, 307, 317, 340, 389, 406, 419, 437, 468, 491, 502, 528, 555, 567, 642, 660, 680, 704, 713","15, 9, -3, 1, -10, -10, -21, 23, -5, 1, -5, -2, 7, -6, -20, 38, 3, 0, -4","3, -1, -9, 18, 10, 10, 5, -7, -1, -8, 37, 25, 2, -14, 6, -3, 0, 14, 4, -19","20, 11, 1, 3, 2, -19, 21, 6, -3, 8, 2, 3, -3, -1, -27, 27, 2, 3, 10","-1, -3, -8, -18, -2, -6, 3, -10, -2, -33, 34, 23, -3, 3, 1, -2, -3, 14, -10, -19"
20190408_testing_1_f6ab,False,female,75,,,familiar,visible,False,no_stress,...,22.08333,1.5590576,"267, 298, 341, 413, 536, 545, 560, 568, 580, 607, 630, 689, 712","272, 317, 407, 431, 543, 550, 565, 575, 586, 614, 675, 704, 720","289, 298, 338, 374, 516, 529, 550, 573, 584, 632, 646, 671, 704","296, 312, 363, 404, 521, 539, 564, 581, 595, 637, 651, 679, 712","22, 0, -3, -39, -7, 5, -10, 5, 4, -23, 2, 15, -8","9, 0, 3, -33, 20, 7, -5, -5, -4, -2, -16, 18, 8","24, -5, -3, -27, -4, -11, -1, 6, -5, -19, 4, 8, -8","21, 5, 44, 3, 22, 4, 1, 5, -9, -23, 24, -4, -8"
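
The `merge()` call in 1-2 joins on two keys and, by default, keeps only rows whose key pair appears in both tables, so recordings missing from either CSV file drop out silently. The sketch below uses made-up miniature tables (the values and the names `groups` and `summary_tbl` are illustrative only) to show the default behaviour and how `all = TRUE` exposes unmatched rows.

```r
# Made-up miniature versions of the groups table and the summary table
groups <- data.frame(folder_videoname = c("v1", "v2", "v3"),
                     single_animal = c(FALSE, FALSE, TRUE),
                     sex = c("female", "male", "male"),
                     stringsAsFactors = FALSE)
summary_tbl <- data.frame(folder_videoname = c("v1", "v2", "v4"),
                          single_animal = c(FALSE, TRUE, FALSE),
                          fz_overlap = c(18.9, 22.1, 5.0),
                          stringsAsFactors = FALSE)

keys <- c("folder_videoname", "single_animal")

# Default (inner) merge: only key pairs present in both tables survive
merged <- merge(groups, summary_tbl, by = keys)

# Outer merge: unmatched rows are kept and padded with NA
all_rows <- merge(groups, summary_tbl, by = keys, all = TRUE)

nrow(merged)
nrow(all_rows)
```

Comparing `nrow(merged)` with the row counts of the inputs is a quick check that no recording was lost by a mismatch in either key.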


---
# 2. Distribution of lagtime for onset and offset of freezing
## 2-1. Load lagtime csv file and visualize

In [None]:
# Load csv file
# test <- read.csv(file="C:\\Users\\User\\Dropbox\\Shared w am\\2019-02-19 R03 Brain-to-brain synchrony\\fig\\fear_express_video\\freeze.csv",header=TRUE, sep=",")
test <- read.csv(file="freeze.csv",header=TRUE, sep=",")

test

In [None]:
# Plot the distribution
library(ggplot2)
library(ggbeeswarm)

ggplot(test,aes(type,lagtime)) +
 geom_boxplot() +
 geom_quasirandom(alpha = 0.2) +
 theme_bw()

# Histograms of lagtime for onset and offset (1-unit bins from -50 to 50)
subTest = test[test$type=='onset',]
hist(subTest$lagtime, breaks=seq(-50,50,1), col="blue", xlim=c(-50,50), ylim=c(0,60), main="Onset lagtime", xlab="lagtime")

subTest = test[test$type=='offset',]
hist(subTest$lagtime, breaks=seq(-50,50,1), col="blue", xlim=c(-50,50), ylim=c(0,60), main="Offset lagtime", xlab="lagtime")


### (Optional) Export a plot as an EPS file #########################################
# Replace the plot() call below with the plot you want to export
setEPS()
postscript("whatever.eps")
plot(rnorm(100), main="Hey Some Data")
dev.off()
#####################################################################################
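
If a plotting call fails between `postscript()` and `dev.off()`, the device stays open and later plots keep going into the file. A small wrapper can guard against that. The helper name `export_eps`, its arguments, and the demo values are assumptions for illustration.

```r
# Open an EPS device, run the drawing function, and always close the device.
export_eps <- function(path, draw, width = 6, height = 4) {
  setEPS()
  postscript(path, width = width, height = height)
  on.exit(dev.off())  # runs even if draw() stops with an error
  draw()
  invisible(path)
}

# Made-up lag times, written to a temporary file
out <- export_eps(file.path(tempdir(), "lagtime_hist.eps"),
                  function() hist(c(-3, 0, 2, 5, -1, 4), main = "Demo lagtime"))
file.exists(out)
```

For a ggplot object, wrap it in `print()` inside the drawing function, because ggplot objects are only drawn when printed.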


## 2-2. Test Coefficients of Variation from multiple samples
https://cran.r-project.org/web/packages/cvequality/vignettes/how_to_test_CVs.html

If ggbeeswarm or cvequality is not installed, open a terminal window and install it.
```
$ sudo R
> install.packages("ggbeeswarm")
> install.packages("cvequality")
```

In [None]:
# Load required library
library(cvequality)

In [None]:
test1 <- with(test,asymptotic_test(lagtime,type))
test1

In [None]:
test2 <- with(test,mslr_test(nr = 1e4, lagtime,type))
test2
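
Both tests above compare the coefficient of variation (CV = standard deviation / mean) between the groups in `type`. Computing the CV by hand is a useful check on what goes into the tests. The data below are made up for illustration. Because the CV divides by the mean, it becomes unstable when a group mean is close to zero.

```r
# Coefficient of variation: sample standard deviation divided by the mean
cv <- function(x) sd(x) / mean(x)

# Made-up data with the same column names as freeze.csv
demo <- data.frame(type = rep(c("onset", "offset"), each = 5),
                   lagtime = c(4, 6, 5, 7, 8, 10, 30, 20, 25, 15))

# One CV per group
cv_by_type <- tapply(demo$lagtime, demo$type, cv)
cv_by_type

# Group means, to check how far they are from zero
tapply(demo$lagtime, demo$type, mean)
```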

## 2-3. One-Sample Wilcoxon Signed Rank Test in R
http://www.sthda.com/english/wiki/one-sample-wilcoxon-signed-rank-test-in-r

In [None]:
# We want to know whether the median of the data differs from mu (two-tailed test).

# One-sample wilcoxon test
res <- wilcox.test(test$V1, mu = 44)
# Printing the results
res
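
The object returned by `wilcox.test()` is a list, so the statistic and p-value can be extracted for reporting. The sketch below uses a made-up vector and `mu = 0`; both are illustrative choices, not the values from this analysis.

```r
# Made-up sample without ties or zeros, so the exact test is used
x <- c(1.5, -2.1, 3.3, 0.4, -0.7, 2.8, 4.1, -1.2, 0.9, 2.2)

# Two-sided one-sample Wilcoxon signed rank test against mu = 0
res <- wilcox.test(x, mu = 0)

# V is the sum of the ranks of |x - mu| for the positive differences (43 here)
res$statistic
res$p.value
```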

---
# 3. Boxplot for the distribution of lagtime for each animal pair.

In [None]:
# Raw data for lag-time
# "s" stands for onset and "e" stands for offset of freezing
d1 <- c(0, -24, 3, 0, 16, 8, 9, -3, 5, 4, -3, -1, -2)
e1 <- rep("f1s", length(d1))

d2 <- c(0, -9, 0, 0, 13, 0, 0, 4, 2, -1, -8, 3)
e2 <- rep("f1e", length(d2))

d3 <- c(14, -1, 9, 8, -7, 0, -8, 0, 0, -18, 0)
e3 <- rep("f2_1s", length(d3))

d4 <- c(13, 10, -3, -11, -9, 1, -3, 3, -18, -6, 4, 0)
e4 <- rep("f2_1e", length(d4))

d5 <- c(3, 4, 0, 4, 3, 1, 0, -8, -4, 5, 11)
e5 <- rep("f2_2s", length(d5))

d6 <- c(3, 1, 0, 1, 1, 0, 0, -9, -12, 3, -2)
e6 <- rep("f2_2e", length(d6))

d7 <- c(-3, -1, -7, 11, -11, 3, -13, -2)
e7 <- rep("f3_1s", length(d7))

d8 <- c(-7, 0, -2, 2, -7, 2, -6, 0)
e8 <- rep("f3_1e", length(d8))

d9 <- c(3, -12, 4, -2, 5)
e9 <- rep("f3_2s", length(d9))

d10 <- c(-1, -3, 0, -12)
e10 <- rep("f3_2e", length(d10))

d11 <- c(-11, -14, 11, 16, -57, 19)
e11 <- rep("f4_1s", length(d11))

d12 <- c(-16, -10, -25, -1, 0)
e12 <- rep("f4_1e", length(d12))

d13 <- c(3, -4, -18, 18)
e13 <- rep("f4_2s", length(d13))

d14 <- c(8, 3, -13, 16, 0)
e14 <- rep("f4_2e", length(d14))

# Concatenate the data
d <- c(d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14)
e <- c(e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14)

# Create data frame
mydata <- data.frame(d,e)
# Add column names
names(mydata) <- c("s1_s2","pair")

mydata

In [None]:
# Boxplot for the distribution of lag-time
library(ggplot2)
library(ggbeeswarm)

ggplot(mydata,aes(pair,s1_s2)) + geom_boxplot() + geom_quasirandom(alpha = 0.9) + theme_bw()

### (Optional) Export a plot as an EPS file #########################################
# Replace the plot() call below with the plot you want to export
setEPS()
postscript("whatever.eps")
plot(rnorm(100), main="Hey Some Data")
dev.off()
#####################################################################################

---

# Read csv file and test correlation
The CSV file is generated by MATLAB code.

[READING IN DATA FROM AN EXTERNAL FILE | R LEARNING MODULES](https://stats.idre.ucla.edu/r/modules/reading-in-data-from-an-external-file/)

In [None]:
test <- read.table('D:\\wataru\\Recording_Analysis\\Bases_dmPFC-BLA\\2017-12-19_vm81a_base\\myFile.txt', sep = ",")

In [None]:
# Cross-correlation between columns 1 and 3; lag.max is the full name of the lag argument
ccf(test[,1], test[,3], lag.max = 200000, ylim = c(-1,1), type="correlation")

In [None]:
testTS <- ts(test)

In [None]:
length(testTS)
str(testTS)
class(testTS)
names(testTS)
testTS

In [None]:
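# Parts of Example 11.2.2 from Brockwell and Davis (1991).
# NOTE: `sales` and `lead` are not in base R's datasets package; they must come from an
# attached package or your workspace. Base R ships the Box-Jenkins sales series and its
# leading indicator as BJsales and BJsales.lead.
data (sales)
sal <- diff (sales)
led <- diff(lead)
# type must be "correlation" or "covariance"
ccf (led, sal, lag.max = 20, ylim = c(-1,1), type="correlation")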

In [None]:
set.seed(123)
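# arima.sim() reads coefficients from the named list components ar and ma;
# unnamed values are ignored. ARMA(1,1) is assumed to be the intent here.
x = arima.sim(model=list(ar=0.2, ma=0.5), n = 100)
y = arima.sim(model=list(ar=0.4, ma=0.4), n = 100)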
ccf(x, y, type="correlation")
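
To read the lag of the strongest cross-correlation from the result instead of from the plot, call `ccf()` with `plot = FALSE` and inspect the returned `lag` and `acf` components. In the sketch below the signals are simulated so that `y` is a copy of `x` delayed by 3 samples (an assumption made purely for illustration). Since lag k of `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`, the peak falls at lag -3.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
# y repeats x three samples later, plus a little noise
y <- c(rep(0, 3), x[1:(n - 3)]) + rnorm(n, sd = 0.1)

# Compute the cross-correlation without drawing it
cc <- ccf(x, y, lag.max = 20, plot = FALSE)

# Lag with the largest absolute correlation
peak_lag <- cc$lag[which.max(abs(cc$acf))]
peak_lag
```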

In [None]:
readClipboard()

In [None]:
# setwd("D:/wataru/Recording_Analysis/Bases_dmPFC-BLA")
# theta <- scan('test.txt')
# plot(theta)

theta <- scan('D:\\wataru\\Recording_Analysis\\Bases_dmPFC-BLA\\2017-12-19_vm81a_base\\test.txt')
plot(theta)

---
# Data Types
https://www.statmethods.net/input/datatypes.html

In [None]:
######################################################
# vectors
a <- c(1,2,5.3,6,-2,4) # numeric vector
b <- c("one","two","three") # character vector
c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #logical vector

# Identify rows, columns or elements using subscripts.
a[4]
a[c(2,4)]

######################################################
# matrix
# generates 5 x 4 numeric matrix 
y<-matrix(1:20, nrow=5,ncol=4)
# another example
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2") 
mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
  dimnames=list(rnames, cnames))

# Identify rows, columns or elements using subscripts.
y[,4] # 4th column of matrix
y[3,] # 3rd row of matrix 
y[2:4,1:3] # rows 2,3,4 of columns 1,2,3

######################################################
# Data Frames
# A data frame is more general than a matrix, in that different columns can have different
# modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.

d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") # variable names

# Identify rows, columns or elements using subscripts.
mydata[2:3] # columns 2 and 3 of data frame
mydata[c("ID","Passed")] # columns ID and Passed from data frame
mydata$Color # variable Color in the data frame
mydata[1,3]

######################################################
# The ls() function returns a character vector listing all the objects (vectors, data frames, etc.) in your current workspace.
ls()

# The rm() examples below are commented out because running them would delete
# objects (such as a and y) that later parts of this cell still use.
# Remove these three objects
# rm("first_name", "last_name", "new_df")
 
# Or remove objects listed in a vector
# rm(list = c("first_name", "last_name", "new_df"))
 
# Or remove all objects from your workspace
# rm(list = ls())
 
# Or remove objects programmatically. Delete objects with an underscore in the name
# rm(list = ls()[grepl("_", ls())])

######################################################
# Lists
# An ordered collection of objects (components). A list allows you to gather a variety of 
# (possibly unrelated) objects under one name.
# example of a list with 4 components - 
# a string, a numeric vector, a matrix, and a scalar 

w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)

# c() concatenates two lists into a single list (here, 8 components);
# use list(w, w) for a list that contains two lists
v <- c(w,w)

# Identify elements of a list using the [[]] convention.
w[[2]] # 2nd component of the list
w[["mynumbers"]] # component named mynumbers in list



######################################################
# Factors
# Tell R that a variable is nominal by making it a factor. The factor stores the nominal
# values as a vector of integers in the range [ 1... k ] (where k is the number of unique 
# values in the nominal variable), and an internal vector of character strings (the original 
# values) mapped to these integers.

# variable gender with 20 "male" entries and 
# 30 "female" entries 
gender <- c(rep("male",20), rep("female", 30)) 
gender <- factor(gender) 
# stores gender as 20 2s and 30 1s and associates
# 1=female, 2=male internally (alphabetically)
# R now treats gender as a nominal variable 
summary(gender)
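
The integer codes behind a factor follow the order of its levels, which is alphabetical unless stated otherwise. The sketch below shows how to inspect the codes and how to set the level order explicitly. The name `gender2` is introduced only for this example.

```r
gender <- factor(c(rep("male", 20), rep("female", 30)))

# Levels are sorted alphabetically by default
levels(gender)

# Integer codes: female = 1 (30 entries), male = 2 (20 entries)
table(as.integer(gender))

# Set the level order explicitly so that male = 1, female = 2
gender2 <- factor(gender, levels = c("male", "female"))
levels(gender2)
as.integer(gender2)[1]  # the first entry is "male", so its code is now 1
```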


# Reading file

test2.csv
```
 prgtype gender  id ses schtyp level
 general      0  70   4      1     1
  vocati      1 121   4      2     1
 general      0  86   4      3     1
  vocati      0 141   4      3     1
academic      0 172   4      2     1
academic      0 113   4      2     1
 general      0  50   3      2     1
academic      0  11   1      2     1
```
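
A table laid out like the one above can be read with `read.table()`. The sketch assumes the file is whitespace-separated exactly as displayed; if the file on disk is comma-separated, `read.csv("test2.csv")` is the call to use instead. The `text =` argument is used here only so that the example runs without the file.

```r
# The table contents as shown above, supplied as text instead of a file
txt <- " prgtype gender  id ses schtyp level
 general      0  70   4      1     1
  vocati      1 121   4      2     1
 general      0  86   4      3     1
  vocati      0 141   4      3     1
academic      0 172   4      2     1
academic      0 113   4      2     1
 general      0  50   3      2     1
academic      0  11   1      2     1"

# header = TRUE takes the column names from the first line;
# with a real file, replace text = txt by file = "test2.csv"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(dat)
```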

