### General instructions.

In this interactive tutorial, you can run each cell either by clicking the ‘play’ button or by pressing ‘Shift + Enter’. You can also edit the code and run it again.

# Graphical Forms of Data Charts: Dataset 1

## Filter and Fire dataset

Read and observe the Filter and Fire dataset in R:

In [1]:
FilterFiredata<-read.csv("FilterandFireData.csv") #read the dataset from the file
head(FilterFiredata) #observe the first 6 rows of the table

One of the most popular tools for making graphs in R is the ggplot2 package, which is loaded as part of the tidyverse.

First we need to load the packages we will be using:

In [None]:
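library(tidyverse) #includes ggplot2 (plots) and dplyr (data manipulation)
library(RColorBrewer) #color palettes, e.g. brewer.pal()
library(reshape2) #melt() for reshaping tables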

## Bar Plot
We can start by making a bar plot of the baseline accuracy (column *Accuracy.baseline*) achieved by the neurons in detecting each handwritten digit.

In [None]:
ggplot(FilterFiredata,aes(x=as.factor(digit), y=Accuracy.baseline))+ geom_bar(stat = "identity", position = "identity")

Here we wrapped the *digit* column in *as.factor()* so that each bar represents one of the digits tested in this task.

Factors are categorical data types: the *as.factor()* command converts the numerical values in the *digit* column into categories.
* Exercise: What happens to the plot if we remove the *as.factor()* command?
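
To see what *as.factor()* changes, here is a minimal base R sketch on a made-up *digit* vector (the values are hypothetical, not read from the dataset). ggplot2 maps a numeric vector to a continuous axis, while it maps a factor to a discrete axis with one category per level:

```r
# Hypothetical digit column (not read from FilterandFireData.csv)
digit <- c(3, 1, 3, 9, 1)

digit_factor <- as.factor(digit)

class(digit)          # "numeric": plotted on a continuous scale
class(digit_factor)   # "factor": plotted as discrete categories
levels(digit_factor)  # "1" "3" "9": one category per distinct value, sorted

# Pitfall: as.numeric() on a factor returns the internal level codes...
codes <- as.numeric(digit_factor)                  # 2 1 2 3 1
# ...so convert through character to recover the original numbers
original <- as.numeric(as.character(digit_factor)) # 3 1 3 9 1
```

The last two lines matter whenever you need to turn a factor back into numbers: going through *as.character()* first avoids silently getting the level codes instead.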

As you can see, this is the simplest form of a bar plot. We can make it easier to read by adding color and changing the y-axis range:

In [None]:
ggplot(FilterFiredata,aes(x=as.factor(digit), y=Accuracy.baseline, fill = as.factor(digit)))+ geom_bar(stat = "identity", position = "identity") + labs(x="Digit", y="Accuracy Baseline") + coord_cartesian(ylim = c(88,100)) + guides(fill="none")

You can set the color of the whole plot manually, using either color names or hexadecimal color codes. You can also map the color to a variable in the data: here, for example, we gave each digit a different color by adding *fill = as.factor(digit)* inside *aes()*.

More color palettes available in R are listed here:
https://www.nceas.ucsb.edu/sites/default/files/2020-04/colorPaletteCheatsheet.pdf
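
Color names and color codes are two ways of writing the same color. As a small base R sketch, using the *deeppink* color from the histogram later in this tutorial, *col2rgb()* shows the red, green and blue components of a named color, and *rgb()* builds the matching hexadecimal code:

```r
# "deeppink" split into its red, green and blue components (0 to 255)
deeppink_rgb <- col2rgb("deeppink")
print(deeppink_rgb)  # red 255, green 20, blue 147

# The same color written as a hexadecimal color code
deeppink_hex <- rgb(255, 20, 147, maxColorValue = 255)
print(deeppink_hex)  # "#FF1493"
```

Either form can be passed to ggplot2, e.g. *fill = "deeppink"* or *fill = "#FF1493"*.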

The y-axis range was restricted to 88 to 100 with *coord_cartesian(ylim = c(88,100))*, which zooms in on that range without removing any data.

* Exercise: Change the color or the color palette of this plot.

## Box Plot

In [None]:
ggplot(FilterFiredata, aes(x=as.factor(digit), y=Accuracy.FF, fill = as.factor(digit))) + geom_boxplot() + labs(x="Digit", y="Accuracy after F&F model training") + guides(fill="none")

The *guides(fill = "none")* layer removes the color legend from this graph: the legend would only repeat the digit labels already shown on the x axis, so it adds no information. It is also possible to color the plot by a different variable.

* Exercise: Color the plot by the *release.probability* variable (as a factor) instead. What extra information does the plot now provide?
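
Each box summarizes the distribution of one group. The following base R sketch, on made-up accuracy values, computes the quantities a box plot draws under the usual convention (also the ggplot2 default): the box spans the first to third quartile with a line at the median, the whiskers reach the most extreme values within 1.5 times the interquartile range (IQR) of the box, and points beyond that are drawn individually as outliers:

```r
# Hypothetical accuracy values for one digit (not from the dataset)
acc <- c(91.2, 92.5, 93.1, 93.8, 94.0, 94.4, 95.1, 99.9)

# Box: first quartile, median and third quartile
q <- quantile(acc, probs = c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Limits 1.5 * IQR beyond the box
lower_limit <- q[1] - 1.5 * iqr
upper_limit <- q[3] + 1.5 * iqr

# Values outside the limits are outliers; whiskers end at the most
# extreme values that are still inside the limits
outliers <- acc[acc < lower_limit | acc > upper_limit]
whiskers <- range(acc[acc >= lower_limit & acc <= upper_limit])

print(q)         # 92.950 93.900 94.575
print(outliers)  # 99.9
print(whiskers)  # 91.2 95.1
```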

## Violin Plot

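As you can observe, the same data can be visualized in different ways. A violin plot draws a mirrored density curve for each group, so it shows the full shape of the distribution rather than only its quartiles: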

In [None]:
ggplot(FilterFiredata, aes(x=as.factor(digit), y = Accuracy.FF, fill = as.factor(digit))) + geom_violin() + labs(x="Digit", y="Accuracy after F&F model training") + guides(fill="none") 
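
A violin is a kernel density estimate drawn on both sides of a vertical axis. As a rough sketch, assuming ggplot2's default settings (a Gaussian kernel, with the curve trimmed to the range of the data), the curve for one group can be computed in base R with *density()*. The values below are made up:

```r
# Hypothetical accuracy values for one digit (not from the dataset)
acc <- c(91.2, 92.5, 93.1, 93.8, 94.0, 94.4, 95.1, 99.9)

# Gaussian kernel density estimate, evaluated on a grid that spans
# exactly the range of the data (from the minimum to the maximum)
dens <- density(acc, kernel = "gaussian", from = min(acc), to = max(acc))

length(dens$x)              # 512 grid points (the default n)
range(dens$x)               # 91.2 99.9
dens$x[which.max(dens$y)]   # accuracy value where the curve is widest
```

The widest part of a violin corresponds to the peak of this curve, where the observations are most concentrated.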

## Histogram
We will now plot a histogram, but only for the rows corresponding to **digit 9**:

In [None]:
FilterFiredata_digit9<-subset(FilterFiredata, digit==9) #Subsetting the data corresponding only to digit 9
ggplot(FilterFiredata_digit9, aes(x=Accuracy.FF)) + geom_histogram(color="black",fill="deeppink",binwidth = 0.2) + labs(x="Accuracy after F&F model training", y="Frequency")
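
*subset()* keeps only the rows of a data frame for which a condition is TRUE. A minimal sketch on a hypothetical two-column table shows that it selects the same rows as logical indexing with square brackets:

```r
# Hypothetical rows (not from FilterandFireData.csv)
ff_rows <- data.frame(digit = c(9, 3, 9, 1),
                      Accuracy.FF = c(95.1, 97.0, 94.8, 98.2))

# subset() keeps the rows where the condition is TRUE
digit9 <- subset(ff_rows, digit == 9)

# The same selection with logical indexing
digit9_brackets <- ff_rows[ff_rows$digit == 9, ]

nrow(digit9)  # 2
```

One difference: *subset()* drops rows where the condition evaluates to NA, whereas bracket indexing returns them as rows of NA values.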

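A histogram divides the range of the data into intervals called bins and draws one bar per bin, whose height is the number of observations that fall into that interval. The *binwidth* argument of *geom_histogram()* sets the width of each bin, in the units of the x variable (here, accuracy).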

* Exercise: Change the binwidth to a larger or a smaller value. How
does that affect the histogram?
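
The effect of the bin width can be checked without drawing anything. This base R sketch counts how many made-up accuracy values fall into each bin, using *hist(..., plot = FALSE)* with explicit break points. ggplot2 chooses its own bin boundaries unless told otherwise, so its counts for the same width may differ; the numbers here only illustrate the idea:

```r
# Hypothetical accuracy values (not taken from FilterandFireData.csv)
acc <- c(90.3, 90.8, 91.2, 92.6, 92.9, 93.4, 93.7, 95.1)

# Count observations per bin for a given bin width, without plotting.
# Bins are right-closed, (a, b], and the first bin also includes its left edge.
bin_counts <- function(x, width, from = 90, to = 96) {
  hist(x, breaks = seq(from, to, by = width), plot = FALSE)$counts
}

narrow <- bin_counts(acc, width = 1)  # 6 bins
wide   <- bin_counts(acc, width = 2)  # 3 bins
print(narrow)  # 2 1 2 2 0 1
print(wide)    # 3 4 1
```

Narrow bins show more detail (including the empty bin) but look noisier; wide bins smooth the shape but can hide gaps.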

# Graphical Forms of Data Charts: Dataset 2

## Brain region-specific Gene Expression

Read and observe the Brain region-specific Gene Expression data in R:

In [None]:
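expressiondata<-read.table("ExpressionData.txt",header = TRUE, row.names = 1) #read the dataset from the file, using the first column (gene names) as row names
head(expressiondata) #observe the first 6 rows of the table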

## Heatmap

In [None]:
heatmap(as.matrix(expressiondata),Colv = NA, Rowv = NA, scale="row", col=rev(brewer.pal(n = 11, name ="RdYlBu")))

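The *heatmap()* function requires a *matrix* as input, but our *expressiondata* object is a *data frame* (which R stores internally as a *list* of columns). These are two different data types, but we can use the *as.matrix()* function to convert the data frame into a matrix. In the call above, *Colv = NA* and *Rowv = NA* turn off the reordering (and dendrograms) of columns and rows, *scale = "row"* standardizes each gene's values so that colors compare samples within a gene, and *rev()* reverses the RdYlBu palette so that high values are shown in red and low values in blue.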
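
A minimal base R sketch of both steps on a made-up two-gene table (the gene and sample names, including *DG1*, are hypothetical): *as.matrix()* turns the data frame into a matrix, and standardizing each row (subtracting the row mean and dividing by the row standard deviation) mirrors the row scaling that *heatmap(..., scale = "row")* applies before choosing colors:

```r
# Hypothetical table: genes in rows, samples in columns (not from ExpressionData.txt)
expr <- data.frame(NAc1 = c(10, 2), NAc4 = c(12, 4), DG1 = c(2, 6),
                   row.names = c("GeneA", "GeneB"))

is.data.frame(expr)  # TRUE
is.list(expr)        # TRUE: a data frame is a list of columns

m <- as.matrix(expr)
is.matrix(m)         # TRUE: same values, row and column names kept

# Row z-scores: scale() works on columns, so transpose, scale, transpose back
z <- t(scale(t(m)))
z["GeneB", ]         # -1 0 1 (values 2, 4, 6 have mean 4 and sd 2)
```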

## Scatter plot

In [None]:
ggplot(expressiondata, aes(x=NAc1, y = NAc4)) + geom_point(color="blue",stat = "identity", position = "identity")

Because we have plotted the same brain region (the Nucleus Accumbens, NAc) in two different samples (rat 1 and rat 4), we can see that their expression values are highly correlated with each other.

* Exercise: Now plot the NAc expression data against the DG expression data of the same sample.
What does the spread of the points look like? Is the expression in the two brain regions correlated with each other? Is it what you expected?
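
To put a number on what a scatter plot shows, *cor()* computes a correlation coefficient. A minimal sketch on made-up values for five genes in two samples:

```r
# Hypothetical expression values for five genes in two samples
# (not from ExpressionData.txt)
nac1 <- c(5.1, 7.9, 3.2, 9.4, 6.0)
nac4 <- c(5.3, 7.6, 3.0, 9.8, 6.2)

# Pearson correlation: close to 1 when the points lie near an increasing straight line
r <- cor(nac1, nac4)

# Spearman correlation uses ranks, so it only asks whether the two orderings agree
rho <- cor(nac1, nac4, method = "spearman")  # 1: both samples rank the genes identically
```

On the tutorial data, the equivalent call would be, for example, *cor(expressiondata$NAc1, expressiondata$NAc4)*. A value near 0 would indicate little linear relationship.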

## Line plot

In [None]:
expressiondata$GeneNames <- rownames(expressiondata) #add a column with the row names (gene names) to the expression data
expression_long <- melt(expressiondata, id.vars = "GeneNames") #reshape to long format: one row per gene and sample (columns GeneNames, variable, value)
ggplot(expression_long) + geom_line(aes(x = variable, y = value, col = GeneNames, group = GeneNames)) + guides(col="none") + labs(x="Samples", y="Gene Expression") #group = GeneNames draws one line per gene

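Here, the *melt()* function was required to convert the gene expression table from wide format (one column per sample) into long format, with one row per gene and sample and the columns *GeneNames*, *variable* (the sample name) and *value* (the expression level). This extra step was needed because we wanted to plot the expression of every sample along the same x axis; usually, *geom_line()* is simply used to draw a line between points defined by two variables (columns).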
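
To see what this reshaping does, here is a base R sketch on a made-up two-gene table, using *stack()* instead of *melt()* so that it runs without extra packages. It should produce the same long layout (*GeneNames*, *variable*, *value*) that *melt(wide, id.vars = "GeneNames")* returns:

```r
# Hypothetical wide table: one row per gene, one column per sample
# (gene and sample names are made up, not taken from ExpressionData.txt)
wide <- data.frame(GeneNames = c("GeneA", "GeneB"),
                   NAc1 = c(10, 2),
                   NAc4 = c(12, 4),
                   stringsAsFactors = FALSE)

sample_cols <- c("NAc1", "NAc4")

# stack() puts all sample columns into one "values" column and records
# the original column name of each value in the factor "ind"
stacked <- stack(wide[, sample_cols])

# Repeat the gene names once per sample column so every value keeps its gene
long <- data.frame(GeneNames = rep(wide$GeneNames, times = length(sample_cols)),
                   variable  = stacked$ind,
                   value     = stacked$values,
                   stringsAsFactors = FALSE)

print(long)  # 4 rows: each gene appears once per sample
```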

* Exercise: Using the code from the scatter plot above, try to visualize the same data as a line plot instead, i.e. make a line plot with one brain region column on the x axis and another brain region column on the y axis. Note that *geom_line()* connects the points in order of their x values.

### Advanced exercises.
If you'd like an extra challenge, we suggest downloading the original datasets and trying to replicate the plots from the research papers.
* Original Filter and Fire dataset: https://www.kaggle.com/datasets/selfishgene/fiter-and-fire-paper
* Original brain region-specific expression dataset (file Fig1d.Region_sepcific_expressed_Gene_cpm_Zscore.txt): https://figshare.com/projects/Methamphetamine-induced_region-specific_transcriptomic_and_epigenetic_changes_in_the_brain_of_male_rats/177378