05-damr.Rmd

# DAM2 data, in practice {#damr -}

**A matter of metadata**

---------------------------

![A DAM experiment. Two replicates, 3 days of recording each; 10 days apart; three genotypes; two sexes, males and females](assets/dam_experiment.png)

## Aims {-}
In this practical chapter, we will use a real experiment to learn how to:

* Translate your experiment design into a metadata file
* Use this metadata file to load some data
* Set the zeitgeber reference (ZT0)
* Assess graphically the quality of the data
* Use good practices to exclude individuals from our experiments

## Prerequisites {-}

* You are familiar with the [TriKineticks DAM system](http://www.trikinetics.com/)
* Ensure you have read about the [rethomics workflow](workflow.html) and [metadata](metadata.html)
* Ensure you have [installed](intro.html#installing-rethomics-packages)
`behavr`, `damr` and `ggetho` packages:


```{r, eval=FALSE}
install.packages(c('ggetho', 'damr'))
```


```{r, echo=FALSE}
URL <- "https://github.com/rethomics/rethomics.github.io/raw/source/material/damr_tutorial.zip"
DATA_DIR <- paste(tempdir(), "damr_tutorial", sep="/")
dir.create(DATA_DIR)
knitr::opts_knit$set(root.dir = DATA_DIR)

dst <- paste(DATA_DIR, "damr_tutorial.zip", sep="/")
download.file(URL, dst)
unzip(dst, exdir= DATA_DIR)
```

## Background{-}

[Drosophila Activity Monitors](http://www.trikinetics.com/) (DAMs) are a wildely used tool to monitor activity of fruit flies over several days. I am assuming that, if you are reading this tutorial, you are already familiar with the system, but I will make a couple of points clear before we start something more hands-on:

* This tutorial is about single beam **DAM2** but will adapt very well to multibeam DAM5.
* We work with the raw data (the data from each monitor is in one single file, and all the monitor files are in the same folder)

## Getting the data{-}

For this tutorial, you need to [download some DAM2 data](https://github.com/rethomics/rethomics.github.io/raw/source/material/damr_tutorial.zip)
that we have made available.
This is just a zip archive containing four files.
Download and extract the files from the zip into a folder of your choice.
**Store the path in a variable**.
For instance, **adapt** something like:

```r{eval=F}
DATA_DIR <- "C:/Where/My/Zip/Has/been/extracted
```

Check that all four files live there:

```{r}
list.files(DATA_DIR, pattern= "*.txt|*.csv")
```

For this exercise, we will work with the data and metadata in the same place.
However, in practice, I recommend to:

* Have **all raw data from your acquisition platform in the same place** (possibly shared with others or a network drive)
* Have **one folder per "experiment"**. That is a folder that contains one metadata file, your R scripts, your figures regarding a set of consistent experiment.

For now, we can just [set our working directory](https://support.rstudio.com/hc/en-us/articles/200711843-Working-Directories-and-Workspaces) to `DATA_DIR`:

```{r, eval=FALSE}
setwd(DATA_DIR)
```

## From experiment design to metadata{-}

### Our toy experiment{-}


![A DAM experiment. Two replicates, 3 days of recording each; 10 days apart; three genotypes; two sexes, males and females](assets/dam_experiment.png)

In this example data, we were interested in comparing the behaviour of populations of fruit flies,
according to their sex and genotype.
We designed the experiment as shown is the figure above. In summary, we have:

* **three genotypes** (A, B and C)
* **two sexes** (male and female)
* **two replicates** (`2017-07-01 -> 2017-07-04` and `2017-07-11 -> 2017-07-14`)
* Altogether, **192 individuals**


### Metadata {-}
**It is crucial that you have read [metadata chapter](metadata.html)** to understand this part.
Our goal is to encode our whole experiment in a single file in which:

* each row is an individual
* each column is a metavariable

Luckily for you, I have already put this file together for you as `metadata.csv`!
Lets have a look at it (you can use `R`, excel or whatever you want). 
If you are using `R`, type this commands:

```{r}
library(damr)
metadata <- fread("metadata.csv")
metadata
```

Each of the 192 animals (rows) is defined by a set of mandatory columns (metavariables):

* `file` -- the data file (monitor) that it has been recorded in
* `start_datetime` -- the date and time (`YYYY-MM-DD HH:MM:SS`) of the start of the experiment. Time will be considered ZT0, see [note](damr.html#zt0)
* `stop_datetime` -- the last time point of the experiment (time is optional)
* `region_id` -- the channel ([1, 32])

For **our experiment**, we also defined custom columns:

* `sex` -- M and F for male and female, respectively
* `genotype` -- A, B or C (I just made up the names for the sake of simplicity)
* `replicate` -- so we can analyse how replicates differ from one another

Note that this format is very flexible and explicit.
For instance, if we decided to do a third replicate, we would just need to add new rows.
We could also add any condition we want as a new column (e.g. treatment, temperature, matting status and so on)

## Linking{-}

[Linking](metadata.html#linking-metadata) is the one necessary step before loading the data.
It allocates a unique identifier to each animal.

It is very simple to link metadata:

```{r}
metadata <- link_dam_metadata(metadata, result_dir = DATA_DIR)
metadata
```

As `result_dir`, we just use the directory where the data lives, which you decided when you extracted your data (`DATA_DIR`).

**Importantly, you do not need to cut the relevant parts of your DAM files** (this is an error-prone step that should be avoided). In other words, no need to use the `DAMFileScan` utility or manipulate in any way the original data.

You can keep all the data in one file per monitor. `rethomics` will use start and stop datetime to find the appropriate part directly from your metadata.

## Loading {-}

In order to work with the data the last step is to load it into a [behavr](behavr.html) structure. To do that simply use `load_dam` function (as shown below). This function will store all data in dt (or any other given name)

```{r}
dt <- load_dam(metadata)
summary(dt)
```

That is it, **all** our data is loaded in dt.

## Note on datetime {-}

### ZT0 {-}
In the circadian and sleep field, we need to align our data to a reference time of the day. Typically, when the light (would) turn on (ZT0).
In `damr`, the **time part of the start_datetime is used as a circadian reference**.
For instance, if you specify, in your metadata file `2017-01-01 09:00:00`, you imply that ZT0 is at `09:00:00`.
The time is looked-up in the DAM file, so it will be at *on same time zone settings as the computer that recorded the data*.

### Start and stop time {-}

When fetching some data, date and time are **always inclusive**.

When only the date is specified:

* start time will be at `00:00:00`
* stop time will be at `23:59:59`

For instance, `start_date = 2017-01-01` and `stop_date = 2017-01-01` retrieves all the data from the first of January 2017.


## Quality control {-}
### Detecting anomalies {-}
Immediatly after loading your data, it is a good idea to visualise it, in order to detect anomalies or at least to be sure that everything looks ok.
We can use `ggetho` for that, for example the following code will create an activity tile plot, useful to detect dead animals.

```{r, fig.width = 9, fig.height=16}
library(ggetho)
# I only show fisrt replicate
ggetho(dt[xmv(replicate) == 1 ], aes(z=activity)) +
      stat_tile_etho() +
      stat_ld_annotations()
```


Here, instead of ploting everything, I show how you can subset data according to metadata in order to display only replicate one (`dt[xmv(replicate) == 1]`). In practice, you could also plot everything.
You can do a lot more with `ggetho` (see the [visualisation chapter](ggetho.html))

What does this tile plot tell us?
Each row is an animal (and is labelled with its corresponding id).
Each column is a 30min window.
The colour intensity indicates the activity.

There are two things that we can immediatly notice:

* For most animals, the activity is rhythmic and synchronised with the light phase transisitions.
* Some animals are dead or missing. For instance take a look at `channel 26` in `Monitor64.txt`.

In other chapters, we will learn how to group individuals, visualise and compute statistics.

### How to exclude animals? {-}

We suggest to exclude animals *a priori* (e.g. because they died) by recording them as dead **in the metadata**. This way data is not modified or omited and can easily be recovered if needed.
For instance, you can add a column `status` in your metadata file and put a default value such as `"OK"`.
If an animal is to be removed, you can replace `"OK"` by **a reason** (e.g. `"dead"`, `"escaped"`, ...).
Then, you can load your data without those animals `load_dam_data(metadata[status == "OK"], ...)`.
This practice has the advantage of making it **very transparent**, why some individuals where excluded.
Also, as stated before, it can easily be reversed.


## Apply functions when loading {-}

Finaly, we may want to apply a function on the data as it is loaded, in order to preprocess it, saving time. This pre-processing will annotate the data, i.e create new information (new columns) based on the original data. As an example, we can perform a sleep (bouts of immobility of 5 min or more), from our `sleepr` package (which you will have installed).

```{r}
library(sleepr)
dt <- load_dam(metadata, FUN = sleepr::sleep_dam_annotation)
dt
```

```{r, echo=FALSE, eval=FALSE}
## to save the data for next tuto
# dt <- dt[xmv(replicate) == 1]
# rm(metadata)
# rm(pl)
# save(dt, file="/home/quentin/comput/rethomics/rethomics.github.io/material/sleep_dam.RData")
# load(file="/home/quentin/comput/rethomics/rethomics.github.io/material/sleep_dam.RData")
```

As you can  see, we now have additional columns in the data.


## Next steps {-}

* [Visualise data with `ggetho`](ggetho.html)
* [Sleep analysis with `sleepr`](sleepr.html)
* [Circadian analysis with `zeitgebr`](zeitgebr.html)