# DAM2 data, in practice {#damr -}
**A matter of metadata**
---------------------------

## Aims {-}
In this practical chapter, we will use a real experiment to learn how to:
* Translate your experiment design into a metadata file
* Use this metadata file to load some data
* Set the zeitgeber reference (ZT0)
* Assess graphically the quality of the data
* Use good practices to exclude individuals from our experiments
## Prerequisites {-}
* You are familiar with the [TriKinetics DAM system](http://www.trikinetics.com/)
* Ensure you have read about the [rethomics workflow](workflow.html) and [metadata](metadata.html)
* Ensure you have [installed](intro.html#installing-rethomics-packages) the `behavr`, `damr` and `ggetho` packages:
```{r, eval=FALSE}
install.packages(c('ggetho', 'damr'))
```
```{r, echo=FALSE}
URL <- "https://github.com/rethomics/rethomics.github.io/raw/source/material/damr_tutorial.zip"
DATA_DIR <- paste(tempdir(), "damr_tutorial", sep="/")
dir.create(DATA_DIR)
knitr::opts_knit$set(root.dir = DATA_DIR)
dst <- paste(DATA_DIR, "damr_tutorial.zip", sep="/")
download.file(URL, dst)
unzip(dst, exdir= DATA_DIR)
```
## Background{-}
[Drosophila Activity Monitors](http://www.trikinetics.com/) (DAMs) are a widely used tool to monitor the activity of fruit flies over several days. I am assuming that, if you are reading this tutorial, you are already familiar with the system, but I will make a couple of points clear before we start something more hands-on:
* This tutorial is about single-beam **DAM2**, but it will adapt very well to multibeam DAM5.
* We work with the raw data (the data from each monitor is in one single file, and all the monitor files are in the same folder).
## Getting the data{-}
For this tutorial, you need to [download some DAM2 data](https://github.com/rethomics/rethomics.github.io/raw/source/material/damr_tutorial.zip)
that we have made available.
This is just a zip archive containing four files.
Download and extract the files from the zip into a folder of your choice.
**Store the path in a variable**.
For instance, **adapt** something like:
```{r, eval=FALSE}
DATA_DIR <- "C:/Where/My/Zip/Has/been/extracted"
```
Check that all four files live there:
```{r}
# `pattern` is a regular expression: match names ending in .txt or .csv
list.files(DATA_DIR, pattern = "\\.(txt|csv)$")
```
For this exercise, we will work with the data and metadata in the same place.
However, in practice, I recommend that you:
* Have **all raw data from your acquisition platform in the same place** (possibly shared with others or on a network drive)
* Have **one folder per "experiment"**, that is, a folder that contains one metadata file, your R scripts and your figures for a consistent set of experiments.
For now, we can just [set our working directory](https://support.rstudio.com/hc/en-us/articles/200711843-Working-Directories-and-Workspaces) to `DATA_DIR`:
```{r, eval=FALSE}
setwd(DATA_DIR)
```
## From experiment design to metadata{-}
### Our toy experiment{-}

In this example data, we were interested in comparing the behaviour of populations of fruit flies,
according to their sex and genotype.
We designed the experiment as shown in the figure above. In summary, we have:
* **three genotypes** (A, B and C)
* **two sexes** (male and female)
* **two replicates** (`2017-07-01 -> 2017-07-04` and `2017-07-11 -> 2017-07-14`)
* Altogether, **192 individuals**
### Metadata {-}
**It is crucial that you have read the [metadata chapter](metadata.html)** to understand this part.
Our goal is to encode our whole experiment in a single file in which:
* each row is an individual
* each column is a metavariable
Luckily for you, I have already put this file together as `metadata.csv`!
Let's have a look at it (you can use `R`, Excel or whatever you want).
If you are using `R`, type these commands:
```{r}
library(damr)
metadata <- fread("metadata.csv")
metadata
```
Each of the 192 animals (rows) is defined by a set of mandatory columns (metavariables):
* `file` -- the data file (monitor) that it has been recorded in
* `start_datetime` -- the date and time (`YYYY-MM-DD HH:MM:SS`) of the start of the experiment. Time will be considered ZT0, see [note](damr.html#zt0)
* `stop_datetime` -- the last time point of the experiment (time is optional)
* `region_id` -- the channel (an integer between 1 and 32)
For **our experiment**, we also defined custom columns:
* `sex` -- M and F for male and female, respectively
* `genotype` -- A, B or C (I just made up the names for the sake of simplicity)
* `replicate` -- so we can analyse how replicates differ from one another
Note that this format is very flexible and explicit.
For instance, if we decided to do a third replicate, we would just need to add new rows.
We could also add any condition we want as a new column (e.g. treatment, temperature, mating status and so on).
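
To make this flexibility concrete, here is a minimal base R sketch that builds a small metadata table by hand. The file name, the dates, the layout of sexes and genotypes across channels and the `temperature` column are all hypothetical; they are not taken from the tutorial's `metadata.csv`.

```r
# Toy metadata: one row per animal, one column per metavariable.
# File name, dates and channel layout are made up for illustration.
make_block <- function(file, start, stop, replicate) {
  data.frame(
    file = file,
    start_datetime = start,
    stop_datetime = stop,
    region_id = 1:32,                        # one row per channel
    sex = rep(c("M", "F"), each = 16),
    genotype = rep(c("A", "B"), times = 16),
    replicate = replicate,
    stringsAsFactors = FALSE
  )
}

toy_metadata <- make_block("Monitor01.txt", "2017-07-01 08:00:00", "2017-07-04", 1)

# A further replicate is just more rows...
toy_metadata <- rbind(
  toy_metadata,
  make_block("Monitor01.txt", "2017-07-11 08:00:00", "2017-07-14", 2)
)

# ...and a new condition is just one more column.
toy_metadata$temperature <- 25

nrow(toy_metadata)
```

The same table could equally be typed in a spreadsheet and saved as a CSV file; the point is only that rows are animals and columns are metavariables.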
## Linking{-}
[Linking](metadata.html#linking-metadata) is the one necessary step before loading the data.
It allocates a unique identifier to each animal.
It is very simple to link metadata:
```{r}
metadata <- link_dam_metadata(metadata, result_dir = DATA_DIR)
metadata
```
As `result_dir`, we just use the directory where the data lives, which you decided when you extracted your data (`DATA_DIR`).
**Importantly, you do not need to cut the relevant parts of your DAM files** (this is an error-prone step that should be avoided). In other words, there is no need to use the `DAMFileScan` utility or to manipulate the original data in any way.
You can keep all the data in one file per monitor. `rethomics` will use start and stop datetime to find the appropriate part directly from your metadata.
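
Before linking, it can be worth checking that every monitor file named in the metadata is actually present in the result directory. The following is a minimal base R sketch; `missing_monitor_files()` is a hypothetical helper written for this example, not a `damr` function, and the demonstration files are created in a temporary folder.

```r
# Hypothetical helper: which files named in the metadata are absent from result_dir?
missing_monitor_files <- function(metadata, result_dir) {
  files <- unique(as.character(metadata$file))
  files[!file.exists(file.path(result_dir, files))]
}

# Self-contained demonstration in a temporary directory
demo_dir <- file.path(tempdir(), "link_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, "Monitor64.txt")))

demo_metadata <- data.frame(
  file = c("Monitor64.txt", "Monitor64.txt", "Monitor99.txt"),
  region_id = c(1, 2, 1),
  stringsAsFactors = FALSE
)

absent <- missing_monitor_files(demo_metadata, demo_dir)
absent
```

Here only `Monitor99.txt` is reported, since it was never created. Applied to the metadata as read from `metadata.csv` (before linking) and `DATA_DIR`, the same helper would be expected to return an empty vector.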
## Loading {-}
The last step before working with the data is to load it into a [behavr](behavr.html) structure. To do that, simply use the `load_dam` function, as shown below. Here, the result is stored in `dt`, but you can use any other name.
```{r}
dt <- load_dam(metadata)
summary(dt)
```
That is it, **all** our data is loaded in `dt`.
## Note on datetime {-}
### ZT0 {-}
In the circadian and sleep field, we need to align our data to a reference time of the day, typically the time when the light turns (or would turn) on (ZT0).
In `damr`, the **time part of the start_datetime is used as a circadian reference**.
For instance, if you specify, in your metadata file `2017-01-01 09:00:00`, you imply that ZT0 is at `09:00:00`.
The time is looked up in the DAM file, so it is interpreted *with the same time zone settings as the computer that recorded the data*.
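
To illustrate what choosing a ZT0 means, the sketch below converts clock times to hours since ZT0, assuming ZT0 is at `09:00:00` as in the example above. This is only the underlying arithmetic; `clock_to_seconds()` and `zt_hours()` are hypothetical helpers, and `damr` performs the alignment for you when loading.

```r
# Seconds since midnight for a clock time given as "HH:MM:SS"
clock_to_seconds <- function(x) {
  parts <- as.numeric(strsplit(x, ":", fixed = TRUE)[[1]])
  sum(parts * c(3600, 60, 1))
}

# Hours elapsed since ZT0, wrapping around midnight
zt_hours <- function(clock_time, zt0 = "09:00:00") {
  delta <- clock_to_seconds(clock_time) - clock_to_seconds(zt0)
  (delta %% 86400) / 3600
}

zt_hours("09:00:00")  # 0
zt_hours("21:00:00")  # 12
zt_hours("03:30:00")  # 18.5
```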
### Start and stop time {-}
When fetching some data, date and time are **always inclusive**.
When only the date is specified:
* start time will be at `00:00:00`
* stop time will be at `23:59:59`
For instance, `start_datetime = 2017-01-01` and `stop_datetime = 2017-01-01` retrieves all the data from the first of January 2017.
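
The rule above can be written out as a small base R sketch. `expand_bound()` is a hypothetical helper that mimics the described behaviour on character strings; it is not the parsing code used by `damr`.

```r
# Expand a date-only value to the inclusive bounds described above.
# Values that already contain a time are left unchanged.
expand_bound <- function(x, type = c("start", "stop")) {
  type <- match.arg(type)
  date_only <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  suffix <- if (type == "start") "00:00:00" else "23:59:59"
  ifelse(date_only, paste(x, suffix), x)
}

expand_bound("2017-01-01", "start")           # "2017-01-01 00:00:00"
expand_bound("2017-01-01", "stop")            # "2017-01-01 23:59:59"
expand_bound("2017-01-01 09:00:00", "start")  # "2017-01-01 09:00:00"
```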
## Quality control {-}
### Detecting anomalies {-}
Immediately after loading your data, it is a good idea to visualise it, in order to detect anomalies or at least to be sure that everything looks OK.
We can use `ggetho` for that. For example, the following code creates an activity tile plot, which is useful to detect dead animals.
```{r, fig.width = 9, fig.height=16}
library(ggetho)
# Only the first replicate is shown
ggetho(dt[xmv(replicate) == 1], aes(z = activity)) +
  stat_tile_etho() +
  stat_ld_annotations()
```
Here, instead of plotting everything, I show how you can subset data according to metadata in order to display only replicate one (`dt[xmv(replicate) == 1]`). In practice, you could also plot everything.
You can do a lot more with `ggetho` (see the [visualisation chapter](ggetho.html)).
What does this tile plot tell us?
Each row is an animal (and is labelled with its corresponding id).
Each column is a 30 min window.
The colour intensity indicates the activity.
There are two things that we can immediately notice:
* For most animals, the activity is rhythmic and synchronised with the light phase transitions.
* Some animals are dead or missing. For instance, take a look at channel 26 in `Monitor64.txt`.
In other chapters, we will learn how to group individuals, visualise and compute statistics.
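
As a complement to the visual check, total activity per animal can be tabulated to flag individuals that never moved. The sketch below uses a made-up toy table in base R rather than the tutorial data; the `id` and `activity` column names mirror the ones used above.

```r
# Toy activity counts: three animals, 48 half-hour bins each (made-up values).
set.seed(1)
toy <- data.frame(
  id = rep(c("fly_1", "fly_2", "fly_3"), each = 48),
  activity = c(rpois(48, 5), rep(0, 48), rpois(48, 3)),
  stringsAsFactors = FALSE
)

# Total activity per animal; animals that never move deserve a closer look.
total_activity <- tapply(toy$activity, toy$id, sum)
suspects <- names(total_activity)[total_activity == 0]
suspects
```

This flags only `fly_2`, the animal with no counts at all. An animal that died halfway through the experiment still has a non-zero total, so a summary like this does not replace looking at the tile plot. Since a `behavr` table is built on `data.table`, something like `dt[, .(total = sum(activity)), by = id]` should give the per-animal totals for the loaded data.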
### How to exclude animals? {-}
We suggest excluding animals *a priori* (e.g. because they died) by recording them as dead **in the metadata**. This way, the data is not modified or omitted and can easily be recovered if needed.
For instance, you can add a column `status` in your metadata file and put a default value such as `"OK"`.
If an animal is to be removed, you can replace `"OK"` by **a reason** (e.g. `"dead"`, `"escaped"`, ...).
Then, you can load your data without those animals: `load_dam(metadata[status == "OK"], ...)`.
This practice has the advantage of making it **very transparent** why some individuals were excluded.
Also, as stated before, it can easily be reversed.
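
A minimal sketch of this bookkeeping, on a made-up four-animal metadata table in base R (the channel numbers and exclusion reasons are hypothetical):

```r
toy_metadata <- data.frame(
  file = "Monitor64.txt",
  region_id = 1:4,
  stringsAsFactors = FALSE
)

# Default: every animal is kept
toy_metadata$status <- "OK"

# Record the reason for each exclusion (hypothetical observations)
toy_metadata$status[toy_metadata$region_id == 2] <- "dead"
toy_metadata$status[toy_metadata$region_id == 4] <- "escaped"

# Keep only the animals that are still "OK"
kept <- toy_metadata[toy_metadata$status == "OK", ]
kept$region_id

# How many animals were excluded, and why
table(toy_metadata$status)
```

Channels 1 and 3 are kept. With metadata read by `fread`, the subsetting step is the shorter `metadata[status == "OK"]`. In practice, the `status` column would normally be edited directly in the metadata file, so that the reasons are stored alongside the experiment.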
## Apply functions when loading {-}
Finally, we may want to apply a function to the data as it is loaded, in order to preprocess it and save time. This preprocessing will annotate the data, i.e. create new information (new columns) based on the original data. As an example, we can perform sleep annotation (sleep being defined as bouts of immobility of 5 min or more) with a function from our `sleepr` package (which you should already have installed).
```{r}
library(sleepr)
dt <- load_dam(metadata, FUN = sleepr::sleep_dam_annotation)
dt
```
```{r, echo=FALSE, eval=FALSE}
## to save the data for next tuto
# dt <- dt[xmv(replicate) == 1]
# rm(metadata)
# rm(pl)
# save(dt, file="/home/quentin/comput/rethomics/rethomics.github.io/material/sleep_dam.RData")
# load(file="/home/quentin/comput/rethomics/rethomics.github.io/material/sleep_dam.RData")
```
As you can see, we now have additional columns in the data.
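
To give an intuition for the 5 min immobility rule mentioned above, here is a toy base R sketch using run-length encoding. It assumes one activity count per minute and made-up values; it is not the implementation used by `sleepr`.

```r
# One activity count per minute for a single animal (made-up values).
activity <- c(2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 3, 0, 0)

moving <- activity > 0
runs <- rle(moving)

# A run counts as sleep if the animal is immobile for at least 5 consecutive minutes.
is_sleep_run <- !runs$values & runs$lengths >= 5
asleep <- rep(is_sleep_run, times = runs$lengths)

which(asleep)  # minutes 6 to 11
sum(asleep)    # 6
```

The three-minute and two-minute pauses are too short to count as sleep; only the six-minute bout of immobility is annotated.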
## Next steps {-}
* [Visualise data with `ggetho`](ggetho.html)
* [Sleep analysis with `sleepr`](sleepr.html)
* [Circadian analysis with `zeitgebr`](zeitgebr.html)