-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path08-sleepr.Rmd
404 lines (303 loc) · 14.7 KB
/
08-sleepr.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
# Sleep analysis {#sleepr -}
<!-- **TODO the perfect one liner** -->
<!-- --------------------------- -->
<!--  -->
## Aims {-}
In this practical chapter, we will use a real experiment to learn how to:
* Annotate a [behavr table](behavr.html) with **sleep state**
* Use [ggetho](ggetho.html) to display individual and population sleep amounts
* Compute **average sleep** within a time window
* Perform standard statistics on average sleep
* Analyse sleep **architecture**, sleep **latency**,...
## Prerequisites {-}
* You have read about [behavr tables](behavr.html)
* You are familiar with [ggetho](ggetho.html), our vidualisation tool
* You have already read the [damr tutorial](damr.html)
* Ensure you have [installed](intro.html#installing-rethomics-packages)
`behavr`, `damr` and `ggetho` packages:
## Background{-}
This tutorial focused on sleep in *Drosophila*.
Traditionally, activity is first scored (e.g. through beam crosses/video tracking).
Then any bouts of inactivity longer than five minutes count as sleep.
You can easily adapt this tutorial to scoring other models/behaviours as long as you can define **two discrete states** (e.g. sleep vs asleep, moving vs immobile, left vs right, ...).
In the [DAM tutorial](damr.html), we have learnt how to load data from a real DAM expriment.
Since we already described it in length, it makes sense to use this experiment as an example for our sleep analysis.
**I will assume that you have already read and understood the [DAM tutorial](damr.html)**.
The last thing we did then was loading data and scoring sleep:
```{r, eval=FALSE}
library(sleepr)
dt <- load_dam(metadata, FUN = sleepr::sleep_dam_annotation)
dt
```
## Getting the data {-}
Instead of going through the whole `damr` tutorial again, I thought I would put the resulting data table online.
Importantly, for simplicity, **I have just kept replicate 1**.
We just need to download it and load it:
```{r}
library(sleepr)
library(ggetho)
URL <- "https://github.com/rethomics/rethomics.github.io/raw/source/material/sleep_dam.RData"
load(url(URL))
summary(dt)
```
## Data curation {-}
First of all, lets visualise all our sleep data.
It is important to **pay critical attention to this graph** in order to assess if anything has gone wrong:
```{r, fig.width = 9, fig.height=12}
ggetho(dt, aes(z=asleep)) +
stat_ld_annotations(height = 1)+
stat_tile_etho()
```
### Dead animals {-}
Some animals may have died during the experiment, and could be wrongly scored as asleep for very long durations.
`sleepr` has an utility function to remove data from dead animals:
```{r}
# we give our curated data another name so we can see the difference
dt_curated <- curate_dead_animals(dt)
summary(dt_curated)
```
As you can see, we now have `r nrow(dt_curated[meta=T])` individuals vs `r nrow(dt[meta=T])` in the original data.
To see which animals have been removed, we could run something like:
```{r}
setdiff(dt[, id, meta=T],
dt_curated[, id, meta=T])
```
Indeed, from the tile plot nothing seem to have happened in this channel.
Now let us look at the data after curation:
```{r, fig.width = 9, fig.height=12}
ggetho(dt_curated, aes(z=asleep)) +
stat_ld_annotations(ypos = "top")+
stat_tile_etho()
```
### Animals that died too early {-}
In addition, we could want to, for instance, **remove animals that did not live say longer than 2 days**.
Of course, you need to have a good reason to exclude some animals, and that depends on your specific experiment (this is just showing you how to do it).
To remove animals that did not live longer that 2 days, we use the power of `behavr` tables:
```{r}
# we make a summary table of all lifespan for each animals
lifespan_dt <- dt_curated[, .(lifespan = max(t)), by=id]
# we filter this table for lifespan>2 and we keep the id
valid_ids <- lifespan_dt[lifespan > days(2), id]
# we apply this filter
dt_curated <- dt_curated[id %in% valid_ids]
summary(dt_curated)
```
### Trimming {-}
Generally, we want to remove a point according the experimental time.
For instance, lets say we would like to keep only the first 60 hours of data (i.e. 2.5 days)
```{r}
dt_curated <- dt_curated[t %between% c(days(0), days(2.5))]
summary(dt_curated)
```
Which means we are only considering data **between** 0 and 2.5 days.
The same principle can be used to remove the beginning of an experiment. For instance, when animals are acclimatising to their new environment.
## Population plots{-}
Now that we have curated our data, we can start looking at the biology.
First, we make a global population plot:
```{r}
ggetho(dt_curated, aes(y=asleep, colour=sex)) +
stat_pop_etho() +
stat_ld_annotations() +
facet_grid(genotype ~ .)
```
The y axis shows the proportion of time sent sleeping, averaged for each animal within a 30min (default) time window.
Then, we can wrap (average) that over one day. We also polish the y axis label:
```{r}
ggetho(dt_curated, aes(y=asleep, colour=sex), time_wrap = hours(24)) +
stat_pop_etho() +
stat_ld_annotations() +
facet_grid(genotype ~ .) +
scale_y_continuous(name= "Fraction of time sleeping",labels = scales::percent)
```
That gives us a good understanding of what happens at the population level.
## Summarise data per animal{-}
Most likely, we want to summarise sleep amount so that we have **one number per animal**.
For instance, we can compute the overall average proportion of time spent sleeping:
```{r}
summary_dt <-
rejoin(dt_curated[,
.(
# this is where the computation happens
sleep_fraction = mean(asleep)
),
by=id])
summary_dt
```
With `rejoin`, we have put our summary and metadata together, which is suitable for standars graphics/statictics.
For instance, if we are interested in the effect of sleep and genotype on sleep amount, we can make a faceted boxplot, and also add individual points to show all data.
```{r}
ggplot(summary_dt, aes(x=sex, y=sleep_fraction, fill=sex)) +
geom_boxplot(outlier.colour = NA) +
geom_jitter(alpha=.5) +
facet_grid( genotype ~ .) +
scale_y_continuous(name= "Fraction of time sleeping",labels = scales::percent)
```
## Day sleep☼ -- Night sleep☾{-}
Often, we want to compare amount of sleep during the day vs night as they are different processes.
### Adding some phase information {-}
The simplest way to do that is to start by adding some phase information to our data.
* **L phase** (light) should be any point **between ZT0 and ZT12** -- [0,12), [24,36), ...
* **D phase** (dark) should be any point **between ZT12 and ZT24** -- [12,24), [36,48), ...
Numerically, this can be done very simply using a [modulo operation](https://en.wikipedia.org/wiki/Modulo_operation) on time. In `R`, modulo is `%%`.
The following line creates a new variable in dt. This variable is:
* `"L"` when the remainder of the division of (the corresponding) `t` by 24h is lower than 12h
* `"D"` otherwise
```{r}
dt_curated[, phase := ifelse(t %% hours(24) < hours(12), "L", "D")]
```
Since we have this column, we can make an improved summary (pay special attention to the last columns):
```{r}
summary_dt <-
rejoin(dt_curated[,
.(
# this is where the computation happens
sleep_fraction_all = mean(asleep),
sleep_fraction_l = mean(asleep[phase == "L"]),
sleep_fraction_d = mean(asleep[phase == "D"])
),
,by=id])
summary_dt
```
Now, we have three new variables: `sleep_fraction_all`, `sleep_fraction_l` and `sleep_fraction_d`.
We can just replace the y axis with our variable of interest (e.g. sleep in D phase):
```{r}
ggplot(summary_dt, aes(x=sex, y=sleep_fraction_d, fill=sex)) +
geom_boxplot(outlier.colour = NA) +
geom_jitter(alpha=.5) +
facet_grid(genotype ~ .) +
scale_y_continuous(name= "Fraction of time sleeping",labels = scales::percent)
```
If we wanted a plot with all three values, we could ["melt"](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html) our data,
picking all columns starting with `"sleep_fraction_"` as "measurment variables":
```{r}
summary_dt_melted <- melt(summary_dt, measure.vars = patterns("sleep_fraction_"),
variable.name = "phase", value.name = "sleep_fraction")
```
Now, instead of three columns for the three variables, we have two columns, one for the actual value and one to describe the phase (all vs L vs D).
This makes it convenient to use with ggplot:
```{r}
ggplot(summary_dt_melted, aes(x=phase, y=sleep_fraction, fill=sex)) +
geom_boxplot(outlier.colour = NA) +
geom_jitter(alpha=.5) +
facet_grid(genotype ~ .) +
scale_y_continuous(name= "Fraction of time sleeping",labels = scales::percent)
```
## Statistics {-}
Often, you want to go further than representing the data, and compute statistics.
`R` was designed primarilly as statistical progamming language.
As a result, a tremendous variety of simple and elaborate statics are implemented.
This section will not go in the details of what you can do in terms of stats, many authors have already published fantastic resources on this subject.
Instead, we present very simple examples of what can be done.
At this stage, what you do depends very much on your question, your knowledge of statistics and how much effort you
want to invest.
### Pairwise Wilcoxon tests {-}
Say we wanted to compute, for **females only**, all pairwise tests between all genotype groups (A vs B, B vs C and C vs A). This could be formulated as:
```{r}
pairwise.wilcox.test(summary_dt[sex=="F", sleep_fraction_all],
summary_dt[sex=="F", genotype])
```
We get a matrix showing us all p-values.
You could do that also within males, as long as we replace `sex == "F"` by `sex == "M"`
### Two way anova {-}
If we are interested in the effect of **sex AND genotype**, as well as **their interaction**,
we can model our response variable with a formula: `sleep_fraction_all ~ sex * genotype`:
```{r}
model <- aov(sleep_fraction_all ~ sex * genotype, summary_dt)
summary(model)
```
This shows a strong affect of sex, genotype and their interaction on sleep amount.
There are several way to follow up.
For instance see this [short tutorial](https://www.r-bloggers.com/two-way-analysis-of-variance-anova/).
## Sleep architecture {-}
Proportion alone is not always a sufficent measure to fully descibe the dynamics of sleep.
One way to go further is to study sleep as a series of bouts.
### Bout analysis{-}
The function `bout_analysis()`, in `sleepr` is designed for that.
We would use it like that:
```{r}
bout_dt <- bout_analysis(asleep, dt_curated)
```
The result is a new behavr table, with a few differences compared to the ones we used before:
* Each row in the data describes a bout
* The bout can take the values `asleep=TRUE` or `asleep=FALSE` (sleep bout or wake bout, respectively)
* `t` is the onset of the bout (in seconds)
* `duration` is length of the bout (in seconds)
Note that you can use this function to study bouts of other discrete behaviours.
For now, we are only interested in **sleep bout**, so we filter for `asleep == TRUE`.
We also remove the, now redundant, `asleep` column:
```{r}
bout_dt <- bout_dt[asleep == TRUE, -"asleep"]
```
### Bout length vs time of the day {-}
We can use `ggetho` to show how the average bout length depends on the time of the onset of the bout.
```{r}
ggetho(bout_dt, aes(y=duration / 60, colour=sex), time_wrap = hours(24)) +
stat_pop_etho() +
facet_grid(genotype ~ .) +
scale_y_continuous(name= "Bout length (min)")
```
Note that this is a bit noisy as we only have a few animals per combination of treatment.
### Architecture description {-}
One can count the total number of bouts and average bout duration for each individual like so:
```{r}
bout_dt[,
.(n_bouts = .N,
mean_bout_length = mean(duration)),
by=id]
```
You could apply the approach presented before to compute statistics according the the phase (night vs day bouts).
### Latency to sleep {-}
The latency describes how long it takes for an animal to initiate its **first sleep bout**.
Some researchers are also interested in the latency to the **longest bout**.
In this example, lets say we focus on the second day (and not the night -- 24 to 36 hours).
```{r}
bout_dt_second_day <- bout_dt[t %between% c(days(1), days(1) + hours(12))]
# We express t relatively to the first day
bout_dt_second_day[, t:= t - days(1)]
bout_summary <- bout_dt_second_day[,.(
latency = t[1], # the first bout is at t[1]
first_bout_length = duration[1],
latency_to_longest_bout = t[which.max(duration)],
length_longest_bout = max(duration),
n_bouts = .N,
mean_bout_length = mean(duration)
),
by=id]
bout_summary
```
For good measure, I also added number of bouts and average bout length as we have seen before.
You can, of course, use these results to plot things like the relationship between bout length and bout number:
```{r achitecture}
ggplot(rejoin(bout_summary), aes(n_bouts, mean_bout_length, colour=sex)) +
geom_point() +
facet_grid(genotype ~ .) +
scale_x_continuous(name="Number of bouts") +
scale_y_continuous(name="Average bout duration (s)")
```
Always **be critical** about what you do.
For instance, what whould be the latency to sleep of an animal that, in the period of observation, does not sleep?
## Merging all statistics {-}
Earlier, we made a `summary_dt` in which we computed some statitics such as sleep fraction in L and D phase.
In addition, we have now a `bout_summary` where we have other variables.
These data have both one row per animal.
Ideally, we could **"merge" them into a single table** that has all the individual statistics.
This way we can study the relationship, say, between sleep amount and latency.
In order to do that, we perform a so called "join":
```{r}
overall_summary <- summary_dt[bout_summary]
overall_summary
```
```{r}
ggplot(overall_summary, aes(latency / 60, sleep_fraction_l, colour=sex)) +
geom_point() +
geom_smooth(method="lm", alpha=.1)+
facet_grid(genotype ~ .)
```
## Take home message{-}
Data analysis and visualisation is about **translating your biological questions to another language**.
Problems in emerging areas of science can be very rich so they should be matched with the equally rich grammar that only a programming language can provide.
This tutorial was very simple and does not pretend to provied a canonical sleep analysis.
Instead, see it as a set of building blocks that you can use and rearrange to address your own questions.
## Next steps {-}
* [Visualise data with `ggetho`](ggetho.html)
* [Circadian analysis with `zeitgebr`](zeitgebr.html)