-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path09-zeitgebr.Rmd
274 lines (198 loc) · 10.4 KB
/
09-zeitgebr.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
# Circadian rhythm analysis {#zeitgebr -}
<!-- **TODO the perfect one liner** -->
<!-- --------------------------- -->
<!--  -->
## Aims {-}
In this practical chapter, we will use a real experiment to learn how to:
* Use `zeitgebr` to compute periodograms
* Use [ggetho](ggetho.html) to draw double plotted actograms
* Average periodograms vs conditions
* Find and compare peaks
## Prerequisites {-}
* You have read about [behavr tables](behavr.html)
* You are familiar with [ggetho](ggetho.html), our visualisation tool
* You have already read the [damr tutorial](damr.html)
* Ensure you have [installed](intro.html#installing-rethomics-packages)
`behavr`, `damr`, `ggetho` and `zeitgebr` packages
## Background{-}
This tutorial focuses on circadian rhythm in *Drosophila*, but can be adapted easily to investigate periodicity in different contexts.
In it, we will use a DAM dataset that is provided (and preloaded) within the `damr` package.
## Getting the data {-}
Getting the data is a lot simpler than in the other tutorials as it is already on your computer:
```{r}
library(damr)
library(zeitgebr)
library(ggetho)
# We load the data
data(dams_sample)
# We copy the data to our own variable so we can alter it
dt <- copy(dams_sample)
```
This data set is a recording of 32 animals, in DD.
Let us print the metadata to understand the experiment a little bit:
```{r}
summary(dt)
print(dt[meta=T])
```
We can confirm we have 32 individuals.
They are described by one of three "period groups" (e.g. a genotype or treatment).
This is encoded in `period_group` metavariable can be either a "short"", "long"" or wild-type ("wt") period.
## Quality control {-}
This data is fairly clean already, so we will not do much.
For most purposes, you can apply the same principles as [data curation for sleep analysis](sleepr.html#data-curation).
### Regime changes {-}
In general (but not here), you will have a period of `LD` **"baseline"** preceding a change to a `DD` regime.
I suggest to encode that in the following way:
1. Manually add a column `baseline_days` in your metadata that defines the **number of days of baseline**.
2. **Sustract the number of baseline days** to all time points: `dt[, t := t - days(xmv(baseline_days))]`
As a result, `t = 0` now means "ZT0 of the transition day".
This makes a lot of sense as any baseline point is at a negative time (`t < 0`), whilst `t > 0` is for `DD`.
The nice thing is that you can work with data (e.g. replicates) that have different baseline duration as now **all time points are relative to the regime change**.
### Data enrichment {-}
By defaut, DAM data only has variables `t` and `activity`. The latter being the number of beam crosses over a time bin (e.g. one minute). We could define a variable `moving` that is `TRUE` when and only when `activity > 0`, and `FALSE` otherwise:
```{r}
dt[, moving := activity > 0]
```
### Overview plots {-}
The first representation we show could be the activity of all animals over time in the same page.
This would help to spot outliers.
We simply do that with our **tile plot**.
Note how we have changed the LD colours to grey and black, grey being for subjective days (remember, we are in DD).
```{r, fig.width = 9, fig.height=12}
ggetho(dt, aes(z=activity)) +
stat_ld_annotations(ld_colours = c("grey", "black"))+
stat_tile_etho()
```
Note that you can substitute other variables you have in `dt` (e.g. `moving`) for `activity`.
We see that two animals (region 5 and 10) seem to have died/escaped before the end of the experiment.
There is no right thing to do regarding dead animals. You could keep the data before death, or remove them altogether.
**It is eventually your responsibility to decide what to do with dead animals**.
Here, I simply remove data after death:
```{r}
library(sleepr)
dt_curated <- curate_dead_animals(dt)
summary(dt_curated)
ggetho(dt_curated, aes(z=activity)) +
stat_ld_annotations(ld_colours = c("grey", "black"))+
stat_tile_etho()
```
Our curated data is now stored in `dt_curated`.
## Double plotted actograms {-}
At the moment, the `id` is a very long string (e.g. `"2017-01-16 08:00:00|dams_sample.txt|01"`). It has the advantage to be unambiguous, but it is difficult to plot.
To help plotting, we can make a new variable that is simply **a different number for each individual**. Lets call it `uid`:
```{r}
dt_curated[, uid := 1 : .N, meta=T]
# We can map uid to id
dt_curated[, .(id, uid) ,meta=T]
```
As you see, we do keep `id` as a reference but `uid` for convenience in graphs.
To make a double plotted actogram, we use `ggetho`.
Read more about that in the [visualisation tutorial](ggetho.html#double-plotted-actograms).
Briefly, we set `multiplot` to `2` (`3` would be a triple plotted) one.
The variable of interest is on the `z` axis.
Note that instead of `moving`, you could plot raw `activity`.
Then, we use bar height to show the amount of movement (we could use `stat_tile_etho` which shows the variable of interest as a colour with pixel intensity).
Lastly, we split the graph by `uid`, so that each facet is a single animal.
```{r, fig.width = 12, fig.height=6}
ggetho(dt_curated, aes(z = moving), multiplot = 2) +
stat_bar_tile_etho() +
facet_wrap( ~ uid, ncol = 8)
```
Interestingly, you can use formula in facet to show or sort with other metavariables:
```{r, fig.width = 12, fig.height=6}
ggetho(dt_curated, aes(z=moving), multiplot = 2) +
stat_bar_tile_etho() +
facet_wrap( ~ period_group + uid, ncol=8, labeller = label_wrap_gen(multi_line=FALSE))
```
Now, we know which genotype matches each `uid`.
In your own work, you could generalise this concept to display more metavariables.
For instance, if you had a `sex` metavariable you could do: `facet_wrap( ~ period_group + sex + uid, ...)`.
## Periodograms {-}
### Computation {-}
An important part of circadian research is to compute the periodicity of the free running clock of multiple individuals.
Mathematically, we would like to build a preiodogram. That is, generally speaking, a representation of the density (i.e. **power**) of a signal at different **periods** (or frequencies).
In addition, a periodogram associates to each pair of power-period a **significance level**.
There are many algorithms to compute periodograms.
In `zeitgebr`, we have so far implemented:
* `ac_periodogram` -- An *autocorrelation* based method
* `ls_periodogram` -- *Lomb-Scargle* algorithm
* `chi_sq_periodogram` -- A *$\chi{}^2$* based one
See `?periodogram_methods` for references.
In order to compute periodograms, we use the `periodogram()` function.
We need to define which variable we want to study (e.g. `moving` or `activity`).
Then, we provide our data.
The methods described above can be passed as an argument.
For instance, if we want to analyse, with the $\chi{}^2$ method, activity:
```{r}
per_xsq_dt <- periodogram(activity,
dt_curated,
FUN = chi_sq_periodogram)
per_xsq_dt
```
The result is another `behavr` table, with the same metadata.
The data however, is now a list of power vs period (and significance).
Have a look at the other options (see `?periodogram`).
### Peaks finding {-}
Often, we want to know which are the peak periods in a periodogram.
This can be achieved thanks to the `find_peaks` function.
By default, it finds a maximum of three peaks, which are sorted by their power (relative to the significance thershold).
Peaks that are not signifant are not accounted for (see the `alpha` argument).
```{r}
per_xsq_dt <- find_peaks(per_xsq_dt)
per_xsq_dt
```
This function annotates our data by adding a column named `"peak"`.
Whenever the row corresponds to a peak, it puts a number and `NA` otherwise.
The number is the rank of the peak (`1` being the first/tallest one).
### Visualisation{-}
In `ggetho`, the function `ggperio()` is designed specifically for displaying periodograms.
One could plot all the periodograms like so:
```{r}
ggperio(per_xsq_dt) + geom_line(aes(group = id, colour=period_group))
```
But it is very hard to read, so we will **facet per `uid`**.
In addition, we can use a special geometry (`geom_peak`) to show the values of the peak.
We draw a signifiance line as well, just using the `signif_threshold` variable:
```{r, fig.width = 12, fig.height=6}
ggperio(per_xsq_dt) +
geom_line(aes(group = id, colour = period_group)) +
geom_peak(col = "black") +
geom_line(aes(y = signif_threshold)) +
facet_wrap(~ uid, ncol = 8)
```
Instead of drawing only the first peak, you could draw, for instance, the first and second (`geom_peak(peak_rank = 1:2)`) or, only the second (`geom_peak(peak_rank = 2)`).
An interesting thing to do is a population average periodogram.
In this graph, the solid lines are the average power per group, whilst the shaded areas are ± standard error:
```{r}
ggperio(per_xsq_dt, aes(
y = power - signif_threshold,
colour=period_group)) +
stat_pop_etho()
```
### Exctract the peak values {-}
At some point, you would like to **summarise** each animal by, say, it first peak.
Then, you can look at whether there are significant differences in peak periodicity vs genotype.
Doing that is quite straightforward.
First, we **select only rows where peak is exactly 1**, then [rejoin](https://rethomics.github.io/behavr.html#operating-on-behavr-tables) our table to its metadata:
```{r}
summary_dt <- rejoin(per_xsq_dt[peak==1])
summary_dt
```
`summary_dt` can be used as a standard `R` dataframe for further analysis.
For instance, with `ggplot`, I make a **boxplot showing the distribution of periods**, in h, for each groups.
I also add **a point for each animal** (so we see outliers).
Then, I make the **size of the points proportional to the relative power of the peak discovered**, so we get an idea of how much to "trust" this point.
```{r}
ggplot(summary_dt, aes(period_group, period, fill= period_group)) +
geom_boxplot(outlier.colour = NA) +
geom_jitter(aes(size=power - signif_threshold), alpha=.5) +
scale_y_hours(name = "Period")
```
Another direction could be to perform pairwise wilcoxon tests between groups:
```{r}
pairwise.wilcox.test(summary_dt$period, summary_dt$period_group )
```
This tells us that there is a statistically significant difference between each pair of groups.
## Next steps {-}
* [Visualise data with `ggetho`](ggetho.html)
* [Sleep analysis `sleepr`](sleepr.html)