-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path07-ggetho.Rmd
590 lines (424 loc) · 19.4 KB
/
07-ggetho.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
# Visualisation with ggetho{#ggetho -}
<!-- **Make the most of ggetho** -->
<!-- --------------------------- -->
<!-- <!--  -->
<!-- **TODO, a demo of some plots** -->
## Aims {-}
In this practical chapter, we will generate toy data to learn how to:
* Express a question as a relationship beween variables
* Use tile plots to show individual data
* Make population plots
* Wrap data around circadian time
* Make double-plotted actograms
* Annotate plot with light and dark phases
* Use ggplot tools (facets, scales) to enhance plots
* Plot average and individual periodograms
## Prerequisites {-}
* Some familiarity with [ggplot](http://ggplot2.org/)
* Ensure you have [installed](intro.html#installing-rethomics-packages)
`behavr`and `ggetho` packages:
```{r, eval=FALSE}
install.packages("ggetho")
```
## Lessons from ggplot{-}
In the previous tutorials, we have used `ggetho` to visualise out behavioural data.
This section will explain further how this package can be used to produce flexible plots and how it integrates with `ggplot2`.
[ggplot2](http://ggplot2.org/) is one of the most popular visualisation tools and an unavoidable `R` package.
It implements the powerful concepts of the ["Grammar of graphics"](https://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html).
The package `ggetho`, which we discuss here, extends `ggplot` for the specific case of behavioural analysis.
At this stage, you really want to have some familiarity with `ggplot2` so you understand its logic.
You will find a fair numbers of [videos](https://www.youtube.com/watch?v=TaxJwC_MP9Q) and [books](http://ggplot2.org/book/) online.
## Some behavioural data{-}
In this section, we will **simulate toy behavioural data**.
For that, we start by making some arbitrary metadata.
Here, we have 40 animals, condition "A" vs "B", and sex, male ("M") or female ("F").
```{r}
library(ggetho)
metadata <- data.table(id=sprintf("toy_experiment|%02d" , 1:40), region_id=1:40,
condition=c("A","B"),
sex=c("M","M", "F", "F"))
head(metadata)
dt <- toy_activity_data(metadata, seed=107)
```
Now, we have a [behavr](behavr.html) object, `dt`:
```{r}
summary(dt)
```
This data is stored in a [behavr table](behavr.html).
It has a column `moving` that that tells us whether an the animal `id` is moving at a time `t`.
## The `ggetho()` function{-}
`ggetho()` is the core function.
It expresses the **relationship between variables**.
In this respect, it works very much like `ggplot()`, but it also pre-processes the data.
It is important to understand the **difference between `ggplot()` and `ggetho()`**.
`ggplot()` works with data frames (or data tables), and does not preprocess the data.
`ggetho()` is only a layer on top of `ggplot()`.
It works **exclusively with `behavr` tables** and does preprocess data before calling `ggplot()`.
`ggetho()` does return `ggplot` a object, therefore, layers available in `ggplot2` can be used natively on top of `ggetho`.
Let's work with an example. Say, we would like:
* The proportion of time spent **moving, on the y axis**
* Versus **time, on the x axis**
We could write:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving))
pl
```
This generates an **empty plot** this is normal because we have, so far, **no layer**.
We will see some layers very soon!
The role of `ggetho` is to express a relationship between variables and to **compute a summary**, over a certain time window, of a variable of interest **for each individual**.
Importantly, **you decide which variable you want to plot**.
For instance, you could be interested in things like the number (sum) of beam crosses or the average position.
## Tile plots{-}
### Per individual{-}
One of the most interesting layers is `stat_tile_etho`.
It shows the **variable of interest in the (colour) z axis**.
The y axis is discrete (generally the id), that is **one row per individual**.
The x axis is time (by default, summerised, by `ggetho`, over 30 minutes).
So, if we want to show the proportion of time spent moving over time for each individual (id):
```{r}
pl <- ggetho(dt, aes(x=t, y=id, z=moving)) + stat_tile_etho()
pl
```
By defaut, each pixel is the **mean** (`summary_FUN = mean`, in `ggetho`), over 30 min (`summary_time_window = mins(30)`, in `ggetho()`).
Also, note that the default is `x=t` and `y=id`, so we could just obtain exactly the same with
`ggetho(dt, aes(z=moving)) + stat_tile_etho()`.
### Sorted individual{-}
Sometimes, we want to sort individuals based on a metavariable (discrete or continuous).
For instance let us compute the overall average fraction of time spent moving,
**add it to the metadata**, to then sort individuals from low to high movers:
First, we add a new metavariable (`mean_moving`):
```{r}
# the average time spent moving per 1000 (rounded)
mean_mov_dt <- dt[, .(mean_moving = round(mean(moving) * 1000)), by=id]
# join curent meta and the summary table
new_meta <- dt[mean_mov_dt, meta=T]
# set new metadata
setmeta(dt, new_meta)
head(dt[meta=T])
```
Proportion of time moving (`mean(moving)`) will be a number between zero and one, with sometimes many decimals,
which makes it hard to read.
A more intuitive way to express it is as a rounded per mille (‰).
I prefer it to per cent as we have more resolution (i.e. less ties)
For instance, `0.0113333` would be expressed as `11`.
Now, we can express a new relationship where we show the *interaction* between our custom variable and id, on the y axis:
```{r}
pl <- ggetho(dt, aes(x=t, y=interaction(id, mean_moving, sep = " : "), z=moving)) +
stat_tile_etho()
pl
```
Since we use `" : "` as a separator, we have, on the y axis, names as `<id> : <mean_sleep>`.
You can extend this concept to sort also by *males vs females*:
```{r}
pl <- ggetho(dt, aes(x=t, y=interaction(id, mean_moving, sex, sep = " : "), z=moving)) +
stat_tile_etho()
pl
```
### Group averages{-}
Sometimes, we also want to aggregate individuals per group.
For instance, males **average** vs females **average**.
This can be done by changing the `y` axis. Previously, we used `id`, which made one row per individual.
Instead, if we use a grouping variable like `sex`, we will plot one row per value of `sex` (i.e. two rows, one for males, one for females). In other words, we replace `id` by `sex` on the y axis:
```{r}
pl <- ggetho(dt, aes(x=t, y=sex, z=moving)) + stat_tile_etho()
pl
```
In this context, **every row is not an individual any more, but a population**.
The `method` argument of `stat_tile_etho()` allows you to use other aggregates (median, max, min, ...).
### Bar tiles {-}
The bar_tile is a variant of our tile plot.
Instead of colour intensity, **it shows our z variable by the height of the tiles**.
You can use it just by replacing `stat_tile_etho` by `stat_bar_tile_etho`:
```{r}
pl <- ggetho(dt, aes(x=t, z=moving)) + stat_bar_tile_etho()
pl
```
## Population plots{-}
### One population{-}
The problem with representing a variable on a colour axis is that it is not perceptually comparable, and we cannot make error bars.
When the number of groups is not too high, it makes sense to show **the variable of interest on the y axis**, and then draw lines between consecutive points.
For this, we can use the `stat_pop_etho()` function:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) + stat_pop_etho()
pl
```
By defaut, the local average and error bars are computed from the mean as standard errors (`method = mean_se`).
You can compute other types of error bars e.g. bootstrap (`method = mean_cl_boot`).
### Several populations{-}
Often, we want to compare population with respect to a variable.
There are different ways to split populations. We can, for instance, **use a different colour line for different groups**:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving, colour=sex)) + stat_pop_etho()
pl
```
Another way, is to use `ggplot`'s faceting system:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) + stat_pop_etho() +
facet_grid(sex ~ .)
pl
```
Of course, you can combine both when you have more than one relevant metavariable:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving, colour = sex)) +
stat_pop_etho() +
facet_grid( condition ~ .)
pl
```
## Wrapping data{-}
When behaviours are periodic, we sometimes want to average our variable at the same time over consecutive days.
In ggetho, we call that time *wrapping*.
It can be done simply with the `time_wrap` argument.
It will work the same for population or tile plots:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving), time_wrap = hours(24)) + stat_pop_etho()
pl
```
Note that you do not have to wrap specifically over 24h, you could work over different periods.
If you are interested in events that happen between the end and the start of the wrapping period (e.g. at ZT24).
You may want to wrap time with an **"offset"**. That is a phase shift. For instance, if we want to have ZT06 in the middle of our graph, we use an offset of +6h:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving),
time_wrap = hours(24),
time_offset = hours(6)) + stat_pop_etho()
pl
```
As you can see, it gives us a nice visualisation of the "activity peaks".
## Double-plotted actograms {-}
When analysing periodic behaviour, it makes sense to use a so called double-plotted actogram.
This is very useful **to understand periodicity of behaviours**.
This means data is plotted twice, in a staggered manner:
```
row1 [day 1, day2]
row1 [day 2, day3]
row1 [day 3, day4]
```
To do that, we can set the `multiplot` argument of `ggetho` to `2` (`3` would do a "tripple-plotted" actogram).
This averages the whole population:
```{r}
pl <- ggetho(dt, aes(x=t, z=moving), multiplot = 2) + stat_bar_tile_etho()
pl
```
In practice, we genrally want to do that for **one specific individual** (see next section to do that automatically):
```{r}
pl <- ggetho(dt[id=="toy_experiment|01"],
aes(x=t, z=moving), multiplot = 2) + stat_bar_tile_etho()
pl
```
One thing you can do is change the length of the period.
For instance **25h instead of 24h**:
```{r}
pl <- ggetho(dt[id=="toy_experiment|01"], aes(x=t, z=moving),
multiplot = 2,
multiplot_period = hours(25) # this is the important part
) +
stat_bar_tile_etho()
pl
```
Keep in mind that you can use the **tile representation if you prefer** it:
```{r}
pl <- ggetho(dt[id=="toy_experiment|01"], aes(x=t, z=moving),
multiplot = 2
) +
stat_tile_etho() # tile here
pl
```
## Faceting by ID {-}
When multiplotting, it is difficult to represent individuals (since both y and x axis are used).
### Default {-}
The best way to systematically represent all of them is to use facetting, which is a ggplot feature.
Since id represents unique individuals, **each facet (sub-rectangle) is one individual**:
```{r, fig.width=10, fig.height=6}
pl <- ggetho(dt, aes(x=t, z=moving),
multiplot = 2
) +
stat_bar_tile_etho() +
facet_wrap( ~ id)
pl
```
### Custom labeller {-}
Sometimes, the `id` variable will be very long, you can use the `id_labeller` to make things clearer:
```{r, fig.width=10, fig.height=6}
pl <- ggetho(dt, aes(x=t, z=moving),
multiplot = 2
) +
stat_bar_tile_etho() +
facet_wrap( ~ id, labeller = "id_labeller")
pl
```
### Numerical labelling {-}
Even with the trick above, ids may become unreadable when plotting many individuals.
What we can can do in this case is to use numbers instead.
First, we create a new metavariable named, for instance, `uid`.
It will be one **unique number** per individual:
```{r}
dt[, uid := 1:.N, meta=T]
print(dt[meta=T])
```
Then, we can use `uid` instead of `id`:
```{r, fig.width=10, fig.height=6}
pl <- ggetho(dt, aes(x=t, z=moving),
multiplot = 2
) +
stat_bar_tile_etho() +
facet_wrap( ~ uid)
pl
```
Since we kept both `id` and `uid` in the metadata, we can map them again each other.
The downside of using numbers, is that they are **ambiguous**.
Say, you decide to start with more data, or to remove some, then the numbers (`uid`) will not match their previous `id`.
Conversely, if you try to analyse different datasets independently, and merge results afterwards, you will end up with duplicates in `uid`. Therefore, only use this trick for readability and visualisation.
If you want to understand facets a bit more, have a look at [this tutorial](http://www.cookbook-r.com/Graphs/Facets_(ggplot2)/).
## LD annotations{-}
### Basics{-}
In circadian experiments, we often like to add annotations (black and white boxes) to show Dark and Light phases. We have another layer for that:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) + stat_pop_etho() + stat_ld_annotations()
pl
```
### Changing LD colours{-}
Sometimes you want different colours to explain, for instance, that days are "subjective"(grey).
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) + stat_pop_etho() +
stat_ld_annotations(ld_colours = c("grey", "black"))
pl
```
### LD in the background{-}
To put the annotation in the background, we can invert the order of the layers, set the height of the annotation to 1 (100%) and add some transparency (`alpha = 0.3`). We also remove the outline of the boxes:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) +
stat_ld_annotations(height=1, alpha=0.3, outline = NA) +
stat_pop_etho()
pl
```
### Phase and period{-}
Sometimes you want to show annotations with different phases and periods.
For instance, here, we shift the LD annotations 1h forward:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) +
stat_ld_annotations(phase = hours(1)) +
stat_pop_etho()
pl
```
One can also plot over a period different from 24h, say 20h days:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) +
stat_ld_annotations(period = hours(20)) +
stat_pop_etho()
pl
```
### Regime change{-}
When, you want to indicate a change in regime, say from LD to DD.
A simple way is to use multiple layers with explicit start and end points:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) +
# the default annotation layer
stat_ld_annotations() +
# on top of it, a second layer that
# starts at day 2 thoughout day 5,
# and where L colour is grey
stat_ld_annotations(x_limits = days(c(2,5)),
ld_colours = c("grey", "black" )) +
stat_pop_etho()
pl
```
## Coordinate and scales{-}
### Plot limits{-}
As `ggetho` creates regular ggplot objects, which we can extend. For instance, we can change the scales.
For instance, put the y scale as a percentage between 0 and 100:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving)) + stat_pop_etho() +
stat_ld_annotations()
pl <- pl + scale_y_continuous(limits = c(0,1),
labels = scales::percent)
pl
```
We can also use the same principle to zoom in a finished plot. E.g. between day one and day two:
```{r}
pl + coord_cartesian(xlim=c(days(1), days(2)))
```
### Time scale units{-}
By default, `ggetho` decides the unit of the time axis according to the range of the data.
Sometimes you want to override this behaviour to force time to be in a specific unit (here hours).
Using the plot above, we can add a scale:
```{r}
pl + ggetho::scale_x_hours()
```
`R` actually warns you since you are replacing the scale.
This is fine (as it is precisely what we wanted)!
### Coordinate systems{-}
Sometimes, it makes sense to use **polar coordinates** to show data around the clock:
```{r}
pl <- ggetho(dt, aes(x=t, y=moving, colour=sex), time_wrap = days(1)) +
stat_ld_annotations(height=.5,
alpha=.2,
x_limits = c(0, days(1)),
outline = NA) +
stat_pop_etho(geom = "polygon", fill=NA)
pl + coord_polar()
```
## Periodograms {-}
To draw periodogram, we can use our special fucntion `ggperio`.
The library `zeitgebr` generates periodogram as `behavr` tables. with columns for power, period ...
```{r}
library(zeitgebr)
dt[, t := ifelse(xmv(condition) == "A", t, t * 1.01)]
per_dt <- periodogram(moving, dt, FUN = chi_sq_periodogram, resample_rate = 1/mins(5))
per_dt
```
An average periodogram:
```{r}
ggperio(per_dt, aes(period, power, colour=condition)) +
stat_pop_etho()
```
Faceted by uid:
```{r}
ggperio(per_dt, aes(period, power, colour=condition)) +
geom_line() + facet_wrap( ~ uid)
```
Showing peaks:
```{r}
per_dt <-find_peaks(per_dt)
ggperio(per_dt, aes(period, power, colour=condition)) +
geom_line() +
geom_peak(colour="blue") +
facet_wrap( ~ uid)
```
## Spectrograms {-}
A "spectrogram" is a visualistion of the variation in the period spectrum over time.
The function `spectrogram()` in the `zeitgebr` library can generate a spectrogram for each individual in a `behavr` table, and return the result as a new `behavr` table with columns for power, period and time. `ggetho` makes it possible to visualise spectrograms as a heatmap, annotate them, aggregate them by condition.
Whilst periodogram represents the average periodicity of a signal over the entiere time series, sometimes, period itself changes over the course of an experiment.
Let's create a *dummy dataset* where the period one of the groups increases after 5 days:
```{r}
metadata <- data.table(id=sprintf("toy_experiment|%02d" , 1:10), region_id=1:10,
condition=c("A","B"),
sex=c("M","M", "F", "F"))
dt <- toy_activity_data(metadata, seed=107, duration = days(10))
# to emulate a time ticking 1.2 times slower after day 5 for group A only
dt[, t := ifelse(xmv(condition) == "A"& t > days(5), 1.2 * ( t - days(5)) + days(5), t)]
# the average activity over time
ggetho(dt, aes(y=moving, colour=condition)) + stat_pop_etho()
```
In this simple case, a periodogram would report poorly the overall behaviour of group A and give no information about the time of the change.
On the other hand, with a spectrogram, we should clearly see *when the shift of period happens*.
This spectrograms resulting from `spectrogram()` can be vusualised with our function, `ggspectro()`:
```{r}
spect_dt <- spectrogram(moving,
dt,
period_range = c(hours(6), hours(28)))
ggspectro(spect_dt) +
stat_tile_etho() +
scale_y_hours(name= "Period", log=T) + # log axis for period
facet_wrap(~ condition) +
stat_ld_annotations()
```
We see that at day 5, group A, but not group B, shifts its period from 12 h to approximatly 15 h.
Note that, in this example, we facet by condition, but we could also facet by `id` or, instead, average all individuals in one graph. An intesresting feature of such periodograms is that the phase information is discarded.
In other word, we can average the periodograms of multiple individuals (e.g. two populations) even if they are out of phase (e.g. if they are free running). This represents a clear advantage over [double-plotted actograms](#double-plotted-actograms), which are often only relevant on single individual.
Perdiodograms can also be wrapped in one day, for instance to study the modulations of period of an ultradian rhythm along the day (see `?ggspectro`).
## Take home message{-}
`ggetho` provides you with a new set of stats, layers and scales to represent behaviours.
In particular, `ggetho` and `ggperio` can be used to preprocess `behavr` tables in order to use a regular `ggplot2`
workflow.
## Next steps {-}
* [Sleep analysis with `sleepr`](sleepr.html)
* [Circadian analysis with `zeitgebr`](zeitgebr.html)