-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path04-behavr.Rmd
253 lines (191 loc) · 9.51 KB
/
04-behavr.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
# behavr tables{#behavr -}
**A single data structure to store data and metadata**
---------------------------

## Variables and metavariables{-}
As we have seen in the [previous section](metadata.html), metadata are crucial for proper statistical analysis of the experimental **data**. In the context of ethomics, the data are long time series of recorded **variables** such as position, orientation and number of beam crosses, for each individual.
**Variables** are different form **metavariables** in so far as the latter are made of only one value per animal.
It is easier (and less error prone) to always keep the data and metadata together.
In rethomics, in order to handle large amounts of data (together with metadata), we have designed the `behavr` package.
`behavr` tables are based on the very powerful package `data.table`, but enhanced with metadata.
A `behavr` table is, indeed, formed internally by two tables: the metadata table and the data table, both are linked by the `id` column (see figure above).
For most purposes, you can use a `behavr` table just like a `data.table`.
Therefore, do take a look at the [introduction to `data.table`](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) for further details!
**When we load any behavioural data in `rethomics`, we get a `behavr` table as a result.**
In this section, we will discuss the usual operations that you can perform on `behavr` tables.
## Operating on `behavr` tables{-}
Now that we have all our data at the same place, we want to be able to manipulate it.
In the next part of this tutorial, we will create some toy data and learn how to manipulate it.
This is where basic knowledge of [`data.table`](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) comes in handy.
The following table is an overview of operations in `behavr` tables.
`DT` represents an `behavr` table.
```{r, echo = FALSE}
section <- c("Generalities","Pure data", "","", "Pure metadata", "", "Meta & data", "", "Summarise", "", "Advanced")
ops = c("Summarise `behavr` table",
"**Create**/alter a variable",
"**Remove** a variable",
"**Select** data rows",
"Access metadata table",
"**Create**/alter metavariable",
"Use **metavariable as variable**",
"**Remove individuals** according to metavariable",
"Compute **individual statistics**",
"**Rejoin** metadata to data",
"**Stitch** experiments"
)
expr = c("`summary(DT)`",
"`DT[, new_column := some_value]`",
"`DT[, column_to_delete := NULL]`",
"`DT[criteria]`",
"`DT[meta = TRUE]`",
"`DT[, new_meta := some_value, meta=TRUE]`",
"`xmv(metavariable)`",
"`DT[criteria]`",
"`DT[, .( statistics = some_math()), by='id']`",
"`rejoin(DT)`",
"`stitch_on(DT, metavariable)`"
)
expl = c(
"How many individuals, variables, metavariables, etc? -- `summary(dt)`",
"When are animals 'very active'? -- `dt[, very_active := activity >= 2]`",
"Lets remove a variable we don't need? -- `dt[, very_active := NULL]`",
"Exclude data before the first hour -- `small_dt <- dt[t > hours(1)]`",
"Show metadata as table -- `dt[meta = TRUE]`",
"Define a new factor that is a comibiation of 'sex x condition' -- `dt[, treatment := paste(sex, condition, sep='|'), meta=T]`",
"Add 10s to all time, only for animals in condition `'A'` -- `dt[, t := ifelse(xmv(condition) == 'A', t + 10, t)]`",
"Remove all males (from data, and metadata) -- `dt_females <- dt[xmv(sex) == 'females']`",
"Compute the average activity, per animal -- `stat_dt <- dt[, .(mean_acti = mean(active)), by='id']`",
"Merge metadata and summary statistics -- `stat_dt <- rejoin(stat_dt)`",
"TODO -- TODO"
)
library(knitr)
kable(data.frame(
Section = section,
Operation = ops,
#Schematic = fig,
Expression = expr,
Example = expl),
row.names = FALSE)
```
## Playing with toy data{-}
The `behavr` package has a set of functions to make toy data. This provides us with a playgound to test functions and plots without having to get any real data.
In order to understand `behavr` object, lets create a toy one.
First, we make some dummy metadata (always needed to create a `behavr` table):
```{r}
library(behavr)
metadata <- data.table( id = paste("toy_experiment", 1:10, sep = "|"),
sex = rep(c("male", "female"), each = 5),
condition = c("A", "B") )
metadata
```
This metadata describes an hypothetical experiment with ten animals (`1:10`, five males and five females).
They are exposed to two conditions (`"A"` and `"B"`).
Then, we use `toy_dam_data()` to **simulate** (instead of linking/loading) one day of DAMS-like data for these ten animals (and two conditions):
```{r}
dt <- toy_dam_data(metadata, duration = days(1))
dt
```
As you can see, when we print `dt`, our `behavr` table, we have two sections: `METADATA` and `DATA`.
The former is actually just the metadata we created whilst the latter stores the data (i.e. the variables) for **all animals**.
The special column `id` is also known as a **key**, and is shared between both data and metadata.
It internally allows us to map them to one another.
In other words, it is a unique id for each individual.
In this specific example, the variables `t` and `activity` are the time and the number of beam crosses, respectively.
## Generalities{-}
A quick way to retreive general information about a `behavr` table is to use `summary`:
```{r}
summary(dt)
```
This tells us immediately how many variables, metavariables and data points, we have.
One can also print a detailed summary (i.e. one per animal):
```{r}
summary(dt, detailed = TRUE)
```
### Data{-}
Playing with variables is just like in `data.table`.
Read the [official data.table tutorial](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) for more functionalities.
For instance, we can add a **new variable**, `very_active`, that is `TRUE` if and only if
there was at least two beam crosses in a minute, for a given individual:
```{r}
dt[, very_active := activity >= 2]
```
If we decide we don't need this variable anymore, we can **remove it**:
```{r}
dt[, very_active := NULL]
```
Sometimes, we would like to **filter the data**. That is, we select rows according to one or several criteria.
Often we would like to exclude the very start of the experiment. For example, we can keep data after one hour:
```{r}
dt <- dt[ t > hours(1)]
```
Note that that using `dt <-` mean we make a new table that overwrite the old one (since it has the same name).
### Metadata{-}
In order to access the metadata, we can add `meta = TRUE` inside the `[]`:
```{r}
dt[meta = TRUE]
```
This way, we can also **create new metavariables**.
For instance, say you want to collapse `sex` and `condition` which both have two levels into one `treatment`, with four levels:
```{r}
dt[, treatment := paste(sex, condition, sep='|'), meta=T]
# just to show the result:
dt[meta = TRUE]
```
`paste()` is a function that links strings of characters with an arbitrary separator (`"|"` here).
New metavariables can also be added from a summary (see [Summarise data](#summarise-data)).
### Data & Metadata {-}
The strength of `behavr` tables is their ability to seamlessly **use metavariables as though they were variables**.
For the sake of the example, let's say you would like to alter the variable `t` (time) so that we add ten seconds, *only to individuals that have condition `'A'`*.
```
dt[, t := ifelse(xmv(condition) == 'A', t + 10, t)]
```
The key here is the use of `xmv` (eXpand MetaVariable), which maps `condition` back in the data.
We can also use this mechanism to **remove individuals according to the value of a metavariable**.
For instance, lets get rid of the males!
```{r}
dt <- dt[xmv(sex) == 'female']
summary(dt)
```
When individuals are removed, metadata is automatically updated.
In effect, we removed males from both data and metadata.
This operation cannot be undone, as we overwrite `dt` with a new value.
An alternative would be to save the result in a new table (e.g. `dt_females <- dt[xmv(sex) == 'female']`)
This would use some additional memory, but it is safer.
### Summarise data{-}
Thanks to `data.table` `by` operations, it is simple and efficient to **compute statistics per individual**.
For instance, we may want to compute the average activity for each animal:
```{r}
stat_dt <- dt[,
.(mean_acti = mean(activity)),
by='id']
stat_dt
```
You can actually compute many variables in one go this way:
```{r}
stat_dt <- dt[,
.(mean_acti = mean(activity),
max_acti = max(activity)
),
by='id']
stat_dt
```
Then, if needed, this summary can be added back to the metadata:
```{r}
# create new metadata table by joining current meta and the summary table
new_meta <- dt[stat_dt, meta=T]
# set new metadata
setmeta(dt, new_meta)
head(dt[meta=T])
```
This way we can store per-individual aggregates and visualise or analyse them with respect to the
pre-existing metadata.
Now, in order to perform statistics, we would like to merge our summaries to the metadata.
This way we end up with only one `data.table`
That is, we want to **rejoin** them to one another (i.e. we enrich our summaries with the metadata):
```{r}
final_dt <- rejoin(stat_dt)
final_dt
```
This table is exactly what you need for statistics and visualisation in `R`!
<!-- ### Stitching{-} -->
<!-- TODO -->