---
title: "Introduction to Trelliscope"
output:
  rmarkdown::html_vignette:
    self_contained: false
vignette: >
  %\VignetteIndexEntry{Introduction to Trelliscope}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
Trelliscope provides a simple mechanism to make a collection of visualizations and display them as interactive [small multiples](https://en.wikipedia.org/wiki/Small_multiple). This is a useful general visualization technique for many scenarios, particularly when looking at a somewhat large dataset comprised of many natural subsets. However, where Trelliscope differentiates itself from traditional faceting is in layout and interactivity. Traditionally, facets appear in one large plot that becomes overwhelming with too many groups. Trelliscope allows you to generate a large number of plots in an interactive window where the plots can be filtered and sorted based on metadata and paged through with a limited number of plots at a time.
The Trelliscope R package provides utilities to create the visualizations, specify metadata about the visualizations that can be used to interactively navigate them, and specify other aspects of how the viewer should behave.
## Data frames of visualizations
The basic principle behind the design of the R package is that you specify a collection of visualizations as a data frame, with one or more columns representing the plots (either as a plot object such as ggplot or as a reference to an image such as a png, svg, or even html file), and the other columns representing metadata about each visualization.
We refer to each plot (row) of a given visualization (column) as a **panel**, and hence will often refer to a visualization column as a collection of panels.
This package provides utilities to help build these data frames and then explore them in an interactive viewer.
### Pre-generated images
The simplest way to illustrate what is meant by a data frame of visualizations is to start with an example where the images have already been generated.
An example dataset that comes with the package contains images captured by the Mars rover Curiosity.
```{r}
library(trelliscope)
mars_rover
```
This data frame has a column that references images on the web, `img_src`. The other columns contain metadata about these images. We can create a Trelliscope data frame from this with the following:
```{r mars0}
d <- as_trelliscope_df(mars_rover, name = "mars rover")
d
```
This is simply the same data frame but with additional information for how to render the Trelliscope viewer app. At a minimum we provide a `name` for the resulting display, but we can also specify additional information such as a `path` where the output should be written (a temporary directory if not specified), a `description`, and `tags`. You can also specify `key_cols`, the columns that, combined, uniquely identify each row of the data. These are inferred if not provided, but the inferred choice might not be the one you want, as there are often many possibilities.
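To make the idea of key columns concrete, here is a minimal base R sketch that checks whether a set of columns uniquely identifies each row. The `toy` data frame and the `is_key()` helper are made up for illustration and are not part of the trelliscope package; how trelliscope itself infers `key_cols` may differ.

```{r}
# hypothetical data: two rows share the same country
toy <- data.frame(
  country = c("A", "A", "B"),
  year = c(2000, 2001, 2000),
  value = c(1.5, 2.5, 3.5)
)

# hypothetical helper: TRUE when no combination of values in `cols` repeats
is_key <- function(df, cols) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

is_key(toy, "country")
is_key(toy, c("country", "year"))
```

Here `country` alone is not a valid key because it repeats, while the combination of `country` and `year` is.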
To see more information about trelliscope-specific settings, you can use `show_info()`:
```{r}
show_info(d)
```
Now to view this in the Trelliscope viewer app:
```{r mars, out.width="100%", out.height="520px", scale=0.7}
view_trelliscope(d)
```
You can use this viewer to interactively explore the images through filtering, sorting, and paging through the panels. Feel free to try out the example above! You can click the icon in the upper-right corner for a full-screen view of the viewer.
The behavior of the Trelliscope viewer can be customized in many ways, either through augmenting the data frame with new data, or by specifying additional visualizations, default sorting and filtering, and many other options. We will cover these throughout this document.
### R-generated visualizations
A likely more common use case when visualizing data during an analysis will be that the images do not yet exist and we will be generating them from subsets of the data we are analyzing. Trelliscope has utilities to make this convenient for visualization packages like ggplot2.
As a simple example, let's consider the `gap` dataset which is a modified version of data that originally comes with the [`gapminder`](https://cran.r-project.org/web/packages/gapminder/index.html) package. This modified version contains some extra columns such as ISO country code and country centroid latitude and longitude.
```{r}
gap
```
This data provides statistics such as life expectancy annually for 123 countries.
Suppose we want to visualize life expectancy vs. year for each country. With ggplot2, you would do something like this:
```{r echo=FALSE}
library(trelliscope)
suppressPackageStartupMessages(
  library(ggplot2, warn.conflicts = FALSE)
)
```
```{r}
library(ggplot2)
ggplot(gap, aes(year, life_exp)) +
  geom_point() +
  facet_wrap(vars(country, continent))
```
There are too many panels to view on one page, making this a good candidate for Trelliscope.
Trelliscope provides a function `facet_panels()` that is the first step in turning a ggplot object into a Trelliscope data frame. You can swap out `facet_wrap()` for this function.
```{r}
p <- ggplot(gap, aes(year, life_exp)) +
  geom_point() +
  facet_panels(vars(country, continent))
class(p)
```
As you can see, `facet_panels()` simply modifies your ggplot object. If you print the resulting object `p`, a Trelliscope display will be written and displayed.
To take this and turn it into a data frame of visualizations with one row for each country/continent, we can apply the function `as_panels_df()`.
```{r}
p_df <- as_panels_df(p)
p_df
```
Here, just as in the Mars rover example, we have a data frame of visualizations. However, in this case, the visualizations are ggplot objects instead of image references.
Note that `as_panels_df()` has options such as `as_plotly = TRUE` that will convert the ggplot objects to plotly objects.
You can view the plot for any one row by calling the panel column and the index number of the row you want to look at. For example, if you wanted to see the generated plot for the second row, Albania, you would run the following code.
```{r message=FALSE}
p_df$panel[[2]]
```
Note that this nested data frame of visualizations can be a useful object to work with outside of using it with Trelliscope.
Just as in the Mars rover example, we can view this data frame of visualizations and cast it as a Trelliscope data frame with `as_trelliscope_df()` and view it with `view_trelliscope()`.
```{r gap0, out.width="100%", out.height="520px", scale=0.7}
tdf <- as_trelliscope_df(p_df,
  name = "gapminder life expectancy")
view_trelliscope(tdf)
```
Note that there are several benefits to using `facet_panels()` and `as_panels_df()`. First, it fits more naturally into the ggplot2 paradigm, where you can build a Trelliscope visualization exactly as you would with building a ggplot2 visualization. Second, you can make use of the `scales` argument in `facet_panels()` (which behaves similarly to the same argument in `facet_wrap()`) to ensure that the x and y axis ranges of your plots behave the way you want. The default is for all plots to have the same `"fixed"` axis ranges. This is an important consideration in visualizing small multiples because if you are making visual comparisons you often want to be comparing things on the same scale.
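To illustrate what fixed axis ranges mean, the following base R sketch compares the single range shared by all panels with the per-panel ranges you would get if each panel were scaled independently. The `toy` data are hypothetical, and this only illustrates the concept; it is not how `facet_panels()` computes its limits.

```{r}
# hypothetical data with two groups on very different scales
toy <- data.frame(
  group = rep(c("a", "b"), each = 3),
  y = c(1, 2, 3, 10, 20, 30)
)

# fixed scales: every panel uses the overall range of the data
fixed_limits <- range(toy$y)
fixed_limits

# free scales: each panel uses only the range of its own subset
free_limits <- lapply(split(toy$y, toy$group), range)
free_limits
```

With fixed limits, group `a` occupies only a small part of the axis, but its values can be compared directly with those of group `b`.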
The remainder of this tutorial will cover customizations that you can apply to Trelliscope data frames to provide more powerful interactions when viewing them in the app.
## Customizing your Trelliscope app
So far we have seen a few ways to get to a Trelliscope data frame that we can use to create a Trelliscope interactive visualization app. As we've seen, you can simply pass a Trelliscope data frame to `view_trelliscope()` to get to an immediate output. However, there are many other operations you can perform on a Trelliscope data frame to customize how the app behaves.
Let's revisit the gapminder example. Here we re-build a data frame of ggplot panels in one code block.
```{r gap1}
library(dplyr, warn.conflicts = FALSE)
tdf <- (
  ggplot(gap, aes(year, life_exp)) +
    geom_point() +
    facet_panels(vars(country, continent, iso_alpha2))
) |>
  as_panels_df(panel_col = "lexp_time") |>
  as_trelliscope_df(name = "gapminder life expectancy")
tdf
```
Here we also added `iso_alpha2` (country code) as a redundant faceting variable so that it is available in our data frame for later use.
### Adding panels
Our data frame already has a panel column, but we can add more if we would like. The following functions are available to add panels to a Trelliscope data frame.
- `panel_url()`: Add a panel column with URLs to images
- `panel_local()`: Add a panel column with local image files
- `panel_lazy()`: Add a panel column by specifying a plot function that will be used to generate panels
Here we will show an example of using `panel_url()` to add country flag images to our data frame. In another article we will [provide more examples of using these functions](panels.html). Note that a variation of `panel_lazy()` is used under the hood when you use `facet_panels()`.
A database of country flags is available [here](https://raw.githubusercontent.com/hafen/countryflags/master/png/512/) and flag images can be referenced by their 2-letter country code.
```{r}
flag_base_url <- "https://raw.githubusercontent.com/hafen/countryflags/master/png/512/"
tdf <- mutate(tdf,
  flag_url = panel_url(paste0(flag_base_url, iso_alpha2, ".png"))
)
tdf
```
We can view a flag for any country by looking at a single entry from the column, e.g. `tdf$flag_url[[1]]`. This will open up the image in a web browser.
### Adding variables
One of the most useful things you can do to customize your Trelliscope app is to add additional variables to the data frame. These variables can be used to control how the panels are explored in the viewer through sorting, filtering, and labels.
For example, suppose we want to be able to explore countries based on summary statistics such as their mean life expectancy. We can do this by computing summaries of the gapminder data and joining them with `tdf`.
```{r}
gsumm <- gap |>
  # compute the year-to-year change within each country so that lag() does
  # not carry a value over from the previous country's last year
  mutate(
    pct_chg = 100 * (life_exp - lag(life_exp)) / lag(life_exp),
    .by = country
  ) |>
  summarise(
    mean_lexp = mean(life_exp),
    mean_gdp = mean(gdp_percap),
    max_lexp_pct_chg = max(pct_chg, na.rm = TRUE),
    dt_lexp_max_pct_chg = as.Date(paste0(year[which.max(pct_chg)], "-01-01")),
    .by = country
  )
tdf <- left_join(tdf, gsumm, by = "country")
tdf
```
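The percent-change arithmetic used in this summary can be checked by hand on a toy vector. This is a minimal base R sketch with made-up values for a single country; it uses `diff()` rather than dplyr's `lag()`, which gives the same result for one ordered series.

```{r}
# hypothetical life expectancy values for one country in consecutive periods
life_exp <- c(50, 55, 66)
years <- c(2000, 2005, 2010)

# percent change relative to the previous value; the first period has no
# predecessor, so its change is NA
pct_chg <- c(NA, 100 * diff(life_exp) / head(life_exp, -1))
pct_chg

# largest change and the date of the period in which it occurred
# (which.max() ignores NA values)
max(pct_chg, na.rm = TRUE)
as.Date(paste0(years[which.max(pct_chg)], "-01-01"))
```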
Trelliscope makes use of variable types to determine how data is displayed in the viewer as well as how it can be interacted with. Built-in R types such as "character", "factor", "numeric", "Date", and "POSIXct" are all supported. Character and factor variables have a filter interaction that allows you to filter the data by the values of the variable (with factors, the natural order of these values follows the factor levels). Numeric, Date, and POSIXct variables have a range filter interaction that allows you to filter the data by a range of numbers, dates, or times.
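The mapping from variable type to filter interaction described above can be summarized in a small base R sketch. The `filter_kind()` helper and the `toy` data frame are hypothetical and only restate the rules in this paragraph; they are not part of trelliscope and do not reflect how the package implements this internally.

```{r}
# hypothetical data frame with one column of each commonly used type
toy <- data.frame(
  name = c("a", "b"),
  group = factor(c("x", "y")),
  score = c(1.5, 2.5),
  day = as.Date(c("2020-01-01", "2020-06-01"))
)

# hypothetical helper: which kind of filter would a column get?
filter_kind <- function(x) {
  if (is.character(x) || is.factor(x)) {
    "values"
  } else if (is.numeric(x) || inherits(x, c("Date", "POSIXct"))) {
    "range"
  } else {
    "none"
  }
}

kinds <- vapply(toy, filter_kind, character(1))
kinds
```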
### Special variable types
Trelliscope provides some additional variable types that can be used to provide special functionality in the viewer. Currently, the following are provided:
- `number()`: Specifies a numeric type that allows specification of number of digits to display and whether to show the variable on the log scale.
- `currency()`: Specifies a numeric type that represents a currency and can have a currency symbol prepended to it.
- `href()`: Specifies a character type that represents a URL to link to.
These types make use of the [vctrs](https://vctrs.r-lib.org) package. You can create variables of these types by simply wrapping a vector with these functions and any additional parameters.
For example, below we add an example of each of these variables to our gapminder data frame:
```{r}
tdf <- tdf |>
  mutate(
    mean_lexp = number(mean_lexp, digits = 1),
    mean_gdp = currency(mean_gdp, code = "USD"),
    wiki_link = href(paste0("https://en.wikipedia.org/wiki/", country)),
  )
tdf
```
More special variable types will come in the future as supporting filter interactions are added for them in the viewer. Some types we anticipate include geographic coordinates, network graph links, and more.
### Updating display attributes with pipe functions
A Trelliscope data frame is simply a data frame that also keeps track of attributes about the Trelliscope display. We can modify these attributes by applying [pipe functions](https://r4ds.had.co.nz/pipes.html), each of which takes a Trelliscope data frame as its primary argument and returns a modified Trelliscope data frame.
The following pipe functions are available:
- Fine-tune how panels and variables are handled in the app
  - `set_panel_options()`
  - `set_var_labels()`
  - `set_tags()`
- Set the default viewing state of the app
  - `set_default_panel()`
  - `set_default_filters()`
  - `set_default_labels()`
  - `set_default_layout()`
  - `set_default_sort()`
- Additional features
  - `add_inputs()`: specify input variables that capture user feedback for each panel
  - `add_view()`: add a pre-defined "view" that allows users to navigate to specified states of the display
  - `set_info_html()`: specify HTML to display in the info panel of the viewer
  - `set_show_info_on_load()`: specify whether to show the info panel on load
  - `add_charm()`: simple password protection for the generated app
- Writing and viewing
  - `write_trelliscope()`
  - `view_trelliscope()`
We will show examples of several of these in the following sections.
### Setting variable labels and tags
To help a user have a better understanding of what the variables represent and how they are associated, we can use variable labels and tags.
Variable labels can be added to a Trelliscope data frame using `set_var_labels()`. This function takes a named set of parameters as input, with the names indicating the variable name and the values indicating the labels. For example:
```{r}
tdf <- tdf |>
  set_var_labels(
    mean_lexp = "Mean life expectancy",
    mean_gdp = "Mean GDP per capita",
    max_lexp_pct_chg = "Max % year-to-year change in life expectancy",
    dt_lexp_max_pct_chg = "Date of max % year-to-year change in life expectancy",
    wiki_link = "Link to country Wikipedia entry"
  )
```
Note that this function simply adds a "label" attribute to each specified column, which is a common practice in R for handling labels in data frames. If your data frame is already labeled or you have other means of adding these attributes, you do not need to use this function.
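As a minimal sketch of what such a "label" attribute looks like, the following base R code sets and reads one directly on a column. The `toy` data frame is hypothetical; with a real Trelliscope data frame you would normally use `set_var_labels()` as shown above.

```{r}
# hypothetical data frame with a single summary column
toy <- data.frame(mean_lexp = c(61.2, 70.4))

# attach a label to the column as a plain attribute
attr(toy$mean_lexp, "label") <- "Mean life expectancy"

# read the label back
attr(toy$mean_lexp, "label")
```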
When there are many variables in a display, it can be useful to add tags to variables that help the user investigate variables associated with concepts of interest. Tags can be added to a Trelliscope data frame using `set_tags()`. This function takes a named set of parameters as input, with the names indicating the tag name and the values indicating the variable names to associate with that tag. For example, below we have a tag indicating variables representing computed country "stats" and a tag that indicates variables containing "info" about a country.
```{r}
tdf <- tdf |>
  set_tags(
    stats = c("mean_lexp", "mean_gdp", "max_lexp_pct_chg"),
    info = c("country", "continent")
  )
```
### Setting panel options
Trelliscope has defaults for how it will write out and show panel columns in a data frame. In our example, if we were to write out our `tdf` data frame, it would write our "lexp_time" panel column as 500x500 pixel png files. Suppose we wish to render these as 600x400 pixel svg files instead. We can do this with `set_panel_options()`. This function takes a named set of parameters as input, with the names indicating a panel column name (there can be more than one) and the values a call to `panel_options()` which allows us to specify a `width`, `height`, and `format` for "lexp_time". We also set the aspect ratio for the "flag_url" panel by specifying a width and height ratio (5:3 being the most common aspect ratio for flags). Note that for panels that already exist as files, the units of width and height do not matter as the panels are dynamically sized in the viewer and the only thing that matters is the aspect ratio at which they are displayed.
```{r}
tdf <- tdf |>
  set_panel_options(
    lexp_time = panel_options(width = 600, height = 400, format = "svg"),
    flag_url = panel_options(width = 5, height = 3)
  )
```
### Setting the default state of the app
By default, Trelliscope apps display all panels in the order in which they appear in the data frame. Often it makes sense to start the user off at a specific point in the app, such as pre-defining a sorting or filtering state, or defining which panel labels you want the user to see initially.
#### `set_default_labels()`
By default, the "key columns" will be shown as labels. If we'd like to change what labels are shown when the display is opened, we can use `set_default_labels()`, e.g.:
```{r}
tdf <- tdf |>
  set_default_labels(c("country", "continent", "wiki_link"))
```
#### `set_default_layout()`
We can also set the default panel layout, for example to show 4 columns of panels on the initial view of the app (the number of rows is computed based on the size of the user's browser window and the aspect ratio of the panels).
```{r}
tdf <- tdf |>
  set_default_layout(ncol = 4)
```
#### `set_default_sort()`
We can set the default sort order with `set_default_sort()`. For this, we provide a vector of variable names and a vector of "asc" or "desc" values indicating an ascending or descending sort order.
```{r}
tdf <- tdf |>
  set_default_sort(c("continent", "mean_lexp"), dir = c("asc", "desc"))
```
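In plain data-frame terms, sorting by continent in ascending order and then by mean life expectancy in descending order corresponds to the following base R sketch. The `toy` data frame is hypothetical; the viewer performs its own sorting, so this only illustrates the resulting panel order.

```{r}
# hypothetical summary data for four countries
toy <- data.frame(
  country = c("P", "Q", "R", "S"),
  continent = c("Asia", "Africa", "Asia", "Africa"),
  mean_lexp = c(70, 48, 65, 55)
)

# continent ascending, then mean_lexp descending within each continent
sorted <- toy[order(toy$continent, -toy$mean_lexp), ]
sorted$country
```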
#### `set_default_filters()`
We can set the default filter state with `set_default_filters()`. Currently there are two different kinds of filters:
- `filter_range(varname, min = ..., max = ...)`: works with numeric, date, or datetime variables
- `filter_string(varname, values = ...)`: works with factor or string variables
```{r}
tdf <- tdf |>
  set_default_filters(
    filter_string("continent", values = "Africa"),
    filter_range("mean_lexp", max = 50)
  )
```
More types of filters are planned in the future.
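To see which panels a combination of a string filter and a range filter selects, here is a rough base R equivalent on hypothetical data. It assumes the range bound is inclusive, which is an assumption about the viewer's behavior rather than something stated here.

```{r}
# hypothetical summary data for four countries
toy <- data.frame(
  country = c("P", "Q", "R", "S"),
  continent = c("Asia", "Africa", "Asia", "Africa"),
  mean_lexp = c(70, 48, 65, 55)
)

# keep rows whose continent is one of the selected values and whose
# mean_lexp does not exceed the maximum (inclusive bound assumed)
kept <- subset(toy, continent %in% "Africa" & mean_lexp <= 50)
kept$country
```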
### Defining "views"
Views are predefined sets of state that are made available in the viewer to help the user conveniently get to regions of the display that are interesting in different ways. You can add a view by piping the display through the `add_view()` function.
`add_view()` takes a `name` as its first argument, and then any number of state specifications. The functions available to set the state are the following:
- `state_layout()`
- `state_labels()`
- `state_sort()`
- `filter_string()`
- `filter_range()`
The `state_*()` functions have the same parameters as and behave similarly to their `set_*()` counterparts except that unlike those, these do not receive a Trelliscope data frame and return a Trelliscope data frame, but instead just specify a state. The `filter_*()` functions we have seen already.
For example, suppose we wish to add a view that only shows countries with mean life expectancy greater than or equal to 60, sorted from highest to lowest mean life expectancy:
```{r}
tdf <- tdf |>
  add_view(
    name = "Countries with high life expectancy (mean >= 60)",
    filter_range("mean_lexp", min = 60),
    state_sort("mean_lexp", dir = "desc")
  )
```
You can add as many views as you would like by chaining more calls to `add_view()`.
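As a further sketch, a view can also combine layout and label state with sorting and filtering. This assumes, as described above, that `state_layout()` and `state_labels()` accept the same parameters as `set_default_layout()` and `set_default_labels()`. The object name `views_demo`, the display name, and the view name are made up for illustration, and a separate object is used so that `tdf` is left unchanged.

```{r}
library(trelliscope)
library(ggplot2)

# build a small Trelliscope data frame the same way as earlier in this article
views_demo <- (
  ggplot(gap, aes(year, life_exp)) +
    geom_point() +
    facet_panels(vars(country, continent))
) |>
  as_panels_df() |>
  as_trelliscope_df(name = "views sketch")

# a view that sets layout, labels, sort order, and a filter in one place
views_demo <- views_demo |>
  add_view(
    name = "African countries, alphabetical",
    state_layout(ncol = 3),
    state_labels(c("country", "continent")),
    state_sort("country", dir = "asc"),
    filter_string("continent", values = "Africa")
  )
```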
### Specifying user inputs
You can add user inputs that are attached to each panel of the display using the `add_inputs()` function. This function takes any number of arguments created by any of the following functions:
- `input_radio(name, label, options)`
- `input_text(name, label, width, height)`
- `input_checkbox(name, label, options)`
- `input_select(name, label, options)`
- `input_multiselect(name, label, options)`
- `input_number(name, label)`
These specify different input types.
For example, if we want a free text input for comments as well as a yes/no question asking whether the data looks correct for the panel, we can do the following.
```{r}
tdf <- tdf |>
  add_inputs(
    input_text(name = "comments", label = "Comments about this panel",
      height = 6),
    input_radio(name = "looks_correct",
      label = "Does the data look correct?", options = c("no", "yes"))
  )
```
Since the Trelliscope app is not backed by a server, persistent storage of user inputs is currently not supported. If you need to get inputs back from a user, an optional `email` argument can be provided which will help the user know how to get these back to you.
Let's see how all of these operations are reflected in our Trelliscope data frame:
```{r}
show_info(tdf)
```
### Writing and viewing the app
Now that we have built up our Trelliscope data frame, we can write it out as specified before with `write_trelliscope()`.
```{r}
tdf <- write_trelliscope(tdf)
```
This writes the panels if they haven't been written yet and then writes out a JSON representation of all of the other specifications we have made for the app to consume.
Here is the final output.
```{r gap, out.width="100%", out.height="630px", scale=0.6}
view_trelliscope(tdf)
```
Note that we can bypass `write_trelliscope()` by going straight to `view_trelliscope()` but `write_trelliscope()` allows us to do things like force already-written panels to re-render.
### Putting it all together
This example illustrates most of the features available in Trelliscope. As seen in the initial examples, Trelliscope displays can be created with minimal code, but additional functionality can be added with more code. Much of this code will already feel natural, as in many cases we are simply updating a data frame with new columns. The rest of the code simply specifies the desired state of the app.
```{r eval=FALSE}
# create initial Trelliscope data frame
tdf <- (
  ggplot(gap, aes(year, life_exp)) +
    geom_point() +
    facet_panels(vars(country, continent))
) |>
  as_panels_df(panel_col = "lexp_time") |>
  as_trelliscope_df(name = "gapminder life expectancy")

# add variables
gsumm <- gap |>
  # compute the year-to-year change within each country so that lag() does
  # not carry a value over from the previous country's last year
  mutate(
    pct_chg = 100 * (life_exp - lag(life_exp)) / lag(life_exp),
    .by = country
  ) |>
  summarise(
    mean_lexp = number(mean(life_exp), digits = 1),
    mean_gdp = currency(mean(gdp_percap), code = "USD"),
    max_lexp_pct_chg = max(pct_chg, na.rm = TRUE),
    dt_lexp_max_pct_chg = as.Date(paste0(year[which.max(pct_chg)], "-01-01")),
    wiki_link = href(paste0("https://en.wikipedia.org/wiki/", country)),
    .by = country
  )
tdf <- left_join(tdf, gsumm, by = "country")

# set variable labels
tdf <- tdf |>
  set_var_labels(
    mean_lexp = "Mean life expectancy",
    mean_gdp = "Mean GDP per capita",
    max_lexp_pct_chg = "Max % year-to-year change in life expectancy",
    dt_lexp_max_pct_chg = "Date of max % year-to-year change in life expectancy",
    wiki_link = "Link to country Wikipedia entry"
  )

# set tags
tdf <- tdf |>
  set_tags(
    stats = c("mean_lexp", "mean_gdp", "max_lexp_pct_chg"),
    info = c("country", "continent")
  )

# set panel options
tdf <- tdf |>
  set_panel_options(
    lexp_time = panel_options_lazy(width = 600, height = 400, format = "svg")
  )

# set default state
tdf <- tdf |>
  set_default_labels(c("country", "continent", "wiki_link")) |>
  set_default_layout(ncol = 4) |>
  set_default_sort(c("continent", "mean_lexp"), dir = c("asc", "desc")) |>
  set_default_filters(
    filter_string("continent", values = "Africa"),
    filter_range("mean_lexp", max = 50)
  )

# add a view
tdf <- tdf |>
  add_view(
    name = "Countries with high life expectancy (mean >= 60)",
    filter_range("mean_lexp", min = 60),
    state_sort("mean_lexp", dir = "desc")
  )

# add user inputs
tdf <- tdf |>
  add_inputs(
    input_text(name = "comments", label = "Comments about this panel",
      height = 6),
    input_radio(name = "looks_correct",
      label = "Does the data look correct?", options = c("no", "yes"))
  )

# view the display
view_trelliscope(tdf)
```
A few additional features are available in Trelliscope that we have not covered here and can be found by visiting other articles in this documentation:
- [A deeper dive into creating panel columns](panels.html)
- [Sharing and embedding Trelliscope displays](embed.html)
- [Visualizing very large datasets with Trelliscope](bigdata.html)