---
title: "Extending tibble"
author: "Kirill Müller, Hadley Wickham"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Extending tibble}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
To extend the tibble package for new types of columnar data, you need to understand how printing works. The presentation of a column in a tibble is powered by four S3 generics:
* `type_sum()` determines what goes into the column header.
* `pillar_shaft()` determines what goes into the body of the column.
* `is_vector_s3()` and `obj_sum()` are used when rendering list columns.
If you have written an S3 or S4 class that can be used as a column, you can override these generics to make sure your data prints well in a tibble. To start, you must import the `pillar` package that powers the printing of tibbles. Either add `pillar` to the `Imports:` section of your `DESCRIPTION`, or simply call:
```{r, eval = FALSE}
usethis::use_package("pillar")
```
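This short vignette assumes a package that implements an S3 class `"latlon"` and uses `roxygen2` to create documentation and the `NAMESPACE` file. For the examples in this vignette to work, pillar must be installed; the code refers to its functions either through `@importFrom` tags or with an explicit `pillar::` prefix.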
## Prerequisites
We define a class `"latlon"` that encodes geographic coordinates in a complex number. For simplicity, the values are printed as degrees and minutes only.
```{r}
#' @export
latlon <- function(lat, lon) {
  # Longitude goes into the real part, latitude into the imaginary part
  as_latlon(complex(real = lon, imaginary = lat))
}

#' @export
as_latlon <- function(x) {
  structure(x, class = "latlon")
}

#' @export
c.latlon <- function(x, ...) {
  as_latlon(NextMethod())
}

#' @export
`[.latlon` <- function(x, i) {
  as_latlon(NextMethod())
}

#' @export
format.latlon <- function(x, ..., formatter = deg_min) {
  x_valid <- which(!is.na(x))
  lat <- unclass(Im(x[x_valid]))
  lon <- unclass(Re(x[x_valid]))

  ret <- rep("<NA>", length(x))
  ret[x_valid] <- paste(
    formatter(lat, c("N", "S")),
    formatter(lon, c("E", "W"))
  )
  format(ret, justify = "right")
}

deg_min <- function(x, pm) {
  sign <- sign(x)
  # Round to whole minutes first so that a value just below a full degree
  # carries over to the next degree instead of printing 60 minutes.
  total_min <- round(abs(x) * 60)
  deg <- total_min %/% 60
  min <- total_min %% 60

  ret <- sprintf("%d°%.2d'%s", deg, min, pm[ifelse(sign >= 0, 1, 2)])
  format(ret, justify = "right")
}

#' @export
print.latlon <- function(x, ...) {
  cat(format(x), sep = "\n")
  invisible(x)
}

latlon(32.7102978, -117.1704058)
```
More methods are needed to make this class fully compatible with data frames; see, for example, the [hms](https://github.com/tidyverse/hms/) package for a more complete implementation.
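The `c()` and `[` methods above exist because base R drops the class attribute when combining or subsetting. The following minimal sketch illustrates this with two hypothetical toy classes (`demo_dropped` and `demo_kept` are not part of the package; they only serve this comparison):

```{r}
# Hypothetical toy classes, used only for this illustration.
demo_dropped <- structure(c(1+2i, 3+4i), class = "demo_dropped")

# Without dedicated methods, `[` and c() return bare complex vectors.
inherits(demo_dropped[1], "demo_dropped")
inherits(c(demo_dropped, demo_dropped), "demo_dropped")

# With methods that restore the class after NextMethod(), it is kept.
`[.demo_kept` <- function(x, i) structure(NextMethod(), class = "demo_kept")
c.demo_kept <- function(x, ...) structure(NextMethod(), class = "demo_kept")

demo_kept <- structure(c(1+2i, 3+4i), class = "demo_kept")
inherits(demo_kept[1], "demo_kept")
inherits(c(demo_kept, demo_kept), "demo_kept")
```

The same pattern applies to any other generic that returns a modified copy of the vector.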
## Using in a tibble
Columns of this class can be used in a tibble right away, but the output will be less than ideal:
```{r}
library(tibble)
data <- tibble(
  venue = "rstudio::conf",
  year = 2017:2019,
  loc = latlon(
    c(28.3411783, 32.7102978, NA),
    c(-81.5480348, -117.1704058, NA)
  ),
  paths = list(
    loc[1],
    c(loc[1], loc[2]),
    loc[2]
  )
)
data
```
(The `paths` column is a list that contains arbitrary data, in our case `latlon` vectors. A list column is a powerful way to attach hierarchical or unstructured data to an observation in a data frame.)
The output has three main problems:
1. The column type of the `loc` column is displayed as `<S3: latlon>`. This default formatting works reasonably well for any kind of object, but the generated output may be too wide and waste precious space when displaying the tibble.
1. The values in the `loc` column are formatted as complex numbers (the underlying storage), without using the `format()` method we have defined. This is by design.
1. The cells in the `paths` column are also displayed as `<S3: latlon>`.
In the remainder of this vignette, I'll show how to fix these problems, and also how to implement rendering that adapts to the available width.
## Fixing the data type
To display `<geo>` as the data type, we need to override the `type_sum()` method. This method should return a string that can be used in a column header. For your own classes, strive for an evocative abbreviation that's under 6 characters.
```{r include=FALSE}
import::from(pillar, type_sum)
```
```{r}
#' @importFrom pillar type_sum
#' @export
type_sum.latlon <- function(x) {
  "geo"
}
```
Because the value shown there doesn't depend on the data, we just return a constant. (For date-times, the column info will eventually contain information about the timezone, see [#53](https://github.com/r-lib/pillar/pull/53).)
```{r}
data
```
## Rendering the value
To use our format method for rendering, we implement the `pillar_shaft()` method for our class. (A [*pillar*](https://en.wikipedia.org/wiki/Column#Nomenclature) is mainly a *shaft* (decorated with an *ornament*), with a *capital* above and a *base* below. Multiple pillars form a *colonnade*, which can be stacked in multiple *tiers*. This is the motivation behind the names in our API.)
```{r include=FALSE}
import::from(pillar, pillar_shaft)
```
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "right")
}
```
The simplest variant calls our `format()` method; everything else is handled by pillar, in particular by the `new_pillar_shaft_simple()` helper. Note how the `align` argument affects the alignment of `NA` values and of the column name and type.
```{r}
data
```
We could also use left alignment and indent only the `NA` values:
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}
data
```
## Adaptive rendering
If there is not enough space to render the values, the formatted values are truncated with an ellipsis. This doesn't currently apply to our class, because we haven't specified a minimum width for our values:
```{r}
print(data, width = 35)
```
If we specify a minimum width when constructing the shaft, the `loc` column will be truncated:
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "right", min_width = 10)
}
print(data, width = 35)
```
This may be useful for character data, but for lat-lon data we may prefer to show full degrees and remove the minutes if the available space is not enough to show accurate values. A more sophisticated implementation of the `pillar_shaft()` method is required to achieve this:
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  # `formatter = deg` resolves to the deg() function defined below:
  # the local variable `deg` does not exist yet when the argument is evaluated.
  deg <- format(x, formatter = deg)
  deg[is.na(x)] <- pillar::style_na("NA")

  deg_min <- format(x)
  deg_min[is.na(x)] <- pillar::style_na("NA")

  pillar::new_pillar_shaft(
    list(deg = deg, deg_min = deg_min),
    width = pillar::get_max_extent(deg_min),
    min_width = pillar::get_max_extent(deg),
    subclass = "pillar_shaft_latlon"
  )
}
```
Here, `pillar_shaft()` returns an object of the `"pillar_shaft_latlon"` class created by the generic `new_pillar_shaft()` constructor. This object contains the necessary information to render the values, and also minimum and maximum width values. For simplicity, both formattings are pre-rendered, and the minimum and maximum widths are computed from there. Note that we also need to take care of `NA` values explicitly. (`get_max_extent()` is a helper that computes the maximum display width occupied by the values in a character vector.)
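As a small illustration of the width helpers, the chunk below applies `get_extent()` and `get_max_extent()` to a hypothetical character vector (`demo_values` is made up for this sketch and is not used elsewhere):

```{r}
# Hypothetical pre-rendered values, for illustration only
demo_values <- c("28°N", "117°W", "NA")

# Display width of each element
pillar::get_extent(demo_values)

# Largest display width, as used for `width` and `min_width` above
pillar::get_max_extent(demo_values)
```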
For completeness, the code that implements the degree-only formatting looks like this:
```{r}
deg <- function(x, pm) {
  sign <- sign(x)
  x <- abs(x)
  deg <- round(x)

  ret <- sprintf("%d°%s", deg, pm[ifelse(sign >= 0, 1, 2)])
  format(ret, justify = "right")
}
```
All that's left to do is to implement a `format()` method for our new `"pillar_shaft_latlon"` class. This method will be called with a `width` argument, which then determines which of the formattings to choose:
```{r}
#' @export
format.pillar_shaft_latlon <- function(x, width, ...) {
  if (all(crayon::col_nchar(x$deg_min) <= width)) {
    ornament <- x$deg_min
  } else {
    ornament <- x$deg
  }

  pillar::new_ornament(ornament)
}
data
print(data, width = 35)
```
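The same idea extends to more than two levels of detail. The following is a minimal sketch in base R, not part of pillar: `pick_widest_fit()` and `demo_candidates` are hypothetical names, and plain `nchar()` is sufficient only because the demo strings contain no ANSI escapes.

```{r}
# Hypothetical helper: `candidates` is a list of character vectors,
# ordered from most detailed to least detailed.
pick_widest_fit <- function(candidates, width) {
  for (candidate in candidates) {
    # Use the first rendering in which every value fits
    if (all(nchar(candidate, type = "width") <= width)) {
      return(candidate)
    }
  }
  # Nothing fits: fall back to the most compact rendering
  candidates[[length(candidates)]]
}

demo_candidates <- list(
  deg_min = c("28°20'N 81°33'W", "32°43'N 117°10'W"),
  deg = c("28°N 82°W", "33°N 117°W")
)

pick_widest_fit(demo_candidates, width = 20)
pick_widest_fit(demo_candidates, width = 12)
```

Ordering the candidates from most to least detailed keeps the selection logic independent of how many formattings are pre-rendered.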
## Adding color
Both `new_pillar_shaft_simple()` and `new_ornament()` accept ANSI escape codes for coloring, emphasis, or other ways of highlighting text on terminals that support it. Some formattings are predefined, e.g. `style_subtle()` displays text in a light gray. For default data types, this style is used for insignificant digits. We'll be formatting the degree and minute signs in a subtle style, because they serve only as separators. You can also use the [crayon](https://cran.r-project.org/package=crayon) package to add custom formattings to your output.
```{r}
#' @importFrom pillar pillar_shaft
#' @export
pillar_shaft.latlon <- function(x, ...) {
  out <- format(x, formatter = deg_min_color)
  out[is.na(x)] <- NA
  pillar::new_pillar_shaft_simple(out, align = "left", na_indent = 5)
}

deg_min_color <- function(x, pm) {
  sign <- sign(x)
  # Round to whole minutes first so that a value just below a full degree
  # carries over to the next degree instead of printing 60 minutes.
  total_min <- round(abs(x) * 60)
  deg <- total_min %/% 60
  min <- total_min %% 60

  ret <- sprintf(
    "%d%s%.2d%s%s",
    deg,
    pillar::style_subtle("°"),
    min,
    pillar::style_subtle("'"),
    pm[ifelse(sign >= 0, 1, 2)]
  )
  ret[is.na(x)] <- ""
  format(ret, justify = "right")
}
data
```
Currently, ANSI escapes are not rendered in vignettes, so the display here isn't much different from earlier examples. This may change in the future.
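Styled strings are longer than they look, which is why `format.pillar_shaft_latlon()` measures widths with `crayon::col_nchar()` rather than `nchar()`. The base R sketch below illustrates the difference. The escape sequences (`\033[90m` for a gray foreground, `\033[39m` to reset it) and the helper `strip_ansi_demo()` are assumptions made for this illustration and are not taken from pillar:

```{r}
demo_plain <- "28°20'N"

# The same text with the degree sign wrapped in (assumed) color escapes
demo_styled <- paste0("28", "\033[90m", "°", "\033[39m", "20'N")

# Hypothetical helper: remove simple color/style escape sequences
strip_ansi_demo <- function(x) gsub("\033\\[[0-9;]*m", "", x)

# The escapes add characters without adding display width
nchar(demo_styled) > nchar(demo_plain)
identical(strip_ansi_demo(demo_styled), demo_plain)
```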
## Fixing list columns
To tweak the output in the `paths` column, we simply need to indicate that our class is an S3 vector:
```{r include=FALSE}
import::from(pillar, is_vector_s3)
```
```{r}
#' @importFrom pillar is_vector_s3
#' @export
is_vector_s3.latlon <- function(x) TRUE
data
```
This is picked up by the default implementation of `obj_sum()`, which then shows the type and the length in brackets. If your object is built on top of an atomic vector, the default will be adequate. You will, however, need to provide an `obj_sum()` method for your class if your object is vectorised and built on top of a list.
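An example of an object of this type in base R is `POSIXlt`: it is a list with 9 components (seconds, minutes, hours, and so on), possibly followed by the optional `zone` and `gmtoff` components.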
```{r}
x <- as.POSIXlt(Sys.time() + c(0, 60, 3600))
str(unclass(x))
```
But it pretends to be a vector with 3 elements:
```{r}
x
length(x)
str(x)
```
So we need to define a method that returns a character vector the same length as `x`:
```{r include=FALSE}
import::from(pillar, obj_sum)
```
```{r}
#' @importFrom pillar obj_sum
#' @export
obj_sum.POSIXlt <- function(x) {
  rep("POSIXlt", length(x))
}
```
## Testing
If you want to test the output of your code, you can compare it with a known state recorded in a text file. For this, pillar offers the `expect_known_display()` expectation, which requires the testthat package. Make sure that the output is generated only by your package to avoid inconsistencies when external code is updated. Here, this means that you test only the shaft portion of the pillar, and not the entire pillar or even a tibble that contains a column with your data type!
Attach testthat before running the examples below:
```{r}
library(testthat)
```
```{r include = FALSE}
unlink("latlon.txt")
unlink("latlon-bw.txt")
```
The code below will compare the output of `pillar_shaft(data$loc)` with known output stored in the `latlon.txt` file. The first run warns because the file doesn't exist yet.
```{r error = TRUE, warning = TRUE}
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt"
  )
})
```
From the second run on, the printing will be compared with the file:
```{r}
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt"
  )
})
```
However, if we look at the file, we notice something strange: the output contains ANSI escapes!
```{r}
readLines("latlon.txt")
```
We can turn them off by passing `crayon = FALSE` to the expectation, but we need to run the test twice again, because the first run only records the new output:
```{r error = TRUE, warning = TRUE}
library(testthat)
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt",
    crayon = FALSE
  )
})
```
```{r}
test_that("latlon pillar matches known output", {
  pillar::expect_known_display(
    pillar_shaft(data$loc),
    file = "latlon.txt",
    crayon = FALSE
  )
})
readLines("latlon.txt")
```
You may want to create a series of output files for different scenarios:
- Colored vs. plain (to simplify viewing differences)
- With or without special Unicode characters (if your output uses them)
- Different widths
For this, it is helpful to create your own expectation function. Use the tidy evaluation framework to make sure that construction and printing happen at the right time:
```{r}
expect_known_latlon_display <- function(x, file_base) {
  quo <- rlang::quo(pillar::pillar_shaft(x))
  pillar::expect_known_display(
    !! quo,
    file = paste0(file_base, ".txt")
  )
  pillar::expect_known_display(
    !! quo,
    file = paste0(file_base, "-bw.txt"),
    crayon = FALSE
  )
}
```
```{r error = TRUE, warning = TRUE}
test_that("latlon pillar matches known output", {
  expect_known_latlon_display(data$loc, file_base = "latlon")
})
```
```{r}
readLines("latlon.txt")
readLines("latlon-bw.txt")
```
Learn more about the tidyeval framework in the [dplyr vignette](http://dplyr.tidyverse.org/articles/programming.html).
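The custom expectation relies on a quosure recording an expression without running it. The sketch below illustrates that delayed evaluation with hypothetical names (`demo_calls`, `demo_value()`, `demo_quo`); it only shows the general rlang behavior, not how `expect_known_display()` is implemented:

```{r}
demo_calls <- 0L

# Hypothetical constructor that counts how often it runs
demo_value <- function() {
  demo_calls <<- demo_calls + 1L
  "constructed"
}

# Creating the quosure records the expression but does not run it
demo_quo <- rlang::quo(demo_value())
demo_calls

# The expression runs only when the quosure is evaluated
rlang::eval_tidy(demo_quo)
demo_calls
```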
```{r include = FALSE}
unlink("latlon.txt")
unlink("latlon-bw.txt")
```