---
title: "Tibbles"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tibbles}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
```{r setup, include = FALSE}
library(tibble)
set.seed(1014)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
Tibbles are a modern take on data frames.
They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating.
```{r}
library(tibble)
```
## Creating
`tibble()` is a nice way to create data frames.
It encapsulates best practices for data frames:
* It never changes an input's type (i.e., no more need for `stringsAsFactors = FALSE`!).
```{r}
tibble(x = letters)
```
This makes it easier to use with list-columns:
```{r}
tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
```
List-columns are often created by `tidyr::nest()`, but it can also be useful to
create them by hand.
* It never adjusts the names of variables:
```{r}
names(data.frame(`crazy name` = 1))
names(tibble(`crazy name` = 1))
```
* It evaluates its arguments lazily and sequentially:
```{r}
tibble(x = 1:5, y = x ^ 2)
```
* It never uses `row.names()`.
The whole point of tidy data is to store variables in a consistent way.
So it never stores a variable as a special attribute.
* It only recycles vectors of length 1.
This is because recycling vectors of greater lengths is a frequent source of bugs.
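The recycling rule in the last bullet can be contrasted with base R in a minimal sketch. The object names `recycled_df` and `result` are illustrative only, and `tryCatch()` is used here so the chunk keeps running after the error:

```{r}
library(tibble)

# Base R recycles `b` twice to fill four rows, which can hide a length mismatch.
recycled_df <- data.frame(a = 1:4, b = 1:2)
nrow(recycled_df)

# tibble() refuses to recycle a length-2 vector; capture the error instead of stopping.
result <- tryCatch(
  tibble(a = 1:4, b = 1:2),
  error = function(e) e
)
inherits(result, "error")
```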
## Coercion
To complement `tibble()`, tibble provides `as_tibble()` to coerce objects into tibbles.
Generally, `as_tibble()` methods are much simpler than `as.data.frame()` methods.
The method for lists has been written with an eye for performance:
```{r error = TRUE, eval = FALSE}
l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters
timing <- bench::mark(
  as_tibble(l),
  as.data.frame(l),
  check = FALSE
)
timing
```
```{r echo = FALSE, eval = (Sys.getenv("IN_GALLEY") == "")}
readRDS("timing.rds")
```
The speed of `as.data.frame()` is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.
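As a small illustration of coercion itself, separate from the timing comparison, the following sketch converts a named list and a data frame. The inputs `inputs` and `base_input` are made up for this example:

```{r}
library(tibble)

# A named list of equal-length vectors becomes one column per element.
inputs <- list(id = 1:3, label = c("a", "b", "c"))
tbl_from_list <- as_tibble(inputs)
tbl_from_list

# An existing data frame is converted without changing its column types.
base_input <- data.frame(id = 1:3, label = c("a", "b", "c"))
tbl_from_df <- as_tibble(base_input)
class(tbl_from_df)
```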
## Tibbles vs data frames
There are three key differences between tibbles and data frames: printing, subsetting, and recycling rules.
### Printing
When you print a tibble, it only shows the first ten rows and all the columns that fit on one screen.
It also prints an abbreviated description of the column type, and uses font styles and color for highlighting:
```{r}
tibble(x = -5:100, y = 123.456 * (3^x))
```
Numbers are displayed with three significant figures by default, and a trailing dot that indicates the existence of a fractional component.
You can control the default appearance with options:
* `options(pillar.print_max = n, pillar.print_min = m)`: if there are more than `n` rows, print only the first `m` rows.
Use `options(pillar.print_max = Inf)` to always show all rows.
* `options(pillar.width = n)`: use `n` character slots horizontally to show the data. If `n > getOption("width")`, this will result in multiple tiers. Use `options(pillar.width = Inf)` to always print all columns, regardless of the width of the screen.
See `?pillar::pillar_options` and `?tibble_options` for the available options, `vignette("types")` for an overview of the type abbreviations, `vignette("numbers")` for details on the formatting of numbers, and `vignette("digits")` for a comparison with data frame printing.
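The options above can be tried out as follows. This is a sketch: the tibble `long_tbl` is invented for the example, and the `n` and `width` arguments of `print()` are not discussed above but offer a per-call alternative to the global options:

```{r}
library(tibble)

long_tbl <- tibble(x = 1:30, y = x * 1.5)

# Temporarily lower the thresholds: with more than 5 rows, show only the first 3.
old <- options(pillar.print_max = 5, pillar.print_min = 3)
print(long_tbl)

# Restore the previous settings so later chunks are unaffected.
options(old)

# Per-call overrides via arguments of the print method.
print(long_tbl, n = 12, width = Inf)
```

Saving the return value of `options()` and passing it back is a common way to keep such changes local to one chunk.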
### Subsetting
Tibbles are quite strict about subsetting.
`[` always returns another tibble.
Contrast this with a data frame: sometimes `[` returns a data frame and sometimes it just returns a vector:
```{r}
df1 <- data.frame(x = 1:3, y = 3:1)
class(df1[, 1:2])
class(df1[, 1])
df2 <- tibble(x = 1:3, y = 3:1)
class(df2[, 1:2])
class(df2[, 1])
```
To extract a single column use `[[` or `$`:
```{r}
class(df2[[1]])
class(df2$x)
```
Tibbles are also stricter with `$`.
Tibbles never do partial matching, and will throw a warning and return `NULL` if the column does not exist:
```{r, error = TRUE}
df <- data.frame(abc = 1)
df$a
df2 <- tibble(abc = 1)
df2$a
```
Although `[` returns a tibble by default, tibbles respect the `drop` argument if it is provided:
```{r}
data.frame(a = 1:3)[, "a", drop = TRUE]
tibble(a = 1:3)[, "a", drop = TRUE]
```
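One practical consequence of the `[`, `[[`, and `drop` behaviour shown above is that code meant to work on both data frames and tibbles can state its intent explicitly. A minimal sketch, where the helper names `col_as_vector()` and `col_as_frame()` are hypothetical and not part of tibble:

```{r}
library(tibble)

# Hypothetical helpers: the names are illustrative, not part of tibble.
col_as_vector <- function(data, col) data[[col]]
col_as_frame <- function(data, col) data[, col, drop = FALSE]

base_df <- data.frame(x = 1:3, y = 3:1)
tbl_df <- tibble(x = 1:3, y = 3:1)

# `[[` gives a vector for both classes.
class(col_as_vector(base_df, "x"))
class(col_as_vector(tbl_df, "x"))

# `drop = FALSE` keeps the rectangular class for both.
class(col_as_frame(base_df, "x"))
class(col_as_frame(tbl_df, "x"))
```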
Tibbles do not support row names.
They are removed when converting to a tibble or when subsetting:
```{r}
df <- data.frame(a = 1:3, row.names = letters[1:3])
rownames(df)
rownames(as_tibble(df))
tbl <- tibble(a = 1:3)
rownames(tbl) <- letters[1:3]
rownames(tbl)
rownames(tbl[1, ])
```
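If the row names carry information, they can be moved into a regular column before they are dropped. The sketch below uses the `rownames` argument of `as_tibble()` and the `rownames_to_column()` helper. The column name `"id"` and the object names are arbitrary choices for this example:

```{r}
library(tibble)

df_named <- data.frame(a = 1:3, row.names = letters[1:3])

# Convert to a tibble and keep the row names as a column called "id".
kept <- as_tibble(df_named, rownames = "id")
kept

# Alternatively, add the column first and convert later if needed.
also_kept <- rownames_to_column(df_named, var = "id")
also_kept$id
```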
See `vignette("invariants")` for a detailed comparison between tibbles and data frames.
### Recycling
When constructing a tibble, only values of length 1 are recycled.
The first column with a length other than one determines the number of rows in the tibble; conflicts lead to an error:
```{r, error = TRUE}
tibble(a = 1, b = 1:3)
tibble(a = 1:3, b = 1)
tibble(a = 1:3, c = 1:2)
```
This also extends to tibbles with *zero* rows, which is sometimes important for programming:
```{r}
tibble(a = 1, b = integer())
tibble(a = integer(), b = 1)
```
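The zero-row case matters when a function builds a tibble from input that may be empty. A minimal sketch, assuming a hypothetical helper `tag_ids()` that attaches a constant label to every id:

```{r}
library(tibble)

# Hypothetical helper: tag every id with a constant label.
tag_ids <- function(ids, label = "manual") {
  tibble(id = ids, source = label)
}

# The length-1 label is recycled to three rows.
tag_ids(c(10L, 20L, 30L))

# With empty input, the label is recycled to zero rows instead of forcing one row.
tag_ids(integer())
```

Because the result always has the same columns, callers do not need a special case for empty input.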
### Arithmetic operations
Unlike data frames, tibbles don't officially support arithmetic operations on all columns at once.
Such an operation currently runs, but the result is silently coerced to a data frame.
Do not rely on this behavior; it may become an error in a future version.
```{r}
tbl <- tibble(a = 1:3, b = 4:6)
tbl * 2
```
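One way to avoid relying on this coercion is to apply the operation column by column and rebuild the tibble. This is a sketch of one possible approach using base `lapply()`, not a recommendation from the package. The names `scores` and `doubled` are illustrative:

```{r}
library(tibble)

scores <- tibble(a = 1:3, b = 4:6)

# Apply the arithmetic to each column, then convert the named list back to a tibble.
doubled <- as_tibble(lapply(scores, function(col) col * 2))
doubled
class(doubled)
```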