---
title: "FAQ: Annotation"
---
```{=html}
<style>
.content h3 {
margin-top: -30px !important;
}
details {
margin-bottom: 40px;
}
</style>
```
```{r, include = FALSE}
library(ggplot2)
library(dplyr)
knitr::opts_chunk$set(
fig.dpi = 300,
collapse = TRUE,
comment = "#>",
fig.asp = 0.618,
fig.width = 6,
out.width = "80%"
)
```
### Why is annotation created with `geom_text()` pixellated? How can I make it more crisp?
You should use `annotate(geom = "text")` instead of `geom_text()` for annotation.
<details>
<summary>See example</summary>
In the following visualisation we have annotated a histogram with a red line and red text to mark the mean. Note that both the line and the text appear pixellated/fuzzy.
```{r}
#| fig.alt: "Histogram of highway miles per gallon for 234 cars. A red line is
#| placed at the position 23.44 and is adorned with the label 'mean 23.44'.
#| Both the line and the text appear pixellated due to overplotting."
mean_hwy <- round(mean(mpg$hwy), 2)
ggplot(mpg, aes(x = hwy)) +
geom_histogram(binwidth = 2) +
geom_segment(
x = mean_hwy, xend = mean_hwy,
y = 0, yend = 35,
color = "red"
) +
geom_text(
x = mean_hwy, y = 40,
label = paste("mean\n", mean_hwy),
color = "red"
)
```
This is because `geom_text()` draws the geom once for each row of the data frame, plotting the copies on top of each other. For annotation (as opposed to plotting the data using text as geometric objects to represent each observation), use `annotate()` instead.
```{r}
#| fig.alt: "Histogram of highway miles per gallon for 234 cars. A red line is
#| placed at the position 23.44 and is adorned with the label 'mean = 23.44'.
#| Both the line and the text appear crisp."
ggplot(mpg, aes(x = hwy)) +
geom_histogram(binwidth = 2) +
annotate("segment",
x = mean_hwy, xend = mean_hwy, y = 0, yend = 35,
color = "red"
) +
annotate("text",
x = mean_hwy, y = 40,
label = paste("mean =", mean_hwy),
color = "red"
)
```
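One way to check this explanation is to count the rows each text layer ends up drawing, sketched here with `layer_data()`. The object names `p_text` and `p_annotate` are ours, and the segment layer is left out for brevity. The `geom_text()` layer should contain one row per observation in `mpg`, while the `annotate()` layer should contain a single row.

```{r}
library(ggplot2)

mean_hwy <- round(mean(mpg$hwy), 2)

# Text added with geom_text(): the layer inherits the full mpg data
p_text <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  geom_text(
    x = mean_hwy, y = 40,
    label = paste("mean =", mean_hwy),
    color = "red"
  )

# Text added with annotate(): the layer carries its own data
p_annotate <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  annotate("text",
    x = mean_hwy, y = 40,
    label = paste("mean =", mean_hwy),
    color = "red"
  )

# Number of labels drawn by the text layer (layer 2) of each plot
nrow(layer_data(p_text, i = 2))
nrow(layer_data(p_annotate, i = 2))
```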
</details>
### How can I make sure all annotation created with `geom_text()` fits in the bounds of the plot?
Set `vjust = "inward"` and `hjust = "inward"` in `geom_text()`.
<details>
<summary>See example</summary>
Suppose you have the following data frame and visualization. The labels at the edges of the plot are cut off slightly.
```{r}
#| fig.alt: "A plot showing the words 'two', 'three' and 'four' arranged
#| diagonally. The 'two' and 'four' labels have been clipped to the panel's
#| edge and are not displayed completely."
df <- tibble::tribble(
~x, ~y, ~name,
2, 2, "two",
3, 3, "three",
4, 4, "four"
)
ggplot(df, aes(x = x, y = y, label = name)) +
geom_text(size = 10)
```
You could manually extend axis limits to avoid this, but a more straightforward approach is to set `vjust = "inward"` and `hjust = "inward"` in `geom_text()`.
```{r}
#| fig.alt: "A plot showing the words 'two', 'three' and 'four' arranged
#| diagonally. The 'two' and 'four' labels are aligned to the top-right and
#| bottom-left relative to their anchor points, and are displayed in their
#| entirety."
ggplot(df, aes(x = x, y = y, label = name)) +
geom_text(size = 10, vjust = "inward", hjust = "inward")
```
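For comparison, the manual approach of extending the axis limits could look like the following sketch. The limits `1.5` and `4.5` are arbitrary values picked for this example, and the object name `p_wide` is ours; other label sizes would likely need different values, which is why the `"inward"` justification is usually less work.

```{r}
#| fig.alt: "A plot showing the words 'two', 'three' and 'four' arranged
#|  diagonally. The axes run from about 1.5 to 4.5, leaving enough room for
#|  all three labels to be displayed in their entirety."
library(ggplot2)

# Same data as above, rebuilt with base R so the chunk stands alone
df <- data.frame(
  x = c(2, 3, 4),
  y = c(2, 3, 4),
  name = c("two", "three", "four")
)

# expand_limits() makes both scales include the given values
# (1.5 and 4.5 are hand-picked for this example)
p_wide <- ggplot(df, aes(x = x, y = y, label = name)) +
  geom_text(size = 10) +
  expand_limits(x = c(1.5, 4.5), y = c(1.5, 4.5))

p_wide
```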
</details>
### How can I annotate my bar plot to display counts for each bar?
Either calculate the counts ahead of time and place them on bars using `geom_text()` or let `ggplot()` calculate them for you and then add them to the plot using `stat_count()` with `geom = "text"`.
<details>
<summary>See example</summary>
Suppose you have the following bar plot and you want to add the number of cars that fall into each `drv` level on their respective bars.
```{r}
#| fig.alt: "A bar chart showing the number of cars for each of three types
#| of drive train."
ggplot(mpg, aes(x = drv)) +
geom_bar()
```
One option is to calculate the counts with `dplyr::count()` and then pass them to the `label` mapping in `geom_text()`.
Note that we expanded the y axis limit to get the numbers to fit on the plot.
```{r}
#| fig.alt: "A bar chart showing the number of cars for each of three types
#| of drive train. The count values are displayed on top of the bars as text."
mpg %>%
dplyr::count(drv) %>%
ggplot(aes(x = drv, y = n)) +
geom_col() +
geom_text(aes(label = n), vjust = -0.5) +
coord_cartesian(ylim = c(0, 110))
```
Another option is to let `ggplot()` do the counting for you with `stat_count()`, and map the computed counts to the `label` aesthetic with `after_stat(count)`.
```{r}
#| fig.alt: "A bar chart showing the number of cars for each of three types
#| of drive train. The count values are displayed on top of the bars as text."
ggplot(mpg, aes(x = drv)) +
geom_bar() +
stat_count(geom = "text", aes(label = after_stat(count)), vjust = -0.5) +
coord_cartesian(ylim = c(0, 110))
```
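Both examples hard-code the upper y limit at 110. As a variation that is not part of the examples above, the following sketch asks the y scale for extra headroom with `expansion()`, so the limit does not have to be updated when the data change. The 10% multiplier is an assumption that happens to suit this plot, and the object name `p_counts` is ours.

```{r}
#| fig.alt: "A bar chart showing the number of cars for each of three types
#|  of drive train. The count values are displayed on top of the bars as text,
#|  with some empty space above the tallest bar."
library(ggplot2)

p_counts <- ggplot(mpg, aes(x = drv)) +
  geom_bar() +
  stat_count(geom = "text", aes(label = after_stat(count)), vjust = -0.5) +
  # No padding below the bars, 10% padding above them (an assumed value)
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

p_counts
```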
</details>
### How can I annotate my stacked bar plot to display counts for each segment?
First calculate the counts for each segment (e.g. with `dplyr::count()`) and then place them on the bars with `geom_text()` using `position_stack(vjust = 0.5)` in the `position` argument to place the values in the middle of the segments.
<details>
<summary>See example</summary>
Suppose you have the following stacked bar plot.
```{r}
#| fig.alt: "A stacked bar chart showing the number of cars for each of seven
#| types of cars. The fill colour of the bars indicates the type of drive
#| train."
ggplot(mpg, aes(x = class, fill = drv)) +
geom_bar()
```
You can first calculate the counts for each segment with `dplyr::count()`, which will place these values in a column called `n`.
```{r}
mpg %>%
count(class, drv)
```
You can then pass this result directly to `ggplot()`, set `y = n` in the `aes`thetic mapping so that `geom_col()` draws the segments with the appropriate heights, and finally place the counts on the plot with `geom_text()`.
```{r}
#| fig.alt: "A stacked bar chart showing the number of cars for each of seven
#| types of cars. The fill colour of the bars indicates the type of drive
#| train. In the middle of each filled part, the count value is displayed as
#| text."
mpg %>%
count(class, drv) %>%
ggplot(aes(x = class, fill = drv, y = n)) +
geom_col() +
geom_text(aes(label = n), size = 3, position = position_stack(vjust = 0.5))
```
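If you would rather not precompute the counts, a possible alternative (a sketch, not part of the answer above) is to give `geom_text()` the `"count"` stat and map the label to `after_stat(count)`. Because the text layer inherits `fill = drv`, the counts should be computed per segment, and `position_stack(vjust = 0.5)` centres them as before. The object name `p_stacked` is ours.

```{r}
#| fig.alt: "A stacked bar chart showing the number of cars for each of seven
#|  types of cars. The fill colour of the bars indicates the type of drive
#|  train. In the middle of each filled part, the count value is displayed as
#|  text."
library(ggplot2)

p_stacked <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar() +
  # stat = "count" tallies rows per class and drv combination
  geom_text(
    aes(label = after_stat(count)),
    stat = "count",
    size = 3,
    position = position_stack(vjust = 0.5)
  )

p_stacked
```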
</details>
### How can I display proportions (relative frequencies) instead of counts on a bar plot?
Either calculate the proportions ahead of time and place them on bars using `geom_text()` or let `ggplot()` calculate them for you and then add them to the plot using `stat_count()` with `geom = "text"`.
<details>
<summary>See example</summary>
Suppose you have the following bar plot but you want to display the proportion of cars that fall into each `drv` level, instead of the count.
```{r}
#| fig.alt: "A bar chart showing the number of cars for each of three types
#| of drive train."
ggplot(mpg, aes(x = drv)) +
geom_bar()
```
One option is to calculate the proportions with `dplyr::count()` and `mutate()`, and then use `geom_col()` to draw the bars.
```{r}
#| fig.alt: "A bar chart showing the proportion of cars for each of three types
#| of drive train."
mpg %>%
dplyr::count(drv) %>%
mutate(prop = n / sum(n)) %>%
ggplot(aes(x = drv, y = prop)) +
geom_col()
```
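To print the precomputed proportions on the bars as well, a `geom_text()` layer can be added in the same way as for counts. This is a minimal sketch: the rounding to two decimals, the upper limit of `0.5`, and the object names `drv_props` and `p_props` are our choices for this example.

```{r}
#| fig.alt: "A bar chart showing the proportion of cars for each of three types
#|  of drive train. The proportions are displayed on top of the bars as text."
library(ggplot2)
library(dplyr)

# One row per drive train, with its share of all cars
drv_props <- mpg %>%
  count(drv) %>%
  mutate(prop = n / sum(n))

p_props <- ggplot(drv_props, aes(x = drv, y = prop)) +
  geom_col() +
  geom_text(aes(label = round(prop, 2)), vjust = -0.5) +
  # Hand-picked upper limit so the labels fit above the bars
  coord_cartesian(ylim = c(0, 0.5))

p_props
```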
Another option is to let `ggplot()` do the calculation of proportions for you, and access these proportions with `after_stat(prop)`.
Note that we also need to add the `group = 1` mapping for this option.
```{r}
#| fig.alt: "A bar chart showing the proportion of cars for each of three types
#| of drive train."
ggplot(mpg, aes(x = drv, y = after_stat(prop), group = 1)) +
geom_bar() 
```
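The computed proportions can also be placed on the bars with `stat_count()` and `geom = "text"`, as the answer above suggests. The following is a sketch under a few assumptions of ours: the labels are rounded to two decimals, the upper limit of `0.5` is hand-picked, and the plot is stored as `p_prop_stat`. It uses `after_stat(prop)`, the current spelling of the older `..prop..` notation.

```{r}
#| fig.alt: "A bar chart showing the proportion of cars for each of three types
#|  of drive train. The proportions are displayed on top of the bars as text."
library(ggplot2)

p_prop_stat <- ggplot(mpg, aes(x = drv, y = after_stat(prop), group = 1)) +
  geom_bar() +
  # The text layer inherits x, y and group, and only adds the label
  stat_count(
    geom = "text",
    aes(label = after_stat(round(prop, 2))),
    vjust = -0.5
  ) +
  coord_cartesian(ylim = c(0, 0.5))

p_prop_stat
```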
</details>