forked from mmistakes/minimal-mistakes
/
2017-11-28-secret-santa-graph-traversal.Rmd
499 lines (398 loc) · 14.8 KB
/
2017-11-28-secret-santa-graph-traversal.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
---
title: Secret Santa is a graph traversal problem
excerpt: "Making a graph, connecting nodes twice"
tags:
- r
- graphs
- yuletidyverse
- dplyr
share: true
header:
overlay_image: "assets/images/caleb-woods-1280.jpg"
caption: "Photo credit: [**Caleb Woods**](https://unsplash.com/photos/xxmszPRm_ck)"
---
Last week at Thanksgiving, my family drew names from a hat for our annual game
of Secret Santa. Actually, it wasn't a hat but you know what I mean. (Now that I
think about it, I don't think I've ever seen names drawn from a literal hat
before!) In our family, the rules of Secret Santa are pretty simple:
* The players' names are put in "a hat".
* Players randomly draw a name from a hat, become that person's Secret Santa,
and get them a gift.
* If a player draws their own name, they draw again.
Once again this year, somebody asked if we could just use an app or a website to
handle the drawing for Secret Santa. _Or I could write a script to do it_ I
thought to myself. The problem nagged at the back of my mind for the past few
days. _You could just shuffle the names... no, no, no. It's trickier than that._
In this post, I describe a couple of algorithms for Secret Santa sampling using
R and directed graphs. I use the
[DiagrammeR package](https://github.com/rich-iannone/DiagrammeR) which creates
graphs from dataframes of nodes and edges, and I liberally use
[dplyr verbs](http://r4ds.had.co.nz/transform.html) to manipulate tables of
edges.
If you would like a more practical way to use R for Secret Santa, including
automating the process of drawing names and _emailing players_, see
[this blog post](http://livefreeordichotomize.com/2017/11/15/secret-sampling/).
## Making a graph, connecting nodes twice
Let's start with a subset of my family's game of just five players. I assign
each name a unique ID number.
```{r}
library(DiagrammeR)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)
players <- tibble::tribble(
~ Name, ~ Number,
"Jeremy", 1,
"TJ", 2,
"Jonathan", 3,
"Alex", 4,
"Marissa", 5
)
```
```{r, include = FALSE}
render_graph <- function(graph, height = 300, ...) {
g <- DiagrammeR::render_graph(graph, height = height, ...)
svg_dim <- sprintf("svg width=\"%s\" height=\"%s\"", "100%", height)
DiagrammeRsvg::export_svg(g) %>%
stringr::str_replace("^<[?]xml.+\n", "") %>%
stringr::str_replace("^<[!]DOCTYPE.+\n.+\n", "") %>%
stringr::str_replace("svg width=\"\\d+pt\" height=\"\\d+pt\"",
svg_dim) %>%
htmltools::HTML() %>%
knitr::asis_output()
}
if (interactive()) render_graph <- DiagrammeR::render_graph
```
We can think of the players as nodes in a directed graph. An edge connecting two
players indicates a "gives-to" (Secret Santa) relationship. Suppose I drew
Marissa's name. Then the graph will have an edge connecting me to her. In the
code below, I use DiagrammeR to create a graph by combining a node dataframe
(`create_node_df()`) and an edge dataframe (`create_edge_df()`).
```{r}
nodes <- create_node_df(
n = nrow(players),
type = players$Name,
label = players$Name
)
tj_drew_marissa <- create_edge_df(
from = 2,
to = 5,
rel = "gives-to",
color = "#FF4136",
penwidth = 1
)
create_graph(nodes, tj_drew_marissa) %>%
render_graph()
```
Before the game starts, anyone could draw anyone else's name, so let's visualize
all possible gives-to relations. We can do this by using `combn(n, 2)` to
generate all n-choose-2 pairs and creating two edges for each pair.
```{r}
combn(players$Name, 2)
# All the edge-manipulating functions in this post take an optional take `...`
# argument for setting the style of edges.
create_all_giving_edges <- function(xs, ...) {
aes_options <- quos(...)
pairs <- combn(seq_along(xs), 2)
# Each column from `combn()` is a pair. We make an edge moving down the column
# and another edge up the column by having each row as a `from` index and a
# `to` index.
from <- c(pairs[1, ], pairs[2, ])
to <- c(pairs[2, ], pairs[1, ])
create_edge_df(from = from, to = to) %>%
mutate(!!! aes_options) %>%
as_tibble()
}
all_possible_edges <- create_all_giving_edges(
players$Name,
rel = "potential-gift",
penwidth = .5,
color = "#CCCCCC90"
)
create_graph(nodes, all_possible_edges) %>%
render_graph()
```
### A fast, simple solution is a Hamiltonian path
Do you need to organize a gift-giving drawing for a group of people? The easiest
solution is to shuffle the names and have the first name give to the second name,
the second to the third, and so on with last name giving looping back around to
the first name. This solution is equivalent to walking through the graph and
visiting every node just once. Such a path is called a
[Hamiltonian path](https://en.wikipedia.org/wiki/Hamiltonian_path).
Here we find a Hamiltonian path and create a helper function `overwrite_edges()`
to update the edges that fall on the path.
```{r hamiltonian edges}
overwrite_edges <- function(old_df, new_df) {
old_df %>%
anti_join(new_df, by = c("from", "to")) %>%
bind_rows(new_df)
}
create_hamiltonian_gift_edges <- function(xs, ...) {
loop_from <- sample(seq_along(xs))
# last name gives to first
loop_to <- c(loop_from[-1], loop_from[1])
create_edge_df(from = loop_from, to = loop_to, ...)
}
# For reproducible blogging
set.seed(11282017)
hamiltonian_edges <- create_hamiltonian_gift_edges(
players$Name,
rel = "gives-to",
color = "#FF4136",
penwidth = 1
)
all_possible_edges %>%
overwrite_edges(hamiltonian_edges) %>%
create_graph(nodes, .) %>%
render_graph()
```
As promised, the red paths loop through all nodes exactly once. No one
is their own gift giver `r emo::ji("white_check_mark")`, and everyone
has an incoming red path `r emo::ji("white_check_mark")` and an outgoing
red path `r emo::ji("white_check_mark")`. Very nice. *Actually*... let's
put that checklist into a validation function.
```{r}
# Check for valid gift-giving edges
has_valid_gift_edges <- function(edge_df, indices) {
indices <- sort(unique(indices))
pairs <- edge_df %>% filter(rel == "gives-to")
no_self_loop <- !any(pairs$from == pairs$to)
exhaustive_from <- isTRUE(all.equal(sort(pairs$from), indices))
exhaustive_to <- isTRUE(all.equal(sort(pairs$to), indices))
all(no_self_loop, exhaustive_from, exhaustive_to)
}
has_valid_gift_edges(hamiltonian_edges, all_possible_edges$from)
```
Despite its elegance, this solution does not simulate drawing names from a
hat! Because each node is visited only once, there is no backtracking, so there
there is no reciprocal gift-giving or other sub-circuits in the graph.
Whether you think this is a bad thing is a matter of preference. Personally, I
would like all remaining pairs to be equally probable at each step of the
drawing. This is not the case when backtracking is not allowed. (For example, if
I draw Marissa, then all of the remaining edges are not equally likely because
P(Marissa draws TJ | TJ draws Marissa) = 0.)
## Okay, do the hat-drawing thing already
Let's think about what happens when I draw Marissa's name from a nice big red
Santa hat.
* The edge from TJ to Marissa is fixed. (I drew her name.)
* All other edges from TJ become illegal. (I can't draw any more names.)
* All other edges onto Marissa become illegal. (No one else can draw her name
either.)
To simulate a single hat-drawing, we randomly select a legal edge, fix it, and
delete all illegal edges. Let's work through a couple of examples.
First, we need some helper functions.
```{r}
draw_secret_santa_edge <- function(edge_df, ...) {
aes_options <- quos(...)
edge_df %>%
filter(rel != "gives-to") %>%
sample_n(1) %>%
mutate(!!! aes_options)
}
find_illegal_edges <- function(edge_df, edge, ...) {
aes_options <- quos(...)
outgoing <- edge_df %>%
filter(from %in% edge$from)
incoming <- edge_df %>%
filter(to %in% edge$to)
# The one edge that is not illegal is in both
# the incoming and outgoing sets
to_keep <- dplyr::intersect(outgoing, incoming)
outgoing %>%
bind_rows(incoming) %>%
anti_join(to_keep, by = c("from", "to")) %>%
mutate(!!! aes_options)
}
```
Here we draw a single edge (red with fat arrow). All of the other edges that
point to the same node are illegal (navy) as are all of the edges that have the
same origin as the drawn edge.
```{r}
current_pick <- draw_secret_santa_edge(
all_possible_edges,
rel = "gives-to",
color = "#FF4136",
penwidth = 1,
arrowsize = 1
)
current_illegal_edges <- all_possible_edges %>%
find_illegal_edges(
current_pick,
color = "#001f3f",
penwidth = .5
)
all_possible_edges %>%
overwrite_edges(current_pick) %>%
overwrite_edges(current_illegal_edges) %>%
create_graph(nodes, .) %>%
render_graph(title = "Selected vs. illegal")
```
We delete those illegal edges and leaving us with the following graph.
```{r}
edges_after_pick1 <- all_possible_edges %>%
overwrite_edges(current_pick %>% mutate(arrowsize = NULL)) %>%
anti_join(current_illegal_edges, by = "id")
create_graph(nodes, edges_after_pick1) %>%
render_graph(title = "After one draw")
```
The name has been removed from the hat, and the graph is simpler now!
Let's do it again. Draw a random legal edge (fat arrow) and identify all the
illegal paths (navy).
```{r}
current_pick <- edges_after_pick1 %>%
draw_secret_santa_edge(
rel = "gives-to",
color = "#FF4136",
penwidth = 1,
arrowsize = 1
)
current_illegal_edges <- edges_after_pick1 %>%
find_illegal_edges(
edge = current_pick,
color = "#001f3f",
penwidth = .5
)
edges_after_pick1 %>%
overwrite_edges(current_pick) %>%
overwrite_edges(current_illegal_edges) %>%
create_graph(nodes, .) %>%
render_graph(title = "Selected vs. illegal")
```
After deleting illegal edges, the problem simplifies further.
```{r}
edges_after_pick2 <- edges_after_pick1 %>%
overwrite_edges(current_pick %>% mutate(arrowsize = NULL)) %>%
anti_join(current_illegal_edges, by = "id")
create_graph(nodes, edges_after_pick2) %>%
render_graph(title = "After two draws")
```
You can tell where this is going... Loop Town!
To finish up, we are going to repeat this process until there are only
gift-giving edges left. We will control the loop with this helper function
which tells us if there are any free edges remaining.
```{r}
has_free_edge <- function(edge_df) {
edges_left <- edge_df %>% filter(rel != "gives-to") %>% nrow()
edges_left != 0
}
```
In the function below, the while-loop does the same steps as above: Randomly
selecting a free edge and removing illegal edges.
```{r}
draw_edges_from_hat <- function(edge_df, ...) {
aes_options <- quos(...)
raw_edge_df <- edge_df
indices <- unique(c(raw_edge_df$from, raw_edge_df$to))
while (has_free_edge(edge_df)) {
pick <- edge_df %>%
draw_secret_santa_edge(!!! aes_options) %>%
mutate(rel = "gives-to")
illegal_edges <- edge_df %>%
find_illegal_edges(pick)
edge_df <- edge_df %>%
overwrite_edges(pick) %>%
anti_join(illegal_edges, by = "id")
}
if (!has_valid_gift_edges(edge_df, indices)) {
warning(call. = FALSE, "Invalid drawing. Trying again.")
edge_df <- Recall(raw_edge_df, !!! aes_options)
}
edge_df
}
edges_after_pick2 %>%
draw_edges_from_hat(color = "#FF4136", penwidth = 1) %>%
create_graph(nodes, .) %>%
render_graph(height = 400, title = "Final graph")
```
After the while-loop, the function checks if it has a valid set of gift edges,
and if it doesn't, the function calls itself again. This bit is intended to
handle more constrained situations. Such as...
## The nibling gift exchange
I lied above. Secret Santa is not so simple in family. For my generation
(with me, my siblings and our partners), there's a rule that a player
can't draw their partner's name. Similarly, my nieces and nephews (and
now also my child `r emo::ji("sparkling_heart")`) have their own gift
exchange with an added constraint: A player can't give their sibling a
gift. The elegant and simple Hamiltonian solution fails under these
constraints unless you write a special shuffling algorithm. Our
hat-drawing approach, however, handles this situation with minimal
effort. Let's work through an example with my nieces and nephews (and
`r emo::ji("baby")`!). To protect the very young, I have replaced their
names with Pokémon names.
Below we define the children and their families and do some
data-wrangling so that we have columns with the family at the start and
end of each edge.
```{r}
niblings <- tibble::tribble(
~ Family, ~ Name, ~ Number,
"Water", "Squirtle", 1,
"Water", "Wartortle", 2,
"Electric", "Pikachu", 3,
"Plant", "Bulbasaur", 4,
"Plant", "Ivysaur", 5,
"Plant", "Venusaur", 6,
"Fighting", "Machamp", 7,
"Fighting", "Machoke", 8,
"Normal", "Ratata", 9,
"Normal", "Raticate", 10,
"Psychic", "Mew", 11,
"Psychic", "Mewtwo", 12
)
nibling_edges <- niblings$Name %>%
create_all_giving_edges(
rel = "potential-gift",
penwidth = .5,
color = "#CCCCCC90"
) %>%
left_join(niblings, by = c("from" = "Number")) %>%
rename(from_fam = Family) %>%
select(-Name) %>%
left_join(niblings, by = c("to" = "Number")) %>%
rename(to_fam = Family) %>%
select(-Name) %>%
select(id, from, to, rel, from_fam, to_fam, everything())
nibling_edges
```
In the graph below, we draw an olive edge to connect edge pair of siblings.
These edges are illegal before we even start drawing names.
```{r}
sibling_edges <- nibling_edges %>%
filter(from_fam == to_fam) %>%
mutate(
rel = "sibling",
color = "#3D9970",
penwidth = 1)
# Update edges that represent siblings
nibling_edges <- nibling_edges %>%
overwrite_edges(sibling_edges)
nibling_nodes <- create_node_df(
n = nrow(niblings),
type = niblings$Name,
label = niblings$Name)
nibling_edges %>%
overwrite_edges(sibling_edges) %>%
create_graph(nibling_nodes, .) %>%
render_graph(height = 400)
```
The solution for this trickier, more constrained version of the problem is to
delete the illegal edges beforehand so that they can never be drawn from the
hat. After that, the same algorithm works as before.
```{r}
nibling_edges %>%
filter(rel != "sibling") %>%
draw_edges_from_hat(color = "#FF4136") %>%
create_graph(nibling_nodes, .) %>%
render_graph(height = 500)
```
***
Like usual, I'm not sure how to close one of these blog posts. I guess: When a
problem involves relations among individuals, always consider whether there is a
graph hiding in the background. Even the simple process of drawing names from a
hat for Secret Santa describes a graph: a graph of gift-giving relations among
individuals. This graph wasn't obvious to me until I thought about Hamilitonian
path solution. _Hey, wait a minute, that's a big gift-giving circle! It's like
some kind of network... ooooooh._
```{r, include = FALSE}
.parent_doc <- knitr::current_input()
```
```{r, child = "_R/_footer.Rmd"}
```