_R/2017-11-28-secret-santa-graph-traversal.Rmd

---
title: Secret Santa is a graph traversal problem
excerpt: "Making a graph, connecting nodes twice"
tags:
  - r
  - graphs
  - yuletidyverse
  - dplyr
share: true
header:
  overlay_image: "assets/images/caleb-woods-1280.jpg"
  caption: "Photo credit: [**Caleb Woods**](https://unsplash.com/photos/xxmszPRm_ck)"
---

Last week at Thanksgiving, my family drew names from a hat for our annual game
of Secret Santa. Actually, it wasn't a hat but you know what I mean. (Now that I
think about it, I don't think I've ever seen names drawn from a literal hat
before!) In our family, the rules of Secret Santa are pretty simple:

* The players' names are put in "a hat". 
* Players randomly draw a name from a hat, become that person's Secret Santa,
  and get them a gift.
* If a player draws their own name, they draw again.

Once again this year, somebody asked if we could just use an app or a website to
handle the drawing for Secret Santa. _Or I could write a script to do it_ I
thought to myself. The problem nagged at the back of my mind for the past few
days. _You could just shuffle the names... no, no, no. It's trickier than that._

In this post, I describe a couple of algorithms for Secret Santa sampling using
R and directed graphs. I use the
[DiagrammeR package](https://github.com/rich-iannone/DiagrammeR) which creates 
graphs from dataframes of nodes and edges, and I liberally use 
[dplyr verbs](http://r4ds.had.co.nz/transform.html) to manipulate tables of 
edges.

If you would like a more practical way to use R for Secret Santa, including
automating the process of drawing names and _emailing players_, see 
[this blog post](http://livefreeordichotomize.com/2017/11/15/secret-sampling/).


## Making a graph, connecting nodes twice

Let's start with a subset of my family's game of just five players. I assign
each name a unique ID number.

```{r}
library(DiagrammeR)
library(magrittr)
library(dplyr, warn.conflicts = FALSE)

players <- tibble::tribble(
      ~ Name, ~ Number,
    "Jeremy",        1,
        "TJ",        2, 
  "Jonathan",        3, 
      "Alex",        4, 
   "Marissa",        5
)
```

```{r, include = FALSE}
render_graph <- function(graph, height = 300, ...) {
  g <- DiagrammeR::render_graph(graph, height = height, ...)
  
  svg_dim <- sprintf("svg width=\"%s\" height=\"%s\"", "100%", height)
  
  DiagrammeRsvg::export_svg(g) %>% 
    stringr::str_replace("^<[?]xml.+\n", "") %>%
    stringr::str_replace("^<[!]DOCTYPE.+\n.+\n", "") %>%
    stringr::str_replace("svg width=\"\\d+pt\" height=\"\\d+pt\"", 
                         svg_dim) %>%
    htmltools::HTML() %>%
    knitr::asis_output()
}

if (interactive()) render_graph <- DiagrammeR::render_graph
```

We can think of the players as nodes in a directed graph. An edge connecting two
players indicates a "gives-to" (Secret Santa) relationship. Suppose I drew
Marissa's name. Then the graph will have an edge connecting me to her. In the
code below, I use DiagrammeR to create a graph by combining a node dataframe
(`create_node_df()`) and an edge dataframe (`create_edge_df()`).

```{r}
nodes <- create_node_df(
  n = nrow(players),
  type = players$Name,
  label = players$Name
)

tj_drew_marissa <- create_edge_df(
  from = 2, 
  to = 5, 
  rel = "gives-to",
  color = "#FF4136",
  penwidth = 1
)

create_graph(nodes, tj_drew_marissa) %>% 
  render_graph()
```


Before the game starts, anyone could draw anyone else's name, so let's visualize
all possible gives-to relations. We can do this by using `combn(n, 2)` to
generate all n-choose-2 pairs and creating two edges for each pair.

```{r}
combn(players$Name, 2)

# All the edge-manipulating functions in this post take an optional take `...`
# argument for setting the style of edges.
create_all_giving_edges <- function(xs, ...) {
  aes_options <- quos(...)
  pairs <- combn(seq_along(xs), 2)
  # Each column from `combn()` is a pair. We make an edge moving down the column
  # and another edge up the column by having each row as a `from` index and a
  # `to` index.
  from <- c(pairs[1, ], pairs[2, ])
  to <- c(pairs[2, ], pairs[1, ])

  create_edge_df(from = from, to = to) %>% 
    mutate(!!! aes_options) %>% 
    as_tibble()
}

all_possible_edges <- create_all_giving_edges(
  players$Name, 
  rel = "potential-gift", 
  penwidth = .5,
  color = "#CCCCCC90"
)

create_graph(nodes, all_possible_edges) %>% 
  render_graph()
```


### A fast, simple solution is a Hamiltonian path

Do you need to organize a gift-giving drawing for a group of people? The easiest
solution is to shuffle the names and have the first name give to the second name,
the second to the third, and so on with last name giving looping back around to
the first name. This solution is equivalent to walking through the graph and
visiting every node just once. Such a path is called a 
[Hamiltonian path](https://en.wikipedia.org/wiki/Hamiltonian_path). 

Here we find a Hamiltonian path and create a helper function `overwrite_edges()`
to update the edges that fall on the path.

```{r hamiltonian edges}
overwrite_edges <- function(old_df, new_df) {
  old_df %>%
    anti_join(new_df, by = c("from", "to")) %>%
    bind_rows(new_df)
}

create_hamiltonian_gift_edges <- function(xs, ...) {
  loop_from <- sample(seq_along(xs))
  # last name gives to first
  loop_to <- c(loop_from[-1], loop_from[1])
  create_edge_df(from = loop_from, to = loop_to, ...)
}

# For reproducible blogging
set.seed(11282017)

hamiltonian_edges <- create_hamiltonian_gift_edges(
  players$Name,
  rel = "gives-to",
  color = "#FF4136",
  penwidth = 1
)

all_possible_edges %>% 
  overwrite_edges(hamiltonian_edges) %>% 
  create_graph(nodes, .) %>% 
  render_graph()
```

As promised, the red paths loop through all nodes exactly once. No one
is their own gift giver `r emo::ji("white_check_mark")`, and everyone
has an incoming red path `r emo::ji("white_check_mark")` and an outgoing
red path `r emo::ji("white_check_mark")`. Very nice. *Actually*... let's
put that checklist into a validation function.

```{r}
# Check for valid gift-giving edges
has_valid_gift_edges <- function(edge_df, indices) {
  indices <- sort(unique(indices))
  pairs <- edge_df %>% filter(rel == "gives-to")
  no_self_loop <- !any(pairs$from == pairs$to)
  exhaustive_from <- isTRUE(all.equal(sort(pairs$from), indices))
  exhaustive_to <- isTRUE(all.equal(sort(pairs$to), indices))
  all(no_self_loop, exhaustive_from, exhaustive_to)
}

has_valid_gift_edges(hamiltonian_edges, all_possible_edges$from)
```

Despite its elegance, this solution does not simulate drawing names from a
hat! Because each node is visited only once, there is no backtracking, so there
there is no reciprocal gift-giving or other sub-circuits in the graph. 

Whether you think this is a bad thing is a matter of preference. Personally, I
would like all remaining pairs to be equally probable at each step of the
drawing. This is not the case when backtracking is not allowed. (For example, if
I draw Marissa, then all of the remaining edges are not equally likely because
P(Marissa draws TJ | TJ draws Marissa) = 0.)


## Okay, do the hat-drawing thing already

Let's think about what happens when I draw Marissa's name from a nice big red
Santa hat.

* The edge from TJ to Marissa is fixed. (I drew her name.)
* All other edges from TJ become illegal. (I can't draw any more names.)
* All other edges onto Marissa become illegal. (No one else can draw her name 
  either.)

To simulate a single hat-drawing, we randomly select a legal edge, fix it, and
delete all illegal edges. Let's work through a couple of examples. 

First, we need some helper functions.

```{r}
draw_secret_santa_edge <- function(edge_df, ...) {
  aes_options <- quos(...)
  
  edge_df %>%
    filter(rel != "gives-to") %>%
    sample_n(1) %>%
    mutate(!!! aes_options)
}

find_illegal_edges <- function(edge_df, edge, ...) {
  aes_options <- quos(...)
  
  outgoing <- edge_df %>%
    filter(from %in% edge$from)

  incoming <- edge_df %>%
    filter(to %in% edge$to)

  # The one edge that is not illegal is in both 
  # the incoming and outgoing sets
  to_keep <- dplyr::intersect(outgoing, incoming)

  outgoing %>% 
    bind_rows(incoming) %>% 
    anti_join(to_keep, by = c("from", "to")) %>% 
    mutate(!!! aes_options)
}
```

Here we draw a single edge (red with fat arrow). All of the other edges that
point to the same node are illegal (navy) as are all of the edges that have the
same origin as the drawn edge.

```{r}
current_pick <- draw_secret_santa_edge(
  all_possible_edges,
  rel = "gives-to", 
  color = "#FF4136", 
  penwidth = 1, 
  arrowsize = 1
)

current_illegal_edges <- all_possible_edges %>%
  find_illegal_edges(
    current_pick, 
    color = "#001f3f", 
    penwidth = .5
  )

all_possible_edges %>%
  overwrite_edges(current_pick) %>% 
  overwrite_edges(current_illegal_edges) %>% 
  create_graph(nodes, .) %>% 
  render_graph(title = "Selected vs. illegal")
```

We delete those illegal edges and leaving us with the following graph.

```{r}
edges_after_pick1 <- all_possible_edges %>%
  overwrite_edges(current_pick %>% mutate(arrowsize = NULL)) %>% 
  anti_join(current_illegal_edges, by = "id") 

create_graph(nodes, edges_after_pick1) %>% 
  render_graph(title = "After one draw")
```

The name has been removed from the hat, and the graph is simpler now!

Let's do it again. Draw a random legal edge (fat arrow) and identify all the 
illegal paths (navy).

```{r}
current_pick <- edges_after_pick1 %>% 
  draw_secret_santa_edge(
    rel = "gives-to", 
    color = "#FF4136", 
    penwidth = 1,
    arrowsize = 1
  )

current_illegal_edges <- edges_after_pick1 %>%
  find_illegal_edges(
    edge = current_pick, 
    color = "#001f3f", 
    penwidth = .5
  )

edges_after_pick1 %>%
  overwrite_edges(current_pick) %>% 
  overwrite_edges(current_illegal_edges) %>% 
  create_graph(nodes, .) %>% 
  render_graph(title = "Selected vs. illegal")
```

After deleting illegal edges, the problem simplifies further.

```{r}
edges_after_pick2 <- edges_after_pick1 %>%
  overwrite_edges(current_pick %>% mutate(arrowsize = NULL)) %>% 
  anti_join(current_illegal_edges, by = "id") 

create_graph(nodes, edges_after_pick2) %>% 
  render_graph(title = "After two draws")
```

You can tell where this is going... Loop Town! 

To finish up, we are going to repeat this process until there are only
gift-giving edges left. We will control the loop with this helper function 
which tells us if there are any free edges remaining.

```{r}
has_free_edge <- function(edge_df) {
  edges_left <- edge_df %>% filter(rel != "gives-to") %>% nrow()
  edges_left != 0
}
```

In the function below, the while-loop does the same steps as above: Randomly
selecting a free edge and removing illegal edges. 

```{r}
draw_edges_from_hat <- function(edge_df, ...) {
  aes_options <- quos(...)
  raw_edge_df <- edge_df
  indices <- unique(c(raw_edge_df$from, raw_edge_df$to))
  
  while (has_free_edge(edge_df)) {
    pick <- edge_df %>% 
      draw_secret_santa_edge(!!! aes_options) %>% 
      mutate(rel = "gives-to")
    
    illegal_edges <- edge_df %>%
      find_illegal_edges(pick)

    edge_df <- edge_df %>%
      overwrite_edges(pick) %>% 
      anti_join(illegal_edges, by = "id") 
  }

  if (!has_valid_gift_edges(edge_df, indices)) {
    warning(call. = FALSE, "Invalid drawing. Trying again.")
    edge_df <- Recall(raw_edge_df, !!! aes_options)
  }
  
  edge_df
}

edges_after_pick2 %>% 
  draw_edges_from_hat(color = "#FF4136", penwidth = 1) %>% 
  create_graph(nodes, .) %>% 
  render_graph(height = 400, title = "Final graph")
```

After the while-loop, the function checks if it has a valid set of gift edges,
and if it doesn't, the function calls itself again. This bit is intended to 
handle more constrained situations. Such as...


## The nibling gift exchange

I lied above. Secret Santa is not so simple in family. For my generation
(with me, my siblings and our partners), there's a rule that a player
can't draw their partner's name. Similarly, my nieces and nephews (and
now also my child `r emo::ji("sparkling_heart")`) have their own gift
exchange with an added constraint: A player can't give their sibling a
gift. The elegant and simple Hamiltonian solution fails under these
constraints unless you write a special shuffling algorithm. Our
hat-drawing approach, however, handles this situation with minimal
effort. Let's work through an example with my nieces and nephews (and
`r emo::ji("baby")`!). To protect the very young, I have replaced their
names with Pokémon names.

Below we define the children and their families and do some
data-wrangling so that we have columns with the family at the start and
end of each edge.

```{r}
niblings <- tibble::tribble(
  ~ Family, ~ Name, ~ Number,
  "Water", "Squirtle", 1,
  "Water", "Wartortle", 2,
  "Electric", "Pikachu", 3, 
  "Plant", "Bulbasaur", 4, 
  "Plant", "Ivysaur", 5, 
  "Plant", "Venusaur", 6, 
  "Fighting", "Machamp", 7, 
  "Fighting", "Machoke", 8, 
  "Normal", "Ratata", 9,
  "Normal", "Raticate", 10,
  "Psychic", "Mew", 11,
  "Psychic", "Mewtwo", 12
)

nibling_edges <- niblings$Name %>% 
  create_all_giving_edges(
    rel = "potential-gift", 
    penwidth = .5,
    color = "#CCCCCC90"
  ) %>% 
  left_join(niblings, by = c("from" = "Number")) %>% 
  rename(from_fam = Family) %>% 
  select(-Name) %>% 
  left_join(niblings, by = c("to" = "Number")) %>% 
  rename(to_fam = Family) %>% 
  select(-Name) %>% 
  select(id, from, to, rel, from_fam, to_fam, everything())

nibling_edges
```

In the graph below, we draw an olive edge to connect edge pair of siblings.
These edges are illegal before we even start drawing names.

```{r}
sibling_edges <- nibling_edges %>% 
  filter(from_fam == to_fam) %>% 
  mutate(
    rel = "sibling",
    color = "#3D9970", 
    penwidth = 1)

# Update edges that represent siblings
nibling_edges <- nibling_edges %>% 
  overwrite_edges(sibling_edges)
  
nibling_nodes <- create_node_df(
  n = nrow(niblings),
  type = niblings$Name,
  label = niblings$Name)

nibling_edges %>% 
  overwrite_edges(sibling_edges) %>% 
  create_graph(nibling_nodes, .) %>% 
  render_graph(height = 400)
```

The solution for this trickier, more constrained version of the problem is to
delete the illegal edges beforehand so that they can never be drawn from the
hat. After that, the same algorithm works as before.

```{r}
nibling_edges %>% 
  filter(rel != "sibling") %>% 
  draw_edges_from_hat(color = "#FF4136") %>% 
  create_graph(nibling_nodes, .) %>% 
  render_graph(height = 500)
```

*** 

Like usual, I'm not sure how to close one of these blog posts. I guess: When a
problem involves relations among individuals, always consider whether there is a
graph hiding in the background. Even the simple process of drawing names from a
hat for Secret Santa describes a graph: a graph of gift-giving relations among
individuals. This graph wasn't obvious to me until I thought about Hamilitonian 
path solution. _Hey, wait a minute, that's a big gift-giving circle! It's like 
some kind of network... ooooooh._ 


```{r, include = FALSE}
.parent_doc <- knitr::current_input()
```
```{r, child = "_R/_footer.Rmd"}
```