In [4]:
library(tidyverse)
library(nycflights13)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.2
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   0.8.3     ✔ stringr 1.4.0
✔ readr   1.3.0     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()


# Lecture 18: Abstraction & Debugging

## So, your code does not work.
1. What makes you say your code isn't working?
2. What did you expect your code to do and why?
3. What did your code do instead and how do you know?

## Examples of some types of bugs
Each of the code boxes below features a certain type of bug. We'll see strategies for how to debug each type.

### Syntax error
The easiest to debug: your code won't even parse.

In [3]:
ggplot(mpg) + geom_bar(aes(x = cty, y = hwy)

ERROR: Error in parse(text = x, srcfile = src): <text>:2:0: unexpected end of input
1: ggplot(mpg) + geom_bar(aes(x = cty, y = hwy)
   ^


You can probably look at this example and see immediately where the problem is. But what about the following example?

In [12]:
mustart <- model.extract(mf, "mustart")
etastart <- model.extract(mf, "etastart")
fit <- eval(call(if (is.function(method)) "method" else method, 
    x = X, y = Y, weights = weights, start = start, etastart = etastart, 
    mustart = mustart, offset = offset, family = family, 
    control = control, intercept = attr(mt, "intercept" > 
        0L, singular.ok = singular.ok)
)
if (length(offset) && attr(mt, "intercept") > 0L) {
    fit2 <- eval(call(if (is.function(method)) "method" else method, 
        x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights, 
        offset = offset, family = family, control = control, 
        intercept = TRUE))
    if (!fit2$converged) 
        warning("fitting to calculate the null deviance did not converge -- increase 'maxit'?")
    fit$null.deviance <- fit2$deviance
}

ERROR: Error in parse(text = x, srcfile = src): <text>:9:1: unexpected 'if'
8: )
9: if
   ^


Strategies for debugging syntax errors:
- Start at the indicated line. See if you can quickly spot the error.
- If not, start deleting things. (Remember, your only goal is to get it to parse.)
- Eventually, you'll delete enough code that it will parse. Backtrack.
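
The delete-and-retry loop can be partly automated. The sketch below assumes you have the suspect code as a string. The helper `parses()` is hypothetical (it is not part of the lecture code) and wraps base R's `parse()`, which checks syntax without running anything.

```r
# Hypothetical helper: TRUE if `code` is syntactically valid R.
# parse() only checks syntax; it never evaluates the code.
parses <- function(code) {
    tryCatch({
        parse(text = code)
        TRUE
    }, error = function(e) FALSE)
}

broken   <- "ggplot(mpg) + geom_bar(aes(x = cty, y = hwy)"   # missing ")"
repaired <- "ggplot(mpg) + geom_bar(aes(x = cty, y = hwy))"

parses(broken)    # FALSE
parses(repaired)  # TRUE
```

Calling `parses()` on smaller and smaller fragments tells you which piece contains the syntax error.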

### Exercise 
Debug the following syntax error(s):

In [45]:
# racial composition of midwest counties
midwest %>% mutate(pop_10k = cut(poptotal / 1e4,
                                 breaks = (0, 1, 2, 5, 10, 50, 100, Inf))) %>%
            select(pop_10k, popwhite:popother) %>%
            gather("race", "population", -pop_10k) %>%
            mutate(race = str_sub(race, 4)) %>%
            group_by(pop_10k, race) %>
            summarize(n=n(), population = sum(population)) %>%
            group_by(pop_10k) %>%
            mutate(population = round(population / sum(population), 3)) 
            spread(race, population)

ERROR: Error in parse(text = x, srcfile = src): <text>:3:45: unexpected ','
2: midwest %>% mutate(pop_10k = cut(poptotal / 1e4,
3:                                  breaks = (0,
                                               ^


### Runtime error
The code parses, but crashes when I run it.

Strategies for debugging runtime errors:
- Similar to syntax errors: start at the indicated line. See if you can quickly spot the error.
- Runtime errors often occur because you have made some assumption about the input that is not true.
- Use `print()` statements to monitor how execution progresses.
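
As an illustration of a false assumption about the input, here is a small sketch. The function `scale_to_unit()` is made up for this example: it silently assumes its argument is numeric, and printing the input with `str()` makes the violated assumption visible.

```r
# Hypothetical example: a function that silently assumes numeric input.
scale_to_unit <- function(x) {
    # Inspect the input before using it; str() prints its type and first values.
    str(x)
    (x - min(x)) / (max(x) - min(x))
}

ok <- scale_to_unit(c(3, 5, 11))   # works: the input is numeric
print(ok)

# The same call on character input crashes. tryCatch() captures the
# error message so that the rest of the script keeps running.
result <- tryCatch(
    scale_to_unit(c("3", "5", "11")),
    error = function(e) conditionMessage(e)
)
print(result)
```

The output of `str()` shows `chr` instead of `num` just before the crash, which points at the caller rather than at the arithmetic.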

### Exercise
Resolve the runtime errors:

In [54]:
# racial composition of midwest counties
midwest %>% mutate(pop_10k = cut(poptotal / 1e4,
                                 breaks = c(0, 1, 2, 5, 10, 50, 100, Inf))) %>%
            select(pop_10k, popwhite:popother) %>%
            gather("race", "population", -pop_10k) %>%
            mutate(race = str_sub(race, 4)) %>%
            group_by(pop_10k, race) %>%
            summarize(n=n(), population = sum(population)) %>%
            group_by(pop_10k) %>%
            mutate(population = round(population / sum(population), 3)) +
            spread(race, population)

ERROR: Error in spread(race, population): object 'race' not found


### Logical errors
The program runs and returns an answer, but the answer isn't what I expect.

In [72]:
int01 <- function(f) {
    x <- 1:1000 / 1000
    sum(mean(f(x)))
}
int01(function(x) x^2)
int01(sin)
-(cos(1) - cos(0))

[1] 0.3338335

[1] 0.4601184

[1] 0.4596977

Strategies for debugging logical errors:
- Use `stop()` to ensure that various assumptions you have made about your program are in fact true.
- Use `print()` statements to inspect intermediate variables. 
- Advanced tools (debuggers) exist, such as R's built-in `browser()` and `debug()`.
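
A minimal sketch of the first two strategies, using a made-up function `proportions_of()` (not from the lecture). It checks its assumptions about the input with `stop()`, prints an intermediate value, and uses `stopifnot()` to verify a property the result must have.

```r
# Hypothetical example: the share of each group in a vector of counts.
proportions_of <- function(counts) {
    # Fail loudly if an assumption about the input is violated.
    if (!is.numeric(counts)) stop("`counts` must be numeric")
    if (any(is.na(counts)))  stop("`counts` must not contain NA")
    if (any(counts < 0))     stop("`counts` must be non-negative")

    p <- counts / sum(counts)
    print(p)  # inspect the intermediate result

    # Check a property that the output must satisfy.
    stopifnot(isTRUE(all.equal(sum(p), 1)))
    p
}

shares <- proportions_of(c(10, 30, 60))
```

A check that fails immediately, close to the cause, is usually easier to diagnose than a wrong number discovered several steps later.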

### Exercise
The following function has several types of bugs. Fix them:

In [51]:
# given a vector `y` and a data frame `df`, return the name of the
# column in `df` which has the highest squared correlation with `y`.
# (assume `y` and all columns of `df` are numeric.)
most_correlated <- function(y, df) {
    highest_corr = na  # highest correlation seen so far
    highest_i = ""     # index of that column in df
    for (i in 1:ncol(df) { # loop over each column of df
        col = df$i         # extract i-th column of df
        if (cor(col, y)^2 > highest_corr)  # if this correlation exceeds previous high:
            highest_cor <- corr(col, y)^2  # set new highest correlation
            highest_i <- i                 # store corresponding index
    } # end for loop
    return(colnames(cor)[[i]])  # return name of most correlated column in df
}

ERROR: Error in parse(text = x, srcfile = src): <text>:7:26: unexpected '{'
6:     highest_i = ""     # index of that column in df
7:     for (i in 1:ncol(df) {
                            ^


The expected output of the function is:

    ### example 1
    > y = rnorm(5)
    > df <- tibble(x1 = -y, x2 = rnorm(5))
    > most_correlated(y, df)
    [1] "x1"
    ### example 2
    > most_correlated(mpg$hwy, select(mpg, cty, displ))
    [1] "cty"

### Advice for seeking help
Suppose you absolutely cannot figure out your bug. Fortunately, there are great online resources where people will help you for free!
- Slack (this class)
- GitHub (maintained software)
- Stack Overflow

In order to maximize your chance of getting help, it helps to follow a few guidelines:
- Be as specific as you can about what you were expecting to happen, what happened, and why this is a bug.
- Post the *exact* error message that you get, along with context.
- Post a minimal bit of code that someone can use to reproduce your bug.
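
A post that follows these guidelines might contain something like the sketch below. The scenario (a mean that unexpectedly comes out as `NA`) is invented for illustration; the point is the shape: inline data, the smallest call that shows the problem, the expected result, and version information.

```r
# Minimal reproducible example (hypothetical question).
# 1. Tiny input built inline, so nobody needs my data files.
df <- data.frame(group = c("a", "a", "b"), value = c(1, NA, 3))

# 2. The smallest call that shows the problem.
observed <- mean(df$value)
print(observed)   # what happened: NA

# 3. What I expected instead, stated explicitly.
expected <- 2     # mean of the non-missing values 1 and 3

# 4. Version information helps others match my environment.
print(R.version.string)
```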

## Abstraction
Why use functions? They let us break down complicated problems into smaller, more manageable subproblems. Complex software is written using this principle.

![example of a stack trace](http://2.bp.blogspot.com/-9nBb0CvqBIg/T2UKV06nD5I/AAAAAAAAAkQ/Pl2Hfj5HUlY/s1600/short-stack.png)

## The wiki-link game
To illustrate how and when to write functions, we will write a program that plays the [wiki-link game](https://en.wikipedia.org/wiki/Wikipedia:Wiki-Link_Game).

Relative to what we have seen so far, this is advanced. To solve it, we will break the problem into smaller pieces which we can then tackle.

In [None]:
play_wiki_link <- function(n) {
    # implement
}

In [151]:
play_wiki_link <- function(n) {
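    url <- get_random_wiki_url()
    i <- 1  # number of pages visited, including the starting page
    # store the starting page as a site-relative path so it is comparable with the links
    visited <- str_remove(url, fixed("https://en.wikipedia.org"))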
    while (TRUE) {
        links <- extract_links(url)
        if (length(links) < n) {
            # short-page ending
            print("short page")
            break
        }
        # take the n-th link
        chosen_link <- links[n]
        # decide how to proceed
        if (chosen_link %in% visited) {
            # infinite loop ending
            print("infinite loop")
            break
        } 
        if (empty_link(chosen_link)) {
            # red link ending
            print("empty link")
            break
        }
        # link is valid
        url <- str_c("https://en.wikipedia.org", chosen_link)
        print(chosen_link)
        flush.console()
        visited <- c(visited, chosen_link)
        i <- i + 1
        Sys.sleep(.5)
    }
    i
}

play_wiki_link(5)

[1] "/wiki/Wikipedia:Verifiability"
[1] "/wiki/Wikipedia:What_%22Ignore_all_rules%22_means#Use_common_sense"
[1] "/wiki/Wikipedia:Shortcut"
[1] "/wiki/Wikipedia:Ignore_all_rules"
[1] "/wiki/Wikipedia:Policies_and_guidelines"
[1] "/wiki/Wikipedia:Principles"
[1] "/wiki/Wikipedia:Five_pillars"
[1] "/wiki/Wikipedia:Wikipedia_is_an_encyclopedia"
[1] "/wiki/Encyclopedia"
[1] "/wiki/File:Ringelbergius,_%27Lucubrationes...KYKLOPEDEIA...%27_ed._Basel_1541_original.JPG"
[1] "short page"


[1] 11

In [None]:
library(httr)
get_random_wiki_url <- function() {
    # Special:Random redirects to a random article; `$url` is the URL after the redirect
    GET("https://en.wikipedia.org/wiki/Special:Random")$url
}
get_random_wiki_url()

In [134]:
library(xml2)
extract_links <- function(url) {
    page <- xml2::read_html(url)
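    # all <a> tags inside the article body
    xml_find_all(page, "//div[@id='mw-content-text']//a") %>% 
        xml_attr("href") %>% 
        discard(is.na) %>%   # drop <a> tags that have no href
        # keep internal links: existing pages (/wiki/) and red links (/w/index.php?)
        keep(~ startsWith(.x, "/wiki/") || startsWith(.x, "/w/index.php?"))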
}

extract_links(get_random_wiki_url())

  [1] "/wiki/File:Iran_location_map.svg"               
  [2] "/wiki/Geographic_coordinate_system"             
  [3] "/wiki/List_of_countries"                        
  [4] "/wiki/Iran"                                     
  [5] "/wiki/Provinces_of_Iran"                        
  [6] "/wiki/Hormozgan_Province"                       
  [7] "/wiki/Counties_of_Iran"                         
  [8] "/wiki/Jask_County"                              
  [9] "/wiki/Bakhsh"                                   
 [10] "/wiki/Central_District_(Jask_County)"           
 [11] "/wiki/Rural_Districts_of_Iran"                  
 [12] "/wiki/Jask_Rural_District"                      
 [13] "/wiki/Time_zone"                                
 [14] "/wiki/UTC%2B3:30"                               
 [15] "/wiki/Iran_Standard_Time"                       
 [16] "/wiki/Daylight_saving_time"                     
 [17] "/wiki/UTC%2B4:30"                               
 [18] "/wiki/Iran_Daylight_Time"                
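
The filtering step inside `extract_links()` can be tried without a network connection. The sketch below uses base R on a hand-written vector of `href` values. The values are made up for illustration, including the red-link URL.

```r
# Made-up href values of the kinds that appear on a Wikipedia page.
hrefs <- c(
    "/wiki/Iran",                                          # ordinary article link
    "#cite_note-1",                                        # in-page anchor
    "https://example.org/",                                # external link
    "/w/index.php?title=Some_page&action=edit&redlink=1",  # red link (hypothetical)
    NA                                                     # <a> tag without href
)

# Same logic as in extract_links(): drop missing values, then keep
# only links that point inside Wikipedia.
hrefs <- hrefs[!is.na(hrefs)]
is_internal <- startsWith(hrefs, "/wiki/") | startsWith(hrefs, "/w/index.php?")
internal <- hrefs[is_internal]
print(internal)
```

Only the first and fourth entries survive. Testing a helper like this on fixed input is easier than debugging it against a page that changes on every request.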

In [142]:
empty_link <- function(link) {
    # is it a red link?
    startsWith(link, "/w/index.php?")
}

links <- extract_links(get_random_wiki_url())
print(links[1:5])
empty_link(links[1:5])

[1] "/wiki/File:Ukraine_Volin_highland_en.jpg"
[2] "/wiki/File:Ukraine_Volin_highland_en.jpg"
[3] "/wiki/Ukrainian_language"                
[4] "/wiki/Upland_(geology)"                  
[5] "/wiki/Western_Ukraine"                   


[1] FALSE FALSE FALSE FALSE FALSE