# Chapter 2 Names and Values

In [2]:
set.seed(1014)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.1     ✔ dplyr   1.0.0
✔ tidyr   1.1.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



## Section 2.1

### Question 1

Given the following data frame, how do I create a new column named `3` that contains the sum of `1` and `2`? You may only use the `$` operator, not `[[`. What makes `1`, `2`, and `3` challenging as variable names?

In [3]:
df <- data.frame(runif(3), runif(3))
names(df) <- c(1, 2)

df$`3` <- df$`1` + df$`2`
df

| 1 | 2 | 3 |
|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` |
| 0.08075014 | 0.157208442 | 0.2379586 |
| 0.83433304 | 0.007399441 | 0.8417325 |
| 0.60076089 | 0.466393497 | 1.0671544 |


### Question 2

In the following code, how much memory does `y` occupy?

In [4]:
x <- runif(1e6)

y <- list(x, x, x)

object.size(y)


24000224 bytes

### Question 3

On which line does `a` get copied in the following example?

a <- c(1, 5, 3, 2)
b <- a
b[[1]] <- 10

`a` gets copied in the second line.

### Solutions

- Question 1: Correct
- Question 2: Incorrect
- Question 3: Incorrect

## Section 2.2

Consider the code: `x <- c(1,2,3)`.

This code is doing two things:

1. It's creating an object, a vector of values, `c(1, 2, 3)`.
2. And it's binding that object to a name, `x`.

What does this mean? It means `x` is not the object at all. It's the binding, essentially the name. This also means that when I run this piece of code:

`y <- x`

I am not creating a new object at all. I just get another binding to the vector that was already created. (This is why my answer to Question 3 in Section 2.1 was wrong. The line I said made the copy really just created another binding.)

I can confirm this with `lobstr::obj_addr()`, which returns an object's memory address: both names report the same address, as shown in the example below.

In [5]:
x <- c(1,2,3)
y <- x

library(lobstr)
obj_addr(x)
obj_addr(y)

### Non-syntactic Names

R has strict rules about what constitutes a valid name. A __syntactic__ name must consist of letters, digits, `.`, and `_`. There are two further rules to follow:

1. A valid name cannot start with `_` or a digit.
2. A valid name cannot be any of the __reserved words__ like `TRUE`, `NULL`, `if`, and `function`. (This list can be seen in `?Reserved`)

A name that doesn't follow these rules is known as a __non-syntactic__ name.

In [10]:
_abc <- 1

ERROR: Error in parse(text = x, srcfile = src): <text>:1:1: unexpected input
1: _
    ^


In [7]:
if <- 10

ERROR: Error in parse(text = x, srcfile = src): <text>:1:4: unexpected assignment
1: if <-
       ^


I can override these rules by wrapping the name in backticks, but this is not recommended.

In [8]:
`_abc` <- 1
`_abc`

In [9]:
`if` <- 10
`if`

### 2.2.2 Exercises

#### Question 1

Explain the relationship between `a`, `b`, `c`, and `d` in the following code:

In [12]:
a <- 1:10
b <- a
c <- b
d <- 1:10

obj_addr(a)
obj_addr(b)
obj_addr(c)
obj_addr(d)

The vector `1:10` is bound to `a`, `b`, and `c`. Evidence of this is that `a`, `b`, and `c` all report the same address. A second, separate vector `1:10` is created and bound to `d`, so `d` reports a different address.

#### Question 2

The following code accesses the mean function in multiple ways. Do they all point to the same underlying object? Verify this with `lobstr::obj_addr()`.

In [18]:
mean

In [19]:
obj_addr(mean)

In [20]:
base::mean

In [22]:
obj_addr(base::mean)

In [24]:
get("mean")

In [25]:
obj_addr(get("mean"))

In [26]:
evalq(mean)

In [27]:
obj_addr(evalq(mean))

In [17]:
match.fun("mean")

In [28]:
obj_addr(match.fun("mean"))

Yes, all these calls point to the same underlying function object.
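
As a cross-check that does not depend on lobstr, base R's `identical()` can compare the results of each access path. This is only a sketch: `identical()` compares objects by value rather than by address, so it is weaker evidence than matching `obj_addr()` output, and it assumes `mean` has not been masked in the current session. The names in `paths` are labels I made up for illustration.

```r
# Collect the function object reached through each access path.
paths <- list(
  bare      = mean,
  qualified = base::mean,
  get       = get("mean"),
  evalq     = evalq(mean),
  match_fun = match.fun("mean")
)

# One logical per path: does it yield the same function as `mean`?
same_as_mean <- vapply(paths, identical, logical(1), y = mean)
same_as_mean
```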

#### Question 3

By default, base R import functions, like `read.csv`, will automatically convert non-syntactic names to syntactic ones. Why might this be problematic? What option allows you to suppress this behaviour?

The problem is that the column names in R no longer match the names in the source file. Code that refers to the original names later on will fail, and because the conversion happens silently I might not notice it.

The option `check.names = FALSE` suppresses this behaviour. Within `read.csv`, and other data import functions, I can also supply a character vector of names through `col.names`, but those names are still converted unless `check.names = FALSE` is set.
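
A minimal sketch of the effect, using `read.csv()`'s `text` argument so no file is needed. The CSV contents and the object names (`csv_text`, `converted`, `preserved`) are made up for illustration.

```r
# A small CSV held in a string; both header names are non-syntactic.
csv_text <- "first name,2020\nAda,1\nGrace,2\n"

# Default behaviour: headers are converted to syntactic names.
converted <- read.csv(text = csv_text)
names(converted)
#> [1] "first.name" "X2020"

# check.names = FALSE keeps the headers exactly as written.
preserved <- read.csv(text = csv_text, check.names = FALSE)
names(preserved)
#> [1] "first name" "2020"

# Non-syntactic columns then need backticks with `$`.
preserved$`first name`
```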

#### Question 4

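What rules does `make.names()` use to convert non-syntactic names into syntactic ones?

`make.names()` does not throw an error. Instead it repairs the name: an `X` is prepended if the name does not start with a letter (or a `.` not followed by a digit), every invalid character is replaced with `.`, and a `.` is appended to names that match a reserved word. With `unique = TRUE`, duplicates are also made unique.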
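
The conversions can be seen directly by passing a few problem names to `make.names()`. The input vector `raw_names` is made up for illustration.

```r
# Names that break the rules in different ways, plus one valid name.
raw_names <- c("1abc", "_abc", "a b", "if", "ok_name")

fixed <- make.names(raw_names)
fixed
#> [1] "X1abc"   "X_abc"   "a.b"     "if."     "ok_name"

# With unique = TRUE, duplicated names get a numeric suffix.
deduplicated <- make.names(c("a", "a"), unique = TRUE)
deduplicated
#> [1] "a"   "a.1"
```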

#### Question 5

I slightly simplified the rules that govern syntactic names. Why is `.123e1` not a syntactic name? Read `?make.names` for the full details.

In [31]:
?make.names

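A name may start with a `.`, but only if the `.` is not followed by a digit. `.123e1` starts with a `.` followed by a digit, so R parses it as the number 1.23 rather than as a name.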
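
A quick check of how R treats `.123e1`, assuming nothing beyond base R. The variable names are hypothetical.

```r
# R parses .123e1 as a numeric literal (0.123 * 10^1), not as a name.
parsed <- .123e1
is.numeric(parsed)
#> [1] TRUE

# make.names() repairs it by prepending "X".
repaired <- make.names(".123e1")
repaired
#> [1] "X.123e1"

# A leading dot is fine when it is not followed by a digit.
dot_name <- make.names(".abc")
dot_name
#> [1] ".abc"
```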

## 2.3 Copy-on-modify

Previously, it was shown that the following code binds both `x` and `y` to the same vector.

In [32]:
x <- c(1,2,3)
y <- x

So what happens when I modify `y`? R does not change the original object. Instead, R creates a new object, a copy of the original with the modification applied, and rebinds `y` to it. `x` still points to the untouched original.

In [33]:
y[[3]] <- 4
y
x

In [34]:
obj_addr(x)
obj_addr(y)

This behaviour is called __copy-on-modify__. 

### 2.3.1 tracemem()

I can see when an object is copied with the help of `base::tracemem()`.

In [35]:
x <- c(1,2,3)
cat(tracemem(x), "\n")

<0x7fb78b4da368> 


In [36]:
tracemem(x)

In [37]:
y <- x
y[[3]] <- 4L


tracemem[0x7fb78b4da368 -> 0x7fb78ddc1298]: eval eval withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 


In [38]:
y[[3]] <- 5L
untracemem(x)

### 2.3.2 Function calls

The same rules for copying also apply to function calls. Consider the following code:

In [40]:
f <- function(a){
    a
}

x <- c(1,2,3)
tracemem(x)

z <- f(x) # there isn't a copy here...

untracemem(x)
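
For contrast, here is a sketch of a function that does modify its argument. The function name `f_modify` is hypothetical. The caller's vector is left unchanged, because the modification inside the function is applied to a copy.

```r
# Hypothetical function that changes its argument before returning it.
f_modify <- function(a) {
  a[[1]] <- 10  # copy-on-modify: `a` is rebound to a modified copy
  a
}

x <- c(1, 2, 3)
z <- f_modify(x)

x  # still 1 2 3
z  # 10 2 3
```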

### 2.3.3 Lists

It's not just names (i.e., variables) that point to values... elements of lists do too! (mind blown!)

In [46]:
l1 <- list(1,2,3)
l2 <- l1
l2[[3]] <- 4
ref(l1, l2)

█ [1:0x7fb78b46ab58] <list> 
├─[2:0x7fb7898fede8] <dbl> 
├─[3:0x7fb7898fee20] <dbl> 
└─[4:0x7fb7898fef00] <dbl> 
 
█ [5:0x7fb78f0f9d38] <list> 
├─[2:0x7fb7898fede8] 
├─[3:0x7fb7898fee20] 
└─[6:0x7fb7898fefe0] <dbl> 

### 2.3.4 Data frames

Data frames are lists of vectors, so copy-on-modify has important consequences when you modify a data frame.

In [47]:
d1 <- data.frame(x = c(1, 5, 6),
                 y = c(2, 4, 3))

d2 <- d1
d2[,2] <- d2[,2] * 2

In [48]:
ref(d1, d2)

█ [1:0x7fb78b4bbcc8] <df[,2]> 
├─x = [2:0x7fb7908d9d38] <dbl> 
└─y = [3:0x7fb7908d9c98] <dbl> 
 
█ [4:0x7fb78f0c94c8] <df[,2]> 
├─x = [2:0x7fb7908d9d38] 
└─y = [5:0x7fb79092c488] <dbl> 

In [49]:
d3 <- d1
d3[1,] <- d3[1,] * 3
ref(d1, d2, d3)

█ [1:0x7fb78b4bbcc8] <df[,2]> 
├─x = [2:0x7fb7908d9d38] <dbl> 
└─y = [3:0x7fb7908d9c98] <dbl> 
 
█ [4:0x7fb78f0c94c8] <df[,2]> 
├─x = [2:0x7fb7908d9d38] 
└─y = [5:0x7fb79092c488] <dbl> 
 
█ [6:0x7fb78b611ec8] <df[,2]> 
├─x = [7:0x7fb78e576f28] <dbl> 
└─y = [8:0x7fb78e576ed8] <dbl> 

### 2.3.5 Character vectors

In [50]:
x <- c("a","a","abc","d")
ref(x, character = TRUE)

█ [1:0x7fb790ee3248] <chr> 
├─[2:0x7fb7840c58b0] <string: "a"> 
├─[2:0x7fb7840c58b0] 
├─[3:0x7fb7900e5d28] <string: "abc"> 
└─[4:0x7fb7852049d0] <string: "d"> 

### 2.3.6 Exercises

#### Question 1

Why is `tracemem(1:10)` not useful?

In [53]:
tracemem(1:10)

In [54]:
tracemem(1:10)

In [55]:
tracemem(1:10)

`tracemem(1:10)` isn't useful because `1:10` is never bound to a name. Each call creates a fresh vector, traces it, and then nothing refers to it any more, so there is no way to modify it and nothing for `tracemem()` to report.

#### Question 2

Explain why `tracemem()` shows two copies when you run this code. Hint: carefully look at the difference between this code and the code shown earlier in the section.

In [56]:
x <- c(1L, 2L, 3L)
tracemem(x)

x[[3]] <- 4

tracemem[0x7fb78b1d7e08 -> 0x7fb78a5f9708]: eval eval withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 
tracemem[0x7fb78a5f9708 -> 0x7fb78790b298]: eval eval withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 


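Two copies are shown because `x` is an integer vector and I replace its third element with the double `4` rather than the integer `4L`. R has to coerce the whole vector to double, which accounts for one copy, and the modification accounts for the other. Assigning `4L` would avoid the coercion.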
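
The type change is easy to see with `typeof()`. This sketch uses only base R, and the variable names `x_int` and `x_same` are made up for illustration.

```r
# Assigning a double into an integer vector coerces the whole vector.
x_int <- c(1L, 2L, 3L)
type_before <- typeof(x_int)   # "integer"
x_int[[3]] <- 4                # 4 is a double
type_after <- typeof(x_int)    # "double"

# Assigning an integer keeps the type, so no coercion is needed.
x_same <- c(1L, 2L, 3L)
x_same[[3]] <- 4L
type_same <- typeof(x_same)    # "integer"

c(before = type_before, after = type_after, same = type_same)
```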

#### Question 3

Sketch out the relationship between the following objects:

In [57]:
a <- 1:10
b <- list(a, a)
c <- list(b, a, 1:10)
ref(a, b, c)

[1:0x7fb78cfd0040] <int> 
 
█ [2:0x7fb78e0f4fc8] <list> 
├─[1:0x7fb78cfd0040] 
└─[1:0x7fb78cfd0040] 
 
█ [3:0x7fb790167a18] <list> 
├─[2:0x7fb78e0f4fc8] 
├─[1:0x7fb78cfd0040] 
└─[4:0x7fb78d87e070] <int> 

#### Question 4

What happens when you run this code?

In [58]:
x <- list(1:10)
x[[2]] <- x

ref(x)

█ [1:0x7fb78f230388] <list> 
├─[2:0x7fb790c319e0] <int> 
└─█ [3:0x7fb78e1627f0] <list> 
  └─[2:0x7fb790c319e0] 

## 2.4 Object Size 

I can find out how much memory an object takes with `lobstr::obj_size`. 

So, what is the difference between `obj_size` and `object.size`? In short, `obj_size` accounts for shared references, so it gives a more accurate picture of how much memory an object actually occupies. `object.size` adds up the size of every element literally, so anything that is shared gets counted more than once. (source: https://github.com/r-lib/lobstr/issues/21)

In [59]:
obj_size(letters)
object.size(letters)

1,712 B

1712 bytes

In [60]:
obj_size(ggplot2::diamonds)
object.size(ggplot2::diamonds)

3,456,344 B

3456848 bytes

In [61]:
x <- runif(1e6)
obj_size(x)
object.size(x)

8,000,048 B

8000048 bytes

In [62]:
y <- list(x, x, x)
obj_size(y)
object.size(y) # notice the difference between obj_size and object.size

8,000,128 B

24000224 bytes
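
The `object.size(y)` figure can be reproduced by hand: it is three times the size of `x` plus the size of a three-element list with nothing in it (3 × 8,000,048 + 80 = 24,000,224 with the numbers above). The sketch below checks that arithmetic with base R only. The 80-byte figure is inferred from the output above and assumes a 64-bit build of R; the variable names are hypothetical.

```r
x <- runif(1e6)
y <- list(x, x, x)

# Size of one copy of the vector.
size_x <- as.numeric(object.size(x))

# Size of the list itself: three slots that hold nothing.
size_shell <- as.numeric(object.size(list(NULL, NULL, NULL)))

# object.size() counts the shared vector once per slot.
size_y <- as.numeric(object.size(y))
size_y == 3 * size_x + size_shell
```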

### 2.4.1 Exercises

#### Question 1

In the following example, why are `object.size(y)` and `obj_size(y)` so radically different? Consult the documentation of `object.size()`.

In [66]:
y <- rep(list(runif(1e4)), 100)

object.size(y)
obj_size(y)

8005648 bytes

80,896 B

In [67]:
?object.size

`object.size` does not detect whether elements of a list are shared (multiple bindings to the same object), so it counts the same 10,000-element vector 100 times. `obj_size` does detect shared elements and counts the vector only once.

#### Question 2

Take the following list. Why is its size somewhat misleading?

In [68]:
funs <- list(mean, sd, var)
obj_size(funs)

17,608 B

The size is misleading because `mean`, `sd`, and `var` ship with R in the base and stats packages, so they are already in memory whether or not I put them in a list. The list only holds references to those existing functions, so creating it does not consume anything close to the reported amount of additional memory.

#### Question 3

Predict the output of the following code:

In [69]:
a <- runif(1e6)
obj_size(a)

b <- list(a, a)
obj_size(b)
obj_size(a, b)

b[[1]][[1]] <- 10
obj_size(b)
obj_size(a, b)

b[[2]][[1]] <- 10
obj_size(b)
obj_size(a, b)

8,000,048 B

8,000,112 B

8,000,112 B

16,000,160 B

16,000,160 B

16,000,160 B

24,000,208 B

## 2.5 Modify-in-place

There are two exceptions to copy-on-modify:

1. Objects with a single binding get a special performance optimization.
2. Environments, a special type of object, are always modified in place.
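
The second exception is easy to demonstrate with base R. In this minimal sketch (the names `e1`, `e2`, and `value` are made up), a change made through one binding is visible through the other, because no copy is made.

```r
e1 <- new.env()
e1$value <- 1

e2 <- e1        # a second binding to the same environment
e2$value <- 2   # modified in place, no copy

e1$value
#> [1] 2
identical(e1, e2)
#> [1] TRUE
```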

### 2.5.1 Objects with a single binding

If an object, _v_, has a single name bound to it, R will modify it in place.

Two complications make it challenging to predict exactly when R applies this optimization:

1. R sometimes makes copies when it doesn't have to, because it cannot always tell that only one binding remains.
2. Calling almost any function makes a reference to the object. The only exceptions are specially written "primitive" C functions.

These complications can be explored through a case study of for loops. For loops have a reputation for being slow in R. Often that slowness is caused by every iteration of the loop creating a copy.

In [71]:
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

In [72]:
cat(tracemem(x), "\n")

<0x7fb78f6e29e8> 


In [73]:
for(i in 1:5){
    x[[i]] <- x[[i]] - medians[[i]]
}

tracemem[0x7fb78f6e29e8 -> 0x7fb78e4d8dd8]: eval eval withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 
tracemem[0x7fb78e4d8dd8 -> 0x7fb78e4d8cf8]: [[<-.data.frame [[<- eval eval withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 
tracemem[0x7fb78e4d8cf8 -> 0x7fb78e4d8c88]: eval eval withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 
tracemem[0x7fb78e4d8c88 -> 0x7fb78e4d8c18]: [[<-.data.frame [

In [74]:
untracemem(x)

So, what is going on in this code? If I look closely, I will see there are 3 (yeah, 3!) copies of the data frame created on each iteration. This means I am really making in the neighborhood of 15 copies! This will eat my memory!!!

In order to reduce the number of copies, it would be wise to use a list instead of a data frame. Modifying a list uses internal C code, so the reference count is not incremented on each iteration, and the trace below shows a single copy.

In [75]:
y <- as.list(x)
cat(tracemem(y), "\n")

for(i in 1:5){
    y[[i]] <- y[[i]] - medians[[i]]
}

<0x7fb78e641cb8> 
tracemem[0x7fb78e641cb8 -> 0x7fb78bdb71e8]: eval eval withVisible withCallingHandlers doTryCatch tryCatchOne tryCatchList tryCatch try handle timing_fn evaluate_call evaluate doTryCatch tryCatchOne tryCatchList doTryCatch tryCatchOne tryCatchList tryCatch <Anonymous> handle_shell <Anonymous> <Anonymous> 


It's not hard to determine when a copy is going to be made. It is hard to prevent it. If I find myself resorting to exotic tricks to avoid copies, it may be time to rewrite my function in C++, as described in Chapter 25.