# 1. Expression & Assignment

You can write an expression and get a result immediately. Below we calculate $2 + \sqrt{4} + \ln(e^2) + 2^2$.

In [0]:
# an expression
2 + sqrt(4) + log(exp(2)) + 2^2

You can also assign the value of an expression to a variable and then print the variable.

In [0]:
# assignment
x <- (pi == 3.14)
print(x)

`<-` is the assignment operator.

Aside: Why `<-` rather than `=`, as in most other programming languages? In fact, `=` is also an assignment operator in R, but it behaves slightly differently from `<-`. If you want to know more about R's assignment operators, read [this Stack Overflow post](https://stackoverflow.com/questions/1741820/what-are-the-differences-between-and-assignment-operators-in-r).
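
One visible difference shows up inside a function call. The sketch below is only an illustration (the names `mean_a`, `mean_b`, and `demo_vec` are made up here): `=` merely names an argument, while `<-` creates a variable before the value is passed on.

```r
# inside a call, `=` matches the value to the argument named x;
# no variable is created in the workspace
mean_a <- mean(x = c(1, 2, 3))

# inside a call, `<-` first creates demo_vec, then passes its value to mean()
mean_b <- mean(demo_vec <- c(4, 5, 6))

print(mean_a)
print(mean_b)
print(demo_vec)  # demo_vec now exists as a side effect of the call above
```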

# 2. Data Structure

R's basic data structures can be summarized as follows.

| Dimension | Homogeneous   | Heterogeneous |
|:---------:|:-------------:|:-------------:|
| 1D        | Atomic vector | List          |
| 2D        | Matrix        | Data frame    |
| nD        | Array         | -             |
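
As a rough orientation, the sketch below builds one object for each cell of the table. All object names (`demo_vector`, `demo_list`, and so on) are hypothetical and used only for illustration; each structure is covered in more detail in the subsections that follow.

```r
# one illustrative object per cell of the table
demo_vector <- c(1.5, 2.5, 3.5)                          # 1D, homogeneous
demo_list   <- list(1L, "a", TRUE)                       # 1D, heterogeneous
demo_matrix <- matrix(1:6, nrow = 2)                     # 2D, homogeneous
demo_df     <- data.frame(id = 1:2, name = c("a", "b"))  # 2D, heterogeneous
demo_array  <- array(1:24, dim = c(2, 3, 4))             # nD, homogeneous

print(is.atomic(demo_vector))
print(is.list(demo_list))
print(dim(demo_matrix))  # rows, columns
print(dim(demo_df))      # rows, columns
print(dim(demo_array))   # size along each of the three dimensions
```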

## 2.1 Atomic vector

In [0]:
# create R vectors
vec_character <- c("Hello,", "World!")
vec_integer <- c(1L, 2L, 3L)
vec_double <- c(1.1, 2.2, 3.3, -4.4)
vec_logical <- c(TRUE, TRUE, FALSE)

`c()` is a function in base R. It combines values into a vector.
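
Because an atomic vector is homogeneous, `c()` coerces mixed inputs to a single common type. The following sketch (with made-up names `mix_num`, `mix_chr`, and `mix_lgl`) shows what to expect when types are mixed.

```r
# mixing types inside c() coerces every element to one common type
mix_num <- c(1L, 2.5)   # integer + double    -> double
mix_chr <- c(1, "a")    # double + character  -> character
mix_lgl <- c(TRUE, 2L)  # logical + integer   -> integer (TRUE becomes 1L)

print(typeof(mix_num))
print(typeof(mix_chr))
print(typeof(mix_lgl))
print(mix_lgl)
```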

You can retrieve an element of a vector using `[]` with a numeric index. R's vector indexing starts at `1`.

In [0]:
# retrieve the first element of vec_double
print(vec_double[1])

Selecting multiple elements of a vector is also easy.

In [0]:
# select multiple elements
print(vec_double[c(1, 3)])
print(vec_double[1:2])
print(vec_double[c(-1, -2)])
print(vec_double[c(TRUE, FALSE, TRUE, FALSE)])
print(vec_double[vec_double < 0])

`str()` is a handy function to display the structure of a vector (or any object in general).

In [0]:
# display structure of a vector
str(vec_character)
str(vec_integer)
str(vec_double)
str(vec_logical)

`length()` is another useful function to tell you the length of a vector.

In [0]:
# display length of a vector
print(length(vec_double))

### Exercise - 2.1

Define two vectors `(1, 2, -3)` and `(4, -5, -6)`. Perform an element-wise multiplication, and sum up the positive elements of the resulting vector (hint: 1. use the `*` operator for element-wise multiplication; 2. use the `sum()` function to sum up the elements of a numeric vector).

In [0]:
# your code playground here

## 2.2 List

A list is another kind of one-dimensional vector. Unlike an atomic vector, it can contain elements of different types.

In [0]:
l1 <- list(
  1:3, 
  "a", 
  c(TRUE, FALSE, TRUE), 
  c(2.3, 5.9),
  c(1L, 2L)
)

str(l1)

A list can contain other lists as well.

In [0]:
l2 <- list(list(list(1)))
str(l2)

Retrieving elements of a list is similar to retrieving elements of a vector.

In [0]:
str(l1[1:2])
str(l1[c(1, 3)])

Note that using `[]` always returns a list. To return the element in a list as it is, use `[[]]`. `[[]]` only returns one single element, so you can only specify a single index.

In [0]:
str(l1[[1]])
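
To see the difference between `[]` and `[[]]` side by side, here is a small sketch on a hypothetical list `l_demo` (not one of the lists defined above).

```r
l_demo <- list(1:3, "a")   # hypothetical list for illustration

sub_single <- l_demo[1]    # a list of length 1 that contains 1:3
sub_double <- l_demo[[1]]  # the integer vector 1:3 itself

print(class(sub_single))
print(class(sub_double))

# arithmetic works on the extracted vector, but not on the one-element list
print(sum(sub_double))
```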

### Exercise - 2.2

Retrieve the vector `(2.3, 5.9)` from the list below and sum up its elements.

In [0]:
# define a list
l_ex <- list(
  1:3, 
  list(
    "a", 
    c(2.3, 5.9)
  )
)

# your code below

## 2.3 Matrix (self-study)


Use the `matrix()` function to create a matrix. Use `dim()` to find the dimensions of a matrix.

In [0]:
# use the matrix() function to create a matrix
y <- matrix(1:6, nrow = 2, ncol = 3)
print(y)
print(dim(y))

Subsetting a matrix is similar to subsetting a vector.

In [0]:
print(y[1:2, c(1,3)])
print(y[1:2, -2])

Note that `[]` by default simplifies the subsetting result to the lowest possible dimension.

In [0]:
# y[1, 1:2] gives a vector
str(y[1, 1:2])
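
If you want to keep the result as a matrix, base R's `[]` accepts a `drop = FALSE` argument. A minimal sketch, using a separate matrix `y_demo` so that it does not interfere with `y` above:

```r
y_demo <- matrix(1:6, nrow = 2, ncol = 3)

row_vec <- y_demo[1, 1:2]                # simplified to an integer vector
row_mat <- y_demo[1, 1:2, drop = FALSE]  # stays a 1 x 2 matrix

print(dim(row_vec))  # NULL, because a plain vector has no dim attribute
print(dim(row_mat))
```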

Matrix algebra is straightforward. For a list of R matrix operations, see https://www.statmethods.net/advstats/matrix.html.

In [0]:
# define two matrices
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)
print(m1)
print(m2)

# element-wise multiplication
print(m1 * m2)

# matrix multiplication
print(m1 %*% m2)

# transpose
print(t(m1))

# solve Ax = b problem
b <- matrix(7:8, nrow = 2)
print(b)
print(solve(m1, b))
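
A quick way to check a solution of $Ax = b$ is to multiply it back and compare with $b$. The sketch below repeats the same system under new, illustrative names (`a_demo`, `b_demo`, `x_demo`) and looks at the largest absolute residual.

```r
a_demo <- matrix(1:4, nrow = 2)
b_demo <- matrix(7:8, nrow = 2)

# solve a_demo %*% x_demo = b_demo
x_demo <- solve(a_demo, b_demo)
print(x_demo)

# multiply back; the residual should be zero up to floating-point error
residual <- a_demo %*% x_demo - b_demo
print(max(abs(residual)))
```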

## 2.4 Data frame

A data frame is like a 2-D table in Excel. More precisely, it is a list of equal-length vectors (the columns), plus a few extra attributes (let's not get into the attributes of a data structure today).


In [0]:
df1 <- data.frame(x = 1:3, y = letters[1:3], z = c(1.1, 2.2, 3.3))

print(df1)

In [0]:
str(df1)
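
The "list of equal-length vectors" view can be checked directly. The sketch below uses a small hypothetical data frame `df_demo` and only base R functions.

```r
df_demo <- data.frame(x = 1:3, z = c(1.1, 2.2, 3.3))

print(is.list(df_demo))            # a data frame is built on top of a list
print(length(df_demo))             # list length = number of columns
print(sapply(df_demo, length))     # every column has the same length
print(names(attributes(df_demo)))  # the extra attributes mentioned above
```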

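In R versions before 4.0.0, `data.frame()` converts character vectors into factors (categorical variables) by default; use `stringsAsFactors = FALSE` to disable this behavior. Since R 4.0.0 the default is already `FALSE`, so character columns stay as strings unless you request otherwise.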

In [0]:
# use 'stringsAsFactors = FALSE' to keep strings as they are
df2 <- data.frame(
  x = 1:3,
  y = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
str(df2)
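
Passing the argument explicitly makes the result independent of the default in your R version. A minimal sketch comparing both settings (`df_fct` and `df_chr` are illustrative names):

```r
df_fct <- data.frame(y = c("a", "b", "c"), stringsAsFactors = TRUE)
df_chr <- data.frame(y = c("a", "b", "c"), stringsAsFactors = FALSE)

print(class(df_fct[["y"]]))   # factor column
print(levels(df_fct[["y"]]))  # the categories of the factor
print(class(df_chr[["y"]]))   # plain character column
```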

It's often useful to find out the column names and the number of columns and rows.

In [0]:
# find out column names using names() or colnames()
print(names(df1))
print(colnames(df1))

# find out number of columns
print(length(df1))
print(ncol(df1))

# find out number of rows
print(nrow(df1))

Subsetting a data frame is similar to subsetting a list or a matrix.

In [0]:
# select a single column using []
print(df1['x'])

In [0]:
# note the result of df1['x'] is still a dataframe
str(df1['x'])

In [0]:
# select a single column using [[]]
print(df1[['x']])

# note the result is NOT a dataframe any more. It's a vector.
str(df1[['x']])

In [0]:
# select multiple columns
print(df1[c('x', 'z')])

In [0]:
# select specific rows and columns, matrix-style: [rows, columns]
print(df1[1:2, c('x', 'z')])
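
The logical indexing shown earlier for vectors also works for the rows of a data frame. The following is a sketch on a hypothetical copy `df_demo`; the threshold `2` is arbitrary.

```r
df_demo <- data.frame(x = 1:3, y = letters[1:3], z = c(1.1, 2.2, 3.3))

# build a logical vector with one entry per row
keep <- df_demo[["z"]] > 2
print(keep)

# keep only the rows where the condition is TRUE, and two of the columns
df_big <- df_demo[keep, c("x", "z")]
print(df_big)
```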

# 3. Programming Structure

## 3.1 Three basic control flows

### 3.1.1 Sequence

Code runs line by line in sequence with no conditional executions or loops. For example, let's write some code to calculate the following sum of squares.

$$y = \sum_{t=1}^{3} t^2.$$

In [0]:
# sum of squares
t <- 1:3
y <- sum(t^2)
print(y)

### 3.1.2 Selection / Conditional execution

```
# if, else
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
```

```
# if, else if, else
if (condition1) {
  # do something when condition1 is TRUE
} else if (condition2) {
  # do something else if condition1 is FALSE but condition2 is TRUE
} else {
  # do something if neither condition1 nor condition2 is TRUE
}
```

For example, if the sum of squares computed above satisfies `y > 10`, print "result greater than 10"; otherwise, print "result less than or equal to 10".

In [0]:
if (y > 10) {
  print("result greater than 10")
} else {
  print("result less than or equal to 10")
}
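
The three-branch `if, else if, else` template can be filled in the same way. In the sketch below, the variable `y_demo`, the thresholds, and the labels are all made up for illustration.

```r
y_demo <- sum((1:3)^2)  # 14, the same sum of squares as above

# the branches are tested from top to bottom; only the first match runs
if (y_demo > 20) {
  size_label <- "large"
} else if (y_demo > 10) {
  size_label <- "medium"
} else {
  size_label <- "small"
}

print(size_label)
```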


### 3.1.3 Iteration

```
# for loop
for (var in seq) {
  # do something for each element of seq
}
```

```
# while loop
while (condition) {
  # do something as long as condition is TRUE
}
```

For example, let's solve the above sum of squares using a loop.

In [0]:
t <- 1:3
y <- 0

for (x in t) {
  y <- y + x^2
}

print(y)
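
A `while` loop is useful when the number of iterations is not known in advance. As a sketch on an unrelated toy problem (the names `value` and `n_halvings` are illustrative), the loop below counts how many times 100 must be halved before it drops below 1.

```r
value <- 100
n_halvings <- 0

# keep halving until the value falls below 1
while (value >= 1) {
  value <- value / 2
  n_halvings <- n_halvings + 1
}

print(n_halvings)
print(value)
```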


#### Exercise - 3.1.3

Calculate the above sum using a while loop.

In [0]:
# your code goes here

## 3.2 Functions

A function is a block of code that (usually) takes some inputs (arguments) and returns a result. We have already used many R functions, for example, `print()` and `length()`.

The two main reasons to write functions are *reusability* and *abstraction*. If you find yourself repeating the same code logic or if you want to divide a large piece of code into small logical units, you should consider writing functions.

In fact, "everything that happens [in R] is a function call." (John Chambers, the creator of the S programming language and a core member of the R project. R is modelled after S.) For example, even the plus operation "`+`" is a function.
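
You can see this for yourself by calling operators with ordinary function syntax; the operator name has to be wrapped in backticks. A small sketch (the result names are illustrative):

```r
# `+` called like any other function
plus_result <- `+`(1, 2)
print(plus_result)

# even subsetting with [] is a function call
second_element <- `[`(c(10, 20, 30), 2)
print(second_element)

# because `+` is a function, it can be passed to another function
running_total <- Reduce(`+`, 1:4)
print(running_total)
```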

Let's write a slightly more general version of the sum of squares using a function.

$$y = \sum_{t=1}^{n} t^2.$$

This function takes one argument, `n`, and returns the sum `y` defined above.

In [0]:
ss <- function(n) {
  t <- seq_len(n)  # same as 1:n for n >= 1, but empty when n = 0
  
  # the value of the last expression is returned automatically;
  # alternatively, you could write return(sum(t^2))
  sum(t^2)
}

print(ss(2))
print(ss(3))
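
One way to sanity-check such a function is to compare it with the well-known closed form $\sum_{t=1}^{n} t^2 = n(n+1)(2n+1)/6$. The sketch below redefines the function under the illustrative name `ss_demo` so that the block is self-contained; `ss_closed` and `n_values` are likewise made up for this check.

```r
# sum of squares by explicit summation
ss_demo <- function(n) {
  t <- seq_len(n)
  sum(t^2)
}

# sum of squares via the closed-form formula n(n+1)(2n+1)/6
ss_closed <- function(n) {
  n * (n + 1) * (2 * n + 1) / 6
}

# compare both versions on a few values of n
n_values <- c(1, 2, 3, 10, 100)
sum_result <- sapply(n_values, ss_demo)
closed_result <- sapply(n_values, ss_closed)

print(sum_result)
print(all(sum_result == closed_result))
```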

### Exercise - 3.2 (self-study)

Write an R function to calculate $$y = \sum_{t=1}^{n} f(t),$$ where $n$ is an integer and $f()$ is a generic function that you can define later (e.g. $f(t) = t ^ 2$ or $f(t) = t ^ 3$). The R function takes two arguments, `f` and `n` as in the above formula, and returns the sum $y$.

In [0]:
# your code here

f_sum <- function(f, n) {

  # insert your code below
  
  return(y)
}

In [0]:
# if your code works as intended, the following should run

# define t squared as f1
f1 <- function(t) {
  t^2
}

print(f_sum(f1, 3L))

# let's approximate pi using the Leibniz series:
# pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...)
f_pi <- function(x) {
  4 * (-1) ^ (x + 1) / (2 * x - 1)
}

print(f_sum(f_pi, 10000L))