## R Prep Minicourse
### Week 11: Functions (and apply)

**Credits:** [Datacamp's Intermediate R Course](https://www.datacamp.com/courses/intermediate-r)

### Function Arguments

Before even thinking of using an R function, you should clarify which arguments it expects. All the relevant details, such as a description, usage, and arguments, can be found in the documentation. To consult the documentation on the `mean()` function, for example, you can use one of the following R commands:

```
help(mean)
?mean
```

A quick hack to see the arguments of the `mean()` function is the `args()` function.

In [1]:
args(mean)
?mean

**mean {base}** R Documentation

| Argument | Description |
| --- | --- |
| `x` | An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for `trim = 0`, only. |
| `trim` | the fraction (0 to 0.5) of observations to be trimmed from each end of `x` before the mean is computed. Values of `trim` outside that range are taken as the nearest endpoint. |
| `na.rm` | a logical value indicating whether `NA` values should be stripped before the computation proceeds. |
| `...` | further arguments passed to or from other methods. |


In the help documentation, notice that both `trim` and `na.rm` have default values (`trim = 0` and `na.rm = FALSE`). This makes them **optional arguments**.
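Because `mean()` is a generic function, `args(mean)` only reports `x` and `...`. As a small sketch, the defaults can also be read programmatically from the default method, `mean.default`, with `formals()`:

```r
# mean() is generic, so its own signature only lists x and ...
names(formals(mean))

# The default method carries the optional arguments and their defaults
defaults <- formals(mean.default)
defaults$trim    # default value of trim
defaults$na.rm   # default value of na.rm
```

Any argument that shows a value here can be left out of the call.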

Let's now apply the mean function to the vectors we used last week.

In [2]:
# The linkedin and facebook vectors have already been created for you
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)

# Calculate average number of views
avg_li <- mean(linkedin)
avg_fb <- mean(facebook)

avg_li
avg_fb

In [3]:
# Calculate the mean of the element-wise sum of linkedin and facebook and store the result in a variable avg_sum.
avg_sum <- mean(linkedin + facebook)

# Calculate the mean once more, but this time set the trim argument equal to 0.2 and assign the result to avg_sum_trimmed.
avg_sum_trimmed <- mean(linkedin + facebook, trim = 0.2)

# Print out both avg_sum and avg_sum_trimmed; can you spot the difference?
avg_sum
avg_sum_trimmed

In [4]:
x <- c(3, 4, 4, 6)
mean(x)
mean(x, trim=0.25)
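To see what `trim` does, the trimmed mean can be reproduced by hand. The sketch below assumes `trim` is below 0.5 and uses the helper names `n_drop`, `kept`, and `manual_trimmed`, which are only illustrative:

```r
x <- c(3, 4, 4, 6)
trim <- 0.25

# Number of observations to drop from each end
n_drop <- floor(length(x) * trim)

# Sort, drop n_drop values from each end, then average what is left
x_sorted <- sort(x)
kept <- x_sorted[(n_drop + 1):(length(x) - n_drop)]
manual_trimmed <- mean(kept)

manual_trimmed
mean(x, trim = trim)
```

Here one value is dropped from each end (3 and 6), so both calls average the two remaining 4s.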

The `mean()` function has an optional argument, `na.rm`, that specifies whether or not to remove missing values from the input vector before calculating the mean.

Let's see what happens if your vectors linkedin and facebook contain missing values (`NA`).


In [5]:
linkedin <- c(16, 9, 13, 5, NA, 17, 14)
facebook <- c(17, NA, 5, 16, 8, 13, 14)

# Basic average of linkedin
mean(linkedin)

# Advanced average of linkedin
mean(linkedin, na.rm = TRUE)
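As a rough illustration of what `na.rm = TRUE` does, the same result can be obtained by filtering out the `NA` values yourself before calling `mean()`. The variable names `with_na_rm` and `manual` are just for this sketch:

```r
linkedin <- c(16, 9, 13, 5, NA, 17, 14)

# Without na.rm, a single NA makes the whole result NA
is.na(mean(linkedin))

# na.rm = TRUE drops the NA before averaging
with_na_rm <- mean(linkedin, na.rm = TRUE)

# Equivalent manual approach: filter out the NA values first
manual <- mean(linkedin[!is.na(linkedin)])

all.equal(with_na_rm, manual)
```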

You already know that R functions return objects that you can then use somewhere else. This makes it easy to use functions inside functions.

In [6]:
# Calculate the mean absolute difference between linkedin and facebook
mean(abs(linkedin - facebook), na.rm = TRUE)
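If a nested call is hard to read, it can help to unfold it into steps. The following sketch splits the one-liner into intermediate variables (`diffs`, `abs_diffs`, and `step_by_step` are illustrative names) and checks that both versions agree:

```r
linkedin <- c(16, 9, 13, 5, NA, 17, 14)
facebook <- c(17, NA, 5, 16, 8, 13, 14)

# Step 1: element-wise difference (NA wherever either input is NA)
diffs <- linkedin - facebook

# Step 2: absolute values
abs_diffs <- abs(diffs)

# Step 3: average, ignoring the NA entries
step_by_step <- mean(abs_diffs, na.rm = TRUE)

# Same computation as the nested one-liner
nested <- mean(abs(linkedin - facebook), na.rm = TRUE)

step_by_step
nested
```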

### Designing your own functions

Wow, things are getting serious... you're about to write your own function! Before you have a go at it, have a look at the following function template:

```
my_fun <- function(arg1, arg2) {
  body
}
```

In [7]:
# Create a function sum_abs(), that takes two arguments and returns the sum of the absolute values of both arguments.
sum_abs <- function(x, y) {
    sum(abs(x) + abs(y))
}

# Call the function sum_abs() with arguments -2 and 3.
sum_abs(-2, 3)

There are situations in which your function does not require an input. Let's say you want to write a function that gives us the random outcome of throwing a fair die:

In [8]:
throw_die <- function() {
  number <- sample(1:6, size = 1)
  number
}

throw_die()
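Since `throw_die()` is random, each call can give a different result. One way to make a run reproducible, sketched here with an arbitrary seed of 42, is to call `set.seed()` before throwing:

```r
throw_die <- function() {
  sample(1:6, size = 1)
}

# Fix the seed so the "random" throws can be reproduced
set.seed(42)
first_run <- replicate(10, throw_die())

# Resetting the same seed replays the same sequence of throws
set.seed(42)
second_run <- replicate(10, throw_die())

identical(first_run, second_run)
all(first_run %in% 1:6)   # every throw is a valid face
```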

You can define default argument values in your own R functions as well. You can use the following recipe to do so:

```
my_fun <- function(arg1, arg2 = val2) {
  body
}
```

In [9]:
# Create a function pow_two(): it takes one argument and returns that number squared (that number times itself).
# Add an optional argument, named print_info, that is TRUE by default.
# Wrap an if construct around the print() function: this function should only be executed if print_info is TRUE.

pow_two <- function(x, print_info = TRUE) {
  y <- x ^ 2
  if (print_info) {
    print(paste(x, "to the power two equals", y))
  }
  return(y)
}

pow_two(19)
pow_two(19, print_info = FALSE)

[1] "19 to the power two equals 361"
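Note that `print_info` only controls the printed message, not the value that is returned. A short sketch that stores both results (`res_default` and `res_quiet` are illustrative names) makes this visible:

```r
pow_two <- function(x, print_info = TRUE) {
  y <- x ^ 2
  if (print_info) {
    print(paste(x, "to the power two equals", y))
  }
  return(y)
}

# Default used: the message is printed, and the square is returned
res_default <- pow_two(19)

# Default overridden by name: no message, same return value
res_quiet <- pow_two(19, print_info = FALSE)

identical(res_default, res_quiet)
```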


Another thing to keep in mind is function scoping: variables that are defined inside a function are not accessible outside that function. Try running the following code and see if you understand the results:

In [10]:
pow_two <- function(x) {
  z <- x ^ 2
  return(z)
}
pow_two(4)
z

ERROR: Error in eval(expr, envir, enclos): object 'z' not found
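A related consequence, sketched below with a hypothetical global variable `z`: assigning to `z` inside the function creates a separate local variable and leaves a global variable of the same name unchanged.

```r
z <- 10   # a global variable named z (made up for this example)

pow_two <- function(x) {
  z <- x ^ 2   # local z, exists only while the function runs
  return(z)
}

result <- pow_two(4)

result   # value returned by the function
z        # the global z is untouched
```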


R passes arguments by value. What does this mean? Simply put, it means that an R function cannot change the variable that you input to that function. Let's look at a simple example:

In [11]:
increment <- function(x, inc = 1) {
  x <- x + inc
  x
}
count <- 5
a <- increment(count, 2)
b <- increment(count)
count <- increment(count, 2)
count
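To make the effect explicit, the same example can be annotated with the value of `count` after each call. The extra variable `count_before` is only there to record the value before the final reassignment:

```r
increment <- function(x, inc = 1) {
  x <- x + inc
  x
}

count <- 5
a <- increment(count, 2)   # a is 7, count is still 5
b <- increment(count)      # b is 6, count is still 5
count_before <- count

# count changes only because the result is assigned back to it
count <- increment(count, 2)

c(a = a, b = b, count_before = count_before, count = count)
```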

### The apply family

Before you go about solving the exercises below, have a look at the documentation of the `lapply()` function. The Usage section shows the following expression:

```
lapply(X, FUN, ...)
```

To put it generally, `lapply()` takes a vector or list `X` and applies the function `FUN` to each of its members. If `FUN` requires additional arguments, you pass them after `X` and `FUN`, in place of the `...`. The output of `lapply()` is a list of the same length as `X`, where each element is the result of applying `FUN` to the corresponding element of `X`.
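As a minimal sketch of this behaviour, the example below applies `mean()` to a small made-up list called `scores` and forwards `na.rm = TRUE` as an additional argument:

```r
# A small list of numeric vectors (made-up data for illustration)
scores <- list(a = c(1, 2, 3), b = c(10, NA, 30), c = 5)

# Apply mean() to each element; na.rm = TRUE is forwarded to mean()
avgs <- lapply(scores, mean, na.rm = TRUE)

str(avgs)
length(avgs) == length(scores)
```

The result is a list with one element per element of `scores`, and the element names are carried over.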

In [12]:
# We've compiled a vector of famous mathematicians/statisticians and the year they were born. Up to you to extract some information!
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

# Split names from birth year
split_math <- strsplit(pioneers, split = ":")

# Use lapply() to convert the character vectors in split_math to lowercase letters:
# apply tolower() on each of the elements in split_math. 
split_low <- lapply(split_math, tolower)

# Take a look at the structure of split_low
str(split_low)
split_low

List of 4
 $ : chr [1:2] "gauss" "1777"
 $ : chr [1:2] "bayes" "1702"
 $ : chr [1:2] "pascal" "1623"
 $ : chr [1:2] "pearson" "1857"


You can use `lapply()` on your own functions as well. You just need to code a new function and make sure it is available in the workspace. After that, you can use the function inside `lapply()` just as you did with base R functions.

In [13]:
# Code a function called select_first() that takes a vector as input and returns the first element of this vector.
select_first <- function(x) {
    x[1]
}

# Apply select_first() over the elements of split_low with lapply() and assign the result to a new variable names.
names <- lapply(split_low, select_first)
names

# Write a function select_second() that does the exact same thing for the second element of an input vector.
select_second <- function(x) {
    x[2]
}

# Apply select_second() over split_low: years
years <- lapply(split_low, select_second)
years

Writing your own functions and then using them inside `lapply()` is quite an accomplishment! But defining functions to use them only once is kind of overkill, isn't it? That's why you can use so-called anonymous functions in R.

```
# Named function
triple <- function(x) { 3 * x }

# Anonymous function with same implementation
function(x) { 3 * x }

# Use anonymous function inside lapply()
lapply(list(1,2,3), function(x) { 3 * x })
```

In [14]:
# Transform the two calls of lapply() above such that they use an anonymous function that does the same thing.
names <- lapply(split_low, function(x) x[1])
years <- lapply(split_low, function(x) x[2])
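A quick way to convince yourself that the anonymous version does the same thing is to compare both results with `identical()`. The sketch below rebuilds `split_low` so that it runs on its own; `by_name` and `by_anon` are illustrative names:

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_low <- lapply(strsplit(pioneers, split = ":"), tolower)

select_first <- function(x) {
  x[1]
}

# Named function versus anonymous function with the same body
by_name <- lapply(split_low, select_first)
by_anon <- lapply(split_low, function(x) x[1])

identical(by_name, by_anon)
```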


`lapply()` provides a way to handle functions that require more than one argument, such as the `multiply()` function:

```
multiply <- function(x, factor) {
  x * factor
}
lapply(list(1,2,3), multiply, factor = 3)
```

In [15]:
# Write a generic version of the select functions that you've coded earlier: select_el().
# It takes a vector as its first argument, and an index as its second argument. 
# It returns the vector's element at the specified index.

select_el <- function(x, index) {
  x[index]
}

names <- lapply(split_low, select_el, 1)
years <- lapply(split_low, select_el, 2)
names
years
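Both results are still lists. If plain vectors are more convenient, one option is to flatten them with `unlist()`. The sketch below also passes `index` by name, which makes the forwarded argument explicit; the variable names are illustrative:

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_low <- lapply(strsplit(pioneers, split = ":"), tolower)

select_el <- function(x, index) {
  x[index]
}

# Passing index by name makes the forwarded argument explicit
names_list <- lapply(split_low, select_el, index = 1)
years_list <- lapply(split_low, select_el, index = 2)

# lapply() returns lists; unlist() flattens them into plain vectors
pioneer_names <- unlist(names_list)
birth_years <- as.numeric(unlist(years_list))

pioneer_names
birth_years
```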