<a href="https://colab.research.google.com/github/christophermalone/DSCI325/blob/main/Module4_Part1R.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Function Writing in R

A <strong>function</strong> is a collection of code that makes a commonly repeated task easier and more efficient to carry out. If you find yourself writing the "same" code repeatedly, a function may prove useful.

Consider the following vector in R that contains 5 values.

In [2]:
mydata <- c(12, 15, 16, 18, 14)

The following <strong>for</strong> loop can be used to obtain the average value of mydata.

In [3]:
total <- 0
for(i in 1:5){
  total <- total + mydata[i]
}
total/5
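As a quick check on the loop, here is a short sketch that compares it with base R's vectorized sum() and length() and with the built-in mean(). All three should agree for mydata.

```r
mydata <- c(12, 15, 16, 18, 14)

# Loop version, as above
total <- 0
for (i in 1:5) {
  total <- total + mydata[i]
}
loop_mean <- total / 5

# Vectorized equivalents from base R
vec_mean <- sum(mydata) / length(mydata)
builtin_mean <- mean(mydata)

c(loop_mean, vec_mean, builtin_mean)  # 15 15 15
```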

Now, suppose there is another vector, called yourdata, that contains 5 values.

In [4]:
yourdata <- c(-5, -3, -8, -13, -6)

In order for my code to work on yourdata, I would need to modify the code, i.e., change the reference from mydata to yourdata.

In [5]:
total <- 0
for(i in 1:5){
  total <- total + yourdata[i]
}
total/5

Once again, suppose there is another vector, called theirdata, that contains 5 values.

In [6]:
theirdata <- c(134, 152, 126, 147, 136)

And, once again, in order for my code to work on theirdata, I would need to modify the code, i.e., change the reference from yourdata to theirdata.

In [7]:
total <- 0
for(i in 1:5){
  total <- total + theirdata[i]
}
total/5

## Creating a New Function

Instead of modifying the code each time to compute the mean of a vector of 5 observations, it would be better to write a function to accomplish this task. 

An R function has the following structure.
<ul>
<strong>FunctionName</strong> <- <i>function( arguments )</i> {

&nbsp;&nbsp;&nbsp; <i>function description</i>
&nbsp;&nbsp;&nbsp;<pre>code for function</pre>

}
</ul>
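To make the template concrete, here is a minimal sketch using a hypothetical function, addone(), that follows the structure above. R returns the value of the last expression evaluated in the function body, so an explicit return() is optional. The second version shows the equivalent form with return().

```r
addone <- function(x) {
  # Purpose: Add 1 to each value of x
  # Args: x = a numeric vector
  # Returns: a numeric vector the same length as x

  x + 1  # the last expression evaluated is the return value
}

# Equivalent hypothetical version using an explicit return()
addone_explicit <- function(x) {
  return(x + 1)
}

addone(c(1, 2, 3))           # 2 3 4
addone_explicit(c(1, 2, 3))  # 2 3 4
```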

Consider the following code that creates a function named <strong>mymean</strong>.  This function will be used to compute the mean of a vector with 5 elements.


In [8]:
mymean <- function(X){
  #Purpose: Compute the mean of a vector of data
  #Args: X = a vector of numeric data
  #Returns: The average of the values in X

  total <- 0
  for(i in 1:5){
    total <- total + X[i]
  }
  total/5
}

The print() function displays the full contents of the function, i.e., all of its documentation and code.

In [15]:
print(mymean)

function(X){
  #Purpose: Compute the mean of a vector of data
  #Args: X = a vector of numeric data
  #Returns: The average of the values in X

  total <- 0
  for(i in 1:5){
    total <- total + X[i]
  }
  total/5
}
<bytecode: 0x560f17c552a0>
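Besides print(), base R has helpers that pull out individual pieces of a function: formals() returns its arguments, body() returns the code between the braces, and args() shows its signature. A short sketch using the same mymean():

```r
mymean <- function(X){
  #Purpose: Compute the mean of a vector of data
  total <- 0
  for(i in 1:5){
    total <- total + X[i]
  }
  total/5
}

names(formals(mymean))  # "X": the names of the function's arguments
body(mymean)            # the expressions inside the braces
args(mymean)            # the function's argument list (signature)
```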


Using the <strong>mymean()</strong> function to compute the mean of various vectors.

In [9]:
mymean(mydata)

In [10]:
mymean(yourdata)

In [11]:
mymean(theirdata)
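A sketch that checks mymean() against base R's mean() for all three vectors at once. Collecting the vectors in a list and using sapply() avoids retyping the call for each one; the results should be 15, -7, and 139.

```r
mymean <- function(X){
  total <- 0
  for(i in 1:5){
    total <- total + X[i]
  }
  total/5
}

mydata    <- c(12, 15, 16, 18, 14)
yourdata  <- c(-5, -3, -8, -13, -6)
theirdata <- c(134, 152, 126, 147, 136)

vectors <- list(mydata = mydata, yourdata = yourdata, theirdata = theirdata)

# Apply both functions to every vector and place the results side by side
comparison <- data.frame(
  mymean = sapply(vectors, mymean),
  mean   = sapply(vectors, mean)
)
comparison
```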

## Looking at an Existing Function

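Try simply typing the name of an existing function, such as mean. Notice that very little code is shown: mean() is a generic function whose body just calls UseMethod("mean"), which hands the work off to an appropriate method.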

In [18]:
mean

Next, try adding .default to the end of the function name. The mean.default() method is the version that does the actual work for ordinary vectors. Its code appears somewhat *messy*; however, much of it is needed to guard against misuse of the function, for example, trying to compute the mean of a vector that does not contain numeric values.

In [17]:
mean.default
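To see this kind of guarding in action, the sketch below calls mean() on a character vector. Rather than producing a number, mean.default() issues a warning and returns NA. It then defines a hypothetical mymean_checked() that applies the same idea to our own function, stopping with an error when X is not numeric. For brevity, this version uses sum()/length() instead of a loop.

```r
# mean() on non-numeric input: mean.default() warns and returns NA
res <- withCallingHandlers(
  mean(c("a", "b", "c")),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res  # NA

# Hypothetical guarded version of mymean()
mymean_checked <- function(X){
  if (!is.numeric(X)) {
    stop("X must be a numeric vector")
  }
  sum(X) / length(X)
}

mymean_checked(c(12, 15, 16, 18, 14))  # 15
try(mymean_checked(c("a", "b")))       # prints the error message instead of returning a value
```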

Notice that for a custom function such as mymean(), typing the function name displays its full code.

In [16]:
mymean

## More on Function Writing - Using Default Values

At times, it might be convenient to specify a default value for a function argument. The default is used whenever the argument is not supplied in the function call.

In [19]:
mymean <- function(X, n=5){
  #Purpose: Compute the mean of a vector of data
  #Args: X = a vector of numeric data
  #      n = length of the X vector
  #Returns: The average of the values in X

  total <- 0
  for(i in 1:n){
    total <- total + X[i]
  }
  total/n
}
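A short sketch of the different ways this function can be called. When n is omitted, its default of 5 is used; arguments can also be supplied by position, or by name in any order.

```r
mymean <- function(X, n=5){
  total <- 0
  for(i in 1:n){
    total <- total + X[i]
  }
  total/n
}

mydata <- c(12, 15, 16, 18, 14)

a <- mymean(mydata)             # n takes its default value, 5
b <- mymean(mydata, 5)          # n supplied by position
d <- mymean(n = 5, X = mydata)  # named arguments can appear in any order
c(a, b, d)  # 15 15 15

formals(mymean)$n  # the stored default for n: 5
```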

Using the updated mymean() function.

In [26]:
mymean(mydata)

Next, create a new vector, called noonesdata, that only has 4 values.

In [27]:
noonesdata <- c(56, 59, 54, 51)

When computing the average, we must specify that the vector being passed to the function has only 4 values instead of 5.

In [28]:
mymean(noonesdata, n=4)

The following should **fail**, since noonesdata has only four values but the mymean() function uses its default of n=5. Note that R does not stop with an error here; the function silently returns NA.

In [29]:
mymean(noonesdata)
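The NA comes from out-of-range indexing. In R, X[5] on a vector of length 4 returns NA instead of raising an error, and adding NA to the running total makes the total NA. A sketch to confirm this, along with the correct result when n = 4 is supplied:

```r
mymean <- function(X, n=5){
  total <- 0
  for(i in 1:n){
    total <- total + X[i]
  }
  total/n
}

noonesdata <- c(56, 59, 54, 51)

noonesdata[5]                 # NA: the index is past the end of the vector
result <- mymean(noonesdata)  # uses the default n = 5
result                        # NA
mymean(noonesdata, n = 4)     # 55
```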

The best option might be to let the function determine the length of the vector automatically, instead of having the length "hard coded" into the function or having it passed as a function argument.

In [32]:
mymean <- function(X){
  #Purpose: Compute the mean of a vector of data
  #Args: X = a vector of numeric data
  #Returns: The average of the values in X

  total <- 0
  n <- length(X)
  for(i in 1:n){
    total <- total + X[i]
  }
  total/n
}

Using the updated mymean() function, which now appears to work correctly without the length of the vector being specified.

In [33]:
mymean(noonesdata)
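One edge case remains. For an empty vector, length(X) is 0 and 1:0 evaluates to c(1, 0), so the loop runs twice instead of zero times, and the function returns a zero-length result rather than a single value. A sketch that swaps 1:n for seq_len(n), which is empty when n is 0. With this change, an empty vector gives NaN (0/0), matching base R's mean().

```r
mymean <- function(X){
  #Purpose: Compute the mean of a vector of data
  #Args: X = a vector of numeric data
  #Returns: The average of the values in X

  total <- 0
  n <- length(X)
  for(i in seq_len(n)){  # seq_len(0) is empty, unlike 1:0 which is c(1, 0)
    total <- total + X[i]
  }
  total/n
}

noonesdata <- c(56, 59, 54, 51)
mymean(noonesdata)   # 55
mymean(numeric(0))   # NaN, the same as mean(numeric(0))
```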



---



---



## UCLA Loneliness Data

The tidyverse package will be used for this example, so load it with library(). If the package is not already installed, install it first with install.packages("tidyverse").

In [36]:
library(tidyverse)

Load the UCLA Loneliness dataset into R using read_csv().

In [37]:
UCLA_Loneliness <- read_csv('/content/sample_data/UCLA_LonelinessScale.csv')

Rows: 278 Columns: 22
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): ParticipantID
dbl (21): Age, Statement1, Statement2, Statement3, Statement4, Statement5, S...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Using the str() function to investigate the structure of this data.frame.

In [38]:
str(UCLA_Loneliness)

spec_tbl_df [278 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ParticipantID: chr [1:278] "ParticipantID_001" "ParticipantID_002" "ParticipantID_003" "ParticipantID_004" ...
 $ Age          : num [1:278] 42 50 57 35 51 60 50 57 57 47 ...
 $ Statement1   : num [1:278] 1 2 4 4 1 4 4 4 3 4 ...
 $ Statement2   : num [1:278] 2 3 4 1 2 3 1 1 2 2 ...
 $ Statement3   : num [1:278] 4 2 4 1 4 2 1 3 3 1 ...
 $ Statement4   : num [1:278] 1 1 3 1 4 3 3 2 2 2 ...
 $ Statement5   : num [1:278] 1 2 1 3 2 4 4 2 4 2 ...
 $ Statement6   : num [1:278] 1 4 3 2 2 3 4 2 3 4 ...
 $ Statement7   : num [1:278] 4 1 2 1 4 1 1 3 1 1 ...
 $ Statement8   : num [1:278] 1 4 1 1 3 1 2 1 2 1 ...
 $ Statement9   : num [1:278] 3 3 3 3 2 3 3 4 3 4 ...
 $ Statement10  : num [1:278] 2 2 2 1 2 4 2 4 2 2 ...
 $ Statement11  : num [1:278] 1 3 2 1 2 3 2 3 3 2 ...
 $ Statement12  : num [1:278] 4 3 3 4 4 1 2 2 1 4 ...
 $ Statement13  : num [1:278] 4 1 4 2 3 3 4 2 3 3 ...
 $ Statement14  : num [1:278] 1 3 3 1 4 1 1 4 3 3 ...
 $ 

Looking at the first few rows of the data.frame using head().

In [39]:
head(UCLA_Loneliness)

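| ParticipantID | Age | Statement1 | Statement2 | Statement3 | Statement4 | Statement5 | Statement6 | Statement7 | Statement8 | ⋯ | Statement11 | Statement12 | Statement13 | Statement14 | Statment15 | Statement16 | Statement17 | Statement18 | Statement19 | Statement20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| ParticipantID_001 | 42 | 1 | 2 | 4 | 1 | 1 | 1 | 4 | 1 | ⋯ | 1 | 4 | 4 | 1 | 2 | 1 | 3 | 4 | 1 | 1 |
| ParticipantID_002 | 50 | 2 | 3 | 2 | 1 | 2 | 4 | 1 | 4 | ⋯ | 3 | 3 | 1 | 3 | 4 | 3 | 3 | 3 | 4 | 2 |
| ParticipantID_003 | 57 | 4 | 4 | 4 | 3 | 1 | 3 | 2 | 1 | ⋯ | 2 | 3 | 4 | 3 | 3 | 2 | 4 | 3 | 3 | 1 |
| ParticipantID_004 | 35 | 4 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | ⋯ | 1 | 4 | 2 | 1 | 3 | 4 | 4 | 3 | 2 | 2 |
| ParticipantID_005 | 51 | 1 | 2 | 4 | 4 | 2 | 2 | 4 | 3 | ⋯ | 2 | 4 | 3 | 4 | 3 | 2 | 2 | 4 | 3 | 3 |
| ParticipantID_006 | 60 | 4 | 3 | 2 | 3 | 4 | 3 | 1 | 1 | ⋯ | 3 | 1 | 3 | 1 | 4 | 3 | 4 | 1 | 4 | 4 |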


Getting the dimensions of the data.frame using dim().

In [40]:
dim(UCLA_Loneliness)
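dim() returns the number of rows and then the number of columns. For the file loaded above, the read_csv() message reports 278 rows and 22 columns. The sketch below uses a small hypothetical data.frame, toy, so that it runs without the CSV file. It shows dim(), nrow(), and ncol(), along with two ways to pull out a single column by name. The [[ ]] form also works on tibbles, and it accepts a column name stored in a variable, which is handy inside a function.

```r
# Hypothetical toy data with the same kind of columns as UCLA_Loneliness
toy <- data.frame(
  ParticipantID = c("P1", "P2", "P3"),
  Statement1    = c(1, 4, 2),
  Statement2    = c(3, 2, 4)
)

dim(toy)    # 3 3 (rows, columns)
nrow(toy)   # 3
ncol(toy)   # 3

col_name <- "Statement1"
toy[[col_name]]   # column chosen by a name stored in a variable: 1 4 2
toy$Statement1    # the same column, with the name typed directly
```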



---



---



## Task: Write a Custom Function

Write a custom function that computes the mean of a specified column from a specified data.frame. Some columns in the data.frame are reverse coded, so the values in those columns must be recoded before the mean is computed. The statements with reverse ordering are identified in the documentation provided with this data.
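One possible shape for such a function is sketched below. This is a sketch under stated assumptions, not the intended solution. The function name column_mean() and its arguments are hypothetical. The list of reverse-coded columns must be supplied by the caller because it comes from the data documentation, which is not reproduced here. The recoding assumes responses run from 1 to 4, as in the values shown above, so a reversed value is computed as (1 + 4) - x. Column names must match the data exactly; for example, the head() output above shows the 15th statement column spelled Statment15.

```r
column_mean <- function(df, column, reversed = character(0), scale_min = 1, scale_max = 4){
  #Purpose: Mean of one column, recoding it first if it is reverse coded
  #Args: df        = a data.frame (a tibble also works)
  #      column    = name of the column, as a character string
  #      reversed  = names of the reverse-coded columns (from the data documentation)
  #      scale_min, scale_max = assumed endpoints of the response scale
  #Returns: The mean of the (possibly recoded) column

  x <- df[[column]]
  if (column %in% reversed) {
    x <- (scale_min + scale_max) - x  # on a 1-4 scale: 1<->4 and 2<->3
  }
  mean(x)
}

# Hypothetical toy data
toy <- data.frame(Statement1 = c(1, 4, 2), Statement2 = c(3, 2, 4))

column_mean(toy, "Statement1")                           # 7/3, about 2.33
column_mean(toy, "Statement2")                           # 3
column_mean(toy, "Statement2", reversed = "Statement2")  # recoded to 2, 3, 1, so the mean is 2
```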


In [None]:
Code for Task