# First introduction to R

In [1]:
library("cowsay")
say("My name is Phuc Lam Vo", by = "pig")


 ________________________ 
< My name is Phuc Lam Vo >
 ------------------------ 
     \
      \

       _//| .-~~~-.
     _/oo  }       }-@
    ('')_  }       |
     `--'| { }--{  }
          //_/  /_/ [nosig]

In [2]:
say("My very first code cell", by = "cat")


 _________________________ 
< My very first code cell >
 ------------------------- 
         \
          \

            |\___/|
          ==) ^Y^ (==
            \  ^  /
             )=*=(
            /     \
            |     |
           /| | | |\
           \| | |_|/\
      jgs  //_// ___/
               \_)

# Data types in R

In [3]:
typeof(0.1)

This may seem odd, but a computer has only finite memory, so most decimal numbers are stored with a small representation error. These errors are usually negligible, but they can propagate and affect the accuracy of your results if not handled correctly (beyond the scope of this unit).

In [4]:
0.1 + 0.2 == 0.3

In [5]:
writeBin(0.1 + 0.2, raw(), size = 8)  # 8 bytes = 64 bits
writeBin(0.3, raw(), size = 8)

[1] 34 33 33 33 33 33 d3 3f

[1] 33 33 33 33 33 33 d3 3f

We can use the `all.equal()` function in R for a safer comparison, since it tests equality within a small numerical tolerance.

In [6]:
all.equal(0.1 + 0.2, 0.3)
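One detail worth knowing: when the values differ, `all.equal()` returns a character description of the difference rather than `FALSE`, so it is usually wrapped in `isTRUE()` inside conditions. A minimal sketch (the variable names and the `1e-9` tolerance are illustrative choices, not part of the original notebook):

```r
# Exact comparison is affected by the representation error
exact <- (0.1 + 0.2 == 0.3)

# all.equal() compares within a tolerance; isTRUE() turns the result into a plain logical
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# For values that really differ, all.equal() returns a message, not FALSE
diff_msg <- all.equal(1, 1.1)

# A hand-written tolerance check does the same job explicitly
tol <- 1e-9                      # assumed tolerance, chosen for illustration
manual <- abs((0.1 + 0.2) - 0.3) < tol

print(exact)
print(approx)
print(diff_msg)
print(manual)
```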

In R, integers represent whole numbers without decimal points. We can create them using the `L` suffix, like `5L`, and check their type with `typeof(5L)`.

In [9]:
5L
typeof(5L) # Checking the type of the data
typeof(5)

In [10]:
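# is.numeric() checks whether a value is numeric (integer or double), not whether it is an integer
is.numeric(1)
is.numeric('a')
is.integer(1) # FALSE: without the L suffix, 1 is stored as a double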

You can use R as a calculator with arithmetic operators such as `+`, `-`, `*`, `/`, `^`, `%%` (remainder), and `%/%` (integer division).

In [11]:
# Exercise:
50/4
typeof(50/4)
14*(4+6)/2
typeof(14*(4+6)/2)

In [12]:
1L+1L
typeof(1L+1L)
typeof(1+1)

In [13]:
5L/3L
typeof(5L/3L)

5/3
typeof(5/3)
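The cells above show that `/` returns a double even when both operands are integers. A short sketch of how the integer-preserving operators behave in comparison (variable names are illustrative):

```r
# Ordinary division always produces a double
q_double <- 5L / 3L

# Integer division and remainder keep the integer type when both operands are integers
q_int <- 5L %/% 3L
r_int <- 5L %% 3L

# Mixing an integer with a double promotes the result to double
mixed <- 1L + 1

print(c(typeof(q_double), typeof(q_int), typeof(r_int), typeof(mixed)))
```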

In [14]:
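TRUE & FALSE # T and F are shorthand for TRUE and FALSE, but they can be overwritten, so the full names are safer
!(TRUE & FALSE)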

In [16]:
# Exercise:
(3^2 < 2^3) & (log(3) > 1.5)

In [1]:
# Exercise:
radius <- 3
circumference <- 2 * pi * radius # 2*pi*r is the circumference; the area would be pi * radius^2
print(circumference)

[1] 18.84956


### Function

In [17]:
computeArea = function(radius){
  area = pi*radius^2
  return(area)
}
computeArea(3)

In [18]:
computeCirc = function(radius)
{
    circumference = 2*pi*radius # 2*pi*r is the circumference, not the diameter
    return(circumference)
}
computeCirc(2)

### Vector

In [19]:
x <- c(1,2,3)
typeof(x)
is.vector(x)

In [2]:
a = c(1,2,3)
b = c(4,5,6)
print(c(a,b)) # you can concatenate two vectors and create a new vector
is.vector(c(a,b))

[1] 1 2 3 4 5 6


In [3]:
x = c(1,4,3,6,3,24,7,2,3,4)
print(x)
print(x[c(1,3,5)]) # this will return the 1st, 3rd, 5th element in the vector x

 [1]  1  4  3  6  3 24  7  2  3  4


[1] 1 3 3


In [None]:
x = c(1,4,3,6,3,24,7,2,3,4)
whichIdx = x>3 # this will return the boolean value for each element in the vector.
print(whichIdx)
x[whichIdx]

 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE


In [4]:
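# Arithmetic and comparison operators work element-wise on vectors
x = 1:5
y = seq(1,10,2)
x + y

# A shorter vector is recycled: b is reused as 1,2,1,2,1,2
a = 1:6
b = c(1,2)
a+b

# A comparison returns a logical vector of the same length
z = c(1,4,3,6,3,24,7)
z > 5

# A single number is recycled across every element
n = seq(1,9,2) #1,3,5,7,9
n + 1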

In [6]:
z <- c(1,2,3,4)
z[1] <- 3 # Modify the first element in the vector 
print(z)

[1] 3 2 3 4


In [25]:
a <- c('a',1,2) # R automatically coerces all elements to a common type, here character
print(a)
typeof(a[2])

[1] "a" "1" "2"
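The coercion shown above follows a fixed order: logical, then integer, then double, then character. A small sketch of each step (the vectors are made up for illustration):

```r
lgl_int <- c(TRUE, 2L)     # logical + integer  -> integer
int_dbl <- c(1L, 2.5)      # integer + double   -> double
dbl_chr <- c(1.5, "a")     # double + character -> character

# Going back the other way has to be requested explicitly
nums <- as.numeric(c("1", "2"))

print(c(typeof(lgl_int), typeof(int_dbl), typeof(dbl_chr), typeof(nums)))
```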


Some useful functions for working with vectors in R:

- `length(x)` — returns the number of elements in vector x.
- `sum(x)` — computes the sum of all elements in x.
- `mean(x)` — calculates the average value of x.
- `max(x)` - finds the maximum value in `x`.
- `sort(x)` — returns a sorted version of x.
- `unique(x)` — extracts unique elements from x.
- `c()` — combines values/vectors into a new vector.
- `seq(from, to, by)` — generates equally spaced sequences of numbers.
- `rep(x, times)` — repeats elements to create vectors.
- `lapply(x, FUN)` — applies a function `FUN` to each element of a vector, returns a list.
- `sapply(x, FUN)` — similar to lapply(), but results are simplified (e.g., to a vector or matrix).
- `which()` — returns indices of elements matching a condition.
- `names(x)` - returns the names of elements.
- `:`  — (not a function) creates a sequence of consecutive integers between two numbers.
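A few of the functions listed above, tried on a small named vector. The vector `scores` and its values are hypothetical, chosen only to make the results easy to check by hand:

```r
scores <- c(b = 4, a = 1, c = 4, d = 9)   # hypothetical named vector

n_elem   <- length(scores)          # number of elements
sorted   <- sort(scores)            # ascending order, names travel with the values
distinct <- unique(scores)          # duplicates removed
big_idx  <- which(scores > 3)       # positions of the elements greater than 3
labels   <- names(scores)           # element names
pattern  <- rep(c(0, 1), times = 3) # repeat the whole vector three times
roots    <- sapply(scores, sqrt)    # simplified to a numeric vector
roots_l  <- lapply(scores, sqrt)    # same values, returned as a list

print(sorted)
print(big_idx)
print(roots)
```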

In [24]:
x = seq(from = 1, to = 10, by = 2)
print(x)

y = 1:10
print(y)
sum(y)
mean(y)

[1] 1 3 5 7 9
 [1]  1  2  3  4  5  6  7  8  9 10


In [None]:
# Exercise:
x <- seq(1,21,2)
y <- x %%3 == 0
x1 <- x[y]
print(x1)
sum(x1)

[1]  3  9 15 21


### Matrix

#### **Common Functions for Working with Matrices**

- `dim(x)` — returns the dimensions (rows and columns) of matrix `x`.  
- `nrow(x)` — returns the number of rows.  
- `ncol(x)` — returns the number of columns.  
- `t(x)` — returns the transpose of matrix `x`.  
- `rowSums(x)` — computes the sum of each row.  
- `colSums(x)` — computes the sum of each column.  
- `rowMeans(x)` — computes the mean of each row.  
- `colMeans(x)` — computes the mean of each column.  
- `apply(x, MARGIN, FUN)` — applies a function `FUN` over rows (`MARGIN = 1`) or columns (`MARGIN = 2`).  
- `matrix(data, nrow, ncol)` — creates a matrix from the given data.  
- `cbind()` - combines two matrices by columns.
- `rbind()` - combines two matrices by rows.
- `dimnames(x)` — gets or sets row and column names.  
- `rownames(x)` — returns or sets row names.  
- `colnames(x)` — returns or sets column names.
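To make the difference between `cbind()` and `rbind()` concrete, here is a small sketch with two 2x2 matrices. The names `m1`, `m2` and the row/column labels are illustrative:

```r
m1 <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
m2 <- matrix(5:8, nrow = 2)

by_col <- cbind(m1, m2)       # placed side by side: 2 rows, 4 columns
by_row <- rbind(m1, m2)       # stacked on top of each other: 4 rows, 2 columns

m1_t <- t(m1)                 # transpose swaps rows and columns

# Row and column names allow indexing by label
dimnames(m1) <- list(c("r1", "r2"), c("c1", "c2"))
picked <- m1["r2", "c1"]

print(dim(by_col))
print(dim(by_row))
print(m1)
```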

In [None]:
A = matrix(data = 1:9,
           nrow = 3,
           ncol = 3)
print(A)
print(A[1:2, 3:3]) #This will print out the value at A[1,3] and then A[2,3]
print(A[1:2, 1:2])

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
[1] 7 8
     [,1] [,2]
[1,]    1    4
[2,]    2    5


In [None]:
A[,1] # Return all the value at column 1

In [8]:

print(A)

# Modify the second row
A[2, ] = c(10, 11, 12)
print(A)    

# Modify the first column
A[, 1] = c(100, 101, 102)
print(A)

# Modify a specific element (row 3, column 2)
A[3, 2] = 999
print(A)

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


     [,1] [,2] [,3]
[1,]    1    4    7
[2,]   10   11   12
[3,]    3    6    9
     [,1] [,2] [,3]
[1,]  100    4    7
[2,]  101   11   12
[3,]  102    6    9
     [,1] [,2] [,3]
[1,]  100    4    7
[2,]  101   11   12
[3,]  102  999    9


In R, matrices are stored in column-major order, meaning elements are filled column by column, not row by row.

To us, a matrix is a 2-dimensional array. But to the computer, a matrix is stored as a 1-dimensional array (a vector), so 1-dimensional indexing can be used to extract elements, which can be useful for high-performance applications. However, two-dimensional indexing remains the standard and safer approach.

In [None]:
# R stores matrices in column-major order, so linear indexing walks down column 1, then column 2, and so on
A[1:4]
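The link between one-dimensional and two-dimensional indexing can be written out explicitly. The sketch below uses fresh matrices `B` and `C` (illustrative names) so that it does not depend on the modifications made to `A` earlier:

```r
B <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column

# Element (i, j) sits at linear position i + (j - 1) * nrow
i <- 2
j <- 3
linear <- i + (j - 1) * nrow(B)
same_element <- B[linear] == B[i, j]

# arrayInd() converts a linear position back to (row, column)
pos <- arrayInd(linear, dim(B))

# byrow = TRUE changes how the data are filled in, not how the matrix is stored
C <- matrix(1:9, nrow = 3, byrow = TRUE)
first_four <- C[1:4]                   # still walks down the first column first

print(linear)
print(pos)
print(first_four)
```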

In [None]:
print(A)
print(A + c(1,2,3)) # The vector is recycled down each column, so 1 is added to every element of row 1, 2 to row 2, and 3 to row 3
print(A %% c(1,2,3) == 0) # The same recycling applies: row 1 is tested against 1, row 2 against 2, and row 3 against 3

     [,1] [,2] [,3]
[1,]  100    4    7
[2,]  101   11   12
[3,]  102  999    9
     [,1] [,2] [,3]
[1,]  101    5    8
[2,]  103   13   14
[3,]  105 1002   12
      [,1]  [,2] [,3]
[1,]  TRUE  TRUE TRUE
[2,] FALSE FALSE TRUE
[3,]  TRUE  TRUE TRUE
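Because recycling follows the column-major order, adding a vector to a matrix lines the vector up with the rows. If the intention is to add one value per column instead, `sweep()` or a double transpose can be used. A sketch under that assumption, with illustrative names `M` and `v`:

```r
M <- matrix(1:9, nrow = 3)
v <- c(10, 20, 30)

row_wise  <- M + v                  # v[i] is added to every element of row i
col_wise  <- sweep(M, 2, v, "+")    # v[j] is added to every element of column j
col_wise2 <- t(t(M) + v)            # same result via a double transpose

print(row_wise)
print(col_wise)
```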


In [16]:
# Exercise:

# Create a 3x3 matrix (called matroid here) filled by columns with the first 9 odd numbers (1, 3, 5, ..., 17).
matroid = matrix(data=seq(1,17,2),nrow=3,ncol=3)
print(matroid)

# Use rowSums() to calculate the sum of each row.
ans1 <- rowSums(matroid)
print(ans1)

# Use colMeans() to calculate the mean of each column
ans2 <- colMeans(matroid)
print(ans2)

# Use apply() to find the maximum value in each row. Recall that each row/column of a matrix is a vector.
ans3 <- apply(matroid,MARGIN = 1,FUN = max)
print(ans3)

     [,1] [,2] [,3]
[1,]    1    7   13
[2,]    3    9   15
[3,]    5   11   17
[1] 21 27 33
[1]  3  9 15
[1] 13 15 17


### List

A list is similar to a vector, but with one key difference: an (atomic) vector requires all of its elements to have the same data type, whereas a list can hold elements of different types, including other vectors and lists.

In [20]:
list1 <- list('a','b','c')

In [21]:
print(list1)

[[1]]
[1] "a"

[[2]]
[1] "b"

[[3]]
[1] "c"



In [27]:
mxb107Info = list(faculty = "Faculty of Science", school = "School of Mathematical Sciences", year = 2025, unitCode = "MXB107",
unitName = "Probability & Statistics") # We can also give the name for the elements inside the list
print(mxb107Info)

$faculty
[1] "Faculty of Science"

$school
[1] "School of Mathematical Sciences"

$year
[1] 2025

$unitCode
[1] "MXB107"

$unitName
[1] "Probability & Statistics"



In [28]:
print(mxb107Info)
mxb107Info$faculty
mxb107Info$faculty[[1]]

$faculty
[1] "Faculty of Science"

$school
[1] "School of Mathematical Sciences"

$year
[1] 2025

$unitCode
[1] "MXB107"

$unitName
[1] "Probability & Statistics"



In [30]:
print(mxb107Info)
mxb107Info$faculty
typeof(mxb107Info[4])
print(mxb107Info[4])

$faculty
[1] "Faculty of Science"

$school
[1] "School of Mathematical Sciences"

$year
[1] 2025

$unitCode
[1] "MXB107"

$unitName
[1] "Probability & Statistics"



$unitCode
[1] "MXB107"



#### **Common Functions for Working with Lists**

- `str(x)` — displays the structure of a list — useful for inspection.
- `lapply(x, FUN)` — applies a function `FUN` to each element of a list, returns a list.
- `sapply(x, FUN)` — similar to lapply(), but results are simplified (e.g., to a vector or matrix).
- `unlist(x)` — flattens a list into a vector if possible.
- `append(x, value)` — adds elements to a list.
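The functions above can be tried on a small list. The list `marks` and its contents are hypothetical:

```r
marks <- list(quiz = c(7, 9), exam = c(60, 75, 80))   # hypothetical data

str(marks)                              # compact overview of the structure

means_list <- lapply(marks, mean)       # one mean per element, returned as a list
means_vec  <- sapply(marks, mean)       # same values, simplified to a named vector
flat       <- unlist(marks)             # all values in one vector

marks <- append(marks, list(project = 18))   # add a new named element

print(means_vec)
print(names(marks))
```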

In [33]:
Circle = function(radius)
{
    area = pi * (radius^2)
    circumference = 2 * pi * radius
    return (list(area = area,circumference = circumference))
}
Circle(3)
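Returning a list is a common way for an R function to hand back more than one value. The sketch below repeats the same idea under a different, illustrative name (`circle_stats`) and shows how the caller can pick out each result:

```r
# Same computation as the cell above, under an illustrative name
circle_stats <- function(radius) {
  list(area = pi * radius^2, circumference = 2 * pi * radius)
}

res <- circle_stats(3)

area_3 <- res$area                   # access by name with $
circ_3 <- res[["circumference"]]     # or with double brackets

print(area_3)
print(circ_3)
```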

### Dataframe

A data frame is a table-like data structure in R where each column can hold a different type (numeric, character, etc.) and each row represents an observation. It is widely used for storing and manipulating datasets. Think of it as an Excel table in R, but more efficient.

All the columns in a data frame must have the same length.
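A quick sketch of the same-length rule, using small made-up data frames. A single value is recycled to fill a column, while clearly mismatched lengths raise an error, which is caught here with `tryCatch()` so the script keeps running:

```r
# Columns of equal length: fine
df_ok <- data.frame(id = 1:3, name = c("a", "b", "c"))

# A length-one value is recycled to every row
df_rec <- data.frame(id = 1:3, flag = TRUE)

# Lengths 3 and 2 cannot be reconciled, so data.frame() stops with an error
err_msg <- tryCatch(
  data.frame(id = 1:3, name = c("a", "b")),
  error = function(e) conditionMessage(e)
)

print(df_rec)
print(err_msg)
```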

In [1]:
MXB107_ClassInfo <- data.frame(
  Class = c(
    "LEC01 01", "LEC01 01", "PRC01 01", "PRC01 01", "PRC01 02",
    "PRC01 03", "PRC01 07", "PRC01 08", "PRC01 02", "PRC01 04",
    "PRC01 05", "PRC01 06"
  ),
  Type = c(
    "Lecture (Internal)", "Lecture (Online)", "Practical (Online)", "Practical (Internal)",
    "Practical (Online)", "Practical (Internal)", "Practical (Internal)", "Practical (Internal)",
    "Practical (Internal)", "Practical (Internal)", "Practical (Internal)", "Practical (Internal)"
  ),
  Day = c(
    "Wed", "Wed", "Wed", "Thu", "Thu", "Thu", "Thu", "Thu", "Fri", "Fri", "Fri", "Fri"
  ),
  Location = c(
    "GP B117", "Online", "Online", "GP D413", "Online", "GP G216", "GP S520", "GP S517",
    "GP G216", "GP G216", "GP S502", "GP S519"
  ),
  Limit = c(
    240, 1000, 30, 35, 30, 35, 25, 30, 35, 35, 35, 35
  ),
  Teaching_Staff = c(
    "Chris Drovandi", "Chris Drovandi", "Narayan Srinivasan", "Narayan Srinivasan",
    "Oliver Vu", "Minh Long Nguyen", "Ryan Kelly", "Nicholas Gecks-Preston",
    "Oliver Vu", "Arwen Nugteren", "Arwen Nugteren", "Minh Long Nguyen"
  ),
  From = c(
    "11", "11", "16", "16", "16", "09", "14", "09",
    "9", "11", "15", "15"
  ),
  To = c(
    "13", "13", "18", "18", "18", "11", "16", "11",
    "11", "13", "17", "17"
  ),
  stringsAsFactors = FALSE
)
print(MXB107_ClassInfo)
str(MXB107_ClassInfo) # str(df) would not work here: no data frame called df has been created

      Class                 Type Day Location Limit         Teaching_Staff From
1  LEC01 01   Lecture (Internal) Wed  GP B117   240         Chris Drovandi   11
2  LEC01 01     Lecture (Online) Wed   Online  1000         Chris Drovandi   11
3  PRC01 01   Practical (Online) Wed   Online    30     Narayan Srinivasan   16
4  PRC01 01 Practical (Internal) Thu  GP D413    35     Narayan Srinivasan   16
5  PRC01 02   Practical (Online) Thu   Online    30              Oliver Vu   16
6  PRC01 03 Practical (Internal) Thu  GP G216    35       Minh Long Nguyen   09
7  PRC01 07 Practical (Internal) Thu  GP S520    25             Ryan Kelly   14
8  PRC01 08 Practical (Internal) Thu  GP S517    30 Nicholas Gecks-Preston   09
9  PRC01 02 Practical (Internal) Fri  GP G216    35              Oliver Vu    9
10 PRC01 04 Practical (Internal) Fri  GP G216    35         Arwen Nugteren   11
11 PRC01 05 Practical (Internal) Fri  GP S502    35         Arwen Nugteren   15
12 PRC01 06 Practical (Internal) Fri  GP

Indexing a data frame is quite similar to indexing a matrix. However, there are some additional operations specific to data frames:

- Access columns by name: `df$name` or `df[["name"]]`.
- Access rows and columns by position: `df[i, ]` (i-th row), `df[, j]` (j-th column), or `df[i, j]` (element in the i-th row, j-th column).
- Extract multiple rows or columns using vectors: `df[rowIdxVec, colIdxVec]` or `df[rowIdxVec, c("nameofCol1", "nameofCol2")]`.
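The indexing forms above, applied to a tiny stand-in data frame. The name `classes` and its three rows are made up for illustration:

```r
classes <- data.frame(
  Class = c("LEC01", "PRC01", "PRC02"),
  Day   = c("Wed", "Thu", "Fri"),
  Limit = c(240, 35, 30)
)

day_a   <- classes$Day                         # column by name
day_b   <- classes[["Day"]]                    # same column, bracket form
row_2   <- classes[2, ]                        # second row, still a data frame
lim_vec <- classes[, "Limit"]                  # one column comes back as a vector
lim_df  <- classes[, "Limit", drop = FALSE]    # drop = FALSE keeps it a data frame
block   <- classes[c(1, 3), c("Class", "Limit")]   # several rows and columns
small   <- classes[classes$Limit < 100, ]      # rows chosen by a logical condition

print(row_2)
print(block)
print(small)
```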

In [2]:
MXB107_ClassInfo[1,2]

In [3]:
MXB107_ClassInfo[["Class"]]

In [None]:
MXB107_ClassInfo[1,c("Class","Day")] #Access row 1 for column Class and Day

     Class Day
1 LEC01 01 Wed


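As with a matrix that has column names, you can access a column of a data frame by its name. The `$` shortcut used below works for data frames and lists, but not for matrices.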

In [5]:
MXB107_ClassInfo$Location

##### Modifying a dataframe

Create a new column

In [None]:
MXB107_ClassInfo$Semester = "Semester 2" # Creates a new column; the single value "Semester 2" is recycled to every row.

In [7]:
str(MXB107_ClassInfo)

'data.frame':	12 obs. of  9 variables:
 $ Class         : chr  "LEC01 01" "LEC01 01" "PRC01 01" "PRC01 01" ...
 $ Type          : chr  "Lecture (Internal)" "Lecture (Online)" "Practical (Online)" "Practical (Internal)" ...
 $ Day           : chr  "Wed" "Wed" "Wed" "Thu" ...
 $ Location      : chr  "GP B117" "Online" "Online" "GP D413" ...
 $ Limit         : num  240 1000 30 35 30 35 25 30 35 35 ...
 $ Teaching_Staff: chr  "Chris Drovandi" "Chris Drovandi" "Narayan Srinivasan" "Narayan Srinivasan" ...
 $ From          : chr  "11" "11" "16" "16" ...
 $ To            : chr  "13" "13" "18" "18" ...
 $ Semester      : chr  "Semester 2" "Semester 2" "Semester 2" "Semester 2" ...


Modify a column

In [8]:
MXB107_ClassInfo$From = as.numeric(MXB107_ClassInfo$From)
MXB107_ClassInfo$To = as.numeric(MXB107_ClassInfo$To)
# These two lines convert the From and To columns from character to numeric.

In [10]:
str(MXB107_ClassInfo)

'data.frame':	12 obs. of  9 variables:
 $ Class         : chr  "LEC01 01" "LEC01 01" "PRC01 01" "PRC01 01" ...
 $ Type          : chr  "Lecture (Internal)" "Lecture (Online)" "Practical (Online)" "Practical (Internal)" ...
 $ Day           : chr  "Wed" "Wed" "Wed" "Thu" ...
 $ Location      : chr  "GP B117" "Online" "Online" "GP D413" ...
 $ Limit         : num  240 1000 30 35 30 35 25 30 35 35 ...
 $ Teaching_Staff: chr  "Chris Drovandi" "Chris Drovandi" "Narayan Srinivasan" "Narayan Srinivasan" ...
 $ From          : num  11 11 16 16 16 9 14 9 9 11 ...
 $ To            : num  13 13 18 18 18 11 16 11 11 13 ...
 $ Semester      : chr  "Semester 2" "Semester 2" "Semester 2" "Semester 2" ...


Modify a row

In [12]:
print(MXB107_ClassInfo[1,])

MXB107_ClassInfo[1,] = list("1 LEC01 01", "Lecture", "Wed", "GP B117", 240, "Chris Drovandi", 11, 13, "Semester 2")
print(MXB107_ClassInfo[1,])

     Class               Type Day Location Limit Teaching_Staff From To
1 LEC01 01 Lecture (Internal) Wed  GP B117   240 Chris Drovandi   11 13
    Semester
1 Semester 2
       Class    Type Day Location Limit Teaching_Staff From To   Semester
1 1 LEC01 01 Lecture Wed  GP B117   240 Chris Drovandi   11 13 Semester 2


In [13]:
print(MXB107_ClassInfo)

        Class                 Type Day Location Limit         Teaching_Staff
1  1 LEC01 01              Lecture Wed  GP B117   240         Chris Drovandi
2    LEC01 01     Lecture (Online) Wed   Online  1000         Chris Drovandi
3    PRC01 01   Practical (Online) Wed   Online    30     Narayan Srinivasan
4    PRC01 01 Practical (Internal) Thu  GP D413    35     Narayan Srinivasan
5    PRC01 02   Practical (Online) Thu   Online    30              Oliver Vu
6    PRC01 03 Practical (Internal) Thu  GP G216    35       Minh Long Nguyen
7    PRC01 07 Practical (Internal) Thu  GP S520    25             Ryan Kelly
8    PRC01 08 Practical (Internal) Thu  GP S517    30 Nicholas Gecks-Preston
9    PRC01 02 Practical (Internal) Fri  GP G216    35              Oliver Vu
10   PRC01 04 Practical (Internal) Fri  GP G216    35         Arwen Nugteren
11   PRC01 05 Practical (Internal) Fri  GP S502    35         Arwen Nugteren
12   PRC01 06 Practical (Internal) Fri  GP S519    35       Minh Long Nguyen

You can also modify a single entry, or any smaller block of entries, in a data frame.

In [14]:
MXB107_ClassInfo[1,"From"] = 12

#### **Common Functions for Working with Data Frames**

- `str(df)` — displays the structure of the data frame  
- `summary(df)` — provides summary statistics for each column  
- `nrow(df)` and `ncol(df)` — return the number of rows and columns  
- `head(df)` and `tail(df)` — show the first or last few rows  
- `names(df)` or `colnames(df)` — get or set the column names  
- `subset(df, condition)` — extract rows that meet a logical condition  

In [15]:
subset(MXB107_ClassInfo, Teaching_Staff == "Minh Long Nguyen") # column names can be used directly inside subset()

      Class                 Type Day Location Limit   Teaching_Staff From To   Semester
6  PRC01 03 Practical (Internal) Thu  GP G216    35 Minh Long Nguyen    9 11 Semester 2
12 PRC01 06 Practical (Internal) Fri  GP S519    35 Minh Long Nguyen   15 17 Semester 2
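`subset()` is a convenience wrapper: the same rows can be selected with plain logical indexing, and its `select` argument picks columns at the same time. A sketch on a small made-up data frame (`staff` and its values are hypothetical):

```r
staff <- data.frame(
  Class          = c("PRC01", "PRC02", "PRC03", "PRC04"),
  Teaching_Staff = c("A", "B", "B", "A"),
  Limit          = c(35, 30, 35, 25)
)

# Rows taught by "B" with a limit above 30, keeping two columns
via_subset <- subset(staff, Teaching_Staff == "B" & Limit > 30,
                     select = c(Class, Limit))

# The same selection written with bracket indexing
via_index <- staff[staff$Teaching_Staff == "B" & staff$Limit > 30,
                   c("Class", "Limit")]

print(via_subset)
print(head(staff, 2))   # first two rows
print(summary(staff$Limit))
```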


### Working with files in R

In [17]:
getwd() # Show the current working directory

In [None]:
dir.create("datasets") # Create a new directory named datasets

In [None]:
setwd("datasets") # Change the working directory; setwd() needs a path, here the directory created above

In [20]:
list.files() #List all the files and directories in our current working directory
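The same functions can be combined into a small write-and-read round trip. To avoid touching the real working directory, this sketch works inside `tempdir()`; the folder name `datasets_demo`, the file name `toy.csv`, and the data are all made up for illustration:

```r
# Build a path inside the session's temporary directory
work_dir <- file.path(tempdir(), "datasets_demo")
dir.create(work_dir, showWarnings = FALSE)

# Write a small data frame to CSV and read it back
csv_path <- file.path(work_dir, "toy.csv")
toy <- data.frame(x = 1:3, y = c("a", "b", "c"))
write.csv(toy, csv_path, row.names = FALSE)

files    <- list.files(work_dir)   # what is in the folder now
toy_back <- read.csv(csv_path)

print(files)
print(toy_back)

# Clean up the temporary folder
unlink(work_dir, recursive = TRUE)
```

Using `file.path()` rather than pasting strings keeps the path separators correct across operating systems.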

### Testing in R

In [26]:
# "kurtosis" and "skewness" are not available as CRAN packages (the test cell below treats them as utility functions), so they are left out of the install list
install.packages(c("ggplot2", "tidyr", "dplyr", "stringr", "magrittr"))



The downloaded binary packages are in
	/var/folders/qn/zzlz3xfn3sg_7htk7xqrfzfw0000gn/T//Rtmp9yTn0Y/downloaded_packages


In [30]:

if (!require("testthat")) install.packages("testthat"); library("testthat")

test_that("Test if all packages have been loaded", {

   expect_true(all(c("ggplot2", "tidyr", "dplyr", "stringr", "magrittr") %in% loadedNamespaces()))

})

# test_that("Test if all utility functions have been loaded", {
#   expect_true(exists("skewness"))
#   expect_true(exists("kurtosis"))
# })

-- Failure: Test if all packages have been loaded ------------------------------
all(...) is not TRUE

`actual`:   FALSE
`expected`: TRUE 

ERROR: Error:
! Test failed
