# Moving from Excel to R

Many of Excel's functions and features have direct equivalents in R (and Python).

This example notebook reviews several key Excel functions and shows their R equivalents.

Perhaps the biggest difference is that the data must be read into R before it can be queried.

Once the data is read in, many common data functions are available.

Overview of this notebook:

- Reading in Excel in R
- Single vector math
    - Min
    - Max
    - Mean
    - Sum
    - Count
    - Unique
- Multiple vector math
    - Row/Column based math (e.g., sum of column)
- Creating new columns in R
    - Sequence of integers
    - Sample of floats
- Saving your work
- Reloading your saved work

# Reading in Excel in R

In [1]:
# Install packages
install.packages("tidyverse")


The downloaded binary packages are in
	/var/folders/9q/yp0trhm570d82rk1lfh67tq00000gn/T//RtmpDTP0Ai/downloaded_packages


In [2]:
# Load libraries
library(tidyverse)
library(readxl)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.4     ✔ dplyr   1.0.7
✔ tidyr   1.1.4     ✔ stringr 1.4.0
✔ readr   2.0.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [3]:
# Read in simple data for example
df = read_excel("../../Data/Excel_data.xlsx")
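When a workbook has several sheets, it can help to list them before reading one in. The following is a minimal sketch that uses the example workbook bundled with the readxl package (`readxl_example("datasets.xlsx")`) rather than this notebook's own file, so the sheet name `mtcars` is an assumption tied to that example file.

```r
library(readxl)

# Path to an example workbook that ships with readxl (not this notebook's data)
path <- readxl_example("datasets.xlsx")

# List the sheet names, much like looking at the tabs in Excel
sheet_names <- excel_sheets(path)
print(sheet_names)

# Read one sheet by name; n_max limits how many data rows are read
first_rows <- read_excel(path, sheet = "mtcars", n_max = 5)
print(first_rows)
```

If no `sheet` is given, `read_excel()` reads the first sheet, which is what the cell above relies on.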

In [4]:
# View data
df

Integers,Random_numbers,Categorical_data
<dbl>,<dbl>,<chr>
1,0.84824357,Group1
2,0.09386627,Group2
3,0.01546409,Group3
4,0.88211509,Group1
5,0.44850776,Group2
6,0.04717416,Group3
7,0.83961475,Group1
8,0.50393142,Group2
9,0.12537182,Group3
10,0.76823077,Group2


In [5]:
# List column names
names(df)
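Beyond column names, a few base R functions give a quick overview of a table, roughly what scrolling through a sheet does in Excel. The sketch below rebuilds the example data by hand so that it runs without the Excel file; in the notebook itself `df` comes from `read_excel()`, where `Integers` is stored as a double rather than an integer.

```r
# Rebuild the example data shown above (values copied from the notebook output)
df <- data.frame(
  Integers = 1:10,
  Random_numbers = c(0.84824357, 0.09386627, 0.01546409, 0.88211509, 0.44850776,
                     0.04717416, 0.83961475, 0.50393142, 0.12537182, 0.76823077),
  Categorical_data = c("Group1", "Group2", "Group3", "Group1", "Group2",
                       "Group3", "Group1", "Group2", "Group3", "Group2")
)

table_dims <- dim(df)        # number of rows, then number of columns
print(table_dims)
str(df)                      # column names, types, and the first few values
print(head(df, 3))           # first three rows
print(summary(df$Random_numbers))  # min, quartiles, mean, and max in one call
```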

# Single Vector Math in R

In [6]:
# Get min of Integers column
min(df$Integers)

In [7]:
# Get max of Integers column
max(df$Integers)

In [8]:
# Length of a vector
length(df$Integers)

In [9]:
# Mean of random numbers column
mean(df$Random_numbers)

In [23]:
# Sum of random numbers in column
sum(df$Random_numbers)
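One difference from Excel is worth knowing here: Excel's `SUM` and `AVERAGE` skip empty cells, while R's `sum()` and `mean()` return `NA` if any value is missing. The example data has no blanks, so the sketch below uses a small made-up vector `x` to show the `na.rm = TRUE` argument.

```r
# A made-up vector with one missing value (blank Excel cells are read in as NA)
x <- c(1, 2, NA, 4)

total_with_na <- sum(x)                # NA, because one value is missing
total <- sum(x, na.rm = TRUE)          # 7, missing value dropped first
average <- mean(x, na.rm = TRUE)       # 7 / 3, the mean of the 3 remaining values
n_missing <- sum(is.na(x))             # 1, how many values are missing

print(c(total_with_na, total, average, n_missing))
```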

In [10]:
# List of unique items in column
unique(df$Categorical_data)

In [11]:
# Counts of each group
dplyr::count(df, Categorical_data, sort = TRUE)

Categorical_data,n
<chr>,<int>
Group2,4
Group1,3
Group3,3
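The same kind of counting can be done in base R, along with rough equivalents of Excel's `COUNTIF`, `SUMIF`, and `AVERAGEIF`. This is a sketch on a hand-built copy of the example data, not the only way to do it.

```r
# Rebuild the example data shown above (values copied from the notebook output)
df <- data.frame(
  Integers = 1:10,
  Random_numbers = c(0.84824357, 0.09386627, 0.01546409, 0.88211509, 0.44850776,
                     0.04717416, 0.83961475, 0.50393142, 0.12537182, 0.76823077),
  Categorical_data = c("Group1", "Group2", "Group3", "Group1", "Group2",
                       "Group3", "Group1", "Group2", "Group3", "Group2")
)

# Counts per group (base R alternative to dplyr::count)
group_counts <- table(df$Categorical_data)
print(group_counts)

# COUNTIF-style: how many rows are in Group2
n_group2 <- sum(df$Categorical_data == "Group2")  # 4

# SUMIF-style: total of Random_numbers for Group1 only
sum_group1 <- sum(df$Random_numbers[df$Categorical_data == "Group1"])

# AVERAGEIF-style for every group at once
group_means <- tapply(df$Random_numbers, df$Categorical_data, mean)
print(group_means)
```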


# Creating New Columns: Multiple Vector Math in R

In [12]:
# Create a new column by adding the two numeric columns
df$NewColumn = df$Integers + df$Random_numbers
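The overview also mentions row and column based math, such as the sum of a column. A minimal sketch with base R's `colSums()` and `rowSums()` follows, again on a hand-built copy of the example data; the variable names `col_totals` and `row_totals` are only illustrative.

```r
# Rebuild the numeric columns of the example data
df <- data.frame(
  Integers = 1:10,
  Random_numbers = c(0.84824357, 0.09386627, 0.01546409, 0.88211509, 0.44850776,
                     0.04717416, 0.83961475, 0.50393142, 0.12537182, 0.76823077)
)

numeric_cols <- df[, c("Integers", "Random_numbers")]

# Column totals, like =SUM(A2:A11) under each column in Excel
col_totals <- colSums(numeric_cols)
print(col_totals)

# Row totals, like =SUM(A2:B2) filled down a column
row_totals <- rowSums(numeric_cols)

# The row totals match the element-wise addition used for NewColumn
df$NewColumn <- df$Integers + df$Random_numbers
print(all.equal(unname(row_totals), df$NewColumn))
```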

In [20]:
# View data (this cell was run after the newInts and newFloats cells below, so those columns already appear)
df

Integers,Random_numbers,Categorical_data,NewColumn,newInts,newFloats
<dbl>,<dbl>,<chr>,<dbl>,<int>,<dbl>
1,0.84824357,Group1,1.848244,1,0.661376456
2,0.09386627,Group2,2.093866,2,0.138990269
3,0.01546409,Group3,3.015464,3,0.099125593
4,0.88211509,Group1,4.882115,4,0.887230776
5,0.44850776,Group2,5.448508,5,0.007663397
6,0.04717416,Group3,6.047174,6,0.654984571
7,0.83961475,Group1,7.839615,7,0.082449667
8,0.50393142,Group2,8.503931,8,0.232198527
9,0.12537182,Group3,9.125372,9,0.147984678
10,0.76823077,Group2,10.768231,10,0.242302226


In [14]:
# Create a new column of integers
df$newInts <- seq(1, 10)

In [15]:
# Create new col of floats
df$newFloats <- runif(10, 0, 1)
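Because `runif()` draws new random numbers on every run, the `newFloats` values will differ each time the notebook is executed, much like Excel's `RAND()` recalculating. Setting a seed first makes the draw repeatable. The seed value 42 below is arbitrary.

```r
# Same seed, same "random" numbers
set.seed(42)
a <- runif(10, min = 0, max = 1)

set.seed(42)
b <- runif(10, min = 0, max = 1)

print(identical(a, b))  # TRUE

# seq_len(n) is another way to build the integers 1 to n
ints <- seq_len(10)
print(ints)
```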

In [16]:
# View data
df

Integers,Random_numbers,Categorical_data,NewColumn,newInts,newFloats
<dbl>,<dbl>,<chr>,<dbl>,<int>,<dbl>
1,0.84824357,Group1,1.848244,1,0.661376456
2,0.09386627,Group2,2.093866,2,0.138990269
3,0.01546409,Group3,3.015464,3,0.099125593
4,0.88211509,Group1,4.882115,4,0.887230776
5,0.44850776,Group2,5.448508,5,0.007663397
6,0.04717416,Group3,6.047174,6,0.654984571
7,0.83961475,Group1,7.839615,7,0.082449667
8,0.50393142,Group2,8.503931,8,0.232198527
9,0.12537182,Group3,9.125372,9,0.147984678
10,0.76823077,Group2,10.768231,10,0.242302226


# Saving the data

In [17]:
# Save new df to a csv format file
write.csv(df, '../../Data/Example_output.csv')
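By default `write.csv()` also writes the row names as an unnamed first column, which shows up as an extra column when the file is read back. The sketch below compares the default with `row.names = FALSE`, using temporary files and a small made-up data frame so that it does not touch the notebook's output file.

```r
# Small made-up data frame for the comparison
df <- data.frame(
  Integers = 1:3,
  Categorical_data = c("Group1", "Group2", "Group3")
)

path_default <- tempfile(fileext = ".csv")
path_clean <- tempfile(fileext = ".csv")

write.csv(df, path_default)                    # row names written as first column
write.csv(df, path_clean, row.names = FALSE)   # only the data columns are written

with_row_names <- read.csv(path_default)
without_row_names <- read.csv(path_clean)

print(names(with_row_names))     # one extra, auto-named column comes first
print(names(without_row_names))  # only the original columns
```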

# Reading in and checking the output

In [21]:
# Read in csv file that was just output
df2 = read_csv('../../Data/Example_output.csv')

New names:
* `` -> ...1

Rows: 10 Columns: 7

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Categorical_data
dbl (6): ...1, Integers, Random_numbers, NewColumn, newInts, newFloats


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



In [22]:
# View the data
df2

...1,Integers,Random_numbers,Categorical_data,NewColumn,newInts,newFloats
<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>
1,1,0.84824357,Group1,1.848244,1,0.661376456
2,2,0.09386627,Group2,2.093866,2,0.138990269
3,3,0.01546409,Group3,3.015464,3,0.099125593
4,4,0.88211509,Group1,4.882115,4,0.887230776
5,5,0.44850776,Group2,5.448508,5,0.007663397
6,6,0.04717416,Group3,6.047174,6,0.654984571
7,7,0.83961475,Group1,7.839615,7,0.082449667
8,8,0.50393142,Group2,8.503931,8,0.232198527
9,9,0.12537182,Group3,9.125372,9,0.147984678
10,10,0.76823077,Group2,10.768231,10,0.242302226


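Here, in df2, we can see that the new columns were saved and reloaded correctly. The extra first column, `...1`, holds the row names that `write.csv()` writes by default; `read_csv()` gives it that name because its header is empty.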