## Demo of sourcing an R function from a script

Here we demonstrate how to use the `source` function to load a function defined in another R script in this repository. The file `../R/count_classes.R` defines a function named `count_classes` (the function name doesn't have to match the file name, but matching them often makes sense), and sourcing the file makes the function available in this notebook. Below, we use it to calculate the number of observations in each class of a data set.
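
The contents of `../R/count_classes.R` are not shown in this notebook. As a rough sketch only, a file like it might define the function along the following lines. This assumes the dplyr package is installed, that the function takes a data frame and an unquoted column name, and that it returns one row per class with columns `class` and `count`; the argument names `data_frame` and `class_col` are hypothetical.

```r
# Hypothetical sketch of what a file such as ../R/count_classes.R could contain.
# The real file is not shown in this notebook, so treat every detail as an assumption.

# Count the number of observations in each class of a data frame.
#   data_frame: a data frame or tibble (hypothetical argument name)
#   class_col:  unquoted name of the column holding the class labels (hypothetical)
# Returns a tibble with one row per class and the columns `class` and `count`.
count_classes <- function(data_frame, class_col) {
  if (!is.data.frame(data_frame)) {
    stop("`data_frame` must be a data frame or a data frame extension such as a tibble")
  }
  # {{ }} passes the unquoted column name through to dplyr;
  # naming the grouping expression creates a column called `class`.
  grouped <- dplyr::group_by(data_frame, class = {{ class_col }})
  # n() counts the rows in each group; .groups = "drop" returns an ungrouped result.
  dplyr::summarize(grouped, count = dplyr::n(), .groups = "drop")
}
```

Because the file only defines the function and does not call it, sourcing it has no visible effect other than making `count_classes` available in the session.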

In [1]:
options(repr.matrix.max.rows = 6)
library(tidyverse)
source("../R/count_classes.R")

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


Here is some data (the Wisconsin Breast Cancer data set, originally from the [UCI Machine Learning Repository](https://archive-beta.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+diagnostic)) for which we would like to calculate the number of observations in each class:

In [2]:
cancer <- read_csv("https://raw.githubusercontent.com/UBC-DSCI/introduction-to-datascience/main/data/wdbc.csv")
cancer

Rows: 569 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Class
dbl (11): ID, Radius, Texture, Perimeter, Area, Smoothness, Compactness, Con...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| ID | Class | Radius | Texture | Perimeter | Area | Smoothness | Compactness | Concavity | Concave_Points | Symmetry | Fractal_Dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 842302 | M | 1.096100 | -2.0715123 | 1.268817 | 0.9835095 | 1.5670875 | 3.2806281 | 2.65054179 | 2.5302489 | 2.215565542 | 2.2537638 |
| 842517 | M | 1.828212 | -0.3533215 | 1.684473 | 1.9070303 | -0.8262354 | -0.4866435 | -0.02382489 | 0.5476623 | 0.001391139 | -0.8678888 |
| 84300903 | M | 1.578499 | 0.4557859 | 1.565126 | 1.5575132 | 0.9413821 | 1.0519999 | 1.36227979 | 2.0354398 | 0.938858720 | -0.3976580 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 926954 | M | 0.7016669 | 2.043775 | 0.6720844 | 0.5774446 | -0.839745 | -0.03864567 | 0.04654658 | 0.1056844 | -0.8084058 | -0.8947996 |
| 927241 | M | 1.8367249 | 2.334403 | 1.9807813 | 1.7336925 | 1.524426 | 3.26926717 | 3.29404559 | 2.6565283 | 2.1353154 | 1.0427779 |
| 92751 | B | -1.8068114 | 1.220718 | -1.8127934 | -1.3466044 | -3.109349 | -1.14974083 | -1.11389274 | -1.2607103 | -0.8193490 | -0.5605392 |


To calculate the number of observations in each class, we will use the `count_classes` function from the `../R/count_classes.R` file that we sourced in the first code cell of this notebook:

In [3]:
count_classes(cancer, Class)

| class | count |
|---|---|
| `<chr>` | `<int>` |
| B | 357 |
| M | 212 |


Ta da! Now isn't that easier for a human trying to understand the analysis to read than if we had included the source code for that function in this notebook?
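
To try the same pattern outside this repository, the sketch below writes a small script to a temporary file, sources it, and then calls the function it defines. It uses base R only. The function name `count_labels` and the script contents are hypothetical stand-ins for illustration; they are not the `count_classes` function used above.

```r
# Write a small script to a temporary file, standing in for a file such as
# R/count_classes.R. `count_labels` is a hypothetical name used only here.
script_path <- tempfile(fileext = ".R")
writeLines(
  c(
    "count_labels <- function(labels) {",
    "  counts <- table(labels)",
    "  data.frame(class = names(counts), count = as.integer(counts))",
    "}"
  ),
  script_path
)

# source() parses and evaluates the file. With the default local = FALSE,
# the expressions are evaluated in the global environment, so count_labels
# becomes available in the session after this call.
source(script_path)

# Call the sourced function on a small vector of class labels.
count_labels(c("B", "M", "B", "B"))
```

The notebook itself only needs the `source()` call and the function call, which is what keeps the analysis short to read.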