## Demo of importing a Python function from a script

Here we demonstrate how to use the `import` statement 
to read in a function 
stored in another Python script in this repository. 
The function in the file `../src/count_classes.py` is named `count_classes` 
(it doesn't have to have the same name as the file, 
but it often makes sense to do this), 
and importing it from that file lets us call the function in this notebook. 
We will use it below to calculate the number of observations 
in each class of a data set.
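
For orientation, here is a minimal sketch of what a script like this might contain. It is not the actual contents of the file in this repository: the parameter names `data_frame` and `class_col`, the input checks, and the `groupby` approach are all assumptions, chosen only so that the result has the `class` and `count` columns shown later in this notebook.

```python
import pandas as pd


def count_classes(data_frame, class_col):
    """Count the number of observations in each class of a data frame.

    Hypothetical sketch: the real function in the repository may differ.
    """
    # Fail early with a clear message if the inputs are not what we expect
    if not isinstance(data_frame, pd.DataFrame):
        raise TypeError("data_frame must be a pandas DataFrame")
    if class_col not in data_frame.columns:
        raise KeyError(f"column {class_col!r} not found in data_frame")

    # Group by the class column, count the rows in each group,
    # and return a tidy data frame with columns 'class' and 'count'
    return (
        data_frame.groupby(class_col)
        .size()
        .reset_index(name="count")
        .rename(columns={class_col: "class"})
    )


# Small made-up data frame, used only to try the function out
toy = pd.DataFrame({"Class": ["M", "B", "B"], "Radius": [1.1, 0.4, 0.6]})
print(count_classes(toy, "Class"))
```

Keeping the function in its own file means this code is written (and can be tested) once, and the notebook only needs the one-line call.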

In [1]:
import pandas as pd
import sys
import os

# Import the count_classes function from the src folder
sys.path.append('..')
from src.count_classes import count_classes
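
Note that `sys.path.append('..')` is resolved relative to the current working directory, so the import above only works when the notebook is run from a folder directly below the repository root. A slightly more explicit variant is sketched below. It still assumes the working directory is one level below the root, and the name `repo_root` is ours, not something defined elsewhere in the repository.

```python
import sys
from pathlib import Path

# Assumption: the notebook's working directory sits one level below
# the repository root (the folder that contains src/)
repo_root = Path.cwd().resolve().parent

# Add the root to the module search path only once,
# so that re-running the cell does not append duplicates
if str(repo_root) not in sys.path:
    sys.path.append(str(repo_root))
```

After this cell, `from src.count_classes import count_classes` would be looked up under `repo_root`, just as with the shorter version above.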

Here's some data (the Wisconsin Breast Cancer data set, originally from the [UCI machine learning repository](https://archive-beta.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+diagnostic)) where we would like to calculate the number of observations in each class:

In [2]:
cancer = pd.read_csv("https://raw.githubusercontent.com/UBC-DSCI/introduction-to-datascience/main/data/wdbc.csv")
cancer

Unnamed: 0,ID,Class,Radius,Texture,Perimeter,Area,Smoothness,Compactness,Concavity,Concave_Points,Symmetry,Fractal_Dimension
0,842302,M,1.096100,-2.071512,1.268817,0.983510,1.567087,3.280628,2.650542,2.530249,2.215566,2.253764
1,842517,M,1.828212,-0.353322,1.684473,1.907030,-0.826235,-0.486643,-0.023825,0.547662,0.001391,-0.867889
2,84300903,M,1.578499,0.455786,1.565126,1.557513,0.941382,1.052000,1.362280,2.035440,0.938859,-0.397658
3,84348301,M,-0.768233,0.253509,-0.592166,-0.763792,3.280667,3.399917,1.914213,1.450431,2.864862,4.906602
4,84358402,M,1.748758,-1.150804,1.775011,1.824624,0.280125,0.538866,1.369806,1.427237,-0.009552,-0.561956
...,...,...,...,...,...,...,...,...,...,...,...,...
564,926424,M,2.109139,0.720838,2.058974,2.341795,1.040926,0.218868,1.945573,2.318924,-0.312314,-0.930209
565,926682,M,1.703356,2.083301,1.614511,1.722326,0.102368,-0.017817,0.692434,1.262558,-0.217473,-1.057681
566,926954,M,0.701667,2.043775,0.672084,0.577445,-0.839745,-0.038646,0.046547,0.105684,-0.808406,-0.894800
567,927241,M,1.836725,2.334403,1.980781,1.733693,1.524426,3.269267,3.294046,2.656528,2.135315,1.042778


To calculate the number of observations in each class, we will use the `count_classes` function from the `../src/count_classes.py` file that we imported in the first code cell of this notebook:

In [3]:
count_classes(cancer, 'Class')

Unnamed: 0,class,count
0,B,357
1,M,212


Ta da! Now isn't that easier for a human trying to understand the analysis to read than if we had included the source code for that function in this notebook?