# Introduction - Writing R in Python

## Motivation

This notebook introduces the [rpy2 package](https://rpy2.github.io/doc/latest/html/index.html), which I will use to switch interactively between R objects, functions, and methods and their Python counterparts. While it is already possible to run R scripts from within Python, rpy2 provides a more interactive way to do so.

At the moment, there is little to no rpy2 in the folder, but I will add rpy2 examples as the opportunity or need arises.

## Environment set-up 

I recommend using Anaconda and following, more or less, the steps laid out below if you are new; otherwise, you already know what to do. Note that rpy2 is installed via pip here, and mixing pip and conda in one environment can break conda's dependency tracking, so __conda should not be used to manage packages in the virtual environment after rpy2 is installed__. Thus, install as many of the required packages as possible __before__ installing rpy2. Pip will still work for installing packages later, of course, but Anaconda provides a stronger guarantee of compatibility.

In my case, I had (unwisely) already installed the packages I needed in my base installation, so my set-up went as follows.

* Update conda: ```conda update conda```
* Update packages within base environment: ```conda update --all```
* Clone the base environment to create a new environment: ```conda create --name rpy_env --clone base``` OR create a fresh environment: ```conda create --name rpy_env```
* Activate new environment: ```conda activate rpy_env```
* Install whatever other packages you need now via conda-forge: ```conda install -c conda-forge <package>``` (```jupyterlab```, ```numpy```, ```pandas```, ```seaborn```)
* Install rpy2 via pip: ```pip install rpy2```
* Add the environment to Jupyter as a notebook kernel: ```python -m ipykernel install --user --name rpy_env --display-name "rpy_env"```

Reminder: the standard packages needed are:
* Jupyter Lab
* Numpy
* Seaborn
* Pandas

NOTE: Each folder may have additional package requirements and instructions. Look at the README.md file in each folder if the notebook doesn't run!

At this point, you should be able to launch JupyterLab and create a notebook that uses the rpy_env kernel.

__ref:__ Compiled from stackexchange and Anaconda documentation
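
Before opening a notebook, it can help to confirm that the new environment actually contains the packages listed above. The following is a minimal sketch using only the standard library. The `REQUIRED` list and the `missing_packages` helper are names made up for this illustration, and the entries are the import names I assume for each package (for example `jupyterlab` for Jupyter Lab).

```python
import importlib.util

# Assumed import names for the packages listed above, plus rpy2 itself.
REQUIRED = ["jupyterlab", "numpy", "pandas", "seaborn", "rpy2"]


def missing_packages(names):
    """Return the entries of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it with the Python interpreter of the activated `rpy_env`. If rpy2 is importable, running `python -m rpy2.situation` should additionally report which R installation rpy2 found.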

## Working with rpy2

First, import the required packages

In [1]:
# Pandas / Numpy <-> R Dataframe / Array

import pandas as pd
import numpy as np
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

### Send a Python Object to an R Object

Here I create a pandas DataFrame from a NumPy array, then use rpy2's pandas conversion module (```pandas2ri```) to turn it into an R data frame. There is a direct NumPy conversion as well, which works in basically the same way; see the documentation.

In [2]:
# Create a pandas DataFrame -> send to R data frame

# use numpy's random function to create a dataframe of random values
pd_df = pd.DataFrame(np.random.rand(5,5))

# convert the pandas DataFrame into an R data frame
with localconverter(ro.default_converter + pandas2ri.converter):
    r_df = ro.conversion.py2rpy(pd_df)
    
# import r base to use the summary function
base = importr('base')

# use R's summary function on an R Dataframe and print the results
print(base.summary(r_df))

       0                1                 2                3         
 Min.   :0.3910   Min.   :0.08131   Min.   :0.1440   Min.   :0.1983  
 1st Qu.:0.4988   1st Qu.:0.10158   1st Qu.:0.2640   1st Qu.:0.3194  
 Median :0.6404   Median :0.52276   Median :0.5576   Median :0.4600  
 Mean   :0.6262   Mean   :0.45600   Mean   :0.4278   Mean   :0.5053  
 3rd Qu.:0.7297   3rd Qu.:0.67382   3rd Qu.:0.5578   3rd Qu.:0.7244  
 Max.   :0.8710   Max.   :0.90051   Max.   :0.6156   Max.   :0.8242  
       4          
 Min.   :0.08903  
 1st Qu.:0.31151  
 Median :0.35192  
 Mean   :0.43158  
 3rd Qu.:0.51563  
 Max.   :0.88982  



### Use an R function on a pandas DataFrame
Here we define a function in R and call it from Python via ```Series.apply()``` to add 5 to each element in one column of our pandas DataFrame. We then apply the same function to the R data frame ```r_df``` and convert the result to a pandas DataFrame.

Note the need to use the ```localconverter()``` context manager in each case! Omitting it will result in scrambled data. There may be other ways around this, but for this repository I will use this method.

In [3]:
# use an R function on a pandas DataFrame

# rename df columns to make them easy to call
pd_df.columns = ['zero','one','two','three','four']

# define a function in r
add_five_r = ro.r('''
    add_five <- function(x) {
        return(x + 5)
    }
''')

# apply the function to every element of the "one" column in the pandas df
pd_df['one'] = pd_df['one'].apply(lambda x: add_five_r(x)[0])
print(pd_df)

# use the function defined in R to add 5 to the R data frame; the converter returns a pandas df
with localconverter(ro.default_converter + pandas2ri.converter):
    df_r_plus_five = add_five_r(r_df)
df_r_plus_five

       zero       one       two     three      four
0  0.729691  5.900511  0.143963  0.198339  0.311507
1  0.870968  5.522759  0.557564  0.724392  0.089032
2  0.391015  5.081306  0.615648  0.824165  0.515625
3  0.640358  5.101580  0.557766  0.460047  0.889818
4  0.498767  5.673823  0.263953  0.319388  0.351916


          0         1         2         3         4
0  5.729691  5.900511  5.143963  5.198339  5.311507
1  5.870968  5.522759  5.557564  5.724392  5.089032
2  5.391015  5.081306  5.615648  5.824165  5.515625
3  5.640358  5.101580  5.557766  5.460047  5.889818
4  5.498767  5.673823  5.263953  5.319388  5.351916


### Import data from R -> Use in Pandas
Here I import the iris dataset from R, convert it into a pandas DataFrame, and then print a table of the mean values by species using pandas' ```DataFrame.groupby()``` method.

In [6]:
# import data from R, send to Pandas dataframe

from rpy2.robjects.packages import data
import seaborn as sns

# import r's datasets library
datasets = importr('datasets')

with localconverter(ro.default_converter + pandas2ri.converter):
    iris = data(datasets).fetch('iris')['iris']

grouped = iris.groupby(['Species']).agg({'Sepal.Length': 'mean','Sepal.Width':'mean', 'Petal.Length':'mean', 'Petal.Width':'mean'})
grouped

            Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
Species
setosa             5.006        3.428         1.462        0.246
versicolor         5.936        2.770         4.260        1.326
virginica          6.588        2.974         5.552        2.026
