## RStudio on SageMaker Introduction
In collaboration with RStudio PBC, we are excited to announce the general 
availability of RStudio on Amazon SageMaker, the industry’s first fully managed 
RStudio Workbench IDE in the cloud. RStudio on SageMaker provides the familiar 
IDE that is known and loved throughout the R community.

### Benefits on SageMaker
RStudio on SageMaker authenticates users through IAM or SSO. Once 
authenticated, you assume your SageMaker execution role, whose IAM policies 
grant granular permissions to the AWS services and resources you are allowed 
to use.

This means that once authenticated, you can access S3 datasets, train and host 
models using SageMaker, launch AWS Glue jobs, and more without needing to 
re-authenticate within the IDE.

Additionally, among many other benefits, you can right-size the instance 
backing your RStudio session and take advantage of the full flexibility of the 
cloud.

### User EFS Mount
When onboarding a UserProfile to a SageMaker domain, a home directory is added 
to the domain's Amazon EFS (network) storage. This is your personal storage 
location where you can put code repositories, datasets, and other files. This 
EFS mount appears as your Home directory in the RStudio Files pane.
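A minimal sketch of working with the EFS-backed home directory from the R console. It assumes that `~` in your RStudio session resolves to the EFS home described above; the `projects/abalone` folder name is hypothetical.

```r
# Resolve the home directory; in RStudio on SageMaker this is assumed to be
# the user's EFS-backed home, shared with the other Studio IDEs.
home_dir <- path.expand("~")

# Create a hypothetical project folder under the home directory.
# recursive = TRUE creates intermediate folders; showWarnings = FALSE keeps
# the call quiet if the folder already exists.
project_dir <- file.path(home_dir, "projects", "abalone")
dir.create(project_dir, recursive = TRUE, showWarnings = FALSE)

# List what is currently stored at the top of the home directory.
print(list.files(home_dir))
```

Because the home directory is shared, a folder created this way should also be visible from Studio's Jupyter IDE.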

### Right IDE at the right time
This EFS home directory is shared across whichever Studio IDE you choose. In 
other words, you can use Studio's Jupyter or RStudio IDE with access to the 
same datasets and code repositories.

### Terminal
Within your RStudio session, you have access to a terminal in your container, 
where you can perform OS-level installs and use command-line programs such as 
`git`.
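Command-line programs available in the terminal can also be called from R code. A minimal sketch using base R's `system2()`; the `run_cli()` helper name is hypothetical, and the `git` usage line assumes `git` is installed in your container.

```r
# Hypothetical helper: run a command-line program and return its output
# (stdout and stderr combined). Stops if the program is not on the PATH or
# exits with a non-zero status. Arguments containing spaces should be
# wrapped in shQuote(), because system2() does not quote them.
run_cli <- function(cmd, args = character()) {
  if (!nzchar(Sys.which(cmd))) {
    stop(sprintf("'%s' was not found on the PATH", cmd))
  }
  # On a non-zero exit, system2() warns and attaches a "status" attribute.
  out <- suppressWarnings(system2(cmd, args, stdout = TRUE, stderr = TRUE))
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop(sprintf("'%s' exited with status %d", cmd, status))
  }
  out
}

# Example (assumes git is installed):
# run_cli("git", "--version")
```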

## Data Access
There are several methods to access data within the RStudio on SageMaker IDE. 

### Download to EFS
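Using OS tooling. The first cell below reads AWS access keys from a local 
`env.properties` file and exports them as environment variables, which the 
AWS CLI and the `aws.s3` package used later both read. The next cell creates 
a `dataset` folder and downloads the data with `wget`.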

In [17]:
install.packages("properties")
library(properties)

envProps <- read.properties("../env.properties")

Sys.setenv(
    "AWS_ACCESS_KEY_ID" = envProps$access_key, 
    "AWS_SECRET_ACCESS_KEY" = envProps$secret_key,
    "AWS_DEFAULT_REGION" = 'us-east-1')


Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

“incomplete final line found on '../env.properties'”


In [1]:
system("mkdir -p ./dataset/", intern=TRUE)
system("wget https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data -O ./dataset/abalone.csv", intern=TRUE)
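The same download can be done without shelling out to `mkdir` and `wget`. A minimal sketch using base R's `dir.create()` and `download.file()`; the `download_dataset()` helper name and its defaults are hypothetical.

```r
# Hypothetical helper: download a file into a local folder using base R only.
download_dataset <- function(url, dest_dir = "dataset", dest_name = basename(url)) {
  # Create the target folder if needed (equivalent of `mkdir -p`).
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest_path <- file.path(dest_dir, dest_name)
  # mode = "wb" avoids newline translation on Windows and is harmless elsewhere.
  status <- download.file(url, destfile = dest_path, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  dest_path
}

# Usage mirroring the wget call above:
# download_dataset(
#   "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
#   dest_name = "abalone.csv"
# )
```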

Using the AWS CLI

In [2]:
system("aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/abalone.csv ./dataset/", intern=TRUE)

### Utilize Native R Packages to read from Disk or HTTP

In [3]:
if (!'tidyverse' %in% installed.packages()) {install.packages('tidyverse')}
suppressWarnings(library(tidyverse))

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [4]:
df_http <- read_csv(file = 'http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', show_col_types = FALSE)
df_disk <- read_csv(file = 'dataset/abalone.csv', show_col_types = FALSE)
head(df_http)

M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8
F,0.53,0.415,0.15,0.7775,0.237,0.1415,0.33,20


In [5]:
head(df_disk)

M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8
F,0.53,0.415,0.15,0.7775,0.237,0.1415,0.33,20
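The abalone file has no header row, so in the outputs above the first record (`M,0.455,...`) was consumed as column names. A minimal sketch that supplies names explicitly. The snake_case names are our own, derived from the attribute list published with the UCI dataset (Sex, Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, Shell weight, Rings); the `read_abalone()` helper is hypothetical. With `readr`, the equivalent is passing `col_names = abalone_cols` to `read_csv()`.

```r
# Column names for the nine abalone attributes, in file order.
abalone_cols <- c("sex", "length", "diameter", "height", "whole_weight",
                  "shucked_weight", "viscera_weight", "shell_weight", "rings")

# Hypothetical helper: read the header-less abalone CSV, treating every
# line as data. `path` may be a file path or a connection.
read_abalone <- function(path) {
  read.csv(path, header = FALSE, col.names = abalone_cols,
           stringsAsFactors = FALSE)
}

# Usage with the file downloaded earlier:
# df_disk <- read_abalone("dataset/abalone.csv")
```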


### Utilize Python Boto3 or SageMaker SDK with Reticulate
First, load the `reticulate` library and import the `sagemaker` Python module.
Once the module is loaded, use R's `$` notation in place of Python's `.` 
notation to access its classes, methods, and attributes.

The `reticulate` package and the Python SDKs come pre-installed in the RStudio 
on SageMaker containers.
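As a small illustration of the `$`-for-`.` convention, here is a sketch using Python's standard-library `os.path` module instead of SageMaker, so it runs without AWS credentials. It assumes `reticulate` and a Python interpreter are available, as described above.

```r
library(reticulate)

# Python:               os.path.join("data", "abalone.csv")
# R via reticulate:     each "." in the attribute path becomes "$".
os <- import("os")
joined <- os$path$join("data", "abalone.csv")

# The Python str result is converted to a plain R character string.
print(joined)
```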

In [6]:
# Packages ----
suppressWarnings(library(reticulate))
path_to_python <- system("which python", intern = TRUE)
use_python(path_to_python)


# Python packages ----
sagemaker <- import("sagemaker")
class(sagemaker)

Let's get the default Amazon Simple Storage Service (S3) bucket for your session; `default_bucket()` creates it if it doesn't already exist.

In [7]:
session <- sagemaker$Session()
bucket <- session$default_bucket()
print(bucket)

[1] "sagemaker-us-east-1-482851446821"
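The bucket name above follows the SageMaker Python SDK's default naming convention, `sagemaker-{region}-{account_id}`. A sketch that rebuilds the expected name, for example to reference the bucket from code that doesn't load the SDK; the `default_bucket_name()` helper is hypothetical.

```r
# Hypothetical helper: build the default SageMaker bucket name for a region
# and a 12-digit AWS account ID.
default_bucket_name <- function(region, account_id) {
  stopifnot(grepl("^[0-9]{12}$", account_id))
  sprintf("sagemaker-%s-%s", region, account_id)
}

# With reticulate, the inputs could come from the session and STS, e.g.:
# region     <- session$boto_region_name
# account_id <- import("boto3")$client("sts")$get_caller_identity()$Account
```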


Upload the data to your default S3 bucket

In [8]:
abalone_on_s3_uri <- session$upload_data(path = 'dataset/abalone.csv', bucket = bucket, key_prefix = 'data')
print(abalone_on_s3_uri)

[1] "s3://sagemaker-us-east-1-482851446821/data/abalone.csv"
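`upload_data()` returns an `s3://bucket/key` URI, while many R functions, such as `aws.s3::s3read_using()`, take the bucket and object key as separate arguments. A minimal sketch of splitting the URI; the `parse_s3_uri()` helper is hypothetical.

```r
# Hypothetical helper: split an "s3://bucket/key" URI into bucket and key.
parse_s3_uri <- function(uri) {
  stopifnot(startsWith(uri, "s3://"))
  rest <- sub("^s3://", "", uri)
  slash <- regexpr("/", rest, fixed = TRUE)
  # A URI with no "/" after the bucket refers to the bucket root.
  if (slash < 0) return(list(bucket = rest, key = ""))
  list(bucket = substr(rest, 1, slash - 1),
       key    = substr(rest, slash + 1, nchar(rest)))
}

# Usage with the URI returned above:
# loc <- parse_s3_uri(abalone_on_s3_uri)
# loc$bucket  # bucket name
# loc$key     # "data/abalone.csv"
```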


### Utilize Native R Packages to read from S3
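The `aws.s3` library provides an `s3read_using()` function to load data 
directly into memory. The additional `aws.ec2metadata` library is intended to 
let `aws.s3` use your SageMaker execution role's credentials; in this 
environment it did not work (see the comment in the next cell), so credentials 
come from the environment variables set earlier.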

In [13]:
if (!'aws.s3' %in% installed.packages()) {install.packages('aws.s3')}

# Note: aws.ec2metadata does not work here because the VM uses IMDSv2
if (!'aws.ec2metadata' %in% installed.packages()) {install.packages('aws.ec2metadata')}
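Because instance metadata could not be used here, `aws.s3` depends on the `AWS_*` environment variables set earlier with `Sys.setenv()`. A minimal sketch of a check to run before any S3 call; the `missing_aws_env()` helper is hypothetical, and the variable list assumes static keys plus a default region.

```r
# Hypothetical helper: return the names of AWS environment variables that
# are unset or empty.
missing_aws_env <- function(vars = c("AWS_ACCESS_KEY_ID",
                                     "AWS_SECRET_ACCESS_KEY",
                                     "AWS_DEFAULT_REGION")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

unset_vars <- missing_aws_env()
if (length(unset_vars) > 0) {
  message("Missing AWS environment variables: ",
          paste(unset_vars, collapse = ", "))
}
```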

In [21]:
library(aws.s3)


df_s3 <- s3read_using(FUN = read.csv, object = "data/abalone.csv", bucket = bucket)
head(df_s3)

,M,X0.455,X0.365,X0.095,X0.514,X0.2245,X0.101,X0.15,X15
,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<int>
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
5,I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8
6,F,0.53,0.415,0.15,0.7775,0.237,0.1415,0.33,20


## Package Management
You can install packages with R's native `install.packages()` function as well 
as through the graphical interface in RStudio. When creating your domain, you 
can optionally set an RStudio Package Manager URL so your team can use 
internal repositories as well.
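To point `install.packages()` at an internal repository for a single session, you can set the `repos` option. A minimal sketch; the Package Manager URL below is a hypothetical placeholder for the one configured for your domain.

```r
# Hypothetical internal RStudio Package Manager URL; replace it with your
# team's repository URL.
rspm_url <- "https://packagemanager.example.com/cran/latest"

# List the internal repository first, with public CRAN as a fallback.
options(repos = c(RSPM = rspm_url, CRAN = "https://cloud.r-project.org"))

# Subsequent installs resolve against the repositories above, e.g.:
# install.packages("data.table")
```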

## Publishing to RStudio Connect
Publishing to RStudio Connect works as expected, and depending on your 
networking configuration, your domain can use RStudio Connect servers in a 
private subnet.
