R on Azure
Cloud computing has made data science and AI development easier than ever. This GitHub repository collects R packages, tools, and case studies for doing data science in R on the Azure cloud.
R packages and tools
These packages and tools are categorized into four groups, each representing a typical task that data scientists and AI developers frequently work on.
|Cloud resource operation and administration||Simplify interaction with the Azure cloud platform and the operation of Azure resources for various tasks.|
|Scalable and advanced analytics||Enable large-scale, parallel data analytics in the R environment.|
|Remote interaction and access to cloud instances||Improve the efficiency of R-based analytics work in the cloud.|
|Application and service deployment||Make it easy to operationalize a solution and deploy it as a service.|
Cloud resource operation and administration
R packages and tools in this category offer a simplified way to interact with the Azure cloud platform and to operate Azure resources (e.g., blob storage, Data Science Virtual Machine, Azure Batch Service) for various tasks.
- AzureSMR - R package for managing a selection of Azure resources. Targeted at data scientists who need to control Azure resources without needing to bother administrators. APIs include Storage Blobs, HDInsight (Nodes, Hive, Spark), ARM, and VMs.
- AzureDSVM - R package that provides a convenient harness for the Azure Data Science Virtual Machine (DSVM), remote execution of scalable and elastic data science work, and monitoring of on-demand resource consumption.
- doAzureParallel - R package that allows users to submit parallel workloads in Azure.
- rAzureBatch - an HTTP proxy library written in R for Azure Batch.
- AzureML - an R interface to AzureML experiments, datasets, and web services.
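As a rough illustration of how a parallel workload might be submitted with doAzureParallel, the sketch below runs a small Monte Carlo price simulation through `foreach`. It is a minimal sketch under stated assumptions, not the method used in the linked case study: the file names `credentials.json` and `cluster.json`, the helper names, and every simulation parameter are hypothetical, and the Azure part needs a configured Azure Batch account and storage account before it can run.

```r
# Simulate one batch of year-end closing prices with base R only.
# All default parameter values are illustrative assumptions.
simulate_closing_prices <- function(n_paths, opening_price = 100,
                                    mean_change = 1.001, volatility = 0.01,
                                    n_days = 252) {
  vapply(seq_len(n_paths), function(path) {
    # Daily multiplicative moves drawn from a normal distribution.
    daily_moves <- rnorm(n_days, mean = mean_change, sd = volatility)
    opening_price * prod(daily_moves)
  }, numeric(1))
}

# Fan the batches out to an Azure Batch pool through foreach.
# Both file names are hypothetical; the JSON files must be created first.
run_simulation_on_azure <- function(credentials_file = "credentials.json",
                                    cluster_file = "cluster.json",
                                    n_batches = 10,
                                    paths_per_batch = 1000) {
  library(foreach)  # provides foreach() and %dopar%

  doAzureParallel::setCredentials(credentials_file)
  # Provisioning the pool can take several minutes.
  cluster <- doAzureParallel::makeCluster(cluster_file)
  # Release the pool even if the loop fails.
  on.exit(doAzureParallel::stopCluster(cluster), add = TRUE)
  doAzureParallel::registerDoAzureParallel(cluster)

  # Each iteration runs on a worker; results are concatenated with c().
  foreach(batch = seq_len(n_batches), .combine = c,
          .export = "simulate_closing_prices") %dopar% {
    simulate_closing_prices(paths_per_batch)
  }
}
```

Calling `simulate_closing_prices(5)` locally returns five simulated prices without touching Azure, which is a cheap way to check the simulation logic before paying for a cluster.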
Scalable and advanced analytics
R packages and tools in this category allow one to perform large-scale R-based analytics in the cloud with cutting-edge frameworks such as Spark, Hadoop, Microsoft Cognitive Toolkit, TensorFlow, and Keras. NOTE: many of the tools are pre-installed and configured for direct use on the Azure Data Science Virtual Machine.
- dplyrXdf - a dplyr backend for Revolution Analytics xdf files.
- sparklyr - R interface for Apache Spark.
- SparkR - an R package that provides a lightweight front end for using Apache Spark from R.
- CNTK-R - R bindings to the CNTK library.
- tensorflow - R interface to TensorFlow.
- mxnet - The MXNet R package brings flexible and efficient GPU computing and state-of-the-art deep learning to R.
- keras - R interface to Keras.
- darch - Create deep architectures in R.
- deepnet - Implements several deep learning architectures and neural network algorithms, including BP, RBM, DBN, and deep autoencoders.
- gpuR - R interface for GPU computing.
- RevoScaleR - a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale.
- MicrosoftML - a package that provides state-of-the-art fast, scalable machine learning algorithms and transforms for R.
- h2o - R interface to H2O.
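To give a feel for what Spark-backed analytics looks like from R, here is a minimal sketch using sparklyr with dplyr verbs. It assumes that sparklyr, dplyr, and a local Spark installation are available; the function name, the table name `mtcars_spark`, and the `"local"` master are illustrative choices, and a real cluster would need its own connection settings.

```r
# Average fuel economy per cylinder count, computed by Spark.
# `master = "local"` is an assumption for trying this on one machine.
mean_mpg_by_cylinder <- function(master = "local") {
  sc <- sparklyr::spark_connect(master = master)
  # Close the Spark connection when the function exits.
  on.exit(sparklyr::spark_disconnect(sc), add = TRUE)

  # Copy a built-in data frame into Spark as a temporary table.
  cars_tbl <- dplyr::copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

  # The dplyr verbs are translated to Spark SQL and run inside Spark.
  grouped <- dplyr::group_by(cars_tbl, cyl)
  summary_tbl <- dplyr::summarise(grouped, mean_mpg = mean(mpg, na.rm = TRUE))

  # collect() pulls the small aggregated result back into R.
  dplyr::collect(dplyr::arrange(summary_tbl, cyl))
}
```

Only the aggregated result crosses back into the R session, so the same pattern applies when the source table is too large to fit in local memory.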
Interaction and remote access
The R packages and tools in this category help data scientists and developers remotely access or interact with Azure cloud instances and services for convenient development.
- mrsdeploy - an R package that provides functions for establishing a remote session in a console application and for publishing and managing a web service that is backed by the R code block or script you provide.
- R Tools for Visual Studio - IDE with R support.
- RStudio Server - IDE for remote R session with access via Internet browser.
- JupyterHub - Jupyter notebook with multi-user access.
- IRkernel - R kernel for Jupyter notebook.
Application and service deployment
The R packages and tools in this category are used for deploying R-based analytics or applications as services or interfaces that end users and developers can conveniently consume.
- mrsdeploy - an R package that provides functions for deploying easily consumable services from within an R session.
- AzureML - an R package for interacting with Azure Machine Learning Studio to publish R functions as API services.
- Azure Container Instances - Azure service for running containerized R analytics in the cloud.
- Azure Container Services - Azure service that simplifies deployment, management, and operation of orchestrated containers of R analytics.
- Shiny Server - Develop and publish Shiny-based web applications online.
Real-world use cases
The real-world use cases below showcase Azure cloud-based analytical solutions that use the R packages and tools listed above.
|Use case||Key R packages or tools|
|Campaign management||RevoScaleR, RTVS/RStudio|
|Customer churn prediction||RevoScaleR, MicrosoftML, RTVS/RStudio|
|Energy demand forecasting||RevoScaleR, MicrosoftML, RTVS/RStudio|
|Fraud detection||RevoScaleR, RTVS/RStudio|
|Galaxies classification||RevoScaleR, mrsdeploy, MicrosoftML, RTVS/RStudio|
|Performance test tuning||RevoScaleR, RTVS/RStudio|
|Predictive maintenance||RevoScaleR, RTVS/RStudio|
|Retail forecasting||RevoScaleR, RTVS/RStudio|
|Credit risk scoring||MicrosoftML, mrsdeploy, Shiny, RTVS/RStudio|
|Drop-out prediction||MicrosoftML, Jupyter Notebook|
|Product demand forecasting||RevoScaleR, RTVS/RStudio|
|Solar panel forecasting||AzureSMR, AzureDSVM, keras, RTVS/RStudio|
|Employee attrition prediction||AzureSMR, AzureDSVM, Azure Container Services, Shiny, RTVS/RStudio|
|Flight delay prediction||AzureSMR, AzureDSVM, MicrosoftML, SparkR, RTVS/RStudio|
|Monte Carlo price simulation||doAzureParallel, RTVS/RStudio|