Welcome to the Microsoft R for Data Science Course Repository. You can find the latest materials from the workshop here, and links to course materials from prior iterations of the course can be found in the version pane. While this course is intended for data scientists and analysts interested in the Microsoft R programming stack (i.e., Microsoft employees in the Algorithms and Data Science group), other programmers might find the material useful as well.
Please refer to the course syllabus for the full list of modules. The goal of this course is to cover the following modules, although some of the later modules might be replaced by a hackathon or office hours.
- Topics:
  - R Fundamentals
  - Data Manipulation with `dplyr`
  - Data Manipulation with `dplyrXdf`
  - Modeling and Scoring with Microsoft R
  - Parallel Computing with the `RevoScaleR` package
  - Deploying Models with the `AzureML` package
  - RxSpark and R APIs for Spark
We will use DSVMs (Data Science Virtual Machines) from the Azure marketplace to run the course materials. For the Spark training, we will use Spark HDInsight Premium clusters, also from Azure. If you are interested in running these materials in a different environment, see the course wiki for instructions.
- JupyterHub:
- RStudio Server:
- You can find credentials for the VMs at aka.ms/redmond-r
We are still in the process of transitioning our course materials from our Revolution Repository to the Azure repository and the Cortana Gallery. Currently, you can find the following two courses on the Cortana Gallery: