
viennadatasciencegroup/kf-2017-11-09-R-and-spark


Integrating R into the big data ecosystem using sparklyr

R is a powerful language for data science, but on its own it cannot cope with data sets too large to fit into the memory of a single machine. sparklyr bridges this gap by connecting R to the Hadoop ecosystem through Spark, using the tidy grammar of dplyr.
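The lab's own sample code is not reproduced on this page. The sketch below is only an illustration of what "connecting R to Spark via dplyr" looks like in practice. It uses sparklyr's `spark_connect()`, `copy_to()`, `collect()` and `spark_disconnect()`. It assumes the sparklyr and dplyr packages are installed and that a local Spark distribution is available, for example via `sparklyr::spark_install()`. The local master and the built-in `mtcars` data set are illustrative choices and are not taken from the talk.

```r
# Minimal sparklyr sketch. Assumes sparklyr + dplyr are installed and a local
# Spark installation exists (e.g. from sparklyr::spark_install()).
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance. On a Hadoop cluster the master would
# point at YARN instead (cluster setup is an assumption, not from the talk).
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame into Spark. Real workloads would
# typically read from HDFS/Parquet with the spark_read_*() functions.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Plain dplyr verbs: sparklyr/dbplyr translate them to Spark SQL, so the
# grouping and aggregation are executed by Spark, not by R.
avg_mpg <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

# Show the Spark SQL generated from the dplyr pipeline.
show_query(avg_mpg)

# collect() brings the (small) aggregated result back as a local R data frame.
res <- collect(avg_mpg)
print(res)

spark_disconnect(sc)
```

The dplyr pipeline is evaluated lazily, and only `collect()` moves rows into R. This is what lets the same code scale to data that would not fit in R's memory, as long as the collected result stays small.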

Agenda

  • Types of big data
  • Introduction to Hadoop
  • Hadoop ecosystem
  • Introduction to Spark (RDD)
  • Spark overview
  • Integration of Spark with R via sparklyr
  • Architecture
  • Demo
  • Downsides of non-Spark-native languages
  • Streaming and R?

The slides are available at https://docs.google.com/presentation/d/1NHG7-WoEUsjrdxFjy01OmZjxWB-FZomhfrxO-QapzKg/edit?usp=sharing and as a PDF in this repository.

Here is the sample code for the lab.
