teamclairvoyant/intro-to-spark

#Introduction to Apache Spark Workshop

##Description

This repo contains all the files, folders, code, and data needed for the Intro to Apache Spark Workshop.

##Apache Spark

Apache Spark is a framework for writing big data, streaming, machine learning, and graph-processing (GraphX) jobs.

Main Web Page: http://spark.apache.org/

##Files and Folders

  • spark-workshop-data
    • A folder containing all the data used in the exercises
  • spark_workshop_codebase
    • A Maven project containing a codebase that you can use to do the exercises
  • Advanced Exercise Answers
    • Possible solutions to the Advanced Exercises
  • CDH Quick Start
    • A document describing how to use the VM
  • Exercise Answers
    • Possible solutions to the standard Exercises
  • Intro to Apache Spark Slides
    • PowerPoint slides used in the presentation segment of the workshop
  • Pamplet
    • A document that can be sent to participants, covering everything they should need to know about the workshop
  • Schedule
    • A rough schedule detailing how long each section and exercise should take and when things are set to begin
  • Setup
    • A document that contains some VM setup instructions to ensure that the user's environment is ready for the final VM to be provided
  • Setup and Exercises
    • A document that contains the VM setup instructions and all the exercises that participants will do
  • Workshop Abstract
    • A description of the workshop
  • load-data-into-hdfs.sh
    • A shell script that can be placed on the VM to replace the spark-workshop-data in that VM with what's on the local drive
  • load-file-to-vm.sh
    • Loads all the needed files including the data and codebase onto the VM
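
As a rough illustration of what replacing the workshop data in HDFS might involve, here is a minimal sketch. It is not the contents of load-data-into-hdfs.sh. The local directory, the HDFS target path (`/user/cloudera/...`), the `DRY_RUN` switch, and the function names are all assumptions made for this example. Only the `hdfs dfs -rm`, `-mkdir`, and `-put` commands are standard Hadoop shell commands.

```shell
#!/bin/sh
# Sketch only: paths and variable names below are assumptions, not taken from the repo.
set -eu

LOCAL_DATA_DIR="${LOCAL_DATA_DIR:-./spark-workshop-data}"                 # assumed local copy of the data
HDFS_TARGET_DIR="${HDFS_TARGET_DIR:-/user/cloudera/spark-workshop-data}"  # assumed HDFS location
DRY_RUN="${DRY_RUN:-1}"                                                   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Remove the old HDFS copy, make sure the parent directory exists, upload the local data.
reload_hdfs_data() {
  run hdfs dfs -rm -r -f "$HDFS_TARGET_DIR"
  run hdfs dfs -mkdir -p "$(dirname "$HDFS_TARGET_DIR")"
  run hdfs dfs -put "$LOCAL_DATA_DIR" "$HDFS_TARGET_DIR"
}

reload_hdfs_data
```

By default the sketch only prints the commands it would run. Setting `DRY_RUN=0` executes them, which requires the `hdfs` client to be on the `PATH`, as it would be inside a Hadoop VM.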
