#Introduction to Apache Spark Workshop
##Description
This repo contains all the files, folders, code, and data needed for the Intro to Apache Spark Workshop.
##Apache Spark
Apache Spark is a framework for writing big data, streaming, machine learning, and graph processing (GraphX) jobs.
Main Web Page: http://spark.apache.org/
##Files and Folders
- spark-workshop-data
    - A folder containing all the data used in the exercises
- spark_workshop_codebase
    - A Maven project containing a codebase that you can use to do the exercises
- Advanced Exercise Answers
    - Possible solutions to the advanced exercises
- CDH Quick Start
    - A document describing how to use the VM
- Exercise Answers
    - Possible solutions to the standard exercises
- Intro to Apache Spark Slides
    - PowerPoint slides used in the presentation segment of the workshop
- Pamplet
    - A document that can be sent to participants, containing everything they need to know about the workshop
- Schedule
    - A rough schedule detailing how long each section and exercise should take and when each is set to begin
- Setup
    - A document that contains VM setup instructions to ensure that the user's environment is ready for the final VM to be provided
- Setup and Exercises
    - A document that contains the VM setup instructions and all the exercises that participants will do
- Workshop Abstract
    - A description of the workshop
- load-data-into-hdfs.sh
    - A shell script that can be placed on the VM and that replaces the spark-workshop-data in that VM with what's on the local drive
- load-file-to-vm.sh
    - A shell script that loads all the needed files, including the data and codebase, onto the VM