## Summary

This notebook checks that you can get Spark running on your machine. Don't worry about understanding the code; focus on whether each cell returns the expected output.

Be patient the first time you run it: the `%classpath add mvn` magic downloads Spark from Maven Central, which can take some time.

In [None]:
assert(scala.util.Properties.versionString == "version 2.11.12")

## Get Spark

In [None]:
%classpath add mvn org.apache.spark spark-mllib_2.11 2.3.0

## Import Spark

In [None]:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext

## Start Spark Session

In [None]:
val conf = new SparkConf().setMaster("local[*]").setAppName("test") // Spark configuration: run locally on all cores
implicit val spark = SparkSession.builder.config(conf).getOrCreate() // start (or reuse) the Spark session
val sc = spark.sparkContext  // Spark context owned by the session
val sqlContext = spark.sqlContext  // SQL context, kept for code that still expects one
import spark.implicits._  // enables .toDF on RDDs and local collections
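
Just as the first cell checks the Scala version, it can help to confirm which Spark version and master the session is actually using. The following is a minimal sketch, not part of the original notebook; the name `session` and the app name `version-check` are illustrative. `getOrCreate()` should hand back the session started above rather than create a second one.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the running session if one exists; otherwise starts a local one.
val session = SparkSession.builder.master("local[*]").appName("version-check").getOrCreate()

// With the dependency loaded in "Get Spark", the version is expected to be 2.3.0.
println(s"Spark version: ${session.version}")
println(s"Master: ${session.sparkContext.master}")
```

If the printed version is not the one requested in the `%classpath add mvn` cell, another Spark build is probably on the classpath.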

## Sum the First 100 Counting Numbers

In [None]:
val n = 100
sc.parallelize(1 to n).sum

The cell above should return `5050.0`.
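
The expected value can be cross-checked without Spark using the closed form n(n+1)/2. The sketch below is plain Scala; the names `m`, `closedForm`, and `localSum` are illustrative and not part of the notebook. Spark's `sum` on a numeric RDD returns a `Double`, which is why the cell above shows `5050.0` rather than `5050`.

```scala
val m = 100                        // same upper bound as the cell above
val closedForm = m * (m + 1) / 2   // Gauss's formula for 1 + 2 + ... + m
val localSum = (1 to m).sum        // plain Scala collection sum, no Spark involved

println(s"closed form = $closedForm, local sum = $localSum")
// prints "closed form = 5050, local sum = 5050"
```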

## Create a DataFrame

In [None]:
val xs = sc.parallelize( List(("cat", "neil", 10), ("dog", "keith", 3)) )

In [None]:
val xDf = xs.toDF("animal", "name", "age")

In [None]:
xDf.printSchema

`xDf.printSchema` should print:

```
root
 |-- animal: string (nullable = true)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = false)
```
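
The `age` column is reported as `nullable = false` because a Scala `Int` is a primitive that cannot hold `null`, whereas `String` columns can. The sketch below illustrates the difference with an `Option[Int]` column, which Spark should report as `nullable = true`. It is not part of the original notebook; the names `session`, `rows`, and `optDf` are illustrative, and the `None` age is made up for the example.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the running session if one exists; otherwise starts a local one.
val session = SparkSession.builder.master("local[*]").appName("nullable-sketch").getOrCreate()

// Option[Int] can represent a missing value, so the column is marked nullable.
val rows: Seq[(String, Option[Int])] = Seq(("cat", Some(10)), ("dog", None))
val optDf = session.createDataFrame(rows).toDF("animal", "age")

optDf.printSchema
```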

In [None]:
xDf.show

`xDf.show` should print:

```
+------+-----+---+
|animal| name|age|
+------+-----+---+
|   cat| neil| 10|
|   dog|keith|  3|
+------+-----+---+
```

In [None]:
// Read the CSV through the session; the first row holds column names and column types are inferred
val titanicData = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("data/train.csv")
titanicData.printSchema

In [None]:
val p = titanicData.filter("survived = 1").count / titanicData.count.toDouble
println(f"survival rate of passengers: $p%.3f")
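
The survival rate above is the share of rows where `survived` equals 1. For a 0/1 column, that share is also the column mean. The sketch below shows both computations on a tiny in-memory DataFrame so it runs without `data/train.csv`. The rows are made up for illustration and the names `session`, `toy`, `byCount`, and `byMean` are not part of the notebook.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Reuses the running session if one exists; otherwise starts a local one.
val session = SparkSession.builder.master("local[*]").appName("rate-sketch").getOrCreate()

// Made-up rows for illustration only; not taken from data/train.csv.
val toy = session.createDataFrame(Seq((1, 1), (2, 0), (3, 0), (4, 1))).toDF("id", "survived")

// Share of rows matching the filter, computed the same way as in the cell above.
val byCount = toy.filter("survived = 1").count / toy.count.toDouble

// The mean of a 0/1 column gives the same share in a single aggregation.
val byMean = toy.agg(avg("survived")).first.getDouble(0)

println(f"by count: $byCount%.3f, by mean: $byMean%.3f")
```

Both values should agree, since two of the four made-up rows have `survived = 1`.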