# Hello from a Python Notebook

## Introduction

In this exercise we'll follow up on the hello-world exercise that we did in the Spark Shell by writing and executing code inside a Python-enabled Notebook.

We've put the lyrics of the song into a text file for you. It's called `hello-adele.txt` and is placed in a folder that is mapped into the Notebooks (called `Resources`).

## Running some code

The code can be written in fragments. We'll start by creating a `SparkSession`:

In [None]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Python Spark SQL").getOrCreate()

We'll use the naming convention of calling that session `spark` for our Python programmers. This is a little bit different from the Shell, where we had an implicit `sc` object. We generally don't want to create the `SparkContext` ourselves, as only one can be active at a time: if someone runs the cell creating the context several times, they will get an exception. The cell above can be executed multiple times, because `getOrCreate()` returns the existing session if there already is one.

Next we can create a reference to the lines in the text file:

In [None]:
lines = spark.sparkContext.textFile("../Resources/hello-adele.txt")  # Assumes you are in the work directory

`lines` is now a reference to an RDD.

In Python, the way to access the first line is a bit different:

In [None]:
lines.first()

And the second line can be collected as:

In [None]:
lines.collect()[1]

In the exercise, we also counted the lines. This can be done as follows in Python and Spark:

In [None]:
lines.count()

And of course we can get to all the words and read the first 5 perhaps?

In [None]:
lines.flatMap(lambda line: line.split(" ")).take(5)

The syntax for lambda functions is quite different, but other than that, it is almost identical to what you would have to do in Scala.
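
One detail worth a closer look is the argument to `split`. The cell above uses `split(" ")`, while the final solution in this notebook uses `split()` without arguments, and the two behave differently when a line contains repeated or trailing spaces. The sketch below is plain Python (no Spark needed) and uses a made-up sample string, not a line from the lyrics file:

```python
# Plain Python, no Spark required: compare the two ways of splitting a line.
# The sample string is hypothetical; it has a double space and a trailing space.
line = "Hello,  it's me "

by_space = line.split(" ")    # splits at every single space, so empty strings can appear
by_whitespace = line.split()  # splits on runs of whitespace and never yields empty strings

print(by_space)       # ['Hello,', '', "it's", 'me', '']
print(by_whitespace)  # ['Hello,', "it's", 'me']
```

With `split(" ")`, a later filter that drops empty strings is what keeps them out of the word count. With `split()` that filter has nothing left to remove.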

That's probably enough... I think at this point we can jump to the final solution:

In [None]:
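(
    lines
    .flatMap(lambda line: line.split())             # split each line into words
    .filter(lambda word: word != "")                # drop empty strings, if any
    .map(lambda word: (word.lower(), 1))            # pair each lower-cased word with a count of 1
    .reduceByKey(lambda lv, rv: lv + rv)            # sum the counts per word
    .sortBy(lambda pair: pair[1], ascending=False)  # most frequent words first
    .take(5)                                        # bring the top five back to the notebook
)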
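
To see what this chain computes without a cluster, here is a minimal plain-Python sketch of the same steps. It is only an illustration: `sample_lines` and `top_words` are hypothetical names introduced here, and the sample text is made up rather than taken from `hello-adele.txt`.

```python
from collections import Counter

# Hypothetical sample lines standing in for the contents of the text file.
sample_lines = [
    "Hello hello it's me",
    "",
    "Hello from the other side",
]


def top_words(lines, n):
    """Count lower-cased words and return the n most frequent (word, count) pairs."""
    counts = Counter()
    for line in lines:                 # like flatMap: each line yields zero or more words
        for word in line.split():      # split() already skips empty strings
            counts[word.lower()] += 1  # like map to (word, 1) followed by reduceByKey
    # like sortBy on the count in descending order, followed by take(n)
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]


result = top_words(sample_lines, 1)
print(result)  # [('hello', 3)]
```

The order of words with equal counts may differ between this sketch and the Spark version, since neither defines a tie-breaking rule beyond the count itself.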


## Congratulations!

You just ran Spark inside a Notebook!

Feel free to play around yourself in the cell below. Or you can make your own notes in the notebook. If you want to save your changes, make sure to export the notebook by selecting `File` -> `Download As` -> `Notebook`.