### How to Create a Spark Session

!["Spark Image"](spark.png)

Image courtesy of Databricks: [How to use Spark Session in Apache Spark 2.0](https://www.databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)

The Spark Session is the basic building block for working with Spark in a modern context. In short, it’s the modern entry point into Spark.

The Spark Session was [introduced in the 2.0 release of Apache Spark](https://www.databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html), and was designed to both replace and consolidate the previous methods for accessing Spark: contexts! Sessions also made it easier to:

+ configure runtime properties of Spark applications after instantiation (e.g. set spark.sql.shuffle.partitions)
+ create DataFrames and Datasets
+ use Spark SQL and access Hive (or Glue, or equivalent)

So, instead of using the Spark Context, SQL Context and Hive Context objects of the past, you can simply use a Spark Session. You maintain all of the convenience of those legacy objects with one straightforward instantiation.

***The steps for creating a Spark Session are shown below in Python (PySpark).***

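```python
import findspark

findspark.init()  # locate the Spark installation and make pyspark importable
findspark.find()  # return the Spark home directory that was detected
```

```
'C:\\Spark\\sparkhome'
```

```python
from pyspark.sql import SparkSession
```

```python
spark = (
    SparkSession.builder
    .appName("My Spark App")
    .master("local[2]")  # run locally with two worker threads
    .getOrCreate()       # reuse an existing session or create a new one
)
```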

**Note:**

***.master()*** is not always required, but it lets you specify where your Spark application should run. If you choose local[N], where N is the number of worker threads to use, you’ll run Spark locally on your own machine; local[*] uses as many worker threads as there are logical cores. If you choose yarn, the application runs on an existing Hadoop cluster managed by YARN, which Spark locates through that cluster’s Hadoop configuration. In notebook environments like Databricks or Qubole Notebooks, it’s unlikely you need to create a session at all: just use the provided spark (i.e. session) variable for your needs.

**Conclusion**

Now, you’re ready to start using Spark. That might not have been very exciting, but in the next exercise we’ll dive headfirst into our first actual Spark application: data deduplication!