#### The relationship between SparkSession and SparkContext
Spark provides a separate entry point, called a Context, for each of its modules.
Before version 2.0:
core Spark RDD operations go through
**SparkContext**,
while Streaming, SQL, and Hive use StreamingContext, SQLContext, and HiveContext respectively.

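Since version 2.0, Spark has unified these contexts by introducing **SparkSession, which is essentially a combination of SQLContext, HiveContext, and SparkContext.** StreamingContext is not part of this combination and is still created separately.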

#### Using SparkContext from a SparkSession
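In Spark, only one SparkContext can be active per driver process. A SparkSession does not replace that context; it wraps one.
So there is no need to construct a SparkContext separately alongside a SparkSession, and trying to construct a second SparkContext fails. SparkSession.builder.getOrCreate() reuses the active SparkContext if one exists and creates one otherwise.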

A SparkSession holds a SparkContext internally, which is available through the spark.sparkContext attribute.

In [2]:
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName('correlation example').getOrCreate()

# Load a text file and convert each line to a Row.
lines = spark.sparkContext.textFile("file:/D:/Software/spark-2.4.3-bin-hadoop2.7/examples/src/main/resources/people.txt")
lines.take(10)
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))


In [8]:
lines.take(10)

['Michael, 29', 'Andy, 30', 'Justin, 19']

In [5]:
people.take(10)

[Row(age=29, name='Michael'),
 Row(age=30, name='Andy'),
 Row(age=19, name='Justin')]

#### Querying DataFrame data with SQL statements

In [6]:
# Infer the schema, and register the DataFrame as a table.
schemaPeople = spark.createDataFrame(people)
schemaPeople.createOrReplaceTempView("people")

# SQL can be run over DataFrames that have been registered as a table.
teenagers = spark.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

# The results of SQL queries are DataFrame objects.
# .rdd returns the content as a pyspark.RDD of Row objects.
teenNames = teenagers.rdd.map(lambda p: "Name: " + p.name).collect()
for name in teenNames:
    print(name)
# Name: Justin

Name: Justin


In [7]:
schemaPeople.show(2)


+---+-------+
|age|   name|
+---+-------+
| 29|Michael|
| 30|   Andy|
+---+-------+
only showing top 2 rows

