![DataStax Academy](https://s3.amazonaws.com/datastaxtraining/vq8Jr36Gk48v/datastax-academy.svg "DataStax Academy")

# Exercise 03.01 - Cassandra Connector: Cassandra Retrieve Data

## Background

In this exercise, you will practice using the Spark Cassandra API to implement a CQL query that retrieves data into Spark.

The data comes from a Cassandra table, `videos_by_year_title`, with the following definition:

***

## Directions

#### 1. Write a Spark query to retrieve movies from the year 2015 whose title is `T` or greater in sort order. Print only the first five records. Be sure to use the Spark Cassandra API rather than the Spark functions themselves so that Cassandra does as much of the processing as possible.

In [1]:
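import com.datastax.spark.connector._  // adds cassandraTable to the SparkContext (may already be imported in your shell)
// select, where, and limit become part of the CQL query that Cassandra runs
val qryPart1 = sc.cassandraTable("killr_video", "videos_by_year_title").select("title")
val qryPart2 = qryPart1.where("added_year = 2015 and title >= 'T'").limit(5)
// collect returns a local Array of rows, not an RDD
val resultRows = qryPart2.collect
resultRows.foreach(row => println(row.getString("title")))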

Taken 3
Tales of Halloween
Tangerine
Ted 2
Terminator Genisys


#### 2. Rewrite your query to use the Spark API functions instead of the Spark Cassandra API functions. Remember that this approach is less efficient because Spark has to load all the data from Cassandra and process it itself.

In [2]:
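// No select or where here, so Spark reads the whole table from Cassandra
val rows = sc.cassandraTable("killr_video", "videos_by_year_title")
// Both filters run in Spark, after the rows have been loaded
val filterYear = rows.filter(record => record.getInt("added_year") == 2015)
val filterTitle = filterYear.map(record => record.getString("title")).filter(title => title >= "T")
// take comes last so that it picks five records from the filtered data
filterTitle.take(5).foreach(println)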

Taken 3
Tales of Halloween
Tangerine
Ted 2
Terminator Genisys


Notice that with the Cassandra API, the order of the function calls doesn’t matter, because each call only adds metadata to the Cassandra query that eventually runs. With the Spark API, however, order matters. If we `take()` before we `filter()`, then `filter()` only evaluates the records that `take()` returned rather than the whole data set.
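
The ordering pitfall can be sketched without a cluster. The snippet below is only an illustration: it uses a plain Scala `List` as a stand-in for an RDD (an assumption made so that it runs anywhere), the titles are the five rows printed above, and the `"Te"` threshold is a made-up value chosen so that the two orderings give different results.

```scala
// Plain Scala collections standing in for an RDD; no Spark or Cassandra needed.
// The titles are the five rows printed in the exercise output.
val titles = List(
  "Taken 3",
  "Tales of Halloween",
  "Tangerine",
  "Ted 2",
  "Terminator Genisys"
)

// Illustrative threshold (an assumption for this sketch, not part of the exercise).
val threshold = "Te"

// Filter first, then take: every title is tested, then two matches are kept.
val filterThenTake = titles.filter(_ >= threshold).take(2)

// Take first, then filter: only the first two titles are ever tested.
val takeThenFilter = titles.take(2).filter(_ >= threshold)

println(s"filter then take: $filterThenTake")
println(s"take then filter: $takeThenFilter")
```

Filtering first yields `Ted 2` and `Terminator Genisys`. Taking first yields an empty list, because neither of the first two titles passes the filter.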