# Lesson 1 Demo 2: Creating a Table with Apache Cassandra

**In this demo we are going to walk through the basics of creating a table in Apache Cassandra, inserting rows of data, and running a simple CQL query to validate the information.**

**We will use the Python driver for Apache Cassandra, imported as `cassandra`, to run our queries. This library should be preinstalled; if you ever need to install it yourself, you can run this command in a notebook:**

! pip install cassandra-driver 

In [1]:
import cassandra

### First let's create a connection to the database 

This connects to our local instance of Apache Cassandra. The connection reaches out to the database and ensures we have the correct privileges to connect to it. Once we get back the cluster object, we call `connect()` on it, which creates the session we will use to execute queries.

**Note 1: This block of code will be standard in all notebooks.**

In [2]:
from cassandra.cluster import Cluster 
try: 
    cluster = Cluster(['127.0.0.1']) #If you have a locally installed Apache Cassandra instance
    session = cluster.connect() 
except Exception as e:
    print(e)

### Let's Test our Connection 

We are trying to do a select * on a table we have not created yet. We should expect to see a nicely handled error.

In [3]:
try:
    session.execute("""select * from music_library""")
except Exception as e:
    print(e)

Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename"


### Let's create a keyspace to do our work in

Note: Ignore the Replication Strategy and factor information for now. Those will be discussed later. Just know that on a one node local instance this will be the strategy and replication factor. 

In [4]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity
    WITH REPLICATION =
    {
    'class': 'SimpleStrategy', 'replication_factor': 1
    }
    """
    )
except Exception as e:
    print(e)
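
The replication settings in the statement above can be made explicit with a small helper. This is only a minimal sketch: `build_create_keyspace` is a hypothetical name, not part of the driver, and it does nothing more than build the CQL string that would be passed to `session.execute`.

```python
def build_create_keyspace(name, replication_factor=1):
    """Return a CREATE KEYSPACE statement using SimpleStrategy (hypothetical helper)."""
    # Simple guard so that only plain identifiers end up in the statement.
    if not name.isidentifier():
        raise ValueError(f"unexpected keyspace name: {name!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH REPLICATION = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}"
    )

# A one-node local instance holds a single copy of the data.
statement = build_create_keyspace("udacity")
print(statement)
```

The resulting string could then be run with `session.execute(statement)`.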

**Connect to our Keyspace. Compare this to how we had to create a new session in PostgreSQL.**

In [5]:
try:
    session.set_keyspace('udacity')
except Exception as e:
    print(e)


**Let's imagine we would like to start creating a Music Library of albums. Each album has a lot of information we could add to the music library table, but we will just start with album name, artist name, year.
But ...
We are working with Apache Cassandra, a NoSQL database. We can't model our data and create our table without more information.
What queries will I be performing on this data?**


**In this case I would like to be able to get every album that was released in a particular year.**

`select * from music_library WHERE YEAR=1970`

**Because of this I need to be able to do a WHERE on YEAR. YEAR will become my partition key, and artist name will be my clustering column to make each Primary Key unique. Remember there are no duplicates in Apache Cassandra.**

`Table Name: music_library
column 1: Album Name
column 2: Artist Name
column 3: Year
PRIMARY KEY(year, artist_name)`
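
To see why the choice of primary key matters, here is a minimal pure-Python sketch (no database involved) that models the table as a dictionary keyed by `(year, artist_name)`. The `MusicLibraryModel` class and its sample rows are illustrative assumptions; it only imitates two behaviors: rows with the same primary key overwrite each other, and a lookup by partition key returns rows ordered by the clustering column.

```python
from collections import defaultdict

class MusicLibraryModel:
    """Toy model of PRIMARY KEY (year, artist_name); not how Cassandra stores data."""

    def __init__(self):
        # partition key (year) -> {clustering column (artist_name) -> album_name}
        self._partitions = defaultdict(dict)

    def insert(self, year, artist_name, album_name):
        # The same (year, artist_name) replaces the previous row, like an upsert.
        self._partitions[year][artist_name] = album_name

    def select_by_year(self, year):
        # Rows within a partition come back sorted by the clustering column.
        partition = self._partitions.get(year, {})
        return [(year, artist, partition[artist]) for artist in sorted(partition)]

library = MusicLibraryModel()
library.insert(1970, "The Beatles", "Let It Be")
library.insert(1970, "The Carpenters", "Close to You")
library.insert(1970, "The Beatles", "Hey Jude")  # same key: replaces "Let It Be"
for row in library.select_by_year(1970):
    print(row)
```

One consequence of this key is that an artist can have only one album per year. Adding `album_name` as a further clustering column would be one way to lift that limit.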

**Now to translate this information into a Create Table Statement.**

**More information on Data Types can be found here: https://datastax.github.io/python-driver/**

**Note: Don't worry if this all seems confusing, we will spend all of Lesson 3 on these topics.**

In [6]:
query = "CREATE TABLE IF NOT EXISTS music_library "
query = query + "(year int, artist_name text, album_name text, PRIMARY KEY (year, artist_name))"
try:
    session.execute(query) 
except Exception as e:
    print(e)

**No error was raised, but let's check that our table was created by running `select count(*)`, which should return 0 because we have not inserted any rows.**

Note: Depending on the version of Apache Cassandra you have installed, this might throw an 'ALLOW FILTERING' error instead of a result of "0". This is to be expected, as this type of query should not be performed on large datasets; we are only doing this for the sake of the demo.

In [7]:
query = "select count(*) from music_library"
try:
    count = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only print the result if the query succeeded
    print(count.one())

Row(count=0)


**Let's insert two rows**

Note the syntax here: the values are passed as a tuple and bound to the `%s` placeholders by the driver.

In [8]:
query = "INSERT INTO music_library (year, artist_name, album_name)"
query = query + " VALUES(%s, %s, %s)"

try:
    session.execute(query, (1970, "The Beatles", "Let It Be"))
except Exception as e:
    print(e)

try:
    session.execute(query, (1965, "The Beatles", "Rubber Soul"))
except Exception as e:
    print(e)
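
The two inserts above share one statement and differ only in their values, so they can also be driven from a list. A minimal sketch, assuming `session` is the connected session from earlier in this notebook; `insert_albums` and `INSERT_ALBUM` are hypothetical names introduced for illustration.

```python
INSERT_ALBUM = (
    "INSERT INTO music_library (year, artist_name, album_name) "
    "VALUES (%s, %s, %s)"
)

def insert_albums(session, albums):
    """Insert (year, artist_name, album_name) tuples; return the rows that failed.

    Hypothetical helper: `session` is anything with an execute(query, params)
    method, such as the driver session created earlier in this notebook.
    """
    failed = []
    for album in albums:
        try:
            session.execute(INSERT_ALBUM, album)
        except Exception as e:
            # Same error handling as the cells above, but remember which row failed.
            print(e)
            failed.append(album)
    return failed
```

Calling `insert_albums(session, [(1970, "The Beatles", "Let It Be"), (1965, "The Beatles", "Rubber Soul")])` returns an empty list when both inserts succeed.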

**Validate your data was inserted into the table.**

Note: The for loop is used only for printing the results. If you run the query directly in a query shell such as cqlsh, the loop is not required.
    
Note: Depending on the version of Apache Cassandra you have installed, this might throw an 'ALLOW FILTERING' error instead of printing the 2 rows we just inserted. This is to be expected, as this type of query should not be performed on large datasets; we are only doing this for the sake of the demo.

In [9]:
query = 'SELECT * FROM music_library'
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate over the rows if the query succeeded
    for row in rows:
        print(row.year, row.album_name, row.artist_name)

1965 Rubber Soul The Beatles
1970 Let It Be The Beatles


**Let's Validate our Data Model with our original query.**

`select * from music_library WHERE YEAR=1970`

In [10]:
query = 'select * from music_library WHERE YEAR=1970'
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate over the rows if the query succeeded
    for row in rows:
        print(row.year, row.album_name, row.artist_name)

1970 Let It Be The Beatles
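
The year can also be passed as a bound parameter instead of being written into the query text, using the same `%s` placeholder style as the insert. A minimal sketch, assuming the `session` from this notebook; `albums_for_year` and `SELECT_BY_YEAR` are hypothetical names introduced for illustration.

```python
SELECT_BY_YEAR = (
    "SELECT year, artist_name, album_name FROM music_library WHERE year = %s"
)

def albums_for_year(session, year):
    """Return (year, album_name, artist_name) tuples for one year (hypothetical helper)."""
    # The trailing comma makes (year,) a one-element tuple of parameters.
    rows = session.execute(SELECT_BY_YEAR, (year,))
    return [(row.year, row.album_name, row.artist_name) for row in rows]
```

Because `year` is the partition key, this query reads a single partition, which is the access pattern the table was designed for.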


**For the sake of the demo, I will drop the table.**

In [11]:
query = 'drop table music_library'
try: 
    rows = session.execute(query)
except Exception as e:
    print(e)

**And finally, close the session and cluster connection.**

In [12]:
session.shutdown()
cluster.shutdown()
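
If a cell fails partway through a notebook, the shutdown calls may never run. One possible way to guard against that is a context manager that always closes both objects. This is a sketch, not part of the driver: `cassandra_session` is a hypothetical helper that relies only on the `connect`, `set_keyspace`, and `shutdown` calls already used in this notebook.

```python
from contextlib import contextmanager

@contextmanager
def cassandra_session(cluster, keyspace=None):
    """Yield a session, then always shut down the session and the cluster."""
    session = cluster.connect()
    try:
        if keyspace is not None:
            session.set_keyspace(keyspace)
        yield session
    finally:
        # Runs whether the body of the with-block succeeded or raised.
        session.shutdown()
        cluster.shutdown()
```

It would be used as `with cassandra_session(Cluster(['127.0.0.1']), 'udacity') as session:` followed by the queries, assuming a local Apache Cassandra instance as above.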