# Working with Cassandra using Python

***

## Intro

Cassandra is an open source NoSQL database released under the Apache License. Its main focus is on providing flexibility, scalability and performance. Unlike many of its peers, Cassandra scales horizontally and dynamically: servers can be added to a running cluster without the need to re-shard or reboot. Cassandra seeks to avoid vertical scalability limitations of any sort:
> 1. there are no dedicated name nodes (all cluster nodes can serve as such);
> 2. no practical architectural limitations on data sizes, row/column counts etc.

A single global Cassandra cluster can simultaneously service applications and asynchronously replicate data across multiple geographic locations, using a configurable replication factor and a replication strategy that determines which cluster nodes are designated as replicas. This makes it well suited for cross-datacenter and cross-region deployments with no single point of failure.
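
As a rough illustration of a per-datacenter replication factor, the sketch below builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy`. The helper `build_create_keyspace`, the keyspace name `demo` and the datacenter names `dc_europe` and `dc_us` are hypothetical; real datacenter names must match those reported by the cluster.

```python
def build_create_keyspace(keyspace, replication_by_dc):
    """Return a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    `replication_by_dc` maps a datacenter name to its replication factor.
    This is a sketch: names are inserted as-is, without validation.
    """
    if not replication_by_dc:
        raise ValueError("at least one datacenter is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the datacenters so the generated statement is deterministic.
    for datacenter, factor in sorted(replication_by_dc.items()):
        options.append("'{}': {}".format(datacenter, int(factor)))

    body = ", ".join(options)
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + body + "}"
    )


# Hypothetical keyspace and datacenter names, for illustration only.
statement = build_create_keyspace("demo", {"dc_europe": 3, "dc_us": 2})
print(statement)
```

The resulting string can be run in `cqlsh` or passed to the Python driver once a session is available.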

***

#### Installation

To install Cassandra on Ubuntu, we will need to first update our packages:

```bash
sudo apt-get update
sudo apt-get upgrade
```

Then install Java:

```bash
sudo apt-get install default-jdk
```

We can now install Apache Cassandra:
```bash
echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

# Adds Cassandra repository keys
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA

# Updating package index
sudo apt-get update

# Installing Cassandra
sudo apt-get install cassandra
```

To start the service:

```bash
sudo service cassandra start
```

To stop the service:

```bash
sudo service cassandra stop
```

To check the status of the service:

```bash
sudo service cassandra status
```
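
Since the rest of this notebook uses Python, the service can also be checked from Python by testing whether Cassandra's native transport port accepts TCP connections. This is a minimal sketch that assumes the default port 9042 and a node on the local machine; the helper name `is_port_open` is hypothetical, and an open port only shows that something is listening, not that the node is fully ready.

```python
import socket


def is_port_open(host="127.0.0.1", port=9042, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError if the connection is refused
        # or times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumes a local node listening on the default CQL port.
    print("Cassandra port reachable:", is_port_open())
```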


***

#### Connecting to Cassandra

To begin, we need to setup an instance of `Cluster`. As the name suggests, we will typically have one instance of `Cluster` for each Cassandra cluster we want to interact with.

```python
# Instantiating a connection
from cassandra.cluster import Cluster
cluster = Cluster()
```

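With no arguments, `Cluster` uses Cassandra on our local machine (127.0.0.1) as its contact point. Note that instantiating a `Cluster` does not actually connect to any node; the connection is only made later, when a session is created.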

To specify a list of nodes for our cluster:

```python
from cassandra.cluster import Cluster
cluster = Cluster(['192.168.0.1', '192.168.0.2'])
```

The IP addresses passed to `Cluster` are initial contact points. After the driver connects to one of these nodes, it automatically discovers the rest of the nodes in the cluster and connects to them, so there is no need to list every node in our cluster.
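
To see the discovery in action, the sketch below lists the hosts the driver knows about through `cluster.metadata.all_hosts()`. It assumes `cluster.connect()` has already been called, since the metadata is only populated once the driver has reached a contact point. The helper name `list_known_hosts` is hypothetical, and `main()` reuses the example addresses from above.

```python
def list_known_hosts(cluster):
    """Return (address, datacenter, rack) for every host in the metadata."""
    return [
        (host.address, host.datacenter, host.rack)
        for host in cluster.metadata.all_hosts()
    ]


def main():
    # Requires the cassandra-driver package and reachable nodes.
    from cassandra.cluster import Cluster

    cluster = Cluster(['192.168.0.1', '192.168.0.2'])
    try:
        cluster.connect()  # populates cluster.metadata
        for address, datacenter, rack in list_known_hosts(cluster):
            print(address, datacenter, rack)
    finally:
        cluster.shutdown()  # release connections held by the driver
```

Calling `main()` against a live cluster should print one line per discovered node, including nodes that were not in the contact point list.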

To establish a connection and begin making queries, we need a `Session`:

```python
# Creating a session
session = cluster.connect()
```
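
Once a session exists, a quick way to confirm that it works is to run a small query. The sketch below reads the node's version from the built-in `system.local` table; the helper name `get_release_version` is hypothetical, and `main()` assumes a Cassandra node running on the local machine.

```python
def get_release_version(session):
    """Return the Cassandra release version reported by the connected node."""
    result = session.execute("SELECT release_version FROM system.local")
    row = result.one()  # first row, or None if the result is empty
    return row.release_version if row is not None else None


def main():
    # Requires the cassandra-driver package and a running local node.
    from cassandra.cluster import Cluster

    cluster = Cluster()
    try:
        session = cluster.connect()
        print("Cassandra version:", get_release_version(session))
    finally:
        cluster.shutdown()  # closes the session and its connections
```

Keeping the query in a function that takes a `session` makes it easy to reuse the same session for later queries, which is generally preferable to opening a new one per request.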

#### References

[1] How to install Apache Cassandra on [Ubuntu](https://www.rosehosting.com/blog/how-to-install-apache-cassandra-on-ubuntu-16-04/).

[2] Netflix [blog](https://medium.com/netflix-techblog/nosql-at-netflix-e937b660b4c) on scaling their highly distributed infrastructure.