<center><h1>HW4: NoSQL on Cloud</h1></center>

<h2>Install Required Packages</h2>

In [None]:
!pip install google-cloud-bigtable
!pip install google-cloud-happybase

<h2>Initialize the Application</h2>

In [None]:
from google.cloud import bigtable
from google.cloud import happybase
from google.cloud.bigtable import column_family

#Populate project_id and instance_id if you are running on the cloud
project_id = ""
instance_id = ""

client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)


<h2>Create Tables</h2>

In [None]:
table_id = 'test'
print("Creating the {} table.".format(table_id))
table = instance.table(table_id)

print("Creating column family readingsection with Max Version GC rule...")
# Create a column family with GC policy : most recent N versions
# Define the GC policy to retain only the most recent 2 versions
max_versions_rule = column_family.MaxVersionsGCRule(2)
column_family_id = "readingsection"
column_families = {column_family_id: max_versions_rule}
if not table.exists():
    table.create(column_families=column_families)
else:
    print("Table {} already exists.".format(table_id))


<h2>Insert Rows into Tables</h2>

In [None]:
import datetime

print("Writing some reading types to the table.")
elements = ["Book", "Notebook", "Newspaper", "Journal"]
values = ["Intro to Java", "Jupyter Notebook", "New York Post", "IEEE"]
rows = []
column = "readings".encode()
for i, text in enumerate(elements):
    # All elements should be encoded
    row_key = text.encode()
    value = values[i].encode()
    # initialize a row with this key
    row = table.direct_row(row_key)
    # create a cell using the 4 pieces of data
    row.set_cell(
        column_family_id, column, value, timestamp=datetime.datetime.utcnow()
    )
    rows.append(row)
# insert rows
table.mutate_rows(rows)

<h2>Find a single Element in the Table</h2>

In [None]:
from google.cloud.bigtable import row_filters

row_filter = row_filters.CellsColumnLimitFilter(1)


print('Getting a single element by row key.')
key = 'Book'.encode()

row = table.read_row(key, row_filter)
cell = row.cells[column_family_id][column][0]
print(cell.value.decode('utf-8'))


<h2>Retrieve All Rows in the BigTable Table</h2>

In [None]:
print("Scanning for all readings:")
partial_rows = table.read_rows()

for row in partial_rows:
    cell = row.cells[column_family_id][column][0]
    print(cell.value.decode("utf-8"))

<h2>Delete Tables</h2>

In [None]:
print("Deleting the {} table.".format(table_id))
table.delete()

<h3>Read the Example for More Information <a href="https://cloud.google.com/bigtable/docs/samples-python-hello">https://cloud.google.com/bigtable/docs/samples-python-hello</a></h3>

<h2>BigTable Cloud Execution</h2>

To connect Python code running on your local machine to Google BigTable on the cloud, complete the following steps:
<ul>
<li>1. Generate a credentials JSON file by following the steps shown in the screenshot below</li>
    <li>2. Set the environment variable <b>GOOGLE_APPLICATION_CREDENTIALS</b> to the absolute path of your credentials JSON file (e.g., /Users/xyz/credentials.json)</li>
     </ul>

<center><img src="https://www.andrew.cmu.edu/user/mfarag/static/create_credentials_json.png"/></center>

<ul>
    <li>3. Delete your <b>BIGTABLE_EMULATOR_HOST</b> environment variable and restart your notebook</li>
    <li>4. Enable billing on your GCP account</li>
<li>5. Ensure you have a project created</li>
<li>6. Enable the BigTable API from <a href="https://console.cloud.google.com/bigtable/instances">https://console.cloud.google.com/bigtable/instances</a></li>
    <li>7. Create a BigTable instance on GCP and write down the instance ID</li>
    <li>8. Update your project ID and instance ID in the code (<b>project_id</b> and <b>instance_id</b> in the initialization cell)</li>
     </ul>
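
The environment-variable steps above can be checked from Python before creating the client. The following is a minimal sketch that uses only the standard library; the helper name <b>check_bigtable_env</b> is hypothetical and not part of any Google library.

```python
import os

def check_bigtable_env(environ=None):
    """Return a list of setup problems (empty if none were found).

    Hypothetical helper: it only inspects environment variables and the
    file system, and never contacts Google Cloud.
    """
    env = os.environ if environ is None else environ
    problems = []

    creds = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not creds:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isabs(creds):
        problems.append("GOOGLE_APPLICATION_CREDENTIALS should be an absolute path")
    elif not os.path.isfile(creds):
        problems.append("credentials file not found: " + creds)

    # When this variable is set, the Bigtable client talks to the local emulator.
    if env.get("BIGTABLE_EMULATOR_HOST"):
        problems.append("BIGTABLE_EMULATOR_HOST is still set")

    return problems

for problem in check_bigtable_env():
    print("Setup issue:", problem)
```

If the loop prints nothing, the two variables look right; the remaining steps (billing, project, API, instance) still have to be verified in the GCP console.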

<center><img src="https://www.andrew.cmu.edu/user/mfarag/static/bigTable_invalid_use_cases.png"/></center>

<center><img src="https://www.andrew.cmu.edu/user/mfarag/static/bigTable_valid_use_cases.png"/></center>

<h2>BigTable Application Example</h2>

<center><img src="https://www.andrew.cmu.edu/user/mfarag/static/nosql_in_big_app.png"/></center>

<h1>Optional Content - Not Included in Exam - AWS DynamoDB</h1>

<h2>Access DynamoDB</h2>

In most cases, you don’t have to install or maintain DynamoDB. <b>You can sign up for an AWS account, create a DynamoDB table, and just go. Alternatively, you can run it on Docker for demonstration purposes.</b> DynamoDB does require some operations-style thinking and preparation, but you’ll never need to give it an XML configuration or set up a complex cluster as with some other NoSQL databases. <a href="https://www.duolingo.com/">Duolingo</a>, for example, uses DynamoDB for its day-to-day operations.

<center><img src="http://stat.cmu.edu/~mfarag/652/lectures/l10/duolingo.png"/></center>

<div style="background-color:AliceBlue"><br/><b>Let's create our first Shopping Cart in DynamoDB via Python<br/><br/></b></div>

We will use the <b>Boto3</b> Python library to write Python code that communicates with DynamoDB. Boto3 supports current AWS cloud services, including Elastic Compute Cloud, DynamoDB, AWS Config, CloudWatch, and Simple Storage Service.

In [None]:
# install Boto3 to your System
# for some of you, you may need to replace pip with pip3
!pip install boto3

<h2>Run DynamoDB Locally on Docker</h2>

To run DynamoDB locally on Docker, pull the DynamoDB Local container image and run it. More information can be found <a href="https://hub.docker.com/r/amazon/dynamodb-local">here</a>.

<ul>
    <li>Run the docker pull command for the image: <b>docker pull amazon/dynamodb-local</b></li>
    <li>Execute the docker run command: <b>docker run -p 8000:8000 amazon/dynamodb-local</b></li>
    <li>In your terminal, run <b>docker container ls</b> or check your docker desktop to see if the container is running</li>
</ul>
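
Besides <b>docker container ls</b>, you can confirm from the notebook that something is listening on the published port. This is a minimal sketch using only the standard library; the helper name <b>is_port_open</b> is hypothetical, and the check only proves that a TCP port accepts connections, not that the listener is DynamoDB.

```python
import socket

def is_port_open(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False

if is_port_open():
    print("A service is listening on localhost:8000.")
else:
    print("Nothing is listening on localhost:8000. Is the container running?")
```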

In [None]:
print('creating dynamodb resource')
import boto3
dynamodb = boto3.resource(
    'dynamodb',
    endpoint_url='http://localhost:8000', ### This is the URL for my DynamoDB docker container
    region_name='us-east-1', ### This is a bit ugly but you have to specify valid AWS region for this code to work!
    aws_access_key_id='accesskey',
    aws_secret_access_key='secretkey',
    verify=False)

print ('got resource:', dynamodb) ## This is a confirmation that I'm able to connect to DynamoDB without problems 

In [None]:
table = dynamodb.Table('ShoppingCart')
print(table)
# delete the table if it exists
try:
    table.delete()
except Exception as e:
    print(e)


### Let's see if we can create the table or retrieve it if it is already created.
try:
    result = dynamodb.create_table( ### Now, I'm creating the table
        TableName='ShoppingCart',
        KeySchema=[
            {
                'AttributeName': 'ItemName',
                'KeyType': 'HASH'  # Partition key
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'ItemName',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        }
    )
    print('Created table:', result)
except Exception as e:
    print('There is a problem with the table creation. Try again later!')
    print(e)  # show the underlying error instead of hiding it

table = dynamodb.Table('ShoppingCart')
print('got table:', table)

<h3> DynamoDB’s Almost Schemaless Data Model</h3>

In the Shopping Cart example, it may seem that the ShoppingCart table was created with a strict schema under which items can have only an ItemName attribute. But DynamoDB doesn’t work that way. Whenever you create a table, you only have to define the attributes that function as keys (sometimes referred to as key attributes).

So we could store items in our shopping cart table that have any number of other properties (brand name, year manufactured, ISBN, whatever) if we wanted to, without having to specify those attributes when we create the table. The only restriction on our ShoppingCart table is that each item must have an ItemName.

<b>Difference between indexes in MongoDB and DynamoDB</b>

But there’s a catch here: although schema constraints apply only to key attributes, you can’t efficiently query on attributes that aren’t specified as keys or indexes (more on indexes later). So if you started storing items with a brand name attribute in the ShoppingCart table, you couldn’t look up items by brand name with a <b>query</b>; you would have to fall back on a full-table <b>scan</b> with a filter. To query by brand name, you’d have to add a secondary index on it (a global secondary index can be added to an existing table, while a local secondary index must be defined when the table is created) or create a new table with the brand name as a key. So even though the data model doesn’t force you into a straitjacket, you should choose your keys and indexes very carefully.

This contrasts with a database like <b>MongoDB</b>, which is schemaless but lets you query on whatever fields you want at any time.
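
To make the role of indexes concrete, here is a sketch of a table definition that makes a brand attribute queryable through a global secondary index. The attribute name <b>Brand</b>, the index name <b>BrandIndex</b>, and the helper function are assumptions for illustration; only the dictionary is built here, so nothing is sent to DynamoDB.

```python
def shopping_cart_spec_with_brand_index():
    """Build create_table arguments for ShoppingCart plus a hypothetical BrandIndex."""
    throughput = {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1}
    return {
        "TableName": "ShoppingCart",
        "KeySchema": [{"AttributeName": "ItemName", "KeyType": "HASH"}],
        # Every attribute used in a key schema (table or index) must be declared.
        "AttributeDefinitions": [
            {"AttributeName": "ItemName", "AttributeType": "S"},
            {"AttributeName": "Brand", "AttributeType": "S"},
        ],
        "ProvisionedThroughput": throughput,
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "BrandIndex",
                "KeySchema": [{"AttributeName": "Brand", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": dict(throughput),
            }
        ],
    }

spec = shopping_cart_spec_with_brand_index()
print(sorted(spec))

# With the dynamodb resource from the earlier cells (not executed here):
#   dynamodb.create_table(**spec)
#   table.query(IndexName="BrandIndex",
#               KeyConditionExpression=Key("Brand").eq("Heinz"))
```

Items without a Brand attribute can still be stored in the table; they simply do not appear in the index.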

<div style="background-color:AliceBlue"><br/><b>Notice that when we add a new item to our table, we must specify the primary key attribute (ItemName), and we can add as many additional attributes as we want.<br/><br/></b></div>

In [None]:
table = dynamodb.Table('ShoppingCart')
response = table.put_item(
    Item={
        'ItemName': 'Dog Food',
        'Price': 32,
        'Store': 'Giant Eagle - North Shore'
    }
)

print("Put Item succeeded")
print(response)

In [None]:
table.put_item(
    Item={
        'ItemName': 'Dog Food',
        'Price': 40,
    }
)

<h3>Column Data Types</h3>

In <b>DynamoDB</b>, every attribute can hold a single value or a collection of values. For single values, you can use the following types:

<center><img src="http://stat.cmu.edu/~mfarag/652/lectures/l10/dynamo-scalar.png"/></center>

To store multiple values directly in one attribute, you can use the following data types:

<center><img src="http://stat.cmu.edu/~mfarag/652/lectures/l10/dynamo-set.png"/></center>
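
One practical consequence of these types when using Boto3: Python <b>int</b> and <b>Decimal</b> values are stored as DynamoDB numbers, but Python <b>float</b> values are rejected by the Boto3 resource interface. The sketch below shows one way to parse JSON so that fractional numbers arrive as Decimal; the helper name <b>load_item</b> and the sample item are made up for illustration.

```python
import json
from decimal import Decimal

def load_item(json_text):
    """Parse JSON text, turning fractional numbers into Decimal instead of float."""
    return json.loads(json_text, parse_float=Decimal)

item = load_item('{"ItemName": "Olive Oil", "Price": 12.99, "Tags": ["pantry", "organic"]}')
print(item)

# With the table from the earlier cells (not executed here):
#   table.put_item(Item=item)
```

Whole numbers in the JSON still become Python int, which Boto3 accepts as a number without conversion.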

<h3>JSON in DynamoDB</h3>

JSON is the most common format of communication between applications on the web. Take a look at the following JSON:

<center><img src="http://stat.cmu.edu/~mfarag/652/lectures/l10/complex_json_example.jpg"/></center>

<div style="background-color:AliceBlue"><b><br/>If we were to fit this JSON into Postgres, how many tables would we need?<br/><br/></b></div>

<div style="background-color:AliceBlue"><b>Let's see how we can add a new item for Tomato Sauce (and don't forget to include the price).</b></div>

In [None]:
import json

datadict = json.loads('{"ItemName": "Tomato Sauce", "Price": 6, "Expiration Date": {"N": "2023"},"Producer": {"Name": "Heinz"}}')
table = dynamodb.Table('ShoppingCart')
response = table.put_item(Item = datadict)
print(response)

<br/>Now, let's do some <b>CRUD</b> operations on the DynamoDB table.

<h3>CRUD Operations in DynamoDB</h3>

DynamoDB lets you search for your shopping cart items. If you are searching for a key, you can use <b>get_item()</b>. For example, let's search for Tomato Sauce in the shopping cart. 

In [None]:
from botocore.exceptions import ClientError

table = dynamodb.Table('ShoppingCart')
def search_item_name(name):
    try:
        response = table.get_item(Key={'ItemName': name})
    except ClientError as e:
        print(e.response['Error']['Message'])
    else:
        # 'Item' is missing from the response when no item has this key
        return response.get('Item')
print (search_item_name('Tomato Sauce'))

Alternatively, you could use the <b>query</b> function for this purpose.

In [None]:
from boto3.dynamodb.conditions import Key

table = dynamodb.Table('ShoppingCart')
def search_item_name(name):
    response = table.query(
        KeyConditionExpression=Key('ItemName').eq(name)
    )
    return response['Items']
print (search_item_name('Tomato Sauce'))

<div style="background-color:AliceBlue">&#9432; Note that when you <b>put</b> an item whose key already exists in the table, DynamoDB doesn't create a new record. Instead, it replaces the existing item with the new one. Go back to the Tomato Sauce insertion example above, change the expiration date, and search for Tomato Sauce again. Notice the difference! If you were to add a row with an existing primary key in an RDBMS, you would receive an error right away.</div>
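
If you want the RDBMS-like behavior of rejecting duplicates, <b>put_item</b> accepts a <b>ConditionExpression</b>. The following is a minimal sketch, assuming any table object that exposes Boto3's <b>put_item</b> interface; the helper name <b>put_if_absent</b> is hypothetical. It inspects the error code on the raised exception rather than importing botocore, which keeps the block self-contained.

```python
def put_if_absent(table, item, key_name="ItemName"):
    """Insert item only if no item with the same key exists.

    Returns True when the item was written and False when the key was taken.
    """
    try:
        table.put_item(
            Item=item,
            # The write succeeds only if the key attribute does not exist yet.
            ConditionExpression="attribute_not_exists(#k)",
            ExpressionAttributeNames={"#k": key_name},
        )
        return True
    except Exception as error:
        # botocore's ClientError carries the service error code in .response
        response = getattr(error, "response", None) or {}
        if response.get("Error", {}).get("Code") == "ConditionalCheckFailedException":
            return False
        raise

# With the table from the earlier cells (not executed here):
#   put_if_absent(table, {"ItemName": "Dog Food", "Price": 32})
#   put_if_absent(table, {"ItemName": "Dog Food", "Price": 40})  # key already present
```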

Now, if you would like to search on a non-key attribute, you can use a <b>FilterExpression</b> with the <b>scan</b> operation, as in the following:

In [None]:
from boto3.dynamodb.conditions import Attr

def scan_price_on_items(price):
    scan_kwargs = {
        'FilterExpression': Attr('Price').eq(price)
    }
    response = table.scan(**scan_kwargs)
    print(response.get('Items', []))

scan_price_on_items(6)

You can also retrieve all the records in a table using the <b>scan</b> operation. Take the following example:

In [None]:
response = table.scan()
print(response.get('Items', []))

Finally, you can delete an item from your ShoppingCart data store using the <b>delete_item</b> function. The following example deletes 'Tomato Sauce':

In [None]:
table = dynamodb.Table('ShoppingCart')
def delete_item(name):
    response = table.delete_item(Key={'ItemName': name})
    # return the response so the caller prints it (instead of printing None)
    return response
print (delete_item('Tomato Sauce'))

<b>Note:</b> DynamoDB returns large result sets in pages. To know whether you have retrieved all the data, check <b>LastEvaluatedKey</b> in the response.<br/>
    
LastEvaluatedKey is the primary key of the item where the operation stopped, inclusive of the previous result set. Pass this value as <b>ExclusiveStartKey</b> in the next request to continue from that point; the item with that key is not returned again.<br/>

If LastEvaluatedKey is empty, then the "last page" of results has been processed and there is no more data to be retrieved. For more information, check this page: <a href="https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html">https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html</a>
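
The paging rule above can be sketched as a loop that keeps scanning until no LastEvaluatedKey comes back. The helper name <b>scan_all_items</b> is hypothetical; it assumes a table object with Boto3's <b>scan</b> interface, where the next page is requested by passing the previous LastEvaluatedKey as <b>ExclusiveStartKey</b>.

```python
def scan_all_items(table, **scan_kwargs):
    """Collect the items from every page of a scan into one list."""
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get("Items", []))

        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            # No key returned: the last page has been processed.
            return items

        # Resume the next request right after the item where this page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key

# With the table from the earlier cells (not executed here):
#   all_items = scan_all_items(table)
#   cheap = scan_all_items(table, FilterExpression=Attr('Price').eq(6))
```

The same loop shape applies to <b>query</b>, which pages its results in the same way. Collecting everything in a list is fine for a small demo table; for large tables, processing each page as it arrives avoids holding all items in memory.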

