# Introduction to NoSQL and Object Storage

This lesson walks through create and read operations on Redis. We will also fetch data from Google Cloud Storage.

## Redis

Redis is an in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries.

We will be connecting to a Redis database hosted on Redis Labs, a cloud database service that lets you host Redis databases in the cloud.

Prerequisite: Set up an account on Redis [here](https://redis.io/) and create a (free tier) cluster.

If you need guidance, refer to the screenshots below:

[Step 1](../assets/redis_create_db_step1.png)  (create database)

[Step 2](../assets/redis_create_db_step2.png)  (choose **free** cluster, leave all other settings as **default** including `Name`, `Cloud vendor`, `Region`. Click the `Create database` button below.)

[Step 3](../assets/redis_create_db_step3.png)  (click 'connect' to get connect instructions)

[Step 4](../assets/redis_create_db_step4.png)  (choose 'Redis Client' - 'Python')

[Step 5](../assets/redis_create_db_step5.png) (copy and paste the Python code into the cell below. Note: please use the `Copy` button provided at the bottom right instead of selecting and copying the text manually. If you copy manually, your auto-generated password will not be copied over!)

We will be using the `redis-py` library to connect to the Redis database.

### Connecting to Redis

#### **Connection Code**

In [38]:
# Paste your code from Step 5 above below this line
# -------------------------------------------------
"""Basic connection example.
"""

import redis

r = redis.Redis(
    host='redis-11621.c85.us-east-1-2.ec2.cloud.redislabs.com',
    port=11621,
    decode_responses=True,
    username="default",
    password="<YOUR-PASSWORD>",  # pasted automatically by the Copy button; never commit a real password
)

success = r.set('foo', 'bar')
# True

result = r.get('foo')
print(result)
# >>> bar

# Please delete the connection settings above once the .env file is created and the password is configured there



bar


In [6]:
# Either use the code provided from Step 5 above or the code below to connect to your Redis database.
# Make sure to replace <REDIS-URL>, the port, and <YOUR-PASSWORD> with your actual Redis database values.
# If you are using the code from Step 5, you can skip this cell.
import redis

r = redis.Redis(
    host='<REDIS-URL>',  # E.g. 'redis-xxxxx.c252.ap-southeast-1-1.ec2.cloud.redislabs.com'
    port=10908,  # replace with your database port
    password='<YOUR-PASSWORD>',
)

<span style="color: red"><b>IMPORTANT</b></span>

<span style="color: red"><b>Your connection secrets, including the password, are yours alone. Do not share them, and do not sync them to a public repository. To keep your password and connection details safely on your local machine, run the following cell, then copy the host, port and password into the dotenv file.</b></span>

#### **Creating a dotenv Template Locally**

<span style="color: blue"><b>You only need to run the cell below ONCE. Do not run this cell again if you have copied your password.</b></span>

In [None]:
import os

# Cell: Create the .env Template File
# -----------------------------------
template_content = """# .env file for Redis Connection
# 1. REPLACE the placeholder values (XXXXX) below with the actual
#    Host, Port, Username, and Password from the connection code in the cell above.
# 2. Save the file.

REDIS_HOST=XXXXX
REDIS_PORT=XXXXX
REDIS_USERNAME=default
REDIS_PASSWORD=XXXXX
"""

# Check whether the .env file already exists so that saved secrets are not overwritten
env_file_path = '../.env'
if os.path.exists(env_file_path):
    print(f"Warning: The file '{env_file_path}' already exists.")
    print("Please check the file and update the values if necessary.")
else:
    try:
        # Mode 'w' creates the file and writes the template into it
        with open(env_file_path, 'w') as f:
            f.write(template_content)

        print(f"Template file created: {env_file_path}")
        print(f"Please open the '{env_file_path}' file and replace the 'XXXXX' placeholders with your actual secrets.")
    except OSError as e:
        print(f"Error creating file: {e}")
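The `.env` file is plain text with one `KEY=VALUE` pair per line, using the variable names this lesson reads later (`REDIS_HOST`, `REDIS_PORT`, `REDIS_USERNAME`, `REDIS_PASSWORD`). As a rough illustration of that format only, the sketch below parses a template with the standard library. `parse_env_text` is a hypothetical helper written for this example, not part of `python-dotenv`, and it ignores the quoting and escaping rules that the real library handles.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines; skip blank lines and '#' comments (hypothetical helper)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"not a KEY=VALUE line: {raw_line!r}")
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Template text with placeholders, in the KEY=VALUE shape described above
template = """# .env file for Redis Connection
REDIS_HOST=XXXXX
REDIS_PORT=XXXXX
REDIS_USERNAME=default
REDIS_PASSWORD=XXXXX
"""

settings = parse_env_text(template)
print(sorted(settings))
# ['REDIS_HOST', 'REDIS_PASSWORD', 'REDIS_PORT', 'REDIS_USERNAME']
```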

#### **Testing Redis Connection with dotenv**

**You can start from here if you have already set up the env file.**

In [36]:
import os
import redis
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()
REDIS_HOST = os.getenv('REDIS_HOST')
REDIS_PORT = int(os.getenv('REDIS_PORT'))  # environment values are strings, so convert the port
REDIS_USERNAME = os.getenv('REDIS_USERNAME')
REDIS_PASSWORD = os.getenv('REDIS_PASSWORD')

r = redis.Redis(
    host=REDIS_HOST,
    port=REDIS_PORT,
    decode_responses=True,
    username=REDIS_USERNAME,
    password=REDIS_PASSWORD,
)

success = r.set('foo', 'bar')
# True

result = r.get('foo')
print(result)
# >>> bar

<span style="color: red"><b>IMPORTANT</b></span>

<span style="color: red"><b>If your connection works using dotenv, please remove what you pasted earlier under `Connection Code`.</b></span>
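Values read with `os.getenv` are always strings, or `None` when a variable is missing, so it can help to validate them before building the client. The sketch below shows one possible way to do that. `load_redis_settings` is a hypothetical helper, and it accepts the environment as a plain mapping so it can be tried without a real `.env` file; the values in `example_env` are made up.

```python
import os


def load_redis_settings(env=None):
    """Collect Redis connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ["REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "host": env["REDIS_HOST"],
        "port": int(env["REDIS_PORT"]),  # convert the port string to an integer
        "username": env.get("REDIS_USERNAME", "default"),
        "password": env["REDIS_PASSWORD"],
        "decode_responses": True,
    }


# Example with made-up values (not real credentials)
example_env = {
    "REDIS_HOST": "example.invalid",
    "REDIS_PORT": "6379",
    "REDIS_PASSWORD": "secret",
}
settings = load_redis_settings(example_env)
print(settings["port"])  # 6379
```

The resulting dictionary could then be passed on as `redis.Redis(**settings)`.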

A Redis database holds `key:value pairs` and supports commands such as GET, SET, and DEL, as well as several hundred additional commands.

- Redis keys are always strings.
- Redis values may be a number of different data types. Some of the more essential value types are strings, lists, hashes, and sets. More advanced types include geospatial items and streams.

Many Redis commands operate in constant O(1) time, just like retrieving a value from a Python dict or any hash table.

Let's create a new key called `'name'` with the value `'Aaron'`.

In [39]:
r.set('name', 'Aaron')

True

Read the value of the key `'name'`:

In [40]:
r.get('name')

'Aaron'

We can update the value with `.set` too:

In [41]:
r.set('name', 'Bob')

True

In [42]:
r.get('name')

'Bob'

> Set a key `age` with value of `20`.
>
> Then read the value.

In [43]:
r.set('age', 20)
r.get('age')

'20'
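Note that the value comes back as the string `'20'`, not the integer `20`: with `decode_responses=True` the client returns text, so numeric values need converting after they are read. A minimal sketch of that conversion follows. `get_int` is a hypothetical helper, and `DictClient` is a small stand-in used only so the example runs without a Redis server; with this lesson's connection you would pass `r` instead.

```python
def get_int(client, key, default=None):
    """Read a key through any client with a .get() method and convert it to int."""
    raw = client.get(key)
    if raw is None:  # a missing key yields None
        return default
    return int(raw)


class DictClient:
    """In-memory stand-in that mimics only set/get with string results."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = str(value)
        return True

    def get(self, key):
        return self._data.get(key)


demo = DictClient()
demo.set("age", 20)
print(repr(demo.get("age")))     # '20'
print(get_int(demo, "age") + 1)  # 21
```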

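To append values to a list, use `rpush`. It returns the length of the list after the push, so the number grows each time the cell is re-run: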

In [44]:
r.rpush("names", "Aaron", "Bob", "Charlie")

6

In [46]:
# lpush inserts each value at the head of the list in turn
r.lpush("names", "Aaron", "Bob", "Charlie")

9

In [45]:
# Get a list element by its zero-based index
r.lindex("names", 2)

'Charlie'
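`rpush` appends values to the tail of the list, while `lpush` pushes each value onto the head in turn, so the last argument ends up first. The sketch below models that ordering with `collections.deque` from the standard library. It is an analogy for the ordering only, not a Redis call, and it starts from an empty list rather than the one stored in the database.

```python
from collections import deque

names = deque()

# Like RPUSH: values are appended to the tail in the order given
names.extend(["Aaron", "Bob", "Charlie"])

# Like LPUSH: each value is pushed onto the head in turn,
# so the last one given ends up at index 0
names.extendleft(["Aaron", "Bob", "Charlie"])

print(list(names))
# ['Charlie', 'Bob', 'Aaron', 'Aaron', 'Bob', 'Charlie']

print(names[2])  # comparable to LINDEX names 2
# Aaron
```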

You can use `mset` to set multiple keys at once.

In [47]:
# m = multiple: set several keys in one call
r.mset({
    "name": "John",
    "age": 30,
})

True

In [48]:
r.mget("name", "age")

['John', '30']

In [29]:
r.get('age')

'30'
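`mget` returns a plain list in the same order as the requested keys, so pairing the keys with the results rebuilds a dictionary. A small sketch, assuming a list shaped like the `mget` output above; `pair_keys` is a hypothetical helper name.

```python
def pair_keys(keys, values):
    """Zip requested keys with an mget-style result list into a dict."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    return dict(zip(keys, values))


keys = ["name", "age"]
values = ["John", "30"]  # shaped like the result of r.mget("name", "age")
record = pair_keys(keys, values)
print(record)  # {'name': 'John', 'age': '30'}
```

With the lesson's client this would be `pair_keys(keys, r.mget(keys))`. A key that does not exist comes back as `None`.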

Redis `hashes` are record types structured as collections of field-value pairs. You can use hashes to represent basic objects.

```python
# Create a new hash with my name as the key
r.hset(
    'zane lim',
    mapping={
        "age": 21,
        "email": "zl@gmail.com",
        "hobby": "coding",
    },
)
```

Run the cell below to create the hash, then read a single field back with `hget`:


In [50]:
# h = hash; hset returns the number of fields that were newly added
r.hset(
    'zane lim',
    mapping={
        "age": 21,
        "email": "zl@gmail.com",
        "hobby": "coding",
    },
)

0

In [51]:
r.hget("zane lim", "email")
#r.hget("zane lim", "hobby")

'zl@gmail.com'

Get the object back as a dictionary:

In [52]:
r.hgetall("zane lim")

{'age': '21', 'email': 'zl@gmail.com', 'hobby': 'coding'}
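As the output shows, every field value in a hash comes back as a string, including `age`. One way to restore types is to convert selected fields after reading. The sketch below works on a plain dictionary shaped like the `hgetall` result; `cast_fields` and its `casts` argument are hypothetical names for this example.

```python
def cast_fields(record, casts):
    """Return a copy of record with the given fields converted by their cast function."""
    typed = dict(record)
    for field, cast in casts.items():
        if field in typed:
            typed[field] = cast(typed[field])
    return typed


raw = {"age": "21", "email": "zl@gmail.com", "hobby": "coding"}  # shaped like an hgetall result
person = cast_fields(raw, {"age": int})
print(person)  # {'age': 21, 'email': 'zl@gmail.com', 'hobby': 'coding'}
```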

> Create a new hash with your name as the key, and a mapping of `age`, `email`, `hobby`.

In [53]:
r.hset(
    'Danny Teo',
    mapping={
        "age": 59,
        "email": "dannyteo.bigdata@gmail.com",
        "hobby": "coding & IT",
    },
)

3

In [54]:
r.hgetall("Danny Teo")

{'age': '59', 'email': 'dannyteo.bigdata@gmail.com', 'hobby': 'coding & IT'}

It is good practice to shut down your Redis cluster if you will not use it again. Click into your DB and hit `Delete`. See this [screenshot](../assets/redis_terminate_db.png) for a guide.

## Google Cloud Storage

Google Cloud Storage is an Object Storage service in Google Cloud.

### Bucket
- A bucket is a container for objects stored in Google Cloud Storage.
- Every object is contained in a bucket.
- Each bucket is associated with a project.
- A bucket has a unique name across all of Google Cloud Storage.

### Object
- An object is a piece of data, such as a file, that is stored in Google Cloud Storage.
- An object is also called a `blob` (binary large object) in Google Cloud Storage. 
- An object is composed of the object's data and its metadata. 
- Metadata is a collection of name-value pairs that describe the object. You can use metadata to search for objects.

We will be using the `google-cloud-storage` Python library to fetch data from the public [Landsat Collection 1](https://console.cloud.google.com/storage/browser/gcp-public-data-landsat;tab=objects?prefix=&forceOnObjectsSortingFiltering=false) dataset demonstrated earlier.

In [10]:
from google.cloud import storage


In [14]:
# Create a client for GCP Storage
# client = storage.Client()

# Some systems may require an explicit project ID.
# If the line above fails, pass your own project ID as shown below.

client = storage.Client(project='project-6d6e72cf-66ac-4cc8-95b')  # replace with your own project ID

In [60]:
# If GCP authentication still fails,
# create an anonymous client instead (it can only access public data)

client = storage.Client.create_anonymous_client()

In [15]:
bucket = client.get_bucket('gcp-public-data-landsat')

Note that, unless you use the anonymous client, you need to run `gcloud auth application-default login` in a terminal before running the cell above.

If the error persists, you may also need to restart the kernel (in VSCode, click the `Restart` button).

Get bucket metadata:

In [16]:
print("Bucket name: {}".format(bucket.name))
print("Bucket location: {}".format(bucket.location))
print("Bucket storage class: {}".format(bucket.storage_class))

Bucket name: gcp-public-data-landsat
Bucket location: US
Bucket storage class: STANDARD


List blobs in a bucket:

In [17]:
blobs = bucket.list_blobs()

print("Blobs in {}:".format(bucket.name))
for ix, item in enumerate(blobs):
    print("\t" + item.name)
    if ix == 50:
        break

Blobs in gcp-public-data-landsat:
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_ANG.txt
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B10.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B11.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B2.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B3.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B4.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B5.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B6.TIF
	LC08/01/
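The loop above limits the listing by counting iterations and breaking. An alternative sketch uses `itertools.islice` to take at most `n` items from any iterator, which keeps the limit in one place. `first_names` is a hypothetical helper, and the `SimpleNamespace` objects stand in for blobs so the example runs without network access; they mimic only the `.name` attribute.

```python
from itertools import islice
from types import SimpleNamespace


def first_names(blobs, n):
    """Return the names of at most n items from an iterable of blob-like objects."""
    return [blob.name for blob in islice(blobs, n)]


# Stand-in objects with a .name attribute, in place of real blobs
fake_blobs = (SimpleNamespace(name=f"folder/file_{i}.txt") for i in range(1000))
print(first_names(fake_blobs, 3))
# ['folder/file_0.txt', 'folder/file_1.txt', 'folder/file_2.txt']
```

With the lesson's bucket this would be `first_names(bucket.list_blobs(), 50)`. `list_blobs` also accepts `prefix` and `max_results` arguments, which let the service narrow the listing.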

Get a blob and display metadata:

In [18]:
blob = bucket.get_blob("LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF")

print("ID: {}".format(blob.id))
print("Size: {} bytes".format(blob.size))
print("Content type: {}".format(blob.content_type))
print("Public URL: {}".format(blob.public_url))

ID: gcp-public-data-landsat/LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF/1502391058568908
Size: 75085385 bytes
Content type: application/octet-stream
Public URL: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF
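The public URL printed above follows the pattern `https://storage.googleapis.com/<bucket>/<object name>`, and the size is reported in bytes. The sketch below rebuilds the URL from its parts and formats the size in mebibytes using only the standard library. Both helper names are hypothetical; in practice `blob.public_url` already provides the URL.

```python
from urllib.parse import quote


def build_public_url(bucket_name, blob_name):
    """Join a bucket name and an object name into a storage.googleapis.com URL."""
    # Percent-encode the object name but keep "/" separators readable
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name, safe='/')}"


def format_mib(size_bytes):
    """Format a byte count as mebibytes with one decimal place."""
    return f"{size_bytes / (1024 * 1024):.1f} MiB"


name = (
    "LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/"
    "LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF"
)
print(build_public_url("gcp-public-data-landsat", name))
print(format_mib(75085385))  # 71.6 MiB
```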


Download a blob to a local directory:

In [19]:
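output_file_name = "../output/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF"
os.makedirs(os.path.dirname(output_file_name), exist_ok=True)  # make sure ../output exists (os was imported earlier)
blob.download_to_filename(output_file_name)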

print("Downloaded blob {} to {}.".format(blob.name, output_file_name))

Downloaded blob LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF to ../output/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF.
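After a download it can be worth checking that the local file has the size the blob reported. A minimal sketch with `pathlib` follows; `size_matches` is a hypothetical helper, and the temporary file stands in for a downloaded object.

```python
import tempfile
from pathlib import Path


def size_matches(path, expected_size):
    """Return True if the file at path exists and has exactly expected_size bytes."""
    file_path = Path(path)
    return file_path.is_file() and file_path.stat().st_size == expected_size


# Demonstration with a temporary file standing in for a downloaded blob
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"\x00" * 1024)
    print(size_matches(sample, 1024))  # True
    print(size_matches(sample, 2048))  # False
```

For the download cell above, the call would be `size_matches(output_file_name, blob.size)`.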
