# Introduction to NoSQL and Object Storage

This lesson walks through the create and read operations on Redis. We will also fetch data from Google Cloud Storage.

## Redis

Redis is an in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries.

We will be connecting to a Redis database hosted on Redis Labs. Redis Labs is a cloud database service that lets you host Redis databases in the cloud.

Prerequisite: Set up an account on Redis [here](https://redis.io/) and create a (free tier) cluster.

If you need guidance, refer to the screenshots below:

[Step 1](../assets/redis_create_db_step1.png)  (create database)

[Step 2](../assets/redis_create_db_step2.png)  (choose free cluster)

[Step 3](../assets/redis_create_db_step3.png)  (click 'connect' to get connect instructions)

[Step 4](../assets/redis_create_db_step4.png)  (choose 'Redis Client' - 'Python')

[Step 5](../assets/redis_create_db_step5.png)  (copy and paste the Python code into a cell below)

We will be using the `redis-py` library to connect to the Redis database.

In [2]:
%pip install redis

Collecting redis
  Obtaining dependency information for redis from https://files.pythonhosted.org/packages/df/a7/2fe45801534a187543fc45d28b3844d84559c1589255bc2ece30d92dc205/redis-6.3.0-py3-none-any.whl.metadata
  Downloading redis-6.3.0-py3-none-any.whl.metadata (10 kB)
Collecting async-timeout>=4.0.3 (from redis)
  Obtaining dependency information for async-timeout>=4.0.3 from https://files.pythonhosted.org/packages/fe/ba/e2081de779ca30d473f21f5b30e0e737c438205440784c7dfc81efc2b029/async_timeout-5.0.1-py3-none-any.whl.metadata
  Using cached async_timeout-5.0.1-py3-none-any.whl.metadata (5.1 kB)
Downloading redis-6.3.0-py3-none-any.whl (280 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 280.0/280.0 kB 4.4 MB/s eta 0:00:00
Using cached async_timeout-5.0.1-py3-none-any.whl (6.2 kB)
Installing collected packages: async-timeout, redis
Successfully installed async-timeout-5.0.1 redis-6.3.0
Note: you may need to restart the k

In [6]:
%pip install dotenv

Collecting dotenv
  Obtaining dependency information for dotenv from https://files.pythonhosted.org/packages/b2/b7/545d2c10c1fc15e48653c91efde329a790f2eecfbbf2bd16003b5db2bab0/dotenv-0.9.9-py2.py3-none-any.whl.metadata
  Downloading dotenv-0.9.9-py2.py3-none-any.whl.metadata (279 bytes)
Collecting python-dotenv (from dotenv)
  Obtaining dependency information for python-dotenv from https://files.pythonhosted.org/packages/5f/ed/539768cf28c661b5b068d66d96a2f155c4971a5d55684a514c1a0e0dec2f/python_dotenv-1.1.1-py3-none-any.whl.metadata
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading dotenv-0.9.9-py2.py3-none-any.whl (1.9 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv, dotenv
Successfully installed dotenv-0.9.9 python-dotenv-1.1.1
Note: you may need to restart the kernel to use updated packages.


In [27]:
import os
from dotenv import load_dotenv
load_dotenv()
redis_password = os.environ.get('REDIS_PASSWORD')
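`load_dotenv()` reads `KEY=value` lines from a `.env` file into the process environment, and `os.environ.get` returns `None` when the variable is missing. In that case the problem only shows up later as a failed connection. The following is a minimal sketch of a stricter lookup; `require_env` is a hypothetical helper name for this lesson, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper for this lesson, not a library function.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "add it to your .env file and call load_dotenv() first."
        )
    return value
```

After calling `load_dotenv()`, `redis_password = require_env('REDIS_PASSWORD')` would then fail immediately with a readable message instead of passing `None` to the client.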

In [28]:
import redis

r = redis.Redis(
    host='redis-10354.c295.ap-southeast-1-1.ec2.redns.redis-cloud.com',
    port=10354,
    decode_responses=True,
    username="admin",
    password=redis_password,
)

success = r.set('foo', 'bar')
# True

result = r.get('foo')
print(result)
# >>> bar

bar


In [4]:
# Either use the code provided from Step 5 above or the code below to connect to your Redis database.
# Make sure to replace <REDIS-URL>, the port, and <YOUR-PASSWORD> with your own database's values.
# If you are using the code from Step 5, you can skip this section.
import redis

r = redis.Redis(
    host='<REDIS-URL>',  # hostname only, without the port, e.g. 'redis-10908.c252.ap-southeast-1-1.ec2.cloud.redislabs.com'
    port=10908,  # replace with the port shown for your database
    password='<YOUR-PASSWORD>',
    decode_responses=True,  # return str instead of bytes, matching the outputs in this lesson
)
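Connection endpoints are often copied as a single `host:port` string, while `redis.Redis` expects the hostname and the port as separate arguments. A small sketch of splitting such a string follows; `split_endpoint` is a hypothetical helper, not part of `redis-py`.

```python
from typing import Tuple


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split 'host:port' into (host, port). Hypothetical helper, not part of redis-py."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"Expected 'host:port', got {endpoint!r}")
    return host, int(port)


host, port = split_endpoint(
    "redis-10354.c295.ap-southeast-1-1.ec2.redns.redis-cloud.com:10354"
)
print(host)
print(port)
```

The two values can then be passed as `redis.Redis(host=host, port=port, ...)`.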

A Redis database holds key-value pairs and supports commands such as `GET`, `SET`, and `DEL`, as well as several hundred additional commands.

- Redis keys are always strings.
- Redis values may be a number of different data types. Some of the more essential value data types are strings, lists, hashes, and sets. Some advanced types include geospatial items and streams.

Many Redis commands operate in constant O(1) time, just like retrieving a value from a Python dict or any hash table.
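As a rough sketch of how these commands map to `redis-py` methods, the function below runs `SET`, `EXISTS`, and `DEL` on one key. It assumes `client` is a connected `redis.Redis` instance; the function name `set_check_delete` is made up for illustration.

```python
def set_check_delete(client, key, value):
    """Run SET, EXISTS, DEL, EXISTS on `key` and return the four results.

    `client` is assumed to be a connected `redis.Redis` instance.
    """
    was_set = client.set(key, value)    # True when the value was stored
    existed = client.exists(key)        # how many of the given keys exist
    deleted = client.delete(key)        # how many keys were actually removed
    exists_after = client.exists(key)   # 0 once the key is gone
    return was_set, existed, deleted, exists_after
```

Called as `set_check_delete(r, 'temp', 'x')` against a working connection, this should return `(True, 1, 1, 0)` and leave no `temp` key behind.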

Printing the client object shows the connection settings. Note that the output includes the password in plain text (redacted here), so avoid sharing it:

In [10]:
print(r)

<redis.client.Redis(<redis.connection.ConnectionPool(<redis.connection.Connection(db=0,username=admin,password=<redacted>,socket_timeout=None,encoding=utf-8,encoding_errors=strict,decode_responses=True,retry_on_error=[],retry=<redis.retry.Retry object at 0x76c8a2b002e0>,health_check_interval=0,client_name=None,lib_name=redis-py,lib_version=6.3.0,redis_connect_func=None,credential_provider=None,protocol=2,host=redis-10354.c295.ap-southeast-1-1.ec2.redns.redis-cloud.com,port=10354,socket_connect_timeout=None,socket_keepalive=None,socket_keepalive_options=None)>)>)>

Let's create a new key called `'name'` with the value `'Aaron'`.

In [11]:
r.set('name', 'Aaron')

True

Read the value of the key `'name'`:

In [10]:
r.get('name')

'Aaron'

We can update the value with `.set` too:

In [12]:
r.set('name', 'Bob')

True

In [12]:
r.get('name')

'Bob'

> Set a key `age` with value of `20`.
>
> Then read the value.

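To append items to a list, use `rpush`. It returns the length of the list after the push, so a value larger than the number of items pushed (such as the `6` below) means the key already held items from an earlier run: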

In [13]:
r.rpush("names", "Aaron", "Bob", "Charlie")

6

In [14]:
r.lindex("names", 1)

'Bob'
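Because `rpush` appends to whatever is already stored, re-running a notebook cell keeps growing the list. One way to make the cell repeatable is to delete the key first and then read the whole list back with `lrange`. This is a sketch, assuming `client` is a connected `redis.Redis` instance created with `decode_responses=True`; `replace_list` is a hypothetical helper name.

```python
def replace_list(client, key, items):
    """Delete `key`, push `items` in order, and return the stored list.

    `client` is assumed to be a connected `redis.Redis` instance.
    """
    client.delete(key)                # start from an empty list on every run
    if not items:
        return []                     # RPUSH needs at least one value
    client.rpush(key, *items)         # append all items to the tail
    return client.lrange(key, 0, -1)  # 0..-1 selects the whole list
```

For example, `replace_list(r, "names", ["Aaron", "Bob", "Charlie"])` should give the same three-item list no matter how many times it is run.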

You can use `mset` to set multiple keys at once.

In [15]:
r.mset({
    "name": "John",
    "age": 30,
})

True

In [16]:
r.mget("name", "age")

['John', '30']
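`mget` returns a plain list in the same order as the requested keys, and numbers come back as strings (`'30'` above). A small sketch that pairs each key with its value is shown here; it assumes `client` is a connected `redis.Redis` instance, and `read_keys` is a made-up name.

```python
def read_keys(client, keys):
    """Fetch several keys with one MGET call and return them as a dict.

    Keys that do not exist map to None.
    """
    values = client.mget(keys)  # one round trip for all keys
    return dict(zip(keys, values))
```

With the data set above, `read_keys(r, ["name", "age"])` should return `{'name': 'John', 'age': '30'}`.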

Redis `hashes` are record types structured as collections of field-value pairs. You can use hashes to represent basic objects.

In [17]:
# Create a new hash with my name as the key
r.hset(
    'zane lim',
    mapping={
        "age": 21,
        "email": "zl@gmail.com",
        "hobby": "coding",
    },
)

0

`hset` returns the number of fields that were newly added, so the `0` here indicates that the fields already existed from an earlier run and were overwritten.

Then read a single field back with `hget`:

In [18]:
r.hget("zane lim", "email")

'zl@gmail.com'

Get the object back as a dictionary:

In [19]:
r.hgetall("zane lim")

{'age': '21', 'email': 'zl@gmail.com', 'hobby': 'coding'}
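Note that `age` was stored as the number `21` but comes back as the string `'21'`, since Redis hash values are stored as strings. The sketch below converts the result back into typed Python values. It assumes the three fields used in this lesson, and `parse_profile` is a hypothetical helper name.

```python
def parse_profile(raw):
    """Convert the string values returned by HGETALL into typed Python values.

    Assumes `raw` has the fields used in this lesson: age, email, hobby.
    """
    return {
        "age": int(raw["age"]),
        "email": raw["email"],
        "hobby": raw["hobby"],
    }


profile = parse_profile({'age': '21', 'email': 'zl@gmail.com', 'hobby': 'coding'})
print(profile["age"] + 1)  # arithmetic works because age is an int again
```

In the notebook this would be called as `parse_profile(r.hgetall("zane lim"))`.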

> Create a new hash with your name as the key, and a mapping of `age`, `email`, `hobby`.

It is good practice to shut down your Redis cluster if you do not plan to use it again. Click into your DB and hit `Delete`. See this [screenshot](../assets/redis_terminate_db.png) for a guide.

## Google Cloud Storage

Google Cloud Storage is an object storage service in Google Cloud.

### Bucket
- A bucket is a container for objects stored in Google Cloud Storage.
- Every object is contained in a bucket.
- Each bucket is associated with a project.
- A bucket has a unique name across all of Google Cloud Storage.

### Object
- An object is a piece of data, such as a file, that is stored in Google Cloud Storage.
- An object is also called a `blob` (binary large object) in Google Cloud Storage. 
- An object is composed of the object's data and its metadata. 
- Metadata is a collection of name-value pairs that describe the object. You can use metadata to search for objects.
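Since bucket names are unique and every object lives in exactly one bucket, the pair (bucket name, object name) is enough to identify an object. The sketch below writes that pair in two common forms: a `gs://` URI and an HTTPS URL, which only works for publicly readable objects. The helper names `gs_uri` and `public_url` are hypothetical and are not part of the `google-cloud-storage` library.

```python
from urllib.parse import quote


def gs_uri(bucket_name: str, object_name: str) -> str:
    """Return the gs:// form used by tools such as gcloud."""
    return f"gs://{bucket_name}/{object_name}"


def public_url(bucket_name: str, object_name: str) -> str:
    """Return the HTTPS URL form for a publicly readable object."""
    # Percent-encode characters that are not URL-safe, keeping '/' separators.
    return f"https://storage.googleapis.com/{bucket_name}/{quote(object_name, safe='/~')}"


print(gs_uri("gcp-public-data-landsat", "LC08/01/001/002/example.txt"))
print(public_url("gcp-public-data-landsat", "LC08/01/001/002/example.txt"))
```

The object name `LC08/01/001/002/example.txt` is a placeholder. The slashes are part of the name itself, as object storage has no real directories.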

We will be using the `google-cloud-storage` Python library to fetch data from the public [Landsat Collection 1](https://console.cloud.google.com/storage/browser/gcp-public-data-landsat;tab=objects?prefix=&forceOnObjectsSortingFiltering=false) dataset demonstrated earlier.

In [22]:
%pip install google-cloud-storage

Collecting google-cloud-storage
  Obtaining dependency information for google-cloud-storage from https://files.pythonhosted.org/packages/be/48/823ce62cf29d04db6508971a0db13a72c1c9faf67cea2c206b1c9c9f1f02/google_cloud_storage-3.2.0-py3-none-any.whl.metadata
  Downloading google_cloud_storage-3.2.0-py3-none-any.whl.metadata (13 kB)
Collecting google-auth<3.0.0,>=2.26.1 (from google-cloud-storage)
  Obtaining dependency information for google-auth<3.0.0,>=2.26.1 from https://files.pythonhosted.org/packages/17/63/b19553b658a1692443c62bd07e5868adaa0ad746a0751ba62c59568cd45b/google_auth-2.40.3-py2.py3-none-any.whl.metadata
  Using cached google_auth-2.40.3-py2.py3-none-any.whl.metadata (6.2 kB)
Collecting google-api-core<3.0.0,>=2.15.0 (from google-cloud-storage)
  Obtaining dependency information for google-api-core<3.0.0,>=2.15.0 from https://files.pythonhosted.org/packages/14/4b/ead00905132820b623732b175d66354e9d3e69fcf2a5dcdab780664e7896/google_api_core-2.25.1-py3-none-any.whl.metadata
 

In [20]:
from google.cloud import storage

In [21]:
client = storage.Client()

In [22]:
bucket = client.get_bucket('gcp-public-data-landsat')

Note that you need to run `gcloud auth application-default login` in a terminal before running the cells above, so that `storage.Client()` can find your credentials.

If the error persists, you may also need to restart the kernel (in VSCode, click the `Restart` button).
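For a public dataset such as this one, an unauthenticated client may be enough for listing and downloading objects. The following is a sketch under that assumption, not the approach used in this lesson. It relies on `storage.Client.create_anonymous_client()` and `client.bucket()` from `google-cloud-storage`; `open_public_bucket` is a hypothetical helper name.

```python
def open_public_bucket(bucket_name):
    """Return a bucket handle from an unauthenticated client.

    Assumes google-cloud-storage is installed and the bucket allows public reads.
    """
    # Imported inside the function so defining the helper does not require the library.
    from google.cloud import storage

    client = storage.Client.create_anonymous_client()
    # bucket() builds a local handle without calling the API, unlike get_bucket().
    return client.bucket(bucket_name)
```

A handle created this way has not loaded the bucket's metadata, so attributes such as `location` are empty until the bucket is fetched with sufficient permissions.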

Get bucket metadata:

In [23]:
print("Bucket name: {}".format(bucket.name))
print("Bucket location: {}".format(bucket.location))
print("Bucket storage class: {}".format(bucket.storage_class))

Bucket name: gcp-public-data-landsat
Bucket location: US
Bucket storage class: STANDARD


List blobs in a bucket:

In [24]:
blobs = bucket.list_blobs()

print("Blobs in {}:".format(bucket.name))
for ix, item in enumerate(blobs):
    print("\t" + item.name)
    if ix == 50:
        break

Blobs in gcp-public-data-landsat:
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_ANG.txt
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B10.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B11.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B2.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B3.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B4.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B5.TIF
	LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B6.TIF
	LC08/01/
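The object names above look like file paths, and `list_blobs` can filter on that shared beginning with its `prefix` argument and cap the number of results with `max_results`. A minimal sketch follows; it assumes `bucket` is a `google.cloud.storage` bucket object like the one created earlier, and `list_names` is a hypothetical helper name.

```python
def list_names(bucket, prefix, limit):
    """Return up to `limit` object names that start with `prefix`.

    `bucket` is assumed to be a google.cloud.storage bucket object.
    """
    blobs = bucket.list_blobs(prefix=prefix, max_results=limit)
    return [blob.name for blob in blobs]
```

For example, `list_names(bucket, "LC08/01/001/002/", 10)` would return at most ten names under that path, without looping over the whole bucket.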

Get a blob and display metadata:

In [25]:
blob = bucket.get_blob("LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF")

print("ID: {}".format(blob.id))
print("Size: {} bytes".format(blob.size))
print("Content type: {}".format(blob.content_type))
print("Public URL: {}".format(blob.public_url))

ID: gcp-public-data-landsat/LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF/1502391058568908
Size: 75085385 bytes
Content type: application/octet-stream
Public URL: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF


Download a blob to a local directory:

In [26]:
output_file_name = "../output/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF"
blob.download_to_filename(output_file_name)

print("Downloaded blob {} to {}.".format(blob.name, output_file_name))

Downloaded blob LC08/01/001/002/LC08_L1GT_001002_20160817_20170322_01_T2/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF to ../output/LC08_L1GT_001002_20160817_20170322_01_T2_B1.TIF.
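`download_to_filename` writes to the path it is given, so the `../output` folder has to exist before the cell runs. One possible way to guard against a missing folder is sketched below; `download_blob` is a hypothetical helper, and `blob` is assumed to be an object returned by `bucket.get_blob`.

```python
from pathlib import Path


def download_blob(blob, destination):
    """Download `blob` to `destination`, creating parent folders first."""
    path = Path(destination)
    path.parent.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    blob.download_to_filename(str(path))
    return path
```

In the notebook this would be called as `download_blob(blob, output_file_name)`.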
