# Atlas Online Archive

[Atlas Online Archive](https://docs.atlas.mongodb.com/online-archive/manage-online-archive/) moves infrequently accessed immutable data from your Atlas cluster to MongoDB-managed read-only blob storage without user action. Once Atlas archives the data, you have a unified view of your Atlas and Online Archive data.

In this demo we will generate 1000 IoT events for the current year. Here's an example event:

```python
{
  '_id': ObjectId('5ef4ff46cf35f6a16e7f88a9'),
  'username': 'rogerrhodes',
  'remote_ipv4': '82.180.218.173',
  'httpMethod': 'PATCH',
  'hostName': 'desktop-51.freeman.net',
  'portNum': 52048,
  'location': {
    'type': 'Point',
    'coordinates': [
      Decimal128('-158.511919'),
      Decimal128('24.326279')
    ]
  },
  'dateAccessed': datetime.datetime(2020,  6,  15,  0,  0)
}
```

The events will be written to ```test.iot```, and Online Archive has been configured to archive documents whose ```dateAccessed``` field is older than 30 days:

<img src="./images/online_archive.png">
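
The 30-day rule can be illustrated with a small date comparison. This is only a sketch of the idea, not Atlas's implementation: the helper name `is_archivable` is hypothetical, and how Atlas treats a document that sits exactly on the 30-day boundary is an assumption here.

```python
import datetime

EXPIRE_AFTER_DAYS = 30  # same value as the archive rule in this demo


def is_archivable(date_accessed, now, expire_after_days=EXPIRE_AFTER_DAYS):
    """Return True if date_accessed is more than expire_after_days before now.

    Hypothetical helper: it mirrors the rule described above, it does not
    call Atlas.
    """
    cutoff = now - datetime.timedelta(days=expire_after_days)
    return date_accessed < cutoff


# Fixed "now" so the example is reproducible
now = datetime.datetime(2020, 7, 14)
print(is_archivable(datetime.datetime(2020, 6, 15), now))  # 29 days old
print(is_archivable(datetime.datetime(2020, 5, 1), now))   # 74 days old
```

With a fixed `now` of 2020-07-14 the cutoff is 2020-06-14, so the sample event dated 2020-06-15 would stay in the cluster for one more day under this reading of the rule.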


## Python Setup

In [82]:
# Imports
import time
import datetime
from timeit import default_timer as timer
import settings
from pymongo import MongoClient
from faker import Faker
from bson.decimal128 import Decimal128
import requests
from requests.auth import HTTPDigestAuth
import json


# Constants loaded from .env file
MDB_CONNECTION = settings.MDB_CONNECTION
MDB_CONNECTION_ARCHIVE = settings.MDB_CONNECTION_ARCHIVE
MDB_DATABASE = settings.MDB_DATABASE
MDB_COLLECTION = settings.MDB_COLLECTION
NUM_DOCS = settings.NUM_DOCS
API_PUBLIC_KEY = settings.API_PUBLIC_KEY
API_PRIVATE_KEY = settings.API_PRIVATE_KEY
PROJECT_ID = settings.PROJECT_ID
CLUSTER_NAME = settings.CLUSTER_NAME
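
The `settings` module itself is not shown in this notebook. As a rough sketch of what such a module might do, the function below collects the same names from environment variables. The function name `load_settings` and the idea that the `.env` values end up in the process environment are assumptions, not the demo's actual code.

```python
import os

# Names used by the notebook's constants cell
SETTING_NAMES = [
    "MDB_CONNECTION", "MDB_CONNECTION_ARCHIVE", "MDB_DATABASE",
    "MDB_COLLECTION", "NUM_DOCS", "API_PUBLIC_KEY", "API_PRIVATE_KEY",
    "PROJECT_ID", "CLUSTER_NAME",
]


def load_settings(env=os.environ):
    """Return the demo settings as a dict, failing early if any are missing.

    Hypothetical helper: assumes every setting is available as an
    environment variable of the same name.
    """
    missing = [name for name in SETTING_NAMES if name not in env]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in SETTING_NAMES}
```

Values read this way are strings, which would explain why the event generation cell converts `NUM_DOCS` with `int(NUM_DOCS)`.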

## Some Helper Functions

In [85]:
def get_archive_id():
    url = "https://cloud.mongodb.com/api/atlas/v1.0/groups/" + PROJECT_ID +"/clusters/" + CLUSTER_NAME  +"/onlineArchives"
    resp = requests.get(url, auth=HTTPDigestAuth(API_PUBLIC_KEY, API_PRIVATE_KEY))

    if (resp.ok):

        archives = json.loads(resp.content)
        #print ("There are {0} online archive(s)".format(len(archives)))

        for archive in archives:

            # There can only be one online archive per collection.
            if (archive['dbName'] == MDB_DATABASE and archive['collName'] == MDB_COLLECTION):
                return archive['_id']

        # No archive matches this database and collection
        return 0

    else:
        print(resp)
        
def get_archive_state(id):
        
    url = "https://cloud.mongodb.com/api/atlas/v1.0/groups/" + PROJECT_ID +"/clusters/" + CLUSTER_NAME  +"/onlineArchives/" + str(id)
    resp = requests.get(url, auth=HTTPDigestAuth(API_PUBLIC_KEY, API_PRIVATE_KEY))

    if (resp.ok):
        return resp.json()['state']

    else:
        print(resp)
        
def print_row(count, source):
    formatted_count = str(count).rjust(5)
    print(" %-10s %45s" % (formatted_count, source))
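
The helpers and the cells that follow assemble the same Online Archive URL by string concatenation in several places. One possible way to centralize that is sketched below. The function name `online_archive_url` is hypothetical; the path layout is copied from the URLs used in this notebook.

```python
BASE_URL = "https://cloud.mongodb.com/api/atlas/v1.0"


def online_archive_url(project_id, cluster_name, archive_id=None):
    """Build the Online Archive endpoint URL used throughout this notebook.

    Without archive_id the URL addresses the collection of archives;
    with it, a single archive.
    """
    url = f"{BASE_URL}/groups/{project_id}/clusters/{cluster_name}/onlineArchives"
    if archive_id:
        url += f"/{archive_id}"
    return url


# Placeholder values for illustration only
print(online_archive_url("PROJECT", "Cluster0"))
print(online_archive_url("PROJECT", "Cluster0", "abc123"))
```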
    

## Archive Creation
To keep the archive clean between demos (so there are always 1000 documents between the cluster and the archive), the Online Archive API is used to re-create the archive before each demo.

In [86]:
## Delete Existing Archive 

archive_id = get_archive_id()

if (archive_id): 
    
    url = "https://cloud.mongodb.com/api/atlas/v1.0/groups/" + PROJECT_ID +"/clusters/" + CLUSTER_NAME  +"/onlineArchives/" + archive_id
    resp = requests.delete(url, auth=HTTPDigestAuth(API_PUBLIC_KEY, API_PRIVATE_KEY))

    if (resp.ok):
        time.sleep(3) # Allowance for deletion to complete
        print("Deleted existing archive")  

    else:
        print(resp)
        
else:
    print("No existing archive found for this demo")

Deleted existing archive


In [87]:
## Create New Archive

date_field = "dateAccessed"

url = "https://cloud.mongodb.com/api/atlas/v1.0/groups/" + PROJECT_ID +"/clusters/" + CLUSTER_NAME  +"/onlineArchives"

data = {
        "dbName": MDB_DATABASE,
        "collName": MDB_COLLECTION,
        "partitionFields": [
              {
                      "fieldType": "string",
                      "fieldName": "username",
                      "order": 0
              }],    
        "criteria": {
              "dateField": date_field,
              "expireAfterDays": 30
          }
}
headers = {"content-type":"application/json"}
    
resp = requests.post(url, auth=HTTPDigestAuth(API_PUBLIC_KEY, API_PRIVATE_KEY), json=data, headers=headers)

if resp.ok:
    
    print("Archive Created")
    print(resp.json())
    
else:
    print(resp)

<Response [500]>


In [92]:
# Wait for archive to become active
archive_id = get_archive_id()

state = get_archive_state(archive_id) 
if (state == 'ACTIVE'):
    print(state)
else:
    print ("Waiting for archive to build")

while state != 'ACTIVE':
    state = get_archive_state(archive_id) 
    print(".", end="")    
    if (state == 'ACTIVE'):
        print ("\n" + state)
        break
    else:
        time.sleep(1)

ACTIVE
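
The wait loop above polls until the state is `ACTIVE` and would spin forever if the archive never reached that state. A bounded variant might look like the following sketch. The name `wait_for_state` and the default timeout are assumptions; the `sleep` and `clock` parameters exist only so the loop can be exercised without real delays.

```python
import time


def wait_for_state(get_state, target="ACTIVE", timeout_s=300,
                   interval_s=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call get_state() until it returns target or timeout_s elapses.

    Hypothetical helper: get_state is any zero-argument callable that
    returns the current state string.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == target:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"state is still {state!r} after {timeout_s}s")
        sleep(interval_s)


# Simulated state sequence so the sketch runs without the Atlas API
states = iter(["PENDING", "PENDING", "ACTIVE"])
print(wait_for_state(lambda: next(states), sleep=lambda s: None))
```

In this notebook the callable could be `lambda: get_archive_state(archive_id)`.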


## Event Generation
This script uses [Faker](https://faker.readthedocs.io/en/master/) to randomly generate 1000 IoT events.

In [103]:
fake = Faker()

# Start script
startTs = time.gmtime()
start = timer()
print("================================")
print("   Generating Sample IoT Data   ")
print("================================")
print("\nStarting " + time.strftime("%Y-%m-%d %H:%M:%S", startTs) + "\n")

print('NUM DOCS TO GENERATE: ' + str(NUM_DOCS))

mongo_client = MongoClient(MDB_CONNECTION)
db = mongo_client[MDB_DATABASE]
my_collection = db[MDB_COLLECTION]

# Remove the existing documents (don't drop the collection from underneath the archive)
my_collection.delete_many({})

for index in range(int(NUM_DOCS)):
    # create timestamp
    fake_timestamp = fake.date_this_year()

    # Define IoT Document
    my_iot_document = {
        "username": fake.user_name(),
        "remote_ipv4": fake.ipv4(),
        "httpMethod": fake.http_method(),
        "hostName": fake.hostname(),
        "portNum": fake.port_number(),
        "location": {
                "type": "Point",
                "coordinates": [
                    Decimal128(fake.longitude()),
                    Decimal128(fake.latitude())
                ]
        },
        "dateAccessed": datetime.datetime(fake_timestamp.year, fake_timestamp.month, fake_timestamp.day)
    }
    # print(my_iot_document)
    print(".", end="")
    my_collection.insert_one(my_iot_document)

# Indicate end of script
end = timer()
endTs = time.gmtime()
print("\nEnding " + time.strftime("%Y-%m-%d %H:%M:%S", endTs))
print('===============================')
print('Total Time Elapsed (in seconds): ' + str(end - start))
print('===============================')

   Generating Sample IoT Data   

Starting 2020-07-14 22:00:03

NUM DOCS TO GENERATE: 1000
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
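
The generation loop issues one `insert_one` call per document. If load time matters, the documents could instead be grouped and written with pymongo's `insert_many`. The batching helper below is a sketch of that alternative, not what the script above does; the name `batched` and the batch size are assumptions.

```python
def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch
        yield batch


# Stand-in documents; in the notebook these would be the IoT events
docs = [{"n": i} for i in range(5)]
for batch in batched(docs, 2):
    # With a live connection this line would be:
    #     my_collection.insert_many(batch)
    print(len(batch))
```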

## Cluster Query
While the IoT events are being generated, query the cluster's document count to watch archiving begin.

In [99]:
mongo_client = MongoClient(MDB_CONNECTION)
db = mongo_client[MDB_DATABASE]
my_collection = db[MDB_COLLECTION]

cluster_count = my_collection.count_documents({})

# Poll until the first documents leave the cluster
if cluster_count == int(NUM_DOCS):
    print("Waiting for documents to archive")
while cluster_count == int(NUM_DOCS):
    print(".", end="")
    time.sleep(1)
    cluster_count = my_collection.count_documents({})

print("Archive has begun:")
print_row(cluster_count, "Total number of documents in the Atlas Cluster")

# While the document count is shrinking
while True:
    time.sleep(1)
    latest_count = my_collection.count_documents({})
    if latest_count >= cluster_count:
        break
    cluster_count = latest_count
    print_row(cluster_count, "Total number of documents in the Atlas Cluster")
    

Archive has begun:
   158      Total number of documents in the Atlas Cluster


## Cluster and Online Archive

While the Atlas Cluster has some subset of the documents, there are still 1000 documents across the cluster and archive.

In [101]:
# Establish a connection to the Cluster and Online Archive
mongo_client_archive = MongoClient(MDB_CONNECTION_ARCHIVE)
archive_db = mongo_client_archive[MDB_DATABASE]
my_collection_archive = archive_db[MDB_COLLECTION]

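# Documents accessed before this date fall under the 30-day archive rule
archive_date = datetime.datetime.now() - datetime.timedelta(days=30)

cluster_count = my_collection.count_documents({'dateAccessed':{'$lt': archive_date}})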
cluster_archive_count = my_collection_archive.count_documents({'dateAccessed':{'$lt': archive_date}})

print("Archive date (30 days ago): " + str(archive_date.date()))
print('')
print_row(my_collection.count_documents({}), "Total number of documents in the Atlas Cluster")
print_row(cluster_count, "Total number of documents in the Atlas Cluster older than 30 days")
print_row(cluster_archive_count, "Total number of documents across the Atlas Cluster and the Online Archive older than 30 days")
print('------')
print_row(my_collection_archive.count_documents({}), "Total number of documents across the Atlas Cluster and Online Archive")


Archive date (30 days ago): 2020-06-14

   156      Total number of documents in the Atlas Cluster
     0      Total number of documents in the Atlas Cluster older than 30 days
  1686      Total number of documents across the Atlas Cluster and the Online Archive older than 30 days
------
  1842      Total number of documents across the Atlas Cluster and Online Archive
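
The number of documents that live only in the Online Archive is not printed directly, but it can be derived from the two totals above. The sketch below assumes the unified connection counts every document exactly once; the function name `archive_only_count` is hypothetical.

```python
def archive_only_count(cluster_total, unified_total):
    """Documents visible through the unified connection but not in the cluster."""
    return unified_total - cluster_total


# Totals taken from the output above
print(archive_only_count(156, 1842))  # 1686
```

The result equals the older-than-30-days count reported through the unified connection, which is consistent with the cluster itself holding no documents older than 30 days.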
