# Search OCI & Build Infrastructure Graph

## Import all necessary Libraries
* **OCI** - Python SDK that turns Python calls into OCI API requests
* **JSON** - Converts between JSON and Python dictionaries, and reads and writes JSON files
* **PPRINT** - Pretty-prints output while debugging
* **DATETIME** - Converts datetime strings returned by the OCI API into datetime objects for datetime arithmetic

In [1]:
import oci
import json
import pprint
import pandas as pd
import datetime
from operator import itemgetter, add
import asyncio
import time
import logging
import concurrent.futures as cf
import glob
import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout
import matplotlib.pyplot as plt
from flatten_dict import flatten
import numpy as np
import re

%load_ext nb_black
from IPython.display import display, Javascript, HTML

## Helper Functions
* Helper functions are kept in separate modules to keep the notebook code clean

In [2]:
from helpers import list_region_subscriptions
from helpers import fetch_compartment_heirarchy
from helpers import search_region_and_populate
from oci_clients import clients_init

## Define Resource List to Query
* Initialize Clients that will be leveraged
* Condition variables - we are interested in querying only for active resources
* The `||` (OR) operator joins the conditions so that resources in the RUNNING, AVAILABLE, or ACTIVE state are all fetched

## Supported Resources in Search
* [List of Supported Resources](https://docs.cloud.oracle.com/en-us/iaas/Content/Search/Concepts/queryoverview.htm#resourcetypes)

In [3]:
resourcetype_list = [
    "instance",
    "image",
    "dbsystem",
    "vmcluster",
    "odainstance",
    "bootvolume",
    "bootvolumebackup",
    "volumebackup",
    "volumebackuppolicy",
    "volume",
    "datascienceproject",
    "datasciencemodel",
    "datasciencenotebooksession",
    "datacatalog",
    "analyticsinstance",
    "autonomousdatabase",
    "integrationinstance",
    "vcn",
    "subnet",
    "vnic",
    "securitylist",
    "routetable",
    "natgateway",
    "servicegateway",
    "onstopic",
    "onssubscription",
    "stream",
    "connectharness",
    "bucket",
    "vault",
    "filesystem",
    "apigateway",
    "apideployment",
    "compartment",
    "group",
    "identityprovider",
    "idpgroupmapping",
    "policy",
    "tagdefault",
    "tagnamespace",
    "user",
]
condition_list = [
    "lifecycleState = 'RUNNING'",
    "lifecycleState = 'AVAILABLE'",
    "lifecycleState = 'ACTIVE'",
]
resourceString = (", ").join(resourcetype_list)
conditionString = (" || ").join(condition_list)
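The two joined strings are presumably interpolated into a structured search query inside `search_region_and_populate`. The helper's source is not shown here, so the template below is an assumption based on the general `query <types> resources where <conditions>` form of the OCI Search query language. A minimal sketch with shortened lists:

```python
# Shortened lists for illustration; the notebook uses the full lists above.
resourcetype_list = ["instance", "vcn", "subnet"]
condition_list = [
    "lifecycleState = 'RUNNING'",
    "lifecycleState = 'AVAILABLE'",
]

resource_string = ", ".join(resourcetype_list)
condition_string = " || ".join(condition_list)

# Assumed template: the real one lives in helpers.search_region_and_populate.
query = "query {} resources where {}".format(resource_string, condition_string)
print(query)
```

Joining with `||` means a resource is returned if it matches any one of the lifecycle states.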

## Supported Clients & Resources
- Search provides the list of OCIDs based on the query string 
- Use the OCIDs to drill down into the resources further to understand your tenancy better
 - **Identity Client** - To find out how many regions (data center geographies) the tenancy is subscribed to
 - **Search Client** - To fetch all resources that satisfy the query conditions
 - **Compute Client** - To drill down into compute resources for Compute/VM/Bare Metal server specific information
 - **Database Client** - To drill down into database resources for database specific information
 - **Analytics Client** - To drill down into analytics resources for Analytics Instance specific information
 - **VCN Client** - To drill down into VCNs, Subnets, LPGs, DRGs, Load Balancers, etc.
 - **Notifications Client** - To drill down into Notification Topics, Subscriptions, etc.
 - **API-GW Client** - To drill down into API Gateways & API Deployments
 - **Block Storage Client** - To drill down into Block Volumes and Boot Volumes
 - **Object Storage Client** - To drill down into Object Storage
 - **Streams Client** - To drill down into Streams, Kafka Connect harnesses, etc.

## Setup the OCI Config
* Read the OCI config from the default path (`~/.oci/config`), or pass the path where the config file is available

In [4]:
config = oci.config.from_file()
tenancy_id = config["tenancy"]

## List all Regions Subscribed in Tenancy
 - **Search** endpoints are regional and hence we will iterate asynchronously over all regions

In [5]:
region_names = list_region_subscriptions(config)

'Fetching all regions in tenancy'
("List of regions subscribed to : ['ap-mumbai-1', 'eu-frankfurt-1', "
 "'ap-hyderabad-1', 'us-phoenix-1']")


## Fetch Compartment Hierarchy
 - **Compartment** is an IAM resource and is Global
 - **Compartment_KV** - Lookup table from Compartment OCID to Compartment name
 - **Compartment Parent OCID KV** - Lookup table from Compartment OCID to Parent Compartment OCID

In [6]:
compartment_kv, compartment_parent_ocid_kv = fetch_compartment_heirarchy(config)

'Populate Compartment Herirachies in Tenancy'
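With the two lookup tables, the full path of any compartment can be rebuilt by following parent links upward. The sketch below uses hypothetical, shortened OCIDs and assumes the tables are plain dictionaries keyed by OCID, as described above. Whether the root tenancy appears in `compartment_kv` is not shown in this notebook, so the function falls back to the raw OCID when a name is missing.

```python
# Hypothetical lookup tables shaped like compartment_kv and
# compartment_parent_ocid_kv; the OCIDs are shortened placeholders.
compartment_kv = {
    "ocid1.tenancy.oc1..root": "root",
    "ocid1.compartment.oc1..net": "network",
    "ocid1.compartment.oc1..dev": "dev",
}
compartment_parent_ocid_kv = {
    "ocid1.compartment.oc1..net": "ocid1.tenancy.oc1..root",
    "ocid1.compartment.oc1..dev": "ocid1.compartment.oc1..net",
}


def compartment_path(ocid, names, parents):
    """Walk parent links up to the root and return a '/'-joined name path."""
    parts = []
    seen = set()
    while ocid is not None and ocid not in seen:
        seen.add(ocid)  # guard against accidental cycles in the lookup table
        parts.append(names.get(ocid, ocid))  # fall back to the OCID if unnamed
        ocid = parents.get(ocid)
    return "/".join(reversed(parts))


path = compartment_path(
    "ocid1.compartment.oc1..dev", compartment_kv, compartment_parent_ocid_kv
)
print(path)
```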


## Mark Start of Execution
Print the start time, to be compared with the end time once the entire tenancy tree has been populated

In [7]:
pprint.pprint(
    "Start Time : {}".format(time.strftime("%b %d %Y %H:%M:%S", time.localtime()))
)

'Start Time : Apr 15 2020 06:10:50'


## Search and Populate
The Search Region and Populate function has three stages:
1. Fetch the search results for the search query provided
2. Use the search results to drill down into resource-specific information
3. Consolidate both and populate a JSON tree

In [8]:
executor = cf.ThreadPoolExecutor(max_workers=20)
returnFlag = [
    search_region_and_populate(
        executor,
        config,
        region_name,
        resourceString,
        conditionString,
        compartment_kv,
        compartment_parent_ocid_kv,
    )
    for region_name in region_names
]

## Asynchronous Parallelism
- `concurrent.futures` runs the independent (mutually exclusive) tasks on separate threads
- `asyncio.gather` concurrently issues asynchronous, non-blocking calls to multiple API endpoints

In [9]:
await asyncio.gather(*(returnFlag))

'Initializing Resource Specific Clients & Regions '
'Initialize Identity Client in Region : ap-mumbai-1'
'Initialize Search Client in Region : ap-mumbai-1'
'Initialize Compute Client in Region : ap-mumbai-1'
'Initialize DB Client in Region : ap-mumbai-1'
'Initialize Analytics Client in Region : ap-mumbai-1'
'Initialize Networking Client in Region : ap-mumbai-1'
'Initialize Data Science Client in Region : ap-mumbai-1'
'Initialize Block Storage Client in Region : ap-mumbai-1'
'Initialize Object Storage Client in Region : ap-mumbai-1'
'Initialize Notifications Client in Region : ap-mumbai-1'
'Initialize API-GW Client in Region : ap-mumbai-1'
'Initialize Streaming Client in Region : ap-mumbai-1'
'Initialize Functions Client in Region : ap-mumbai-1'
'Initialize Integration Client in Region : ap-mumbai-1'
'Initialize Vaults Client in Region : ap-mumbai-1'
'Initialize Oracle Digital Assistant Client in Region : ap-mumbai-1'
'Initialize Data Catalog Client in Region : ap-mumbai-1'
'Initialize 

[None, None, None, None]
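The source of `search_region_and_populate` is not shown here, so the following is only a sketch of one common way to combine a thread pool with `asyncio.gather`: each blocking call is handed to the executor through `loop.run_in_executor`, and the resulting awaitables are gathered. The function names and the `time.sleep` stand-in are hypothetical.

```python
import asyncio
import concurrent.futures as cf
import time


def blocking_search(region_name):
    """Stand-in for a blocking SDK call (hypothetical; sleeps instead of calling OCI)."""
    time.sleep(0.1)
    return "{}: done".format(region_name)


async def search_region(executor, region_name):
    """Run the blocking call on a worker thread without blocking the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_search, region_name)


async def search_all(region_names):
    with cf.ThreadPoolExecutor(max_workers=4) as executor:
        tasks = [search_region(executor, name) for name in region_names]
        # gather returns results in the same order as its arguments
        return await asyncio.gather(*tasks)


results = asyncio.run(search_all(["ap-mumbai-1", "us-phoenix-1"]))
print(results)
```

Inside a notebook an event loop is already running, so the last step would be `await search_all(...)` rather than `asyncio.run(...)`, as in the cell above.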

## Mark End of Execution
Print End Time for populating the entire tenancy tree

In [10]:
pprint.pprint(
    "End Time : {}".format(time.strftime("%b %d %Y %H:%M:%S", time.localtime()))
)

'End Time : Apr 15 2020 06:11:34'


## Create NetworkX Graph
Load JSON Objects and Make Network Graph

In [11]:
def extract_value_by_field(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    if v is not None:
                        arr.append(v)
                    else:
                        arr.append("None")
        elif isinstance(obj, list):
            for searchResult in obj:
                extract(searchResult, arr, key)
        elif isinstance(obj, type(None)):
            arr.append("None")

        return arr

    results = extract(obj, arr, key)
    return results

## Extract Resource Dict
* Every Resource is a JSON Dictionary
* Every Attribute is a Key Value Pair
* Extract Key and Value Pair to add as elements to Graph

In [12]:
def extract_resource_dict(
    dict_list, resourceType="Compartment", key="resource_ocid", value="display_name"
):
    resList = list(filter(lambda d: d["resource_type"] == resourceType, dict_list))
    value_list = list(map(itemgetter(value), resList))
    key_list = list(map(itemgetter(key), resList))
    kv_dict = dict(zip(key_list, value_list))
    return kv_dict
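To illustrate what such a lookup contains, here is a self-contained sketch with hypothetical records. It uses a dictionary comprehension that is equivalent in effect to the `filter`/`map`/`zip` version above; the function name and the sample OCIDs are made up for the example.

```python
# Hypothetical records shaped like the entries in the region JSON files.
records = [
    {
        "resource_type": "Subnet",
        "resource_ocid": "ocid1.subnet.oc1..a",
        "display_name": "web",
        "vcn_id": "ocid1.vcn.oc1..x",
    },
    {
        "resource_type": "Subnet",
        "resource_ocid": "ocid1.subnet.oc1..b",
        "display_name": "db",
        "vcn_id": "ocid1.vcn.oc1..x",
    },
    {
        "resource_type": "Vcn",
        "resource_ocid": "ocid1.vcn.oc1..x",
        "display_name": "prod-vcn",
    },
]


def kv_for_type(records, resource_type, key="resource_ocid", value="display_name"):
    """Map records[key] -> records[value] for one resource type."""
    return {r[key]: r[value] for r in records if r["resource_type"] == resource_type}


subnet_names = kv_for_type(records, "Subnet")
subnet_vcns = kv_for_type(records, "Subnet", value="vcn_id")
print(subnet_names)
print(subnet_vcns)
```

If two records of the same type share the same key, the later one overwrites the earlier one. This matters mostly for reverse lookups keyed on `display_name`, which is not guaranteed to be unique.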

In [14]:
def create_dict_list():
    resource_list = []
    for file_name in glob.glob("./region_distribution*"):
        with open(file_name) as f:
            resource_list.extend(json.load(f))
    return resource_list
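The loader can be tried without touching a real tenancy by writing a couple of small JSON files to a temporary directory. The file names and records below are hypothetical; the only assumption carried over from the notebook is that each region file holds a JSON list of dictionaries.

```python
import glob
import json
import os
import tempfile


def load_region_files(pattern):
    """Concatenate the JSON lists found in every file matching pattern."""
    merged = []
    for file_name in sorted(glob.glob(pattern)):  # sorted for a stable order
        with open(file_name) as f:
            merged.extend(json.load(f))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical per-region files, one JSON list of resources each.
    samples = {
        "region_distribution_ap-mumbai-1.json": [{"resource_ocid": "ocid1.vcn.oc1..a"}],
        "region_distribution_us-phoenix-1.json": [{"resource_ocid": "ocid1.vcn.oc1..b"}],
    }
    for name, payload in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            json.dump(payload, f)
    merged = load_region_files(os.path.join(tmp, "region_distribution*"))

print(len(merged))
```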

## Aggregate Resources from Multiple Regions
* Every region is a JSON File
* JSON File when parsed yields a list of dictionaries
* Every element in the JSON File is a Dict.

In [20]:
dict_list = create_dict_list()

# Compartment Dictionaries
compartment_dict = extract_resource_dict(
    dict_list, "Compartment", key="display_name", value="resource_ocid"
)

# Substitute with your tenancy name
compartment_dict.update({"apacinset01": tenancy_id})


# VNIC Dictionaries
vnic_dict = extract_resource_dict(
    dict_list, "Vnic", key="resource_ocid", value="display_name"
)
vnic_subnet_dict = extract_resource_dict(
    dict_list, "Vnic", key="resource_ocid", value="subnet_id"
)

# Subnet Dictionaries
subnet_dict = extract_resource_dict(dict_list, "Subnet")
subnet_vcn_dict = extract_resource_dict(
    dict_list, "Subnet", key="resource_ocid", value="vcn_id"
)
subnet_securitylist_dict = extract_resource_dict(
    dict_list, "Subnet", key="resource_ocid", value="security_list_ids"
)
subnet_routetable_dict = extract_resource_dict(
    dict_list, "Subnet", key="resource_ocid", value="route_table_id"
)

# Route Table Dictionaries
routetable_dict = extract_resource_dict(
    dict_list, "RouteTable", key="resource_ocid", value="display_name"
)
routetable_vcn_dict = extract_resource_dict(
    dict_list, "RouteTable", key="resource_ocid", value="vcn_id"
)

# Security List Dictionaries
securitylist_dict = extract_resource_dict(dict_list, "SecurityList")
securitylist_vcn_dict = extract_resource_dict(
    dict_list, "SecurityList", key="resource_ocid", value="vcn_id"
)
vcn_dict = extract_resource_dict(dict_list, "Vcn")

# Compute Instance
instance_dict = extract_resource_dict(
    dict_list, "Instance", key="resource_ocid", value="display_name"
)
instance_vnic_dict = extract_resource_dict(
    dict_list, "Instance", key="resource_ocid", value="vnic_attachments"
)

# Boot Volume Dictionary
instance_bootvolume_dict = extract_resource_dict(
    dict_list, "Instance", key="resource_ocid", value="boot_volume_attachments"
)

instance_volume_dict = extract_resource_dict(
    dict_list, "Instance", key="resource_ocid", value="volume_attachments"
)

# Volume Dictionary
volume_dict = extract_resource_dict(
    dict_list, "Volume", key="resource_ocid", value="display_name"
)
bootvolume_dict = extract_resource_dict(
    dict_list, "BootVolume", key="resource_ocid", value="display_name"
)

# User Group and Group Membership Dictionaries
user_dict = extract_resource_dict(
    dict_list, "User", key="resource_ocid", value="display_name"
)

reverse_user_dict = extract_resource_dict(
    dict_list, "User", key="display_name", value="resource_ocid"
)

group_dict = extract_resource_dict(
    dict_list, "Group", key="resource_ocid", value="display_name"
)

groupmembership_dict = extract_resource_dict(
    dict_list, "User", key="resource_ocid", value="group_memberships"
)


# Volume Backup Dictionaries
volumebackup_dict = extract_resource_dict(
    dict_list, "VolumeBackup", key="resource_ocid", value="display_name"
)
volumebackup_volume_dict = extract_resource_dict(
    dict_list, "VolumeBackup", key="resource_ocid", value="volume_id"
)

# DB System Dictionaries
dbsystem_dict = extract_resource_dict(
    dict_list, "DbSystem", key="resource_ocid", value="display_name"
)
dbsystem_subnet_dict = extract_resource_dict(
    dict_list, "DbSystem", key="resource_ocid", value="subnet_id"
)


{'beta-user': 'ocid1.user.oc1..aaaaaaaacq4ibaheldx6x2r6bxdvfpayjwhpbvhwp3cctg7kotuzw5hdlbka',
 'karl.miller@oracle.com': 'ocid1.user.oc1..aaaaaaaat7rzc3s5zer5bvii2ajwctd2qad5rju3zgu5rbitugxmwjrugbqa',
 'oracleidentitycloudservice/Jhan.han@oracle.com': 'ocid1.user.oc1..aaaaaaaagw3pm4ljr7pnh4hkmnizh5xx5sxoyrlax6gjzcmgbkno5ybxjzpa',
 'oracleidentitycloudservice/amey.marathe@oracle.com': 'ocid1.user.oc1..aaaaaaaastpv23muod3ech2cm3bd3b7j42ksl3gqinkxa5ayqkc7lu7xdd3a',
 'oracleidentitycloudservice/amit.r.jha@oracle.com': 'ocid1.user.oc1..aaaaaaaasdk4gwfkpllrkodeb7a33o4zmdkdcbx3lp73hd7gxyvcv23mcbra',
 'oracleidentitycloudservice/amit.shivpuri@oracle.com': 'ocid1.user.oc1..aaaaaaaaolrpmo6tlpl7zm47seonbinsf65tywaukksvz2o4c5c5cnzl2ijq',
 'oracleidentitycloudservice/anuj.m.mittal@oracle.com': 'ocid1.user.oc1..aaaaaaaalgmj4ffbd3w3zezvf3rttuwasdg3ctzrojz5vr42acx3a5zegmzq',
 'oracleidentitycloudservice/anurag.mittal@oracle.com': 'ocid1.user.oc1..aaaaaaaa55duv2ceov7fpmdx43owbpvp3ms3dclgnoposuevrlnp5pw

## Populate Graph
* Every Resource is a Node
* Every Resource -> Resource Relationship is an Edge

In [21]:
g = nx.DiGraph()

for d in dict_list:
    flattened_dict = flatten(d, reducer="underscore")
    g.add_node(
        node_for_adding=d["resource_ocid"],
        DisplayName=flattened_dict["display_name"],
        ResourceType=flattened_dict["resource_type"],
        CompartmentName=flattened_dict["compartment_name"],
        CreatedOn=flattened_dict["CreatedOn"],
        CreatedBy=flattened_dict["CreatedBy"],
    )

for d in dict_list:
    g.add_edge(
        compartment_dict[d["compartment_name"]],
        d["resource_ocid"],
        CompartmentName=d["compartment_name"],
    )
    # Edge from the creating user to the resource (skipped for 'scim-service')
    if d["CreatedBy"] != "scim-service":
        g.add_edge(
            reverse_user_dict[d["CreatedBy"]],
            d["resource_ocid"],
            CreatedBy=d["CreatedBy"],
        )

for d in dict_list:
    if d["resource_type"] == "Instance":
        # For each VNIC attached to Instance
        for element in instance_vnic_dict[d["resource_ocid"]]:
            g.add_edge(
                d["resource_ocid"],
                element["vnic_id"],
                VnicName=vnic_dict[element["vnic_id"]],
            )

        # Boot Volume attached to Instance
        for element in instance_bootvolume_dict[d["resource_ocid"]]:
            g.add_edge(
                d["resource_ocid"],
                element["boot_volume_id"],
                BootVolumeName=bootvolume_dict[element["boot_volume_id"]],
            )

        # Block Volume attached to Instance
        for element in instance_volume_dict[d["resource_ocid"]]:
            g.add_edge(
                d["resource_ocid"],
                element["volume_id"],
                VolumeName=volume_dict[element["volume_id"]],
            )

    elif d["resource_type"] == "Vnic":
        # Subnet that the VNIC belongs to
        g.add_edge(
            vnic_subnet_dict[d["resource_ocid"]],
            d["resource_ocid"],
            SubnetName=subnet_dict[vnic_subnet_dict[d["resource_ocid"]]],
        )

    elif d["resource_type"] == "Subnet":
        # VCN that the Subnet belongs to
        g.add_edge(
            subnet_vcn_dict[d["resource_ocid"]],
            d["resource_ocid"],
            VcnName=vcn_dict[subnet_vcn_dict[d["resource_ocid"]]],
        )

        # Route table attached to Subnet
        g.add_edge(
            subnet_routetable_dict[d["resource_ocid"]],
            d["resource_ocid"],
            RouteTableName=routetable_dict[subnet_routetable_dict[d["resource_ocid"]]],
        )

        # For each Security List attached to Subnet
        for element in subnet_securitylist_dict[d["resource_ocid"]]:
            g.add_edge(
                element,
                d["resource_ocid"],
                SecurityListName=securitylist_dict[element],
            )

    # For each Security List that belongs to a VCN
    elif d["resource_type"] == "SecurityList":
        g.add_edge(
            securitylist_vcn_dict[d["resource_ocid"]],
            d["resource_ocid"],
            VcnName=vcn_dict[securitylist_vcn_dict[d["resource_ocid"]]],
        )

    # For each Route Table that belongs to a VCN
    elif d["resource_type"] == "RouteTable":
        g.add_edge(
            routetable_vcn_dict[d["resource_ocid"]],
            d["resource_ocid"],
            VcnName=vcn_dict[routetable_vcn_dict[d["resource_ocid"]]],
        )

    # Volume that the Volume Backup was taken from
    elif d["resource_type"] == "VolumeBackup":
        g.add_edge(
            volumebackup_volume_dict[d["resource_ocid"]],
            d["resource_ocid"],
            VolumeName=volume_dict[volumebackup_volume_dict[d["resource_ocid"]]],
        )

    # Subnet that the DB System belongs to
    elif d["resource_type"] == "DbSystem":
        g.add_edge(
            dbsystem_subnet_dict[d["resource_ocid"]],
            d["resource_ocid"],
            SubnetName=subnet_dict[dbsystem_subnet_dict[d["resource_ocid"]]],
        )

### Print Graph Information

In [22]:
print(nx.info(g))
density = nx.density(g)
print("Network density:", density)

Name: 
Type: DiGraph
Number of nodes: 1512
Number of edges: 3123
Average in degree:   2.0655
Average out degree:   2.0655
Network density: 0.001366959755444203
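The reported figures follow directly from the node and edge counts. For a directed graph with n nodes and m edges, the average in-degree and out-degree are both m / n, and the density is m / (n (n - 1)), the fraction of all possible directed edges that are present. A quick check using the counts printed above:

```python
n_nodes = 1512
n_edges = 3123

# Every directed edge contributes one in-degree and one out-degree.
avg_degree = n_edges / n_nodes

# A directed graph without self-loops has at most n * (n - 1) edges.
density = n_edges / (n_nodes * (n_nodes - 1))

print(round(avg_degree, 4), density)
```

A density this low is expected for an infrastructure graph: most resources connect only to their compartment, their creator, and a handful of related resources.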


## Export Network Graph to GraphML

In [23]:
nx.write_graphml(g, 'g.graphml')

## Use Network Analysis to Find Neighbors

In [None]:
n = g.neighbors(
    "ocid1.vcn.oc1.phx.amaaaaaa43cggciahbua26k7q5ly5arbjxfnibbapudkaiyn5h6spyhofs6q"
)
for element in n:
    print(list(g.neighbors(element)))
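On a `DiGraph`, `neighbors` returns successors only, that is, the nodes reached by outgoing edges. Starting from a VCN, the first hop therefore yields the subnets, security lists, and route tables that the graph-building code points the VCN at, and the second hop yields whatever those point to in turn. The plain-Python sketch below mimics that two-hop lookup on a hypothetical adjacency dictionary with made-up node names. It illustrates the traversal and does not replace the NetworkX call.

```python
# Hypothetical adjacency: each key maps to the nodes its outgoing edges point to.
successors = {
    "vcn-1": ["subnet-a", "seclist-a"],
    "subnet-a": ["vnic-1", "dbsystem-1"],
    "seclist-a": ["subnet-a"],
    "vnic-1": [],
    "dbsystem-1": [],
}


def two_hop(adjacency, start):
    """Return {first-hop node: [its successors]} for every successor of start."""
    return {
        node: list(adjacency.get(node, []))
        for node in adjacency.get(start, [])
    }


hops = two_hop(successors, "vcn-1")
for node, reached in hops.items():
    print(node, "->", reached)
```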

# For Non-Graph Analytics

## Data Load into Data Frame
- Load JSON Data and Append it to Data Frame 

In [None]:
# DataFrame.append was removed in pandas 2.0; collect the frames and concatenate once
frames = [pd.read_json(file_name) for file_name in glob.glob("./region_distribution*")]
resource_dist_df = (
    pd.concat(frames, ignore_index=True, sort=False) if frames else pd.DataFrame()
)
pprint.pprint(resource_dist_df.head())

## Sample Data Manipulation to Calculate Number of Active Days 
- An important goal in our tenancy was to limit the number of days resources stay active and to turn off resources when they are not required.
- Hence this calculation of how many days each resource has been active

In [None]:
resource_dist_df["Qty"] = 1
resource_dist_df["TimeNow"] = datetime.datetime.now(datetime.timezone.utc)
resource_dist_df["CreatedOn"] = pd.to_datetime(resource_dist_df["CreatedOn"])
resource_dist_df["ActiveForDays"] = resource_dist_df["TimeNow"].sub(
    resource_dist_df["CreatedOn"], axis=0
)
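For a single resource, the same arithmetic can be checked with the standard library alone. The timestamp format below is an assumption (an ISO 8601 string with a UTC offset), and the fixed `now` value is chosen only to make the example reproducible.

```python
import datetime


def active_days(created_on, now):
    """Whole days elapsed between an ISO 8601 creation time and now."""
    created = datetime.datetime.fromisoformat(created_on)
    return (now - created).days


# Fixed reference time so the result does not depend on when this runs.
now = datetime.datetime(2020, 4, 15, 6, 11, 34, tzinfo=datetime.timezone.utc)
days = active_days("2020-04-01T06:10:50+00:00", now)
print(days)
```

Both values must be timezone-aware, as they are in the cell above, because Python refuses to subtract a naive datetime from an aware one.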

## Further Manipulation
* Convert the Results to String for Easy Visualization

In [None]:
resource_dist_df["TimeNow"] = resource_dist_df["TimeNow"].astype(str)
resource_dist_df["CreatedOn"] = resource_dist_df["CreatedOn"].astype(str)
resource_dist_df["ActiveForDays"] = resource_dist_df["ActiveForDays"].astype(str)

## Export Data to 
* JSON
* CSV

In [None]:
# to_csv returns None when given a path, so there is nothing to assign
resource_dist_df.to_csv("processed_data.csv", sep=",", header=True, index=True)
out = resource_dist_df.to_json(orient="index")
with open("processed_json.json", "w") as f:
    f.write(out)