<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Retrieve-inspection-jobs-from-Ceph" data-toc-modified-id="Retrieve-inspection-jobs-from-Ceph-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Retrieve inspection jobs from Ceph</a></span></li><li><span><a href="#Describe-the-structure-of-an-inspection-job-result" data-toc-modified-id="Describe-the-structure-of-an-inspection-job-result-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Describe the structure of an inspection job result</a></span></li><li><span><a href="#Mapping-InspectionRun-JSON-to-pandas-DataFrame" data-toc-modified-id="Mapping-InspectionRun-JSON-to-pandas-DataFrame-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Mapping InspectionRun JSON to pandas DataFrame</a></span></li><li><span><a href="#Plot-InspectionRun" data-toc-modified-id="Plot-InspectionRun-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Plot InspectionRun</a></span></li><li><span><a href="#Wrapper" data-toc-modified-id="Wrapper-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Wrapper</a></span></li></ul></div>

# Amun InspectionRun Analysis

**Introduction**

The goal of this notebook is to...

---

## Retrieve inspection jobs from Ceph

In [2]:
%env THOTH_DEPLOYMENT_NAME     thoth-core-upshift-stage
%env THOTH_CEPH_BUCKET         thoth
%env THOTH_CEPH_BUCKET_PREFIX  data/thoth
%env THOTH_S3_ENDPOINT_URL     https://s3.upshift.redhat.com/

env: THOTH_DEPLOYMENT_NAME=thoth-core-upshift-stage
env: THOTH_CEPH_BUCKET=thoth
env: THOTH_CEPH_BUCKET_PREFIX=data/thoth
env: THOTH_S3_ENDPOINT_URL=https://s3.upshift.redhat.com/


In [3]:
from thoth.storages import InspectionResultsStore

inspection_store = InspectionResultsStore()
inspection_store.connect()

In [318]:
doc_id, doc = next(inspection_store.iterate_results())  # sample

# The build log is not needed for this analysis and is expensive to display, so drop it
doc['build_log'] = None
doc

{'build_log': None,
 'created': '2019-05-10T12:25:54.606897',
 'inspection_id': 'inspection-004187facf477500',
 'job_log': {'exit_code': 0,
  'hwinfo': {'cpu': {'has_3dnow': False,
    'has_3dnowext': False,
    'has_Altivec': None,
    'has_f00f_bug': None,
    'has_fdiv_bug': None,
    'has_mmx': True,
    'has_sse': True,
    'has_sse2': True,
    'has_sse3': True,
    'has_ssse3': True,
    'is_32bit': None,
    'is_64bit': None,
    'is_AMD': False,
    'is_AMD64': False,
    'is_Alpha': None,
    'is_Athlon64': False,
    'is_AthlonHX': False,
    'is_AthlonK6': False,
    'is_AthlonK6_2': False,
    'is_AthlonK6_3': False,
    'is_AthlonK7': False,
    'is_AthlonMP': False,
    'is_Celeron': False,
    'is_Core2': None,
    'is_EV4': None,
    'is_EV5': None,
    'is_EV56': None,
    'is_Hammer': False,
    'is_Intel': True,
    'is_Itanium': None,
    'is_Nocona': False,
    'is_Opteron': False,
    'is_PCA56': None,
    'is_Pentium': False,
    'is_PentiumII': False,
    'is_P
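
The cell above looks at a single document. To inspect a handful of results at once, the same iterator can be sliced with `itertools.islice`. The sketch below is only an illustration: `take_results` and `fake_results` are hypothetical helpers, and `fake_results` stands in for `inspection_store.iterate_results()` on the assumption that it yields `(doc_id, doc)` pairs as in the sample above.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Result = Tuple[str, Dict[str, Any]]


def take_results(results: Iterable[Result], n: int) -> List[Result]:
    """Return the first `n` (doc_id, doc) pairs with the bulky build log removed."""
    sample = []
    for doc_id, doc in islice(results, n):
        doc = dict(doc)          # shallow copy so the source document is untouched
        doc["build_log"] = None  # same clean-up as for the single sample
        sample.append((doc_id, doc))
    return sample


def fake_results() -> Iterator[Result]:
    """Hypothetical stand-in for `inspection_store.iterate_results()`."""
    for i in range(5):
        doc_id = f"inspection-{i}"
        yield doc_id, {"build_log": "very long log", "inspection_id": doc_id}


sample = take_results(fake_results(), 3)
print([doc_id for doc_id, _ in sample])
```

With a connected store, `take_results(inspection_store.iterate_results(), 3)` would be the equivalent call, provided the iterator behaves as assumed here.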

---

## Describe the structure of an inspection job result

In [5]:
import pandas as pd
pd.set_option('display.max_colwidth', 800)  # show long values (hashes, logs) without truncation

In [310]:
def extract_structure_json(input_json, upper_key: str, level: int, json_structure):
    """Convert a JSON document into a flat list of [depth, parent path, key, value] rows"""
    level += 1
    for key, value in input_json.items():
        if isinstance(value, dict):
            # For a nested dict, record its child keys and then descend into it
            json_structure.append([level, upper_key, key, list(value.keys())])
            extract_structure_json(value, f"{upper_key}__{key}", level, json_structure)
        else:
            # Leaf: record the value itself
            json_structure.append([level, upper_key, key, value])
    return json_structure

def filter_dataframe(json_pandas, filter_df):
    """Filter the dataframe by key or combination of keys (str) or by tree depth (int)"""
    if isinstance(filter_df, str):
        available_keys = set(json_pandas["Current_key"].values)
        available_combined_keys = set(json_pandas["Upper_keys"].values)
        if filter_df in available_keys or filter_df in available_combined_keys:
            # Keys are literal strings, so match them without regex semantics
            ndf = json_pandas[json_pandas["Upper_keys"].str.contains(filter_df, regex=False)]
        else:
            print("The key is not in the json")
            ndf = "".join([f"The available keys are (WARNING: Some of the keys have no leaves): {available_keys} ", f"The available combined keys are: {available_combined_keys}"])

    elif isinstance(filter_df, int):
        max_depth = json_pandas["Tree_depth"].max()
        if filter_df <= max_depth:
            ndf = json_pandas[json_pandas["Tree_depth"] == filter_df]
        else:
            ndf = f"The maximum tree depth available is: {max_depth}"
    else:
        # Previously this case failed with an unbound `ndf`
        raise TypeError("filter_df must be a key (str) or a tree depth (int)")
    return ndf

In [319]:
# Create the DataFrame that describes the document structure
df = pd.DataFrame(extract_structure_json(doc, "", 0, []))
df.columns = ["Tree_depth", "Upper_keys", "Current_key", "Value"]

We can inspect the structure of an inspection job result in three ways: by tree depth, by a single key, or by a combination of keys.

In [320]:
filter_dataframe(df, 1)

Unnamed: 0,Tree_depth,Upper_keys,Current_key,Value
0,1,,build_log,
1,1,,created,2019-05-10T12:25:54.606897
2,1,,inspection_id,inspection-004187facf477500
3,1,,job_log,"[exit_code, hwinfo, script_sha256, stderr, stdout]"
83,1,,specification,"[base, build, files, packages, python, run, script]"
165,1,,status,"[build, job]"


In [317]:
filter_dataframe(df, 2)

Unnamed: 0,Tree_depth,Upper_keys,Current_key,Value
4,2,__job_log,exit_code,0
5,2,__job_log,hwinfo,"[cpu, platform]"
71,2,__job_log,script_sha256,8000affa84b9cd3eb1041046e0c45f4984b2b582cfacf404ebca142a958db488
72,2,__job_log,stderr,"DTYPE set to float32\nDEVICE set to cpu\nREPS set to 20000\nMATRIX size set to 512\n# Version: 1.9.0, path: ['/home/amun/.local/share/virtualenvs/amun-B5uv3Ni-/lib/python3.6/site-packages/tensorflow']\n512 x 512 matmul took: \t369851.1822 ms,\t 0.00 GFLOPS\nFailed to obtain AICoE specific build information for TensorFlow\nTraceback (most recent call last):\n File ""/home/amun/script"", line 43, in _get_aicoe_tensorflow_build_info\n with open(build_info_path, 'r') as build_info_file:\nFileNotFoundError: [Errno 2] No such file or directory: '/home/amun/.local/share/virtualenvs/amun-B5uv3Ni-/lib/python3.6/site-packages/tensorflow-1.9.0.dist-info/build_info.json'\n"
73,2,__job_log,stdout,"[@parameters, @result, tensorflow_buildinfo]"
84,2,__specification,base,fedora:29
85,2,__specification,build,[requests]
94,2,__specification,files,[]
95,2,__specification,packages,"[pipenv, which, python36]"
96,2,__specification,python,"[requirements, requirements_locked]"


In [280]:
filter_dataframe(df, 3)

Unnamed: 0,Tree_depth,Upper_keys,Current_key,Value
6,3,__job_log__hwinfo,cpu,"[has_3dnow, has_3dnowext, has_Altivec, has_f00f_bug, has_fdiv_bug, has_mmx, has_sse, has_sse2, has_sse3, has_ssse3, is_32bit, is_64bit, is_AMD, is_AMD64, is_Alpha, is_Athlon64, is_AthlonHX, is_AthlonK6, is_AthlonK6_2, is_AthlonK6_3, is_AthlonK7, is_AthlonMP, is_Celeron, is_Core2, is_EV4, is_EV5, is_EV56, is_Hammer, is_Intel, is_Itanium, is_Nocona, is_Opteron, is_PCA56, is_Pentium, is_PentiumII, is_PentiumIII, is_PentiumIV, is_PentiumM, is_PentiumMMX, is_PentiumPro, is_Power, is_Power7, is_Power8, is_Power9, is_Prescott, is_XEON, is_Xeon, is_i386, is_i486, is_i586, is_i686, is_singleCPU, nbits, ncpus, not_impl, try_call]"
63,3,__job_log__hwinfo,platform,"[architecture, machine, node, platform, processor, release, version]"
74,3,__job_log__stdout,@parameters,"[device, dtype, matrix_size, reps]"
79,3,__job_log__stdout,@result,"[elapsed, rate]"
82,3,__job_log__stdout,tensorflow_buildinfo,
86,3,__specification__build,requests,"[cpu, hardware, memory]"
97,3,__specification__python,requirements,"[dev-packages, packages, requires, source]"
104,3,__specification__python,requirements_locked,"[_meta, default, develop]"
156,3,__specification__run,requests,"[cpu, hardware, memory]"
167,3,__status__build,container,56031b71f362a938ee9858eecad0d83fe01d72c4c116ad44244900ee7bc77153


In [281]:
filter_dataframe(df, 4)

Unnamed: 0,Tree_depth,Upper_keys,Current_key,Value
7,4,__job_log__hwinfo__cpu,has_3dnow,False
8,4,__job_log__hwinfo__cpu,has_3dnowext,False
9,4,__job_log__hwinfo__cpu,has_Altivec,
10,4,__job_log__hwinfo__cpu,has_f00f_bug,
11,4,__job_log__hwinfo__cpu,has_fdiv_bug,
12,4,__job_log__hwinfo__cpu,has_mmx,True
13,4,__job_log__hwinfo__cpu,has_sse,True
14,4,__job_log__hwinfo__cpu,has_sse2,True
15,4,__job_log__hwinfo__cpu,has_sse3,True
16,4,__job_log__hwinfo__cpu,has_ssse3,True


In [282]:
filter_dataframe(df, 5)

Unnamed: 0,Tree_depth,Upper_keys,Current_key,Value
89,5,__specification__build__requests__hardware,cpu_family,6
90,5,__specification__build__requests__hardware,cpu_model,94
91,5,__specification__build__requests__hardware,physical_cpus,32
92,5,__specification__build__requests__hardware,processor,"Intel Core Processor (Skylake, IBRS)"
100,5,__specification__python__requirements__packages,tensorflow,==1.9.0
102,5,__specification__python__requirements__requires,python_version,3.6
106,5,__specification__python__requirements_locked___meta,hash,[sha256]
108,5,__specification__python__requirements_locked___meta,pipfile-spec,6
109,5,__specification__python__requirements_locked___meta,requires,[python_version]
111,5,__specification__python__requirements_locked___meta,sources,"[{'name': 'pypi', 'url': 'https://pypi.org/simple', 'verify_ssl': True}]"


In [283]:
filter_dataframe(df, 6)

Unnamed: 0,Tree_depth,Upper_keys,Current_key,Value
107,6,__specification__python__requirements_locked___meta__hash,sha256,b3ec4de69687847147aaaf3396ab634cf73c9e77c8fa220620d72c752331dec2
110,6,__specification__python__requirements_locked___meta__requires,python_version,3.6
114,6,__specification__python__requirements_locked__default__absl-py,hashes,[sha256:b943d1c567743ed0455878fcd60bc28ac9fae38d129d1ccfad58079da00b8951]
115,6,__specification__python__requirements_locked__default__absl-py,version,==0.7.1
117,6,__specification__python__requirements_locked__default__astor,hashes,"[sha256:95c30d87a6c2cf89aa628b87398466840f0ad8652f88eb173125a6df8533fb8d, sha256:fb503b9e2fdd05609fbf557b916b4a7824171203701660f0c55bbf5a7a68713e]"
118,6,__specification__python__requirements_locked__default__astor,version,==0.7.1
120,6,__specification__python__requirements_locked__default__gast,hashes,[sha256:fe939df4583692f0512161ec1c880e0a10e71e6a232da045ab8edd3756fbadf0]
121,6,__specification__python__requirements_locked__default__gast,version,==0.2.2
123,6,__specification__python__requirements_locked__default__grpcio,hashes,"[sha256:0442f7d0c527ceab6a76159937ae8109941eace90ec00cb1bd08fc4f3179e52e, sha256:051957d0f61f4dec90868a54ee969228409926a0a19fd8ed7b4a0e50388effee, sha256:0d262794b2339770d5378a5717f8ddbfb68e409974582f0503272b90b7cc79bd, sha256:142693dc8bd427c595d030f75bf8d01c843d9ccb659499e8507ad22da832e9cf, sha256:18d44515a3fd3a71442abb5a1c65fc1909d859c13cda50c974cbc69742a80cea, sha256:1d50674bdffa18ea6143e0df9a1b97cdeab583ce5dd1cabda3502ee75215065c, sha256:3945335a5b8332995415c5f03da1a5f6e36da6ede819a611e2cbb093cf752bdd, sha256:3a9603ff14070524f4c69634afad6b280b07ad9f8c2c346c4b2290306e1928ac, sha256:52861aac5c1dcf4c841eb555b257cfb56d0c840a286495078382f538d0a34d6a, sha256:53c512c7c8af9cb9e3e1cc5ce5e4a5fb2f2e7695e69219f90016bc602abe2f3b, sha256:57ea92c9b81015e5f2cc355e53f08a4e661b78a207857311c7b8c55137..."
124,6,__specification__python__requirements_locked__default__grpcio,version,==1.20.1
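
The outputs above only filter by tree depth. Filtering by a key or by a combination of keys works on the `Upper_keys` column in the same way. The following self-contained sketch shows both on a made-up document: `toy_doc` is invented for illustration and `flatten` is a compact stand-in for `extract_structure_json`, not the notebook's own code.

```python
import pandas as pd


def flatten(node: dict, upper_key: str = "", level: int = 0) -> list:
    """Compact stand-in for extract_structure_json: one row per key."""
    rows = []
    level += 1
    for key, value in node.items():
        if isinstance(value, dict):
            rows.append([level, upper_key, key, list(value)])
            rows.extend(flatten(value, f"{upper_key}__{key}", level))
        else:
            rows.append([level, upper_key, key, value])
    return rows


# Made-up document that only mimics the shape of an inspection result.
toy_doc = {
    "inspection_id": "inspection-example",
    "job_log": {
        "exit_code": 0,
        "stdout": {"@result": {"elapsed": 1.5, "rate": 2.0}},
    },
}

toy_df = pd.DataFrame(
    flatten(toy_doc),
    columns=["Tree_depth", "Upper_keys", "Current_key", "Value"],
)

# Single key: every row whose parent path mentions "stdout".
by_key = toy_df[toy_df["Upper_keys"].str.contains("stdout", regex=False)]

# Combination of keys: rows below job_log -> stdout -> @result.
by_path = toy_df[
    toy_df["Upper_keys"].str.contains("__job_log__stdout__@result", regex=False)
]

print(by_key[["Upper_keys", "Current_key"]])
print(by_path[["Current_key", "Value"]])
```

On the real data, the corresponding calls would be along the lines of `filter_dataframe(df, "stdout")` and `filter_dataframe(df, "__job_log__stdout")`, using the combined-key format shown in the `Upper_keys` column.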


---

## Mapping InspectionRun JSON to pandas DataFrame

---

## Plot InspectionRun

---

## Wrapper