Directory tree of the project:

```
├── code
├── data
├── jmxs
├── model
├── pcaps
└── results
```

The working directory of the script is the code/ directory.

The code extracts the data used to instrument the model. The data was obtained from the system while it executed the application. We perform N readings in an observation time interval T.

    1. Read the data from the data/ directory:
    
    ```
    ├── 10-38
    │   ├── containers_pre.json
    │   ├── containers_post.json
    │   ├── energy.csv
    │   ├── system_pre.json
    │   ├── system_post.json
    │   └── requests.jtl
    ├── 11-38
    │   ├── containers_pre.json
    │   ├── containers_post.json
    │   ├── energy.csv
    │   ├── system_pre.json
    │   ├── system_post.json
    │   └── requests.jtl
    ```
       
    The name of each directory contains two numbers: the first is the index i of the reading, and the second is the number of customers served during the run.
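
The naming convention above can be turned into a small parser. The following is a minimal sketch, assuming every run directory is named `<reading>-<customers>`; the helper names `parse_run_dir` and `select_runs` and the example paths are hypothetical and not part of the notebook.

```python
from pathlib import Path


def parse_run_dir(path):
    """Split a run directory name such as '10-38' into (reading index, customers)."""
    reading, customers = Path(path).name.split("-")
    return int(reading), int(customers)


def select_runs(paths, customers):
    """Keep only the runs executed with exactly `customers` customers."""
    return [p for p in paths if parse_run_dir(p)[1] == customers]


# Hypothetical directory names, for illustration only
runs = ["../data/10-38", "../data/11-38", "../data/3-400", "../data/4-4000"]
print(parse_run_dir(runs[0]))   # (10, 38)
print(select_runs(runs, 400))   # ['../data/3-400']
```

Comparing the parsed number avoids confusing, say, 400 customers with 4000, which a substring search on the directory name would do.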

In [1]:
from pathlib import Path
import pandas as pd
import numpy as np
from itertools import chain
import csv, json, glob, os
import matplotlib.pyplot as plt

# Requests Tree

This section of the notebook reads the .csv file containing the source and the destination of each request.

In [2]:
CONVERSATIONS = pd.read_csv("experiment_configuration_data/conversations_names.csv")

### Energy Estimation 

In [3]:
DATA = "../validation_results" # directory containing the profile data
CUST = 400 # Number of customers for analysis

In [4]:
DIRS = list(map(lambda x: f"{DATA}/{x}", os.listdir(DATA)))
# keep only the directories of the runs with {CUST} customers;
# endswith() avoids also matching, e.g., "-4000" when CUST is 400
DIRS = list(filter(lambda x: x.endswith(f"-{CUST}"), DIRS))

In [5]:
def get_energy_single_run(run):
    return np.trapz(run['power'], run['time'])

def get_energy_over_trial(trials):
    return np.array(
        [np.trapz(x['power'], x['time']) for x in trials]
    )

def get_duration(trials):
    return np.array([x['time'].iloc[-1] for x in trials])

def get_e_value(trials):
    return get_energy_over_trial(trials)/get_duration(trials)

def print_stats(measurements):
    energy = get_energy_over_trial(measurements)
    e = get_e_value(measurements)    
    duration = get_duration(measurements)
    
    print(
            f"# Energy Per Visit(Joule/Visit):\n",
            f"## Mean:\t\t\t{energy.mean()}", 
            f"## Min-Max:\t\t\t[{energy.min()}, {energy.max()}]",
            f"## Var:\t\t\t\t{energy.var()}", 
            f"## Std:\t\t\t\t{energy.std()}", 
            '\n'
            f"# Average Response Time:\t{duration.mean()}",
            f"# e (Joule/s):\t\t\t{e.mean()}, [{e.min()}, {e.max()}]",
            sep='\n'
    )

In [6]:
POWERFILES = [f"{x}/energy.csv" for x in DIRS] # list of files having power values 

In [7]:
power_values = [
    pd.read_csv(x, names = ["time", "power"]) for x in POWERFILES
] # reading files as Dataframe

In [8]:
power_values = [x[x['power'] > 45] for x in power_values] # gets the portion where the CPU was active

In [9]:
duration = np.array([x['time'].iloc[-1] - x['time'].iloc[0] for x in power_values]) # duration of the run
# calculate the total energy consumed during the run
energy = np.array([x['power'].mean() * (x['time'].iloc[-1] - x['time'].iloc[0]) for x in power_values])

In [10]:
print(
    f"energy: {round(energy.mean(), 3)} J",
    f"duration: {round(duration.mean(), 3)} s",
    f"e: {round((energy/duration).mean(), 3)} (J/s)",
    sep='\n'
)

energy: 2624.166 J
duration: 31.51 s
e: 83.289 (J/s)
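
The cells above estimate energy in two ways: `get_energy_single_run` integrates power over time with the trapezoidal rule, while the cell that prints these results multiplies the mean power by the duration. The two estimates can differ, for instance when the samples are unevenly spaced. A minimal sketch in plain Python on made-up samples (the function names and the data are illustrative assumptions, not notebook code):

```python
def energy_trapezoid(times, powers):
    """Integrate power (W) over time (s) with the trapezoidal rule; result in joules."""
    return sum(
        (t1 - t0) * (p0 + p1) / 2
        for t0, t1, p0, p1 in zip(times, times[1:], powers, powers[1:])
    )


def energy_mean_power(times, powers):
    """Approximate energy as mean power times the total duration."""
    return (sum(powers) / len(powers)) * (times[-1] - times[0])


# Hypothetical, unevenly spaced power samples
times = [0.0, 1.0, 2.0, 4.0]       # seconds
powers = [50.0, 60.0, 80.0, 80.0]  # watts

print(energy_trapezoid(times, powers))   # 285.0
print(energy_mean_power(times, powers))  # 270.0
```

The mean-power estimate gives every sample the same weight, so the long interval at 80 W counts less than it should; the trapezoidal rule weights each sample by the time it covers.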


### Performance Values

In [12]:
from helpers.stats import SystemStats
from helpers.stats import ContainerStats

In [13]:
def get_system_utilization(DIRS):
    rows = []
    for x in DIRS:
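        # glob order is not guaranteed; sorted() makes it deterministic:
        # f1 = system_post.json, f2 = system_pre.json
        f1, f2 = sorted(glob.glob(f"{x}/system_*.json"))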
        data = SystemStats(f2, f1).data[0]
        rows.append(
            {'cpu': data['cpu0'], 'disk': data['disk'], 'io': data['io'], 'duration': data['duration']}
        )

    return pd.DataFrame.from_records(rows)

### Server Performance
This section describes the statistics of the whole machine, i.e., not just TTBS. For example, the utilization value also includes the load generated by the operating system services and not just TTBS.

In [14]:
system_stats     = get_system_utilization(DIRS)
system_stats

|   | cpu | disk | io | duration |
|---|---|---|---|---|
| 0 | 80.765 | 9.602 | 930 | 37.15973 |
| 1 | 81.531 | 22.5 | 2155 | 37.635378 |


In [15]:
mean_duration    = system_stats['duration'].mean()
mean_disk        = system_stats['disk'].mean()
mean_io          = system_stats['io'].mean()
sys_mean_cpu     = system_stats['cpu'].mean()
arrival_rate     = CUST/mean_duration

In [16]:
print(
    "Server Stats\n",
    f"mean_cpu: {sys_mean_cpu:}",
    f"mean_disk: {mean_disk:}",
    f"mean_duration: {mean_duration:}",
    f"mean_io: {mean_io:}",
    f"arrival_rate: {arrival_rate:}",
    sep='\n'
)

Server Stats

mean_cpu: 81.148
mean_disk: 16.051000000000002
mean_duration: 37.39755415916443
mean_io: 1542.5
arrival_rate: 10.695886642682442


### Throughput of Each Run

In this section, we use the JMeter output to check that the system is not congested. We do that by verifying that the arrival rate of the requests is close to the measured throughput.
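
The check described above can be sketched as follows. This is a minimal sketch, assuming each request is a `(timeStamp, elapsed)` pair in milliseconds, as in a JMeter JTL file, ordered by `timeStamp`; the 5% tolerance and the names `throughput_rps` and `is_congested` are assumptions chosen for illustration.

```python
def throughput_rps(samples):
    """Completed requests per second, from (timeStamp_ms, elapsed_ms) pairs."""
    start = samples[0][0]
    end = samples[-1][0] + samples[-1][1]  # completion time of the last request
    return len(samples) / ((end - start) / 1000)


def is_congested(arrival_rate, throughput, tolerance=0.05):
    """Flag congestion when throughput falls clearly below the arrival rate."""
    return throughput < arrival_rate * (1 - tolerance)


# Hypothetical samples: 4 requests completed within 2 seconds
samples = [(0, 100), (500, 100), (1000, 100), (1900, 100)]
x = throughput_rps(samples)
print(x)                     # 2.0
print(is_congested(2.0, x))  # False
```

If requests queue up faster than they are served, the throughput stays below the arrival rate and the check returns `True`.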

In [17]:
# Read the JTL file
def get_jtl_single_run(path):
    d = pd.read_csv(f"{path}/requests.jtl")
    return d[d['label'] == 'BuyTicket']

def get_jtl_over_trial(paths):
    # read each JTL file once and keep the runs with at most CUST requests
    runs = [get_jtl_single_run(x) for x in paths]
    return [r for r in runs if len(r) <= CUST]

def calculate_duration_from_jtl(jtl):
    last = jtl.iloc[-1]
    return ((last.loc['timeStamp'] + last.loc['elapsed']) - jtl.iloc[0].loc['timeStamp'])/1000

def calculate_throughput(jtl):
    duration = calculate_duration_from_jtl(jtl)
    
    """
    print(
        f"requests: {len(jtl)}",
        f"duration(s): {duration}"
        #f"duration(s): {(end-start)/1000}"
    )"""
    return len(jtl) / duration

jtls = get_jtl_over_trial(DIRS)

throughput = np.array([calculate_throughput(x) for x in jtls])
duration = np.array([calculate_duration_from_jtl(x) for x in jtls])

print(
    f"throughput {round(throughput.mean(), 3)} requests/second",
    f"duration {round(duration.mean(), 3)} seconds",
    f"rate {round(CUST/duration.mean(), 3)} requests/second",
    sep="\n"
)

### Containers Utilization

In [28]:
def get_containers_stats_single_run(path):
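    # glob order is not guaranteed; sorted() makes it deterministic:
    # f1 = containers_post.json, f2 = containers_pre.json
    f1, f2 = sorted(glob.glob(f"{path}/containers_*.json"))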
    data = ContainerStats(f2, f1).data
    return {k: [v['cpu'], v['disk'], v['io']] for k,v in data.items()}

def get_containers_stats_over_trial(DIRS):
    return pd.DataFrame.from_records([get_containers_stats_single_run(x) for x in DIRS])

In [29]:
containers_stats = get_containers_stats_over_trial(DIRS)

In [30]:
utilization = pd.DataFrame(columns=['container', 'cpu', 'disk', 'io'])

In [31]:
utilization = pd.DataFrame([{
    'container': k, 
    'cpu'      : np.array([v[0] for v in containers_stats[k]]).mean(),
    'disk'     : np.array([v[1] for v in containers_stats[k]]).mean(),
    'io'       : np.array([v[2] for v in containers_stats[k]]).mean()
} for k in containers_stats.keys()])

In [32]:
utilization.sort_values(by=['cpu'])

|   | container | cpu | disk | io |
|---|---|---|---|---|
| 15 | baseline-ts-consign-service-1 | 0.009882 | 0.0 | 0.0 |
| 6 | baseline-ts-user-service-1 | 0.010071 | 0.0 | 0.0 |
| 23 | baseline-ts-inside-payment-mongo-1 | 0.05719 | 0.0 | 0.0 |
| 0 | baseline-ts-consign-price-service-1 | 0.057417 | 0.0 | 1.0 |
| 25 | baseline-ts-payment-mongo-1 | 0.058039 | 0.0 | 0.0 |
| 33 | baseline-ts-assurance-mongo-1 | 0.05836 | 0.0 | 2.0 |
| 31 | baseline-ts-ticket-office-mongo-1 | 0.058856 | 0.0 | 2.0 |
| 27 | baseline-ts-consign-mongo-1 | 0.059036 | 0.0 | 1.0 |
| 26 | baseline-ts-price-mongo-1 | 0.097745 | 0.0 | 1.0 |
| 29 | baseline-ts-food-map-mongo-1 | 0.150222 | 0.0 | 2.0 |


In [33]:
print(f"Total Utilization of TTBS: {round(utilization['cpu'].sum(), 3)} %")

Total Utilization of TTBS: 71.357 %


In [34]:
# service_time = (utilization['cpu'].to_numpy() / 100) / arrival_rate
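
The expression above follows the utilization law: a container that is busy a fraction U of the time while customers arrive at rate λ spends U/λ seconds of CPU time per customer. A minimal sketch, using figures reported earlier in this notebook (400 customers, mean duration 37.39755415916443 s, CPU utilization 0.057417 % for `baseline-ts-consign-price-service-1`); the function name `service_time` is hypothetical.

```python
def service_time(cpu_percent, arrival_rate):
    """Utilization law: busy time per customer (s) = U / arrival rate."""
    return (cpu_percent / 100) / arrival_rate


# Values taken from the outputs of this notebook
arrival_rate = 400 / 37.39755415916443         # customers per second
demand = service_time(0.057417, arrival_rate)  # consign-price-service

print(round(arrival_rate, 3))  # 10.696
print(round(demand, 8))        # 5.368e-05
```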

In [35]:
# average busy time of each container per served customer (utilization / arrival rate)
print(f"container{' '*37}service_time \n")
for i, u in utilization.iterrows():
    print(f"{u['container']:<45} {round((u['cpu'] / 100) / arrival_rate, 8)}")

container                                     service_time 

baseline-ts-consign-price-service-1           5.368e-05
baseline-ts-price-service-1                   0.00044751
baseline-ts-contacts-service-1                0.00236661
baseline-ts-order-service-1                   0.00169109
baseline-ts-route-service-1                   0.00359234
baseline-ts-travel-service-1                  0.00693064
baseline-ts-user-service-1                    9.42e-06
baseline-ts-config-service-1                  0.000771
baseline-ts-ticketinfo-service-1              0.00393587
baseline-ts-order-other-service-1             0.00048129
baseline-ts-station-service-1                 0.0048126
baseline-ts-preserve-service-1                0.00202619
baseline-ts-assurance-service-1               0.00052901
baseline-ts-basic-service-1                   0.00474987
baseline-ts-security-service-1                0.0009168
baseline-ts-consign-service-1                 9.24e-06
baseline-ts-train-service-1         