# Purpose

This notebooks graphs the performance results of 5G core networks. The traffic for the 5G core networks is generated using a [5G core traffic generator](https://github.com/tariromukute/core-tg). The performance results are collected using a bcc and bpftrace tools.

In [1]:
# configure spark variables
from pyspark.context import SparkContext
from pyspark.sql.context import SQLContext
from pyspark.sql.session import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
    
sc = SparkContext()
sqlContext = SQLContext(sc)
spark = SparkSession(sc)

# load up other dependencies
import re
import pandas as pd

import glob
import matplotlib.pyplot as plt
import numpy as np

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/31 15:19:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


In [2]:
import os
if not os.path.exists("images"):
    os.mkdir("images")

import os
import glob
import plotly.express as px
from plotly.subplots import make_subplots
from pyspark.sql.types import StructType,StructField, StringType, IntegerType
from pyspark.sql.functions import expr
basePath = "../results"

In [3]:
html_output_file = '../open5gs.html'
with open(html_output_file, 'w') as f:
    f.write('<h1>Open GiLAN Testbed Results</h1>')
    f.write('<h3> The graphs summaries the NFV performance metrics<h3>')
    f.write('<h4> General Workload Chacterisation </h4>')
    f.write('<a href="#system-chaterisation"> Skip to results </a></h5>')
    f.write('<p> The system run different processes depending on the applications running. The operations of these applications and their respective processes is \
        execute through system calls. There are wide range of system calls that can be run by the OS. In general the frequent types of system calls can provide \
        a general chaterisationof the workload running on the OS. The workload charaterisation is a good starting point in understanding the system or applications \
        running the system. In addition to the frequent system calls, details on the processes making syscalls is helpful in understanding the system. \
        \
        The latency of the both the system calls and the processes making system calls is a starting point in understand the latency of the system as a whole. From these \
        results we can go further to look at the performance results of the different compute resources. The chaterisations helps in knowing the compute results to focus on \
        e.g., if there is a load or read syscalls we can focus on Filesystem and cache.</p>')
    f.write('<h4> CPU  <h4>')
    f.write('<h5><a href="#cpu-metrics"> Skip to results </a></h5>')
    f.write('<p> The CPU is responsible for executing all workloads on the NFV. Like other resources, the CPU is managed by the kernel. The user-level applications access CPU resources by sending system calls to the kernel. The kernel also receives other system call requests from different processes; memory loads and stores can issue page faults system calls. The primary consumers of CPU resources are threads (also called tasks), which belong to procedures, kernel routines and interrupt routes. The kernel manages the sharing via a CPU scheduler.</p>')
    f.write('<p> There are three thread states: ON-PROC for threads running on a CPU, RUNNABLE for threads that could run but are waiting their turn, and SLEEP for blocked lines on another event, including uninterruptible waits. These can be categorised into two for more accessible analysis, on-CPU referring to ON-PROC, and off-CPU referring to all other states, where the thread is not running on a CPU. Lines leave the CPU in one of two ways: (1) voluntary if they block on I/O, a lock, or asleep, or (2) involuntary if they have exceeded their scheduled allocation of CPU time. When a CPU switches from running one process or thread to another, it switches address spaces and other metadata. This process is called context switching; it also consumes the CPU resources. All these processes, described, in general, consume the CPU time. In addition to the time, another CPU resource used by the methods, kernel routines and interrupts routines is the CPU cache.</p>')
    f.write('<p> There are typically multiple levels of CPU cache, increasing in both size and latency. The caches end with the last-level store (LLC), large (Mbytes) and slower. On a processor with three levels of supplies, the LLC is also the Level 3 cache. Processes are instructions to be interpreted and run by the CPU. This set of instructions is typically loaded from RAM and cached into the CPU cache for faster access. The CPU first checks the lower cache, i.e., L1 cache. If the CPU finds the data, this is called a hit. If the CPU does not see the data, it looks for it in L2 and then L3. If the CPU does not find the data in any memory caches, it can access it from your system memory (RAM). When that happens, it is known as a cache miss. In general, a cache miss means high latency, i.e., the time needed to access data from memory. </p>')

    f.write('<h4> Memory <h4>')
    f.write('<h5><a href="#memory-metrics"> Skip to results </a> </h5>')
    f.write('<p> The kernel and processor are responsible for mapping the virtual memory to physical memory. For efficiency, memory mappings are created in groups of memory called <em>pages</em>. When an application starts, it begins with a request for memory allocation. In the case that there is no free memory on the heap, the syscall <em>brk()</em> is issued to extend the size of the bank. However, if there is free memory on the heap, a new memory segment is created via the <em>mmap()</em> syscall. Initially, this virtual memory mapping does not have a corresponding physical memory allocation. Therefore when the application tries to access this allocated memory segment, the error called <em>page fault</em> occurs on the MMU. The kernel then handles the page fault, mapping from the virtual to physical memory. The amount of physical memory allocated to a process is called resident set size (RSS). When there is too much memory demand on the system, the kernel page-out daemon (kswapd) may look for memory pages to free. Three types of pages can be released in their order: pages that we read but not modified (backed by disk) these can be immediately rid, pages that have been modified (dirty) these need to be written to disk before they can be freed and pages of application memory (anonymous) these must be stored on a swap device before they can be released. kswapd, a page-out daemon, runs periodically to scan for inactive and active pages with no memory to free. It is woken up when free memory crosses a low threshold and goes back to sleep when it crosses a high threshold. Swapping usually causes applications to run much more slowly.</p>')

    f.write('<h4>Filesytem <h4>')
    f.write('<h5><a href="#filesystem-metrics"> Skip to results </a> </h5>')
    f.write('<p> The file system that applications usually interact with directly and file systems can use caching, read-ahead, buffering, and asynchronous I/O to avoid exposing disk I/O latency to the application. Logical I/O describes requests to the file system. If these requests must be served from the storage devices, they become physical I/O. Not all I/O will; many logical read requests may be returned from the file system cache and never become physical I/O. File systems are accessed via a virtual file system (VFS). It provides operations for reading, writing, opening, closing, etc., which are mapped by file systems to their internal functions. Linux uses multiple caches to improve the performance of storage I/O via the file system. These are Page cache: This contains virtual memory pages and enhances the performance of file and directory I/O. Inode cache, which are data structures used by file systems to describe their stored objects. The directory cache caches mappings from directory entry names to VFS inodes, improving the performance of pathname lookups. The page cache grows to be the largest of all these because it caches the contents of files and includes “dirty” pages that have been modified but not yet written to disk.</p>')

    f.write('<h4>Disk I/O <h4>')
    f.write('<h5><a href="#disk-metrics"> Skip to results </a> </h5>')
    f.write('<p> Linux exposes rotational magnetic media, flash-based storage, and network storage as storage devices. Therefore, disk I/O refers to I/O operations on these devices. Disk I/O is a common source of performance issues because I/O latency on storage devices is orders of magnitude slower than the nanosecond or microsecond speed of CPU and memory operations. Block I/O refers to device access in blocks. I/O is queued and scheduled in the block layer. The wait time is spent in the block layer scheduler queues and device dispatcher queues from the operating system. Service time is the time from device issue to completion. This may include the time spent waiting in an on-device line. Request time is the overall time from when an I/O was inserted into the OS queues to its completion. The request time matters the most, as that is the time that applications must wait if I/O is synchronous.</p>')

    f.write('<h4>Networking<h4>')
    f.write('<h5><a href="#networking-metrics"> Skip to results </a> </h5>')
    f.write('<p> Networking is a complex part of the Linux system. It involves many different layers and protocols, including the application, protocol libraries, syscalls, TCP or UDP, IP, and device drivers for the network interface. In general, the Networking system can be broken down into four. The NIC and Device Driver Processing first reads packets from the NIC and puts them into kernel buffers. Besides the NIC and Device driver, this process includes the DMA and particular memory regions on the RAM for storing receive and transmit packets called rings and the NAPI system for poling packets from these rings to the kernel buffers. It also incorporates some early packet processing hooks like XDP and AF\_XDP and can have custom drivers that bypass the kernel (i.e., the following two processes) like DPDK. Following is the Socket processing. This part also includes queuing and different queuing disciplines. It also incorporates some packet processing hooks like TC, Netfilter etc., which can alter the flow of the networking stack. After that is the  Protocol processing layer, which applies functions for different IP and transport protocols, both these protocols run under the context of SoftIrq. Lastly is the application process. The application receives and sends packets on the destination socket</p>')
    
    f.write('<h4>Flame Graphs to analyse code paths<h4>')
    f.write('<h5><a href="#flame-graphs"> Skip to results </a> </h5>')
    f.write('<p> A flame graph visualizes a distributed request trace and represents each service call that occurred during the requests execution path with a timed, color-coded, horizontal bar. Flame graphs for distributed traces include error and latency data to help developers identify and fix bottlenecks in their applications..</p>')

In [4]:
# General chaterisation
import plotly; print(plotly.__version__)

5.15.0


23/08/31 15:19:29 WARN GarbageCollectionMetrics: To enable non-built-in garbage collector(s) List(G1 Concurrent GC), users should configure it(them) to spark.eventLog.gcMetrics.youngGenerationGarbageCollectors or spark.eventLog.gcMetrics.oldGenerationGarbageCollectors


In [16]:
import subprocess
import os

# Helper functions
def remove_noise_processes(df, field, values):
    a = df.loc[df[field].isin(values)].index.array.tolist()
    df.drop(a, inplace=True)
    return df

def pivot_dataframe_to_gnuplot_format(df, values, index='ues', columns='cn'):
    print(df.head())

    # Group the DataFrame by 'Country' and 'Year'
    grouped_data = df.groupby([index, columns]).sum()

    # Pivot the resulting grouped data
    pivoted_df = grouped_data.pivot_table(index=index, columns=columns, values=values).reset_index()

    return pivoted_df

def draw_gnuplot_linepoints(df, name, title, xlabel, ylabel):
    df.to_csv(f'gnuplot/{name}.csv', index=False)
    print(df.columns)
    # Write the Gnuplot script
    with open(f'gnuplot/{name}.gnu', 'w') as f:
        f.write('set style data linespoints\n')
        f.write('set term png\n')
        f.write(f"set output '{name}.png'\n")
        # f.write('set key outside left bottom horizontal spacing 1 width 2 height 1.5\n')
        f.write('set key top left\n')
        f.write('set key autotitle columnhead\n')
        f.write("set datafile separator ','\n")
        f.write(f'set title "{title}"\n')
        f.write('set grid xtics ytics mytics\n')
        f.write(f'set xlabel "{xlabel}"\n')
        f.write(f'set ylabel "{ylabel}"\n')
        f.write(' # Create theme \n \
        dpi = 600 ## dpi (variable) \n \
        width = 164.5 ## mm (variable) \n \
        height = 100 ## mm (variable) \n \
        \n \
        in2mm = 25.4 # mm (fixed) \n \
        pt2mm = 0.3528 # mm (fixed) \n \
        \n \
        mm2px = dpi/in2mm \n \
        ptscale = pt2mm*mm2px \n \
        round(x) = x - floor(x) < 0.5 ? floor(x) : ceil(x) \n \
        wpx = round(width * mm2px) \n \
        hpx = round(height * mm2px) \n \
        \n \
        set terminal pngcairo size wpx,hpx fontscale ptscale linewidth ptscale pointscale ptscale \n \
        \n \
        colors = "blue red green brown black magenta orange purple sienna1 slategray tan1 yellow turquoise orchid khaki" \n ')
        f.write(f'plot for [i=2:{len(df.columns)}] "{name}.csv" u 1:i t columnhead lc rgb word(colors, i-1)')

    # Run the Gnuplot script
    
    # Relative path of the desired working directory
    relative_dir_path = 'gnuplot'
    
    # Get the absolute path of the working directory
    curr_dir = os.getcwd()
    
    # Create the full path to the desired directory
    my_dir_path = os.path.join(curr_dir, relative_dir_path)
    
    subprocess.call(['gnuplot', '-p', f'{name}.gnu'],  cwd=my_dir_path)
    # Change current working directory to 'mydir'
    # os.chdir('./gnuplot')

    # # Execute command 'mycommand' in the new directory
    # os.system('gnuplot -p cn_perf_ue_avg_exp.gnu')

    # display the image on the screen
    from IPython.display import Image
    Image(filename=f'gnuplot/{name}.png')

labels = {
    "ues": "Number of UEs",
    "time (ms)": "Time (ms)",
    "syscall": "System calls",
    "count": "Number of calls",
    "avg": "Average time per syscall (ms)",
    "cn": "Core network"
}

noise_processes_excl_db = ['python3', 'systemd', 'snapd', 'sshd', 'sudo', 'multipathd', 'systemd-logind', 'systemd-timesyn', 'systemd-resolve', 'systemd-udevd', 'systemd-network', 'systemctl', 'accounts-daemon', 'dbus-daemon', '[unknown]']
noise_processes = noise_processes_excl_db + ['mongod', 'mysqld']

In [6]:
""" This shows how the usage syscalls change as the load changes. 
(a) The number of syscalls as the traffic load increases
(b) The time spent executing syscalls as the traffic increases
(c) The average time spent per syscall as the traffic increases
This can tell us:
1. How the core network is architectures to respond to increasing load
2. Comparing can tell us the core network that sends more time on syscalls. We can use that to corellate to the performance of the core network
3. We have the details on overal performance of the core networks, we can look at the results that correlate the performance
4. Is there a general trend to syscalls that can show well architected e.g., the latency should increase as load increase etc.
If there is an ideal trend or correlation, does it match the trend of the core networks and correlate to the performance we are seeing
"""

top_n = 5

syscount_df = spark.read.option("basePath", basePath).json(
f"{basePath}/cn=*/ues=*/tool=syscount")

df_syscount = syscount_df.toPandas().groupby(['cn', 'ues']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()

df_syscount['avg'] = (df_syscount['time (ms)'] / df_syscount['count'])

title='Syscalls across the system (by latency)'
syscount_fig = px.line(df_syscount, x="ues", y="time (ms)", color="cn", labels=labels,
                title=title, markers=True)
syscount_fig.show()
syscount_fig.write_image("./plotly/syscount_latency.jpeg")
gnuplot_df = pivot_dataframe_to_gnuplot_format(df_syscount, 'time (ms)')
draw_gnuplot_linepoints(gnuplot_df, name='syscount_latency', title=title,
                        xlabel='Number of UEs', ylabel=labels['time (ms)'])

title=f'System calls across the system (by number of calls)'
sysprocess_count_fig = px.line(df_syscount, x="ues", y="count", color="cn", labels=labels,
                hover_data=["count", "time (ms)"],
                title=title, markers=True)
sysprocess_count_fig.show()
syscount_fig.write_image("plotly/syscount_count.jpeg")
gnuplot_df = pivot_dataframe_to_gnuplot_format(df_syscount, 'count')
draw_gnuplot_linepoints(gnuplot_df, name='syscount_count', title=title,
                        xlabel='Number of UEs', ylabel=labels['count'])

title=f'System calls across the system (by average latency)'
sysprocess_count_fig = px.line(df_syscount, x="ues", y="avg", color="cn", labels=labels,
                hover_data=["count", "time (ms)"],
                title=title, markers=True)
sysprocess_count_fig.show()
syscount_fig.write_image("plotly/syscount_avg.jpeg")
gnuplot_df = pivot_dataframe_to_gnuplot_format(df_syscount, 'avg')
draw_gnuplot_linepoints(gnuplot_df, name='syscount_avg', title=title,
                        xlabel='Number of UEs', ylabel=labels['avg'])


        cn  ues    count     time (ms)         avg
0  free5gc    0    20399  3.294732e+06  161.514367
1  free5gc    5    48316  3.749627e+06   77.606314
2  free5gc   10    62475  3.719466e+06   59.535277
3  free5gc   50   718714  3.381769e+06    4.705305
4  free5gc  100  1357965  3.279550e+06    2.415048
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues    count     time (ms)         avg
0  free5gc    0    20399  3.294732e+06  161.514367
1  free5gc    5    48316  3.749627e+06   77.606314
2  free5gc   10    62475  3.719466e+06   59.535277
3  free5gc   50   718714  3.381769e+06    4.705305
4  free5gc  100  1357965  3.279550e+06    2.415048
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues    count     time (ms)         avg
0  free5gc    0    20399  3.294732e+06  161.514367
1  free5gc    5    48316  3.749627e+06   77.606314
2  free5gc   10    62475  3.719466e+06   59.535277
3  free5gc   50   718714  3.381769e+06    4.705305
4  free5gc  100  1357965  3.279550e+06    2.415048
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


In [7]:
""" A tabular view with ratios of the most sum of (latency per syscall, count per syscall and average latency of syscall). The tabular view will
1. Show us for each core network what is the ratio of a syscall precense over the other e.g., recvfrom has 4x more latency than sendto
2. Across core networks, we can compare the ratio of presence of a syscall e.g., free5gc invokes recvfrom 4x more than open5gs
3. For grouped syscalls, we can tell which flavor a given call network uses more e.g., for multiplexing syscalls, we can may see that free5gc uses
select more than epoll_wait and infer based on the relative performance of them
4. In addition to (3), for different core networks we can see that e.g., free5gc use select which is 4x more that epoll_wait being used by open5gs.
Tying this with the theory of the syscall we may be able to get the reasons for difference in performance

"""

' A tabular view with ratios of the most sum of (latency per syscall, count per syscall and average latency of syscall). The tabular view will\n1. Show us for each core network what is the ratio of a syscall precense over the other e.g., recvfrom has 4x more latency than sendto\n2. Across core networks, we can compare the ratio of presence of a syscall e.g., free5gc invokes recvfrom 4x more than open5gs\n3. For grouped syscalls, we can tell which flavor a given call network uses more e.g., for multiplexing syscalls, we can may see that free5gc uses\nselect more than epoll_wait and infer based on the relative performance of them\n4. In addition to (3), for different core networks we can see that e.g., free5gc use select which is 4x more that epoll_wait being used by open5gs.\nTying this with the theory of the syscall we may be able to get the reasons for difference in performance\n\n'

In [22]:
""" The top X active syscalls and process per core network
We can look at:
1. The composition of the core network, the system calls that run or maintain the system
2. It can tell us what the system spends most of it's time on
3. For these we can see if they syscalls follow the 'ideal trend' of responding to traffic load

"""
top_n = 6

def top_processes(df, field):
    label_maxes = df.groupby(['comm'])[field].sum().sort_values(ascending=False)

    # Select the top n labels with the highest y-values
    top_labels = label_maxes.head(top_n).index.tolist()

    return top_labels

sysprocess_df = spark.read.option("basePath", basePath).json(
f"{basePath}/cn=*/ues=*/tool=sysprocess")

df_sysprocess = sysprocess_df.toPandas()
df_process = remove_noise_processes(df_sysprocess, 'comm', noise_processes_excl_db)
df_process['avg'] = (df_process['time (ms)'] / df_process['count'])

top_labels = top_processes(df_process, 'time (ms)')

sunburst_fig = px.sunburst(df_process[df_process['comm'].isin(top_labels)], path=['cn', 'comm'], values='time (ms)',
            color='cn', hover_data=['time (ms)'],
            title=f"Processes making syscall (by latency)")
sunburst_fig.update_traces(textinfo="label+percent root")
sunburst_fig.show()
sunburst_fig.write_image(f"plotly/grouped_sysprocess_latency.jpeg")

top_labels = top_processes(df_process, 'count')

sunburst_fig = px.sunburst(df_process[df_process['comm'].isin(top_labels)], path=['cn', 'comm'], values='count',
            color='cn', hover_data=['time (ms)'],
            title=f"Processes making syscall (by number of calls)")
sunburst_fig.update_traces(textinfo="label+percent root")
sunburst_fig.show()
sunburst_fig.write_image(f"plotly/grouped_sysprocess_count.jpeg")

top_labels = top_processes(df_process, 'avg')

sunburst_fig = px.sunburst(df_process[df_process['comm'].isin(top_labels)], path=['cn', 'comm'], values='avg',
            color='cn', hover_data=['time (ms)'],
            title=f"Processes making syscall (by average latency)")
sunburst_fig.update_traces(textinfo="label+percent root")
sunburst_fig.show()
sunburst_fig.write_image(f"plotly/grouped_sysprocess_avg.jpeg")

# Create line graphs for the all process per core network
top_labels = top_processes(df_process, 'count')
df_cn_process = remove_noise_processes(df_sysprocess, 'comm', noise_processes)
df_cn_process = df_cn_process.groupby(['ues', 'cn']).agg({ 'count': 'sum', 'time (ms)': 'sum'}).reset_index()

title=f'Core networks: Top {top_n} active processes making syscall (by number of calls)'
sysprocess_fig = px.line(df_cn_process.sort_values('ues'),
        x="ues", y="count", color="cn",
        hover_data=["count", "time (ms)"],
        labels=labels,
        title=title,
        markers=True)
sysprocess_fig.show()
sysprocess_fig.write_image(f"plotly/core_network_sum_sysprocess_count.jpeg")

gnuplot_df = pivot_dataframe_to_gnuplot_format(df_cn_process, 'count', index='ues', columns='cn')
draw_gnuplot_linepoints(gnuplot_df, name=f'core_network_sum_sysprocess_count', title=title,
                xlabel='Number of UEs', ylabel=labels['count'])


# For the active processes remove databases
df_process = remove_noise_processes(df_sysprocess, 'comm', noise_processes)
df_process['avg'] = (df_process['time (ms)'] / df_process['count'])

grouped_data = df_process.groupby(['cn'])
for group_name, group_df in grouped_data:
     top_labels = top_processes(group_df, 'time (ms)')

     title=f'Top {top_n} active processes making syscall {group_name[0]} (by latency)'
     sysprocess_fig = px.line(group_df[group_df['comm'].isin(top_labels)].sort_values('ues'),
                x="ues", y="time (ms)", color="comm",
                hover_data=["count", "time (ms)"],
                labels=labels,
                title=title,
                markers=True)
     sysprocess_fig.show()
     sysprocess_fig.write_image(f"plotly/{group_name[0]}_sysprocess_latency.jpeg")

     gnuplot_df = pivot_dataframe_to_gnuplot_format(group_df[group_df['comm'].isin(top_labels)], 'count', index='ues', columns='comm')
     draw_gnuplot_linepoints(gnuplot_df, name=f'{group_name[0]}_sysprocess_latency', title=title,
                        xlabel='Number of UEs', ylabel=labels['time (ms)'])

     top_labels = top_processes(group_df, 'count')

     title=f'Top {top_n} active processes making syscall {group_name[0]} (by number of calls)'
     sysprocess_fig = px.line(group_df[group_df['comm'].isin(top_labels)].sort_values('ues'),
                x="ues", y="count", color="comm",
                hover_data=["count", "time (ms)"],
                labels=labels,
                title=title,
                markers=True)
     sysprocess_fig.show()
     sysprocess_fig.write_image(f"plotly/{group_name[0]}_sysprocess_count.jpeg")

     gnuplot_df = pivot_dataframe_to_gnuplot_format(group_df[group_df['comm'].isin(top_labels)], 'count', index='ues', columns='comm')
     draw_gnuplot_linepoints(gnuplot_df, name=f'{group_name[0]}_sysprocess_count', title=title,
                        xlabel='Number of UEs', ylabel=labels['count'])
     
     top_labels = top_processes(group_df, 'avg')

     title = f'Top {top_n} active processes making syscall {group_name[0]} (by average latency)'
     sysprocess_fig = px.line(group_df[group_df['comm'].isin(top_labels)].sort_values('ues'),
                x="ues", y="avg", color="comm",
                hover_data=["count", "time (ms)"],
                labels=labels,
                title=title,
                markers=True)
     sysprocess_fig.show()
     sysprocess_fig.write_image(f"plotly/{group_name[0]}_sysprocess_avg.jpeg")

     gnuplot_df = pivot_dataframe_to_gnuplot_format(group_df[group_df['comm'].isin(top_labels)], 'count', index='ues', columns='comm')
     draw_gnuplot_linepoints(gnuplot_df, name=f'{group_name[0]}_sysprocess_avg', title=title,
                        xlabel='Number of UEs', ylabel=labels['avg'])

   ues       cn   count     time (ms)
0    0  free5gc    9990  1.291965e+06
1    0      oai  284715  9.058338e+05
2    0  open5gs   11252  3.279446e+06
3    5  free5gc   34966  1.618149e+06
4    5      oai  283691  7.738424e+05
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


         comm   count    pid      time      time (ms)       cn  ues  \
701       amf  237796  18598  06:45:00  314221.706944  free5gc   50   
702       udr  145296  18620  06:45:00  266389.573056  free5gc   50   
703       nrf  113822  18587  06:45:00  264975.763634  free5gc   50   
704       pcf   36126  18629  06:45:00  247216.221091  free5gc   50   
706  rsyslogd     103  35679  06:45:00  130125.558466  free5gc   50   

           tool          avg  
701  sysprocess     1.321392  
702  sysprocess     1.833427  
703  sysprocess     2.327984  
704  sysprocess     6.843166  
706  sysprocess  1263.354937  
Index(['ues', 'amf', 'nrf', 'pcf', 'rsyslogd', 'udm', 'udr'], dtype='object', name='comm')


    comm   count    pid      time      time (ms)       cn  ues        tool  \
701  amf  237796  18598  06:45:00  314221.706944  free5gc   50  sysprocess   
702  udr  145296  18620  06:45:00  266389.573056  free5gc   50  sysprocess   
703  nrf  113822  18587  06:45:00  264975.763634  free5gc   50  sysprocess   
704  pcf   36126  18629  06:45:00  247216.221091  free5gc   50  sysprocess   
720  udm  138513  18639  06:45:00   27506.717822  free5gc   50  sysprocess   

          avg  
701  1.321392  
702  1.833427  
703  2.327984  
704  6.843166  
720  0.198586  
Index(['ues', 'amf', 'ausf', 'nrf', 'pcf', 'udm', 'udr'], dtype='object', name='comm')


            comm  count    pid      time      time (ms)       cn  ues  \
706     rsyslogd    103  35679  06:45:00  130125.558466  free5gc   50   
707          upf    136  18565  06:45:00  120038.467130  free5gc   50   
708         nssf    147  18646  06:45:00  120032.155405  free5gc   50   
709          smf    147  18610  06:45:00  120031.613460  free5gc   50   
715  packagekitd     98  35184  06:45:00   65004.980545  free5gc   50   

           tool          avg  
706  sysprocess  1263.354937  
707  sysprocess   882.635788  
708  sysprocess   816.545275  
709  sysprocess   816.541588  
715  sysprocess   663.316128  
Index(['ues', 'irqbalance', 'nssf', 'packagekitd', 'rsyslogd', 'smf', 'upf'], dtype='object', name='comm')


         comm   count     pid      time      time (ms)   cn  ues        tool  \
354       smf    1912  165171  02:16:05  180110.716769  oai  300  sysprocess   
355  rsyslogd    1891     599  02:16:05  135026.586948  oai  300  sysprocess   
357       nrf   69717  165153  02:16:05  128153.315028  oai  300  sysprocess   
358      ausf  169938  165158  02:16:05   81379.119863  oai  300  sysprocess   
359       udm  253783  165159  02:16:05   81166.429051  oai  300  sysprocess   

           avg  
354  94.200166  
355  71.404858  
357   1.838193  
358   0.478875  
359   0.319826  
Index(['ues', 'ausf', 'nrf', 'rsyslogd', 'smf', 'udm', 'udr'], dtype='object', name='comm')


     comm   count     pid      time      time (ms)   cn  ues        tool  \
351   amf  115805  165173  02:16:05  336396.293646  oai  300  sysprocess   
357   nrf   69717  165153  02:16:05  128153.315028  oai  300  sysprocess   
358  ausf  169938  165158  02:16:05   81379.119863  oai  300  sysprocess   
359   udm  253783  165159  02:16:05   81166.429051  oai  300  sysprocess   
360   udr  130567  165183  02:16:05   80436.078891  oai  300  sysprocess   

          avg  
351  2.904851  
357  1.838193  
358  0.478875  
359  0.319826  
360  0.616052  
Index(['ues', 'amf', 'ausf', 'nrf', 'systemd-journal', 'udm', 'udr'], dtype='object', name='comm')


           comm  count     pid      time      time (ms)   cn  ues        tool  \
354         smf   1912  165171  02:16:05  180110.716769  oai  300  sysprocess   
355    rsyslogd   1891     599  02:16:05  135026.586948  oai  300  sysprocess   
372  irqbalance    138     593  02:16:05   60002.358082  oai  300  sysprocess   
373        cron     13     586  02:16:05   60000.264476  oai  300  sysprocess   
403         smf   1895  125437  00:45:16  180077.701719  oai  200  sysprocess   

             avg  
354    94.200166  
355    71.404858  
372   434.799696  
373  4615.404960  
403    95.027811  
Index(['ues', 'cron', 'fwupd', 'irqbalance', 'rsyslogd', 'smf', 'udisksd'], dtype='object', name='comm')


            comm  count    pid      time      time (ms)       cn  ues  \
1   open5gs-smfd   1807  51819  14:44:47  615397.340219  open5gs   50   
2   open5gs-hssd   1525  51820  14:44:47  545349.319852  open5gs   50   
3  open5gs-pcrfd   1525  51818  14:44:47  545348.547112  open5gs   50   
4   open5gs-mmed   1584  51817  14:44:47  545342.993446  open5gs   50   
7       rsyslogd   3350    691  14:44:47  134704.413547  open5gs   50   

         tool         avg  
1  sysprocess  340.563000  
2  sysprocess  357.606111  
3  sysprocess  357.605605  
4  sysprocess  344.282193  
7  sysprocess   40.210273  
Index(['ues', 'open5gs-hssd', 'open5gs-mmed', 'open5gs-pcrfd', 'open5gs-smfd',
       'rsyslogd', 'systemd-journal'],
      dtype='object', name='comm')


               comm  count    pid      time      time (ms)       cn  ues  \
7          rsyslogd   3350    691  14:44:47  134704.413547  open5gs   50   
10     open5gs-scpd  18797  51790  14:44:47   69449.917735  open5gs   50   
11     open5gs-amfd  17072  51781  14:44:47   69087.440791  open5gs   50   
13  systemd-journal   3255    328  14:44:47   68718.138245  open5gs   50   
23    open5gs-ausfd   5095  51788  14:44:47   60003.001870  open5gs   50   

          tool        avg  
7   sysprocess  40.210273  
10  sysprocess   3.694734  
11  sysprocess   4.046828  
13  sysprocess  21.111563  
23  sysprocess  11.776840  
Index(['ues', 'open5gs-amfd', 'open5gs-ausfd', 'open5gs-scpd', 'open5gs-udmd',
       'rsyslogd', 'systemd-journal'],
      dtype='object', name='comm')


             comm  count    pid      time     time (ms)       cn  ues  \
15   open5gs-upfd     63  51780  14:44:47  66028.017813  open5gs   50   
21  open5gs-nssfd     98  51811  14:44:47  60048.607396  open5gs   50   
24     irqbalance    100    687  14:44:47  59999.341955  open5gs   50   
27  open5gs-sgwud     42  51796  14:44:47  58562.425758  open5gs   50   
28  open5gs-sgwcd     35  51791  14:44:47  55051.215129  open5gs   50   

          tool          avg  
15  sysprocess  1048.063775  
21  sysprocess   612.740892  
24  sysprocess   599.993420  
27  sysprocess  1394.343470  
28  sysprocess  1572.891861  
Index(['ues', 'cron', 'irqbalance', 'open5gs-nssfd', 'open5gs-sgwcd',
       'open5gs-sgwud', 'open5gs-upfd'],
      dtype='object', name='comm')


In [9]:
df_cn_process = df_process.groupby(['ues', 'cn']).agg({ 'count': 'sum', 'time (ms)': 'sum'}).reset_index()
print(df_cn_process.head())

   ues       cn   count     time (ms)
0    0  free5gc    9990  1.291965e+06
1    0      oai  284715  9.058338e+05
2    0  open5gs   11252  3.279446e+06
3    5  free5gc   34966  1.618149e+06
4    5      oai  283691  7.738424e+05


In [9]:

top_n = 10

def top_syscalls(df, field):
    label_maxes = df.groupby(['syscall'])[field].sum().sort_values(ascending=False)

    # Select the top n labels with the highest y-values
    top_labels = label_maxes.head(top_n).index.tolist()

    return top_labels

syscount_df = spark.read.option("basePath", basePath).json(
f"{basePath}/cn=*/ues=*/tool=syscount")

df_syscall = syscount_df.toPandas()

df_syscall['avg'] = (df_syscall['time (ms)'] / df_syscall['count'])

grouped_data = df_syscall.groupby(['cn'])
for group_name, group_df in grouped_data:
     top_labels = top_syscalls(group_df, 'time (ms)')

     f'Top {top_n} active syscalls {group_name[0]} (by latency)'
     syscount_fig = px.line(group_df[group_df['syscall'].isin(top_labels)].sort_values('ues'),
                    x="ues", y="time (ms)", color="syscall",
                    hover_data=["count", "time (ms)"],
                    labels=labels,
                    title=title,
                    markers=True)
     syscount_fig.show()
     syscount_fig.write_image(f"plotly/{group_name[0]}_top_syscall_latency.jpeg")
     
     gnuplot_df = pivot_dataframe_to_gnuplot_format(group_df[group_df['syscall'].isin(top_labels)], 'time (ms)', index='ues', columns='syscall')
     draw_gnuplot_linepoints(gnuplot_df, name=f'{group_name[0]}_top_syscall_latency.jpeg', title=title,
                        xlabel='Number of UEs', ylabel=labels['time (ms)'])
     
     top_labels = top_syscalls(group_df, 'count')

     title=f'Top {top_n} active syscalls {group_name[0]} (by number of calls)'
     syscount_fig = px.line(group_df[group_df['syscall'].isin(top_labels)].sort_values('ues'),
                    x="ues", y="count", color="syscall",
                    hover_data=["count", "time (ms)"],
                    labels=labels,
                    title=title,
                    markers=True)
     syscount_fig.show()
     syscount_fig.write_image(f"plotly/{group_name[0]}_top_syscall_count.jpeg")

     gnuplot_df = pivot_dataframe_to_gnuplot_format(group_df[group_df['syscall'].isin(top_labels)], 'count', index='ues', columns='syscall')
     draw_gnuplot_linepoints(gnuplot_df, name=f'{group_name[0]}_top_syscall_count.jpeg', title=title,
                        xlabel='Number of UEs', ylabel=labels['count'])

     top_labels = top_syscalls(group_df, 'avg')

     title=f'Top {top_n} active syscalls {group_name[0]} (by average latency)'
     syscount_fig = px.line(group_df[group_df['syscall'].isin(top_labels)].sort_values('ues'),
                    x="ues", y="avg", color="syscall",
                    hover_data=["count", "time (ms)"],
                    labels=labels,
                    title=title,
                    markers=True)
     syscount_fig.show()
     syscount_fig.write_image(f"plotly/{group_name[0]}_top_syscall_avg.jpeg")

     gnuplot_df = pivot_dataframe_to_gnuplot_format(group_df[group_df['syscall'].isin(top_labels)], 'avg', index='ues', columns='syscall')
     draw_gnuplot_linepoints(gnuplot_df, name=f'{group_name[0]}_top_syscall_avg.jpeg', title=title,
                        xlabel='Number of UEs', ylabel=labels['avg'])

      count          syscall      time     time (ms)       cn  ues      tool  \
300  253591            futex  08:52:57  2.126846e+06  free5gc  300  syscount   
301   16925          recvmsg  08:52:57  4.403394e+05  free5gc  300  syscount   
302  265977      epoll_pwait  08:52:57  2.748867e+05  free5gc  300  syscount   
303     781  clock_nanosleep  08:52:57  2.626265e+05  free5gc  300  syscount   
304     151       epoll_wait  08:52:57  2.568159e+05  free5gc  300  syscount   

             avg  
300     8.386912  
301    26.017101  
302     1.033498  
303   336.269503  
304  1700.767593  
Index(['ues', 'clock_nanosleep', 'epoll_pwait', 'epoll_wait', 'futex',
       'nanosleep', 'poll', 'ppoll', 'recvmsg', 'select', 'write'],
      dtype='object', name='syscall')


       count        syscall      time     time (ms)       cn  ues      tool  \
300   253591          futex  08:52:57  2.126846e+06  free5gc  300  syscount   
302   265977    epoll_pwait  08:52:57  2.748867e+05  free5gc  300  syscount   
308   251511      nanosleep  08:52:57  5.584121e+04  free5gc  300  syscount   
309    87618          write  08:52:57  4.028096e+03  free5gc  300  syscount   
310  2710909  clock_gettime  08:52:57  1.921834e+03  free5gc  300  syscount   

          avg  
300  8.386912  
302  1.033498  
308  0.222023  
309  0.045973  
310  0.000709  
Index(['ues', 'clock_gettime', 'epoll_ctl', 'epoll_pwait', 'futex',
       'gettimeofday', 'nanosleep', 'read', 'sched_yield', 'setsockopt',
       'write'],
      dtype='object', name='syscall')


      count          syscall      time     time (ms)       cn  ues      tool  \
300  253591            futex  08:52:57  2.126846e+06  free5gc  300  syscount   
301   16925          recvmsg  08:52:57  4.403394e+05  free5gc  300  syscount   
302  265977      epoll_pwait  08:52:57  2.748867e+05  free5gc  300  syscount   
303     781  clock_nanosleep  08:52:57  2.626265e+05  free5gc  300  syscount   
304     151       epoll_wait  08:52:57  2.568159e+05  free5gc  300  syscount   

             avg  
300     8.386912  
301    26.017101  
302     1.033498  
303   336.269503  
304  1700.767593  
Index(['ues', 'clock_nanosleep', 'epoll_pwait', 'epoll_wait', 'fdatasync',
       'fsync', 'futex', 'poll', 'ppoll', 'recvmsg', 'select'],
      dtype='object', name='syscall')


    count       syscall      time     time (ms)   cn  ues      tool  \
0   23020         futex  02:10:49  1.315869e+06  oai  300  syscount   
1    1251  io_getevents  02:10:49  6.256785e+05  oai  300  syscount   
2    1842    epoll_wait  02:10:49  6.145464e+05  oai  300  syscount   
3     733          poll  02:10:49  3.225495e+05  oai  300  syscount   
4  282892          read  02:10:49  2.763346e+05  oai  300  syscount   

          avg  
0   57.161989  
1  500.142679  
2  333.629978  
3  440.040278  
4    0.976820  
Index(['ues', 'clock_nanosleep', 'epoll_wait', 'futex', 'io_getevents',
       'openat', 'poll', 'ppoll', 'read', 'select', 'sendmsg'],
      dtype='object', name='syscall')


     count     syscall      time     time (ms)   cn  ues      tool        avg
0    23020       futex  02:10:49  1.315869e+06  oai  300  syscount  57.161989
4   282892        read  02:10:49  2.763346e+05  oai  300  syscount   0.976820
9    12852      openat  02:10:49  3.354367e+02  oai  300  syscount   0.026100
10    2559  readlinkat  02:10:49  2.794760e+02  oai  300  syscount   0.109213
13   10225       fstat  02:10:49  7.872975e+01  oai  300  syscount   0.007700
Index(['ues', 'clock_gettime', 'close', 'fcntl', 'fstat', 'futex', 'mmap',
       'openat', 'read', 'readlinkat', 'stat'],
      dtype='object', name='syscall')


    count       syscall      time     time (ms)   cn  ues      tool  \
0   23020         futex  02:10:49  1.315869e+06  oai  300  syscount   
1    1251  io_getevents  02:10:49  6.256785e+05  oai  300  syscount   
2    1842    epoll_wait  02:10:49  6.145464e+05  oai  300  syscount   
3     733          poll  02:10:49  3.225495e+05  oai  300  syscount   
4  282892          read  02:10:49  2.763346e+05  oai  300  syscount   

          avg  
0   57.161989  
1  500.142679  
2  333.629978  
3  440.040278  
4    0.976820  
Index(['ues', 'clock_nanosleep', 'epoll_wait', 'futex', 'io_getevents', 'poll',
       'ppoll', 'read', 'sched_yield', 'select', 'sendmsg'],
      dtype='object', name='syscall')


     count          syscall      time     time (ms)       cn  ues      tool  \
250  13832            futex  14:43:05  2.990883e+06  open5gs   50  syscount   
251  10036       epoll_wait  14:43:05  1.216888e+06  open5gs   50  syscount   
252  17708             poll  14:43:05  3.417506e+05  open5gs   50  syscount   
253   7597          recvmsg  14:43:05  2.804530e+05  open5gs   50  syscount   
254    784  clock_nanosleep  14:43:05  2.629725e+05  open5gs   50  syscount   

            avg  
250  216.229272  
251  121.252287  
252   19.299223  
253   36.916277  
254  335.424144  
Index(['ues', 'clock_nanosleep', 'epoll_wait', 'futex', 'nanosleep', 'openat',
       'poll', 'ppoll', 'recvmsg', 'select', 'sendmsg'],
      dtype='object', name='syscall')


     count     syscall      time     time (ms)       cn  ues      tool  \
250  13832       futex  14:43:05  2.990883e+06  open5gs   50  syscount   
251  10036  epoll_wait  14:43:05  1.216888e+06  open5gs   50  syscount   
252  17708        poll  14:43:05  3.417506e+05  open5gs   50  syscount   
253   7597     recvmsg  14:43:05  2.804530e+05  open5gs   50  syscount   
259  17717      openat  14:43:05  1.463739e+02  open5gs   50  syscount   

            avg  
250  216.229272  
251  121.252287  
252   19.299223  
253   36.916277  
259    0.008262  
Index(['ues', 'close', 'epoll_wait', 'fcntl', 'fstat', 'futex', 'openat',
       'poll', 'recvfrom', 'recvmsg', 'sendto'],
      dtype='object', name='syscall')


     count          syscall      time     time (ms)       cn  ues      tool  \
250  13832            futex  14:43:05  2.990883e+06  open5gs   50  syscount   
251  10036       epoll_wait  14:43:05  1.216888e+06  open5gs   50  syscount   
252  17708             poll  14:43:05  3.417506e+05  open5gs   50  syscount   
253   7597          recvmsg  14:43:05  2.804530e+05  open5gs   50  syscount   
254    784  clock_nanosleep  14:43:05  2.629725e+05  open5gs   50  syscount   

            avg  
250  216.229272  
251  121.252287  
252   19.299223  
253   36.916277  
254  335.424144  
Index(['ues', 'clock_nanosleep', 'epoll_wait', 'fdatasync', 'fsync', 'futex',
       'nanosleep', 'poll', 'ppoll', 'recvmsg', 'select'],
      dtype='object', name='syscall')


In [10]:
""" For each of the syscalls, we plot for the core networks, the amount of times and occurances. This can tell us:
1. How the core network are architected to respond to load for different operations e.g., how their socket read logic is implemented
to work and how that responses to change in traffic load
2. Relative to other core network which syscalls it uses the most. For example this can tell us the syscall that has the most differentiating factor, 
e.g., if all syscalls are relatively the same and there is a huge difference for sched_yield, then it is likely the differentiating syscall or design

"""
import pandas as pd
import plotly.graph_objs as go

def grouped_syscall_stats(df_sysprocess, syscall, writer=None):

    cn_df = df_sysprocess.groupby(['cn', 'ues']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()

    cn_df['avg'] = (cn_df['time (ms)'] / cn_df['count'])

    title=f'Core network syscall {syscall} (by latency)'
    sysprocess_count_fig = px.line(cn_df.sort_values('ues'),
                    x="ues", y="time (ms)", color="cn", 
                    hover_data=["count", "time (ms)"],
                    labels=labels,
                    title=title,
                    markers=True)
    sysprocess_count_fig.show()
    sysprocess_count_fig.write_image(f"plotly/core_network_on_{syscall}_latency.jpeg")

    gnuplot_df = pivot_dataframe_to_gnuplot_format(cn_df, 'count', index='ues', columns='cn')
    draw_gnuplot_linepoints(gnuplot_df, name=f'core_network_on_{syscall}_latency', title=title,
                        xlabel='Number of UEs', ylabel=labels['count'])
    

    sysprocess_count_fig = px.line(cn_df.sort_values('ues'),
                    x="ues", y="count", color="cn",
                    hover_data=["count", "time (ms)"],
                    labels=labels,
                    title=f'Processes making {syscall} syscall (by number of calls)',
                    markers=True)
    sysprocess_count_fig.show()
    sysprocess_count_fig.write_image(f"plotly/core_network_on_{syscall}_count.jpeg")

    gnuplot_df = pivot_dataframe_to_gnuplot_format(cn_df, 'count', index='ues', columns='cn')
    draw_gnuplot_linepoints(gnuplot_df, name=f'core_network_on_{syscall}_count', title=title,
                        xlabel='Number of UEs', ylabel=labels['count'])

    sysprocess_count_fig = px.line(cn_df.sort_values('ues'),
                    x="ues", y="avg", color="cn",
                    hover_data=["count", "time (ms)"],
                    labels=labels,
                    title=f'Processes making {syscall} syscall (by average latency)',
                    markers=True)
    sysprocess_count_fig.show()
    sysprocess_count_fig.write_image(f"plotly/core_network_on_{syscall}_avg.jpeg")

    gnuplot_df = pivot_dataframe_to_gnuplot_format(cn_df, 'avg', index='ues', columns='cn')
    draw_gnuplot_linepoints(gnuplot_df, name=f'core_network_on_{syscall}_avg', title=title,
                        xlabel='Number of UEs', ylabel=labels['avg'])


    return cn_df


def grouped_processes_stats(df_sysprocess, writer=None):
    comm_df = df_sysprocess.groupby(['cn', 'comm']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()

    comm_df['avg'] = (comm_df['time (ms)'] / comm_df['count'])

    sunburst_fig = px.sunburst(comm_df, path=['cn', 'comm'], values='time (ms)',
                  color='cn', hover_data=['count'],
                  title=f"Processes making {syscall} syscall (by latency)")
    sunburst_fig.show()
    sunburst_fig.write_image(f"plotly/grouped_sysprocess_on_{syscall}_latency.jpeg")

    sunburst_fig = px.sunburst(comm_df, path=['cn', 'comm'], values='count',
                  color='cn', hover_data=['time (ms)'],
                  title=f"Processes making {syscall} syscall (by number of calls)")
    sunburst_fig.show()
    sunburst_fig.write_image(f"plotly/grouped_sysprocess_on_{syscall}_count.jpeg")

    sunburst_fig = px.sunburst(comm_df, path=['cn', 'comm'], values='avg',
                  color='cn', hover_data=['time (ms)'],
                  title=f"Processes making {syscall} syscall (by average latency)")
    sunburst_fig.show()
    sunburst_fig.write_image(f"plotly/grouped_sysprocess_on_{syscall}_avg.jpeg")

    return comm_df

def grouped_syscall_types(syscalls, syscall_type):
    df = pd.DataFrame()
    for syscall in syscalls:
        sysprocess_df = spark.read.option("basePath", basePath).json(
            f"{basePath}/cn=*/ues=*/tool=sysprocess_{syscall}")
        df1 = sysprocess_df.toPandas()
        df1['syscall'] = syscall
        df = pd.concat([df, df1])

    df_syscall = remove_noise_processes(df, 'comm', noise_processes)

    syscall_df = df_syscall.groupby(['cn', 'syscall', 'comm']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()

    syscall_df['avg'] = (syscall_df['time (ms)'] / syscall_df['count'])

    sunburst_fig = px.sunburst(syscall_df, path=['cn', 'syscall', 'comm'], values='time (ms)',
                  color='cn', hover_data=['count'],
                #   title=f"Core networks making {syscall_type} syscall (by latency)"
                  )
    sunburst_fig.show()
    sunburst_fig.write_image(f"plotly/grouped_systypes_on_{syscall_type.replace('/', '')}_latency.jpeg")

    sunburst_fig = px.sunburst(syscall_df, path=['cn', 'syscall', 'comm'], values='count',
                  color='cn', hover_data=['time (ms)'],
                #   title=f"Core networks making {syscall_type} syscall (by number of calls)"
                  )
    sunburst_fig.show()
    sunburst_fig.write_image(f"plotly/grouped_systypes_on_{syscall_type.replace('/', '')}_count.jpeg")


    sunburst_fig = px.sunburst(syscall_df, path=['cn', 'syscall', 'comm'], values='avg',
                  color='cn', hover_data=['time (ms)'],
                #   title=f"Core networks making {syscall_type} syscall (by average latency)"
                  )
    sunburst_fig.show()
    sunburst_fig.write_image(f"plotly/grouped_systypes_on_{syscall_type.replace('/', '')}_avg.jpeg")


def compute_grouped_stats(syscall, summary_df):
    sysprocess_df = spark.read.option("basePath", basePath).json(
    f"{basePath}/cn=*/ues=*/tool=sysprocess_{syscall}")

    df_sysprocess = sysprocess_df.toPandas()

    df1 = remove_noise_processes(df_sysprocess, 'comm', noise_processes)
    syscall_df = grouped_syscall_stats(df1, syscall, writer)

    comm_df = grouped_processes_stats(df1, writer)

    # Get the summary
    df2 = comm_df.groupby(['cn']).agg({ 'count': 'sum', 'time (ms)': 'sum', 'avg': 'sum' }).reset_index()
    
    df2['syscall'] = syscall

    summary_df = pd.concat([summary_df, df2])

# writer = pd.ExcelWriter('ActiveProcessesPerSyscall-WithoutNoiseProcesses.xlsx', engine='xlsxwriter')
writer = None
noise_processes = ['python3', 'systemd', 'snapd', 'sshd', 'sudo', 'multipathd', 'systemd-logind', 'systemd-timesyn', 'systemd-resolve', 'systemd-udevd', 'systemd-network', 'systemctl', 'accounts-daemon', 'dbus-daemon', 'mongod', 'mysqld', '[unknown]']


io_multiplex_syscalls = ['epoll_wait', 'poll', 'ppoll', 'epoll_pwait', 'select']
grouped_syscall_types(io_multiplex_syscalls, 'I/O Multiplexing')
print("Syscalls for io multiplexing")
# Run for each syscall
grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
for syscall in io_multiplex_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)     

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
socket_files_syscalls = ['read', 'write']
grouped_syscall_types(socket_files_syscalls, 'Files')
print("Syscalls for read or write for files operations")
for syscall in socket_files_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
socket_write_syscalls = ['sendto', 'sendmsg']
grouped_syscall_types(socket_write_syscalls, 'Send')
print("Syscalls for socket write operations")
for syscall in socket_write_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
socket_read_syscalls = [ 'recvmsg', 'recvfrom']
grouped_syscall_types(socket_read_syscalls, 'Receive')
print("Syscalls for socket read operations")
for syscall in socket_read_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
time_syscalls = ['clock_nanosleep', 'nanosleep']
grouped_syscall_types(time_syscalls, 'Time')
print("Syscalls for process time operations")
for syscall in time_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
locks_syscalls = ['futex']
grouped_syscall_types(locks_syscalls, 'Locks')
print("Syscalls for locks operations")
for syscall in locks_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
control_syscalls = ['sched_yield']
grouped_syscall_types(control_syscalls, 'Control operations')
print("Syscalls for control operations")
for syscall in control_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

Syscalls for io multiplexing


        cn  ues  count     time (ms)          avg
0  free5gc    0     32  65307.226301  2040.850822
1  free5gc    5     32  65314.282514  2041.071329
2  free5gc   10     32  65526.726272  2047.710196
3  free5gc   50     32  65319.185467  2041.224546
4  free5gc  100     32  65319.478896  2041.233716
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count     time (ms)          avg
0  free5gc    0     32  65307.226301  2040.850822
1  free5gc    5     32  65314.282514  2041.071329
2  free5gc   10     32  65526.726272  2047.710196
3  free5gc   50     32  65319.185467  2041.224546
4  free5gc  100     32  65319.478896  2041.233716
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count     time (ms)          avg
0  free5gc    0     32  65307.226301  2040.850822
1  free5gc    5     32  65314.282514  2041.071329
2  free5gc   10     32  65526.726272  2047.710196
3  free5gc   50     32  65319.185467  2041.224546
4  free5gc  100     32  65319.478896  2041.233716
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count      time (ms)          avg
0  free5gc    0     68  125034.576169  1838.743767
1  free5gc    5     66  135082.251277  2046.700777
2  free5gc   10     80  125066.995027  1563.337438
3  free5gc   50     65  125075.002274  1924.230804
4  free5gc  100     65  125072.297641  1924.189194
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count      time (ms)          avg
0  free5gc    0     68  125034.576169  1838.743767
1  free5gc    5     66  135082.251277  2046.700777
2  free5gc   10     80  125066.995027  1563.337438
3  free5gc   50     65  125075.002274  1924.230804
4  free5gc  100     65  125072.297641  1924.189194
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count      time (ms)          avg
0  free5gc    0     68  125034.576169  1838.743767
1  free5gc    5     66  135082.251277  2046.700777
2  free5gc   10     80  125066.995027  1563.337438
3  free5gc   50     65  125075.002274  1924.230804
4  free5gc  100     65  125072.297641  1924.189194
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, ues, count, time (ms), avg]
Index: []
Index(['ues'], dtype='object', name='cn')



plot for [i=2:1] "core_network_on_ppoll_latency.csv" u 1:i t columnhead lc rgb word(colors, i-1)
                                                                                                ^
"core_network_on_ppoll_latency.gnu" line 28: x range is invalid



Empty DataFrame
Columns: [cn, ues, count, time (ms), avg]
Index: []
Index(['ues'], dtype='object', name='cn')



plot for [i=2:1] "core_network_on_ppoll_count.csv" u 1:i t columnhead lc rgb word(colors, i-1)
                                                                                              ^
"core_network_on_ppoll_count.gnu" line 28: x range is invalid



Empty DataFrame
Columns: [cn, ues, count, time (ms), avg]
Index: []
Index(['ues'], dtype='object', name='cn')



plot for [i=2:1] "core_network_on_ppoll_avg.csv" u 1:i t columnhead lc rgb word(colors, i-1)
                                                                                            ^
"core_network_on_ppoll_avg.gnu" line 28: x range is invalid



        cn  ues  count      time (ms)        avg
0  free5gc    0   4061  271829.932940  66.936699
1  free5gc    5  11143  271170.568160  24.335508
2  free5gc   10  15576  259133.057105  16.636688
3  free5gc   50  53298  257356.674686   4.828637
4  free5gc  100  97345  267757.644184   2.750605
Index(['ues', 'free5gc'], dtype='object', name='cn')


        cn  ues  count      time (ms)        avg
0  free5gc    0   4061  271829.932940  66.936699
1  free5gc    5  11143  271170.568160  24.335508
2  free5gc   10  15576  259133.057105  16.636688
3  free5gc   50  53298  257356.674686   4.828637
4  free5gc  100  97345  267757.644184   2.750605
Index(['ues', 'free5gc'], dtype='object', name='cn')


        cn  ues  count      time (ms)        avg
0  free5gc    0   4061  271829.932940  66.936699
1  free5gc    5  11143  271170.568160  24.335508
2  free5gc   10  15576  259133.057105  16.636688
3  free5gc   50  53298  257356.674686   4.828637
4  free5gc  100  97345  267757.644184   2.750605
Index(['ues', 'free5gc'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, ues, count, time (ms), avg]
Index: []
Index(['ues'], dtype='object', name='cn')



plot for [i=2:1] "core_network_on_select_latency.csv" u 1:i t columnhead lc rgb word(colors, i-1)
                                                                                                 ^
"core_network_on_select_latency.gnu" line 28: x range is invalid



Empty DataFrame
Columns: [cn, ues, count, time (ms), avg]
Index: []
Index(['ues'], dtype='object', name='cn')



plot for [i=2:1] "core_network_on_select_count.csv" u 1:i t columnhead lc rgb word(colors, i-1)
                                                                                               ^
"core_network_on_select_count.gnu" line 28: x range is invalid



Empty DataFrame
Columns: [cn, ues, count, time (ms), avg]
Index: []
Index(['ues'], dtype='object', name='cn')



plot for [i=2:1] "core_network_on_select_avg.csv" u 1:i t columnhead lc rgb word(colors, i-1)
                                                                                             ^
"core_network_on_select_avg.gnu" line 28: x range is invalid



Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


Syscalls for read or write for files operations


        cn  ues  count   time (ms)       avg
0  free5gc    0    407    3.811100  0.009364
1  free5gc    5   2755   18.818438  0.006831
2  free5gc   10   5288   32.361066  0.006120
3  free5gc   50  25395  140.789165  0.005544
4  free5gc  100  49265  270.704544  0.005495
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count   time (ms)       avg
0  free5gc    0    407    3.811100  0.009364
1  free5gc    5   2755   18.818438  0.006831
2  free5gc   10   5288   32.361066  0.006120
3  free5gc   50  25395  140.789165  0.005544
4  free5gc  100  49265  270.704544  0.005495
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count   time (ms)       avg
0  free5gc    0    407    3.811100  0.009364
1  free5gc    5   2755   18.818438  0.006831
2  free5gc   10   5288   32.361066  0.006120
3  free5gc   50  25395  140.789165  0.005544
4  free5gc  100  49265  270.704544  0.005495
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count    time (ms)       avg
0  free5gc    0    138     6.585365  0.047720
1  free5gc    5   1515    49.749855  0.032838
2  free5gc   10   2899   121.698848  0.041980
3  free5gc   50  14308   673.257456  0.047055
4  free5gc  100  28353  1211.265575  0.042721
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count    time (ms)       avg
0  free5gc    0    138     6.585365  0.047720
1  free5gc    5   1515    49.749855  0.032838
2  free5gc   10   2899   121.698848  0.041980
3  free5gc   50  14308   673.257456  0.047055
4  free5gc  100  28353  1211.265575  0.042721
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count    time (ms)       avg
0  free5gc    0    138     6.585365  0.047720
1  free5gc    5   1515    49.749855  0.032838
2  free5gc   10   2899   121.698848  0.041980
3  free5gc   50  14308   673.257456  0.047055
4  free5gc  100  28353  1211.265575  0.042721
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


Syscalls for socket write operations


    cn  ues  count  time (ms)       avg
0  oai    0    217   6.165872  0.028414
1  oai    5     97   2.692177  0.027754
2  oai   10    217   4.304325  0.019836
3  oai   50   3878  46.896122  0.012093
4  oai  100    217   5.027216  0.023167
Index(['ues', 'oai', 'open5gs'], dtype='object', name='cn')


    cn  ues  count  time (ms)       avg
0  oai    0    217   6.165872  0.028414
1  oai    5     97   2.692177  0.027754
2  oai   10    217   4.304325  0.019836
3  oai   50   3878  46.896122  0.012093
4  oai  100    217   5.027216  0.023167
Index(['ues', 'oai', 'open5gs'], dtype='object', name='cn')


    cn  ues  count  time (ms)       avg
0  oai    0    217   6.165872  0.028414
1  oai    5     97   2.692177  0.027754
2  oai   10    217   4.304325  0.019836
3  oai   50   3878  46.896122  0.012093
4  oai  100    217   5.027216  0.023167
Index(['ues', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count  time (ms)       avg
0  free5gc    0     21   0.536184  0.025533
1  free5gc    5     46   1.195868  0.025997
2  free5gc   10     71   2.252461  0.031725
3  free5gc   50    271   8.133967  0.030015
4  free5gc  100    521  15.532431  0.029813
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count  time (ms)       avg
0  free5gc    0     21   0.536184  0.025533
1  free5gc    5     46   1.195868  0.025997
2  free5gc   10     71   2.252461  0.031725
3  free5gc   50    271   8.133967  0.030015
4  free5gc  100    521  15.532431  0.029813
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count  time (ms)       avg
0  free5gc    0     21   0.536184  0.025533
1  free5gc    5     46   1.195868  0.025997
2  free5gc   10     71   2.252461  0.031725
3  free5gc   50    271   8.133967  0.030015
4  free5gc  100    521  15.532431  0.029813
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


Syscalls for socket read operations


        cn  ues  count     time (ms)         avg
0  free5gc    0     88  66523.189496  755.945335
1  free5gc    5    116  65772.787944  567.006793
2  free5gc   10    152  65738.234844  432.488387
3  free5gc   50    430  63732.719819  148.215627
4  free5gc  100    777  59629.261644   76.742936
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count     time (ms)         avg
0  free5gc    0     88  66523.189496  755.945335
1  free5gc    5    116  65772.787944  567.006793
2  free5gc   10    152  65738.234844  432.488387
3  free5gc   50    430  63732.719819  148.215627
4  free5gc  100    777  59629.261644   76.742936
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count     time (ms)         avg
0  free5gc    0     88  66523.189496  755.945335
1  free5gc    5    116  65772.787944  567.006793
2  free5gc   10    152  65738.234844  432.488387
3  free5gc   50    430  63732.719819  148.215627
4  free5gc  100    777  59629.261644   76.742936
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


    cn  ues  count  time (ms)       avg
0  oai    0     23   0.107422  0.004671
1  oai    5     94   0.474618  0.005049
2  oai   10    311   1.403748  0.004514
3  oai   50     72   0.366389  0.005089
4  oai  100     48   3.179605  0.066242
Index(['ues', 'oai', 'open5gs'], dtype='object', name='cn')


    cn  ues  count  time (ms)       avg
0  oai    0     23   0.107422  0.004671
1  oai    5     94   0.474618  0.005049
2  oai   10    311   1.403748  0.004514
3  oai   50     72   0.366389  0.005089
4  oai  100     48   3.179605  0.066242
Index(['ues', 'oai', 'open5gs'], dtype='object', name='cn')


    cn  ues  count  time (ms)       avg
0  oai    0     23   0.107422  0.004671
1  oai    5     94   0.474618  0.005049
2  oai   10    311   1.403748  0.004514
3  oai   50     72   0.366389  0.005089
4  oai  100     48   3.179605  0.066242
Index(['ues', 'oai', 'open5gs'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


Syscalls for process time operations


        cn  ues  count     time (ms)           avg
0  free5gc    0      1  60000.173373  60000.173373
1  free5gc  200      1  60000.229870  60000.229870
2      oai    0      8  60115.912187   7514.489023
3      oai    5      9    215.208309     23.912034
4      oai   10      9    214.875574     23.875064
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count     time (ms)           avg
0  free5gc    0      1  60000.173373  60000.173373
1  free5gc  200      1  60000.229870  60000.229870
2      oai    0      8  60115.912187   7514.489023
3      oai    5      9    215.208309     23.912034
4      oai   10      9    214.875574     23.875064
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count     time (ms)           avg
0  free5gc    0      1  60000.173373  60000.173373
1  free5gc  200      1  60000.229870  60000.229870
2      oai    0      8  60115.912187   7514.489023
3      oai    5      9    215.208309     23.912034
4      oai   10      9    214.875574     23.875064
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count     time (ms)       avg
0  free5gc    0   2076   6088.379265  2.932745
1  free5gc    5   5963   7963.958995  1.335562
2  free5gc   10  10267   7639.005718  0.744035
3  free5gc   50  46353  20909.734799  0.451098
4  free5gc  100  81274  26040.029450  0.320398
Index(['ues', 'free5gc'], dtype='object', name='cn')


        cn  ues  count     time (ms)       avg
0  free5gc    0   2076   6088.379265  2.932745
1  free5gc    5   5963   7963.958995  1.335562
2  free5gc   10  10267   7639.005718  0.744035
3  free5gc   50  46353  20909.734799  0.451098
4  free5gc  100  81274  26040.029450  0.320398
Index(['ues', 'free5gc'], dtype='object', name='cn')


        cn  ues  count     time (ms)       avg
0  free5gc    0   2076   6088.379265  2.932745
1  free5gc    5   5963   7963.958995  1.335562
2  free5gc   10  10267   7639.005718  0.744035
3  free5gc   50  46353  20909.734799  0.451098
4  free5gc  100  81274  26040.029450  0.320398
Index(['ues', 'free5gc'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


Syscalls for locks operations


        cn  ues  count      time (ms)         avg
0  free5gc    0   1769  919662.347647  519.876963
1  free5gc    5   8465  929429.553680  109.796758
2  free5gc   10  13731  966711.447506   70.403572
3  free5gc   50  47550  875401.200400   18.410120
4  free5gc  100  86216  811605.502880    9.413630
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count      time (ms)         avg
0  free5gc    0   1769  919662.347647  519.876963
1  free5gc    5   8465  929429.553680  109.796758
2  free5gc   10  13731  966711.447506   70.403572
3  free5gc   50  47550  875401.200400   18.410120
4  free5gc  100  86216  811605.502880    9.413630
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


        cn  ues  count      time (ms)         avg
0  free5gc    0   1769  919662.347647  519.876963
1  free5gc    5   8465  929429.553680  109.796758
2  free5gc   10  13731  966711.447506   70.403572
3  free5gc   50  47550  875401.200400   18.410120
4  free5gc  100  86216  811605.502880    9.413630
Index(['ues', 'free5gc', 'oai', 'open5gs'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


Syscalls for control operations


        cn  ues  count  time (ms)       avg
0  free5gc    0     41   0.356870  0.008704
1  free5gc    5    849   4.425298  0.005212
2  free5gc   10   1295   8.935651  0.006900
3  free5gc   50   2619  42.631772  0.016278
4  free5gc  100   3288  68.212510  0.020746
Index(['ues', 'free5gc'], dtype='object', name='cn')


        cn  ues  count  time (ms)       avg
0  free5gc    0     41   0.356870  0.008704
1  free5gc    5    849   4.425298  0.005212
2  free5gc   10   1295   8.935651  0.006900
3  free5gc   50   2619  42.631772  0.016278
4  free5gc  100   3288  68.212510  0.020746
Index(['ues', 'free5gc'], dtype='object', name='cn')


        cn  ues  count  time (ms)       avg
0  free5gc    0     41   0.356870  0.008704
1  free5gc    5    849   4.425298  0.005212
2  free5gc   10   1295   8.935651  0.006900
3  free5gc   50   2619  42.631772  0.016278
4  free5gc  100   3288  68.212510  0.020746
Index(['ues', 'free5gc'], dtype='object', name='cn')


Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


In [11]:
syscalls = ['sendto', 'sendmsg']
df = pd.DataFrame()
for syscall in syscalls:
    sysprocess_df = spark.read.option("basePath", basePath).json(
        f"{basePath}/cn=*/ues=*/tool=sysprocess_{syscall}")
    df1 = sysprocess_df.toPandas()
    df1['syscall'] = syscall
    df = pd.concat([df, df1])

df_syscall = remove_noise_processes(df, 'comm', noise_processes)

syscall_df = df_syscall.groupby(['cn', 'syscall', 'comm']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()
print(syscall_df)

syscall_df['avg'] = (syscall_df['time (ms)'] / syscall_df['count'])

sunburst_fig = px.sunburst(syscall_df, path=['cn', 'syscall', 'comm'], values='time (ms)',
                color='cn', hover_data=['count'],
                title=f"Core networks making Test syscall (by average latency)")
sunburst_fig.show()

         cn  syscall             comm  count   time (ms)
0   free5gc  sendmsg              amf   3627  122.262383
1   free5gc  sendmsg          polkitd     14    0.127517
2   free5gc  sendmsg  systemd-journal    126    3.382038
3       oai  sendmsg              nrf    288   13.512258
4       oai  sendmsg  systemd-journal   2194   20.936222
5       oai   sendto              amf   4211   51.538485
6       oai   sendto             ausf   3792   45.079659
7       oai   sendto              nrf      1    0.017558
8       oai   sendto              smf     23    0.543764
9       oai   sendto              udm   6868  101.496607
10      oai   sendto              udr    130    3.404450
11  open5gs  sendmsg     open5gs-amfd   1294   33.972496
12  open5gs  sendmsg     open5gs-hssd     12    1.025288
13  open5gs  sendmsg     open5gs-mmed     11    0.597391
14  open5gs  sendmsg     open5gs-pcfd    233    3.920419
15  open5gs  sendmsg    open5gs-pcrfd      8    0.647769
16  open5gs  sendmsg     open5g

In [12]:
""" For each syscall look at the processes that are making the calls
(a) Graphs
(b) Tables with the sum per latency, count and average latency
This should give us:
1. An idea of the processes making use of the most relavant syscall or the syscall we are looking at in the study
2. It will give us an ide of the relavance of these processes and making it easier for us to analysis e.g., if the rsyslog system
is the most active process per syscall, we know we need to do further work to disable logs or looking at another logging mechanism
3. 
"""

' For each syscall look at the processes that are making the calls\n(a) Graphs\n(b) Tables with the sum per latency, count and average latency\nThis should give us:\n1. An idea of the processes making use of the most relavant syscall or the syscall we are looking at in the study\n2. It will give us an ide of the relavance of these processes and making it easier for us to analysis e.g., if the rsyslog system\nis the most active process per syscall, we know we need to do further work to disable logs or looking at another logging mechanism\n3. \n'

In [13]:
import pandas as pd
import plotly.express as px

# create a sample dataframe
data = {'year': [2010, 2010, 2011, 2011, 2012, 2012],
        'region': ['A', 'B', 'A', 'B', 'A', 'B'],
        'sales': [100, 200, 150, 250, 180, 220]}
df = pd.DataFrame(data)

# create a sample dataframe
sunburst_data = {'region': ['Asia', 'Asia', 'Europe', 'Europe', 'Africa', 'Africa'],
        'country': ['China', 'Japan', 'France', 'Germany', 'Nigeria', 'South Africa'],
        'industry': ['Technology', 'Automobiles', 'Energy', 'Manufacturing', 'Oil & Gas', 'Mining'],
        'sales': [100, 50, 75, 90, 60, 40]}
sunburs_df = pd.DataFrame(sunburst_data)


for template in ['ggplot2', 'seaborn', 'simple_white', 'plotly',
         'plotly_white', 'plotly_dark', 'presentation', 'xgridoff',
         'ygridoff', 'gridon', 'none']:
    # create a line chart using Plotly Express
        fig = px.line(df, x='year', y='sales', color='region',
                title=f"Line chart theme {template}", 
                template=template, markers=True)
        fig.show()

        fig = px.sunburst(sunburs_df, path=['region', 'country', 'industry'], values='sales',
                        title=f"Sunburst chart theme {template}", 
                        template=template)
        fig.show()
   

In [14]:
def my_theme(fig):
    #This is my own personal preferences you can create a default & pass a plotly graph_object
    #changes theme, height & width to my preferences
    fig.update_layout(template='plotly_white', width=1000, height=700)
    #I like grid lines on my x & y axis
    fig.update_xaxes(showline=False,linewidth=0.2, gridwidth=1, linecolor='white', gridcolor='lightgrey',categoryorder='total descending',color='black')
    fig.update_yaxes(showline=False,linewidth=0.2, gridwidth=1, linecolor='white', gridcolor='lightgrey')
    fig.update_traces(texttemplate='<b>%{y:0,.1f}')

#converts every plot to my default styling    
my_theme(fig)
fig.show()

In [15]:
import pandas as pd
import subprocess
import os

# Define your data
data = {'year': [2010, 2011, 2012, 2013, 2014],
        'sales': [100, 130, 240, 350, 500],
        'purchases': [10, 13, 24, 35, 500]}

# Convert to a DataFrame and save to CSV
df = pd.DataFrame(data)



import os

# Change current working directory to 'mydir'
os.chdir('./gnuplot')

# Execute command 'mycommand' in the new directory
os.system('gnuplot -p cn_perf_ue_avg_exp.gnu')


0

23/08/28 14:37:28 WARN HeartbeatReceiver: Removing executor driver with no recent heartbeats: 2294609 ms exceeds timeout 120000 ms
23/08/28 14:37:28 WARN SparkContext: Killing executors is not supported by current scheduler.
23/08/28 14:37:29 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
	at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
	at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:36)
	at org.apache.spark.storage.BlockManagerMasterEndpoint.driverEndpoint$lzycompute(BlockManagerMasterEndpoint.scala:117)
	at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$driverEndpoint(BlockManagerMasterEndpoint.scala:116)
	at org.apache.spark.storage.