# Purpose

This notebooks graphs the performance results of 5G core networks. The traffic for the 5G core networks is generated using a [5G core traffic generator](https://github.com/tariromukute/core-tg). The performance results are collected using a bcc and bpftrace tools.

In [1]:
# configure spark variables
from pyspark.context import SparkContext
from pyspark.sql.context import SQLContext
from pyspark.sql.session import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
    
sc = SparkContext()
sqlContext = SQLContext(sc)
spark = SparkSession(sc)

# load up other dependencies
import re
import pandas as pd

import glob
import matplotlib.pyplot as plt
import numpy as np

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/23 16:37:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


In [2]:
import os
if not os.path.exists("images"):
    os.mkdir("images")

import os
import glob
import plotly.express as px
from plotly.subplots import make_subplots
from pyspark.sql.types import StructType,StructField, StringType, IntegerType
from pyspark.sql.functions import expr
basePath = "../results"

In [3]:
html_output_file = '../open5gs.html'
with open(html_output_file, 'w') as f:
    f.write('<h1>Open GiLAN Testbed Results</h1>')
    f.write('<h3> The graphs summaries the NFV performance metrics<h3>')
    f.write('<h4> General Workload Chacterisation </h4>')
    f.write('<a href="#system-chaterisation"> Skip to results </a></h5>')
    f.write('<p> The system run different processes depending on the applications running. The operations of these applications and their respective processes is \
        execute through system calls. There are wide range of system calls that can be run by the OS. In general the frequent types of system calls can provide \
        a general chaterisationof the workload running on the OS. The workload charaterisation is a good starting point in understanding the system or applications \
        running the system. In addition to the frequent system calls, details on the processes making syscalls is helpful in understanding the system. \
        \
        The latency of the both the system calls and the processes making system calls is a starting point in understand the latency of the system as a whole. From these \
        results we can go further to look at the performance results of the different compute resources. The chaterisations helps in knowing the compute results to focus on \
        e.g., if there is a load or read syscalls we can focus on Filesystem and cache.</p>')
    f.write('<h4> CPU  <h4>')
    f.write('<h5><a href="#cpu-metrics"> Skip to results </a></h5>')
    f.write('<p> The CPU is responsible for executing all workloads on the NFV. Like other resources, the CPU is managed by the kernel. The user-level applications access CPU resources by sending system calls to the kernel. The kernel also receives other system call requests from different processes; memory loads and stores can issue page faults system calls. The primary consumers of CPU resources are threads (also called tasks), which belong to procedures, kernel routines and interrupt routes. The kernel manages the sharing via a CPU scheduler.</p>')
    f.write('<p> There are three thread states: ON-PROC for threads running on a CPU, RUNNABLE for threads that could run but are waiting their turn, and SLEEP for blocked lines on another event, including uninterruptible waits. These can be categorised into two for more accessible analysis, on-CPU referring to ON-PROC, and off-CPU referring to all other states, where the thread is not running on a CPU. Lines leave the CPU in one of two ways: (1) voluntary if they block on I/O, a lock, or asleep, or (2) involuntary if they have exceeded their scheduled allocation of CPU time. When a CPU switches from running one process or thread to another, it switches address spaces and other metadata. This process is called context switching; it also consumes the CPU resources. All these processes, described, in general, consume the CPU time. In addition to the time, another CPU resource used by the methods, kernel routines and interrupts routines is the CPU cache.</p>')
    f.write('<p> There are typically multiple levels of CPU cache, increasing in both size and latency. The caches end with the last-level store (LLC), large (Mbytes) and slower. On a processor with three levels of supplies, the LLC is also the Level 3 cache. Processes are instructions to be interpreted and run by the CPU. This set of instructions is typically loaded from RAM and cached into the CPU cache for faster access. The CPU first checks the lower cache, i.e., L1 cache. If the CPU finds the data, this is called a hit. If the CPU does not see the data, it looks for it in L2 and then L3. If the CPU does not find the data in any memory caches, it can access it from your system memory (RAM). When that happens, it is known as a cache miss. In general, a cache miss means high latency, i.e., the time needed to access data from memory. </p>')

    f.write('<h4> Memory <h4>')
    f.write('<h5><a href="#memory-metrics"> Skip to results </a> </h5>')
    f.write('<p> The kernel and processor are responsible for mapping the virtual memory to physical memory. For efficiency, memory mappings are created in groups of memory called <em>pages</em>. When an application starts, it begins with a request for memory allocation. In the case that there is no free memory on the heap, the syscall <em>brk()</em> is issued to extend the size of the bank. However, if there is free memory on the heap, a new memory segment is created via the <em>mmap()</em> syscall. Initially, this virtual memory mapping does not have a corresponding physical memory allocation. Therefore when the application tries to access this allocated memory segment, the error called <em>page fault</em> occurs on the MMU. The kernel then handles the page fault, mapping from the virtual to physical memory. The amount of physical memory allocated to a process is called resident set size (RSS). When there is too much memory demand on the system, the kernel page-out daemon (kswapd) may look for memory pages to free. Three types of pages can be released in their order: pages that we read but not modified (backed by disk) these can be immediately rid, pages that have been modified (dirty) these need to be written to disk before they can be freed and pages of application memory (anonymous) these must be stored on a swap device before they can be released. kswapd, a page-out daemon, runs periodically to scan for inactive and active pages with no memory to free. It is woken up when free memory crosses a low threshold and goes back to sleep when it crosses a high threshold. Swapping usually causes applications to run much more slowly.</p>')

    f.write('<h4>Filesytem <h4>')
    f.write('<h5><a href="#filesystem-metrics"> Skip to results </a> </h5>')
    f.write('<p> The file system that applications usually interact with directly and file systems can use caching, read-ahead, buffering, and asynchronous I/O to avoid exposing disk I/O latency to the application. Logical I/O describes requests to the file system. If these requests must be served from the storage devices, they become physical I/O. Not all I/O will; many logical read requests may be returned from the file system cache and never become physical I/O. File systems are accessed via a virtual file system (VFS). It provides operations for reading, writing, opening, closing, etc., which are mapped by file systems to their internal functions. Linux uses multiple caches to improve the performance of storage I/O via the file system. These are Page cache: This contains virtual memory pages and enhances the performance of file and directory I/O. Inode cache, which are data structures used by file systems to describe their stored objects. The directory cache caches mappings from directory entry names to VFS inodes, improving the performance of pathname lookups. The page cache grows to be the largest of all these because it caches the contents of files and includes “dirty” pages that have been modified but not yet written to disk.</p>')

    f.write('<h4>Disk I/O <h4>')
    f.write('<h5><a href="#disk-metrics"> Skip to results </a> </h5>')
    f.write('<p> Linux exposes rotational magnetic media, flash-based storage, and network storage as storage devices. Therefore, disk I/O refers to I/O operations on these devices. Disk I/O is a common source of performance issues because I/O latency on storage devices is orders of magnitude slower than the nanosecond or microsecond speed of CPU and memory operations. Block I/O refers to device access in blocks. I/O is queued and scheduled in the block layer. The wait time is spent in the block layer scheduler queues and device dispatcher queues from the operating system. Service time is the time from device issue to completion. This may include the time spent waiting in an on-device line. Request time is the overall time from when an I/O was inserted into the OS queues to its completion. The request time matters the most, as that is the time that applications must wait if I/O is synchronous.</p>')

    f.write('<h4>Networking<h4>')
    f.write('<h5><a href="#networking-metrics"> Skip to results </a> </h5>')
    f.write('<p> Networking is a complex part of the Linux system. It involves many different layers and protocols, including the application, protocol libraries, syscalls, TCP or UDP, IP, and device drivers for the network interface. In general, the Networking system can be broken down into four. The NIC and Device Driver Processing first reads packets from the NIC and puts them into kernel buffers. Besides the NIC and Device driver, this process includes the DMA and particular memory regions on the RAM for storing receive and transmit packets called rings and the NAPI system for poling packets from these rings to the kernel buffers. It also incorporates some early packet processing hooks like XDP and AF\_XDP and can have custom drivers that bypass the kernel (i.e., the following two processes) like DPDK. Following is the Socket processing. This part also includes queuing and different queuing disciplines. It also incorporates some packet processing hooks like TC, Netfilter etc., which can alter the flow of the networking stack. After that is the  Protocol processing layer, which applies functions for different IP and transport protocols, both these protocols run under the context of SoftIrq. Lastly is the application process. The application receives and sends packets on the destination socket</p>')
    
    f.write('<h4>Flame Graphs to analyse code paths<h4>')
    f.write('<h5><a href="#flame-graphs"> Skip to results </a> </h5>')
    f.write('<p> A flame graph visualizes a distributed request trace and represents each service call that occurred during the requests execution path with a timed, color-coded, horizontal bar. Flame graphs for distributed traces include error and latency data to help developers identify and fix bottlenecks in their applications..</p>')

In [4]:
# General chaterisation
import plotly; print(plotly.__version__)

5.15.0


23/08/23 16:37:32 WARN GarbageCollectionMetrics: To enable non-built-in garbage collector(s) List(G1 Concurrent GC), users should configure it(them) to spark.eventLog.gcMetrics.youngGenerationGarbageCollectors or spark.eventLog.gcMetrics.oldGenerationGarbageCollectors


In [78]:
# Helper functions
def remove_noise_processes(df, field, values):
    a = df.loc[df[field].isin(values)].index.array.tolist()
    df.drop(a, inplace=True)
    return df

In [77]:
""" This shows how the usage syscalls change as the load changes. 
(a) The number of syscalls as the traffic load increases
(b) The time spent executing syscalls as the traffic increases
(c) The average time spent per syscall as the traffic increases
This can tell us:
1. How the core network is architectures to respond to increasing load
2. Comparing can tell us the core network that sends more time on syscalls. We can use that to corellate to the performance of the core network
3. We have the details on overal performance of the core networks, we can look at the results that correlate the performance
4. Is there a general trend to syscalls that can show well architected e.g., the latency should increase as load increase etc.
If there is an ideal trend or correlation, does it match the trend of the core networks and correlate to the performance we are seeing
"""

top_n = 5

syscount_df = spark.read.option("basePath", basePath).json(
f"{basePath}/cn=*/ues=*/tool=syscount")

df_syscount = syscount_df.toPandas().groupby(['cn', 'ues']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()

df_syscount['avg'] = (df_syscount['time (ms)'] / df_syscount['count'])

syscount_fig = px.line(df_syscount,
                x="ues", y="time (ms)", color="cn",
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "cn": "Core Network"
                 },
                title='Syscalls across the system (by latency)',
                markers=True)
# syscount_fig.update_traces(textinfo='value')
# syscount_fig.update_traces(textinfo='time (us)')
syscount_fig.show()
syscount_fig.write_image("images/syscount_fig_m2.medium.jpeg")

sysprocess_count_fig = px.line(df_syscount,
                x="ues", y="count", color="cn",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "cn": "Core network"
                },
                title=f'Processes making syscall (by number of calls)',
                markers=True)
# sysprocess_count_fig.update_traces(textinfo='value')
sysprocess_count_fig.show()

sysprocess_count_fig = px.line(df_syscount,
                x="ues", y="avg", color="cn",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "avg": "Average time per syscall (ms)",
                     "cn": "Core network"
                },
                title=f'Processes making syscall (by number of calls)',
                markers=True)
# sysprocess_count_fig.update_traces(textinfo='value')
sysprocess_count_fig.show()

sysprocess_count_fig.write_image(f"images/syscount_count_fig_m2.medium.jpeg")
with open(html_output_file, 'a') as f:
    f.write('<h2 id="syscall-count"> Syscall counts across the system, with latency information. </h2>')
    f.write('<p>futex: (short for “fast userspace mutex”) is a kernel system call that programmers can use to implement basic locking, \
            or as a building block for higher-level locking abstractions such as semaphores and POSIX mutexes or condition variables \
            A known cause of high futex system calls on VM is high contention on shared-memory resources that causes many threads \
            to wait on futexes <a href="https://access.redhat.com/solutions/534663">source</a>. </p>')
    f.write('<p>o_getevents: system calls are used to read asynchronous I/O events from the completion queue of an AIO context. \
            An AIO context is a data structure that holds information about pending and completed I/O operations \
            <a href="https://linux.die.net/man/2/io_getevents">source</a>. \
            \
            The io_getevents() system call is Linux-specific and hence affects portability of programs.</p>')
    f.write('<p>poll_wait: is a system call that waits for events on an epoll instance, which is a mechanism \
            for monitoring multiple file descriptors for I/O readiness.</p>')
    f.write(syscount_fig.to_html(full_html=False, include_plotlyjs='cdn'))
    f.write(sysprocess_count_fig.to_html(full_html=False, include_plotlyjs='cdn'))

In [None]:
""" A tabular view with ratios of the most sum of (latency per syscall, count per syscall and average latency of syscall). The tabular view will
1. Show us for each core network what is the ratio of a syscall precense over the other e.g., recvfrom has 4x more latency than sendto
2. Across core networks, we can compare the ratio of presence of a syscall e.g., free5gc invokes recvfrom 4x more than open5gs
3. For grouped syscalls, we can tell which flavor a given call network uses more e.g., for multiplexing syscalls, we can may see that free5gc uses
select more than epoll_wait and infer based on the relative performance of them
4. In addition to (3), for different core networks we can see that e.g., free5gc use select which is 4x more that epoll_wait being used by open5gs.
Tying this with the theory of the syscall we may be able to get the reasons for difference in performance

"""

In [80]:
""" The top X active syscalls and process per core network
We can look at:
1. The composition of the core network, the system calls that run or maintain the system
2. It can tell us what the system spends most of it's time on
3. For these we can see if they syscalls follow the 'ideal trend' of responding to traffic load

"""
top_n = 6

def top_processes(df, field):
    label_maxes = df.groupby(['comm'])[field].sum().sort_values(ascending=False)

    # Select the top n labels with the highest y-values
    top_labels = label_maxes.head(top_n).index.tolist()

    return top_labels

sysprocess_df = spark.read.option("basePath", basePath).json(
f"{basePath}/cn=*/ues=*/tool=sysprocess")

df_process = sysprocess_df.toPandas()
df_process = remove_noise_processes(df_process, 'comm', ['python3'])
df_process['avg'] = (df_process['time (ms)'] / df_process['count'])

grouped_data = df_process.groupby(['cn'])
for group_name, group_df in grouped_data:
    top_labels = top_processes(group_df, 'time (ms)')

    sysprocess_fig = px.line(group_df[group_df['comm'].isin(top_labels)].sort_values('ues'),
                x="ues", y="time (ms)", color="comm",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "comm": "Process name"
                },
                title=f'Top {top_n} active processes making syscall {group_name[0]} (by latency)',
                markers=True)
    sysprocess_fig.show()

    top_labels = top_processes(group_df, 'count')

    sysprocess_fig = px.line(group_df[group_df['comm'].isin(top_labels)].sort_values('ues'),
                x="ues", y="count", color="comm",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "comm": "Process name"
                },
                title=f'Top {top_n} active processes making syscall {group_name[0]} (by number of calls)',
                markers=True)
    sysprocess_fig.show()

    top_labels = top_processes(group_df, 'avg')

    sysprocess_fig = px.line(group_df[group_df['comm'].isin(top_labels)].sort_values('ues'),
                x="ues", y="avg", color="comm",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "avg": "Average time per syscall (ms)",
                     "comm": "Process name"
                },
                title=f'Top {top_n} active processes making syscall {group_name[0]} (by average latency)',
                markers=True)
    sysprocess_fig.show()

In [50]:

top_n = 10

def top_syscalls(df, field):
    label_maxes = df.groupby(['syscall'])[field].sum().sort_values(ascending=False)

    # Select the top n labels with the highest y-values
    top_labels = label_maxes.head(top_n).index.tolist()

    return top_labels

syscount_df = spark.read.option("basePath", basePath).json(
f"{basePath}/cn=*/ues=*/tool=syscount")

df_syscall = syscount_df.toPandas()

df_syscall['avg'] = (df_syscall['time (ms)'] / df_syscall['count'])

grouped_data = df_syscall.groupby(['cn'])
for group_name, group_df in grouped_data:
    top_labels = top_syscalls(group_df, 'time (ms)')

    syscount_fig = px.line(group_df[group_df['syscall'].isin(top_labels)].sort_values('ues'),
                x="ues", y="time (ms)", color="syscall",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "comm": "Process name"
                },
                title=f'Top {top_n} active syscalls {group_name[0]} (by latency)',
                markers=True)
    syscount_fig.show()

    top_labels = top_syscalls(group_df, 'count')

    syscount_fig = px.line(group_df[group_df['syscall'].isin(top_labels)].sort_values('ues'),
                x="ues", y="count", color="syscall",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "comm": "Process name"
                },
                title=f'Top {top_n} active syscalls {group_name[0]} (by number of calls)',
                markers=True)
    syscount_fig.show()

    top_labels = top_syscalls(group_df, 'avg')

    syscount_fig = px.line(group_df[group_df['syscall'].isin(top_labels)].sort_values('ues'),
                x="ues", y="avg", color="syscall",
                hover_data=["count", "time (ms)"],
                labels={
                     "ues": "Number of UEs",
                     "time (ms)": "Time (ms)",
                     "syscall": "System calls",
                     "count": "Number of calls",
                     "avg": "Average time per syscall (ms)",
                     "comm": "Process name"
                },
                title=f'Top {top_n} active syscalls {group_name[0]} (by average latency)',
                markers=True)
    syscount_fig.show()

In [76]:
""" For each of the syscalls, we plot for the core networks, the amount of times and occurances. This can tell us:
1. How the core network are architected to respond to load for different operations e.g., how their socket read logic is implemented
to work and how that responses to change in traffic load
2. Relative to other core network which syscalls it uses the most. For example this can tell us the syscall that has the most differentiating factor, 
e.g., if all syscalls are relatively the same and there is a huge difference for sched_yield, then it is likely the differentiating syscall or design

"""
import pandas as pd
import plotly.graph_objs as go

def grouped_syscall_stats(df_sysprocess, writer=None):

    cn_df = df_sysprocess.groupby(['cn', 'ues']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()

    cn_df['avg'] = (cn_df['time (ms)'] / cn_df['count'])

    sysprocess_fig = px.line(cn_df.sort_values('ues'),
                    x="ues", y="time (ms)", color="cn", 
                    hover_data=["count", "time (ms)"],
                    labels={
                        "ues": "Number of UEs",
                        "time (ms)": "Time (ms)",
                        "syscall": "System calls",
                        "count": "Number of calls",
                        "cn": "Core Network"
                    },
                    title=f'Core network syscall {syscall} (by latency)',
                    markers=True)
    # sysprocess_fig.update_traces(textinfo='value')
    sysprocess_fig.show()
    sysprocess_fig.write_image(f"images/sysprocess_{syscall}_fig_m2.medium.jpeg")

    with open(html_output_file, 'a') as f:
        f.write(f'<h2 id="{syscall}-syscall-processes"> Processes are making {syscall} syscalls with latency information </h2>')
        f.write('<p>  </p>') 
        f.write(sysprocess_fig.to_html(full_html=False, include_plotlyjs='cdn'))
    

    sysprocess_count_fig = px.line(cn_df.sort_values('ues'),
                    x="ues", y="count", color="cn",
                    hover_data=["count", "time (ms)"],
                    labels={
                        "ues": "Number of UEs",
                        "time (ms)": "Time (ms)",
                        "syscall": "System calls",
                        "count": "Number of calls",
                        "cn": "Core network"
                    },
                    title=f'Processes making {syscall} syscall (by number of calls)',
                    markers=True)
    sysprocess_count_fig.show()

    sysprocess_count_fig = px.line(cn_df.sort_values('ues'),
                    x="ues", y="avg", color="cn",
                    hover_data=["count", "time (ms)"],
                    labels={
                        "ues": "Number of UEs",
                        "time (ms)": "Time (ms)",
                        "syscall": "System calls",
                        "count": "Number of calls",
                        "avg": "Average time per syscall (ms)",
                        "cn": "Core network"
                    },
                    title=f'Processes making {syscall} syscall (by average latency)',
                    markers=True)
    sysprocess_count_fig.show()

    sysprocess_count_fig.write_image(f"images/sysprocess_count_{syscall}_fig_m2.medium.jpeg")
    with open(html_output_file, 'a') as f:
        f.write(f'<h2 id="{syscall}-syscall-count-processes"> Processes are making {syscall} syscalls by number of calls</h2>')
        f.write('<p>  </p>') 
        f.write(sysprocess_count_fig.to_html(full_html=False, include_plotlyjs='cdn'))

    return cn_df


def compute_grouped_stats(syscall, summary_df):
    sysprocess_df = spark.read.option("basePath", basePath).json(
    f"{basePath}/cn=*/ues=*/tool=sysprocess_{syscall}")

    df_sysprocess = sysprocess_df.toPandas()

    df1 = remove_noise_processes(df_sysprocess, 'comm', noise_processes)
    df = grouped_syscall_stats(df1, writer)

    comm_df = df_sysprocess.groupby(['cn', 'comm']).agg({ 'count': 'sum', 'time (ms)': 'sum' }).reset_index()

    print(comm_df.sort_values('time (ms)', ascending=False))

    # Get the summary
    df2 = df.groupby(['cn']).agg({ 'count': 'sum', 'time (ms)': 'sum', 'avg': 'sum' }).reset_index()
    
    df2['syscall'] = syscall

    summary_df = pd.concat([summary_df, df2])

# writer = pd.ExcelWriter('ActiveProcessesPerSyscall-WithoutNoiseProcesses.xlsx', engine='xlsxwriter')
writer = None
noise_processes = ['python3', 'systemd', 'snapd', 'sshd', 'sudo', 'multipathd', 'systemd-logind', 'systemd-timesyn', 'systemd-resolve', 'systemd-udevd', 'systemd-network', 'systemctl', 'accounts-daemon', 'dbus-daemon']


io_multiplex_syscalls = ['epoll_wait', 'poll', 'ppoll', 'epoll_pwait', 'select']
print("Syscalls for io multiplexing")
# Run for each syscall
grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
for syscall in io_multiplex_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)     

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
socket_write_syscalls = ['write', 'sendto']
print("Syscalls for socket write operations")
for syscall in socket_write_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
socket_read_syscalls = [ 'recvmsg', 'recvfrom', 'read']
print("Syscalls for socket read operations")
for syscall in socket_read_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
time_syscalls = ['clock_nanosleep', 'nanosleep']
print("Syscalls for process time operations")
for syscall in time_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
locks_syscalls = ['futex']
print("Syscalls for locks operations")
for syscall in locks_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

grouped_io_df = pd.DataFrame(columns=['cn', 'count', 'time (ms)', 'avg', 'syscall'])
control_syscalls = ['sched_yield']
print("Syscalls for control operations")
for syscall in control_syscalls:
    compute_grouped_stats(syscall, grouped_io_df)

print(grouped_io_df)

Syscalls for io multiplexing


         cn             comm  count      time (ms)
2       oai        [unknown]   1441  620206.951749
10  open5gs        [unknown]   2039  540255.474248
4       oai           mysqld    511  483485.035447
24  open5gs  systemd-journal   4253  479985.821299
7       oai  systemd-journal   1267  474748.015116
11  open5gs     open5gs-amfd   6649  459237.223539
23  open5gs     open5gs-upfd    148  457931.288096
1   free5gc  systemd-journal    224  457712.258200
19  open5gs    open5gs-sgwud    136  454849.128651
18  open5gs    open5gs-sgwcd    122  451346.769431
14  open5gs     open5gs-nrfd   6332  444437.866897
17  open5gs     open5gs-scpd   6312  442346.218291
20  open5gs     open5gs-smfd    315  432183.114048
5       oai              nrf    964  425541.584141
16  open5gs     open5gs-pcfd    800  410025.922132
13  open5gs     open5gs-bsfd    138  400359.095890
22  open5gs     open5gs-udrd   1641  400184.603227
12  open5gs    open5gs-ausfd   5020  391866.379614
21  open5gs     open5gs-udmd   

         cn           comm   count      time (ms)
9       oai       rsyslogd    1998  472616.546225
30  open5gs       rsyslogd    6633  464630.329076
4   free5gc       rsyslogd     122  455384.226694
2   free5gc     irqbalance      43  429993.650385
8       oai     irqbalance      42  419971.793419
15  open5gs     irqbalance      42  419969.324547
7       oai          fwupd      50  132029.420620
31  open5gs        udisksd      72   59211.772295
28  open5gs   open5gs-udrd    1569    1386.233376
11      oai            udm     301     521.794188
12      oai            udr     421     495.329820
23  open5gs   open5gs-pcfd    2335     394.416259
10      oai            smf     186     337.345109
6       oai           ausf     320     288.202838
16  open5gs   open5gs-amfd  102686      81.313322
29  open5gs        polkitd     153      62.756567
3   free5gc        polkitd     157      59.467875
25  open5gs   open5gs-scpd   55284      52.545977
27  open5gs   open5gs-udmd   33998      27.773947


        cn       comm  count     time (ms)
2      oai     mysqld    543  19208.294960
1      oai  [unknown]    131   2510.636767
3  open5gs  [unknown]     28     15.759895
0  free5gc  [unknown]     25     13.603056


        cn  comm   count      time (ms)
4  free5gc   pcf   40321  485356.401794
2  free5gc   nrf   89737  481687.386732
7  free5gc   udr  122446  480851.500854
0  free5gc   amf  212112  374484.288215
6  free5gc   udm  114804   41811.967265
1  free5gc  ausf   54110   20201.880436
5  free5gc   smf       6       0.038941
8  free5gc   upf       6       0.034052
3  free5gc  nssf       4       0.021944


    cn       comm  count  time (ms)
0  oai  [unknown]    182  72.890088
Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []
Syscalls for socket write operations


         cn             comm  count    time (ms)
2   free5gc              amf  35069  4194.035915
10  free5gc              udm  50130  1325.639005
11  free5gc              udr  48317  1144.924441
5   free5gc              nrf  26222   830.332368
3   free5gc             ausf  17327   539.088709
35  open5gs     open5gs-scpd  26560   177.860574
6   free5gc              pcf  11076   170.491788
39  open5gs         rsyslogd  11082   113.628384
13      oai        [unknown]   2026    97.686079
20      oai         rsyslogd   1512    44.292427
25      oai              udm    294    19.237731
8   free5gc         rsyslogd    127    18.534957
15      oai             ausf    259    14.552299
26      oai              udr    350    14.296524
28  open5gs        [unknown]   2081    12.335773
14      oai              amf     99    10.520885
22      oai              smf    952     7.910888
30  open5gs     open5gs-amfd    464     5.432168
31  open5gs    open5gs-ausfd    220     2.834626
17      oai         

         cn           comm  count   time (ms)
19  open5gs   open5gs-scpd   5306  132.206716
7       oai            udm   8768  124.121922
2       oai            amf   5272   66.268142
3       oai           ausf   4888   58.145740
15  open5gs   open5gs-nrfd   2685   38.301454
10  open5gs   open5gs-amfd   1525   21.932762
11  open5gs  open5gs-ausfd   1345   16.725824
8       oai            udr    455   11.369749
22  open5gs   open5gs-smfd    256    9.936010
6       oai            smf    161    4.189982
17  open5gs   open5gs-pcfd    121    3.951658
24  open5gs   open5gs-udrd    100    3.923682
12  open5gs   open5gs-bsfd    100    3.828752
23  open5gs   open5gs-udmd    121    3.787476
16  open5gs  open5gs-nssfd    110    3.497425
21  open5gs  open5gs-sgwud     87    3.248913
25  open5gs   open5gs-upfd     93    2.876022
20  open5gs  open5gs-sgwcd     87    2.140654
1       oai      [unknown]     60    0.888387
0   free5gc      [unknown]     49    0.760192
9   open5gs      [unknown]     49 

         cn             comm  count     time (ms)
3   free5gc           mongod  31124  2.589808e+06
27  open5gs     open5gs-smfd   4730  4.896476e+05
26  open5gs    open5gs-pcrfd   4717  4.895612e+05
24  open5gs     open5gs-hssd   4711  4.894507e+05
25  open5gs     open5gs-mmed   4711  4.894097e+05
2   free5gc              amf   5314  4.226116e+05
22  open5gs           mongod   7460  6.073796e+03
29  open5gs         rsyslogd  10897  4.403992e+01
30  open5gs  systemd-journal   8213  3.508586e+01
23  open5gs     open5gs-amfd   3661  2.405378e+01
21  open5gs        [unknown]   2390  1.827757e+01
9       oai        [unknown]   1883  8.249631e+00
14      oai         rsyslogd   2086  7.631260e+00
18      oai              udm   1670  7.225560e+00
19      oai              udr   1470  6.278389e+00
11      oai             ausf   1285  5.762120e+00
16      oai  systemd-journal   1015  5.470280e+00
10      oai              amf   1113  5.447773e+00
12      oai              nrf    830  5.195983e+00


         cn           comm  count  time (ms)
16  open5gs   open5gs-scpd  11492  67.272991
9   open5gs   open5gs-amfd   4249  23.976005
12  open5gs   open5gs-nrfd   2047  17.298849
20  open5gs   open5gs-udmd   3929  16.734134
10  open5gs  open5gs-ausfd   3106  13.349720
21  open5gs   open5gs-udrd   4248  12.103047
14  open5gs   open5gs-pcfd   1960   7.184553
7       oai            udr    179   4.732332
19  open5gs   open5gs-smfd    260   1.536829
22  open5gs   open5gs-upfd     87   0.823439
3       oai           ausf    129   0.756628
6       oai            udm    182   0.753353
17  open5gs  open5gs-sgwcd     93   0.751587
5       oai            smf    135   0.640270
13  open5gs  open5gs-nssfd    147   0.609260
18  open5gs  open5gs-sgwud     93   0.605426
11  open5gs   open5gs-bsfd    148   0.569232
2       oai            amf     70   0.252416
0   free5gc      [unknown]     28   0.099228
8   open5gs      [unknown]     28   0.080189
4       oai         mysqld     21   0.062223
1       oa

         cn             comm   count      time (ms)
17      oai              nrf  478737  473820.224724
14      oai             ausf  478167  473739.202148
23      oai              udm  478201  473640.371303
24      oai              udr  477258  471486.565838
13      oai        [unknown]   56572   42020.677809
2   free5gc              amf   71693     868.710450
11  free5gc              udm   85995     366.656970
12  free5gc              udr   78826     276.080192
6   free5gc              nrf   34905     175.564658
3   free5gc             ausf   33836     150.147558
28  open5gs           mongod    5880     125.079706
5   free5gc           mongod    5880     123.232027
18      oai         rsyslogd       6      63.733631
7   free5gc              pcf   20355      57.793701
26  open5gs        [unknown]    4374      32.353997
21      oai  systemd-journal    2629      18.808904
16      oai           mysqld      88      17.166055
38  open5gs  systemd-journal    1939      16.410262
9   free5gc 

         cn           comm  count      time (ms)
1   free5gc         mongod   4997  953080.239672
7   open5gs         mongod   5000  945728.118269
4       oai         mysqld    483  483073.028670
10  open5gs   open5gs-smfd      3  180000.659135
9   open5gs  open5gs-pcrfd      2  120000.413721
0   free5gc           cron      2  120000.403243
8   open5gs   open5gs-hssd      2  120000.388146
3       oai           cron      2  120000.225192
5       oai            nrf     22    1103.423756
6       oai            smf     36     113.269974
2       oai      [unknown]     21       1.718516


        cn  comm   count     time (ms)
0  free5gc   amf  159257  78133.895056
7  free5gc   udr  117129  30155.577611
2  free5gc   nrf   85519  28991.297544
6  free5gc   udm  119342  15946.142980
4  free5gc   pcf   27134  14645.269641
1  free5gc  ausf   48753   8264.293352
3  free5gc  nssf      11     82.351829
8  free5gc   upf       9     71.580994
5  free5gc   smf       9     71.380242
Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []
Syscalls for locks operations


         cn           comm   count     time (ms)
17      oai         mysqld  149548  6.508959e+06
26  open5gs         mongod   23684  5.492351e+06
3   free5gc         mongod   31627  5.433623e+06
30  open5gs   open5gs-smfd    6445  3.510382e+06
29  open5gs  open5gs-pcrfd    6500  3.504327e+06
27  open5gs   open5gs-hssd    6277  3.386644e+06
28  open5gs   open5gs-mmed    6337  3.386640e+06
4   free5gc            nrf   74161  1.395140e+06
12  free5gc            udr  128587  1.338509e+06
6   free5gc            pcf   34459  1.294754e+06
1   free5gc            amf  137908  1.240514e+06
21      oai            smf     647  1.079908e+06
20      oai       rsyslogd    4555  4.725855e+05
32  open5gs       rsyslogd   31055  4.649582e+05
8   free5gc       rsyslogd     362  4.554034e+05
11  free5gc            udm  132379  2.767799e+05
2   free5gc           ausf   50117  2.501230e+05
5   free5gc           nssf      40  2.400298e+05
15      oai            amf     124  1.262598e+05
9   free5gc         

        cn    comm  count    time (ms)
9  open5gs  mongod   4553  1770.016444
0  free5gc     amf  24984   234.177878
2  free5gc  mongod  37567    99.050114
3  free5gc     nrf   1357    92.293179
6  free5gc     udm   3130    67.656370
7  free5gc     udr   1709    66.896923
1  free5gc    ausf   1603    30.959767
8      oai  mysqld   2257    28.982232
4  free5gc     pcf    113     3.624828
5  free5gc     smf      1     0.005813
Empty DataFrame
Columns: [cn, count, time (ms), avg, syscall]
Index: []


In [None]:
""" For each syscall look at the processes that are making the calls
(a) Graphs
(b) Tables with the sum per latency, count and average latency
This should give us:
1. An idea of the processes making use of the most relavant syscall or the syscall we are looking at in the study
2. It will give us an ide of the relavance of these processes and making it easier for us to analysis e.g., if the rsyslog system
is the most active process per syscall, we know we need to do further work to disable logs or looking at another logging mechanism
3. 
"""

In [39]:
import pandas as pd

top_n = 1

# create a sample dataframe
df = pd.DataFrame({'syscall': ['read', 'write', 'read', 'write'],
                   'ues': [10, 20, 30, 40],
                   'latency': [5, 10, 15, 20]})

print(df.loc[df['syscall'].isin(['read'])].index.array.tolist())

[0, 2]
