# System Calls in Nginx

An analysis of system calls in `nginx`. This notebook aims to answer the following questions:

- Which system calls are made by `nginx` during a `wrk` benchmark?
- Which shared libraries do these system calls come from?
- If we aimed to implement compartmentalisation _along shared library boundaries_, what degree of privilege reduction would we see?

In [20]:
import pandas as pd

counts_df = pd.read_json("./stats/counts.json")
missed_df = pd.read_json("./stats/missed.json", lines=True)

counts_df = counts_df.fillna(0).astype(int)

syscalls_sum = counts_df.sum()

counts_df = counts_df.reset_index().rename(columns={"index": "syscall"})
missed_df.head()

Unnamed: 0,ringbuf_full,get_parent_failed,get_pt_regs_failed,all
0,65354516,0,0,73160289
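
The missed-event counters above put the rest of the analysis in context. A minimal sketch of the arithmetic, assuming `all` is the total number of syscall events the tracer saw and `ringbuf_full` is the number dropped because the ring buffer was full (that reading is an assumption, though the difference matches the syscall total reported in the next table):

```python
# Values copied from the missed.json output above
ringbuf_full = 65354516
all_events = 73160289

# Assumption: events not dropped were recorded
recorded = all_events - ringbuf_full
missed_pct = ringbuf_full / all_events * 100

print(f"recorded events: {recorded}")
print(f"missed: {missed_pct:.1f}% of all events")
```

If that reading is right, the per-library counts below are based on roughly a tenth of the syscalls actually issued during the benchmark.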


In [21]:
import matplotlib.pyplot as plt

total_syscalls = syscalls_sum.sum()
percentages = (syscalls_sum / total_syscalls) * 100

# Create a table
result_table = pd.DataFrame({
    'Library': syscalls_sum.index,
    'Total Syscalls': syscalls_sum.values,
    '% of Total Syscalls': percentages.values
}).reset_index(drop=True)
# Display the table
result_table

Unnamed: 0,Library,Total Syscalls,% of Total Syscalls
0,/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2,468,0.005996
1,/usr/lib/x86_64-linux-gnu/libc.so.6,7805244,99.993223
2,anonymous,61,0.000781


## Percentage Syscalls

- As was the case with redis, almost all syscalls come from `libc`.

This raises a question: is the tool failing to map the libraries properly, or do all of these libraries invoke their syscalls via `libc`?

## Privilege Reduction

Given that 99.99\% of syscalls come from `libc`, there would be no practical privilege reduction from compartmentalising along library boundaries. However, we can run the numbers anyway.

### How dangerous is a library
For the purposes of this analysis, the 'danger' of a library is determined by which system calls the library is allowed to make.

Not all system calls are equally dangerous. If an attacker can only call `getpid()`, they are unlikely to be able to do much harm. If they can call `execve`, they may be able to do much more damage.

Using the list from [Table 4](https://www.researchgate.net/publication/261959738_Using_Attack_Surface_Entry_Points_and_Reachability_Analysis_to_Assess_the_Risk_of_Software_Vulnerability_Exploitability) and the code Go generates from `/usr/include/asm/unistd_64.h`, I classified each syscall into 3 categories (as per the referenced paper). 

High risk will have a multiplier of 3, medium risk 2, and low risk 1. 

(I am extremely unconvinced that this is a sound metric but it can be swapped later)
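
As a rough illustration of the weighting described above, a library's score is the sum of its syscall counts, each multiplied by the risk multiplier of that syscall's category. The counts below are hypothetical and chosen only to make the arithmetic easy to follow:

```python
# Multipliers from the text: high = 3, medium = 2, low = 1
MULTIPLIER = {"high": 3, "medium": 2, "low": 1}

# Hypothetical (risk category, call count) pairs for a single library
observed = [("high", 2), ("medium", 10), ("low", 100)]

# Weighted score: each call contributes its category's multiplier
score = sum(MULTIPLIER[level] * count for level, count in observed)

print(score)  # 2*3 + 10*2 + 100*1 = 126
```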

So, calculating the baseline "risk" of an uncompartmentalised `nginx` instance can be done as follows:

In [12]:
import yaml

# Load the syscall threat ranking
with open('../syscall-ranking.yaml', 'r') as file:
    data = yaml.safe_load(file)

# Convert the YAML data into a DataFrame
syscalls_df = pd.DataFrame([
    {'threat_level': level, 'syscall': syscall}
    for level, syscalls in data['syscalls'].items()
    for syscall in syscalls
])

multiplier_mapping = {
    'high-threat': 3,
    'medium-threat': 2
}

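# Replace each threat-level label with its numeric multiplier
# (labels missing from the mapping become NaN and default to 1 below)
syscalls_df['threat_level'] = syscalls_df['threat_level'].map(multiplier_mapping)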

# Merge the DataFrames with a left join so every observed syscall is kept
merged_df = counts_df.merge(
    syscalls_df[['syscall', 'threat_level']],
    on='syscall',
    how='left'
)

# Assign default multiplier of 1 to unmatched rows
merged_df['threat_level'] = merged_df['threat_level'].fillna(1).astype(int)

merged_df.head()

Unnamed: 0,syscall,/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2,/usr/lib/x86_64-linux-gnu/libc.so.6,anonymous,threat_level
0,0,45,298,0,1
1,10,64,0,0,1
2,11,4,0,0,1
3,12,1,102,0,1
4,158,2,0,0,1
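
Because the left join silently assigns a multiplier of 1 to any syscall it cannot match, it may be worth checking how many rows went unmatched, for example in case the two `syscall` columns use different types or one side holds names and the other numbers. A minimal sketch using the `indicator` option of `DataFrame.merge`; the two small frames are hypothetical stand-ins for `counts_df` and `syscalls_df`:

```python
import pandas as pd

# Hypothetical stand-ins for counts_df and syscalls_df
counts = pd.DataFrame({"syscall": [0, 12, 59], "libc": [298, 102, 1]})
ranking = pd.DataFrame({"syscall": [59], "threat_level": [3]})

# indicator=True adds a "_merge" column recording where each row came from
checked = counts.merge(ranking, on="syscall", how="left", indicator=True)

# Rows marked "left_only" had no entry in the ranking
unmatched = checked.loc[checked["_merge"] == "left_only", "syscall"].tolist()

print(unmatched)
```

An unexpectedly long unmatched list would suggest the ranking is not being applied as intended.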


In [16]:
# Create a new dataframe to hold the scores for each library
library_scores = merged_df.drop(columns=['syscall', 'threat_level']).copy()

# Weight each library's syscall counts by the threat-level multiplier
for library in library_scores.columns:
    # Each row is one syscall: count * multiplier for that syscall
    library_scores[library] = library_scores[library] * merged_df['threat_level']

# Sum up the 'threat_level' scores for each library
library_scores_sum = library_scores.sum()

overall_score = library_scores.sum().sum()

combined_scores_df = library_scores_sum.to_frame(name='library_score')

combined_scores_df.loc['overall_score'] = overall_score

combined_scores_df['percentage_reduction'] = (overall_score - combined_scores_df['library_score']) / overall_score * 100


# Display the resulting scores
combined_scores_df.head()

Unnamed: 0,library_score,percentage_reduction
/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2,468,99.9759
/usr/lib/x86_64-linux-gnu/libc.so.6,1941405,0.027344
anonymous,63,99.996756
overall_score,1941936,0.0


### Privilege Reduction Results

- Threat model: assume any one of the compartments can be compromised
- So, threat reduction is only as good as the **minimum** threat reduction seen in the table above
- For this run of nginx, it would be 0.03%, which is not amazing
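
The "weakest compartment" reasoning above can be sketched in a few lines. The percentages are copied from the table in the previous cell; the variable names are illustrative only:

```python
# Percentage reductions copied from the table above (overall_score row omitted)
reductions = {
    "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2": 99.9759,
    "/usr/lib/x86_64-linux-gnu/libc.so.6": 0.027344,
    "anonymous": 99.996756,
}

# Under the threat model, the result is bounded by the least-reduced compartment
weakest = min(reductions, key=reductions.get)

print(weakest, round(reductions[weakest], 2))
```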