# System Calls in Redis

An analysis of system calls in `redis`. This notebook aims to answer the following questions:

- Which system calls are made during a `redis-benchmark` run?
- Which shared libraries do these system calls come from?
- If we aimed to implement compartmentalisation _along shared library boundaries_, what degree of privilege reduction would we see?

In [127]:
import pandas as pd

counts_df = pd.read_json("./stats/counts.json")
missed_df = pd.read_json("./stats/missed.json", lines=True)

counts_df = counts_df.fillna(0).astype(int).reset_index().rename(columns={"index": "syscall"})
missed_df.head()

|   | ringbuf_full | get_parent_failed | get_pt_regs_failed | all |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 4878135 |


In [93]:
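# Total syscalls per library: sum each library column over all syscall numbers.
# (Assumed definition of `syscalls_sum`; the cell that originally defined it is not shown.)
syscalls_sum = counts_df.drop(columns=["syscall"]).sum()

total_syscalls = syscalls_sum.sum()
percentages = (syscalls_sum / total_syscalls) * 100

# Build a summary table with one row per library
result_table = pd.DataFrame({
    'Library': syscalls_sum.index,
    'Total Syscalls': syscalls_sum.values,
    '% of Total Syscalls': percentages.values
}).reset_index(drop=True)

# Display the table
result_table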

|   | Library | Total Syscalls | % of Total Syscalls |
|---|---|---|---|
| 0 | /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 | 224 | 0.004592 |
| 1 | /usr/lib/x86_64-linux-gnu/libc.so.6 | 4877850 | 99.994158 |
| 2 | anonymous | 61 | 0.00125 |
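
As a quick sanity check, the per-library totals can be compared against the `all` counter from `missed.json`. The sketch below copies the numbers from the two outputs above and assumes that `all` counts every observed syscall event, which the notebook does not state explicitly.

```python
# Per-library totals copied from the table above.
library_totals = {
    "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2": 224,
    "/usr/lib/x86_64-linux-gnu/libc.so.6": 4877850,
    "anonymous": 61,
}

# Value of the `all` column in missed.json, copied from the earlier output.
# Assumption: this is the total number of syscall events the tool observed.
all_events = 4878135

attributed = sum(library_totals.values())
unattributed = all_events - attributed

print(f"attributed={attributed} unattributed={unattributed}")
```

If the assumption holds, a difference of zero means every recorded event was assigned to one of the three rows, so no events were dropped between collection and aggregation.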


## Percentage Syscalls

- Almost all syscalls (99.994%) are attributed to `libc`

### Is this accurate?
Looking at the libraries used by the `redis-server` executable:

```shell
$ ldd `which redis-server`
        linux-vdso.so.1 (0x00007ffff7fc3000)
        libatomic.so.1 => /lib/x86_64-linux-gnu/libatomic.so.1 (0x00007ffff7fa6000)
        liblzf.so.1 => /lib/x86_64-linux-gnu/liblzf.so.1 (0x00007ffff7fa0000)
        libjemalloc.so.2 => /lib/x86_64-linux-gnu/libjemalloc.so.2 (0x00007ffff7800000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffff7eb7000)
        libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007ffff7b20000)
        libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007ffff7756000)
        libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007ffff7200000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff6e00000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffff6a00000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffff7af2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007ffff7fc5000)
        libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007ffff7ea6000)
        libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007ffff70b8000)
        liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007ffff7734000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007ffff7086000)
        libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007ffff6d46000)
        libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007ffff7061000)
```

The tool detects these libraries in the process's address space, as the following log line shows:

```json
{
  "level": "info",
  "ts": 1733177700.3952844,
  "caller": "syso/maps.go:64",
  "msg": "address space post-load",
  "pid": 1830193,
  "maps": [
    {
      "AddrStart": 93824992231424,
      "AddrEnd": 93824992423936,
      "Offset": 0,
      "PathName": "/usr/bin/redis-check-rdb"
    },
    {
      "AddrStart": 93824992423936,
      "AddrEnd": 93824993349632,
      "Offset": 192512,
      "PathName": "/usr/bin/redis-check-rdb"
    },
    {
      "AddrStart": 93824993349632,
      "AddrEnd": 93824993660928,
      "Offset": 1118208,
      "PathName": "/usr/bin/redis-check-rdb"
    },
    ...,
    {
      "AddrStart": 18446744073699065856,
      "AddrEnd": 18446744073699069952,
      "Offset": 0,
      "PathName": "[vsyscall]"
    }
  ]
}
```

Pulling out the individual path names with `grep` gives:

```json
"PathName": "/etc/ld.so.cache"
      "PathName": "[stack]"
      "PathName": "/usr/bin/redis-check-rdb"
      "PathName": "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2"
      "PathName": "/usr/lib/x86_64-linux-gnu/libatomic.so.1.2.0"
      "PathName": "/usr/lib/x86_64-linux-gnu/libcap.so.2.44"
      "PathName": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3"
      "PathName": "/usr/lib/x86_64-linux-gnu/libc.so.6"
      "PathName": "/usr/lib/x86_64-linux-gnu/libdl.so.2"
      "PathName": "/usr/lib/x86_64-linux-gnu/libgcc_s.so.1"
      "PathName": "/usr/lib/x86_64-linux-gnu/libgcrypt.so.20.3.4"
      "PathName": "/usr/lib/x86_64-linux-gnu/libgpg-error.so.0.32.1"
      "PathName": "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"
      "PathName": "/usr/lib/x86_64-linux-gnu/liblua5.1-bitop.so.0.0.0"
      "PathName": "/usr/lib/x86_64-linux-gnu/liblua5.1-cjson.so.0.0.0"
      "PathName": "/usr/lib/x86_64-linux-gnu/liblua5.1.so.0.0.0"
      "PathName": "/usr/lib/x86_64-linux-gnu/liblz4.so.1.9.3"
      "PathName": "/usr/lib/x86_64-linux-gnu/liblzf.so.1.5"
      "PathName": "/usr/lib/x86_64-linux-gnu/liblzma.so.5.2.5"
      "PathName": "/usr/lib/x86_64-linux-gnu/libm.so.6"
      "PathName": "/usr/lib/x86_64-linux-gnu/libssl.so.3"
      "PathName": "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30"
      "PathName": "/usr/lib/x86_64-linux-gnu/libsystemd.so.0.32.0"
      "PathName": "/usr/lib/x86_64-linux-gnu/libzstd.so.1.4.8"
      "PathName": "[vdso]"
      "PathName": "[vsyscall]"
      "PathName": "[vvar]"
```

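which aligns with the output from `ldd`: every library that `ldd` lists appears in the mappings. The mappings additionally contain `libdl.so.2` and three `liblua5.1` libraries that `ldd` does not list.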
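
The same extraction can be done in Python rather than with `grep`. This is a minimal sketch: `raw_line` is an abbreviated stand-in for the log record above (only entries visible in the excerpt), and `mapped_paths` is a hypothetical helper name, not part of the tool.

```python
import json

# Abbreviated stand-in for the log line above: only entries visible in the
# excerpt are kept, and only the fields used here.
raw_line = json.dumps({
    "msg": "address space post-load",
    "maps": [
        {"AddrStart": 93824992231424, "AddrEnd": 93824992423936,
         "PathName": "/usr/bin/redis-check-rdb"},
        {"AddrStart": 93824992423936, "AddrEnd": 93824993349632,
         "PathName": "/usr/bin/redis-check-rdb"},
        {"AddrStart": 18446744073699065856, "AddrEnd": 18446744073699069952,
         "PathName": "[vsyscall]"},
    ],
})


def mapped_paths(line, files_only=False):
    """Return the sorted, de-duplicated path names in one log record."""
    record = json.loads(line)
    paths = {m["PathName"] for m in record["maps"]}
    if files_only:
        # Pseudo-mappings such as [stack] or [vdso] are not files on disk.
        paths = {p for p in paths if p.startswith("/")}
    return sorted(paths)


print(mapped_paths(raw_line))
print(mapped_paths(raw_line, files_only=True))
```

Passing `files_only=True` drops the bracketed pseudo-mappings, which makes the result easier to compare against the `ldd` listing.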

So, is the tool failing to attribute syscalls to the right library, or do all of these libraries invoke their syscalls via `libc`?
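
To make the question concrete, here is a sketch of address-based attribution. It assumes that a syscall is credited to whichever mapping contains a given code address and that addresses outside every mapping are labelled `anonymous`; both are assumptions about the tool rather than something the log confirms. `build_index` and `library_for` are hypothetical names.

```python
from bisect import bisect_right

# Mappings taken from the log excerpt above (AddrEnd is treated as exclusive).
maps = [
    {"AddrStart": 93824992231424, "AddrEnd": 93824992423936,
     "PathName": "/usr/bin/redis-check-rdb"},
    {"AddrStart": 93824992423936, "AddrEnd": 93824993349632,
     "PathName": "/usr/bin/redis-check-rdb"},
    {"AddrStart": 93824993349632, "AddrEnd": 93824993660928,
     "PathName": "/usr/bin/redis-check-rdb"},
    {"AddrStart": 18446744073699065856, "AddrEnd": 18446744073699069952,
     "PathName": "[vsyscall]"},
]


def build_index(mappings):
    """Sort mappings by start address so lookups can use binary search."""
    ordered = sorted(mappings, key=lambda m: m["AddrStart"])
    starts = [m["AddrStart"] for m in ordered]
    return starts, ordered


def library_for(addr, index):
    """Return the path of the mapping containing addr, or 'anonymous'."""
    starts, ordered = index
    i = bisect_right(starts, addr) - 1  # last mapping starting at or before addr
    if i >= 0 and addr < ordered[i]["AddrEnd"]:
        return ordered[i]["PathName"]
    return "anonymous"  # assumed label for addresses outside every mapping


index = build_index(maps)
print(library_for(93824992231424, index))
print(library_for(93824993660928, index))
```

Under this scheme, if the address used is that of the `syscall` instruction itself and the instruction sits inside a `libc` wrapper, the call is credited to `libc` no matter which library requested it. That would be consistent with the second explanation above, but it does not rule out the first.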

## Privilege Reduction

Given that 99.99% of syscalls come from `libc`, compartmentalising along library boundaries would yield no practical privilege reduction. However, we can run the numbers anyway.

### How dangerous is a library?
For the purposes of this analysis, the 'danger' of a library is determined by which system calls the library is allowed to make.

Not all system calls are equally dangerous. If an attacker can only call `getpid()`, they are unlikely to be able to do much harm. If they can call `execve`, they may be able to do much more damage.

Using the list from [Table 4](https://www.researchgate.net/publication/261959738_Using_Attack_Surface_Entry_Points_and_Reachability_Analysis_to_Assess_the_Risk_of_Software_Vulnerability_Exploitability) and the code Go generates from `/usr/include/asm/unistd_64.h`, I classified each syscall into one of three categories (as per the referenced paper).

High-risk syscalls have a multiplier of 3, medium-risk 2, and low-risk 1. A library's score is the sum, over every syscall it makes, of the call count times that syscall's multiplier.

(I am extremely unconvinced that this is a sound metric, but it can be swapped later.)
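
As a small worked example of the weighting, the sketch below scores a made-up set of counts. The counts and the classification of `execve` as high risk are hypothetical values chosen for illustration, not taken from the paper's table or from this run.

```python
# Multipliers as described above.
WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def risk_score(counts, levels):
    """Sum of count * multiplier; syscalls without a level default to low."""
    return sum(
        n * WEIGHTS[levels.get(name, "low")]
        for name, n in counts.items()
    )


# Hypothetical call counts and classification, for illustration only.
counts = {"getpid": 10, "execve": 2}
levels = {"execve": "high"}

print(risk_score(counts, levels))  # 10 * 1 + 2 * 3 = 16
```

Because unclassified syscalls default to a multiplier of 1, the score of a library can never be lower than its raw syscall count.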

So, calculating the baseline "risk" of an uncompartmentalised `redis-server` instance can be done as follows:

In [118]:
import yaml

# Load the syscall threat ranking
with open('../syscall-ranking.yaml', 'r') as file:
    data = yaml.safe_load(file)

# Convert the YAML data into a DataFrame
syscalls_df = pd.DataFrame([
    {'threat_level': level, 'syscall': syscall}
    for level, syscalls in data['syscalls'].items()
    for syscall in syscalls
])

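# Numeric multiplier per threat level; any level not listed here maps to NaN
# and falls back to the default multiplier of 1 after the merge
multiplier_mapping = {
    'high-threat': 3,
    'medium-threat': 2
}

# Replace the threat-level labels with their numeric multipliers
syscalls_df['threat_level'] = syscalls_df['threat_level'].map(multiplier_mapping)

# Left join so that every observed syscall is kept, whether ranked or not
merged_df = counts_df.merge(
    syscalls_df[['syscall', 'threat_level']],
    on='syscall',
    how='left'
)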

# Assign default multiplier of 1 to unmatched rows
merged_df['threat_level'] = merged_df['threat_level'].fillna(1).astype(int)

merged_df.head()

|   | syscall | /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 | /usr/lib/x86_64-linux-gnu/libc.so.6 | anonymous | threat_level |
|---|---|---|---|---|---|
| 0 | 0 | 21 | 2101667 | 0 | 1 |
| 1 | 10 | 30 | 4 | 0 | 1 |
| 2 | 11 | 3 | 1 | 0 | 1 |
| 3 | 12 | 1 | 1 | 0 | 1 |
| 4 | 158 | 2 | 0 | 0 | 1 |


In [132]:
# Create a new dataframe to hold the weighted scores for each library
library_scores = merged_df.drop(columns=['syscall', 'threat_level']).copy()

# Weight each syscall count by that syscall's threat-level multiplier
for library in library_scores.columns:
    library_scores[library] = library_scores[library] * merged_df['threat_level']

# Sum the weighted counts to get one score per library
library_scores_sum = library_scores.sum()

# Baseline score of the uncompartmentalised process: the sum over all libraries
overall_score = library_scores_sum.sum()

combined_scores_df = library_scores_sum.to_frame(name='library_score')

combined_scores_df.loc['overall_score'] = overall_score

combined_scores_df['percentage_reduction'] = (overall_score - combined_scores_df['library_score']) / overall_score * 100


# Display the resulting scores
combined_scores_df.head()

|   | library_score | percentage_reduction |
|---|---|---|
| /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 | 224 | 99.995408 |
| /usr/lib/x86_64-linux-gnu/libc.so.6 | 4877856 | 0.005883 |
| anonymous | 63 | 99.998709 |
| overall_score | 4878143 | 0.0 |


### Privilege Reduction Results

- Threat model: assume any one of the compartments can be compromised
- The overall threat reduction is therefore only as good as the **minimum** reduction seen in the table above
- For this run of redis, that minimum is about 0.006% (the `libc` compartment), which is negligible
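
The minimum can be recomputed directly from the per-library scores. The sketch below copies the `library_score` values from the table above; the variable names are illustrative and not part of the notebook.

```python
# Weighted scores per library, copied from the table above.
library_scores = {
    "/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2": 224,
    "/usr/lib/x86_64-linux-gnu/libc.so.6": 4877856,
    "anonymous": 63,
}

# Baseline: the uncompartmentalised process can make every call.
overall = sum(library_scores.values())

# Reduction achieved if only this one compartment is compromised.
reductions = {
    lib: (overall - score) / overall * 100
    for lib, score in library_scores.items()
}

# Under the stated threat model, the weakest compartment sets the bound.
worst_lib = min(reductions, key=reductions.get)
print(worst_lib, f"{reductions[worst_lib]:.4f}%")
```

Taking the minimum rather than the mean reflects the assumption that the attacker gets to pick which compartment to compromise.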