# Artifact Evaluation (AE) for ''Sweet or Sour CHERI: Performance Characterization of the Arm Morello Platform'' IISWC 2025

This repository contains the artifacts and workflows necessary to reproduce the experimental results and figures presented in the IISWC 2025 paper by X. Sun, J. Singer, and Z. Wang. 

**abstract**

Capability Hardware Enhanced RISC Instructions (CHERI) offer a hardware-based approach to enhance memory safety by enforcing strong spatial and temporal memory protections. This paper presents the largest performance analysis of the CHERI architecture on the ARM Morello platform seen to date. Using on-chip performance monitoring counters (PMCs), we evaluate 20 C/C++ applications, including the SPEC CPU2017 suite, SQL database engine, JavaScript engine, and large language model inference, across three CHERI Application Binary Interfaces (ABIs). 

Our results show that performance penalties of CHERI range from negligible to 1.65x, with the most significant impact apparent in pointer-intensive and memory-sensitive workloads. These overheads are primarily caused by increased memory traffic and L1/L2 cache pressure from 128-bit capabilities. However, our projections suggest that these overheads can be significantly reduced with modest microarchitectural changes, and that a mature, optimized implementation could achieve memory safety with minimal performance impact. We hope these findings provide valuable guidance for the design of future, performance-optimized memory security features.
## Pre-requisites

### Hardware 
- [ARM Morello prototype board](https://www.arm.com/architecture/cpu/morello) running [CheriBSD 25.03](https://www.cheribsd.org/).
- Intel-based machine running Ubuntu 22.04 LTS for cross-compilation.

### Software dependencies
- [cheribuild commit:b529760afc01d9e](https://github.com/CTSRD-CHERI/cheribuild).
- Python 3.10, git, ssh, autoconf, automake, etc. 
- All remaining dependencies, including [morello/llvm-project](https://git.morello-project.org/morello/llvm-project.git), compiled with [cheribuild](https://github.com/CTSRD-CHERI/cheribuild).

All of the above software dependencies can be installed on Ubuntu with the following command:
```bash
$ sudo apt update 
$ sudo apt install autoconf automake libtool pkg-config clang bison cmake mercurial ninja-build samba flex texinfo time libglib2.0-dev libpixman-1-dev libarchive-dev libarchive-tools libbz2-dev libattr1-dev libcap-ng-dev libexpat1-dev libgmp-dev bc

$ cd ~ && git clone https://github.com/CTSRD-CHERI/cheribuild.git 
$ cd ~/cheribuild 
$ ./cheribuild.py all-morello-purecap # manually reboot machine after this.
```

### Benchmark licenses
Users must obtain a license prior to executing the following benchmarks: 
- [SPEC CPU 2017 benchmark](https://www.spec.org/cpu2017/)


## Important notes

**A few bash scripts take more than half an hour to complete; Please wait for the results before executing the next one.**

Overload might lead to a longer wait for results. This issue may occur if multiple reviewers simultaneously run the scripts to generate results. One possible way of avoiding this is to check the running process by `ps aux` before executing any other script.

The experiments are customisable as reviewers can edit the Jupyter notebook on the spot. Type your changes with different Linux shell scripts and re-run using **Cell > Run Cells** from the menu.

**The profiling data needs to be transferred from Morello to the development machine. Currently, the network bandwidth over the public IP is approximately 17 MB/s. The full profiling dataset is approximately 150 GB in size.**

## Links to the submitted paper

For each step, we highlight that the current evaluation is corresponding to which Section or Figure in the submitted paper.


## 1. Setup

### 1.1 Ensure that the Docker container (named `iiswc25ae`) is running. 

`docker ps` command shows the current running docker containers. The next cell in this notebook should output the following information. If not produce similar output, please run `!docker stop iiswc25ae` and `!docker start iiswc25ae` commands in a cell to restart it.

```
CONTAINER ID   IMAGE                 COMMAND             CREATED         STATUS            PORTS     NAMES
61bf369c9b48   iiswc25ae-image      "tail -f /dev/null"   5 seconds ago   Up 4 seconds             iiswc25ae
```

In [1]:
# to define a magic function for launching scripts in this notebook
%alias docker_exec docker exec -it iiswc25ae bash -lc 

# show the current running docker containers
!docker ps

# start the docker container if it is not running
# !docker start iiswc25ae

CONTAINER ID   IMAGE                COMMAND               CREATED              STATUS              PORTS     NAMES
f3b1a9a9170a   iiswc25ae-image:v2   "tail -f /dev/null"   About a minute ago   Up About a minute             iiswc25ae


### 1.2 Ensure that the runtime environment is working well. 

`docker exec` supports running a Bash script in a running container. The following cell executes a command to check the files on the **development machine** and outputs the results shown below:

```
total 24
drwxr-xr-x  3 scxs scxs 4096 Aug 22 01:48 SPEC
drwxr-xr-x 26 scxs scxs 4096 Aug 22 01:54 llama-cpp
drwxr-xr-x  5 scxs scxs 4096 Aug 22 01:46 matrix-multiply
drwxr-xr-x 13 scxs scxs 4096 Aug 22 01:47 quickjs
drwxr-xr-x  6 scxs scxs 4096 Aug 22 02:00 sqlite-bench
drwxr-xr-x 21 scxs scxs 4096 Aug 22 02:30 workload-characterization-on-morello
```

In [2]:
%docker_exec 'ls -l ~/workspace/'

total 32
drwxr-xr-x  3 scxs scxs 4096 Aug 22 01:48 SPEC
drwxr-xr-x  1 scxs scxs 4096 Aug 22 05:02 llama-cpp
drwxr-xr-x  1 scxs scxs 4096 Aug 22 01:46 matrix-multiply
drwxr-xr-x 13 scxs scxs 4096 Aug 22 01:47 quickjs
drwxr-xr-x  6 scxs scxs 4096 Aug 22 02:00 sqlite-bench
drwxr-xr-x  1 scxs scxs 4096 Aug 22 02:30 workload-characterization-on-morello


The **ARM Morello board** is accessible at 127.0.0.1:2201, projected to the development machine with a public IP using autossh.
The following command checks the Morello board, and the output is shown below:

```bash
NAME=CheriBSD
VERSION="15.0-CURRENT"
VERSION_ID="15.0"
ID=cheribsd
ID_LIKE=freebsd
ANSI_COLOR="0;31"
PRETTY_NAME="CheriBSD 15.0-CURRENT"
CPE_NAME="cpe:/o:freebsd:freebsd:15.0"
HOME_URL="https://www.cheribsd.org/"
BUG_REPORT_URL="https://github.com/CTSRD-CHERI/cheribsd/issues"
FreeBSD cheribsd 15.0-CURRENT FreeBSD 15.0-CURRENT #0 releng/24.05-b2ad856aac65: Fri Jul 19 21:17:24 UTC 2024     jenkins@focal:/local/scratch/jenkins/workspace/CheriBSD-pipeline_releng_24.05@2/cheribsd-morello-purecap-build/local/scratch/jenkins/workspace/CheriBSD-pipeline_releng_24.05@2/cheribsd/arm64.aarch64c/sys/GENERIC-MORELLO-PURECAP arm64
```

In [3]:
%docker_exec "ssh scxs@127.0.0.1 'cat /etc/os-release && uname -a'"

NAME=CheriBSD
VERSION="15.0-CURRENT"
VERSION_ID="15.0"
ID=cheribsd
ID_LIKE=freebsd
ANSI_COLOR="0;31"
PRETTY_NAME="CheriBSD 15.0-CURRENT"
CPE_NAME="cpe:/o:freebsd:freebsd:15.0"
HOME_URL="https://www.cheribsd.org/"
BUG_REPORT_URL="https://github.com/CTSRD-CHERI/cheribsd/issues"
FreeBSD cheribsd 15.0-CURRENT FreeBSD 15.0-CURRENT #0 releng/24.05-b2ad856aac65: Fri Jul 19 21:17:24 UTC 2024     jenkins@focal:/local/scratch/jenkins/workspace/CheriBSD-pipeline_releng_24.05@2/cheribsd-morello-purecap-build/local/scratch/jenkins/workspace/CheriBSD-pipeline_releng_24.05@2/cheribsd/arm64.aarch64c/sys/GENERIC-MORELLO-PURECAP arm64


### 1.3 Run customized commands if needed?

When you need to execute customized commands inside the Docker container in this notebook, prefix them with `docker_exec`.

Example:
to install vim inside the container, run: `docker_exec apt install -y vim`.

# 2. Evaluation
>The jupyter notebook runs on a VM with 30 cores (60 vCPUs), 240 GB memory, and 1TB storage on a public cloud platform. The compilation time may slightly differ that we marked. 


Next, we evaluate performance changes across three CHERI ABIs using a set of benchmarks, including SPEC CPU 2017, SQLite, QuickJS, and LLaMA.cpp

## 2.1 Cross-compile benchmarks

Use `ps aux` to check for other processes launched by reviewers that may still be running and causing overload.

In [4]:
%docker_exec ps aux

    PID TTY          TIME CMD
     84 pts/1    00:00:00 ps


In [10]:
# SPEC CPU 2017 (around 40 minutes)
# The runtime log has been recorded below this cell.
# If you want to run it again, please uncomment the following line.
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/speccpu && ./cross-compile/compile all'

# SQLite (1 minute)
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/sqlite-bench && ./cross-compile/compile'

# QuickJS (5 minutes) 
# Please ignore the errors output during compilation
# (To successfully cross-compile QuickJS, it is necessary to generate the reply.c file on 
# the target device during the build process, using the intermediate binaries produced during compilation.)
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/quickjs && ./cross-compile/compile'

# LLaMA.cpp (13 minutes)
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/llama-cpp && ./cross-compile/compile'

# Matrix Multiply (2 minutes)
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/matrix-multiply && ./cross-compile/compile'

/home/scxs/workspace/workload-characterization-on-morello/speccpu
SPEC CPU(r) 2017 Benchmark Suites
Copyright 1995-2019 Standard Performance Evaluation Corporation (SPEC)

runcpu v6247
Using 'linux-x86_64' tools
Reading file manifests... read 32285 entries from 2 files in 0.12s (267419 files/s)
Loading runcpu modules.................
Locating benchmarks...found 47 benchmarks in 53 benchsets.
Reading config file '/home/scxs/workspace/workload-characterization-on-morello/configs/speccpu/morello-purecap-clang-linux-x86.cfg'
1 configuration selected:

Action    Run Mode        Workload          Report Type      Benchmarks
-------   --------   ------------------   ----------------   ------------------
clobber   rate       test,train,refrate   SPECrate2017_int   502.gcc_r         
-------------------------------------------------------------------------------

Setting up environment for running 502.gcc_r...
Starting runcpu for 502.gcc_r...
Running "specperl /home/scxs/workspace/SPEC/cpu2017-

## 2.2 Setup and distribute binaries to Morello

Generating the launch scripts, including input data and PMU events collection.

Use `ps aux` to check for other processes launched by reviewers that may still be running and causing overload.

In [None]:
%docker_exec "ps aux"

**You can skip the following commands if you think it is time-consuming. We have already transferred the binaries to Morello.**

In [20]:
# around 100 minutes

# SPEC CPU 2017
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/speccpu && ./run/setup all && ./run/distribute all 127.0.0.1'

# SQLite
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/sqlite-bench && ./run/setup && ./run/distribute 127.0.0.1'

# QuickJS
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/quickjs && ./run/setup && ./run/distribute 127.0.0.1'

# LLaMA.cpp
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/llama-cpp && ./run/setup && ./run/distribute 127.0.0.1'

# Matrix Multiply
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/matrix-multiply && ./run/setup && ./run/distribute 127.0.0.1' 

sqlite3-purecap                               100% 1960KB 623.1KB/s   00:03    
sqlite3-purecap-benchmark                     100% 1970KB   1.0MB/s   00:01    
sqlite3-hybrid                                100% 1704KB   1.9MB/s   00:00    
_launch_pmcstat                               100% 1973     9.4KB/s   00:00    
_launch_raw                                   100%  112     0.5KB/s   00:00    
_launch_pmcstat                               100% 1973     5.5KB/s   00:00    
_launch_raw                                   100%  112     0.7KB/s   00:00    
_launch_pmcstat                               100% 1973     6.1KB/s   00:00    
_launch_raw                                   100%  112     0.4KB/s   00:00    
suite.sql                                     100%   37MB   7.1MB/s   00:05    
rm: /home/scxs/quickjs/build-cheribsd-morello-hybrid/bin/results/_launch: Permission denied
rm: /home/scxs/quickjs/build-cheribsd-morello-hybrid/bin/results/quickjs.out: Permission denied
rm: /home/sc

## 2.3 Validate binaries

Ensuring that the compiled binaries in the `hybrid`, `purecap`, `purecap-benchmark` ABIs are generated correctly.

Use `ps aux` to check for other processes launched by reviewers that may still be running and causing overload.

In [22]:
%docker_exec ps aux

    PID TTY          TIME CMD
  96821 pts/1    00:00:00 ps


In [23]:
# SPEC CPU 2017
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/speccpu && ./run/check-abi all 127.0.0.1'

# SQLite
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/sqlite-bench && ./run/check-abi 127.0.0.1'

# QuickJS
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/quickjs && ./run/check-abi 127.0.0.1'

# LLaMA.cpp
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/llama-cpp && ./run/check-abi 127.0.0.1'

# Matrix Multiply
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/matrix-multiply && ./run/check-abi 127.0.0.1'


All binaries have checked ABIs (no output means success)


All binaries have checked ABIs (no output means success)


All binaries have checked ABIs (no output means success)


All binaries have checked ABIs (no output means success)



## 2.4 Execute benchmarks

The result-folder resides on the development machine, and the results generated on Morello are automatically transferred back upon completion. This process is time-consuming.


Use `ps aux` to check for other processes launched by reviewers that may still be running and causing overload.

In [24]:
%docker_exec ps aux

    PID TTY          TIME CMD
  96901 pts/1    00:00:00 ps


The following cell is very time-consuming, needing around 24 hours to finish. Please uncomment one or some of them to run.

In [None]:
# SPEC CPU 2017
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/speccpu && ./run/launch all 127.0.0.1 ./results/speccpu'

# SQLite
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/sqlite-bench && ./run/launch 127.0.0.1 ./results/sqlite-bench'

# QuickJS
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/quickjs && ./run/launch 127.0.0.1 ./results/quickjs'

# LLaMA.cpp
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/llama-cpp && ./run/launch 127.0.0.1 ./results/llama-cpp'

# Matrix Multiply
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/matrix-multiply && ./run/launch 127.0.0.1 ./results/matrix-multiply'

## 2.5 Analyze and visualize results

Extract and consolidate identical metrics from multiple profiling files for each benchmark, reorganizing them into a standardized format. Reviewers may undertake additional analyses or modify the source code as necessary to conduct further performance evaluations.

Use `ps aux` to check for other processes launched by reviewers that may still be running and causing overload.

In [25]:
%docker_exec ps aux

    PID TTY          TIME CMD
  96908 pts/1    00:00:00 ps


After you run (we have aleady executed one or two times), you can output the profiling data as a readable format.

In [29]:
# SPEC CPU 2017
# %docker_exec 'cd ~/workspace/workload-characterization-on-morello/speccpu && sudo ./run/verbose-list all'

# SQLite
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/sqlite-bench && sudo ./run/verbose-list'

# QuickJS
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/quickjs && sudo ./run/verbose-list'

# LLaMA.cpp
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/llama-cpp && sudo ./run/verbose-list'
    
# Matrix Multiply
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/matrix-multiply && sudo ./run/verbose-list'

'sqlite-bench': {
'inst_retired': (1254697, 1549898, 1543536),
'cpu_cycles': (794348, 1134050, 1187535),
'stall_backend': (197428, 371653, 409046),
'stall_frontend': (121333, 135835, 141472),
'inst_spec': (1357833, 1698395, 1706811),
'ase_spec': (965, 963, 848),
'br_retired': (216488, 259322, 258496),
'br_mis_pred_retired': (1980, 2717, 2859),
'br_indirect_spec': (23893, 28785, 28979),
'br_return_spec': (13322, 16349, 16638),
'br_immed_spec': (197098, 235337, 234915),
'itlb_walk': (11, 21, 11),
'l1i_tlb_refill': (1242, 1799, 1780),
'l1i_tlb': (299729, 372410, 385393),
'l1i_cache': (349779, 422517, 423098),
'l1i_cache_refill': (15011, 18603, 19214),
'vfp_spec': (596, 598, 613),
'dtlb_walk': (18, 121, 97),
'l1d_tlb': (427059, 573436, 576073),
'l1d_tlb_refill': (1762, 3711, 3410),
'l2d_tlb': (3100, 5520, 5237),
'l2d_tlb_refill': (30, 144, 105),
'crypto_spec': (0, 0, 0),
'l1d_cache': (441997, 586491, 588859),
'l1d_cache_rd': (298592, 390437, 391532),
'l1d_cache_refill': (7530, 12223, 12443

### 2.5.1 The overall execution performance (Figure 1)

This figure highlights the high variability of runtime overhead in purecap mode across different workloads. In some cases, CHERI features incur no measurable overhead and can even yield modest performance improvements, as observed in the 519.lbm_r and LLaMA matmul benchmarks, (§4.1).

In [45]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && python3 ./figure1-macroscopic-performance.py && sudo cp ./figure1*.png ../figures/'

Then load the generated figure herein:
![waiting to generate](figures/figure1-macroscopic-performance.png)

### 2.5.2 The distribution of program section sizes (Figure 2)

This figure illustrates the impact of the three ABI modes on binary size across different program sections. We use the hybrid ABI as the baseline and normalize the sizes of the purecap and purecap benchmark binaries relative to it. Overall, CHERI capability metadata introduces roughly a 5% increase in total binary size, though the magnitude of the overhead varies substantially across sections, (§4.2).

In [46]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && sudo python3 ./figure2-macroscopic-binary-size.py && sudo cp ./figure2*.png ../figures/'

  ax1.set_ylim(0, 100)
Binary Section Sizes (normalized to Hybrid) - Statistics across 8 benchmarks:
--------------------------------------------------------------------------------
Section              Hybrid          PureCap-BM      PureCap         Y-Axis    
--------------------------------------------------------------------------------
.text                1.000           1.127           1.117           Primary   
.data                1.000           4.000           4.000           Primary   
.bss                 1.000           1.244           1.244           Primary   
.rodata              1.000           0.813           0.813           Primary   
.got*                1.000           5.731           5.846           Primary   
.note.
 cheri        0.000           48.000          48.000          Primary   
.data.
 rel.ro       0.000           10624.000       10624.000       Secondary 
.rela.
 dyn          1.000           85.021          85.021          Secondary 
.debug           

Then load the generated figure herein:
![waiting to generate](figures/figure2-macroscopic-binary-size.png)

### 2.5.3 The top-level breakdown analysis (including Retiring, Bad Spec, Frontend, Backend, along with IPC) (Figure 3)

To identify where CPU cycles are spent or wasted, we applied a Top-Down analysis, classifying pipeline slots into four categories: Retiring, Bad Speculation, Frontend Bound, and Backend Bound, (details in §4.4).

In [47]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && python3 ./figure3-top-level.py && sudo cp ./figure3*.png ../figures/'

Then load the generated figure herein:
![waiting to generate](figures/figure3-top-level.png)

### 2.5.4 The percentage of counters pretaining to core and memory bounds (Figure 4)

Backend stalls arise when the execution units or memory subsystem cannot keep pace with the Frontend. As shown in this figure, the higher stall rates observed in purecap modes are primarily attributable to memory hierarchy effects, in particular elevated L1I, L1D, L2D, and TLB miss rates, (§4.6).

In [48]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && python3 ./figure4-backend-level.py && sudo cp ./figure4*.png ../figures/'

Then load the generated figure herein:
![waiting to generate](figures/figure4-backend-level.png)

### 2.5.5 The distribution of speculative instrution ratios across benchmarks by ABIs (Figure 5)

This figure reveals a pronounced shift under purecap execution. The proportion of data-processing instructions (DP_SPEC) increases substantially, ranging from 5.21% to 29.31%, reflecting the additional arithmetic operations required for capability manipulation and bounds checking. By contrast, the proportions of load (LD_SPEC) and store (ST_SPEC) instructions remain relatively stable, with standard deviations of 2.01% and 1.47%, respectively. These results suggest that memory access patterns are largely unaffected, while computational demands increase significantly. The observed shift highlights the microarchitectural cost of CHERI’s security model and points to potential opportunities for optimization, (§4.6).

In [49]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && python3 ./figure5-spec-instructions-ratio-boxplot.py && sudo cp ./figure5*.png ../figures/'


Processing 510.parest_r:
  Hybrid ABI:
    Total spec events: 7,409,230
    inst_spec: 3,987,062 (53.81%)
    ase_spec: 54,429 (0.73%)
    br_indirect_spec: 10,663 (0.14%)
    br_return_spec: 4,290 (0.06%)
    br_immed_spec: 460,225 (6.21%)
    vfp_spec: 477,637 (6.45%)
    ld_spec: 1,326,707 (17.91%)
    st_spec: 86,926 (1.17%)
    dp_spec: 1,001,291 (13.51%)
  Purecap Benchmark ABI:
    Total spec events: 7,689,927
    inst_spec: 4,103,984 (53.37%)
    ase_spec: 54,416 (0.71%)
    br_indirect_spec: 14,459 (0.19%)
    br_return_spec: 6,756 (0.09%)
    br_immed_spec: 486,982 (6.33%)
    vfp_spec: 475,271 (6.18%)
    ld_spec: 1,349,611 (17.55%)
    st_spec: 98,632 (1.28%)
    dp_spec: 1,099,816 (14.30%)
  Purecap ABI:
    Total spec events: 7,687,200
    inst_spec: 4,108,025 (53.44%)
    ase_spec: 54,431 (0.71%)
    br_indirect_spec: 14,391 (0.19%)
    br_return_spec: 6,736 (0.09%)
    br_immed_spec: 486,964 (6.33%)
    vfp_spec: 474,941 (6.18%)
    ld_spec: 1,348,980 (17.55%)
    st_s

Then load the generated figure herein:
![waiting to generate](figures/figure5-spec-instructions-ratio-boxplot.png)

### 2.5.6 The detailed memory bound analysis from cache and DRAM (Figure 6)

The CHERI architecture’s 128-bit capabilities fundamentally reshape the memory footprint of applications, particularly those that are pointer-intensive or manage large pointer-based data structures. This, in turn, has direct implications for cache efficiency and TLB performance, (§4.7).

In [50]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && python3 ./figure6-memory-level.py && sudo cp ./figure6*.png ../figures/'

Then load the generated figure herein:
![waiting to generate](figures/figure6-memory-level.png)

### 2.5.7 The performance metrics correlation matrix (hybrid vs purecap) (Figure 7)

The increase in CAP_MEM_ACCESS_RD directly drives the rise in L1I miss rates, demonstrating that performance degradation stems from the additional capability memory operations introduced by CHERI. While MEM_ACCESS_RD_CTAG records tag-dependent memory accesses without explicitly capturing tag-check latency—likely pipelined with memory operations—the high frequency of these events confirms heavy reliance on CHERI’s memory protection mechanisms. More fundamentally, CHERI’s safety guarantees and capability manipulations enforce a tightly coupled execution pattern that binds instruction-level behavior to memory system performance. This coupling manifests in the strong correlations among cache refills, TLB walks, and stall cycles, (§4.8).

In [51]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && python3 ./figure7-metric-correlation.py --chart-type combined && sudo cp ./figure7*.png ../figures/'

Then load the generated figure herein:
![waiting to generate](figures/figure7-metric-correlation.png)

### 2.5.8 The impact of compiler optimization levels (Figure 8)

The impact of compiler optimization flags (-O) varies across ABIs. We evaluated five versions of a matrix multiplication implementation. In version two, purecap mode incurs significant overhead at -O0, minimal overhead at -O3, and achieves its best performance at -O1. These results suggest that traditional compiler optimization strategies may need to be reconsidered in the CHERI context, (§4.9).

In [52]:
%docker_exec 'cd ~/workspace/workload-characterization-on-morello/overleaf && python3 ./figure8-optimization-impact.py && sudo cp ./figure8*.png ../figures/'

Then load the generated figure herein:
![waiting to generate](figures/figure8-optimization-impact.png)


## The end of this Artifact Evaluation

Many thanks for your review, time and efforts on this artifact evaluation.  

Many thanks for your understanding and bearing with some inconveniences on this notebook. 