<a href="https://colab.research.google.com/github/zaellis/sscs-ose-code-a-chip.github.io/blob/main/VLSI23/ASCON_code-a-chip.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Wishbone ASCON with OpenLane

```
Copyright 2023 Zachary Ellis
SPDX-License-Identifier: GPL-3.0-or-later
```

Run an ASCON Wishbone peripheral design through the [OpenLane](https://github.com/The-OpenROAD-Project/OpenLane/) RTL to GDSII flow targeting the [open source SKY130 PDK](https://github.com/google/skywater-pdk/), with the addition of a 1 kb RAM macro generated by [OpenRAM](https://github.com/VLSIDA/OpenRAM).

|Name|Affiliation| Email |IEEE Member|SSCS Member|
|:--:|:----------:|:----------:|:----------:|:----------:|
|Zachary Ellis|Georgia Institute of Technology|zellis7@gatech.edu|Yes|Yes|

## Introduction
ASCON is a family of authenticated encryption and hashing algorithms designed to be lightweight and easy to implement. It was recently selected as the new lightweight cryptography standard by the National Institute of Standards and Technology (NIST) in the [NIST Lightweight Cryptography competition (2019–2023)](https://csrc.nist.gov/projects/lightweight-cryptography/finalists). It was also a finalist of the [CAESAR Competition (2014-2019)](https://competitions.cr.yp.to/caesar-submissions.html). The final-round specification for ASCON can be found [here](https://csrc.nist.gov/CSRC/media/Projects/lightweight-cryptography/documents/finalist-round/updated-spec-doc/ascon-spec-final.pdf).

## This Project
Since ASCON is intended for lightweight applications such as IoT, the idea behind this project was to write a basic implementation of ASCON authenticated encryption and package it as a Wishbone peripheral that could be included with something like a microcontroller or CPU core that uses a [Wishbone bus](https://cdn.opencores.org/downloads/wbspec_b4.pdf). This was also a good opportunity for me to try out some open-source tools that I had not used before. Knowing that storage of large plaintexts / ciphertexts might be desirable for this implementation, I decided to try out OpenRAM to generate a small RAM array for storage. A 1 kb array is implemented here in the interest of reducing generation time, but the design is flexible enough to allow larger arrays given some configuration changes in the OpenRAM and OpenLane flows.

## Design Overview
This design is split into four main sections: the ASCON core itself, the Wishbone interface, the memory block, and the data arbiters / memory controller that move data between the Wishbone register file / RAM and the core. A high-level diagram can be seen below.
![top](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/wb_ASCON_top.png?raw=true)

### ASCON core
ASCON operates off a fairly simple permutation structure, reusing the same datapath components for each phase of encryption / decryption. The process for an ASCON encryption is shown in the image below.
![ASCON_Process](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/ASCON_modes.png?raw=true)

$p^a$ and $p^b$ represent the round permutation applied a or b times, respectively. The input / output of the permutation is known as the state. Along with encrypted ciphertext / decrypted plaintext blocks, ASCON also outputs a tag, which is meant to uniquely identify the sequence of inputs such that it would be unreasonably difficult to find another set of associated data / plaintext that produces the same tag. A basic diagram of the ASCON core is shown below.
![ASCON_core](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/wb_ASCON_core.png?raw=true)
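
For readers who want to see what one application of $p^a$ or $p^b$ involves, here is a minimal software sketch of the permutation as described in the ASCON specification. It is a Python reference model, not a transcription of the RTL in this repo, and the names `rotr` and `ascon_permutation` are illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rotr(x, n):
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def ascon_permutation(state, rounds):
    """Apply `rounds` rounds (1 to 12) to a state of five 64-bit words."""
    assert len(state) == 5 and 1 <= rounds <= 12
    s = list(state)
    # p^rounds uses the last `rounds` of the 12 round constants.
    for r in range(12 - rounds, 12):
        # Round constant addition (0xF0, 0xE1, ..., 0x4B) into word 2.
        s[2] ^= 0xF0 - r * 0x10 + r
        # Substitution layer: 5-bit S-box applied bit-sliced across the words.
        s[0] ^= s[4]
        s[4] ^= s[3]
        s[2] ^= s[1]
        t = [(s[i] ^ MASK64) & s[(i + 1) % 5] for i in range(5)]
        for i in range(5):
            s[i] ^= t[(i + 1) % 5]
        s[1] ^= s[0]
        s[0] ^= s[4]
        s[3] ^= s[2]
        s[2] ^= MASK64
        # Linear diffusion layer: each word is XORed with two rotations of itself.
        s[0] ^= rotr(s[0], 19) ^ rotr(s[0], 28)
        s[1] ^= rotr(s[1], 61) ^ rotr(s[1], 39)
        s[2] ^= rotr(s[2], 1) ^ rotr(s[2], 6)
        s[3] ^= rotr(s[3], 10) ^ rotr(s[3], 17)
        s[4] ^= rotr(s[4], 7) ^ rotr(s[4], 41)
    return s


# Example: one p^12 call on an all-zero state.
print([f"{word:016x}" for word in ascon_permutation([0] * 5, 12)])
```

A software model like this can be handy for generating expected values when writing testbenches, though the testbenches in the repo may use a different approach.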

This implementation of the ASCON core is inspired by the template provided for the 2023 HOST microelectronics competition. The general structure and file naming conventions of that template have been retained. However, with the exception of [ASCON_ROUND_FUNCTION.v](https://github.com/zaellis/ASCON_code-a-chip/blob/main/RTL/ASCON_ROUND_FUNCTION.v) and [ASCON_SBOX.v](https://github.com/zaellis/ASCON_code-a-chip/blob/main/RTL/ASCON_SBOX.v), all of the design files have been rewritten from scratch in SystemVerilog. Some files, such as [ASCON_CONTROLER.sv](https://github.com/zaellis/ASCON_code-a-chip/blob/main/RTL/ASCON_CONTROLER.sv), are now much shorter (~250 lines, down from ~1200) while maintaining or exceeding the original level of functionality. This version of ASCON is also parameterized to allow unrolling of the round function. By setting the **UNROLL**, **A**, and **B** parameters accordingly, the controller is adjusted to run in fewer cycles and the datapath is replicated up to 6 times so that multiple rounds of the permutation happen each clock cycle. Given the configuration of the RAM and the desire for acceptable area and flow completion time, these parameters are kept at their standard values for this notebook.
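
To get a feel for what unrolling buys, the sketch below estimates the clock cycles spent per permutation. It assumes A = 12 and B = 6 (the ASCON-128 round counts, taken here to be the standard values) and assumes the unroll factor must divide both. `permutation_cycles` is a hypothetical helper, and any extra setup cycles in the real controller are not modeled.

```python
A_ROUNDS = 12  # rounds in p^a (assumed ASCON-128 value)
B_ROUNDS = 6   # rounds in p^b (assumed ASCON-128 value)


def permutation_cycles(rounds, unroll):
    """Cycles for one permutation if `unroll` rounds complete per clock."""
    if unroll < 1 or rounds % unroll != 0:
        raise ValueError("unroll factor must evenly divide the round count")
    return rounds // unroll


# Unroll factors that divide both 12 and 6.
for unroll in (1, 2, 3, 6):
    print(
        f"UNROLL={unroll}: p^a takes {permutation_cycles(A_ROUNDS, unroll)} cycles, "
        f"p^b takes {permutation_cycles(B_ROUNDS, unroll)} cycles"
    )
```

The trade-off is area: each extra unrolled round is another copy of the round-function datapath.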

### Wishbone Interface
The Wishbone interface for this design, implemented in [wb_slave.sv](https://github.com/zaellis/ASCON_code-a-chip/blob/main/RTL/wb_slave.sv), is a standard pipelined Wishbone slave with an 18-address, 32-bit register file. For all addresses greater than 18, the Wishbone bus is passed through to the memory controller, which reads data from or writes data to the RAM. If the RAM is being accessed by the ASCON core, the transaction will not succeed and the wb_ack_o line will be held low. A description of the register file and its contents can be found [here](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/ASCON_regs.pdf).

### Memory Block
The RAM block implemented here is a 1 kb RAM with a 32-bit word size and 32 total words. It is a dual-port RAM with no finer read / write granularity (reads / writes are always 32 bits). The full configuration can be found in the config.py file written below.
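
As a quick check of the sizing arithmetic, the following sketch derives the capacity figures from the word size and word count stated above. The variable names are illustrative, and the 64-bit block size is the ASCON block size used elsewhere in this design.

```python
word_size_bits = 32
num_words = 32
ascon_block_bits = 64

total_bits = word_size_bits * num_words           # 1024 bits, i.e. 1 kb
total_bytes = total_bits // 8                     # 128 bytes
address_bits = (num_words - 1).bit_length()       # 5 address lines
blocks_stored = total_bits // ascon_block_bits    # 16 ASCON blocks

print(f"{total_bits} bits = {total_bytes} bytes, "
      f"{address_bits} address bits, {blocks_stored} 64-bit blocks")
```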

### Data Arbiters
ASCON allows for associated data, which is sent as plaintext but is run through the ASCON core during the encryption or decryption process. This lets a user authenticate data that is sent in plaintext form, because the encryption / decryption process will not work correctly if this data is tampered with. The first data arbiter is [AD_loader](https://github.com/zaellis/ASCON_code-a-chip/blob/main/RTL/AD_loader.sv), which is in charge of loading the associated data from the register file into the ASCON core. AD_loader is timed with the ASCON core to present the input data at the right time while also letting the core know the data size. The user can program the size of the associated data by writing to the AD_len field of the control register (CR 0x00000004).

The memory controller performs a similar function, presenting the proper data and block size to the ASCON core at exactly the right instant. It is also in charge of writing the data output from the core back to memory. The memory controller overwrites the plaintext stored in the RAM with the corresponding ciphertext block in order to make the most efficient use of space. Since the RAM has 32-bit words and ASCON uses 64-bit blocks, the memory controller needs to retrieve some data a clock cycle in advance so that the full 64-bit output is valid at the right time. Similarly, the write-back process from the ASCON core to the memory takes 2 cycles. The reason for this choice was to reduce the size and complexity of the memory (the alternative was a 64-bit x 16-word memory with write select to allow reads / writes of either 64 or 32 bits at a time). The downside is that if the user wants to take advantage of the unroll parameters in the ASCON core, the memory will take too many cycles to fetch / write back data. Either the memory would have to run at double the clock speed (which is a valid option in this case) or the more complex memory option would need to be used.
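
The two-word fetch and in-place write-back can be modeled in a few lines. This is a behavioral sketch only: it assumes the upper 32 bits of each block sit at the lower word address (the actual ordering in the RTL may differ), the function names are hypothetical, and `transform` is a stand-in for the ASCON core rather than real encryption.

```python
WORD_MASK = 0xFFFFFFFF


def read_block(ram, word_addr):
    """Assemble one 64-bit block from two consecutive 32-bit reads."""
    upper = ram[word_addr]
    lower = ram[word_addr + 1]
    return (upper << 32) | lower


def write_block(ram, word_addr, block):
    """Write a 64-bit block back as two consecutive 32-bit writes."""
    ram[word_addr] = (block >> 32) & WORD_MASK
    ram[word_addr + 1] = block & WORD_MASK


def process_in_place(ram, first_word, num_blocks, transform):
    """Replace each stored block with transform(block), reusing its address."""
    for i in range(num_blocks):
        addr = first_word + 2 * i
        write_block(ram, addr, transform(read_block(ram, addr)))


# Example with a placeholder transform (bitwise inversion, NOT encryption).
ram = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
process_in_place(ram, 0, 2, lambda block: block ^ 0xFFFFFFFFFFFFFFFF)
print([f"{word:08x}" for word in ram])
```

Each block costs two reads and two writes in this model, which is why a core that finishes a permutation in very few cycles would end up waiting on the memory.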

## Simulation Results
Unfortunately, in the interest of time, I was unable to get a simulation to run in this notebook. My intention was to write a basic top-level simulation in Verilator, since it supports SystemVerilog, but that will have to be a future project. The accompanying GitHub repo for this project contains several testbenches for the ASCON core, the memory controller + RAM, the Wishbone interface, and the top-level design. Below are a few screenshots from ModelSim that show the key functionality of this design.

#### Wishbone Writes to RAM
The following image shows the RAM being populated via writes from the Wishbone bus.

![wb_to_RAM](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/write_to_ram.png?raw=true)

#### ASCON State Transitions and Writeback to RAM
The following image shows the ASCON core going through the state transitions for the encryption mode with associated data. During the PT (plaintext) state, the corresponding ciphertext blocks are written back to RAM, overwriting the plaintext blocks.

![ascon_to_RAM](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/basic_encrypt.png?raw=true)

#### Writing of the Tag back to the Register File
The following image shows a couple of different traces for the same transaction. In this case, it shows the tag being written from the ASCON core back to the Wishbone register file after the finalize state.

![tag_write](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/tag_writeback.png?raw=true)

In [None]:
#@title Install Dependencies {display-mode: "form"}
#@markdown Click the ▷ button to setup the digital design environment based on [conda-eda](https://github.com/hdl/conda-eda).

#@markdown Main components we will install

#@markdown *   Open_pdks.sky130a : a PDK installer for open-source EDA tools.
#@markdown *   Openlane : an automated RTL to GDSII flow based on several components including OpenROAD, Yosys, Magic, Netgen, CVC, SPEF-Extractor, KLayout and a number of custom scripts for design exploration and optimization.
#@markdown *   GDSTK : a C++ library for creation and manipulation of GDSII and OASIS files. 

import os
import pathlib
import sys

!curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
conda_prefix_path = pathlib.Path('conda-env')
site_package_path = conda_prefix_path / 'lib/python3.7/site-packages'
sys.path.append(str(site_package_path.resolve()))

CONDA_PREFIX = str(conda_prefix_path.resolve())
PATH = os.environ['PATH']
# Must be defined, otherwise the %env line below is exported unexpanded.
LD_LIBRARY_PATH = os.environ.get('LD_LIBRARY_PATH', '')

%env CONDA_PREFIX={CONDA_PREFIX}
%env PATH={CONDA_PREFIX}/bin:{PATH}
%env LD_LIBRARY_PATH={CONDA_PREFIX}/lib:{LD_LIBRARY_PATH}

!bin/micromamba create --yes --prefix $CONDA_PREFIX
!echo 'python ==3.7*' >> {CONDA_PREFIX}/conda-meta/pinned

!bin/micromamba install --quiet \
                        --yes \
                        --prefix $CONDA_PREFIX \
                        --channel litex-hub \
                        --channel main \
                        open_pdks.sky130a \
                        openlane

!bin/micromamba install --quiet \
                        --yes \
                        --prefix $CONDA_PREFIX \
                        --channel conda-forge \
                        gdstk

bin/micromamba
env: CONDA_PREFIX=/content/conda-env
env: PATH=/content/conda-env/bin:/opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin
env: LD_LIBRARY_PATH={CONDA_PREFIX}/lib:{LD_LIBRARY_PATH}

                                           __
          __  ______ ___  ____ _____ ___  / /_  ____ _
         / / / / __ `__ \/ __ `/ __ `__ \/ __ \/ __ `/
        / /_/ / / / / / / /_/ / / / / / / /_/ / /_/ /
       / .___/_/ /_/ /_/\__,_/_/ /_/ /_/_.___/\__,_/
      /_/

Empty environment created at prefix: /content/conda-env
    - lib/libblas.so
    - lib/libcblas.so
    - lib/liblapack.so


## Retrieve Design Files

In [None]:
!rm -rf ASCON_code-a-chip/
!git clone https://github.com/zaellis/ASCON_code-a-chip.git

Cloning into 'ASCON_code-a-chip'...
remote: Enumerating objects: 130, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 130 (delta 0), reused 0 (delta 0), pack-reused 126
Receiving objects: 100% (130/130), 67.66 MiB | 26.74 MiB/s, done.
Resolving deltas: 100% (61/61), done.


## Get OpenRAM

In [None]:
%env PDK_ROOT=/content/conda-env/share/pdk
!git clone --depth=1 -b stable https://github.com/VLSIDA/OpenRAM.git
!python -m pip install -r OpenRAM/requirements.txt
!git clone --depth=1 https://github.com/vlsida/sky130_fd_bd_sram $PDK_ROOT/sky130_fd_bd_sram
!git clone --depth=1 https://github.com/google/skywater-pdk-libs-sky130_fd_sc_hd $PDK_ROOT/skywater-pdk/libraries/sky130_fd_sc_hd/latest
%env OPENRAM_HOME=/content/OpenRAM/compiler
%env OPENRAM_TECH=/content/OpenRAM/technology
%env PYTHONPATH=/content/OpenRAM/compiler:/content/OpenRAM/technology:/content/OpenRAM/technology/sky130/custom
!make -C OpenRAM $OPENRAM_HOME/../technology/sky130/gds_lib \
                 $OPENRAM_HOME/../technology/sky130/mag_lib \
                 $OPENRAM_HOME/../technology/sky130/sp_lib \
                 $OPENRAM_HOME/../technology/sky130/lvs_lib \
                 $OPENRAM_HOME/../technology/sky130/calibre_lvs_lib \
                 $OPENRAM_HOME/../technology/sky130/klayout_lvs_lib \
                 $OPENRAM_HOME/../technology/sky130/maglef_lib

env: PDK_ROOT=/content/conda-env/share/pdk
fatal: destination path 'OpenRAM' already exists and is not an empty directory.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting scikit-learn>=0.22.2
  Using cached scikit_learn-1.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (24.8 MB)
Collecting coverage>=4.5.2
  Using cached coverage-7.2.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (224 kB)
Collecting scipy>=1.3.3
  Using cached scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
Collecting python-subunit>=1.4.0
  Using cached python_subunit-1.4.2-py3-none-any.whl (106 kB)
Collecting unittest2>=1.1.0
  Using cached unittest2-1.1.0-py2.py3-none-any.whl (96 kB)
Collecting joblib>=0.11
  Using cached joblib-1.2.0-py3-none-any.whl (297 kB)
Collecting threadpoolctl>=2.0.0
  Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collectin

## Write RAM Configuration

In [None]:
%%writefile config.py
word_size = 32 # Bits
num_words = 32
human_byte_size = "{:.0f}kbytes".format((word_size * num_words)/1024/8)

# Full-word writes only (write size equals word size, no byte writes)
write_size = 32 # Bits

# Dual port
num_rw_ports = 0
num_r_ports = 1
num_w_ports = 1
ports = '1r1w'

tech_name = 'sky130'
nominal_corner_only = True

route_supplies = 'ring'
check_lvsdrc = True
uniquify = True

output_name = f'{tech_name}_sram_{ports}_{word_size}x{num_words}_{write_size}'
output_path = '.'

Overwriting config.py


In [None]:
#@markdown Run OpenRAM
!python3 $OPENRAM_HOME/../sram_compiler.py config.py

** Start: 04/03/2023 17:55:37
Technology: sky130
Total size: 1024 bits
Word size: 32
Words: 32
Banks: 1
RW ports: 0
R-only ports: 1
W-only ports: 1
DRC/LVS/PEX is only run on the top-level design to save run-time (inline_lvsdrc=True to do inline checking).
Characterization is disabled (using analytical delay models) (analytical_delay=False to simulate).
Only generating nominal corner timing.
Words per row: None
Output files are: 
/content/./sky130_sram_1r1w_32x32_32.lvs
/content/./sky130_sram_1r1w_32x32_32.sp
/content/./sky130_sram_1r1w_32x32_32.v
/content/./sky130_sram_1r1w_32x32_32.lib
/content/./sky130_sram_1r1w_32x32_32.py
/content/./sky130_sram_1r1w_32x32_32.html
/content/./sky130_sram_1r1w_32x32_32.log
/content/./sky130_sram_1r1w_32x32_32.lef
/content/./sky130_sram_1r1w_32x32_32.gds
** Submodules: 2.7 seconds
** Placement: 0.1 seconds
**** Retrieving pins: 0.0 seconds
**** Analyzing pins: 0.0 seconds
**** Finding blockages: 2.6 seconds
**** Converting blockages: 0.3 seconds
**** 

## Write configuration

[Documentation](https://openlane.readthedocs.io/en/latest/reference/configuration.html)

Since I used an extra memory macro, I followed this [guide](https://openlane.readthedocs.io/en/latest/tutorials/openram.html) from OpenLane. As a result, **QUIT_ON_MAGIC_DRC** is set to false. Anyone planning to actually use this in a tapeout would need to manually review the DRC log for any critical warnings.

In [None]:
%%writefile config.json
{
    "DESIGN_NAME": "wb_ASCON",
    "VERILOG_FILES": "/content/ASCON_code-a-chip/RTL/*.sv  /content/ASCON_code-a-chip/RTL/ASCON_ROUND_FUNCTION.v /content/ASCON_code-a-chip/RTL/ASCON_SBOX.v",
    "EXTRA_LEFS":      "/content/sky130_sram_1r1w_32x32_32.lef",
    "EXTRA_GDS_FILES": "/content/sky130_sram_1r1w_32x32_32.gds",
    "EXTRA_LIBS":      "/content/sky130_sram_1r1w_32x32_32_TT_1p8V_25C.lib",
    "FP_PDN_MACRO_HOOKS": "mb.sram vccd1 vssd1 vccd1 vssd1",
    "MACRO_PLACEMENT_CFG": "/content/macro_placement.cfg",
    "MAGIC_DRC_USE_GDS": false,
    "QUIT_ON_MAGIC_DRC": false,
    "VDD_NETS": "vccd1",
    "GND_NETS": "vssd1",
    "CLOCK_PERIOD": 25,
    "CLOCK_NET": "clk",
    "CLOCK_PORT": "clk",
    "FP_SIZING": "absolute",
    "DIE_AREA": "0 0 600 600",
    "PL_TARGET_DENSITY": 0.40
}

Overwriting config.json
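
Because the flow depends on several generated files, it can save time to confirm they exist before launching OpenLane. The sketch below is a hypothetical pre-flight check, not part of OpenLane: `missing_paths` and `PATH_KEYS` are names introduced here, and the check only covers the configuration keys listed in `PATH_KEYS`.

```python
import json
import pathlib

# Keys in config.json whose values are plain file paths (no glob patterns).
PATH_KEYS = ("EXTRA_LEFS", "EXTRA_GDS_FILES", "EXTRA_LIBS", "MACRO_PLACEMENT_CFG")


def missing_paths(config):
    """Return (key, path) pairs for referenced files that do not exist."""
    missing = []
    for key in PATH_KEYS:
        value = config.get(key, "")
        # Values may be a space-separated string or a list of strings.
        entries = value.split() if isinstance(value, str) else list(value)
        for entry in entries:
            if not pathlib.Path(entry).exists():
                missing.append((key, entry))
    return missing


# Only runs the check when a config.json is present in the working directory.
if pathlib.Path("config.json").exists():
    with open("config.json") as handle:
        problems = missing_paths(json.load(handle))
    for key, path in problems:
        print(f"{key}: {path} not found")
    if not problems:
        print("all referenced files found")
```

If the check reports a missing file, compare the name against the output list printed by the OpenRAM run.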


## Manual Macro Placement

In [None]:
%%writefile macro_placement.cfg
mb.sram 115 50 S

Overwriting macro_placement.cfg
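
Each line of the placement file gives an instance name, an x and y origin, and an orientation. The sketch below parses that format and checks that the origin lies inside the die area from config.json. The helper names are hypothetical, and since the macro's footprint is not stated in this notebook, only the origin is checked, not the full outline.

```python
def parse_macro_placement(text):
    """Parse 'instance x y orientation' lines into a dictionary."""
    placements = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, x, y, orientation = fields
        placements[name] = (float(x), float(y), orientation)
    return placements


def origin_inside_die(die_area, x, y):
    """Check a point against a 'x0 y0 x1 y1' die area string."""
    x0, y0, x1, y1 = (float(value) for value in die_area.split())
    return x0 <= x <= x1 and y0 <= y <= y1


placements = parse_macro_placement("mb.sram 115 50 S\n")
for name, (x, y, orientation) in placements.items():
    ok = origin_inside_die("0 0 600 600", x, y)
    print(f"{name}: origin=({x}, {y}) orientation={orientation} inside_die={ok}")
```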


## Run OpenLane Flow

[OpenLane](https://openlane.readthedocs.io/en/latest/) is an automated [RTL](https://en.wikipedia.org/wiki/Register-transfer_level) to [GDSII](https://en.wikipedia.org/wiki/GDSII) flow based on several components including [OpenROAD](https://theopenroadproject.org/), [Yosys](https://yosyshq.net/yosys/), [Magic](http://www.opencircuitdesign.com/magic/), [Netgen](http://opencircuitdesign.com/netgen/) and custom methodology scripts for design exploration and optimization targeting [open source PDKs](https://github.com/google/open-source-pdks).

![img](https://openlane.readthedocs.io/en/latest/_images/flow_v1.png)

In [None]:
%env PDK=sky130A
#!flow.tcl -design .
!flow.tcl -design . -tag full_run -overwrite

env: PDK=sky130A
OpenLane 2023.03.01_0_ge10820ec-conda
All rights reserved. (c) 2020-2022 Efabless Corporation and contributors.
Available under the Apache License, version 2.0. See the LICENSE file for more details.

[INFO]: Using configuration in 'config.json'...
[INFO]: PDK Root: /content/conda-env/share/pdk
[INFO]: Process Design Kit: sky130A
[INFO]: Standard Cell Library: sky130_fd_sc_hd
[INFO]: Optimization Standard Cell Library: sky130_fd_sc_hd
[INFO]: Run Directory: /content/runs/full_run
[INFO]: Removing existing /content/runs/full_run...
[INFO]: Preparing LEF files for the nom corner...
[INFO]: Preparing LEF files for the min corner...
[INFO]: Preparing LEF files for the max corner...
[STEP 1]
[INFO]: Running Synthesis (log: runs/full_run/logs/synthesis/1-synthesis.log)...
[STEP 2]
[INFO]: Running Single-Corner Static Timing Analysis (log: runs/full_run/logs/synt

## Display layout

Because of some quirks related to gdstk, which [it seems other people have had before](https://github.com/chipsalliance/silicon-notebooks/issues/30), I was unable to convert the raw GDS to a PNG inside this notebook. Instead, here is a PNG from one of my runs showing my design with the RAM macro inside.

![Layout](https://github.com/zaellis/ASCON_code-a-chip/blob/main/imgs/wb_ASCON.png?raw=true)

## Metrics

[Documentation](https://openlane.readthedocs.io/en/latest/reference/datapoint_definitions.html)


In [None]:
import pandas as pd
import pathlib

pd.options.display.max_rows = None
reports = sorted(pathlib.Path('runs').glob('full_run/reports/metrics.csv'))
df = pd.read_csv(reports[-1])
df.transpose()

Unnamed: 0,0
design,/content
design_name,wb_ASCON
config,full_run
flow_status,flow completed
total_runtime,0h29m41s0ms
routed_runtime,0h22m18s0ms
(Cell/mm^2)/Core_Util,37344.444444
DIEAREA_mm^2,0.36
CellPer_mm^2,18672.222222
OpenDP_Util,32.46
