- Introduction
- Repository File Structure
- Installation Requirements
- How to Run
- Citing LAXOR accelerator and simulator
- License
LAXOR is a Binary Neural Network (BNN) accelerator. The essence of LAXOR lies in a novel local-computing paradigm that fuses the weight storage (i.e., a latch) and the compute unit (i.e., an XOR gate) into a single logic cell to minimize data movement, achieving 4.2× lower energy consumption. Combined with optimized population-count circuits, the LAXOR accelerator reaches an energy efficiency of 2299 TOPS/W, 3.4×–37.6× higher than advanced BNN accelerator architectures.
| LAXOR Accelerator | |
|---|---|
| CMOS Technology | 28nm |
| Design Type | Digital |
| Result Type | Synthesis |
| VDD (V) | 0.5–0.9 |
| Bit Width | 1 |
| Frequency (MHz) | 200 |
| Core Area (mm²) | 2.73 |
| Performance (TOPS) | 104.8 |
| Compute Density (TOPS/mm²) | 38.388 |
| MAC Energy Efficiency (TOPS/W) | 2299 @0.5V |
| Bit Accurate | Yes |
The LAXOR accelerator has a many-core architecture with compact XOR arrays and energy-efficient popcount logic for local computing. The architecture consists of 4 Processing Engine (PE) clusters, a global controller and configuration unit, an accumulation unit that produces the total sums for activation layers or Fully-Connected (FC) layers, and a comparison block that determines the inference result from the maximum value.
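For intuition, the top-level dataflow can be sketched in a few lines of Python (an illustration only; the names and shapes below are hypothetical, not the simulator's API):

import numpy as np

# Illustrative dataflow: 4 PE clusters produce per-label partial sums, the
# accumulation unit adds them, and the comparison block picks the maximum.
NUM_CLUSTERS = 4   # PE clusters in LAXOR
NUM_LABELS = 10    # e.g., CIFAR-10 classes

def accumulate_and_compare(cluster_partial_sums):
    totals = cluster_partial_sums.sum(axis=0)   # accumulation unit
    return int(np.argmax(totals))               # comparison block

partial_sums = np.random.randint(-512, 512, size=(NUM_CLUSTERS, NUM_LABELS))
print(accumulate_and_compare(partial_sums))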
LAXOR supports BNN topologies with small kernel dimensions. For example, a 3D kernel (width × height × input channels) is flattened into a bit vector and mapped onto a PE, so the flattened kernel must fit within the PE's 1024-bit Latch-XOR array, as checked in the sketch below.
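An illustrative check (not simulator code): a kernel fits on a PE when its flattened size does not exceed 1024 bits.

BIT_SIZE_PE = 1024  # Latch-XOR cells (bits) per PE

def kernel_fits(k, channels):
    # A k x k x channels binary kernel occupies k*k*channels bits when flattened.
    return k * k * channels <= BIT_SIZE_PE

print(kernel_fits(3, 113))  # True:  3*3*113 = 1017 bits
print(kernel_fits(3, 128))  # False: 3*3*128 = 1152 bits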
On the LAXOR accelerator, bitwise multiplication is implemented efficiently with an inverted-XOR gate, and accumulation is realized by a population count logic (i.e., popcount logic) that counts the number of '0's in the XOR output. At the heart of the accelerator is the proposed tightly coupled 10T Latch-XOR cell, in which computation happens in situ with the data storage for local computing. Each cell comprises a transmission gate, a cross-coupled latch, a two-transistor (i.e., M1, M2) switching path, and an inverter.
To further reduce data movement, we combine an array of 1024 Latch-XOR cells with a Parallel-counter-Carry-Look-ahead (PCL) unit and a computation-and-activation unit to form a PE. This tightly coupled configuration allows efficient counting of the '0's at the array output and generates activations or partial sums for further operation. The figure below shows the PCL unit deployed on the LAXOR accelerator.
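Functionally, each PE computes a binary dot product. A behavioral sketch in plain Python (this models the arithmetic only, not the 10T cell or PCL hardware):

import numpy as np

def pe_binary_mac(inputs, weights):
    # inputs, weights: 0/1 bit vectors (up to 1024 elements), where the bits
    # encode the +1/-1 values of a BNN. A '0' at the XOR output marks a match,
    # i.e., a +1 product in the +1/-1 domain.
    xor_out = np.bitwise_xor(inputs, weights)
    zeros = np.count_nonzero(xor_out == 0)   # popcount of '0's (PCL unit)
    ones = xor_out.size - zeros
    return zeros - ones                      # signed dot product

inputs = np.random.randint(0, 2, 1024)
weights = np.random.randint(0, 2, 1024)
print(pe_binary_mac(inputs, weights))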
We design a Python-based simulator for the proposed LAXOR accelerator. The purpose of the simulator is to:
- (1) map and verify the functionality of a BNN model onto the proposed architecture
- (2) generate application-specific, cycle-accurate results (e.g., latency, energy, utilization, etc.) for design analysis.
The LAXOR simulator consists of a front-end tool, Areca, and a back-end tool, Bits-Island. Areca interfaces with the pre-trained model and user configurations, then generates the data stream in a format tailored to the accelerator. Bits-Island replicates the LAXOR architecture, maps the data stream onto the different PEs, and simulates the functionality layer by layer. Finally, the tool chain reports the mapping results, layer outputs, and critical design metrics by harnessing embedded cycle-count, latency, and energy models. To ensure accurate energy estimation, the latency and energy of each atomic hardware operation (e.g., a single XOR gate, a buffer read, weight loading) are provided to Areca from Cadence Spectre gate-level and post-layout simulations.
- `Pre-trained_model`: Pre-trained binary CNN model for CIFAR-10, whose accuracy is 85.25%, total energy is 3.82 µJ, and model size is 0.51 MB.
- `Example_simple.py`: A program to run the LAXOR accelerator simulator with random weights.
- `main.py`: A program to run the LAXOR accelerator simulator with the pre-trained binary CNN model for CIFAR-10.
- Python 3
- datetime
- matplotlib
- torch
- numpy
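datetime ships with the Python standard library; the remaining packages can be installed with pip:

pip install numpy matplotlib torch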
from LAXOR_Sim.Areca import Areca
areca = Areca()
# If you want to run batch normalization after the convolution, set BatchNorm='ON'
# and pass the batch-normalization parameters as numpy arrays.
# input: numpy array
# weights: numpy array
# bias: numpy array
# padding: int
out = areca.CPU_Binary_Conv2D(input, weights, bias, padding=0, BatchNorm='OFF', BN_gamma=None, BN_beta=None, BN_mean=None, BN_std=None)
# If you want to run batch normalization after the convolution, set BatchNorm='ON'
# and pass the batch-normalization parameters as numpy arrays.
# input: numpy array
# weights: numpy array
# bias: numpy array
# stride: int
# padding: int
out = areca.Binary_Conv2D(input, weights, bias, stride=1, padding=0, BatchNorm='OFF', BN_gamma=None, BN_beta=None, BN_mean=None, BN_std=None)
# If you want to run batch normalization after the fully connected layer, set
# BatchNorm='ON' and pass the batch-normalization parameters as numpy arrays.
# input: numpy array
# weights: numpy array
# bias: numpy array
out = areca.Binary_FullyConnected(input, weights, bias, BatchNorm='OFF', BN_gamma=None, BN_beta=None, BN_mean=None, BN_std=None)
# input: numpy array
# ksize: int
# stride: int
out = areca.MaxPooling(input, ksize, stride)
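Putting these calls together, a minimal end-to-end sketch with random data (the array shapes and layouts are illustrative assumptions; see Example_simple.py for the actual usage):

import numpy as np
from LAXOR_Sim.Areca import Areca

areca = Areca()

# Illustrative shapes: a 3-channel 32x32 binary input and 64 binary 3x3 kernels.
x = np.random.choice([-1, 1], size=(3, 32, 32))
w = np.random.choice([-1, 1], size=(64, 3, 3, 3))
b = np.random.randn(64)

out = areca.Binary_Conv2D(x, w, b, stride=1, padding=1)
out = areca.MaxPooling(out, ksize=2, stride=2)

# Flatten and classify over 10 labels.
fc_w = np.random.choice([-1, 1], size=(10, out.size))
fc_b = np.random.randn(10)
out = areca.Binary_FullyConnected(out.reshape(-1), fc_w, fc_b)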
This is one example showing how to load the model. Users should store the weights, inputs, biases, and other parameters as numpy arrays. The pre-trained model we provide is stored as .npy files; we use numpy.load() to load them.
import LAXOR_Sim.Tool as tool
w_path = './Pre-trained_model/Conv1_binary-weights.npy'
bias_path = './Pre-trained_model/Conv1_bias.npy'
beta_path = './Pre-trained_model/Conv1_BNbeta.npy'
gamma_path = './Pre-trained_model/Conv1_BNgamma.npy'
mean_path = './Pre-trained_model/Conv1_BNmean.npy'
std_path = './Pre-trained_model/Conv1_BNstd.npy'
w1, b1, beta1, gamma1, mean1, std1 = tool.load_npys(w_path, bias_path, beta_path, gamma_path, mean_path, std_path)
import numpy as np

w_path = './Pre-trained_model/Conv1_binary-weights.npy'
weights = np.load(w_path)
print(weights)
"""
[[[[ 1 1]
[ 1 1]]
[[ 1 -1]
[ 1 -1]]
[[-1 -1]
[-1 -1]]]
...
[[[ 1 1]
[ 1 -1]]
[[ 1 -1]
[-1 -1]]
[[ 1 -1]
[ 1 1]]]]
"""
bias_path = './Pre-trained_model/Conv1_bias.npy'
bias = np.load(bias_path)
print(bias)
"""
[-1.53262466e-01 -8.07617307e-01 -5.98477423e-01 8.31361294e-01
-3.19540091e-02 -9.77912545e-02 -7.92104006e-01 1.63953304e-01
-7.66488433e-01 7.03036129e-01 -1.27875507e-01 -2.24554762e-01
4.48263705e-01 -1.31150529e-01 -1.73672631e-01 -1.33374967e-02
...
1.89023882e-01 -2.55141824e-01 -8.33954439e-02 1.33623332e-02
-6.80823684e-01 1.56198531e-01 2.09271386e-01 1.42073661e-01
-1.16940970e-02 6.63392007e-01 -3.19188684e-01 -4.96945649e-01
1.12402477e-05 -1.28726274e-01 -6.90906346e-01 4.38432664e-01]
"""
LOG_FILE = 'Output.txt'
PENUMS = 256 # number of PEs
BIT_SIZE_PE = 1024 # bits per PE (Latch-XOR array size)
ORNUMS = 256 # number of OR-Logic
ORBITWIDTH = 4 # Bit width of OR-Logic
# input buffer
BUFFERSIZE_INPUT = 1024 #bits
# weights buffer
BUFFERSIZE_WEIGHTS = 1024 #bits
# bias buffer
BUFFERSIZE_BIAS = 9 # bits
PINS_IW = 8
EPSILON = 0
NUM_LABELS = 10
## Computation
ENERGY_POPCOUNT = 0.00054075 # popcount
ENERGY_XOR = 2.7124E-07 #xor
ENERGY_OR = 2.25811E-06 # OR logic
ENERGY_BNA = 0.000437255 # batch normalization and activation
ENERGY_COMPARISON = 5.9293E-05 # Comparison
## Data movement
ENERGY_DM_READ_BUFFER_IW = 2.21987E-06 # read from buffer
ENERGY_DM_LOAD_PE = 4.08203E-07 # load data to PE
ENERGY_DM_REAM_BUFFER_BIAS = 5.01043E-06 # read from bias buffer
ENERGY_DM_LOAD_CONTROL = 5.9393E-05 # load control
## Computation
LEAK_POPCOUNT = 500 # popcount
LEAK_XOR = 10.25390625 # xor
LEAK_OR = 24.63866016 # OR logic
LEAK_BNA = 791.7637969 # batch normalization and activation
LEAK_COMPARISON = 509.1364333 # comparison
## Data movement
LEAK_DM_READ_BUFFER_IW = 23.25170117 # read from buffer
LEAK_DM_REAM_BUFFER_BIAS = 14.56696875 # read from bias
LEAK_DM_LOAD_CONTROL = 31.4090625 # load control
CLOCK_PERIOD = 0.000000005 # 5 ns, i.e., 200 MHz
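To illustrate how these constants compose into a layer-level estimate (a structural sketch with placeholder operation counts, not the simulator's actual energy model):

# Placeholder operation counts; Bits-Island derives the real counts from the
# mapped layer, and units follow the conventions of the constants above.
num_xor_ops = PENUMS * BIT_SIZE_PE   # one XOR per Latch-XOR cell
num_popcounts = PENUMS               # one popcount per PE
num_buffer_reads = 2 * PENUMS        # illustrative input/weight reads

dynamic_energy = (num_xor_ops * ENERGY_XOR
                  + num_popcounts * ENERGY_POPCOUNT
                  + num_buffer_reads * ENERGY_DM_READ_BUFFER_IW)

# Leakage accrues with runtime: leakage power x (cycles x clock period).
num_cycles = 10000                   # placeholder cycle count
leakage_energy = (LEAK_POPCOUNT + LEAK_XOR) * num_cycles * CLOCK_PERIOD

total_energy = dynamic_energy + leakage_energy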
Example_simple.py defines the weights and bias with random values:
python Example_simple.py
main.py loads the pre-trained binary CNN model for CIFAR-10:
python main.py
If you use LAXOR, please cite the following paper:
@INPROCEEDINGS{10244499,
author={Li, Dongrui and Yamasaki, Tomomasa and Mani, Aarthy and Do, Anh Tuan and Chen, Niangjun and Wang, Bo},
booktitle={2023 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)},
title={LAXOR: A Bit-Accurate BNN Accelerator with Latch-XOR Logic for Local Computing},
year={2023},
volume={},
number={},
pages={1-6},
doi={10.1109/ISLPED58423.2023.10244499}}