Experiments are executed on the Jean Zay supercomputer using Slurm scripts. They can also be launched using `mpiexec` or `mpirun`.
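For reference, a minimal Slurm submission sketch is shown below. The job name, node and GPU counts, and time limit are illustrative placeholders, not the repository's actual scripts; adapt them to your Jean Zay allocation.

```shell
#!/bin/bash
#SBATCH --job-name=snn_exp        # illustrative job name
#SBATCH --nodes=4                 # illustrative node count
#SBATCH --ntasks-per-node=4       # one MPI process per GPU
#SBATCH --gres=gpu:4              # GPUs per node (placeholder)
#SBATCH --time=20:00:00           # walltime (placeholder)

# srun starts one Python process per task, as expected by mpi4py
srun python3 experiment_1.py --dataset MNIST_rate_100 --mpi flexible --gpu
```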
One can face many issues installing SNN simulators on HPC clusters, e.g. conflicts between PyTorch versions, CUDA, and the custom CUDA code of Lava-DL. We did not face critical issues running Lava-DL and Bindsnet with PyTorch v2.0.1.
Bindsnet must be installed by hand, following the instructions found here.
Locally install `Lie`, which contains the networks and other tools for the SNN part using Bindsnet, with:

```shell
pip install -e ./Lie
```
Lava-DL must be installed by hand, following the instructions found here.
Compilation of the custom CUDA code is not thread-safe. One can add the following lines to their scripts to avoid issues:

```python
import os

# Give each process its own extension directory so that concurrent
# compilations of the custom CUDA code do not clash.
path = f"<YOUR_FOLDER>/torch_extension_mnist_{<PROCESS_RANK>}"
if not os.path.exists(path):
    os.makedirs(path)
os.environ["TORCH_EXTENSIONS_DIR"] = path
```
Locally install `Lave`, which contains the networks and other tools for the SNN part using Lava-DL, with:

```shell
pip install -e ./Lave
```
Experiments use the in-development version of Zellij. Please install Zellij from the develop_t branch. An OpenMPI distribution is necessary; parallelization is done using `mpi4py`. For the version used in these experiments:

```shell
pip install -e ./zellij
```
There are 4 scripts for the main experiments:

- `experiment_1`:
  - Dataset: MNIST
  - Architecture: Diehl and Cook
  - Training: STDP
  - Simulator: Bindsnet
- `experiment_2`:
  - Dataset: DVS Gesture
  - Architecture: Diehl and Cook + soft distance-dependent lateral inhibition
  - Training: STDP
  - Simulator: Bindsnet
- `experiment_3`:
  - Dataset: MNIST
  - Architecture: CSNN
  - Training: SLAYER
  - Simulator: Lava-DL
- `experiment_4`:
  - Dataset: DVS Gesture
  - Architecture: CSNN
  - Training: SLAYER
  - Simulator: Lava-DL
The fixed files `launch_fixed_[...].py` are scripts used to retrain a unique solution multiple times.
- `--data`: size of the dataset, default=60000.
- `--dataset`: name of the dataset; choose between `MNIST_rate_100`, `MNIST_rate_25`, and `GESTURE`. Datasets are loaded using the Bindsnet loader.
- `--calls`: number of total calls to the loss function (number of SNN evaluations).
- `--mpi`: `{synchronous, asynchronous, flexible}`. Use `flexible` for these experiments.
- `--gpu`: if True, use the GPU.
- `--record_time`: record evaluation time for all SNNs.
- `--save`: if a path is given, results will be saved there.
- `--gpu_per_node`: deprecated. All GPUs must be isolated within their node (one process per GPU).
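The options above could be declared with `argparse` roughly as follows. This is a reconstruction for illustration only, assuming the flag names and defaults listed above; it is not the repository's actual parser, and types such as the boolean handling of `--gpu` are assumptions:

```python
import argparse

# Illustrative parser matching the documented flags (names/defaults assumed).
parser = argparse.ArgumentParser(description="SNN experiment launcher (sketch)")
parser.add_argument("--data", type=int, default=60000,
                    help="size of the dataset")
parser.add_argument("--dataset",
                    choices=["MNIST_rate_100", "MNIST_rate_25", "GESTURE"],
                    help="dataset loaded via the Bindsnet loader")
parser.add_argument("--calls", type=int, default=1000,
                    help="total calls to the loss function (SNN evaluations)")
parser.add_argument("--mpi",
                    choices=["synchronous", "asynchronous", "flexible"],
                    default="flexible",
                    help="parallelization strategy")
parser.add_argument("--gpu", action="store_true", help="use the GPU")
parser.add_argument("--record_time", action="store_true",
                    help="record evaluation time for all SNNs")
parser.add_argument("--save", type=str, default=None,
                    help="path where results are saved")

args = parser.parse_args(
    ["--dataset", "MNIST_rate_100", "--mpi", "flexible", "--gpu"]
)
```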
Search spaces are found in `search_spaces.py` and defined according to Zellij.
```shell
mpiexec -machinefile <HOSTFILE> -rankfile <RANKFILE> -n 16 python3 experiment_1.py --dataset MNIST_rate_100 --mpi flexible --gpu --data 60000 --calls 1000
```
Results of all 4 experiments are in the `results` folders.