cuHE: Homomorphic and fast
CUDA Homomorphic Encryption Library (cuHE) is a GPU-accelerated library for homomorphic encryption (HE) schemes and homomorphic algorithms defined over polynomial rings. cuHE yields astonishing performance while providing a simple interface that greatly enhances programmer productivity. It features both algebraic techniques for homomorphic evaluation of circuits and highly optimized code for single-GPU or multi-GPU machines. Develop high-performance applications rapidly with cuHE!
The cuHE library is distributed under the terms of the MIT License (MIT). It is currently intended for research purposes only. Several algorithms are implemented as examples and more will follow. Feedback and collaboration of any kind are welcome.
The library pushes performance to the limit. A number of optimizations, such as algebraic techniques for efficient evaluation, memory minimization techniques, memory and stream scheduling, and low-level hand-tuned CUDA assembly, are included to take full advantage of the massive parallelism and high memory bandwidth that GPUs offer. The arithmetic functions, built to handle very large polynomial operands, use methods based on the Chinese remainder theorem (CRT), the number-theoretic transform (NTT), and Barrett reduction. A few algorithms and routines of the library are described in this paper, along with a performance analysis. More details on arithmetic methods and optimizations regarding HE are explained in our previous papers listed below.
Dai, Wei, and Berk Sunar. "cuHE: A Homomorphic Encryption Accelerator Library." Cryptography and Information Security in the Balkans. Springer International Publishing, 2015. 169-186. [draft] [Springer]
Dai, Wei, Yarkın Doröz, and Berk Sunar. "Accelerating SWHE Based PIRs Using GPUs." Financial Cryptography and Data Security: FC 2015 International Workshops, BITCOIN, WAHC, and Wearable, San Juan, Puerto Rico, January 30, 2015, Revised Selected Papers. Vol. 8976. Springer, 2015. [draft] [Springer]
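To make the arithmetic techniques named above more concrete, here is a minimal CPU-only sketch of NTT-based polynomial multiplication in which every modular product goes through Barrett reduction. It is an illustration under stated assumptions, not cuHE's implementation: the prime `998244353`, its primitive root `3`, and the names `barrett`, `mulmod`, `powmod`, `ntt`, and `multiply` are all chosen for this example, and the code relies on the `unsigned __int128` extension available in GCC and Clang.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using u64 = std::uint64_t;
__extension__ typedef unsigned __int128 u128;  // GCC/Clang extension

// Toy parameters (assumptions for this sketch, not cuHE's values).
constexpr u64 P = 998244353;  // NTT-friendly prime: 119 * 2^23 + 1
constexpr u64 G = 3;          // primitive root modulo P
constexpr u64 MU = static_cast<u64>((static_cast<u128>(1) << 64) / P);  // floor(2^64 / P)

// Barrett reduction: x mod P for any 64-bit x, using a multiply and shift
// with the precomputed constant MU instead of a division.
u64 barrett(u64 x) {
    u64 q = static_cast<u64>((static_cast<u128>(x) * MU) >> 64);  // floor(x / P) or one less
    u64 r = x - q * P;                                            // hence 0 <= r < 2P
    return r >= P ? r - P : r;
}

// a, b < P < 2^30, so a * b fits in 64 bits.
u64 mulmod(u64 a, u64 b) { return barrett(a * b); }

u64 powmod(u64 base, u64 exp) {
    u64 result = 1;
    for (; exp != 0; exp >>= 1, base = mulmod(base, base))
        if (exp & 1) result = mulmod(result, base);
    return result;
}

// In-place iterative radix-2 NTT; a.size() must be a power of two no larger than 2^23.
void ntt(std::vector<u64>& a, bool inverse) {
    const std::size_t n = a.size();
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    for (std::size_t len = 2; len <= n; len <<= 1) {
        u64 wLen = powmod(G, (P - 1) / len);        // primitive len-th root of unity
        if (inverse) wLen = powmod(wLen, P - 2);    // its modular inverse
        for (std::size_t i = 0; i < n; i += len) {
            u64 w = 1;
            for (std::size_t k = 0; k < len / 2; ++k) {
                u64 u = a[i + k];
                u64 v = mulmod(a[i + k + len / 2], w);
                a[i + k] = u + v >= P ? u + v - P : u + v;
                a[i + k + len / 2] = u >= v ? u - v : u + P - v;
                w = mulmod(w, wLen);
            }
        }
    }
    if (inverse) {
        u64 nInv = powmod(n % P, P - 2);  // scale by 1/n after the inverse transform
        for (u64& x : a) x = mulmod(x, nInv);
    }
}

// Product of two polynomials with coefficients already reduced modulo P.
std::vector<u64> multiply(std::vector<u64> a, std::vector<u64> b) {
    if (a.empty() || b.empty()) return {};
    const std::size_t need = a.size() + b.size() - 1;
    std::size_t n = 1;
    while (n < need) n <<= 1;
    a.resize(n);
    b.resize(n);
    ntt(a, false);
    ntt(b, false);
    for (std::size_t i = 0; i < n; ++i) a[i] = mulmod(a[i], b[i]);  // pointwise product
    ntt(a, true);
    a.resize(need);
    return a;
}
```

For example, `multiply({1, 2}, {3, 4})` returns the coefficients of (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2. In a CRT-based setting, a product with very large coefficients would be computed as several independent products like this one, each modulo a different small prime, and then recombined.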
Currently available is an implementation of the Doröz-Hu-Sunar (DHS) somewhat homomorphic encryption (SHE) scheme, which is based on the López-Alt-Tromer-Vaikuntanathan (LTV) scheme. Several homomorphic applications built on DHS are implemented on GPUs and are included as examples, such as the Prince block cipher and a sorting algorithm. These examples give an idea of how to program with the cuHE library.
- NVIDIA CUDA-enabled GPUs with compute capability 3.0 or higher
- NTL: A Library for doing Number Theory 9.3.0 (requires C++11) NOTE: to avoid random crashes compile it running
- The OpenMP API
cd cuhe
cmake ./
make
options to cmake command defaults are:
Notes for Mac OS X
On Mac you must use clang instead of gcc, and you need a version of clang that is compatible with OpenMP. With brew you can install one:
brew install clang-omp
Then you must tell CMake and CUDA that you are using clang-omp:
cd cuhe
CC=clang-omp CXX=clang-omp++ cmake -DGCC_CUDA_VERSION=clang-omp ./
make
A Short Tutorial
To design and implement a homomorphic application or circuit, e.g. the AND of 8 bits, we first need to decide which homomorphic encryption scheme to adopt and set its parameters (polynomial ring degree, coefficient sizes at each level of the circuit, relinearization strategy) according to a noise analysis. Let's say we decide to adopt the DHS HE scheme.
#include "cuHE.h"
void setParameters(int d, int p, int w, int min, int cut, int m); // in "CuHE.h", set parameters
void initCuHE(ZZ *coeffMod_, ZZX modulus); // in "CuHE.h", start pre-computation on GPUs
Then we may perform some pre-computation for the circuit. When it is time to run the circuit, we suggest turning on our virtual allocator. Do not turn it off until the circuit has completely finished.
void startAllocator(); // in "CuHE.h", start virtual allocator
void stopAllocator(); // in "CuHE.h", stop virtual allocator
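One way to make sure the allocator stays on for the whole circuit is to tie the two calls to a scope. The sketch below is only a suggestion: `AllocatorGuard` and `runCircuit` are hypothetical names that are not part of cuHE, and the two allocator functions are replaced by counting stand-ins so the example compiles without the library or a GPU.

```cpp
// Stand-ins so this sketch compiles on its own (assumption for illustration).
// In real code, include the cuHE header and delete these three lines.
static int allocatorDepth = 0;
void startAllocator() { ++allocatorDepth; }
void stopAllocator() { --allocatorDepth; }

// Hypothetical helper, not part of cuHE: the allocator is started when the
// guard is constructed and stopped when the guard goes out of scope.
class AllocatorGuard {
public:
    AllocatorGuard() { startAllocator(); }
    ~AllocatorGuard() { stopAllocator(); }
    AllocatorGuard(const AllocatorGuard&) = delete;             // one guard, one scope
    AllocatorGuard& operator=(const AllocatorGuard&) = delete;
};

// Hypothetical circuit body; reports whether the allocator was on while it ran.
bool runCircuit() {
    AllocatorGuard guard;               // allocator is turned on here
    bool active = allocatorDepth > 0;   // homomorphic operations would go here
    return active;
}                                       // allocator is turned off here, also on early return
```

With this pattern there is no code path that leaves the circuit while forgetting to call `stopAllocator()`, and none that calls it before the circuit is done.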
The program by default uses a single GPU (device ID 0). To adopt multiple devices, call the function below.
void multiGPUs(int num); //adopt 'num' GPUs
Those are all the initialization steps. To implement any HE scheme or circuit, please check out the provided examples.
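As a plaintext illustration of the 8-bit AND circuit mentioned at the start of this tutorial, the sketch below evaluates it as a balanced tree of multiplications modulo 2, which is how AND is expressed on encrypted bits. It does not use cuHE: `Bit`, `mulBit`, and `and8` are hypothetical names, and `Bit` holds a plain integer where a real circuit would hold a ciphertext. The `level` field tracks multiplicative depth, the kind of quantity a noise analysis needs before parameters are chosen.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Plaintext stand-in for an encrypted bit (hypothetical type for this sketch).
struct Bit {
    int value;  // 0 or 1
    int level;  // multiplicative depth consumed so far
};

// AND of two bits is their product modulo 2; each product costs one level.
Bit mulBit(Bit a, Bit b) {
    return Bit{(a.value * b.value) % 2, std::max(a.level, b.level) + 1};
}

// AND of 8 bits as a balanced binary tree: 4 + 2 + 1 products, depth 3.
Bit and8(std::array<Bit, 8> bits) {
    for (std::size_t width = 8; width > 1; width /= 2)
        for (std::size_t i = 0; i < width / 2; ++i)
            bits[i] = mulBit(bits[2 * i], bits[2 * i + 1]);  // pair up neighbours
    return bits[0];
}
```

Multiplying the bits one after another would give the same result with depth 7, so the tree shape matters when coefficient sizes are set per level.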
Funding for this research was in part provided by the US National Science Foundation CNS Award #1117590 and #1319130.
We want to acknowledge Andrea Peruffo for improving and debugging the code.