# TITLE: An end-to-end homomorphically encrypted neural network

### 1. Introduction

TBD

### 2. Background

### 3. Homomorphic encryption

The objective of homomorphic encryption is that, given ciphertexts that encrypt $\pi_1, \ldots, \pi_t$, one can compute a ciphertext that encrypts $f(\pi_1, \ldots, \pi_t)$ for any efficiently computable function $f$, without revealing any information about $\pi_1, \ldots, \pi_t$ at any point, as first realized for arbitrary functions by Gentry (2009). Formally, a homomorphic encryption scheme is said to be secure if no efficient adversary can guess, with probability non-negligibly better than 50%, which of two distinct messages, chosen with equal probability, a given ciphertext encrypts. This requires encryption to be randomized, so that two encryptions of the same message do not look the same (Standard, 2018).
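To make the role of randomization concrete, the sketch below plays the indistinguishability game against a *deterministic* public-key scheme. Textbook RSA with tiny, insecure parameters is used purely as a stand-in for any deterministic scheme (an assumption for illustration; it is not part of our construction). Because the adversary holds the public key, it can re-encrypt one of its own candidate messages and compare the result with the challenge ciphertext.

```python
import secrets

# Toy textbook-RSA parameters (hypothetical and insecure; illustration only).
P, Q, E = 61, 53, 17
N = P * Q  # public modulus

def encrypt_deterministic(m: int) -> int:
    """Textbook RSA: the same message always yields the same ciphertext."""
    return pow(m, E, N)

def ind_cpa_trial(m0: int, m1: int) -> bool:
    """One round of the indistinguishability game; True if the adversary wins."""
    b = secrets.randbelow(2)                        # challenger's secret bit
    challenge = encrypt_deterministic((m0, m1)[b])
    # The adversary knows the public key (E, N), so it re-encrypts m0 itself;
    # determinism makes this comparison decisive.
    guess = 0 if challenge == encrypt_deterministic(m0) else 1
    return guess == b

wins = sum(ind_cpa_trial(42, 1337) for _ in range(1000))
print(f"adversary won {wins}/1000 rounds")
```

A randomized scheme defeats this strategy, since re-encrypting $m_0$ produces a fresh ciphertext that differs from the challenge even when the challenge encrypts $m_0$.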

Several such schemes have been developed since, including DGHV (Dijk, 2010), BGV (Brakerski, 2011) and BFV (Fan-Vercauteren, 2012), offering different trade-offs between security and efficiency. In 2016, Cheon *et al.* introduced HEANN (a.k.a. CKKS), an RLWE-based scheme closely related to BGV and BFV that is particularly well suited to computations on real numbers, since it supports approximate arithmetic on encrypted data. This also makes it apt for neural network evaluation, where the linear layers involve many matrix-vector multiplications. Neural networks are therefore natural candidates for homomorphic encryption (source here), as they are composed of linear layers and activation functions that can be approximated by polynomials.
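As a small illustration of the last point, the sketch below fits least-squares polynomials to the sigmoid activation with NumPy. The interval $[-8, 8]$ and the degrees are assumptions chosen for illustration; higher degrees reduce the approximation error but cost more multiplicative depth when evaluated homomorphically.

```python
import numpy as np
from numpy.polynomial import Polynomial

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Least-squares fits on an assumed input range of [-8, 8].
xs = np.linspace(-8.0, 8.0, 2001)
max_err = {}
for degree in (3, 5, 7):
    poly = Polynomial.fit(xs, sigmoid(xs), deg=degree)
    max_err[degree] = float(np.max(np.abs(poly(xs) - sigmoid(xs))))
    print(f"degree {degree}: max |error| on [-8, 8] = {max_err[degree]:.4f}")
```

Inputs outside the fitted interval are not covered by such an approximation, so the interval has to bound the pre-activation values the network actually produces.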

Following the methodology of (Gentry, 2011), we derive a scheme from HEANN that can perform inference on encrypted data when applied to neural networks. The main idea is to encode a message $\pi$ into a plaintext polynomial $m$ and to encrypt it with a public key $\text{PK}$, obtaining a ciphertext $\text{Enc}(m, \nu)$. Here $\nu$ is the *noise parameter* attached to each polynomial $m$, and the noise is kept below some threshold $\mathrm{N} \gg \nu$, as proposed in (Gentry, 2009). This *noise budget* is a critical parameter of the scheme, as it determines both the security level of the encryption and the accuracy of the computations. The noise budget is defined during training, and the scheme is designed so that it is never exceeded during the computation.

$(m, \nu) \xleftarrow{\text{Encode}} \pi$ 

After encoding, the plaintext polynomial $m$ is encrypted with the public key $\text{PK}$ to obtain the ciphertext $\text{Enc}(m, \nu)$. We use a *probabilistic encryption* scheme: the encryption function is randomized, so the same plaintext polynomial $m$ is mapped to a different ciphertext $\text{Enc}(m, \nu)$ each time it is encrypted, for example for different instances of the model it is applied to. We also use a *public key* encryption scheme, so that anyone with access to the public key $\text{PK}$ can encrypt inputs, enabling the model to be applied to data encrypted by any party.

$\text{Enc}(m, \nu) \xleftarrow{\text{PK}} (m, \nu)$

Homomorphic operations can then be performed on the encrypted data as it is fed into the neural network. Because our base scheme performs approximate arithmetic, the *noise parameter* is treated as part of the error of the approximate computation (Cheon *et al.*, 2016). Concretely, a ciphertext $\mathbf{c}$ encrypting a message $m$ satisfies $\langle \mathbf{c}, \text{SK} \rangle = m + e \pmod{q}$ for the secret key $\text{SK}$ and ciphertext modulus $q$, where the error $e$ must be large enough to mask any significant features of $m$ while not exceeding the noise budget $\nu$. For a noise budget $0 \leq \nu \leq \mathrm{N}$, a ciphertext $\mathbf{c}$ therefore decrypts to the message $m$ up to a Gaussian-distributed error $e$ with $|e| \leq \nu$.

$\text{Enc}(f(m, \nu)) \xleftarrow{f(\cdot)} \text{Enc}(m, \nu)$
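The relation $\langle \mathbf{c}, \text{SK} \rangle = m + e$ and the randomized public-key encryption described above can be sketched with a textbook ring-LWE scheme of the kind HEANN builds on. This is a toy, not our modified scheme: the parameters `N` and `Q` are hypothetical and far too small to be secure, and ternary polynomials stand in for the discrete Gaussian error distribution.

```python
import numpy as np

# Toy parameters (hypothetical, far too small to be secure).
N = 16            # ring dimension: polynomials live in Z_Q[X]/(X^N + 1)
Q = 2**30         # ciphertext modulus
rng = np.random.default_rng(0)

def polymul(a, b):
    """Multiply two polynomials modulo X^N + 1 and Q (negacyclic convolution)."""
    full = np.convolve(a, b)          # ordinary product, degree <= 2N - 2
    res = full[:N].copy()
    res[: N - 1] -= full[N:]          # X^N = -1: high terms wrap around with a sign flip
    return res % Q

def small():
    """Toy error/secret sample: coefficients uniform in {-1, 0, 1}."""
    return rng.integers(-1, 2, size=N)

def centered(x):
    """Represent coefficients modulo Q in (-Q/2, Q/2]."""
    x = x % Q
    return np.where(x > Q // 2, x - Q, x)

def keygen():
    s = small()                                   # secret key SK
    a = rng.integers(0, Q, size=N)
    b = (-polymul(a, s) + small()) % Q            # public key PK = (b, a), b = -a*s + e
    return s, (b, a)

def encrypt(pk, m):
    b, a = pk
    u = small()                                   # fresh randomness for every encryption
    c0 = (polymul(b, u) + small() + m) % Q
    c1 = (polymul(a, u) + small()) % Q
    return c0, c1

def decrypt(sk, c):
    c0, c1 = c
    return centered(c0 + polymul(c1, sk))         # <c, SK> = m + e (mod Q)

DELTA = 2**20                                     # fixed-point scaling factor for messages
sk, pk = keygen()
msg = np.round(np.linspace(-1.0, 1.0, N) * DELTA).astype(np.int64)

ct1, ct2 = encrypt(pk, msg), encrypt(pk, msg)     # two encryptions of the same message
print("ciphertexts identical:", np.array_equal(ct1[0], ct2[0]))
print("max |e| after decryption:", np.max(np.abs(decrypt(sk, ct1) - msg)))

# Homomorphic addition: the component-wise sum of ciphertexts encrypts msg + msg.
ct_sum = ((ct1[0] + ct2[0]) % Q, (ct1[1] + ct2[1]) % Q)
print("max |e| after addition:", np.max(np.abs(decrypt(sk, ct_sum) - 2 * msg)))
```

Adding two ciphertexts is the simplest instance of the homomorphic map $f(\cdot)$: the plaintexts add, and so do the errors, which is why every operation consumes part of the noise budget. In this toy setup the decryption error of a fresh ciphertext is bounded by $2N + 1$ per coefficient.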

The encrypted output can then be decrypted with the secret key $\text{SK}$ to obtain the plaintext polynomial $m'$. The decryption function is deterministic: the same ciphertext $\text{Enc}(f(m, \nu))$ always decrypts to the same plaintext polynomial $m'$. It is also designed so that the noise parameter $\nu$ does not affect the decrypted plaintext polynomial $m'$ beyond a small approximation error.

$f(m') \xleftarrow{\text{SK}} \text{Enc}(f(m, \nu))$

Finally, the plaintext polynomial $m'$ is decoded to obtain the message $\pi'$.

$\pi' \xleftarrow{\text{Decode}} f(m')$

The original HEANN scheme is built around five core algorithms ($KeyGen$, $Enc$, $Dec$, $Add$, $Mult$), while ours consists of three ($KeyGen$, $Enc$, $Dec$), since the neural network performs the additions and multiplications internally. The $KeyGen$ algorithm generates the public and secret keys, the $Enc$ algorithm encrypts a plaintext polynomial, and the $Dec$ algorithm decrypts a ciphertext. Our algorithms are described below; they are based on the HEANN scheme of Cheon *et al.* (2016) and in line with the homomorphic encryption standard (Standard, 2018).

- $KeyGen(\lambda) \rightarrow (\text{PK}, \text{SK})$: Generates a public key $\text{PK}$ and a secret key $\text{SK}$, where $\lambda$ is the security parameter.
  - Given the security parameter $\lambda$, choose an integer $M = M(2^\lambda)$ and an integer $h = h(\log_2(\lambda))$.
  - Choose a byte length $\text{P} = \text{P}(\lambda, \nu)$ for the key, and fix a number of iterations $\text{I}$ large enough to ensure sufficient computational cost.
  - Set $\mathcal{KDF} = \text{PBKDF2-HMAC-SHA}(M)$, which applies the key derivation function iteratively.
  - Derive the key as $k \leftarrow \mathcal{KDF}(\lambda, \text{H}, \text{P}, \text{I})$.

- $Enc(\text{PK}, m) \rightarrow \text{Enc}(m, \nu)$: Encrypts a plaintext polynomial $m$ using the public key $\text{PK}$ to obtain a ciphertext $\text{Enc}(m, \nu)$.
  - For an $n$-dimensional vector $m = (m_1, \ldots, m_n) \in \mathbb{R}^n$, compute the vector $m' = (m_1', \ldots, m_n') \in \mathbb{Z}_q^n$ using the public key $\text{PK}$ to obtain the ciphertexts $\text{Enc}(m_i, \nu_i)$.

- $Dec(\text{SK}, \text{Enc}(m, \nu)) \rightarrow m'$: Decrypts a ciphertext $\text{Enc}(m, \nu)$ using the secret key $\text{SK}$ to obtain a plaintext polynomial $m'$.
  - For an input vector $m = (m_1, \ldots, m_n) \in \mathbb{Z}_q^n$, compute the corresponding vector $m' = (m_1', \ldots, m_n') \in \mathbb{R}^n$ using the secret key $\text{SK}$, and return the closest vector to $m'$ as the plaintext polynomial.
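The key-derivation step of $KeyGen$ can be sketched with Python's standard `hashlib.pbkdf2_hmac`. Several details are assumptions not fixed by the text: SHA-256 is used as the hash, the password and salt inputs are hypothetical placeholders, and the mapping from the derived bytes to a ternary secret with Hamming weight $h$ (the role $h$ plays in HEANN) is illustrative only. NumPy's generator is not a cryptographically secure source of randomness, so a real implementation would need a proper DRBG.

```python
import hashlib
import numpy as np

def derive_key(password: bytes, salt: bytes, p_bytes: int, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256 producing a key of P bytes after I iterations."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=p_bytes)

def sample_secret(key: bytes, n: int, h: int) -> np.ndarray:
    """Hypothetical mapping from the derived key to a ternary secret of Hamming weight h.

    NumPy's generator is NOT a cryptographically secure DRBG; illustration only.
    """
    rng = np.random.default_rng(list(key))          # seed from the derived bytes
    s = np.zeros(n, dtype=np.int64)
    support = rng.choice(n, size=h, replace=False)  # h distinct nonzero positions
    s[support] = rng.choice(np.array([-1, 1]), size=h)
    return s

k = derive_key(b"hypothetical-passphrase", b"hypothetical-salt", p_bytes=32, iterations=100_000)
sk = sample_secret(k, n=1024, h=64)
print(len(k), int(np.count_nonzero(sk)))  # 32 64
```

Because PBKDF2 is deterministic, the same inputs always reproduce the same key, which is what makes the iteration count $\text{I}$ a tunable cost rather than a source of randomness.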

Similar to Cheon *et al.* (2016), we do not separate the plaintext space from the inserted error, so an output $m' = f(m, \nu)$ differs slightly from the output for the original message $m$. This error is treated as part of the approximation error of the computation and is eventually corrected by the activation functions embedded in the neural network. We also do not use Galois keys, as we do not perform any rotations on the ciphertexts.

### 4. Neural network architecture

In general, a neural network can be viewed as a regression function that maps high-dimensional input data to outputs through elaborate learned relationships. As observed in (Marcolla, 2022), privacy-preserving neural networks based on homomorphic encryption suffer from high computational complexity and low efficiency, a problem compounded by the already high computational cost of training neural networks.

Neural network interpretability is a long-standing and largely unsolved problem in artificial intelligence (Karpathy, 2016).

We propose an architecture that uses a modified version of HEANN to encrypt encoded inputs and decrypt outputs, and that introduces a new layer, which we call the Differentiable Soft-Argmax layer, to activate the logits in the encrypted domain. This layer approximates the argmax function in the encrypted domain, allowing the neural network to make predictions on encrypted data. Because the layer is differentiable, the network can be trained using backpropagation.

<div align="center">
  <img src="./arch-.jpg" width="700"/>
</div>

##### 4.1. Modified HEANN

HEANN is a homomorphic encryption scheme for arithmetic of approximate numbers, which supports approximate addition and multiplication of encrypted messages (Cheon *et al.*, 2016). Its key ingredient is a rescaling procedure that truncates a ciphertext into a smaller modulus, which rounds the plaintext; the noise is placed after the significant figures that carry the main message. The noise is required for security, and rescaling keeps the magnitude of the plaintext, and of the noise relative to it, under control throughout the computation. The output of this scheme is therefore an approximate value with a predetermined precision.
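The plaintext-level effect of rescaling can be simulated with fixed-point integers. This sketch only tracks the scaled message, not an actual chain of ciphertext moduli, and the scaling factor `DELTA` is an assumed value. After a multiplication the scale becomes $\Delta^2$; dividing by $\Delta$ with rounding brings it back to $\Delta$ at the cost of a rounding error of order $1/\Delta$.

```python
import numpy as np

DELTA = 2**20  # scaling factor (assumed value)

def encode(x):
    """Fixed-point encoding: scale by DELTA and round to the nearest integer."""
    return np.round(np.asarray(x, dtype=float) * DELTA).astype(np.int64)

def rescale(v):
    """Divide by DELTA and round: the plaintext-level effect of rescaling."""
    return np.round(v / DELTA).astype(np.int64)

x = np.array([0.5, -1.25, 3.14159])
y = np.array([2.0, 0.3, -0.7071])

prod = encode(x) * encode(y)      # integer product carries scale DELTA**2
result = rescale(prod) / DELTA    # rescale back to DELTA, then decode
print(np.max(np.abs(result - x * y)))  # rounding error of order 1 / DELTA
```

Without the rescaling step, the scale would square with every multiplication, which is what would otherwise force the modulus to grow exponentially with depth.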

As per the scheme definition, let $\mathbb{H} = \left\{ (z_j)_{j \in \mathbb{Z}_M^*} : z_{-j} = \overline{z_j}, \ \forall j \in \mathbb{Z}_M^* \right\} \subset \mathbb{C}^{\phi(M)}$, where $\phi$ is Euler's totient function, and let $T$ be a subgroup of the multiplicative group $\mathbb{Z}_M^*$ satisfying $\mathbb{Z}_M^*/T = \{\pm 1\}$. The input of the scheme is a plaintext polynomial, an element of the cyclotomic ring $R = \mathbb{Z}[X]/(\Phi_M(X))$, which is mapped to a vector of complex numbers via the canonical embedding, a ring homomorphism. Before rounding to integer coefficients, the vector is multiplied by a scaling factor $\Delta \geq 1$, which preserves the precision of the plaintext after encoding. The encoding procedure is almost the inverse of decoding, except that it requires a round-off algorithm for discretization. An important characteristic of HEANN that makes it suitable for neural networks is that the bit size of the ciphertext modulus grows only linearly, rather than exponentially, with the depth of the circuit being evaluated, which allows networks of bounded depth to be evaluated without bootstrapping.
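A minimal sketch of the encoding and decoding maps, assuming $M$ is a power of two so that $\Phi_M(X) = X^N + 1$ with $N = \phi(M) = M/2$. The slots are ordered by $j = 1, 3, \ldots, M/2 - 1$ rather than by the subgroup $T$ generated by 5; that choice only matters for rotations, which our scheme does not use. `M` and `DELTA` are illustrative values.

```python
import numpy as np

M = 16                  # cyclotomic index (assumed power of two)
N = M // 2              # ring dimension N = phi(M); Phi_M(X) = X^N + 1
DELTA = 2**20           # scaling factor applied before rounding (illustrative)

# Primitive M-th roots of unity zeta^j for odd j, i.e. j in Z_M^*.
ZETA = np.exp(2j * np.pi / M)
ROOTS = ZETA ** np.arange(1, M, 2)
# Row k evaluates a coefficient vector at ROOTS[k]: the canonical embedding.
VANDERMONDE = np.vander(ROOTS, N, increasing=True)

def encode(z):
    """Map z in C^(N/2) to the integer coefficients of a polynomial in Z[X]/(X^N + 1)."""
    z = np.asarray(z, dtype=complex)
    # Extend to H: ROOTS[k] and ROOTS[N - 1 - k] are complex conjugates.
    full = np.concatenate([z, np.conj(z[::-1])])
    coeffs = np.linalg.solve(VANDERMONDE, full)             # inverse embedding; real up to fp error
    return np.round(DELTA * coeffs.real).astype(np.int64)   # round-off for discretization

def decode(m):
    """Evaluate the polynomial at the roots and divide by the scaling factor."""
    values = VANDERMONDE @ (m / DELTA)
    return values[: N // 2]

z = np.array([1.5 + 0.5j, -2.0, 0.25j, 3.0 - 1.0j])
m = encode(z)
print(m)
print(np.max(np.abs(decode(m) - z)))   # rounding error of order N / (2 * DELTA)
```

The conjugate-symmetric extension is what guarantees real coefficients, and the only lossy step is the rounding in `encode`, matching the round-off algorithm described above.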

In our version, the inputs to the neural network are encoded into plaintext polynomials $f(X) \in \mathbb{Z}_t[X]/(\Phi_M(X))$ derived during backpropagation, corresponding to vectors of complex numbers $\mathbf{z} = (z_j)_{j \in \mathbb{Z}_M^*} \in \mathbb{H}$ that map to the embedding inputs of the underlying model. These polynomials are encrypted with the public key, and the neural network performs the forward pass on the encrypted data, using homomorphic operations to compute the activations $g(X) = h(f(X))$, where $h$ denotes the function computed by the network. The activations are then decrypted with the secret key to obtain plaintext polynomials, which are decoded to obtain the predictions.

##### 4.2. Differentiable Soft Argmax Layer

The softmax function was introduced by Bridle (1990) as a way to convert raw network outputs (logits) into probabilities, and it has become a foundational technique in neural networks. When applied to an $n$-dimensional input vector, it rescales the input so that the elements of the $n$-dimensional output form a probability distribution. Given an input vector $\mathbf{x} = (x_1, \dots, x_n) \in \mathbb{R}^n$, it is formally defined as $\sigma : \mathbb{R}^n \rightarrow (0,1)^n$, where $n > 1$ and

$$\sigma(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
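In floating point, this definition is usually evaluated after subtracting $\max_j x_j$ from every logit, which leaves $\sigma$ unchanged (it is invariant to adding the same constant to all inputs) but prevents overflow in the exponential. The sketch below is plaintext only; under HEANN the exponential would itself have to be replaced by a polynomial approximation, and the maximum is not directly available on ciphertexts.

```python
import numpy as np

def softmax(x):
    """Softmax along the last axis; subtracting the max avoids overflow in exp."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

p = softmax([2.0, 1.0, 0.1])
print(p, p.sum())
print(softmax([1000.0, 1001.0]))   # finite, unlike a naive exp(1000)
```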

The logits can be calibrated before they are passed to the softmax function, in order to control the confidence or randomness of the predictions, by dividing them by a scalar hyperparameter $T > 0$, called the temperature by Guo *et al.* (2017), i.e. computing $\sigma(x_i / T)$. Adjusting $T$ makes the softmax outputs better reflect true likelihoods without altering the model's accuracy, since dividing all logits by the same positive constant does not change which one is largest. In general, this is a post-processing technique applied to the logits after the neural network has been trained.
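The effect of the temperature can be checked directly: dividing the logits by $T$ changes the confidence of the prediction but never the predicted class. A minimal sketch, with illustrative logits and temperatures:

```python
import numpy as np

def softmax(x):
    shifted = x - np.max(x, axis=-1, keepdims=True)   # numerical stability
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

def temperature_softmax(logits, T):
    """Temperature scaling: divide the logits by T > 0 before the softmax."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    return softmax(np.asarray(logits, dtype=float) / T)

logits = np.array([3.0, 1.0, 0.2])
for T in (0.5, 1.0, 2.0):
    probs = temperature_softmax(logits, T)
    print(T, np.round(probs, 3), int(np.argmax(probs)))
```

Lower temperatures sharpen the distribution towards the largest logit and higher temperatures flatten it, while the argmax stays the same.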

By moving the temperature-scaled softmax from post-processing into the model itself, ahead of an argmax-like operation, we can approximate the argmax function in the encrypted domain. This is done by applying the softmax function to the logits in the encrypted domain and then, as a layer in the model, reducing the resulting probabilities to a differentiable estimate of the argmax instead of applying a hard argmax. This layer is differentiable, allowing the network to be trained using backpropagation.

To compute this Soft-Argmax output, we take the sum of a reshaped and broadcast index vector $i$, ranging from 1 to $n$, weighted by the softmax probabilities of the output logits:

$$\text{Soft-Argmax}(\mathbf{x}) = \sum_{i=1}^{n} \sigma(x_i) \cdot i$$

This layer thus approximates the argmax function in the encrypted domain, allowing the neural network to perform prediction on encrypted data without any decryption. Because it is differentiable, the network can be trained using backpropagation, and beyond the softmax itself it adds little computational overhead: $n$ multiplications by the plaintext constants $i$ and $n - 1$ additions.
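The Soft-Argmax layer can be sketched in plaintext NumPy as the probability-weighted sum of the 1-based index vector, broadcast over a batch. The temperature argument `T` is an assumption that connects this layer to the temperature scaling discussed above; as $T$ decreases, the output approaches the exact argmax (plus one, because the indices start at 1).

```python
import numpy as np

def softmax(x, T=1.0):
    z = np.asarray(x, dtype=float) / T
    z = z - np.max(z, axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def soft_argmax(logits, T=1.0):
    """Expected 1-based class index under softmax(logits / T)."""
    logits = np.asarray(logits, dtype=float)
    n = logits.shape[-1]
    idx = np.arange(1, n + 1, dtype=float)            # index vector i = 1..n
    return np.sum(softmax(logits, T) * idx, axis=-1)  # broadcasts over batch dims

batch = np.array([[0.1, 4.0, 0.3],
                  [2.5, 0.0, 0.2]])
print(soft_argmax(batch))            # soft, approximately 1-based argmax per row
print(soft_argmax(batch, T=0.1))     # close to the exact 1-based argmax
print(np.argmax(batch, axis=-1) + 1)
```

The output is a real number rather than an integer, so two tied maximal logits yield the midpoint of their indices (for example 1.5 for a tie between classes 1 and 2) instead of one of them.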

### 5. Training

##### 5.1. Training data and batching
##### 5.2. Hardware and scheduling
##### 5.3. Optimizer

### 6. Performance metrics

### 7. Experiments

### 8. Conclusion