# TITLE: An end-to-end homomorphically encrypted neural network

### 1. Introduction

Nobody will care about privacy anymore, not because it’s unimportant, but because it will be guaranteed by design.

### 2. Background

### 3. Homomorphic encryption

The objective of using homomorphic encryption ($\varepsilon$) is that, given an algorithm ${\text{Evaluate}_\varepsilon}$ and a valid public key $\text{pk}$, a computation can be performed on any circuit $\mathcal{C}$ (*i.e.*, a collection of logic gates used to compute a function) and any ciphertexts $\psi_i \leftarrow \text{Encrypt}_{\varepsilon}(\text{pk}, \pi_i)$, such that the output of the computation is a ciphertext $\psi \leftarrow \text{Evaluate}_{\varepsilon}(\text{pk}, \mathcal{C}, \psi_1, \dots, \psi_t)$ satisfying $\text{Decrypt}_{\varepsilon}(\text{sk}, \psi) = \mathcal{C}(\pi_1, \dots, \pi_t)$ for a valid secret key $\text{sk}$, as first described by (Rivest, 1978). The first viable construction of a fully homomorphic encryption scheme was proposed by (Gentry, 2009) and may be broken down into three major conceptual steps:
  - a *"somewhat homomorphic"* scheme, supporting evaluation of low-degree polynomials on encrypted data;
  - a *"squashing"* decryption mechanism, responsible for expressing outputs as low-degree polynomials supported by the scheme, and;
  - a *"bootstrapping"* transformation, a self-referential property that makes the depth of the decryption circuit shallower than what the scheme can handle.

Gentry's main insight was to use bootstrapping in order to obtain a scheme that could evaluate polynomials of high-enough degrees while keeping the decryption procedure expressed as polynomials of low-enough degrees, so that the degrees of the polynomials evaluated by the scheme could surpass the degrees of the polynomials decrypted by the scheme (Gentry, 2011). This scheme used **"ideal lattices"**, corresponding to ideals in polynomial rings, to perform homomorphic operations. The reason for this is that ideal lattices inherit natural addition and multiplication properties from the ring, which allows for homomorphic operations to be performed on encrypted data. We defer the discussion on ideal lattices to the next section, since these operations are performed in the context of neural networks in our case.

Several implementations of homomorphic encryption schemes have been developed since, including DGHV (Dijk, 2010), BGV (Brakerski, 2011) and BFV (Fan-Vercauteren, 2012), with varying levels of security and efficiency. In 2016, Cheon *et al.* introduced HEANN (a.k.a. CKKS), which is a variant of the BFV scheme designed for computations on real numbers. This scheme is also particularly well-suited for neural network evaluation, where each layer is composed of several matrix-vector multiplications and activation functions whose inputs can be approximated by encrypted floating-point polynomials.
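To make the homomorphic property concrete, the following is a minimal sketch of a symmetric, DGHV-style scheme over the integers. It is a toy under stated assumptions: the function names, the key size and the noise size are illustrative choices, the parameters are far too small to be secure, and it is not the scheme used in this paper.

```python
import random

def keygen(bits: int = 64) -> int:
    """Secret key: a random odd integer p with its top bit set (toy size)."""
    return random.getrandbits(bits) | (1 << (bits - 1)) | 1

def encrypt(p: int, bit: int, noise_bits: int = 8, q_bits: int = 128) -> int:
    """c = bit + 2*r + p*q; the small term bit + 2*r is the noise masked by p*q."""
    r = random.getrandbits(noise_bits)
    q = random.getrandbits(q_bits)
    return bit + 2 * r + p * q

def decrypt(p: int, c: int) -> int:
    """Correct only while the accumulated noise stays below p."""
    return (c % p) % 2

p = keygen()
for b1 in (0, 1):
    for b2 in (0, 1):
        c1, c2 = encrypt(p, b1), encrypt(p, b2)
        assert decrypt(p, c1 + c2) == b1 ^ b2   # ciphertext addition acts as XOR
        assert decrypt(p, c1 * c2) == b1 & b2   # ciphertext multiplication acts as AND
```

Each multiplication roughly squares the noise term, so only circuits of limited depth decrypt correctly; this is the limitation that bootstrapping was introduced to remove.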

Following the methodology adopted by (Gentry, 2011), we derive a scheme from HEANN capable of performing inference on encrypted data when applied to neural networks. The main idea is to encode a message $\pi$ into a plaintext polynomial $m$ and encrypt it using a public key $\text{pk}$ to obtain a ciphertext $\text{Enc}(m, \nu)$. In this scheme, $\nu$ stands for the *noise parameter* attached to each polynomial $m$ such that the noise is less than some threshold $\Nu \gg \nu$, as proposed by (Gentry, 2009). This *noise budget* is a critical parameter in the scheme and is inserted so that security rests on hardness assumptions such as LWE, RLWE and NTRU (citations). In this sense, it determines the security level of the encryption, and the scheme is designed to ensure that the noise budget is not exceeded at any point during the computation.

$(m, \nu) \xleftarrow{\text{Encode}} \pi$ 

As in (Cheon, 2016), "it is inevitable to encrypt a vector of multiple plaintexts in a single ciphertext for efficient homomorphic computation" (p.3), the plaintext space usually being a cyclotomic ring $\mathbb{Z}_t[X]/(\Phi_M(X))$ of finite characteristic. Thus, after being encoded into the native plaintext space, *i.e.* through the canonical embedding map, the plaintext polynomial ($m, \nu$) is encrypted using the public key $\text{pk}$ to obtain the ciphertext $\text{Enc}(m, \nu)$. We use a *probabilistic public key encryption* scheme, so the encryption function is randomized: the same plaintext polynomial $m$ can be encrypted into different ciphertexts $\text{Enc}(m, \nu)$ for different instances of the model. Encryption can be performed by anyone with access to the public key $\text{pk}$, enabling the model to consume data encrypted by any party.

$\text{Enc}(m, \nu) \xleftarrow{\text{pk}} (m, \nu)$

The encrypted data can then be used to perform computations as it is fed into the neural network. Considering the approximate arithmetic nature of our base scheme, the *noise parameter* is treated as part of the error during approximate downstream matrix operations. In this sense, consider that the encryption $\mathbf{c}$ of message $\pi$ by the public key $\text{pk}$ will have a decryption structure of the form $\langle \mathbf{c}, \text{sk} \rangle = m + \nu$, where $\nu$ must be large enough to mask any significant features of message $\pi$ while not exceeding the noise budget $\Nu$. Therefore, for a noise budget of $0 \leq \nu \leq \Nu$, each element of the output vector of the base neural network contains an embedded Gaussian distributed error $\nu'$ such that $|\nu'| \leq \Nu$.

$\text{Enc}(m', \nu') \xleftarrow{f(\cdot)} \text{Enc}(m, \nu)$

The encrypted output can then be decrypted using a secret key $\text{sk}$ to obtain the plaintext polynomial $m'$. For each instance of the network, the decryption function is deterministic, such that the same ciphertext $\text{Enc}(m', \nu')$ will always decrypt to the same plaintext polynomial $m'$. The decryption function is also designed to ensure that the noise parameter $\nu'$ is removed from the decrypted plaintext polynomial $m'$ during decryption.

$m' \xleftarrow{\text{sk}} \text{Enc}(m', \nu')$

Finally, the plaintext polynomial $m'$ is decoded to obtain the message $\pi'$, which represents the output prediction of the neural network for the input message $\pi$.

$\pi' \xleftarrow{\text{Decode}} m'$
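The Encode, Enc, $f(\cdot)$, Dec, Decode pipeline above can be sketched with a toy LWE-style scheme in which the noise is kept as part of the approximate result. Everything here is an assumption made for illustration: the scheme is symmetric-key rather than public-key, the modulus `Q`, scale `DELTA` and dimension `N` are insecure toy values, and an integer-weighted sum stands in for the network $f(\cdot)$.

```python
import random

Q = 1 << 40        # ciphertext modulus (toy value)
DELTA = 1 << 20    # scaling factor used by encode/decode
N = 16             # secret dimension (far too small to be secure)

def keygen():
    return [random.choice((-1, 0, 1)) for _ in range(N)]

def encode(x: float) -> int:
    return round(x * DELTA)

def decode(m: int) -> float:
    return m / DELTA

def encrypt(sk, m: int):
    a = [random.randrange(Q) for _ in range(N)]
    e = random.randint(-4, 4)   # small noise that survives decryption
    b = (sum(ai * si for ai, si in zip(a, sk)) + m + e) % Q
    return a, b

def decrypt(sk, ct) -> int:
    a, b = ct
    v = (b - sum(ai * si for ai, si in zip(a, sk))) % Q
    return v - Q if v >= Q // 2 else v   # centred representative, equal to m + e

def linear(weights, cts):
    """Integer-weighted sum of ciphertexts, standing in for f(.)."""
    a = [sum(w * ct[0][i] for w, ct in zip(weights, cts)) % Q for i in range(N)]
    b = sum(w * ct[1] for w, ct in zip(weights, cts)) % Q
    return a, b

sk = keygen()
xs, ws = [0.5, -1.25, 3.0], [2, 1, -3]
cts = [encrypt(sk, encode(x)) for x in xs]
y = decode(decrypt(sk, linear(ws, cts)))
expected = sum(w * x for w, x in zip(ws, xs))
assert abs(y - expected) < 1e-4   # the noise only perturbs low-order digits
```

Because the noise is divided by `DELTA` at decoding time, it shows up as a small approximation error rather than as a decryption failure, which is the behaviour the approximate-arithmetic setting relies on.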

The HEANN scheme consists of five algorithms ($KeyGen$, $Enc$, $Dec$, $Add$, $Mult$), of which we use only the first three ($KeyGen$, $Enc$, $Dec$), since the neural network performs additions and multiplications internally. The $KeyGen$ algorithm generates the public and secret keys, the $Enc$ algorithm encrypts an encoded input plaintext polynomial and the $Dec$ algorithm decrypts an output ciphertext into an encoded output plaintext polynomial. The parameters adopted are based on the original HEANN scheme and follow the notation and definitions established in the (Standard, 2018):
- $\text{ParamGen}(\lambda, \text{PT}, \text{K}, \text{B}) \rightarrow \text{Params}$: the parameter generation algorithm, used to instantiate various parameters used in the core HE algorithms.
  - $\lambda$ is the desired security level parameter, *e.g.* 128-bit security ($\lambda = 128$) or 256-bit security ($\lambda = 256$).
  - $\text{PT}$ is the underlying plaintext space of Gaussian distributed numbers mapping the message $\pi$ to a polynomial $m$.
  - $\text{K}$ is the number of dimensions of the vectors to be encrypted ($\text{V}_1, \ldots, \text{V}_k$).
  - $\text{B}$ is an indirect parameter that determines the key sizes, ciphertext sizes, and complexity of the evaluation procedures.

- $KeyGen(\lambda) \rightarrow (\text{pk}, \text{sk})$: Generates a public key $\text{pk}$ and a secret key $\text{sk}$, where $\lambda$ is the security parameter.
  - Given the security parameter $\lambda$, choose a random integer $h$, a byte length $\text{P}$ for the key, and fix the number of iterations $\text{I}$ to ensure sufficient computational cost.
  - Derive the key as: $k \leftarrow \mathcal{f}(\lambda, h, \text{P}, \text{I})$ applying the key derivation function iteratively.

- $Enc(\text{pk}, m) \rightarrow \text{Enc}(m, \nu)$: Encrypts a plaintext polynomial $m$ using the public key $\text{pk}$ to obtain a ciphertext $\text{Enc}(m, \nu)$.
  - For an $n$-dimensional vector $m = (m_1, \ldots, m_n)$ where $m_i \in \text{PT}$, compute the vector $(m, \nu) = (m_1, \nu_1, \ldots, m_n, \nu_n)$ where $n = \text{K}$ and $i_{len} = \text{B}$ using the public key $\text{pk}$ to obtain a ciphertext $\text{Enc}(m, \nu)$.
- $Dec(\text{sk}, \text{Enc}(m', \nu')) \rightarrow m'$: Decrypts a ciphertext $\text{Enc}(m', \nu')$ using the secret key $\text{sk}$ to obtain a plaintext polynomial $m'$.
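The iterated key derivation described for $KeyGen$ can be sketched as follows. The text does not name the function $\mathcal{f}$, so the use of PBKDF2-HMAC-SHA256 is an assumption, as are the helper name `derive_key`, the use of $h$ as a salt, and the choice of a random $\lambda$-bit seed as the secret input.

```python
import hashlib
import secrets

def derive_key(lam: int = 128, key_len: int = 32, iterations: int = 100_000):
    """Derive a key of key_len bytes (P) using `iterations` rounds (I)."""
    seed = secrets.token_bytes(lam // 8)   # lam-bit secret seed (assumption)
    h = secrets.token_bytes(16)            # random value h, used here as the salt
    k = hashlib.pbkdf2_hmac("sha256", seed, h, iterations, dklen=key_len)
    return k, h

k, h = derive_key()
assert len(k) == 32
```

A larger iteration count raises the cost of each derivation attempt, which is the role the text assigns to $\text{I}$.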

Similar to Cheon *et al.* (2016, p.9), we do not use a separate plaintext space from an inserted error, so the output $m' = f(m, \nu)$ is slightly different from the output for the original message $m$, but the error is considered an approximate value for approximate computations and is eventually corrected by the activation functions embedded in the neural network. We also do not use Galois keys or bootstrapping, as we are not performing any rotations on the ciphertexts and we do not need to perform dimensional reductions in the encrypted domain.

### 4. Neural network architecture

Neural network interpretability is a long-known and mostly unsolved problem in the field of artificial intelligence (Zeiler, 2014; Karpathy, 2016), with networks sometimes being referred to as *black boxes* (Fong, 2017). Therefore, input/output privacy-preserving neural networks allow the use of models on sensitive data without compromising the privacy of the data.

In general, neural networks can be considered a regression function that outputs data based on elaborate relationships between high-dimensional input data. As observed by (Marcolla, 2022), privacy-preserving neural networks tend to suffer from high computational complexity and low efficiency, as the computational complexity of training neural networks is generally high. By avoiding the need to train on encrypted data, we can reduce the computational complexity of the model and improve its efficiency.

We propose an architecture that leverages a modified version of HEANN to encrypt encoded inputs and decrypt outputs, while introducing a new layer that performs the activation of raw logits in the encrypted domain, which we call the Differentiable Soft Argmax layer. This layer is designed to approximate a soft argmax function in the encrypted domain, allowing the neural network to perform prediction without leaking any intermediate values. By being differentiable, it also allows the use of backpropagation to calibrate the logits before ingestion by the softmax function, adding additional noise $\nu$ to the intermediate output of the base model, while effectively keeping it below the *noise budget* $\Nu \gg \nu$.

<div align="center">
  <img src="./arch-3.jpg" width="700"/>
</div>

##### 4.1. Rings, Ideals and Lattices

The requirement of using **ideal lattices** comes from the necessity of using encryption schemes whose decryption algorithms have low circuit complexity, generally represented by matrix-vector multiplication, with the important caveat that "code-based constructions would also represent an interesting possibility" (Gentry, 2009, p.11).

A ring is a set $R$ closed under two binary operations $+$ and $\times$ and with an additive identity $0$ and a multiplicative identity $1$. An ideal $I$ of a ring $R$ is a subset $I \subseteq R$ such that $\sum_{j=1}^{t} i_j + i'_j \in I$ and $\sum_{j=1}^{t} i_j \times r_j \in I$ for $i_1, \dots, i_t \in I$ and $r_1, \dots, r_t \in R$. As proposed by Gentry, the public key $\text{pk}$ contains an ideal $I$ and a plaintext space $\mathcal{P}$, consisting of cosets of the ideal $I$ in the ring $R$, while the secret key $\text{sk}$ corresponds to a short ideal $I$ in $R$. To encrypt $\pi \in \mathcal{P}$, the encrypter performs $\psi \xleftarrow{{R}} \pi + I$, where $I$ represents the noise parameter. The decrypter then performs $\pi \xleftarrow{{R}} \psi - I$ to decrypt the ciphertext $\psi$. To perform add and multiply operations on the encrypted data, ring homomorphisms are used on the plaintext space $\mathcal{P}$:

$$\text{Add}(\text{pk}, \psi_1, \psi_2) = \psi_1 + \psi_2 \in (\pi_1 + \pi_2) + I$$

$$\text{Mult}(\text{pk}, \psi_1, \psi_2) = \psi_1 \times \psi_2 \in (\pi_1 \times \pi_2) + I$$
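As a toy illustration of why cosets of an ideal support $\text{Add}$ and $\text{Mult}$, the sketch below takes $R = \mathbb{Z}$ and $I = 7\mathbb{Z}$. These choices, and the helper names, are assumptions made for readability; the example offers no secrecy at all, since anyone who knows $I$ can reduce modulo it, and it only demonstrates the coset arithmetic.

```python
import random

MODULUS = 7  # generator of the ideal I = 7Z inside the ring R = Z (toy choice)

def encrypt(pi: int) -> int:
    """Pick a random element of the coset pi + I."""
    return pi + MODULUS * random.randint(-1000, 1000)

def decrypt(psi: int) -> int:
    """Reduce modulo I to recover the coset representative in [0, 7)."""
    return psi % MODULUS

psi1, psi2 = encrypt(3), encrypt(5)
assert decrypt(psi1 + psi2) == (3 + 5) % MODULUS   # sum lands in (pi1 + pi2) + I
assert decrypt(psi1 * psi2) == (3 * 5) % MODULUS   # product lands in (pi1 * pi2) + I
```

The product stays in the right coset because $I$ absorbs multiplication by any ring element, which is exactly the defining property of an ideal.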

These ciphertexts are essentially "noisy" lattice vectors, or ring elements, and decrypting them requires knowledge of the basis for a particular lattice (Ajtai, 1996). An *ideal lattice* is formed by embedding a ring ideal into a real $n$-dimensional coordinate space $\mathbb{R}^n$ used to represent the elements of an ideal as vectors (Lyubashevsky, 2010). Because the ring $R$ is discrete and has finite rank $n$, the images of its elements under the embedding form a lattice. Ideal lattices allow ciphertext operations to be performed efficiently using polynomial arithmetic, e.g. Fast Fourier Transform-based multiplication (Regev, 2009).
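The polynomial arithmetic in question can be sketched for the common power-of-two case, where the ring is $\mathbb{Z}_q[X]/(X^n + 1)$. This is a schoolbook $O(n^2)$ version with a hypothetical helper name; FFT- or NTT-based methods compute the same product in $O(n \log n)$.

```python
def negacyclic_mul(a, b, q):
    """Multiply a and b in Z_q[X]/(X^n + 1); coefficients are listed lowest degree first."""
    n = len(a)
    assert len(b) == n
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q   # wrap around using X^n = -1
    return out

# (1 + X) * X^3 = X^3 + X^4 = X^3 - 1 in Z_17[X]/(X^4 + 1)
assert negacyclic_mul([1, 1, 0, 0], [0, 0, 0, 1], 17) == [16, 0, 0, 1]
```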

<div align="center">
  <img src="./ideals.jpg" width="400"/>
</div>
Source: Cheon, 2016

Consider plaintext messages $m_1$ and $m_2$. After encryption using the public key $\text{pk}$, these messages become vectors $\psi_1$ and $\psi_2$, each containing the underlying plaintext, the ideals $\text{I}_1$ and $\text{I}_2$ used to obtain the ciphertexts, and some error $e_1$ and $e_2$ stemming from the basis used to derive the secret key $\text{sk}$. A bit-wise homomorphic operation $\text{Mult}(\text{pk}, \psi_1, \psi_2)$ can then be performed on the ciphertexts to obtain the ciphertext $\psi_1 \times \psi_2$, which contains the product of the plaintexts $m_1 \times m_2$, the ideals $\text{I}_1$ and $\text{I}_2$, and the errors $e_1$ and $e_2$.

##### 4.2. Modified HEANN

HEANN supports approximate addition and multiplication of encrypted messages by truncating a ciphertext into a smaller modulus, which leads to rounding of the plaintext and also adds noise to the message. The noise is added mainly for security reasons and, given the nature of the scheme, it ends up being reduced during computation due to rescaling. As the authors explain (p. 6), "the most important feature of our scheme is the rounding operation of plaintexts". The output of this scheme, therefore, is an approximate value with a predetermined precision.

As per the scheme definition, given the set of complex numbers $\mathbb{H} = \left\{ (z_j)_{j \in \mathbb{Z}_M^*} : z_{-j} = \overline{z_j}, \ \forall j \in \mathbb{Z}_M^* \right\} \subset \mathbb{C}^{\Phi(M)}$ and the subgroup ${T}$ of the multiplicative group $\mathbb{Z}_M^*$ satisfying $\mathbb{Z}_M^*/T = \{\pm 1\}$, the input of the scheme is a plaintext polynomial constituted by elements of a cyclotomic ring ${R} = \mathbb{Z}_t[X]/(\Phi_M(X))$, which is mapped to a vector of complex numbers via an embedding map represented by a ring homomorphism. This transformation enables the preservation of the precision of the plaintext after encoding. The decoding procedure is almost the inverse of the encoding, except for the addition of a round-off algorithm for discretization. An important characteristic of HEANN that makes it suitable for neural networks is that the bit size of ciphertext modulus does not grow exponentially in relation to the circuit being evaluated, which allows for the evaluation of deep neural networks without the need for bootstrapping.
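The encoding and decoding maps can be sketched for the power-of-two case $\Phi_M(X) = X^N + 1$ with $M = 2N$. The values of `N` and `DELTA` and the function names are assumptions, the reduction modulo $t$ is omitted, and the inverse embedding is computed with the direct $O(N^2)$ formula rather than an FFT.

```python
import cmath

N = 8                  # ring degree; Phi_M(X) = X^N + 1 with M = 2N
DELTA = 1 << 20        # scaling factor that preserves precision through rounding

# Primitive M-th roots of unity: zeta_j = exp(i*pi*(2j+1)/N), j = 0..N-1
ROOTS = [cmath.exp(1j * cmath.pi * (2 * j + 1) / N) for j in range(N)]

def encode(slots):
    """Map N/2 complex slots to an integer polynomial of degree < N."""
    assert len(slots) == N // 2
    # Extend to a conjugate-symmetric vector, since conj(zeta_j) = zeta_{N-1-j}.
    z = list(slots) + [complex(s).conjugate() for s in reversed(slots)]
    coeffs = []
    for k in range(N):
        c = sum(zj * (rj ** k).conjugate() for zj, rj in zip(z, ROOTS)) / N
        coeffs.append(round(DELTA * c.real))   # round-off step used for discretization
    return coeffs

def decode(coeffs):
    """Evaluate the polynomial at the first N/2 roots and undo the scaling."""
    return [sum(c * r ** k for k, c in enumerate(coeffs)) / DELTA
            for r in ROOTS[: N // 2]]

slots = [1.5 + 2j, -0.25, 3 - 1j, 0.125j]
recovered = decode(encode(slots))
assert all(abs(u - v) < 1e-4 for u, v in zip(slots, recovered))
```

The only loss comes from rounding each coefficient to an integer, so the recovered slots differ from the inputs by an amount that shrinks as `DELTA` grows.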

In our version, based on Gentry's caveat, the inputs to the neural network are encrypted into the polynomial $\psi = Enc(m, \nu)$, where $m_i \in \text{PT}$, mapping to embedding inputs derived from the underlying model during backpropagation, and $\psi$ is a vector of length $\text{K}$, corresponding to the context window of the model being used as the basis for the interchangeable compute unit. Note that the noise component $\nu$ is added to the plaintext polynomial $m$ during encoding, which makes the scheme satisfy the hardness assumptions required for security.

Also as suggested by (Gentry, 2009, p.6), we use the parameter $\text{B}$ to determine the formatting of the secret key and ciphertexts as inputs and outputs of the network, whose size is a fixed polynomial in the security parameter, meaning the ciphertext size depends only on the security parameter and is independent of the circuit $\mathcal{C}$.

This polynomial is then encrypted using the public key and the neural network performs the forward pass on the encrypted data (${f(\cdot)}$) in order to compute the activations ${g(m, \nu) = h(f(m, \nu))}$ of the network. The output of the network is then decrypted using the secret key to obtain the respective plaintext polynomial, which is then decoded to obtain the prediction.

##### 4.3. Differentiable Soft Argmax Layer

The softmax function was introduced by (Bridle, 1990) as a way to convert raw network outputs (logits) into probabilities, and it became a foundational technique in neural networks. When applied to an $n$-dimensional vector, the softmax rescales its elements so that the output is a normalized probability distribution. Given an input vector $\mathbf{x} = (x_1, \dots, x_n) \in \mathbb{R}^n$, it is formally defined as $\sigma : \mathbb{R}^n \rightarrow (0,1)^n$, where $n > 1$ and

$$\sigma(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

These logits can be calibrated before being ingested by the softmax function by adjusting a hyperparameter defined as temperature (Guo, 2017), effectively raising the entropy of the output distribution without altering the model's accuracy. This hyperparameter applies a single scalar $T \gt 1$ to the logit vector $\mathbf{x}$, so that the softmax is evaluated as $\sigma(x_i / T)$. As $T \rightarrow \infty$, the probability approaches $1/{n}$, which represents maximum uncertainty. In general, this is a post-processing technique that is applied to the logits after the neural network has been trained.
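The effect of the temperature can be checked on plaintext values with a short sketch. The logits and the helper names are illustrative, and nothing here runs in the encrypted domain.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]
p_t1 = softmax(logits, temperature=1.0)
p_t5 = softmax(logits, temperature=5.0)

# The predicted class is unchanged, while the distribution becomes flatter.
assert p_t1.index(max(p_t1)) == p_t5.index(max(p_t5))
assert entropy(p_t5) > entropy(p_t1)
```

Dividing by a positive scalar preserves the ordering of the logits, which is why the accuracy is unaffected while the confidence is softened.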

By using a dedicated differentiable layer, we can learn the temperature using backpropagation and apply the softmax function to the logits in the encrypted domain, after which we can derive the argmax from the resulting probabilities. In order to obtain the final output, we compute the weighted sum of the resulting probabilities against a reshaped and broadcasted index vector $i$ that ranges from 1 to $n$:

$$\text{Soft-Argmax}(\mathbf{x}) = \sum_{i=1}^{n} \sigma(x_i) \cdot i$$

Thus, this layer is designed to approximate the argmax function in the encrypted domain, allowing the neural network to perform prediction on encrypted data without any decryption. The layer is differentiable, allowing the network to be trained using backpropagation, and adds a negligible computational overhead to the model.
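A plaintext sketch of the soft argmax follows; the function name and the sample logits are illustrative, and the encrypted-domain evaluation is not modelled.

```python
import math

def soft_argmax(logits, temperature=1.0):
    """Expected 1-based index under softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return sum((i + 1) * e / total for i, e in enumerate(exps))

# A clearly dominant third logit gives a value close to index 3.
assert abs(soft_argmax([0.1, 0.2, 6.0, 0.3]) - 3.0) < 0.05
# Equal logits give the midpoint of the index range, (n + 1) / 2.
assert abs(soft_argmax([1.0, 1.0, 1.0, 1.0]) - 2.5) < 1e-12
```

The output is a smooth function of the logits, so gradients can flow through it; the closer the softmax is to a one-hot vector, the closer the result is to the hard argmax index.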

### 5. Training

In this section, we outline the overall training methodology adopted for our privacy-preserving neural network, highlighting the design choices that ensure encrypted data can be processed accurately without revealing sensitive information. Our approach comprises two main training phases: (i) a first phase in which the base neural network is trained to learn how to operate with encrypted inputs, and (ii) a fine-tuning phase in which a differentiable soft argmax mechanism is used in conjunction with a temperature parameter.

##### 5.1. Training Data and Batching

For the purpose of this paper, we use the DistilBERT subword tokenizer (Sanh et al., 2019) with a shared vocabulary of approximately 30,000 tokens, ensuring consistent text input processing across different sequences. We train our model on a real-world text classification dataset derived from the Stanford Sentiment Treebank (SST-2) (Socher et al., 2013), containing roughly 67,000 sentences labeled with binary sentiment classes. Before batching, each sentence is encrypted and then grouped according to similar sequence lengths so that each mini-batch contains 32 sentences with a similar number of ciphered tokens. This approach ensures a smooth integration of homomorphic operations while maintaining efficient GPU utilization during the forward and backward passes.
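The length-based grouping can be sketched as follows. The helper name is hypothetical, and sorting by length before slicing is one simple way to obtain batches of similar length, not necessarily the exact procedure used in our pipeline.

```python
def length_bucketed_batches(sequences, batch_size=32):
    """Sort sequence indices by length, then cut them into consecutive mini-batches."""
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    return [order[i : i + batch_size] for i in range(0, len(order), batch_size)]

# Toy stand-ins for encrypted token sequences of varying length.
sequences = [[0] * ((i * 7) % 50 + 1) for i in range(100)]
batches = length_bucketed_batches(sequences)
assert sum(len(b) for b in batches) == len(sequences)
assert all(len(b) <= 32 for b in batches)
```

Grouping by length keeps the amount of padding within each mini-batch small, which is what makes the batched tensor operations efficient.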

##### 5.2. Hardware and Scheduling

All experiments were conducted using NVIDIA A100 GPUs, allowing for efficient parallelization of the batched tensor operations. We first trained our base neural network for 200 epochs using both an AdamW optimizer and automatic mixed precision (AMP). Each epoch involves a forward and backward pass over all training samples, grouped into mini-batches of 32 encrypted sequences. With both the optimizer and AMP activated, a single step (i.e., processing one batch) typically took about 0.06 seconds on average, culminating in roughly 2 minutes per epoch over 2,105 steps. This schedule allowed the network to converge in approximately 6.5 hours of training time for the full 200 epochs.

##### 5.3. Optimizer and Mixed Precision

The training phase of our approach relies on an AdamW optimizer with a conservative learning rate of $5 \times 10^{-6}$. AdamW adaptively scales gradient steps by considering the magnitudes of recent updates, while also adding weight decay to prevent explosive parameter growth. This combination of adaptivity and regularization proves essential when dealing with homomorphically encrypted data, where noise might obscure certain feature distinctions. The optimizer not only aids in stabilizing early training by adjusting learning rates in response to gradient feedback but also ensures that the model retains good generalization properties.

Despite these advantages, AdamW carries an inherent computational overhead compared to simpler optimizers. By maintaining running averages of both gradients and their second moments, AdamW must continuously update additional parameters during each backpropagation step. The result is a measurable increase in processing time per training step. For instance, experiments indicate that going from no optimizer to AdamW increases the step time from around 0.05 seconds to about 0.06 seconds when mixed precision is enabled, and from about 0.28 seconds to 0.29 seconds under full precision. These increments, although modest on a per-step basis, can accumulate over thousands of steps per epoch. Nevertheless, the adaptive nature of AdamW often outweighs its time cost, particularly in complex tasks requiring robust gradient management in noisy, privacy-preserving settings.
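The extra state that AdamW maintains can be seen in a minimal pure-Python sketch of a single update step. Only the learning rate comes from our setup; the values of `beta1`, `beta2`, `eps` and `weight_decay` are commonly used defaults and are assumptions here.

```python
import math

def adamw_step(params, grads, m, v, t, lr=5e-6, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One in-place AdamW update; m and v are the running first and second moments."""
    for i, g in enumerate(grads):
        params[i] *= 1.0 - lr * weight_decay          # decoupled weight decay
        m[i] = beta1 * m[i] + (1.0 - beta1) * g       # running average of gradients
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g   # running average of squared gradients
        m_hat = m[i] / (1.0 - beta1 ** t)             # bias correction at step t >= 1
        v_hat = v[i] / (1.0 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params, m, v

params, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
adamw_step(params, [0.5, -0.1], m, v, t=1)
```

The two moment lists have the same size as the parameter list, which is the source of the additional memory traffic and per-step time discussed above.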

To mitigate the overall computational burden, we incorporate Automatic Mixed Precision (AMP), which reduces many core calculations from 32-bit to 16-bit floating-point arithmetic. Modern GPUs feature specialized hardware units that handle half-precision operations at substantially higher throughput, a property that significantly accelerates matrix multiplications within neural layers. Furthermore, AMP automatically identifies sensitive operations that might be prone to numerical instability and leaves them in full precision. This approach maintains model accuracy while decreasing training time. In practice, the introduction of AMP reduces step times from about 0.29 seconds down to about 0.06 seconds when combined with the AdamW optimizer, demonstrating that half-precision arithmetic can more than compensate for the increased overhead from adaptive updates.

In Table 1, we summarize the average step times for four configurations: (1) no optimizer with AMP, (2) AdamW optimizer with AMP, (3) no optimizer without AMP, and (4) AdamW optimizer without AMP. These measurements illustrate how both the optimizer and mixed precision affect runtime.

<div align="center">

**Table 1**: Comparison of Average Step Times Under Different Training Configurations

| Configuration               | Avg. Step Time (s) |
|-----------------------------|--------------------|
| No Optimizer, AMP          | 0.0537             | 
| Optimizer (AdamW), AMP     | 0.0620             | 
| No Optimizer, No AMP       | 0.2845             | 
| Optimizer (AdamW), No AMP  | 0.2915             | 

</div>
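The relative effects in Table 1 can be derived directly from the measured step times. The short script below only restates those four measurements; the dictionary keys are labels chosen for this example.

```python
step_time = {
    ("none", "amp"): 0.0537,
    ("adamw", "amp"): 0.0620,
    ("none", "fp32"): 0.2845,
    ("adamw", "fp32"): 0.2915,
}

amp_speedup = step_time[("adamw", "fp32")] / step_time[("adamw", "amp")]
overhead_amp = step_time[("adamw", "amp")] / step_time[("none", "amp")] - 1.0
overhead_fp32 = step_time[("adamw", "fp32")] / step_time[("none", "fp32")] - 1.0

print(f"AMP speedup with AdamW:     {amp_speedup:.2f}x")
print(f"AdamW overhead with AMP:    {overhead_amp:.1%}")
print(f"AdamW overhead without AMP: {overhead_fp32:.1%}")
```

With these numbers, AMP yields a speedup of roughly 4.7x, while AdamW adds about 15% per step under AMP and about 2.5% under full precision.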

##### 5.4. Fine-Tuning the Temperature Parameter

To train this layer effectively, we freeze all previous parameters in order to preserve the already learned representations and optimize only the temperature via a specialized loss function. In many classification tasks, we find that binary cross-entropy or a closely related objective improves the probability estimates at the final stage, ensuring that the calibrated logit space aligns with the desired confidence measure. Because the temperature is a single (or low-dimensional) parameter, this secondary optimization converges swiftly.
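A minimal sketch of this secondary optimization is given below. The logits and labels are made-up values standing in for frozen model outputs, the initial temperature of 2.0 follows the baseline configuration, and the finite-difference gradient is a simplification used in place of automatic differentiation.

```python
import math

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of softmax(logits / T)."""
    total = 0.0
    for row, y in zip(logits, labels):
        scaled = [x / temperature for x in row]
        peak = max(scaled)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
        total += log_z - scaled[y]
    return total / len(labels)

def fit_temperature(logits, labels, init=2.0, lr=0.1, steps=200, h=1e-4):
    """Gradient descent on log T only; the frozen logits never change."""
    log_t = math.log(init)
    for _ in range(steps):
        grad = (nll(logits, labels, math.exp(log_t + h))
                - nll(logits, labels, math.exp(log_t - h))) / (2 * h)
        log_t -= lr * grad
    return math.exp(log_t)

# Hypothetical frozen logits: confident predictions, one of them wrong.
logits = [(4.0, -4.0), (3.0, -3.0), (-5.0, 5.0), (2.0, -2.0), (-3.0, 3.0), (4.0, -4.0)]
labels = [0, 0, 1, 1, 1, 0]
t_fit = fit_temperature(logits, labels)
assert nll(logits, labels, t_fit) < nll(logits, labels, 2.0)
```

Optimizing $\log T$ rather than $T$ keeps the temperature positive without needing an explicit constraint.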

### 6. Performance metrics

We evaluated our homomorphically encrypted neural network on the SST-2 validation set (Socher et al., 2013) and compared its performance to a publicly reported DistilBERT baseline on the same benchmark. Table 2 lists the core metrics for both our encrypted model and DistilBERT. While DistilBERT achieves near-perfect performance, our encrypted approach necessarily operates under more stringent constraints. Encrypted data introduce both computational overhead and additional noise, factors that can modestly reduce performance relative to fully unencrypted pipelines.

<div align="center">

**Table 2**: Comparison of Model Performance on SST-2 Validation Set

| Metric    | Encrypted Model  | DistilBERT (SST-2) |
|-----------|------------------|---------------------|
| Accuracy  | 0.7993           | 0.989              |
| Precision | 0.8009           | 0.989              |
| Recall    | 0.8063           | 0.989              |
| F1-Score  | 0.8036           | 0.989              |

</div>
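As a consistency check on the encrypted model's column, the F1-score follows from the reported precision $P$ and recall $R$ through the usual harmonic mean:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.8009 \times 0.8063}{0.8009 + 0.8063} \approx 0.8036$$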

### 7. Experiments

| Variation | Learning Rate (Main) | Batch Size | Max Length | # Epochs | Temp Init | Noise Budget | Accuracy | F1 | AUROC | Comments |
|-----------|----------------------|------------|------------|----------|-----------|--------------|----------|----|-------|----------|
| **(A)** | $5 \times 10^{-6}$ | 32 | 512 | 20 | 2.0 | Standard | 0.7993 | 0.8036 | 0.7992 | Baseline (current setup) |
| **(B)** | $1 \times 10^{-5}$ | 32 | 512 | 20 | 2.0 | Standard | [TO GEN] | [TO GEN] | [TO GEN] | Testing a higher learning rate |
| **(C)** | $5 \times 10^{-6}$ | 64 | 512 | 20 | 2.0 | Standard | [TO GEN] | [TO GEN] | [TO GEN] | Doubling batch size |
| **(D)** | $5 \times 10^{-6}$ | 32 | 256 | 20 | 2.0 | Standard | [TO GEN] | [TO GEN] | [TO GEN] | Shorter max length to reduce memory usage |
| **(E)** | $5 \times 10^{-6}$ | 32 | 512 | 20 | **5.0** | Standard | [TO GEN] | [TO GEN] | [TO GEN] | Testing a higher initial temperature for Soft Argmax |
| **(F)** | $5 \times 10^{-6}$ | 32 | 512 | 20 | 2.0 | **Extended** | [TO GEN] | [TO GEN] | [TO GEN] | Increasing noise budget in the encryption to see if it reduces error, etc. |

### 8. Conclusion

Future work:

- In his seminal thesis, Gentry highlights that constructing an efficient homomorphic encryption scheme without using bootstrapping, or using some relaxations of it, is an interesting open problem (p.9). At least in the context of neural networks, that is what we have accomplished in this work. In any case, we consider adding bootstrapping and a recrypt algorithm (Gentry, 2009, p. 8) an interesting proposition.
- blockchain and front running
