# FedEraser - First Attempt

FedEraser is the first published federated unlearning methodology. 

The basic idea of FedEraser is to trade central-server storage for faster construction of the unlearned model: FedEraser reconstructs the unlearned model from the clients' historical parameter updates, which the central server retains during FL training.

Since the retained updates are derived from a global model that contains the influence of the target client's data, they have to be calibrated to decouple that information before they can be used for unlearning. Client updates indicate the direction in which the parameters of the global model need to change to fit the training data. FedEraser therefore calibrates the retained client updates with only a few rounds of calibration training, which approximate the direction the updates would have taken without the target client, and then promptly constructs the unlearned model from the calibrated updates.

## Architecture of FL

In a typical FL architecture, there are $K$ federated clients with the same data structure and feature space that collaboratively train a unified DL model under the coordination of a central server. The central server organizes the model training process by repeating the following steps until training is stopped. At the $i$-th training round ($i \in \{1,2,\cdots,E\}$, where $E$ is the number of training rounds), the global model $\mathcal{M}$ is updated as follows:

1. Federated clients download the current global model $\mathcal{M}^i$ and the training setting from the central server.

2. Each client $C_k\ (k \in \{1,2,\cdots, K\})$ trains the downloaded model $\mathcal{M}^i$ on its local data $D_k$ for $E_{loc}$ rounds based on the training setting, and then computes an update $U_k^i$ with respect to $\mathcal{M}^i$.

3. The central server collects all the updates $U^i=\{U_1^i,U_2^i,\cdots,U_K^i\}$ from the $K$ clients.

4. The central server updates the global model on the basis of the aggregation of the collected updates $U^i$, thereby obtaining an updated model $\mathcal{M}^{i+1}$ that will play the role as the global model for the next training round.

When the termination criterion has been satisfied, the central server will stop the above iterative training process and get the final FL model $\mathcal{M}$.
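The four steps above can be sketched as a small simulation. This is only an illustrative sketch, not the authors' implementation: the model is a flat list of floats, `local_update` is a hypothetical toy stand-in for local training, and weighting the aggregation by local data size is an assumption.

```python
from typing import List, Tuple

Vector = List[float]


def local_update(model: Vector, data: List[float], epochs: int, lr: float = 0.1) -> Vector:
    """Toy stand-in for local training; returns U_k^i = trained model - downloaded model."""
    target = sum(data) / len(data)
    local = list(model)
    for _ in range(epochs):
        # One gradient step on 0.5 * (w - target)^2 for every parameter w.
        local = [w - lr * (w - target) for w in local]
    return [new - old for new, old in zip(local, model)]


def fl_round(model: Vector, client_data: List[List[float]], epochs: int) -> Tuple[Vector, List[Vector]]:
    """Run steps 1-4 once and return (M^{i+1}, collected updates U^i)."""
    # Steps 1-3: every client trains the downloaded model and reports its update.
    updates = [local_update(model, data, epochs) for data in client_data]
    # Step 4: aggregate with data-size weights (an assumption) and apply the result.
    total = sum(len(data) for data in client_data)
    weights = [len(data) / total for data in client_data]
    aggregated = [
        sum(w * u[p] for w, u in zip(weights, updates)) for p in range(len(model))
    ]
    return [m + a for m, a in zip(model, aggregated)], updates


def train(initial: Vector, client_data: List[List[float]], rounds: int, epochs: int) -> Vector:
    model = list(initial)
    for _ in range(rounds):  # termination criterion here: a fixed number of rounds E
        model, _updates = fl_round(model, client_data, epochs)
    return model


final_model = train([0.0], [[1.0, 1.0], [3.0, 3.0]], rounds=200, epochs=2)
```

In this toy setup the global parameter settles between the two clients' local optima, which is the behavior expected from averaging their updates.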

## Design of FedEraser

To efficiently eliminate the influence of the target client's data from the trained global model, we add one extra function to the central server in the current FL architecture, while the original functions of the central server remain unchanged. Specifically, during the training of the global model, the central server retains the clients' updates at regular round intervals, together with the index of the corresponding round. These retained updates can later be calibrated to reconstruct the unlearned global model, rather than retraining from scratch.

For clarity, we denote the round interval as $\Delta t$, and the retained updates of client $C_k$ as $U_k^{t_j}$ ($j \in \{1,2,\cdots, T\}$, where $t_1=1$, $t_{j+1}=t_j+\Delta t$, $T = \lfloor \frac{E}{\Delta t} \rfloor$ is the number of retaining rounds, and $\lfloor \cdot \rfloor$ is the floor function). Thus, the full set of retained updates can be denoted as $U = \{U^{t_1}, U^{t_2}, \cdots, U^{t_T}\}$, where $U^{t_j} = \{U_1^{t_j}, U_2^{t_j}, \cdots, U_K^{t_j}\}$.
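As a quick illustration of this indexing, the following sketch lists the rounds at which updates would be retained. The function name `retained_rounds` is hypothetical and only mirrors the definitions of $t_j$ and $T$ given here.

```python
from typing import List


def retained_rounds(total_rounds: int, interval: int) -> List[int]:
    """Return [t_1, ..., t_T] with t_1 = 1, t_{j+1} = t_j + interval, T = floor(E / interval)."""
    if total_rounds < 1 or interval < 1:
        raise ValueError("total_rounds and interval must be positive")
    count = total_rounds // interval  # T
    return [1 + j * interval for j in range(count)]


print(retained_rounds(10, 2))  # [1, 3, 5, 7, 9]
print(retained_rounds(7, 3))   # [1, 4]
```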

Given the retained updates $U$ and the target client $C_{k_u}$ whose data is required to be removed from the FL model, FedEraser mainly involves the following four steps:
(1) *calibration training*, (2) *update calibrating*, (3) *calibrated update aggregating*, and (4) *unlearned model updating*. The first step is performed on the calibrating clients $C_{k_c}$ ($k_c \in \{1,2,\cdots,K\} \setminus \{k_u\}$, i.e., the federated clients excluding the target one), while the remaining steps are executed on the central server.

### Calibration training

Generally speaking, a client update indicates how much and in which direction the parameters of the global model need to change to fit the model to the training data.

Specifically, at the $t_j$-th training round, we let the calibrating clients run $E_{cali}$ rounds of local training with respect to the calibrated global model $\widetilde{\mathcal{M}}^{t_j}$ that FedEraser obtained in the previous calibration round.

Note that at the first reconstruction round, FedEraser can update the global model directly, without calibrating the remaining clients' parameters. This is because the initial model of standard FL has not been trained on the target client's data and therefore carries none of its influence.

After the calibration training, each calibrating client $C_{k_c}$ calculates the current update $\widehat{U}_{k_c}^{t_j}$ and sends it to the central server for update calibrating.

### Update calibrating

After the calibration training, the central server obtains each client's current update $\widehat{U}_{k_c}^{t_j}$ with respect to the calibrated global model $\widetilde{\mathcal{M}}^{t_j}$. FedEraser then uses $\widehat{U}_{k_c}^{t_j}$ to calibrate the retained update $U_{k_c}^{t_j}$.

In FedEraser, the norm of $U_{k_c}^{t_j}$ indicates how much the parameters of the global model need to be changed, while the normalized $\widehat{U}_{k_c}^{t_j}$ indicates in which direction the parameters of $\widetilde{\mathcal{M}}^{t_j}$ should be updated.

Therefore, the calibration of $U_{k_c}^{t_j}$ can be simply expressed as:

$$\widetilde{U}_{k_c}^{t_j} = ||U_{k_c}^{t_j}|| \frac{\widehat{U}_{k_c}^{t_j}}{||\widehat{U}_{k_c}^{t_j}||}$$
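The calibration rule can be sketched in a few lines. This is a minimal sketch under two assumptions: each update is treated as one flattened parameter vector, and a zero current update yields a zero calibrated update, a case the formula itself leaves undefined. The names `l2_norm` and `calibrate_update` are hypothetical.

```python
import math
from typing import List


def l2_norm(vector: List[float]) -> float:
    return math.sqrt(sum(x * x for x in vector))


def calibrate_update(retained: List[float], current: List[float]) -> List[float]:
    """Combine the magnitude of the retained update with the direction of the current one."""
    current_norm = l2_norm(current)
    if current_norm == 0.0:
        # Assumption: with no direction available, fall back to a zero update.
        return [0.0] * len(current)
    scale = l2_norm(retained) / current_norm
    return [scale * x for x in current]


# Retained update has norm 5; the current update points along the second axis.
print(calibrate_update([3.0, 4.0], [0.0, 2.0]))  # [0.0, 5.0]
```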

### Calibrated update aggregating

Given the calibrated client updates $\widetilde{U}^{t_j}=\{\widetilde{U}_{k_c}^{t_j} \mid k_c \in \{1,2,\cdots,K\} \setminus \{k_u\}\}$, FedEraser next aggregates these updates for unlearned model updating.

In particular, FedEraser directly calculates the weighted average of the calibrated updates as follows:

$$\widetilde{\mathcal{U}}^{t_j} = \frac{1}{(K-1)\sum_{k_c} w_{k_c}} \sum_{k_c} w_{k_c} \widetilde{U}_{k_c}^{t_j}$$

where $w_{k_c} = \frac{N_{k_c}}{\sum_{k_c} N_{k_c}}$ is the weight of the calibrating client $C_{k_c}$, obtained from the standard FL architecture, and $N_{k_c}$ is the number of records that client holds. It is worth noting that this aggregation operation is consistent with standard FL.
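The aggregation can be sketched as follows. The code follows the formula as written, including the $(K-1)$ factor in the denominator. The function name, the dictionary-based inputs, and the flattened-vector representation of updates are assumptions made for illustration.

```python
from typing import Dict, List


def aggregate_calibrated(
    updates: Dict[int, List[float]],  # calibrated updates keyed by calibrating client id
    records: Dict[int, int],          # number of records N_k held by each client
    num_clients: int,                 # K, including the target client
) -> List[float]:
    total_records = sum(records[k] for k in updates)
    weights = {k: records[k] / total_records for k in updates}
    # Denominator of the formula: (K - 1) * sum of the weights.
    denominator = (num_clients - 1) * sum(weights.values())
    dim = len(next(iter(updates.values())))
    return [
        sum(weights[k] * updates[k][p] for k in updates) / denominator
        for p in range(dim)
    ]


# K = 3 clients; client 2 is the target, so only clients 0 and 1 contribute.
aggregated = aggregate_calibrated({0: [4.0], 1: [8.0]}, {0: 10, 1: 30, 2: 20}, num_clients=3)
print(aggregated)  # [3.5]
```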

### Unlearned model updating

With the aggregation of the calibrated updates, FedEraser can then update the global FL model as:

$$\widetilde{\mathcal{M}}^{t_{j+1}} = \widetilde{\mathcal{M}}^{t_j} + \widetilde{\mathcal{U}}^{t_j}$$

where $\widetilde{\mathcal{M}}^{t_j}$ (resp. $\widetilde{\mathcal{M}}^{t_{j+1}}$) is the current global model (resp. updated global model) calibrated by FedEraser.

The central server and the calibrating clients collaboratively repeat the above process, until the original updates $U$ have all been calibrated and then updated to the global model $\widetilde{\mathcal{M}}$.
Finally, FedEraser gets the unlearned global model $\widetilde{\mathcal{M}}$ that has removed the influence of the client $C_{k_u}$'s data.
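Putting the four steps together, the reconstruction loop might look like the sketch below. It is a simplified illustration rather than the reference implementation: models and updates are flat lists of floats, `calibration_train` is a hypothetical callback standing in for the $E_{cali}$ rounds of local training on a client, and the first retained round is applied without calibration, as described in the calibration training step.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def calibrate(retained: Vector, current: Vector) -> Vector:
    """Step 2: keep the norm of the retained update, take the direction of the current one."""
    norm = l2_norm(current)
    if norm == 0.0:
        return [0.0] * len(current)  # assumption: no direction means no update
    scale = l2_norm(retained) / norm
    return [scale * x for x in current]


def aggregate(updates: Dict[int, Vector], records: Dict[int, int], num_clients: int) -> Vector:
    """Step 3: weighted aggregation, following the formula as written."""
    total = sum(records[k] for k in updates)
    weights = {k: records[k] / total for k in updates}
    denominator = (num_clients - 1) * sum(weights.values())
    dim = len(next(iter(updates.values())))
    return [sum(weights[k] * updates[k][p] for k in updates) / denominator for p in range(dim)]


def fed_eraser(
    initial_model: Vector,
    retained: List[Dict[int, Vector]],  # retained[j][k] = U_k^{t_j}
    records: Dict[int, int],
    target: int,
    calibration_train: Callable[[int, Vector], Vector],
) -> Vector:
    model = list(initial_model)
    for j, round_updates in enumerate(retained):
        calibrated: Dict[int, Vector] = {}
        for client, old_update in round_updates.items():
            if client == target:
                continue  # the target client's retained updates are never used
            if j == 0:
                calibrated[client] = list(old_update)  # first round: no calibration
            else:
                current = calibration_train(client, model)  # step 1, on the client
                calibrated[client] = calibrate(old_update, current)
        step = aggregate(calibrated, records, num_clients=len(records))
        model = [m + s for m, s in zip(model, step)]  # step 4
    return model


retained_updates = [{0: [1.0], 1: [1.0], 2: [5.0]}, {0: [2.0], 1: [2.0], 2: [9.0]}]
unlearned = fed_eraser(
    [0.0], retained_updates, {0: 1, 1: 1, 2: 1}, target=2,
    calibration_train=lambda client, model: [-0.5],
)
print(unlearned)  # [-0.5]
```

In the toy run, the calibration callback reverses the direction of the second-round updates while their retained magnitude is preserved, so the reconstructed parameter moves back below zero.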

Once the unlearned global model $\widetilde{\mathcal{M}}$ is obtained, the standard deployment process of a deep learning model can be performed, including manual quality assurance and live A/B testing (using the unlearned model $\widetilde{\mathcal{M}}$ on some clients' data and the original model $\mathcal{M}$ on other clients' data to compare their performance).

It is worth noting that FedEraser requires far-reaching modifications of neither the existing FL architecture nor the training process on the federated clients, which makes it easy to deploy in existing FL systems.
In particular, the process of calibration training executed on the federated clients can directly reuse the corresponding training process in the standard FL framework.

The aggregating and updating operations in FedEraser do not need to modify the existing architecture of FL, while only the additional retaining functionality is required at the central server side.

In addition, FedEraser can be performed without the target client's participation, as the unlearning process does not involve any information from the target client, including its updates and local data.

## Time consumption analysis

In FedEraser, there are two settings that can speed up the reconstruction of the unlearned model.

First, we modify standard FL to retain the client updates at regular round intervals. We use a hyper-parameter $\Delta t$ to control the size of the retaining interval. Since FedEraser only processes the retained updates, the larger $\Delta t$ is, the fewer retaining rounds are involved, and the less reconstruction time FedEraser requires. This setting provides FedEraser with a speed-up of $\Delta t$ times.

Second, FedEraser only requires the calibrating clients to perform a few rounds of local training in order to calibrate the retained updates. Specifically, the number of calibration training rounds is controlled by the calibration ratio $r = E_{cali}/E_{loc}$. This setting directly reduces the time consumed by training on the clients, and provides FedEraser with a speed-up of $r^{-1}$ times.

Overall, FedEraser can reduce the time consumption by $r^{-1}\Delta t$ times compared with retraining from scratch.

In our experiments, we empirically find that $r=0.5$ and $\Delta t=2$ give a good trade-off between the performance of the unlearned model and the time consumption of model reconstruction (detailed in the following section). In this case, FedEraser achieves an expected speed-up of $4\times$ compared with retraining from scratch.
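The combined factor follows from a simple cost count. As a rough sketch, assume that reconstruction time is dominated by local training and grows linearly with the number of local training rounds, ignoring communication, the floor in $T$, and the uncalibrated first reconstruction round (these simplifications are assumptions made here for illustration):

$$\frac{\text{cost}_{retrain}}{\text{cost}_{FedEraser}} \approx \frac{E \cdot E_{loc}}{T \cdot E_{cali}} = \frac{E \cdot E_{loc}}{\frac{E}{\Delta t} \cdot r E_{loc}} = r^{-1}\Delta t$$

With $r = 0.5$ and $\Delta t = 2$ this gives $0.5^{-1} \times 2 = 4$, matching the expected $4\times$ speed-up.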

## Performance evaluation

In this section, we evaluate the performance of FedEraser on different datasets and models. Besides, we launch membership inference attacks (MIAs) against FedEraser to verify its unlearning effectiveness from a privacy perspective.

- **Datasets.** We utilize four datasets in our experiments, including [UCI Adult](https://archive.ics.uci.edu/ml/datasets/Adult), [Purchase](https://www.kaggle.com/c/acquire-valued-shoppers-challenge/data), [MNIST](http://yann.lecun.com/exdb/mnist), and [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html).

- **Global Models.** In the paradigm of FL and federated unlearning, the global model is broadcast to all clients and serves as the initial model for each client's training process. We use four global models with different structures for different classification tasks.

FC layer means fully connected layer in the deep neural network (DNN) models, and Conv. (resp. Pool.) layer represents convolutional (resp. maxpooling) layer in the convolutional neural network (CNN) models.

| Dataset | Model Architecture |
| :--: | :--: |
| Adult | 2 FC layers |
| Purchase | 3 FC layers |
| MNIST | 2 Conv. and 2 FC layers |
| CIFAR-10 | 2 Conv., 2 Pool., and 2 FC layers |

- **Evaluation Metrics.** We evaluate the performance of FedEraser using standard metrics in the ML field, including ***accuracy*** and ***loss***. We also measure the unlearning ***time*** consumed by FedEraser to make a given global model forget one of the clients.

Furthermore, in order to assess whether the unlearned model still contains information about the target client, we adopt the following three extra metrics.

One metric is the ***prediction difference***, defined as the L2 norm of the difference in prediction probabilities between the original global model and the unlearned model:

$$P_{diss} = \frac{1}{N} \sum_{i=1}^{N} ||\mathcal{M}(x_i) - \widetilde{\mathcal{M}}(x_i)||_2, \quad x_i \in D_{k_u}$$

where $N$ is the number of samples in the target client's data $D_{k_u}$, and $\mathcal{M}(x_i)$ (resp. $\widetilde{\mathcal{M}}(x_i)$) is the prediction probability of the sample $x_i$ obtained from the original (resp. unlearned) model.
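A minimal sketch of this metric is shown below, assuming the two models' prediction probabilities on the target client's samples have already been collected as lists of probability vectors. The function name `prediction_difference` is hypothetical.

```python
import math
from typing import List


def prediction_difference(
    original_probs: List[List[float]],
    unlearned_probs: List[List[float]],
) -> float:
    """Mean L2 distance between the two models' prediction vectors over N samples."""
    if not original_probs or len(original_probs) != len(unlearned_probs):
        raise ValueError("expected two non-empty lists of equal length")
    distances = [
        math.sqrt(sum((p - q) ** 2 for p, q in zip(orig, unl)))
        for orig, unl in zip(original_probs, unlearned_probs)
    ]
    return sum(distances) / len(distances)


# Two samples: the models disagree completely on the first and agree on the second.
p_diss = prediction_difference([[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]])
```

Here the first sample contributes a distance of $\sqrt{2}$ and the second contributes $0$, so the metric is their mean.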

The remaining two metrics are obtained from the MIAs that we perform against the unlearned global model. The goal of an MIA is to determine whether a given data record was used to train a given ML model. Therefore, the performance of MIAs can measure the information that still remains in the unlearned global model.

We use the ***attack precision*** of the MIAs against the target data as one metric, which represents the proportion of the target client's data that is predicted to have participated in the training of the global model.

We also use the ***attack recall*** of the MIAs, which represents the fraction of the target client's data that we can correctly infer as part of the training dataset. In other words, attack precision and attack recall measure the privacy leakage level of the target client.
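For illustration, the sketch below computes both quantities with the standard precision and recall definitions over membership predictions. Treating the attack output as one boolean per record, and using these standard definitions, are assumptions; the exact evaluation protocol of the experiments is not specified here.

```python
from typing import Sequence, Tuple


def attack_precision_recall(
    predicted_member: Sequence[bool],
    actual_member: Sequence[bool],
) -> Tuple[float, float]:
    """Standard precision and recall of membership predictions."""
    pairs = list(zip(predicted_member, actual_member))
    true_pos = sum(1 for pred, real in pairs if pred and real)
    false_pos = sum(1 for pred, real in pairs if pred and not real)
    false_neg = sum(1 for pred, real in pairs if not pred and real)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


predicted = [True, True, False, True, False]
actual = [True, False, True, True, True]
precision, recall = attack_precision_recall(predicted, actual)
```

Under this reading, lower attack precision and recall on the target client's data after unlearning indicate that less of its information remains in the model.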

## Comparison methods

In our experiments, we compare FedEraser with two different methods: Federated Retrain (FedRetrain) and Federated Accumulating (FedAccum).

- **FedRetrain.** A simple unlearning method that retrains the model from scratch on the remaining data after removing the target client's data. It serves as the **baseline** in our experiments. Empirically, FedRetrain provides an **upper bound** on the prediction performance of the unlearned model reconstructed by FedEraser.

- **FedAccum.** A simple method for unlearning by directly accumulating the previous retained updates of the calibrating clients' local model parameters, and leveraging the accumulated updates to update the global model. The update process can be expressed as follows:

$$\widetilde{M}_{accum}^{t_{j+1}} = \widetilde{M}_{accum}^{t_j} +  \widetilde{\mathcal{U}}_{accum}^{t_j}$$

where $\widetilde{\mathcal{U}}_{accum}^{t_j}$ is the accumulation of the model updates at the $t_j$-th round, and $\widetilde{\mathcal{U}}_{accum}^{t_j}=\frac{1}{(K-1)\sum_{k_c} w_{k_c}} \sum_{k_c} w_{k_c} U_{k_c}^{t_j}$. Besides, $\widetilde{M}_{accum}^{t_j}$ (resp. $\widetilde{M}_{accum}^{t_{j+1}}$) represents the global model before (resp. after) updating with $\widetilde{\mathcal{U}}_{accum}^{t_j}$. The main difference between FedEraser and FedAccum is that the latter does not calibrate the clients' updates.

In order to evaluate the utility of the unlearned global model, we also compare FedEraser with the classical FL without unlearning. We employ the most widely used FL algorithm, federated averaging (**FedAvg**), to construct the global model. FedAvg executes the training  procedure in parallel on all federated clients and then exchanges the updated model weights. The updated weights obtained from every client are averaged to update the global model. 

## Experiment environment

In our experiments, we use a workstation equipped with an Intel Core i7 9400 CPU and NVIDIA GeForce GTX 2070 GPU for training the deep learning models. We use PyTorch 1.4.0 as the deep learning framework with CUDA 10.1 and Python 3.7.3.

We set the number of clients to $20$, the calibration ratio $r = E_{cali}/E_{loc}=0.5$, and the retaining interval $\Delta t=2$. As for other training hyper-parameters, such as learning rate, training epochs, and batch size, we use the same settings to execute our algorithm and the comparison methods.

## Experiment results

### Prediction performance comparison

![pred](https://drive.google.com/uc?export=view&id=1LsgyfuP6KKRR7KY4t4_f9-l8ajhP-zfB)

### Time consumption of federated model construction

![time](https://drive.google.com/uc?export=view&id=12Ex0GelLwB1KaQI0uJ4B4IjB9kvxh93a)

### Prediction loss on the target client's data

![loss](https://drive.google.com/uc?export=view&id=1kB9I_8XDyzGfwnwEDHMI2UndvO3zcYpe)

### Performance of membership inference attacks

![mia](https://drive.google.com/uc?export=view&id=1hgi9nxsXSQYOYt3kyQMcfZcAgjvu97ho)

# References

- G. Liu, X. Ma, Y. Yang, C. Wang, and J. Liu, “FedEraser: Enabling Efficient Client-Level Data Removal from Federated Learning Models,” in 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), 2021, pp. 1–10. [[Paper](https://ieeexplore.ieee.org/abstract/document/9521274)]