# VeriFi: Towards Verifiable Federated Unlearning

![1](https://drive.google.com/uc?export=view&id=1PgkKlS-FL3H-17f_7yKmzp-BPcGKV4kF)

- We design the first unlearning-verification framework VeriFi for verifiable federated unlearning.
VeriFi grants FL participants the **right to verify**, i.e., the ability to verify the unlearning effect when leaving the federation. VeriFi introduces a unified mechanism that allows quantitative measurement of the effectiveness of different combinations of unlearning and verification methods.
- With VeriFi, we identify the limitations of existing unlearning and verification methods, and propose a more efficient and FL-friendly unlearning method $^u$S2U and two more effective and robust non-invasive **unique memory** based verification methods ($^v$EM and $^v$FM). The advantages of the three proposed methods are also demonstrated by our extensive experiments.
- With VeriFi, we systematically study 7 unlearning methods and 5 verification methods (i.e., 5 marking methods and 4 checking metrics) with both Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) on 7 datasets, including 3 natural image, 1 facial image, 1 audio and 2 medical image datasets. Our extensive study unveils the necessity, potential and limitations of different federated unlearning and verification methods.

## Unlearning Verification

Intuitively, the effectiveness of unlearning can be verified by the change in the model's performance before and after unlearning. Existing works often use the loss or accuracy on the leaving data for this purpose. Unlearning can also be verified on a set of backdoored training samples obtained via a backdoor attack, which is essentially a data poisoning process that injects a trigger pattern into a small subset of the training data so as to trick the model into memorizing the correlation between the pattern and a target class. Suppose the trigger pattern is $\mathbf{r}$ and its associated backdoor target class is $y_{target}$. Once the trigger is learned by the model $f$, the model will consistently predict the target class on any sample carrying the trigger pattern:
$$\arg\max f(\mathbf{x} \oplus \mathbf{r}) = y_{target}, \; \forall (\mathbf{x},y) \in \mathcal{D},$$
where the model $f$ outputs the class probabilities, the operation $\mathbf{x} \oplus \mathbf{r}$ produces a backdoored version of $\mathbf{x}$, $(\mathbf{x},y)$ is an input-label pair, and $\mathcal{D}$ is the training dataset in traditional machine learning. If the unlearning is effective, the model will forget the backdoor correlation and predict the correct class instead:
$$\arg\max \overline{f}(\mathbf{x} \oplus \mathbf{r}) = y, \; \forall (\mathbf{x},y) \in \mathcal{D},$$
where $\overline{f}$ denotes the model obtained after unlearning and $y$ is the correct class of $\mathbf{x}$.
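
As a rough illustration of the two conditions above, the following sketch measures how often a model predicts $y_{target}$ on triggered inputs before and after unlearning. Everything in it is an assumption made for illustration: the toy 4x4 images, the corner-patch trigger in `stamp_trigger`, and the stand-in models `backdoored_model` and `unlearned_model` are hypothetical and not part of VeriFi. Samples whose true class already equals the target are skipped, a choice made in this sketch so that the rate reflects the trigger alone.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy grayscale image as nested lists (assumption)


def stamp_trigger(x: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of x with a size-by-size patch in the bottom-right corner (x ⊕ r)."""
    out = [row[:] for row in x]
    for i in range(len(out) - size, len(out)):
        for j in range(len(out[i]) - size, len(out[i])):
            out[i][j] = value
    return out


def attack_success_rate(
    predict: Callable[[Image], int],
    data: Sequence[Tuple[Image, int]],
    y_target: int,
) -> float:
    """Fraction of triggered, non-target-class samples that are predicted as y_target."""
    eligible = [(x, y) for x, y in data if y != y_target]
    if not eligible:
        raise ValueError("no samples outside the target class")
    hits = sum(predict(stamp_trigger(x)) == y_target for x, _ in eligible)
    return hits / len(eligible)


# --- hypothetical stand-ins for f and the unlearned model ---
def true_class(x: Image) -> int:
    # Toy rule: class 1 if the top-left pixel is bright, else class 0.
    return 1 if x[0][0] > 0.5 else 0


def backdoored_model(x: Image) -> int:
    # Predicts the target class 0 whenever the corner patch is present.
    if x[-1][-1] == 1.0 and x[-2][-2] == 1.0:
        return 0
    return true_class(x)


def unlearned_model(x: Image) -> int:
    # Ignores the trigger and falls back to the clean decision rule.
    return true_class(x)


def make_image(top_left: float) -> Image:
    img = [[0.0] * 4 for _ in range(4)]
    img[0][0] = top_left
    return img


data = [(make_image(0.9), 1) for _ in range(3)] + [(make_image(0.1), 0)]
print(attack_success_rate(backdoored_model, data, y_target=0))  # 1.0
print(attack_success_rate(unlearned_model, data, y_target=0))   # 0.0
```

A drop in this rate after unlearning is the signal a leaving participant would look for under backdoor-based verification.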

Although several unlearning methods have been proposed, the challenges and potential issues of unlearning verification have not been thoroughly studied, especially in FL. Backdoor verification is also not ideal in FL because of its high security risk: a backdoor injected for verification could end up backdooring all participants.

We also adapt and study two plausible concepts from the deep learning intellectual property (IP) protection domain for unlearning verification: watermarking and fingerprinting. Watermarking is an invasive technique that embeds an owner-specific binary string or backdoor trigger into the model parameters to help determine the ownership of the model at a later (post-deployment) stage, while fingerprinting generates new samples that capture the model's unique properties, such as its decision boundary. In this work, we specially design and adapt these two types of techniques for federated unlearning verification.

## Proposed Framework

### Unlearning-Verification Mechanism

Suppose the entire FL process consists of $T_{total}$ communication rounds. As shown in Fig. 3(a), the mechanism divides the entire process into two stages: a free stage ($[T_{0}, T_{enabled})$) and an unlearning-enabled stage ($[T_{enabled}, T_{total}]$). The free stage refers to an early FL stage where the global model has not yet converged to a good solution. In this stage, all participants can join and leave the federation freely without activating the unlearning mechanism, because the next round of training often overwrites what the model memorized in previous rounds. Leaving the federation after $T_{enabled}$ will activate the unlearning and verification process, as by this time the model's memorization of the private data has stabilized. Note that joining the federation at this stage should also be carefully examined, as it is a **harvest stage** where a small contribution can receive a big reward, i.e., a high-performance global model. Here, we only focus on leaving and unlearning.

- **Practical Considerations.** The time between $t_m$ and $t_{leave}$ is called the **checking period**, which spans both the marking and unlearning periods. The longer the checking period, the more certain the leaving participant is about the verification result. However, a longer checking period also means that the leaving participant can download the global model more times than they should, which might be unfair to other participants. As such, $t_{leave}$ is an important hyper-parameter that should be agreed upon within the federation. The $T_{enabled}$ hyper-parameter, which ends the free stage and enables unlearning, can be determined by the global training loss or accuracy. In FL, the server does not have data to compute the global loss/accuracy, but it can estimate convergence from the stability of the aggregated gradients. It is also worth mentioning that dividing the FL process into two stages is of practical importance: it can avoid the collapse of the global model caused by unlearning.

- **System Assumption.** Following existing unlearning works, we assume a trusted server with an unlearning method in place. We also assume that the local data of the involved participants remains the same in each contributed round of FL. The server adopts a partial device participation strategy in each round, e.g., choosing 10 participants among 100 candidates to contribute their local models, which encourages a global model that performs well for every participant. Beyond the above assumptions, we additionally explore adversarial scenarios in which the server and other participants are **unlearning-malicious**.
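
The remark under Practical Considerations that the server can estimate convergence from the stability of the aggregated gradients can be sketched as follows. This is only one possible rule, assumed here for illustration: the server tracks the norm of the aggregated update per round and sets $T_{enabled}$ to the first round at which the relative spread of the last `window` norms falls below `tol`. The function names, `window`, and `tol` are hypothetical and do not come from VeriFi.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def estimate_t_enabled(
    update_norms: Sequence[float], window: int = 5, tol: float = 0.05
) -> Optional[int]:
    """Return the first round index whose trailing window of update norms is stable.

    Stability is measured by the coefficient of variation (std / mean) of the
    last `window` norms. Returns None if no window is stable yet.
    """
    for t in range(window, len(update_norms) + 1):
        recent = update_norms[t - window:t]
        mu = mean(recent)
        if mu > 0 and pstdev(recent) / mu <= tol:
            return t - 1  # index of the last round in the stable window
    return None


def leaving_triggers_unlearning(t_leave: int, t_enabled: Optional[int]) -> bool:
    """Leaving in the free stage is free; leaving at or after T_enabled triggers unlearning."""
    return t_enabled is not None and t_leave >= t_enabled


# Toy norms of the aggregated update per round (made-up numbers).
norms = [10.0, 6.0, 3.5, 2.0, 1.2, 1.0, 1.01, 0.99, 1.0, 1.0, 1.0]
t_enabled = estimate_t_enabled(norms, window=3, tol=0.02)
print(t_enabled)                                  # 7
print(leaving_triggers_unlearning(3, t_enabled))  # False
print(leaving_triggers_unlearning(9, t_enabled))  # True
```

In practice the window length and tolerance would have to be agreed upon within the federation, much like $t_{leave}$.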

### Unlearning

Unlearning is performed by the server immediately after the completion of marking by the leaving participant.
All unlearning methods are marked with the superscript prefix $^u$ before their names.
For a comprehensive analysis, we adopt, adapt or propose 7 unlearning methods in total: 3 existing, 3 adapted and 1 newly proposed ($^u$S2U). $^u$RT and $^u$RTB are both retraining-based unlearning methods but with different retraining starting points. $^u$CGS, $^u$GGS and $^u$IGS all exploit gradient subtraction to erase the leaving data but with different gradient reconstruction strategies. $^u$DP is an existing differential privacy based unlearning method. Considering the high cost of existing unlearning methods and their negative impact on the original task, we further propose $^u$S2U, a more efficient unlearning method that is more compatible with FL.

### Verification

![i](https://drive.google.com/uc?export=view&id=1Hf3i2l3HV393fu95d0-EShWbZq9RYd_T)

- **Proposed Unique Memory Markers.** We propose to leverage the unique memories of the global model about the leaving data as effective markers.
Specifically, we propose to explore two types of unique memories: *forgettable memory* and *erroneous memory*.

  - *Forgettable Memory ($^v$FM)* refers to the subset of examples that the global model tends to forget. Intuitively, forgettable examples are the hardest and most unique examples owned by the leaving participant, whereas unforgettable examples are easy examples shared across different participants. $^v$FM identifies forgettable examples by the variance of their local training loss and chooses the subset of samples with the highest loss variance across several communication rounds as the markers. Fig. 4 illustrates a few forgettable examples (i.e., markers) identified by $^v$FM from the MNIST dataset. We denote the marker set found by $^v$FM for a leaving participant $\mathsf{a}$ as $D^{m}_{\mathsf{a}}$, with $D^{m}_{\mathsf{a}} \subset D_{\mathsf{a}}$.
At the marking step, $\mathsf{a}$ locally fine-tunes the model for a sufficient number of iterations to reduce the local loss variance on $D^{m}_{\mathsf{a}}$, then uploads the fine-tuned parameters to the server. The global model will then also have relatively low loss variance on $D^{m}_{\mathsf{a}}$. During checking, $\mathsf{a}$ can monitor the global model's loss variance on $D^{m}_{\mathsf{a}}$ to verify the unlearning effect. Effective unlearning should quickly recover the high loss variance on $D^{m}_{\mathsf{a}}$.
  - *Erroneous Memory ($^v$EM)* refers to the subset of samples that the global model predicts incorrectly. Intuitively, erroneous samples are likely to be the hard and rare samples uniquely owned by the leaving participant, as otherwise they would be well learned by the global model if other participants also had these samples. $^v$EM first examines the top $\kappa$ (%) of the high-loss samples and puts the erroneous samples of the majority class into the marker set $D^{m}_{\mathsf{a}}$. Note that the marker set has only one class (i.e., the majority class). Fig. 5 shows a few erroneous MNIST samples identified by $^v$EM, in which images of class 7 are misclassified as 2.
$^v$EM then relabels $D^{m}_{\mathsf{a}}$ to the label most frequently predicted for it by the local model $f^{(\mathsf{a})}$ and fine-tunes the local model on the relabelled dataset to obtain a marked model $\widetilde{f}^{(\mathsf{a})}$. The marked model is then uploaded to the central server to be aggregated into the global model. Fine-tuning with the erroneous labels makes the loss on the markers small, so the participant can later check whether unlearning increases it again. Since $\mathsf{a}$ fine-tunes the local model to maintain a low loss on the $^v$EM markers during the marking process, effective unlearning should quickly recover the high losses on the markers.
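
The marker selection steps of the two unique-memory methods described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: per-sample losses, labels and local-model predictions are assumed to be already available as plain Python lists, `kappa` is passed as a fraction rather than a percentage, and the function names `select_fm_markers` and `select_em_markers` are hypothetical. The fine-tuning and upload steps are omitted.

```python
from collections import Counter
from statistics import pvariance
from typing import List, Optional, Sequence, Tuple


def select_fm_markers(loss_history: Sequence[Sequence[float]], k: int) -> List[int]:
    """Forgettable-memory style selection: indices of the k samples whose
    local training loss varies most across the recorded rounds.

    loss_history[i] holds the losses of sample i over several rounds.
    """
    variances = [pvariance(losses) for losses in loss_history]
    ranked = sorted(range(len(loss_history)), key=lambda i: variances[i], reverse=True)
    return ranked[:k]


def select_em_markers(
    losses: Sequence[float],
    labels: Sequence[int],
    preds: Sequence[int],
    kappa: float,
) -> Tuple[List[int], Optional[int]]:
    """Erroneous-memory style selection.

    1. Take the top `kappa` fraction of samples by loss.
    2. Keep the misclassified ones and find their majority true class.
    3. Markers are the misclassified samples of that class.
    4. The new (erroneous) label is the label most often predicted for them.
    """
    n_top = max(1, int(len(losses) * kappa))
    top = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)[:n_top]
    wrong = [i for i in top if preds[i] != labels[i]]
    if not wrong:
        return [], None
    majority_class = Counter(labels[i] for i in wrong).most_common(1)[0][0]
    markers = sorted(i for i in wrong if labels[i] == majority_class)
    new_label = Counter(preds[i] for i in markers).most_common(1)[0][0]
    return markers, new_label


# Toy data (made-up numbers).
loss_history = [
    [0.1, 0.1, 0.1],  # easy sample, stable loss
    [0.2, 1.5, 0.3],  # loss jumps in one round
    [0.5, 0.4, 0.6],
    [2.0, 0.1, 1.8],  # learned, forgotten, learned again
]
print(select_fm_markers(loss_history, k=2))  # [3, 1]

losses = [0.1, 2.5, 2.2, 0.2, 1.9, 0.3, 2.8, 0.1]
labels = [0, 7, 7, 1, 3, 2, 7, 5]
preds = [0, 2, 2, 1, 5, 2, 1, 5]
print(select_em_markers(losses, labels, preds, kappa=0.5))  # ([1, 2, 6], 2)
```

In the toy $^v$EM example, the three misclassified samples of class 7 become the markers and are relabelled to 2, mirroring the 7-to-2 confusion mentioned above.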

![4](https://drive.google.com/uc?export=view&id=16mFGZLlqumvdi1l-XPMMs39zgZf91cfA)
![5](https://drive.google.com/uc?export=view&id=1TO-Z-n1rxtBIHGEp95EpTpF-TegM5vHf)

- **Existing or Adapted Marking Methods.**
Watermarking methods (parameter-based or backdoor-based) and fingerprinting methods from the field of deep learning intellectual property protection can be adapted as marking methods.
For watermarking, we adopt the backdoor-based marking method ($^v$BN) that was initially proposed for traditional machine unlearning verification. $^v$BN leverages the BadNets backdoor attack to inject trigger patterns associated with a backdoor class into the global model to verify the unlearning effect. At the marking step, $^v$BN fine-tunes the local model on backdoored data and uploads the backdoored local parameters to the server for aggregation. After fine-tuning, backdoored samples exhibit a high attack success rate on the backdoored local and global models. Effective unlearning should break the correlation between the trigger pattern and the backdoor class, i.e., lower the attack success rate. For fingerprinting, we adapt the Boundary Fingerprint ($^v$BF) to find decision boundary fingerprints (markers) to verify unlearning. $^v$BF generates adversarial examples that are close to the decision boundary to characterize the robustness property of the local model $f^{(\mathsf{a})}$. Arguably, adversarial examples with relatively high and close top-2 class probabilities are boundary examples. Therefore, before the unlearning round $t_u$, $^v$BF marks the following adversarial examples as markers:
$$
 D^m_{\mathsf{a}} = \{(\mathbf{x} + \sigma, y) \mid |f^{(\mathsf{a})}_{\text{top-1}}(\mathbf{x} + \sigma) - f^{(\mathsf{a})}_{\text{top-2}}(\mathbf{x} + \sigma)| \leq \gamma, \; (\mathbf{x}, y) \in D_{\mathsf{a}}\},$$
where $f^{(\mathsf{a})}_{\text{top-1}}(\mathbf{x} + \sigma)$ and $f^{(\mathsf{a})}_{\text{top-2}}(\mathbf{x} + \sigma)$ denote the top-1 and top-2 class probabilities respectively, $\mathbf{x} + \sigma$ is the PGD (Madry et al., 2017) adversarial example of $\mathbf{x}$, and $\gamma \in [0, 0.1)$ is a small threshold defining how close the two probabilities must be.
At the marking step, $^v$BF first fine-tunes the local model on $D^m_{\mathsf{a}}$ to obtain a marked local model $\widetilde{f}^{(\mathsf{a})}$, which is now robust on $D^m_{\mathsf{a}}$ and has a smoother decision boundary around the markers. The marked local model is then uploaded to the central server and aggregated into the global model.
Effective unlearning should quickly forget the smoothed (robust) boundary around the markers, resulting in wrong predictions on them, which can be easily checked by the performance on the adversarial markers.
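
The $^v$BF selection rule above reduces to a filter on the gap between the two largest class probabilities. The sketch below assumes the adversarial examples have already been generated (the PGD step is omitted) and that the local model's probability vectors on them are given as plain lists; the function name `select_bf_markers` and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def select_bf_markers(adv_probs: Sequence[Sequence[float]], gamma: float = 0.05) -> List[int]:
    """Return indices of adversarial examples whose top-1 and top-2 class
    probabilities differ by at most gamma, i.e., examples near the boundary."""
    markers = []
    for i, probs in enumerate(adv_probs):
        top1, top2 = sorted(probs, reverse=True)[:2]
        if top1 - top2 <= gamma:
            markers.append(i)
    return markers


# Toy class-probability vectors of the local model on four adversarial examples.
adv_probs = [
    [0.48, 0.47, 0.05],  # two classes nearly tied: boundary example
    [0.90, 0.05, 0.05],  # confident prediction: not a marker
    [0.42, 0.33, 0.25],  # gap of 0.09 exceeds gamma
    [0.50, 0.50, 0.00],  # exact tie: boundary example
]
print(select_bf_markers(adv_probs, gamma=0.05))  # [0, 3]
```

Only the selected indices would then be used for fine-tuning and later checking.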

## Experiments

We conduct extensive experiments with the VeriFi framework to answer the key research questions (RQs) on verifiable federated unlearning. All experiments are conducted on a Linux server with 4 Nvidia RTX 3090 GPUs (24 GB dedicated memory each), a 16-core Intel Xeon processor and 384 GB RAM. Our code is implemented using PyTorch 1.7.1 with CUDA 11.1 and Python 3.7.

![iii](https://drive.google.com/uc?export=view&id=17RKrt5Y1RktzlGEN0aY5VvpzCBhUQGai)
![iv](https://drive.google.com/uc?export=view&id=16XtnYiMT3BVy2MqaZyqrJMyo0zx15od3)

### Is Federated Unlearning Necessary?

![v](https://drive.google.com/uc?export=view&id=1eT2JenfetnzYNAh8s5-7EnnSF_TI4Xg4)

### Are Markers Necessary for Verification?

![vi](https://drive.google.com/uc?export=view&id=158ObvpJFZbd-a3H511JktoHmoKo8MjfG)

![vii](https://drive.google.com/uc?export=view&id=1MH04T1Na_dDAy2qPTIo_NConmnJy_2fU)

![6](https://drive.google.com/uc?export=view&id=1yVrCHwvLAg3qf4SZHoHnasFtiDgdB4qM)

![7](https://drive.google.com/uc?export=view&id=1_3X05jp7E2XC3QufgjCJx-gkIzbXTBlq)

![8](https://drive.google.com/uc?export=view&id=1XRnNBt6GZUpGxrylSGRE9bbEioDOFxyM)

### Unlearning-Verification: The Combinations

![9](https://drive.google.com/uc?export=view&id=143GU7hILvZgz1MRu3K705dvVvuNgAri3)

# References
- X. Gao et al., VeriFi: Towards Verifiable Federated Unlearning. arXiv, 2022. doi: 10.48550/ARXIV.2205.12709. [[Paper](https://arxiv.org/abs/2205.12709)]
