# Unlearning in Federated Learning

Federated learning has recently become popular in machine learning. A typical scenario is building a model from healthcare information: because of privacy regulations, medical records cannot leave the clients’ devices. Here the clients could be hospitals or personal computers, and each is assumed to have its own machine learning environment. The clients never transmit their actual data to the server that maintains the global model. Instead, a communication protocol between the clients and the server governs collaborative model training.

In the literature, Federated Averaging (FedAvg) is the protocol typically used for model training. It runs in multiple rounds. In each round, the server sends the current global model weights to the clients. Starting from these weights, each client runs stochastic gradient descent on its local data to update its local model, then sends the local model’s weights back to the server. In the final phase of each round, the server aggregates the received weights by (weighted) averaging to produce the global model for the next round.

![fedavg](https://drive.google.com/uc?export=view&id=1GpOM9SUloDjOxUjCPtKqlY5LNioy5geF)
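The FedAvg round described above can be sketched in a few lines. This is a minimal illustration and not a reference implementation: the linear model, the toy data generator `make_client`, the learning rate, and the number of rounds are all assumptions chosen to keep the example small.

```python
import random

def local_sgd(weights, data, lr=0.05, epochs=5):
    """Client step: start from the global weights and run plain SGD
    on squared error for a linear model y ~ w . x (toy assumption)."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fedavg_round(global_w, client_data):
    """Server step: broadcast weights, collect local weights, and
    average them weighted by each client's number of samples."""
    local_ws = [local_sgd(global_w, data) for data in client_data]
    sizes = [len(data) for data in client_data]
    total = sum(sizes)
    return [
        sum(n * w[j] for n, w in zip(sizes, local_ws)) / total
        for j in range(len(global_w))
    ]

def make_client(n, rng):
    """Hypothetical toy data: y = 2*x - 1, with a constant bias feature."""
    data = []
    for _ in range(n):
        x0 = rng.uniform(-1.0, 1.0)
        data.append(([x0, 1.0], 2.0 * x0 - 1.0))
    return data

rng = random.Random(0)
clients = [make_client(n, rng) for n in (20, 30, 50)]

global_w = [0.0, 0.0]
for _ in range(30):  # 30 communication rounds
    global_w = fedavg_round(global_w, clients)
```

Only model weights cross the client/server boundary in this sketch; the raw `(x, y)` pairs stay inside `local_sgd`, which is the property the protocol is meant to preserve.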

Given such training protocols, machine unlearning does not extend easily to the federated learning setting. The global weights are computed by aggregating client models rather than from raw gradients, so individual contributions become entangled, especially when many clients participate. Moreover, clients may hold overlapping data, which makes it difficult to quantify the impact of each training item on the model weights. Applying classic unlearning methods based on gradient manipulation may even lead to severe accuracy degradation or new privacy threats.

Additionally, current studies on federated unlearning tend to assume that the data to be removed belongs wholly to one client. Under this assumption, the historical contributions of a particular client to the global model’s training can be logged and erased easily. Erasing historical parameter updates might still damage the global model, but several strategies have been proposed to mitigate this issue.
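To make the "log and erase" idea concrete, here is a naive sketch under stated assumptions: the server is assumed to store, for every round, each client's sample count and weight delta in a hypothetical `history` structure, and the global model is rebuilt by re-aggregating that log without the client to be forgotten.

```python
def rebuild_without(history, init_w, forget_id):
    """Replay the logged rounds, skipping the forgotten client's updates.

    history: list of rounds; each round maps client_id -> (num_samples, delta).
    This layout is a hypothetical logging format, not a standard one.
    """
    w = list(init_w)
    for round_updates in history:
        kept = [u for cid, u in round_updates.items() if cid != forget_id]
        if not kept:
            continue  # nothing left to aggregate in this round
        total = sum(n for n, _ in kept)
        for j in range(len(w)):
            w[j] += sum(n * delta[j] for n, delta in kept) / total
    return w

history = [
    {"a": (10, [1.0, 0.0]), "b": (30, [0.0, 2.0]), "c": (10, [5.0, 5.0])},
    {"a": (10, [0.0, 1.0]), "b": (30, [1.0, 0.0]), "c": (10, [-3.0, -3.0])},
]
unlearned_w = rebuild_without(history, [0.0, 0.0], forget_id="c")
```

The result is only approximate: the retained clients computed their logged deltas starting from global models that already contained the forgotten client's influence, which is one way erasing updates can damage the global model.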

For example, [Liu et al.](https://ieeexplore.ieee.org/abstract/document/9521274) proposed calibration training to separate the individual contributions of clients as much as possible. This mechanism does not work well for deep neural networks, but it does work well with shallow architectures such as a 2-layer CNN or a network with two fully-connected layers. In addition, there is a trade-off between scalability and precision due to the cost of storing historical information on the federated server. 
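One way to picture calibration is sketched below. It assumes the calibrated update keeps the magnitude of the stored (old) update and takes its direction from an update freshly computed by the remaining clients. This is an interpretation for illustration, and the function name `calibrate` is hypothetical; consult the paper for the exact rule.

```python
import math

def norm(v):
    """Euclidean norm of a flat parameter vector."""
    return math.sqrt(sum(x * x for x in v))

def calibrate(old_update, new_update, eps=1e-12):
    """Assumed calibration rule: magnitude from the stored update,
    direction from the update recomputed without the forgotten client."""
    scale = norm(old_update) / max(norm(new_update), eps)
    return [scale * x for x in new_update]

# The stored update has length 5; the fresh update points along the second axis.
calibrated = calibrate([3.0, 4.0], [0.0, 2.0])
```

Because the stored updates must be kept for every round being calibrated, a sketch like this also shows where the storage cost on the server comes from.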

[Wu et al.](https://arxiv.org/abs/2201.09441) put forward a knowledge distillation strategy that uses the original global model as a teacher to train the unlearned model on the remaining data. However, because the server cannot access the clients’ data, it must sample some unlabeled (synthetic) data that follows the distribution of the whole dataset, and extra rounds of information exchange are needed between the clients and the server. As a result, the whole process is costly and approximate, and the approximation may degrade further when the data is non-IID.
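The distillation objective behind such a strategy can be illustrated with the standard temperature-softened KL divergence between teacher and student predictions. The sketch below is generic knowledge distillation, not the exact loss of the cited paper; the temperature value and the example logits are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened class probabilities.

    No labels are required, so it can be evaluated on unlabeled samples."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Teacher: original global model; student: model being repaired (toy logits).
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Minimizing this loss over the sampled unlabeled data pulls the student's predictions toward the teacher's, which is why the quality of the sampled data matters so much.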

At the other end of the spectrum, [Liu et al.](https://arxiv.org/abs/2203.07320) proposed a smart retraining method for federated unlearning without communication protocols. The approach uses the L-BFGS algorithm to efficiently compute a Hessian approximation from historical parameter updates for global model retraining. However, this method is only applicable to small models (≤ 10K parameters). It also requires storing old model snapshots (including historical gradients and parameters), which poses some privacy threats.
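For readers unfamiliar with L-BFGS, the textbook two-loop recursion below shows how an inverse-Hessian-vector product can be approximated from stored history alone. It is the generic algorithm, not the authors' retraining procedure. Treating `s_list` as differences between stored parameters and `y_list` as differences between stored gradients is an assumption about how the history would be used.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_inverse_hessian_times(grad, s_list, y_list):
    """Two-loop recursion: approximate H^{-1} @ grad from history pairs.

    s_list[i]: parameter difference, y_list[i]: gradient difference,
    ordered oldest to newest. Each pair must satisfy dot(y, s) > 0.
    """
    q = list(grad)
    saved = []  # (alpha, rho) per pair, newest first
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        saved.append((alpha, rho))
    # Scale the initial inverse-Hessian guess using the newest pair.
    gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
    r = [gamma * qi for qi in q]
    for (s, y), (alpha, rho) in zip(zip(s_list, y_list), reversed(saved)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r

# Toy history consistent with the Hessian diag(2, 4): y = H @ s.
s_hist = [[1.0, 0.0], [0.5, 1.0]]
y_hist = [[2.0, 0.0], [1.0, 4.0]]
step = lbfgs_inverse_hessian_times(y_hist[-1], s_hist, y_hist)
```

The memory cost is two vectors per stored pair, each as large as the model, which helps explain why such history-based approaches are practical mainly for small models.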

# References

- T. T. Nguyen, T. T. Huynh, P. L. Nguyen, A. W.-C. Liew, H. Yin, and Q. V. H. Nguyen, “A Survey of Machine Unlearning,” arXiv, 2022. [[Paper](https://arxiv.org/abs/2209.02299)]

- B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Apr. 2017, vol. 54, pp. 1273–1282. [[Paper](https://proceedings.mlr.press/v54/mcmahan17a.html)]

