# Unlearning Definition

When fulfilling a removal request, the computer system needs
to remove all of the user’s data and ‘forget’ any influence that data had
on the models trained on it.

| Notation | Definition |
| :--: | :--: |
| $\mathcal{Z}$ | Example space |
| $D$ | Training dataset |
| $D_f$ | Forget set |
| $D_r = D \setminus D_f$ | Retain set |
| $A(.)$ | Learning algorithm |
| $U(.)$ | Unlearning algorithm |
| $\mathcal{H}$ | Hypothesis space of models |
| $w = A(D)$ | Parameters of the model trained on $D$ by $A$ |
| $w_r = A(D_r)$ | Parameters of the model trained on $D_r$ by $A$ |
| $w_u = U(.)$ | Parameters of the model produced by $U$ |

To properly formulate an unlearning problem, we need to introduce a few concepts. First, let $\mathcal{Z}$ denote the example space, i.e., the space of data items (also called examples or samples). The set of all possible training datasets is then denoted $\mathcal{Z}^*$. One can argue that $\mathcal{Z}^* = 2^{\mathcal{Z}}$, but the distinction is not important here, since a particular training dataset $D \in \mathcal{Z}^*$ is usually given as input. Given $D$, we want to obtain a machine learning model from a hypothesis space $\mathcal{H}$. In general, $\mathcal{H}$ covers both the parameters and the metadata of the models. Sometimes it is modeled as $\mathcal{W} \times \Theta$, where $\mathcal{W}$ is the parameter space and $\Theta$ is the metadata/state space. The process of training a model on $D$ is carried out by a learning algorithm, denoted by a function $A: \mathcal{Z}^* \rightarrow \mathcal{H}$, with the trained model denoted as $A(D)$.

To support forgetting requests, the computer system needs an unlearning mechanism, denoted by a function $U$, that takes as input a training dataset $D \in \mathcal{Z}^*$, a forget set $D_f \subset D$ (the data to forget), and a model $A(D)$. It returns a sanitized (or unlearned) model $U(D, D_f, A(D)) \in \mathcal{H}$.
The unlearned model is expected to be the same as, or similar to, a retrained model $A(D \setminus D_f)$ (i.e., a model as if it had been trained on the remaining data only). Note that $A$ and $U$ are assumed to be randomized algorithms, i.e., the output is non-deterministic and can be modelled as a conditional probability distribution over the hypothesis space given the input data. This assumption is reasonable, as many learning algorithms are inherently stochastic (e.g., SGD) and floating-point computations can introduce further non-determinism in practical implementations.

The core problem of machine unlearning involves the comparison between two distributions of machine learning models. 
Let $Pr(A(D))$ define the distribution of all models trained on a dataset $D$ by a learning algorithm $A(.)$. 
Let $Pr(U(D, D_f, A(D)))$ be the distribution of unlearned models. The output of $U(.)$ is modelled as a distribution rather than a single point because learning algorithms $A(.)$ and unlearning algorithms $U(.)$ are randomized, as mentioned above. Note also that we do not define the function $U$ precisely beforehand, as its definition varies across settings.
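To make the idea of a distribution over models concrete, the following minimal sketch samples a toy randomized learner many times and tabulates the models it returns. The learner (`learn`, the median of a random three-sample subset) and the helper `empirical_distribution` are hypothetical stand-ins chosen only for illustration; they are not taken from any particular unlearning method.

```python
import random
from collections import Counter


def learn(dataset, rng):
    """Toy randomized learner A (an assumption for illustration):
    the 'model' is the median of a random subset of three samples."""
    batch = rng.sample(dataset, 3)
    return sorted(batch)[1]


def empirical_distribution(dataset, runs=2000, seed=0):
    """Estimate Pr(A(D)) by running the learner repeatedly."""
    rng = random.Random(seed)
    counts = Counter(learn(dataset, rng) for _ in range(runs))
    return {model: n / runs for model, n in sorted(counts.items())}


D = [1, 2, 3, 4, 5]
dist = empirical_distribution(D)
for model, prob in dist.items():
    print(f"model={model}  estimated probability={prob:.3f}")
```

Each sample from this distribution costs one full training run, which is cheap here but is the main expense when the same estimate is attempted for a real model.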

## Exact unlearning (Perfect unlearning)

### Special case

*Given a learning algorithm $A(.)$, a dataset $D$, and a forget set $D_f \subseteq D$, we say the process $U(.)$ is an exact unlearning process iff*:

$$Pr(A(D \setminus D_f)) = Pr(U(D, D_f, A(D)))$$

Two key aspects can be drawn from this definition. First, the definition does not require that the model $A(D)$ be retrained from scratch on $D \setminus D_f$. Rather, it requires evidence that the unlearned model is distributed like a model trained from scratch on $D \setminus D_f$. Second, two models trained on the same dataset should belong to the same distribution. However, defining this distribution is tricky. To avoid tying the unlearning algorithm to a particular training dataset, a more general definition is used.
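The first point, that exact unlearning does not have to mean retraining, can be illustrated with a deliberately simple learner. In the sketch below, the "model" is the mean of the dataset, which is deterministic, so each distribution is a point mass and equality of distributions reduces to equality of values. The names `learn_mean` and `unlearn_mean` are hypothetical, and exact rational arithmetic is used so that the comparison is not blurred by floating-point rounding.

```python
from fractions import Fraction


def learn_mean(dataset):
    """Toy deterministic learner A: the model is the dataset mean."""
    return Fraction(sum(dataset), len(dataset))


def unlearn_mean(dataset, forget_set, model):
    """Toy unlearning mechanism U: subtract the forgotten samples'
    contribution from the stored mean instead of retraining."""
    n, m = len(dataset), len(forget_set)
    if m >= n:
        raise ValueError("the retain set must be non-empty")
    return (model * n - sum(forget_set)) / (n - m)


def retain_set(dataset, forget_set):
    """Compute D \\ D_f for lists that may contain duplicates."""
    remaining = list(dataset)
    for z in forget_set:
        remaining.remove(z)
    return remaining


D = [4, 8, 15, 16, 23, 42]
D_f = [15, 42]

w = learn_mean(D)
w_u = unlearn_mean(D, D_f, w)
w_r = learn_mean(retain_set(D, D_f))
print(w_u == w_r)
```

The update touches only the forgotten samples, yet yields the same model as training on the retain set. Most learners do not admit such a closed-form correction.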

### General case

*Given a learning algorithm $A(.)$, we say the process $U(.)$ is an exact unlearning process iff $\forall \mathcal{T} \subseteq \mathcal{H}, D \in \mathcal{Z}^*, D_f \subset D$*:

$$Pr(A(D \setminus D_f) \in \mathcal{T}) = Pr(U(D, D_f, A(D)) \in \mathcal{T})$$

This definition allows us to choose the metric space the models belong to (and consequently the space the distributions are defined over). A model can be viewed either as just a mapping of inputs to outputs, in which case $Pr(.)$ are distributions over a function space (e.g., continuous functions under the supremum metric), or as the specific parameters $w$ of a model architecture, in which case $Pr(.)$ are distributions over the weight space (e.g., some finite-dimensional real vector space with the Euclidean norm). This ambiguity leads to two notions of exact unlearning:

- *Distribution of weights*: zero difference in the distribution of weights, i.e., $Pr(w_r) = Pr(w_u)$, where $w_r$ are the parameters of the models learned by $A(D_r)$ and $w_u$ are the parameters of the models given by $U(.)$.
- *Distribution of outputs*: zero difference in the distribution of outputs, i.e., $Pr(M(X; w_r)) = Pr(M(X; w_u))$, $\forall X \subseteq \mathcal{Z}$, where $M(.)$ is the parameterized mapping function from the input space $\mathcal{Z}$ to the output space (i.e., the machine learning model). This definition is sometimes referred to as *weak unlearning*.
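The gap between the two notions can be seen even for a single pair of models. The sketch below assumes a hypothetical two-unit model whose hidden units are interchangeable; swapping the two weights changes the parameter vector but not the function it computes. It compares individual models rather than distributions, so it only hints at why equality in output space is the weaker requirement.

```python
def model_output(x, w):
    """Hypothetical model M(x; w) = relu(w[0] * x) + relu(w[1] * x).
    The two hidden units are interchangeable."""
    return max(0.0, w[0] * x) + max(0.0, w[1] * x)


w_r = (0.5, 2.0)  # stands in for a retrained model's weights
w_u = (2.0, 0.5)  # stands in for an unlearned model's weights
probes = [-2.0, -1.0, 0.0, 1.0, 3.0]

same_weights = w_r == w_u
same_outputs = all(
    model_output(x, w_r) == model_output(x, w_u) for x in probes
)
print(f"same weights: {same_weights}, same outputs: {same_outputs}")
```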

If the unlearning mechanism $U(.)$ is implemented as retraining itself, the equality holds by construction. For this reason, retraining is sometimes considered to be the only exact unlearning method.
However, retraining inherently involves high computation costs, especially for large models. Another disadvantage of retraining is that it cannot deal with batch settings, where multiple removal requests happen simultaneously or are grouped in a batch.
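As a minimal sketch of retraining used as the unlearning mechanism, the code below ignores the existing model and reruns the learner on the retain set. The learner is a toy stochastic update on a one-dimensional parameter, and the explicit `seed` argument is an assumption added so that the randomness can be controlled in the comparison.

```python
import random


def learn(dataset, seed):
    """Toy stochastic learner A: SGD-style running estimate of the mean."""
    rng = random.Random(seed)
    w = 0.0
    for step in range(1, 201):
        z = rng.choice(dataset)
        w -= (1.0 / step) * (w - z)  # step toward the sampled example
    return w


def unlearn_by_retraining(dataset, forget_set, model, seed):
    """U implemented as retraining: the old model is discarded and
    the learner is rerun on D \\ D_f."""
    retain = list(dataset)
    for z in forget_set:
        retain.remove(z)
    return learn(retain, seed)


D = [1.0, 2.0, 3.0, 4.0, 100.0]
D_f = [100.0]
D_r = [1.0, 2.0, 3.0, 4.0]

w = learn(D, seed=1)
w_u = unlearn_by_retraining(D, D_f, w, seed=7)
w_r = learn(D_r, seed=7)
print(w_u == w_r)
```

With matching seeds the two runs are the same computation, so the outputs coincide exactly. The cost is a full pass of training for every request, which is what approximate methods try to avoid.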

There are many different metrics for comparing numerical distributions over the output space and the weight space. However, doing so is expensive (e.g., generating a sample in these distributions involves training the whole model). To mitigate this issue, some approaches design an alternative metric on a point basis to compute the distance between two models, either in the output space or in the weight space.
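A point-based comparison of this kind might look as follows. The sketch assumes a hypothetical linear model `predict(x, w) = w[0] * x + w[1]` and uses two illustrative choices: Euclidean distance in weight space, and the largest absolute prediction gap over a fixed probe set in output space. Other metrics are equally possible.

```python
import math


def predict(x, w):
    """Hypothetical linear model with slope w[0] and intercept w[1]."""
    return w[0] * x + w[1]


def weight_distance(w_a, w_b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)))


def output_distance(w_a, w_b, probes):
    """Largest absolute difference in predictions over a probe set."""
    return max(abs(predict(x, w_a) - predict(x, w_b)) for x in probes)


w_r = (1.0, 0.0)  # stands in for a retrained model
w_u = (4.0, 4.0)  # stands in for an unlearned model
probes = [0.0, 1.0, 2.0]

d_weights = weight_distance(w_r, w_u)
d_outputs = output_distance(w_r, w_u, probes)
print(d_weights, d_outputs)
```

Both quantities compare one model against one model, so they need only a single retrained reference rather than a sample from each distribution.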

## Approximate unlearning (Bounded/Certified unlearning)

Approximate unlearning approaches attempt to address these cost-related constraints. In lieu of retraining, these strategies: perform computationally less costly actions on the final weights; modify the architecture; or filter the outputs. Approximate unlearning relaxes the general definition of exact unlearning as follows:

### $\epsilon$-approximate unlearning

*Given $\epsilon > 0$, an unlearning mechanism $U$ performs $\epsilon$-certified removal for a learning algorithm $A$ if $\forall \mathcal{T} \subseteq \mathcal{H}, D \in \mathcal{Z}^*, z \in D$*:

$$e^{-\epsilon}\leq \frac{Pr( U(D, z, A(D)) \in \mathcal{T})}{Pr(A(D \setminus z) \in \mathcal{T})} \leq e^{\epsilon}$$

where $z$ is the removed sample.

It is noteworthy that the above equation defines the bounds on a single sample $z$ only. It is still an open question as to whether constant bounds can be provided for bigger subsets of $D$. Moreover, the bounds take the form $[e^{-\epsilon}, e^{\epsilon}]$ because the comparison is naturally expressed on a log scale: taking the logarithm of the ratio above gives the equivalent condition

$$-\epsilon \leq \log Pr( U(D, z, A(D)) \in \mathcal{T}) - \log Pr(A(D \setminus z) \in \mathcal{T}) \leq \epsilon$$

or:

$$\left| \log Pr( U(D, z, A(D)) \in \mathcal{T}) - \log Pr(A(D \setminus z) \in \mathcal{T}) \right| \leq \epsilon$$

where $| . |$ denotes the absolute value, so $\epsilon$ bounds the gap between the two log-probabilities for every set of models $\mathcal{T}$. A relaxed version of $\epsilon$-approximate unlearning is also defined in [Neel et al.](http://proceedings.mlr.press/v132/neel21a.html).
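For a finite hypothesis space, the $\epsilon$-approximate condition can be checked directly. The sketch below assumes both distributions are given explicitly as dictionaries mapping each hypothesis to its probability (a hypothetical representation; real model distributions are rarely available in this form). In that setting it is enough to compare single hypotheses, because the ratio for any set $\mathcal{T}$ is a weighted average of the ratios of its elements.

```python
import math


def min_epsilon(p_unlearned, p_retrained):
    """Smallest epsilon for which the ratio bound holds for every
    subset of a finite hypothesis space (inf if no finite value works)."""
    eps = 0.0
    for h in set(p_unlearned) | set(p_retrained):
        p = p_unlearned.get(h, 0.0)
        q = p_retrained.get(h, 0.0)
        if p == 0.0 and q == 0.0:
            continue  # hypothesis never produced by either process
        if p == 0.0 or q == 0.0:
            return math.inf  # the ratio is unbounded on {h}
        eps = max(eps, abs(math.log(p) - math.log(q)))
    return eps


# Illustrative numbers only, not measurements of any real system.
p_u = {"h1": 0.5, "h2": 0.3, "h3": 0.2}  # Pr(U(D, z, A(D)) = h)
p_r = {"h1": 0.4, "h2": 0.4, "h3": 0.2}  # Pr(A(D \ z) = h)

eps = min_epsilon(p_u, p_r)
print(f"smallest epsilon: {eps:.4f}")
```

In this example the binding hypothesis is `h2`, where the two probabilities differ by a factor of 4/3.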

### ($\epsilon$,$\delta$)-approximate unlearning

*Given $\epsilon, \delta > 0$, an unlearning mechanism $U$ performs $(\epsilon, \delta)$-certified removal for a learning algorithm $A$ if $\forall \mathcal{T} \subseteq \mathcal{H}, D \in \mathcal{Z}^*, z \in D$*:

$$Pr( U(D, z, A(D)) \in \mathcal{T}) \leq e^{\epsilon} {Pr(A(D \setminus z) \in \mathcal{T})} + \delta$$

and

$${Pr(A(D \setminus z) \in \mathcal{T})} \leq e^{\epsilon}  Pr( U(D, z, A(D)) \in \mathcal{T}) + \delta$$

In other words, $\delta$ upper bounds the probability for the max-divergence bound in $\epsilon$-approximate unlearning definition to fail.
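The role of $\delta$ can be made concrete for a finite hypothesis space. Assuming again that both distributions are available as dictionaries (a hypothetical representation), the smallest $\delta$ that satisfies both inequalities for a given $\epsilon$ is the larger of the two one-sided excess masses, obtained by choosing $\mathcal{T}$ as the set of hypotheses where one probability exceeds $e^{\epsilon}$ times the other.

```python
import math


def min_delta(p_unlearned, p_retrained, epsilon):
    """Smallest delta for which both (epsilon, delta) inequalities hold
    over every subset of a finite hypothesis space."""
    support = set(p_unlearned) | set(p_retrained)
    scale = math.exp(epsilon)

    def excess(p, q):
        # Mass of p that is not covered by e^epsilon times q.
        return sum(
            max(0.0, p.get(h, 0.0) - scale * q.get(h, 0.0)) for h in support
        )

    return max(
        excess(p_unlearned, p_retrained), excess(p_retrained, p_unlearned)
    )


# Illustrative numbers only: the unlearned process never outputs h3.
p_u = {"h1": 0.5, "h2": 0.5, "h3": 0.0}
p_r = {"h1": 0.45, "h2": 0.45, "h3": 0.1}

delta = min_delta(p_u, p_r, epsilon=0.2)
print(f"smallest delta at epsilon=0.2: {delta:.4f}")
```

Here no finite $\epsilon$ works on its own, because `h3` has zero probability under one process and positive probability under the other; the $\delta$ term absorbs exactly that mass.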

### Relationship to differential privacy

Differential privacy states that:

$$\forall \mathcal{T} \subseteq \mathcal{H}, D \in \mathcal{Z}^*, z \in D: e^{-\epsilon} \leq \frac{Pr(A(D) \in \mathcal{T})}{Pr(A(D \setminus z) \in \mathcal{T})} \leq e^\epsilon$$

where $z$ is the removed sample. Differential privacy implies approximate unlearning: if $A$ satisfies this bound, then an unlearning mechanism that simply returns $A(D)$ unchanged already satisfies the $\epsilon$-approximate unlearning bound, since deleting the training data is not a concern if algorithm $A$ never memorises it in the first place. However, this is exactly the tension between differential privacy and machine unlearning. If $A$ is differentially private with respect to every sample, then its output can depend only weakly on any individual sample, which limits how much it can learn from the data itself. In other words, differential privacy is a very strong condition, and differentially private models often suffer a significant loss in accuracy even for large $\epsilon$.
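The implication can be checked numerically on a toy learner. The sketch below is an illustration under stated assumptions: the learner is an exponential mechanism over the two hypotheses $\{0, 1\}$, scored by how many samples equal each hypothesis, which is a standard differentially private construction chosen here for convenience and not one prescribed by the text. The unlearning mechanism `unlearn_identity` is hypothetical and does nothing at all.

```python
import math


def learner_distribution(dataset, epsilon):
    """Exact output distribution of a toy exponential-mechanism learner:
    Pr(h) is proportional to exp(epsilon * score(h) / 2), where
    score(h) counts the samples equal to h."""
    weights = {
        h: math.exp(epsilon * dataset.count(h) / 2.0) for h in (0, 1)
    }
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def unlearn_identity(dataset, z, model_distribution):
    """Hypothetical U that ignores the request and keeps A(D) as is."""
    return model_distribution


epsilon = 0.5
D = [0, 1, 1, 1, 0, 1]

worst_log_ratio = 0.0
for i, z in enumerate(D):
    retain = D[:i] + D[i + 1:]
    p_u = unlearn_identity(D, z, learner_distribution(D, epsilon))
    p_r = learner_distribution(retain, epsilon)
    for h in (0, 1):
        gap = abs(math.log(p_u[h] / p_r[h]))
        worst_log_ratio = max(worst_log_ratio, gap)

print(worst_log_ratio <= epsilon)
```

With only two hypotheses, checking each one individually covers every non-trivial set $\mathcal{T}$. The bound holds without any unlearning work, but only because the learner's output barely depends on any single sample.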

# References

- T. T. Nguyen, T. T. Huynh, P. L. Nguyen, A. W.-C. Liew, H. Yin, and Q. V. H. Nguyen, A Survey of Machine Unlearning. arXiv, 2022. [[Paper](https://arxiv.org/abs/2209.02299)]