# In this tutorial we will cover how to use autodp to compute the privacy guarantees of the most commonly used DP learning algorithms.

## 1. Noisy Full Gradient Descent

Suppose we are minimizing the following objective function:
$$\min_\theta \sum_{i=1}^n \ell(z_i,\theta)$$
Let the parameter be $\theta\in \mathbb{R}^d$ and the data points be $z_1,...,z_n \in \mathcal{Z}$.

The noisy gradient descent algorithm iteratively updates
$$\theta_{t+1} = \theta_t - \eta_t \left(\sum_{i=1}^n\nabla \ell(z_i,\theta_t) + \mathcal{N}(0,\Delta_t^2\sigma_t^2  I_d)\right). $$

In the above, the noise levels $\sigma_t$ are determined ahead of time, and $\Delta_t$ is an upper bound on $\sup_{z\in\mathcal{Z}} \|\nabla\ell(z,\theta_t)\|_2$, i.e., on the **global sensitivity** of the full gradient $\sum_{i=1}^n\nabla \ell(z_i,\theta_t)$.

From the differential privacy perspective, this algorithm can be viewed as a sequence of standard Gaussian mechanisms (which admit tight composition and tight calibration), and you can use the code snippet below to tightly compute the privacy loss after running it.

### Cautionary notes:

- The objective function is a summation, not an empirical average. If it is an average, you need to scale your update rule by a factor of $1/n$ **outside the brackets**.
- When $\Delta_t$ is unknown, you can ensure DP by clipping the *per-instance* gradient, i.e., use the sum of 
$$
\min\{1,\frac{\Delta_t}{\|\nabla \ell(z_i,\theta_t)\|_2 }\}\nabla \ell(z_i,\theta_t).
$$
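To make the update rule and the clipping step concrete, here is a minimal sketch of one noisy full-gradient step in plain Python. The helper names `clip_l2` and `noisy_gd_step` are hypothetical and not part of autodp; the per-instance gradients are assumed to be given as lists of floats.

```python
import math
import random

def clip_l2(grad, delta):
    """Rescale grad so that its L2 norm is at most delta."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 if norm == 0.0 else min(1.0, delta / norm)
    return [scale * g for g in grad]

def noisy_gd_step(theta, per_example_grads, eta, delta, sigma, rng):
    """theta - eta * (sum of clipped gradients + N(0, delta^2 sigma^2 I))."""
    total = [0.0] * len(theta)
    for grad in per_example_grads:
        clipped = clip_l2(grad, delta)
        total = [t + c for t, c in zip(total, clipped)]
    # The noise standard deviation is delta * sigma in every coordinate.
    noisy = [t + rng.gauss(0.0, delta * sigma) for t in total]
    return [th - eta * g for th, g in zip(theta, noisy)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]    # L2 norms 5.0 and 0.5
print(clip_l2(grads[0], 1.0))       # rescaled down to norm 1
print(clip_l2(grads[1], 1.0))       # already within the bound, left unchanged
theta = noisy_gd_step([0.0, 0.0], grads, eta=0.1, delta=1.0, sigma=5.0, rng=rng)
print(theta)
```

Only the ratio between the noise standard deviation and the clipping threshold enters the privacy accounting, which is why the accounting code takes `sigma` alone.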


In [1]:
from autodp.mechanism_zoo import GaussianMechanism 
import numpy as np


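class NoisyGD_mech(GaussianMechanism):
    def __init__(self,sigma_list,name='NoisyGD'):
        # Composing Gaussian mechanisms with noise levels sigma_t is equivalent to a
        # single Gaussian mechanism with 1/sigma^2 = sum_t 1/sigma_t^2.
        GaussianMechanism.__init__(self, sigma=np.sqrt(1/np.sum(1/sigma_list**2)),name=name)
        self.params = {'sigma_list':sigma_list}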

# The user could log sigma_list and then just declare a NoisyGD_mech object.
sigma_list = np.array([5.0,3.0,4.0])

mech = NoisyGD_mech(sigma_list)

# compute epsilon, as a function of delta
mech.get_approxDP(delta=1e-6)


2.0677358464697515
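
The single `sigma` passed to `GaussianMechanism` in `NoisyGD_mech` may look surprising. The sketch below, which uses only the standard library, checks the underlying identity: the Rényi DP of a Gaussian mechanism with unit sensitivity is $\alpha/(2\sigma^2)$, RDP adds up under composition, so the composition behaves like one Gaussian mechanism with $1/\sigma^2 = \sum_t 1/\sigma_t^2$. The function names are illustrative and not autodp APIs.

```python
import math

def effective_sigma(sigma_list):
    """Noise level of the single Gaussian mechanism matching the composition."""
    return math.sqrt(1.0 / sum(1.0 / s ** 2 for s in sigma_list))

def gaussian_rdp(alpha, sigma):
    """Renyi DP of order alpha for a Gaussian mechanism with sensitivity 1."""
    return alpha / (2.0 * sigma ** 2)

sigma_list = [5.0, 3.0, 4.0]
sigma_eff = effective_sigma(sigma_list)

# RDP adds up under composition, so both quantities agree at every order alpha.
alpha = 8.0
composed = sum(gaussian_rdp(alpha, s) for s in sigma_list)
single = gaussian_rdp(alpha, sigma_eff)
print(sigma_eff, composed, single)
```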

## 2. Noisy Stochastic Gradient Descent
 
Now, suppose we run the popular NoisySGD with Poisson sampling. 
    
The NoisySGD algorithm iteratively updates
$$\theta_{t+1} = \theta_t - \eta_t \left(\sum_{i\in \mathcal{I}_t}\min\{1,\frac{\Delta_t}{\|\nabla \ell(z_i,\theta_t)\|_2 }\}\nabla \ell(z_i,\theta_t) + \mathcal{N}(0,\Delta_t^2\sigma_t^2  I_d)\right) $$
where the minibatch $\mathcal{I}_t \subset [n]$ is constructed by **flipping one independent coin for each data point $i\in[n]$** to decide whether to include or exclude that data point.


In the above, the noise levels $\sigma_t$ are determined ahead of time, and $\Delta_t$ is the per-instance clipping threshold.  If the global sensitivity satisfies $\sup_{z} \|\nabla\ell(z,\theta_t)\|_2 \leq \Delta_t$ (this is a Lipschitz constant of the loss, which you may be able to prove in theory), then clipping is not needed.

From the differential privacy perspective, this algorithm can be viewed as a sequence of **subsampled Gaussian mechanisms**, and you can use the code snippet below to compute the privacy loss after running it, based on the RDP-based Analytical Moments Accountant.


### Cautionary notes:

- The objective function is a summation, not an empirical average. If it is an average, you need to scale your update rule by a factor of $1/(n\gamma)$ **outside the brackets**, where $\gamma$ is the sampling probability. Note that you cannot divide by $|\mathcal{I}_t|$, because the realized batch size is considered private information.
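
As an illustration of the coin-flipping construction and of the scaling caveat, the following sketch draws a few Poisson-sampled minibatches with the standard library. The helper `poisson_sample` is a hypothetical name, and the values of `n` and `gamma` are chosen only for the example.

```python
import random

def poisson_sample(n, gamma, rng):
    """Include each index in [0, n) independently with probability gamma."""
    return [i for i in range(n) if rng.random() < gamma]

rng = random.Random(0)
n, gamma = 10_000, 0.01

# The batch size is random: it fluctuates around its expectation n * gamma.
batch_sizes = [len(poisson_sample(n, gamma, rng)) for _ in range(5)]
print(batch_sizes)

# For an averaged objective, scale the noisy sum by the data-independent
# constant 1 / (n * gamma), never by the realized batch size.
scale = 1.0 / (n * gamma)
print(scale)
```

Because the batch size depends on the coin flips, it varies from step to step, which is different from drawing a fixed-size batch.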

In [2]:
from autodp.autodp_core import Mechanism
from autodp.transformer_zoo import Composition 
from autodp import mechanism_zoo, transformer_zoo

class NoisySGD_mech(Mechanism):
    def __init__(self,prob,sigma,niter,name='NoisySGD'):
        Mechanism.__init__(self)
        self.name=name
        self.params={'prob':prob,'sigma':sigma,'niter':niter}
        
        # Build the Poisson-subsampled Gaussian mechanism
        subsample = transformer_zoo.AmplificationBySampling() # uses Poisson sampling by default
        mech = mechanism_zoo.GaussianMechanism(sigma=sigma)
        SubsampledGaussian_mech = subsample(mech,prob,improved_bound_flag=True)

        # Compose it with itself for niter iterations
        compose = transformer_zoo.Composition()
        mech = compose([SubsampledGaussian_mech],[niter])

        # Extract the composed RDP function and assign it to the mechanism being constructed
        rdp_total = mech.RenyiDP
        self.propagate_updates(rdp_total,type_of_update='RDP')

        
gamma = 0.01
        
noisysgd = NoisySGD_mech(prob=gamma,sigma=5.0,niter=1000)


# compute epsilon, as a function of delta
noisysgd.get_approxDP(delta=1e-6)


0.27105623043762284

## 3. Private Aggregation of Teacher Ensembles (PATE)  

This is a different, and increasingly popular, type of private deep learning algorithm in the knowledge transfer model.

Specifically, there is a private dataset, and also a public (unlabeled) dataset. We will use the private dataset to privately release labels for the public dataset.  Let us say that there are $C$ classes.

PATE splits the private dataset into $k$ disjoint parts and trains a supervised learner (a "Teacher") on each part. Then, for each data point $x_i$, $i=1,...,m$, in the public dataset, it privately releases the following

$$
\hat{p}_i =  \frac{1}{k} \left(\sum_{j}  \hat{p}^{(j)}(x_i)  + \mathcal{N}(0, 2 \sigma^2 I_C)\right) 
$$
where $\hat{p}^{(j)}$ is the soft-max prediction function of Teacher $j$, which returns a probability distribution on the labels (a $C$-dimensional vector).


From the differential privacy point of view, this is nothing but a composition of $m$ Gaussian mechanisms, i.e., the same as NoisyGD above. The global L2 sensitivity is $\sqrt{2}$ (hence the factor of $2$ in front of $\sigma^2$). This is because adding or removing one individual data point affects the prediction of only one of the teachers, and the largest possible change of a probability vector in L2 norm is $\sqrt{2}$.
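
The $\sqrt{2}$ bound can be checked numerically. The sketch below uses hypothetical prediction vectors with $C=3$ classes: the worst case is one teacher switching from one one-hot vector to another, and any pair of soft-max outputs is at most that far apart.

```python
import math

def l2_distance(p, q):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Worst case: a single teacher's prediction flips between two one-hot vectors.
before = [1.0, 0.0, 0.0]
after = [0.0, 1.0, 0.0]
worst_case = l2_distance(before, after)
print(worst_case, math.sqrt(2.0))

# Soft-max outputs lie on the probability simplex, where no two points are
# farther apart than two distinct one-hot vectors.
soft_before = [0.7, 0.2, 0.1]
soft_after = [0.1, 0.3, 0.6]
print(l2_distance(soft_before, soft_after))
```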

### Notes:
- In the above, we described the version that aggregates the soft labels and releases the average soft label from the $k$ teachers.  You could also aggregate the hard labels and release either the voting scores or just the voted label. The same code snippet below works without changes. There are some (data-dependent) privacy benefits in doing so, which we do not cover in this tutorial.
- You could **screen the data points** and collect labels only for those that are **worthy**, so as to reduce $m$ and hence the total privacy loss.
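
To see how reducing $m$ lowers the total privacy loss, here is a rough sketch based on a closed-form RDP bound for $m$ composed Gaussian mechanisms with unit sensitivity. It relies on the standard RDP-to-DP conversion and is looser than the tight Gaussian accounting done by autodp, so the numbers will not match autodp's output; only the trend in $m$ is the point. The function name is hypothetical.

```python
import math

def rdp_epsilon_bound(sigma, m, delta):
    """Upper bound on epsilon for m composed Gaussian mechanisms (sensitivity 1).

    Uses the RDP curve m * alpha / (2 sigma^2) and the conversion
    eps = rdp(alpha) + log(1/delta) / (alpha - 1), minimized over alpha > 1.
    """
    c = m / (2.0 * sigma ** 2)
    log_term = math.log(1.0 / delta)
    alpha = 1.0 + math.sqrt(log_term / c)   # closed-form minimizer
    return c * alpha + log_term / (alpha - 1.0)

# Fewer labeled public points means fewer compositions and a smaller epsilon.
for m in (100, 50, 25):
    print(m, rdp_epsilon_bound(sigma=5.0, m=m, delta=1e-6))
```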

 


In [3]:
# We will be reusing the `NoisyGD_mech` class defined earlier.

sigma = 5.0
m=100

sigma_list = sigma * np.ones(shape=(m,1))
pate = NoisyGD_mech(sigma_list,name='PATE_gaussian')
# compute epsilon, as a function of delta
pate.get_approxDP(delta=1e-6)

10.99715121422065

## 4. Private KNN

Private KNN is an alternative algorithm in the knowledge transfer setting that allows us to label more public data points than PATE.  The idea is to subsample the data set, find the $k$ closest data points to the input $x_i$ from the sampled subset, then privately release the voted labels.

1. Sample minibatch $\mathcal{I}_t \subset [n]$  by **flipping one independent coin for each data point $i\in[n]$** to decide whether to include or exclude that data point.
2. Find the $k$ data points $\mathcal{I}_t(x_i)\subset \mathcal{I}_t$ that are closest to $x_i$, using your favorite distance function $\text{dist}(\cdot,\cdot)$.
3. Privately release the voting scores
$$
\hat{p}_i =  \frac{1}{k} \left(\sum_{j \in \mathcal{I}_t(x_i)}  \textrm{OneHot}(y_j)  + \mathcal{N}(0, 2 \sigma^2 I_C)\right)
$$ 
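
The three steps above can be sketched as follows for a single query point. This is a toy illustration in plain Python with Euclidean distance and synthetic two-class data; the function `private_knn_scores` and all parameter values are assumptions made for the example, not part of autodp or of a reference implementation.

```python
import math
import random

def private_knn_scores(x, data, labels, num_classes, k, gamma, sigma, rng):
    """Noisy, normalized vote vector for one query point x."""
    # Step 1: Poisson subsampling, one independent coin flip per data point.
    sampled = [i for i in range(len(data)) if rng.random() < gamma]
    # Step 2: the k sampled points closest to x in Euclidean distance.
    nearest = sorted(sampled, key=lambda i: math.dist(x, data[i]))[:k]
    # Step 3: one-hot votes plus Gaussian noise of variance 2 * sigma^2 per class.
    votes = [0.0] * num_classes
    for i in nearest:
        votes[labels[i]] += 1.0
    noise_std = math.sqrt(2.0) * sigma
    return [(v + rng.gauss(0.0, noise_std)) / k for v in votes]

rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(2000)]
labels = [int(p[0] > 0.5) for p in data]   # synthetic labels for the demo
scores = private_knn_scores([0.9, 0.5], data, labels, num_classes=2,
                            k=5, gamma=0.05, sigma=0.5, rng=rng)
print(scores)
```

If fewer than `k` points survive the subsampling, this sketch simply votes with the points that are available.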


From the DP perspective, this is a composition of Poisson-subsampled Gaussian mechanisms, so the privacy accounting is the same as for NoisySGD and the following snippet suffices. The L2 global sensitivity is again $\sqrt{2}$ here.




In [4]:
# We will reuse the NoisySGD_mech class

gamma = 0.01   # sampling probability
m = 1000       # number of public data points to label
sigma = 5.0

privateKNN = NoisySGD_mech(prob=gamma,sigma=sigma,niter=m)

# compute epsilon, as a function of delta
privateKNN.get_approxDP(delta=1e-6)

0.27105623043762284

## 5. Laplace-noise version of NoisyGD, NoisySGD and PATE/PrivateKNN

If you need pure differential privacy, you can replace the Gaussian noise above with Laplace noise and choose delta = 0 when computing approximate DP.

You could also get a similar (or slightly smaller) epsilon when you choose delta > 0, but Laplace noise is more heavy-tailed, which may affect the performance of your algorithm.


Code snippets below.

### Caution:
- Note that the Laplace mechanism requires bounded L1 sensitivity, so the per-instance gradients need to be clipped to have bounded L1 norm.
- Similarly, the global L1 sensitivity of the voting schemes in PATE and PrivateKNN is 2.
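
For illustration, here is a minimal sketch of L1 clipping together with the pure-DP guarantee obtained by basic composition. It assumes that `b` is the Laplace noise scale relative to an L1 sensitivity of 1, so that each release is $(1/b)$-DP; the helper names are hypothetical and not autodp APIs.

```python
def clip_l1(grad, delta):
    """Rescale grad so that its L1 norm is at most delta."""
    norm = sum(abs(g) for g in grad)
    scale = 1.0 if norm == 0.0 else min(1.0, delta / norm)
    return [scale * g for g in grad]

def pure_dp_epsilon(b, niter):
    """Basic composition of niter Laplace releases, each assumed (1/b)-DP."""
    return niter / b

print(clip_l1([3.0, -1.0], 1.0))              # L1 norm 4 rescaled to norm 1
print(pure_dp_epsilon(b=5.0, niter=100))      # epsilon at delta = 0
```

Basic composition is the simplest way to obtain a delta = 0 guarantee; an RDP-based accountant can report a smaller epsilon once delta > 0 is allowed.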



In [5]:
# NoisyGD and PATE with Laplace Mechanism

# ------------------------------------------
class Composed_Laplace_mech(Mechanism):
    def __init__(self,b,niter,name='Composed_Laplace'):
        Mechanism.__init__(self)
        self.name=name
        self.params={ 'b':b,'niter':niter}
        
        # Base Laplace mechanism
        laplace_mech = mechanism_zoo.LaplaceMechanism(b=b)
        # Compose it with itself for niter iterations
        compose = transformer_zoo.Composition()
        mech = compose([laplace_mech],[niter])
        # Extract the composed RDP function and assign it to the mechanism being constructed
        rdp_total = mech.RenyiDP
        self.propagate_updates(rdp_total,type_of_update='RDP')

        
#  Parameters
b = 5.0
m = 100

NoisyGD_laplace = Composed_Laplace_mech(b=b,niter=m,name="NoisyGD_Laplace")

PATE_laplace = Composed_Laplace_mech(b=b,niter=m,name="PATE_Laplace")
        
print(PATE_laplace.get_approxDP(delta=1e-6))

# --------------------------------------------

# NoisySGD and PrivateKNN with Poisson Sampled Laplace Mechanism
class Composed_SubsampledLaplace_mech(Mechanism):
    def __init__(self,prob,b,niter,name='Composed_SubsampledLaplace'):
        Mechanism.__init__(self)
        self.name=name
        self.params={'prob':prob,'b':b,'niter':niter}
        
        # Build the Poisson-subsampled Laplace mechanism
        subsample = transformer_zoo.AmplificationBySampling() # uses Poisson sampling by default
        laplace_mech = mechanism_zoo.LaplaceMechanism(b=b)
        Subsampled_laplace_mech = subsample(laplace_mech,prob,improved_bound_flag=True)

        # Compose it with itself for niter iterations
        compose = transformer_zoo.Composition()
        mech = compose([Subsampled_laplace_mech],[niter])

        # Extract the composed RDP function and assign it to the mechanism being constructed
        rdp_total = mech.RenyiDP
        self.propagate_updates(rdp_total,type_of_update='RDP')
        

#  Parameters
b = 5.0
m = 1000
prob = 0.01

NoisySGD_laplace = Composed_SubsampledLaplace_mech(prob=prob, b=b,niter=m,name="NoisySGD_Laplace")

PrivateKNN_laplace = Composed_SubsampledLaplace_mech(prob=prob, b=b,niter=m,name="PrivateKNN_Laplace")
        
print(PrivateKNN_laplace.get_approxDP(delta=1e-6))
        

10.850346958780321
0.2569326333160541
