# Switching the privacy definition in a Session and the noise mechanism in an aggregation

## Import Libraries

In [None]:
from pyspark.sql import SparkSession
import pandas as pd

To build a session and answer queries, we need **QueryBuilder** and **Session**, along with the privacy budget classes.

In [None]:
from tmlt.analytics.query_builder import QueryBuilder
from tmlt.analytics.privacy_budget import PureDPBudget, RhoZCDPBudget
from tmlt.analytics.session import Session

We also need to specify a `privacy_budget` for the Session and a `mechanism` for the query.

In [None]:
from tmlt.analytics.query_expr import CountMechanism
from tmlt.core.measures import RhoZCDP, PureDP

## Load a Simple Dataset

We use a very simple dataset here to illustrate the example.

In [None]:
spark = SparkSession.builder.getOrCreate()
private_data = spark.createDataFrame(
    pd.DataFrame([["0", 1, 0], ["1", 0, 1]], columns=["A", "B", "X"])
)

## Build Session with appropriate privacy budget

A privacy budget associates a privacy definition with one or more numeric values.

To build a session under rho-zero-concentrated differential privacy (RhoZCDP), pass a `RhoZCDPBudget` as the `privacy_budget` argument. Its value is the rho privacy parameter.

To build a session under pure differential privacy (PureDP), pass a `PureDPBudget` as the `privacy_budget` argument. This privacy definition is also known as epsilon-differential privacy, and its value is the epsilon privacy parameter.

In [None]:
# Session with RhoZCDP.
zcdp_sess = Session.from_dataframe(
    privacy_budget=RhoZCDPBudget(10),
    source_id="my_private_data",
    dataframe=private_data,
)

# Session with PureDP.
puredp_sess = Session.from_dataframe(
    privacy_budget=PureDPBudget(10),
    source_id="my_private_data",
    dataframe=private_data,
)

### Gaussian noise can be used with RhoZCDP.

This example illustrates the use of discrete Gaussian noise with RhoZCDP. We pass the `mechanism` argument to the query and specify `CountMechanism.GAUSSIAN`.

In [None]:
query_with_guassian = QueryBuilder("my_private_data").count(mechanism=CountMechanism.GAUSSIAN)

In [None]:
answer = zcdp_sess.evaluate(query_expr=query_with_guassian, privacy_budget=RhoZCDPBudget(1))
answer.show()
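
As a rough guide to how much noise a budget such as `RhoZCDPBudget(1)` corresponds to: adding Gaussian noise (continuous or discrete) with variance sigma² to a query with sensitivity Δ satisfies rho-zCDP with rho = Δ² / (2 sigma²). The sketch below simply inverts that relation. The helper name `gaussian_sigma_for_rho` is hypothetical and not part of Tumult Analytics, and the library's exact calibration may differ, so treat this as an illustration only.

```python
import math


def gaussian_sigma_for_rho(rho: float, sensitivity: float = 1.0) -> float:
    """Return sigma such that sensitivity**2 / (2 * sigma**2) == rho."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


# Assumption: a count has sensitivity 1 (each individual contributes one row).
for rho in (0.1, 1.0, 10.0):
    print(f"rho={rho}: sigma={gaussian_sigma_for_rho(rho):.4f}")
```

A larger rho means a smaller sigma, that is, less noise and a weaker privacy guarantee.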

### Laplace or Geometric noise can be used with RhoZCDP or PureDP.

This example illustrates the use of Laplace noise with RhoZCDP and PureDP. We pass the `mechanism` argument to the query and specify `CountMechanism.LAPLACE`. With this setting, Laplace noise is used if the measure column is floating-point valued, while double-sided geometric noise is used if the measure column is integer valued. Since a count is always an integer, double-sided geometric noise is applied here.

In [None]:
query_with_laplace = QueryBuilder("my_private_data").count(mechanism=CountMechanism.LAPLACE)

In [None]:
answer = zcdp_sess.evaluate(query_expr=query_with_laplace, privacy_budget=RhoZCDPBudget(1))
answer.show()

In [None]:
answer = puredp_sess.evaluate(query_expr=query_with_laplace, privacy_budget=PureDPBudget(1))
answer.show()
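
To make the double-sided geometric noise used for counts more concrete, here is a minimal standard-library sketch. It samples integer noise as the difference of two geometric variables, and it uses the standard result that an epsilon-DP mechanism also satisfies (epsilon² / 2)-zCDP to pick an epsilon for a given rho. All function names are hypothetical, and whether Tumult Analytics samples or calibrates its noise this way internally is an assumption.

```python
import math
import random


def epsilon_for_rho(rho: float) -> float:
    """Epsilon with epsilon**2 / 2 == rho (epsilon-DP implies epsilon**2/2-zCDP)."""
    return math.sqrt(2.0 * rho)


def _geometric(alpha: float, rng: random.Random) -> int:
    """Sample G >= 0 with P(G = k) = (1 - alpha) * alpha**k by inverse transform."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return int(math.floor(math.log(u) / math.log(alpha)))


def two_sided_geometric(epsilon: float, rng: random.Random, sensitivity: int = 1) -> int:
    """Integer noise with P(k) proportional to exp(-epsilon * |k| / sensitivity)."""
    alpha = math.exp(-epsilon / sensitivity)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    return _geometric(alpha, rng) - _geometric(alpha, rng)


rng = random.Random(0)
true_count = 2  # the example DataFrame has two rows
print("PureDP, epsilon=1:", true_count + two_sided_geometric(1.0, rng))
print("RhoZCDP, rho=1:", true_count + two_sided_geometric(epsilon_for_rho(1.0), rng))
```

The noisy answers are always integers, which matches the integer-valued output of a count query.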

### Gaussian noise cannot be used with PureDP.

This example shows that discrete Gaussian noise is currently not supported with PureDP: evaluating the query raises an exception, which we catch and print.

In [None]:
try:
    puredp_sess.evaluate(
        query_expr=query_with_guassian,
        privacy_budget=PureDPBudget(1)
    )
except Exception as e:
    print(e)
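
One way to see why Gaussian noise does not fit PureDP is to look at the privacy loss, the log-ratio of the output probabilities on two neighboring inputs whose true counts differ by 1. For Laplace-type noise this ratio is bounded, while for Gaussian noise it grows without bound in the tails, so no finite epsilon covers every output. The following is a minimal numerical sketch with hypothetical function names. It uses the continuous densities; on integer outputs the discrete Gaussian and double-sided geometric log-ratios take the same form.

```python
def gaussian_privacy_loss(x: float, sigma: float, shift: float = 1.0) -> float:
    """log[p(x | count=0) / p(x | count=shift)] for Gaussian noise with std sigma."""
    return ((x - shift) ** 2 - x ** 2) / (2.0 * sigma ** 2)


def laplace_privacy_loss(x: float, scale: float, shift: float = 1.0) -> float:
    """The same log-ratio for Laplace noise with the given scale."""
    return (abs(x - shift) - abs(x)) / scale


# The Gaussian loss keeps growing as x moves into the tail; the Laplace loss
# never exceeds shift / scale in absolute value.
for x in (-1.0, -10.0, -100.0):
    print(x, gaussian_privacy_loss(x, sigma=1.0), laplace_privacy_loss(x, scale=1.0))
```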