## Attack idea from April 4

*The text below is from Salil's email describing the attack*


How about the following datasets.  Let n and U be powers of 2, and m=n/2.

u = m copies of U, followed by m/2 copies of the alternating pair of values (U * m / 2^k) * (1 / 2 + 1 / 2^k) and -(U * m / 2^k) * (1 / 2 - 1 / 2^k)

If I'm correct, then since ULP(U * m) = U * m / 2^k, all the rounding during the alternation will be upward, and BS^*(u) will be m * U + (m / 2) * (U * m / 2^k).

v = same, but one less copy of U.

Then since ULP(U*(m-1)) <= (U * m / 2^k) * (1 / 2), when we're doing the alternation, we alternately round up and round down, so BS^* (v) = m * U

Overall, BS^* (u) - BS^* (v) = (m/2) * (U * m) / 2^k = U * (n^2) / (2^{k+3}).

So we get a blow-up in sensitivity by a factor of n^2/(2^{k+3}).  The key point is that we have an n^2 rather than an n in the numerator.  For example, taking n=2^{30}, we get a blow-up of 32 in the sensitivity.
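
The rounding argument above can be checked on a scaled-down instance. The sketch below is not the notebook's `generate_datasets` code. It assumes 64-bit floats with k = 52 (the value implied by the n = 2^30 example), picks the hypothetical sizes n = 2^11 and U = 2^42 so that ULP(U * m) = 1, and uses an explicit left-to-right loop (`naive_sum`, a helper made up for this illustration) as the floating-point sum that the email calls BS^*. It requires Python 3.9+ for `math.ulp`.

```python
import math

K = 52          # assumption: explicit mantissa bits of a 64-bit float
n = 2**11       # hypothetical scaled-down dataset size
m = n // 2
U = 2.0**42     # chosen so that U * m == 2**52, hence ULP(U * m) == 1.0

ulp = U * m / 2**K
pos_val = ulp * (0.5 + 2.0**-K)    # slightly more than half an ULP
neg_val = -ulp * (0.5 - 2.0**-K)   # slightly less than half an ULP, negated

u = [U] * m + [pos_val, neg_val] * (m // 2)
v = [U] * (m - 1) + [pos_val, neg_val] * (m // 2)


def naive_sum(values):
    """Left-to-right floating-point accumulation (hypothetical helper)."""
    acc = 0.0
    for x in values:
        acc += x
    return acc


# The two ULP facts the argument relies on.
ulp_claims_hold = (math.ulp(U * m) == ulp
                   and math.ulp(U * (m - 1)) <= ulp / 2)

sum_u = naive_sum(u)
sum_v = naive_sum(v)

# Rounding-induced excess beyond the one-row difference U.
excess = (sum_u - sum_v) - U
predicted = U * n**2 / 2**(K + 3)

# The email's example: blow-up factor for n = 2^30.
blowup_at_2_30 = (2**30)**2 / 2**(K + 3)

print(ulp_claims_hold, excess, predicted, blowup_at_2_30)
```

With these small sizes the excess is tiny relative to U, since n^2 is far below 2^{k+3}; the point is only that the excess matches U * n^2 / 2^{k+3}, which is the term that grows quadratically in n.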



In [None]:
from generate_datasets import unsized_64_consts, unsized_alternating_values
consts = unsized_64_consts()
U = consts['U']
m = consts['m']
neg_val, pos_val = unsized_alternating_values(consts)

## PipelineDP Setup

In [11]:
# set up pipelinedp
import pipeline_dp
from pipeline_dp import SumParams
from pipeline_dp.private_spark import make_private
import pyspark
import uuid
arr_u_file = 'arr_u.csv'
arr_v_file = 'arr_v.csv'


# to demonstrate this attack, we want to add no noise
def make_inf_accountant():
    # crashes opaquely if you pass `float('inf')`, so just set epsilon to a large value
    return pipeline_dp.NaiveBudgetAccountant(total_epsilon=1e20, total_delta=1.)

### Spark

In [92]:
# set up pyspark
import os
os.environ['JAVA_HOME'] = "/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"

import pyspark
sc = pyspark.SparkContext()

Compute aggregates directly

In [95]:
# pyspark sums are less susceptible
sc.textFile(arr_u_file).map(float).sum()

134217727.99999999

In [45]:
sc.textFile(arr_v_file).map(float).sum()

134217727.0
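
One plausible reason a partitioned sum fares better, offered here as an assumption rather than something verified against the Spark version used above, is that each partition is summed from zero before the partial results are combined, so the small alternating values never meet the large accumulator one at a time. The sketch below illustrates this on a scaled-down dataset. `naive_sum`, `chunked_sum`, the chunk size of 256, and the sizes n = 2^11, U = 2^42, k = 52 are all hypothetical choices for illustration and are not Spark's API.

```python
K = 52          # assumption: explicit mantissa bits of a 64-bit float
n = 2**11       # hypothetical scaled-down dataset size
m = n // 2
U = 2.0**42     # chosen so that U * m == 2**52

ulp = U * m / 2**K
pos_val = ulp * (0.5 + 2.0**-K)
neg_val = -ulp * (0.5 - 2.0**-K)

u = [U] * m + [pos_val, neg_val] * (m // 2)
v = [U] * (m - 1) + [pos_val, neg_val] * (m // 2)


def naive_sum(values):
    """Left-to-right floating-point accumulation."""
    acc = 0.0
    for x in values:
        acc += x
    return acc


def chunked_sum(values, chunk_size):
    """Sum each chunk from zero, then add up the per-chunk partial sums."""
    partials = [naive_sum(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return naive_sum(partials)


naive_gap = naive_sum(u) - naive_sum(v)
chunked_gap = chunked_sum(u, 256) - chunked_sum(v, 256)

print(naive_gap - U, chunked_gap - U)
```

In this toy setting the single-pass sum overshoots the one-row difference U, while the chunked sum recovers it exactly. Whether the same mechanism explains the Spark results above depends on how the input files are partitioned, which is not checked here.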

Compute aggregates with PipelineDP

In [96]:
# 110 minutes
from pipeline_dp.aggregate_params import NoiseKind
from pipeline_dp.private_spark import make_private

# when we compute a private sum, use laplace noise without partitioning
sum_params = SumParams(
    noise_kind=NoiseKind.LAPLACE,
    max_partitions_contributed=1,
    max_contributions_per_partition=1,
    min_value=neg_val,
    max_value=1,
    public_partitions=[0],
    partition_extractor=lambda _: 0,
    value_extractor=lambda v: v)

arr_u_rdd = sc.textFile(arr_u_file).map(float)

accountant = make_inf_accountant()
# Wrap Spark's RDD into its private version
# assign a random uuid user id to each row, to represent that each row is unique
private_arr_u_rdd = make_private(arr_u_rdd, accountant, lambda _: str(uuid.uuid4()))

# Calculate the private sum
dp_result = private_arr_u_rdd.sum(sum_params)
accountant.compute_budgets()

# not vulnerable, because the implementation uses the Spark summer, which was previously shown to be robust
dp_result.collect()

[(0, 134217728.0)]

### Beam

In [3]:
import apache_beam as beam
from apache_beam.runners.portability import fn_api_runner

runner = fn_api_runner.FnApiRunner()



Compute aggregates directly. This shows that the Beam sum is susceptible.

In [98]:
# 53 minutes
with beam.Pipeline(runner=runner) as pipeline:
    pipeline \
        | beam.io.ReadFromText(arr_u_file) \
        | beam.Map(float) \
        | beam.CombineGlobally(sum) \
        | beam.Map(print)

134217730.0


Compute aggregates with PipelineDP

In [19]:
from pipeline_dp.private_beam import MakePrivate
from pipeline_dp.aggregate_params import NoiseKind
from pipeline_dp import private_beam, SumParams
import pipeline_dp

# crashes opaquely if public_partitions are set
sum_params = SumParams(
    noise_kind=NoiseKind.LAPLACE,
    max_partitions_contributed=1,
    max_contributions_per_partition=1,
    min_value=neg_val,
    max_value=1,
    # public_partitions=[0],
    partition_extractor=lambda _: 0,
    value_extractor=lambda v: v[1])

total_len = m + 2**27

with beam.Pipeline(runner=runner) as pipeline:
    accountant = make_inf_accountant()

    def parse_line(line):
        i, v = line.split(',')
        return int(i), float(v)

    def attack_transform(row):
        i, v = row
        if i < m:
            return i, U
        if i == m:
            return i, U * v
        if i % 2 == 1:
            return i, neg_val
        return i, pos_val
    
    # Load and parse input data
    dp_result = pipeline \
        | beam.io.ReadFromText('arr_u_i.csv') \
        | beam.Map(parse_line) \
        | MakePrivate(
            budget_accountant=accountant,
            privacy_id_extractor=lambda r: r[0]) \
        | private_beam.Map(attack_transform) \
        | private_beam.Sum(sum_params)

    accountant.compute_budgets()

    dp_result | beam.Map(print)