# The notebook contains
### Code for _Bulyan_ aggregation algorithm, *when the gradient updates of benign clients are unknown to the adversary*
### Evaluation of all the attacks (Fang, LIE, and our AGR-agnostic attacks) on Multi-krum, except our AGR-tailored attack on Bulyan

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

In [2]:
from __future__ import print_function
import argparse, os, sys, csv, shutil, time, random, operator, pickle, ast, math, json
import numpy as np
import pandas as pd
from torch.optim import Optimizer
import torch.nn.functional as F
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data as data
import torch.multiprocessing as mp

sys.path.insert(0,'./../utils/')
from logger import *
from eval import *
from misc import *

from femnist_normal_train import *
from femnist_util import *
from adam import Adam
from sgd import SGD
import torchvision.transforms as transforms
import torchvision.datasets as datasets

## Get the FEMNIST dataset; we use the [LEAF framework](https://leaf.cmu.edu/)

In [3]:
user_tr_data = []
user_tr_labels = []

for i in range(34):
    f = '/mnt/nfs/work1/amir/vshejwalkar/leaf/data/femnist/data/train/all_data_%d_niid_0_keep_0_train_9.json'%i
    with open(f, 'r') as myfile:
        # json.load reads the file directly and avoids rebinding `data` (imported above as torch.utils.data)
        obj = json.load(myfile)
    
    for user in obj['users']:
        user_tr_data.append(obj['user_data'][user]['x'])
        user_tr_labels.append(obj['user_data'][user]['y'])

user_te_data = []
user_te_labels = []

for i in range(34):
    f = '/mnt/nfs/work1/amir/vshejwalkar/leaf/data/femnist/data/test/all_data_%d_niid_0_keep_0_test_9.json'%i
    with open(f, 'r') as myfile:
        # json.load reads the file directly and avoids rebinding `data` (imported above as torch.utils.data)
        obj = json.load(myfile)
    
    for user in obj['users']:
        user_te_data.append(obj['user_data'][user]['x'])
        user_te_labels.append(obj['user_data'][user]['y'])
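Each LEAF shard read above is a JSON object whose `users` list names the clients and whose `user_data` maps each client to its `x` (features) and `y` (labels). The sketch below runs the same per-user extraction on a tiny made-up shard held in a string, so it needs no files. The user IDs and values are hypothetical, and each `x` row is assumed here to be a flattened 28x28 image (784 values).

```python
import json

# Hypothetical miniature shard containing only the two keys the loading cell reads.
raw = json.dumps({
    "users": ["f0000", "f0001"],
    "user_data": {
        "f0000": {"x": [[0.0] * 784, [1.0] * 784], "y": [3, 7]},
        "f0001": {"x": [[0.5] * 784], "y": [12]},
    },
})


def load_leaf_users(text):
    """Return per-user feature and label lists from one LEAF JSON shard."""
    obj = json.loads(text)
    user_data, user_labels = [], []
    for user in obj["users"]:  # keep the shard's user order
        user_data.append(obj["user_data"][user]["x"])
        user_labels.append(obj["user_data"][user]["y"])
    return user_data, user_labels


user_x, user_y = load_leaf_users(raw)
print(len(user_x), len(user_x[0]), len(user_x[0][0]))  # 2 2 784
```

Keeping one list per client is what lets the next cell build one tensor per client for federated training.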

In [4]:
user_tr_data_tensors=[]
user_tr_label_tensors=[]

for i in range(len(user_tr_data)):
    
    user_tr_data_tensor=torch.from_numpy(np.array(user_tr_data[i])).type(torch.FloatTensor)
    user_tr_label_tensor=torch.from_numpy(np.array(user_tr_labels[i])).type(torch.LongTensor)

    user_tr_data_tensors.append(user_tr_data_tensor)
    user_tr_label_tensors.append(user_tr_label_tensor)
    
    # print('user %d tr len %d'%(i,len(user_tr_data_tensor)))
print("number of clients: ", len(user_tr_data_tensors))

number of clients:  3400


In [5]:
te_data = np.concatenate(user_te_data, 0)
te_labels = np.concatenate(user_te_labels)
te_len = len(te_labels)

te_data_tensor = torch.from_numpy(te_data[:(te_len//2)]).type(torch.FloatTensor)
te_label_tensor = torch.from_numpy(te_labels[:(te_len//2)]).type(torch.LongTensor)

val_data_tensor = torch.from_numpy(te_data[(te_len//2):]).type(torch.FloatTensor)
val_label_tensor = torch.from_numpy(te_labels[(te_len//2):]).type(torch.LongTensor)

## Code for Bulyan aggregation algorithm

In [6]:
def bulyan(all_updates, n_attackers):
    nusers = all_updates.shape[0]
    bulyan_cluster = []
    candidate_indices = []
    remaining_updates = all_updates
    all_indices = np.arange(len(all_updates))

    while len(bulyan_cluster) < (nusers - 2 * n_attackers):
        distances = []
        for update in remaining_updates:
            distance = torch.norm((remaining_updates - update), dim=1) ** 2
            distances = distance[None, :] if not len(distances) else torch.cat((distances, distance[None, :]), 0)

        distances = torch.sort(distances, dim=1)[0]

        scores = torch.sum(distances[:, :len(remaining_updates) - 2 - n_attackers], dim=1)
        indices = torch.argsort(scores)[:len(remaining_updates) - 2 - n_attackers]

        candidate_indices.append(all_indices[indices[0].cpu().numpy()])
        all_indices = np.delete(all_indices, indices[0].cpu().numpy())
        bulyan_cluster = remaining_updates[indices[0]][None, :] if not len(bulyan_cluster) else torch.cat((bulyan_cluster, remaining_updates[indices[0]][None, :]), 0)
        remaining_updates = torch.cat((remaining_updates[:indices[0]], remaining_updates[indices[0] + 1:]), 0)

    # print('dim of bulyan cluster ', bulyan_cluster.shape)

    n, d = bulyan_cluster.shape
    param_med = torch.median(bulyan_cluster, dim=0)[0]
    sort_idx = torch.argsort(torch.abs(bulyan_cluster - param_med), dim=0)
    sorted_params = bulyan_cluster[sort_idx, torch.arange(d)[None, :]]

    return torch.mean(sorted_params[:n - 2 * n_attackers], dim=0), np.array(candidate_indices)
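To make the two stages of `bulyan` concrete, here is a dependency-free sketch on made-up 2-D updates. Stage 1 repeatedly keeps the update with the lowest Krum score until `n - 2f` are selected; stage 2 averages, per coordinate, the `n - 4f` selected values closest to that coordinate's median. It is an illustration, not a drop-in replacement: it uses plain Python lists instead of tensors, and unlike `bulyan` above, which counts the zero self-distance among the summed terms, it scores each update against its `n - f - 2` nearest *other* updates.

```python
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_scores(vectors, f):
    """Sum of squared distances to the len(vectors) - f - 2 nearest other vectors."""
    k = len(vectors) - f - 2
    scores = []
    for i, u in enumerate(vectors):
        d = sorted(sq_dist(u, v) for j, v in enumerate(vectors) if j != i)
        scores.append(sum(d[:k]))
    return scores


def bulyan_sketch(updates, f):
    """Return (aggregate, indices kept in stage 1)."""
    remaining = list(range(len(updates)))
    selected = []
    # Stage 1: apply Krum repeatedly until len(updates) - 2f updates are kept.
    while len(selected) < len(updates) - 2 * f:
        scores = krum_scores([updates[i] for i in remaining], f)
        best = min(range(len(remaining)), key=scores.__getitem__)
        selected.append(remaining.pop(best))
    # Stage 2: per coordinate, average the beta = len(selected) - 2f values
    # closest to that coordinate's median.
    beta = len(selected) - 2 * f
    aggregate = []
    for coord in zip(*(updates[i] for i in selected)):
        med = statistics.median(coord)
        closest = sorted(coord, key=lambda v: abs(v - med))[:beta]
        aggregate.append(sum(closest) / beta)
    return aggregate, selected


malicious = [[100.0, -100.0], [-100.0, 100.0]]                   # made-up outliers
benign = [[1.0 + 0.1 * i, 1.0 - 0.1 * i] for i in range(-4, 5)]  # made-up benign updates
agg, kept = bulyan_sketch(malicious + benign, f=2)
print(agg, sorted(kept))
```

With 11 updates and f = 2 (which meets the n >= 4f + 3 condition commonly stated for Bulyan), stage 1 keeps 7 benign updates (indices 2 and up), and stage 2 averages the 3 values per coordinate nearest the median, so the aggregate stays inside the benign range.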

## Code for Multi-krum aggregation algorithm

In [7]:
def multi_krum(all_updates, n_attackers, multi_k=False):
    nusers = all_updates.shape[0]
    candidates = []
    candidate_indices = []
    remaining_updates = all_updates
    all_indices = np.arange(len(all_updates))

    while len(remaining_updates) > 2 * n_attackers + 2:
        distances = []
        for update in remaining_updates:
            distance = torch.norm((remaining_updates - update), dim=1) ** 2
            distances = distance[None, :] if not len(distances) else torch.cat((distances, distance[None, :]), 0)

        distances = torch.sort(distances, dim=1)[0]
        scores = torch.sum(distances[:, :len(remaining_updates) - 2 - n_attackers], dim=1)
        indices = torch.argsort(scores)[:len(remaining_updates) - 2 - n_attackers]

        candidate_indices.append(all_indices[indices[0].cpu().numpy()])
        all_indices = np.delete(all_indices, indices[0].cpu().numpy())
        candidates = remaining_updates[indices[0]][None, :] if not len(candidates) else torch.cat((candidates, remaining_updates[indices[0]][None, :]), 0)
        remaining_updates = torch.cat((remaining_updates[:indices[0]], remaining_updates[indices[0] + 1:]), 0)
        if not multi_k:
            break
    # print(len(remaining_updates))
    aggregate = torch.mean(candidates, dim=0)
    return aggregate, np.array(candidate_indices)
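Krum and Multi-krum both rank updates by a Krum score. The plain-Python sketch below mirrors `multi_krum`'s control flow on made-up 2-D updates: with `multi_k=False` it keeps one update (Krum); otherwise it keeps selecting while more than `2f + 2` updates remain and averages the selections. As in the notebook, malicious updates are listed first, so a selected index below `f` would be a malicious one. One deliberate difference in this sketch: it scores against the `n - f - 2` nearest *other* updates, whereas `multi_krum` also counts the zero self-distance.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_scores(vectors, f):
    """Sum of squared distances to the len(vectors) - f - 2 nearest other vectors."""
    k = len(vectors) - f - 2
    scores = []
    for i, u in enumerate(vectors):
        d = sorted(sq_dist(u, v) for j, v in enumerate(vectors) if j != i)
        scores.append(sum(d[:k]))
    return scores


def multi_krum_sketch(updates, f, multi_k=False):
    """Select by Krum score (once, or while > 2f + 2 remain) and average the selections."""
    remaining = list(range(len(updates)))
    selected = []
    while len(remaining) > 2 * f + 2:
        scores = krum_scores([updates[i] for i in remaining], f)
        best = min(range(len(remaining)), key=scores.__getitem__)
        selected.append(remaining.pop(best))
        if not multi_k:
            break  # plain Krum keeps a single update
    dim = len(updates[0])
    aggregate = [sum(updates[i][c] for i in selected) / len(selected) for c in range(dim)]
    return aggregate, selected


malicious = [[100.0, -100.0], [-100.0, 100.0]]                   # made-up outliers
benign = [[1.0 + 0.1 * i, 1.0 - 0.1 * i] for i in range(-4, 5)]  # made-up benign updates
agg_mk, sel_mk = multi_krum_sketch(malicious + benign, f=2, multi_k=True)
agg_k, sel_k = multi_krum_sketch(malicious + benign, f=2, multi_k=False)
print(sorted(sel_mk), sel_k)
```

With 11 updates and f = 2, Multi-krum keeps 11 - 2*2 - 2 = 5 updates, all benign here, while Krum keeps a single benign one.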


## Code for Fang attack on Bulyan
### Fang attacks on MKrum and Bulyan are the same

In [8]:
def compute_lambda(all_updates, model_re, n_attackers):

    distances = []
    n_benign, d = all_updates.shape
    for update in all_updates:
        distance = torch.norm((all_updates - update), dim=1)
        distances = distance[None, :] if not len(distances) else torch.cat((distances, distance[None, :]), 0)

    distances[distances == 0] = 10000
    distances = torch.sort(distances, dim=1)[0]
    scores = torch.sum(distances[:, :n_benign - 2 - n_attackers], dim=1)
    min_score = torch.min(scores)
    term_1 = min_score / ((n_benign - n_attackers - 1) * torch.sqrt(torch.Tensor([d]))[0])
    max_wre_dist = torch.max(torch.norm((all_updates - model_re), dim=1)) / (torch.sqrt(torch.Tensor([d]))[0])

    return (term_1 + max_wre_dist)


def get_malicious_updates_fang(all_updates, model_re, deviation, n_attackers):

    lamda = compute_lambda(all_updates, model_re, n_attackers)

    threshold = 1e-5
    mal_update = []

    while lamda > threshold:
        mal_update = (-lamda * deviation)
        mal_updates = torch.stack([mal_update] * n_attackers)
        mal_updates = torch.cat((mal_updates, all_updates), 0)

        # print(mal_updates.shape, n_attackers)
        agg_grads, krum_candidate = multi_krum(mal_updates, n_attackers, multi_k=False)
        if krum_candidate < n_attackers:
            # print('successful lamda is ', lamda)
            return mal_update
        else:
            mal_update = []

        lamda *= 0.5

    if not len(mal_update):
        mal_update = (model_re - lamda * deviation)
        
    return mal_update
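The lambda search in `get_malicious_updates_fang` can be traced on a toy example. The sketch below uses hypothetical 1-D "gradients" in plain Python: it places `n_attackers` copies of `-lam * sign(mean)` in front of the benign updates, runs single Krum, and halves `lam` until a malicious copy is selected. Assumptions of the sketch: the starting `lam = 8.0` is an arbitrary stand-in for `compute_lambda`, Krum scores use the `n - f - 2` nearest *other* updates (slightly different from `multi_krum`), and it returns `None` instead of the notebook's `model_re - lamda * deviation` fallback when no lambda above the threshold succeeds.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_scores(vectors, f):
    """Sum of squared distances to the len(vectors) - f - 2 nearest other vectors."""
    k = len(vectors) - f - 2
    scores = []
    for i, u in enumerate(vectors):
        d = sorted(sq_dist(u, v) for j, v in enumerate(vectors) if j != i)
        scores.append(sum(d[:k]))
    return scores


def fang_search(benign, n_attackers, lam, threshold=1e-5):
    """Halve lam until single Krum selects one of the malicious copies."""
    dim = len(benign[0])
    mean = [sum(u[c] for u in benign) / len(benign) for c in range(dim)]
    deviation = [float((m > 0) - (m < 0)) for m in mean]  # sign of the mean update
    while lam > threshold:
        candidate = [-lam * d for d in deviation]
        pool = [candidate] * n_attackers + benign  # malicious copies come first
        scores = krum_scores(pool, n_attackers)
        chosen = min(range(len(pool)), key=scores.__getitem__)
        if chosen < n_attackers:  # Krum picked a malicious copy
            return lam, candidate
        lam *= 0.5
    return None, None


benign = [[0.05 + 0.1 * i] for i in range(-4, 5)]  # hypothetical 1-D updates, mean 0.05
lam, update = fang_search(benign, n_attackers=2, lam=8.0)
print(lam, update)  # 0.125 [-0.125]
```

For these values, lambda = 8, 4, 2, 1, 0.5 and 0.25 are rejected because some benign update scores lower. At lambda = 0.125 the two identical copies (mutual distance 0) score about 0.179 against a best benign score of about 0.191, so Krum selects a malicious copy and the search stops.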

In [9]:
resume=0
nepochs=1500
gamma=.1
fed_lr=0.001

criterion = nn.CrossEntropyLoss()
use_cuda = torch.cuda.is_available()
batch_size = 100
schedule = [2000]

aggregation = 'mkrum'
at_type = 'fang'
chkpt = './' + aggregation
epoch_num = 0

at_fractions = [20]

for at_fraction in at_fractions:

    fed_model = mnist_conv().cuda()
    fed_model.apply(weights_init)
    optimizer_fed = Adam(fed_model.parameters(), lr=fed_lr)

    print('==> Initializing global model')
    epoch_num = 0
    n_attacker = 0
    best_global_acc=0
    best_global_te_acc=0

    while epoch_num <= nepochs:
        user_grads = []

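        # The following condition is necessary for Krum/Multi-krum/Bulyan to work.
        # Because n_attacker is never reset to 0 inside this loop, clients are sampled only once:
        # the same 60 clients (drawn with replacement) are reused in every epoch.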
        while n_attacker < 4:
            round_users = np.random.choice(3400, 60)
            n_attacker = np.sum(round_users < (34*at_fraction))

        if n_attacker > 14:
            print ('n_attackers actual %d adjusted 14' % n_attacker)
            n_attacker = 14

        at_idx = []
        attacker_count = 0
        for i in round_users:
            if i < (34*at_fraction) and attacker_count < n_attacker:
                at_idx.append(i)
                attacker_count += 1
                continue

            inputs = user_tr_data_tensors[i]
            targets = user_tr_label_tensors[i]

            inputs, targets = inputs.cuda(), targets.cuda()
            inputs, targets = torch.autograd.Variable(inputs), torch.autograd.Variable(targets)

            outputs = fed_model(inputs)
            loss = criterion(outputs, targets)
            optimizer_fed.zero_grad()
            loss.backward(retain_graph=True)

            param_grad=[]
            for param in fed_model.parameters():
                param_grad=param.grad.data.view(-1) if not len(param_grad) else torch.cat((param_grad,param.grad.view(-1)))

            user_grads=param_grad[None,:] if len(user_grads)==0 else torch.cat((user_grads,param_grad[None,:]),0)    

        if n_attacker > 0:
            attacker_grads = []
            n_attacker_ = max(1, n_attacker**2//60)
            for i in at_idx:

                inputs = user_tr_data_tensors[i]
                targets = user_tr_label_tensors[i]

                inputs, targets = inputs.cuda(), targets.cuda()
                inputs, targets = torch.autograd.Variable(inputs), torch.autograd.Variable(targets)

                outputs = fed_model(inputs)
                loss = criterion(outputs, targets)
                optimizer_fed.zero_grad()
                loss.backward(retain_graph=True)

                param_grad=[]
                for param in fed_model.parameters():
                    param_grad=param.grad.data.view(-1) if not len(param_grad) else torch.cat((param_grad,param.grad.view(-1)))

                attacker_grads=param_grad[None,:] if len(attacker_grads)==0 else torch.cat((attacker_grads,param_grad[None,:]),0)

            mal_updates = []
            if at_type == 'fang':
                agg_grads = torch.mean(attacker_grads, 0)
                deviation = torch.sign(agg_grads)
                mal_update = get_malicious_updates_fang(attacker_grads, agg_grads, deviation, n_attacker_)

            elif at_type == 'our-agr':
                agg_grads = torch.mean(attacker_grads, 0)
                mal_update = our_attack_mkrum(attacker_grads, agg_grads, n_attacker_, dev_type='sign')
            
        if not len(mal_updates):
            mal_updates = torch.stack([mal_update] * n_attacker)
        malicious_grads = torch.cat((mal_updates, user_grads), 0)    
        
        if malicious_grads.shape[0] != 60: 
            print('malicious grads shape ', malicious_grads.shape)
            sys.exit()

        multi_k = True if aggregation == 'mkrum' else False
        if epoch_num == 0: print('multi krum is ', multi_k)
        agg_grads, krum_candidate = multi_krum(malicious_grads, n_attacker, multi_k=multi_k)
            
        start_idx=0

        if epoch_num in schedule:
            for param_group in optimizer_fed.param_groups:
                param_group['lr'] *= gamma
                print('New learning rate ', param_group['lr'])

        optimizer_fed.zero_grad()

        model_grads=[]

        for i, param in enumerate(fed_model.parameters()):
            param_=agg_grads[start_idx:start_idx+len(param.data.view(-1))].reshape(param.data.shape)
            start_idx=start_idx+len(param.data.view(-1))
            param_=param_.cuda()
            model_grads.append(param_)

        optimizer_fed.step(model_grads)

        val_loss, val_acc = test(val_data_tensor,val_label_tensor,fed_model,criterion,use_cuda)
        te_loss, te_acc = test(te_data_tensor,te_label_tensor, fed_model, criterion, use_cuda)

        is_best = best_global_acc < val_acc

        best_global_acc = max(best_global_acc, val_acc)

        if is_best:
            best_global_te_acc = te_acc

        if epoch_num % 20 == 0:
            print('%s: at %s at_frac %.1f n_at %d n_mal_sel %d e %d fed_model val loss %.4f val acc %.4f best val_acc %f te_acc %f'%(aggregation, at_type, at_fraction, n_attacker, np.sum(krum_candidate < n_attacker), epoch_num, val_loss, val_acc, best_global_acc,best_global_te_acc))

        epoch_num+=1

==> Initializing global model
multi krum is  True
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 0 fed_model val loss 3.9978 val acc 5.6168 best val_acc 5.616763 te_acc 6.059514
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 20 fed_model val loss 3.6510 val acc 17.0923 best val_acc 24.853274 te_acc 27.378501
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 8 e 40 fed_model val loss 3.0469 val acc 32.9077 best val_acc 32.907743 te_acc 37.286347
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 60 fed_model val loss 2.2733 val acc 42.3162 best val_acc 42.316207 te_acc 45.904551
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 80 fed_model val loss 1.9151 val acc 49.7271 best val_acc 49.860997 te_acc 52.404242
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 100 fed_model val loss 1.7264 val acc 54.1083 best val_acc 54.108320 te_acc 56.059514
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 120 fed_model val loss 1.5930 val acc 56.8240 best val_acc 57.300247 te_acc 58.674835
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 140 fed_model v

mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 1220 fed_model val loss 1.4674 val acc 73.5379 best val_acc 74.078460 te_acc 74.119646
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 1240 fed_model val loss 1.4735 val acc 73.6666 best val_acc 74.078460 te_acc 74.119646
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 1260 fed_model val loss 1.4893 val acc 73.7104 best val_acc 74.078460 te_acc 74.119646
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 1280 fed_model val loss 1.5089 val acc 73.3860 best val_acc 74.078460 te_acc 74.119646
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 1300 fed_model val loss 1.5288 val acc 73.3242 best val_acc 74.078460 te_acc 74.119646
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 1320 fed_model val loss 1.5697 val acc 73.5508 best val_acc 74.078460 te_acc 74.119646
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e 1340 fed_model val loss 1.5230 val acc 72.7425 best val_acc 74.078460 te_acc 74.119646
mkrum: at fang at_frac 20.0 n_at 9 n_mal_sel 9 e

## Our first AGR-agnostic attack - Min-Max

In [10]:
def our_attack_dist(all_updates, model_re, n_attackers, dev_type='unit_vec'):

    if dev_type == 'unit_vec':
        deviation = model_re / torch.norm(model_re)  # unit vector along model_re; model_re - lamda * deviation moves against it
    elif dev_type == 'sign':
        deviation = torch.sign(model_re)
    elif dev_type == 'std':
        deviation = torch.std(all_updates, 0)

    lamda = torch.Tensor([50.0]).float().cuda()
    # print(lamda)
    threshold_diff = 1e-5
    lamda_fail = lamda
    lamda_succ = 0
    
    distances = []
    for update in all_updates:
        distance = torch.norm((all_updates - update), dim=1) ** 2
        distances = distance[None, :] if not len(distances) else torch.cat((distances, distance[None, :]), 0)
    
    max_distance = torch.max(distances)
    del distances

    while torch.abs(lamda_succ - lamda) > threshold_diff:
        mal_update = (model_re - lamda * deviation)
        distance = torch.norm((all_updates - mal_update), dim=1) ** 2
        max_d = torch.max(distance)
        
        if max_d <= max_distance:
            # print('successful lamda is ', lamda)
            lamda_succ = lamda
            lamda = lamda + lamda_fail / 2
        else:
            lamda = lamda - lamda_fail / 2

        lamda_fail = lamda_fail / 2

    mal_update = (model_re - lamda_succ * deviation)
    
    return mal_update
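`our_attack_dist` is a bisection on lambda: it looks for the largest lambda such that the malicious update `model_re - lamda * deviation` is no farther (in squared distance) from any benign update than the two most distant benign updates are from each other. The sketch below reproduces that search in plain Python with the `'sign'` deviation; the 1-D data values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_max_sketch(benign, lam=50.0, threshold=1e-5):
    """Largest lam (to about threshold) whose update passes the min-max test."""
    dim = len(benign[0])
    ref = [sum(u[c] for u in benign) / len(benign) for c in range(dim)]  # mean update
    dev = [float((r > 0) - (r < 0)) for r in ref]                       # 'sign' deviation
    max_distance = max(sq_dist(u, v) for u in benign for v in benign)
    step, lam_succ = lam, 0.0
    while abs(lam_succ - lam) > threshold:
        mal = [r - lam * d for r, d in zip(ref, dev)]
        if max(sq_dist(mal, u) for u in benign) <= max_distance:
            lam_succ = lam          # feasible: remember it and probe a larger lambda
            lam = lam + step / 2
        else:
            lam = lam - step / 2    # infeasible: probe a smaller lambda
        step = step / 2
    return lam_succ, [r - lam_succ * d for r, d in zip(ref, dev)]


benign = [[0.0], [1.0], [2.0], [3.0], [10.0]]  # hypothetical 1-D updates
lam, update = min_max_sketch(benign)
print(round(lam, 3), update)
```

Here the mean is 3.2 and the largest pairwise squared distance is (10 - 0)^2 = 100. The benign update farthest from 3.2 - lambda is 10, so the test is (6.8 + lambda)^2 <= 100, i.e. lambda <= 3.2, and the bisection converges to just below that bound.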

In [11]:
resume=0
nepochs=1500
gamma=.1
fed_lr=0.001

criterion = nn.CrossEntropyLoss()
use_cuda = torch.cuda.is_available()
batch_size = 100
schedule = [2000]

aggregation = 'bulyan'
at_type = 'min-max'
chkpt = './' + aggregation
epoch_num = 0

at_fractions = [20]

for at_fraction in at_fractions:

    fed_model = mnist_conv().cuda()
    fed_model.apply(weights_init)
    optimizer_fed = Adam(fed_model.parameters(), lr=fed_lr)

    print('==> Initializing global model')
    epoch_num = 0
    n_attacker = 0
    best_global_acc=0
    best_global_te_acc=0

    while epoch_num <= nepochs:
        user_grads = []

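        # The following condition is necessary for Krum/Multi-krum/Bulyan to work.
        # Because n_attacker is never reset to 0 inside this loop, clients are sampled only once:
        # the same 60 clients (drawn with replacement) are reused in every epoch.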
        while n_attacker < 4:
            round_users = np.random.choice(3400, 60)
            n_attacker = np.sum(round_users < (34*at_fraction))

        if n_attacker > 14:
            print ('n_attackers actual %d adjusted 14' % n_attacker)
            n_attacker = 14

        at_idx = []
        attacker_count = 0
        for i in round_users:
            if i < (34*at_fraction) and attacker_count < n_attacker:
                at_idx.append(i)
                attacker_count += 1
                continue

            inputs = user_tr_data_tensors[i]
            targets = user_tr_label_tensors[i]

            inputs, targets = inputs.cuda(), targets.cuda()
            inputs, targets = torch.autograd.Variable(inputs), torch.autograd.Variable(targets)

            outputs = fed_model(inputs)
            loss = criterion(outputs, targets)
            optimizer_fed.zero_grad()
            loss.backward(retain_graph=True)

            param_grad=[]
            for param in fed_model.parameters():
                param_grad=param.grad.data.view(-1) if not len(param_grad) else torch.cat((param_grad,param.grad.view(-1)))

            user_grads=param_grad[None,:] if len(user_grads)==0 else torch.cat((user_grads,param_grad[None,:]),0)    

        if n_attacker > 0:
            attacker_grads = []
            n_attacker_ = max(1, n_attacker**2//60)
            for i in at_idx:

                inputs = user_tr_data_tensors[i]
                targets = user_tr_label_tensors[i]

                inputs, targets = inputs.cuda(), targets.cuda()
                inputs, targets = torch.autograd.Variable(inputs), torch.autograd.Variable(targets)

                outputs = fed_model(inputs)
                loss = criterion(outputs, targets)
                optimizer_fed.zero_grad()
                loss.backward(retain_graph=True)

                param_grad=[]
                for param in fed_model.parameters():
                    param_grad=param.grad.data.view(-1) if not len(param_grad) else torch.cat((param_grad,param.grad.view(-1)))

                attacker_grads=param_grad[None,:] if len(attacker_grads)==0 else torch.cat((attacker_grads,param_grad[None,:]),0)

            mal_updates = []
            if at_type == 'fang':
                agg_grads = torch.mean(attacker_grads, 0)
                deviation = torch.sign(agg_grads)
                mal_update = get_malicious_updates_fang(attacker_grads, agg_grads, deviation, n_attacker_)
            elif at_type == 'our-agr':
                agg_grads = torch.mean(attacker_grads, 0)
                mal_update = our_attack_mkrum(attacker_grads, agg_grads, n_attacker_, dev_type='sign')
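            elif at_type == 'min-max':
                # Use only the attackers' own gradients: benign updates are unknown to the adversary,
                # and `malicious_grads` would otherwise be stale state from a previous round or cell.
                agg_grads = torch.mean(attacker_grads, 0)
                mal_update = our_attack_dist(attacker_grads, agg_grads, n_attacker_, dev_type='sign')
            elif at_type == 'min-sum':
                agg_grads = torch.mean(attacker_grads, 0)
                mal_update = our_attack_score(attacker_grads, agg_grads, n_attacker_, dev_type='sign')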

        if not len(mal_updates):
            mal_updates = torch.stack([mal_update] * n_attacker)
        malicious_grads = torch.cat((mal_updates, user_grads), 0)    
        
        if malicious_grads.shape[0] != 60: 
            print('malicious grads shape ', malicious_grads.shape)
            sys.exit()

        multi_k = True if aggregation == 'mkrum' else False
        if epoch_num == 0: print('multi krum is ', multi_k)
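        # With aggregation = 'bulyan', multi_k is False, so this call keeps a single Krum-selected
        # update; the bulyan() function defined above is not called in this cell.
        agg_grads, krum_candidate = multi_krum(malicious_grads, n_attacker, multi_k=multi_k)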

        start_idx=0

        if epoch_num in schedule:
            for param_group in optimizer_fed.param_groups:
                param_group['lr'] *= gamma
                print('New learning rate ', param_group['lr'])

        optimizer_fed.zero_grad()

        model_grads=[]

        for i, param in enumerate(fed_model.parameters()):
            param_=agg_grads[start_idx:start_idx+len(param.data.view(-1))].reshape(param.data.shape)
            start_idx=start_idx+len(param.data.view(-1))
            param_=param_.cuda()
            model_grads.append(param_)

        optimizer_fed.step(model_grads)

        val_loss, val_acc = test(val_data_tensor,val_label_tensor,fed_model,criterion,use_cuda)
        te_loss, te_acc = test(te_data_tensor,te_label_tensor, fed_model, criterion, use_cuda)

        is_best = best_global_acc < val_acc

        best_global_acc = max(best_global_acc, val_acc)

        if is_best:
            best_global_te_acc = te_acc

        if epoch_num % 20 == 0:
            print('%s: at %s at_frac %.1f n_at %d n_mal_sel %d e %d fed_model val loss %.4f val acc %.4f best val_acc %f te_acc %f'%(aggregation, at_type, at_fraction, n_attacker, np.sum(krum_candidate < n_attacker), epoch_num, val_loss, val_acc, best_global_acc,best_global_te_acc))

        epoch_num+=1

==> Initializing global model
multi krum is  False
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 0 fed_model val loss 3.9115 val acc 4.5356 best val_acc 4.535626 te_acc 5.037582
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 20 fed_model val loss 3.9039 val acc 4.7055 best val_acc 9.596376 te_acc 10.474156
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 40 fed_model val loss 3.8798 val acc 17.7306 best val_acc 17.730643 te_acc 19.586594
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 60 fed_model val loss 3.6546 val acc 21.6124 best val_acc 21.612438 te_acc 24.423394
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 80 fed_model val loss 3.2039 val acc 29.0028 best val_acc 29.625721 te_acc 33.123970
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 100 fed_model val loss 2.8863 val acc 31.4044 best val_acc 31.507414 te_acc 35.227039
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 120 fed_model val loss 2.8507 val acc 33.5925 best val_acc 36.555292 te_a

bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 1180 fed_model val loss 2.9625 val acc 51.9332 best val_acc 53.325783 te_acc 55.624485
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 1200 fed_model val loss 2.9529 val acc 52.2833 best val_acc 53.325783 te_acc 55.624485
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 1220 fed_model val loss 3.0344 val acc 51.9100 best val_acc 53.325783 te_acc 55.624485
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 1240 fed_model val loss 3.0001 val acc 52.3296 best val_acc 53.325783 te_acc 55.624485
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 1260 fed_model val loss 3.1551 val acc 51.7710 best val_acc 53.325783 te_acc 55.624485
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 1280 fed_model val loss 3.2689 val acc 52.4815 best val_acc 53.325783 te_acc 55.624485
bulyan: at min-max at_frac 20.0 n_at 9 n_mal_sel 0 e 1300 fed_model val loss 3.1851 val acc 52.8779 best val_acc 53.325783 te_acc 55.624485
bulyan: at min-max a

## Our second AGR-agnostic attack - Min-Sum

In [12]:
def our_attack_score(all_updates, model_re, n_attackers, dev_type='unit_vec'):

    if dev_type == 'unit_vec':
        deviation = model_re / torch.norm(model_re)  # unit vector along model_re; model_re - lamda * deviation moves against it
    elif dev_type == 'sign':
        deviation = torch.sign(model_re)
    elif dev_type == 'std':
        deviation = torch.std(all_updates, 0)
    
    lamda = torch.Tensor([50.0]).float().cuda()
    # print(lamda)
    threshold_diff = 1e-5
    lamda_fail = lamda
    lamda_succ = 0
    
    distances = []
    for update in all_updates:
        distance = torch.norm((all_updates - update), dim=1) ** 2
        distances = distance[None, :] if not len(distances) else torch.cat((distances, distance[None, :]), 0)
    
    scores = torch.sum(distances, dim=1)
    min_score = torch.min(scores)
    del distances

    while torch.abs(lamda_succ - lamda) > threshold_diff:
        mal_update = (model_re - lamda * deviation)
        distance = torch.norm((all_updates - mal_update), dim=1) ** 2
        score = torch.sum(distance)
        
        if score <= min_score:
            # print('successful lamda is ', lamda)
            lamda_succ = lamda
            lamda = lamda + lamda_fail / 2
        else:
            lamda = lamda - lamda_fail / 2

        lamda_fail = lamda_fail / 2

    mal_update = (model_re - lamda_succ * deviation)
    return mal_update
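`our_attack_score` uses the same bisection as the Min-Max attack but a different acceptance test: the malicious update's sum of squared distances to all benign updates must not exceed the smallest such sum of any benign update (each benign row includes its zero self-distance, as in the notebook). The sketch below runs it in plain Python on hypothetical 1-D updates with the `'sign'` deviation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_sum_sketch(benign, lam=50.0, threshold=1e-5):
    """Largest lam (to about threshold) whose update passes the min-sum test."""
    dim = len(benign[0])
    ref = [sum(u[c] for u in benign) / len(benign) for c in range(dim)]  # mean update
    dev = [float((r > 0) - (r < 0)) for r in ref]                       # 'sign' deviation
    # Smallest benign score; each row includes the zero self-distance.
    min_score = min(sum(sq_dist(u, v) for v in benign) for u in benign)
    step, lam_succ = lam, 0.0
    while abs(lam_succ - lam) > threshold:
        mal = [r - lam * d for r, d in zip(ref, dev)]
        if sum(sq_dist(mal, u) for u in benign) <= min_score:
            lam_succ = lam          # feasible: remember it and probe a larger lambda
            lam = lam + step / 2
        else:
            lam = lam - step / 2    # infeasible: probe a smaller lambda
        step = step / 2
    return lam_succ, [r - lam_succ * d for r, d in zip(ref, dev)]


benign = [[0.0], [1.0], [2.0], [3.0], [10.0]]  # hypothetical 1-D updates
lam, update = min_sum_sketch(benign)
print(round(lam, 3), update)
```

The smallest benign sum belongs to the update 3 (9 + 4 + 1 + 0 + 49 = 63). The squared deviations from the mean 3.2 add up to 62.8, so a point at 3.2 - lambda has sum 5 * lambda^2 + 62.8, and the test 5 * lambda^2 + 62.8 <= 63 gives lambda <= 0.2.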

In [13]:
resume=0
nepochs=1500
gamma=.1
fed_lr=0.001

criterion = nn.CrossEntropyLoss()
use_cuda = torch.cuda.is_available()
batch_size = 100
schedule = [2000]

aggregation = 'bulyan'
at_type = 'min-sum'
chkpt = './' + aggregation
epoch_num = 0

at_fractions = [20]

for at_fraction in at_fractions:

    fed_model = mnist_conv().cuda()
    fed_model.apply(weights_init)
    optimizer_fed = Adam(fed_model.parameters(), lr=fed_lr)

    print('==> Initializing global model')
    epoch_num = 0
    n_attacker = 0
    best_global_acc=0
    best_global_te_acc=0

    while epoch_num <= nepochs:
        user_grads = []

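        # The following condition is necessary for Krum/Multi-krum/Bulyan to work.
        # Because n_attacker is never reset to 0 inside this loop, clients are sampled only once:
        # the same 60 clients (drawn with replacement) are reused in every epoch.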
        while n_attacker < 4:
            round_users = np.random.choice(3400, 60)
            n_attacker = np.sum(round_users < (34*at_fraction))

        if n_attacker > 14:
            print ('n_attackers actual %d adjusted 14' % n_attacker)
            n_attacker = 14

        at_idx = []
        attacker_count = 0
        for i in round_users:
            if i < (34*at_fraction) and attacker_count < n_attacker:
                at_idx.append(i)
                attacker_count += 1
                continue

            inputs = user_tr_data_tensors[i]
            targets = user_tr_label_tensors[i]

            inputs, targets = inputs.cuda(), targets.cuda()
            inputs, targets = torch.autograd.Variable(inputs), torch.autograd.Variable(targets)

            outputs = fed_model(inputs)
            loss = criterion(outputs, targets)
            optimizer_fed.zero_grad()
            loss.backward(retain_graph=True)

            param_grad=[]
            for param in fed_model.parameters():
                param_grad=param.grad.data.view(-1) if not len(param_grad) else torch.cat((param_grad,param.grad.view(-1)))

            user_grads=param_grad[None,:] if len(user_grads)==0 else torch.cat((user_grads,param_grad[None,:]),0)    

        if n_attacker > 0:
            attacker_grads = []
            n_attacker_ = max(1, n_attacker**2//60)
            for i in at_idx:

                inputs = user_tr_data_tensors[i]
                targets = user_tr_label_tensors[i]

                inputs, targets = inputs.cuda(), targets.cuda()
                inputs, targets = torch.autograd.Variable(inputs), torch.autograd.Variable(targets)

                outputs = fed_model(inputs)
                loss = criterion(outputs, targets)
                optimizer_fed.zero_grad()
                loss.backward(retain_graph=True)

                param_grad=[]
                for param in fed_model.parameters():
                    param_grad=param.grad.data.view(-1) if not len(param_grad) else torch.cat((param_grad,param.grad.view(-1)))

                attacker_grads=param_grad[None,:] if len(attacker_grads)==0 else torch.cat((attacker_grads,param_grad[None,:]),0)

            mal_updates = []
            if at_type == 'fang':
                agg_grads = torch.mean(attacker_grads, 0)
                deviation = torch.sign(agg_grads)
                mal_update = get_malicious_updates_fang(attacker_grads, agg_grads, deviation, n_attacker_)
            elif at_type == 'our-agr':
                agg_grads = torch.mean(attacker_grads, 0)
                mal_update = our_attack_mkrum(attacker_grads, agg_grads, n_attacker_, dev_type='sign')
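            elif at_type == 'min-max':
                # Use only the attackers' own gradients: benign updates are unknown to the adversary,
                # and `malicious_grads` would otherwise be stale state from a previous round or cell.
                agg_grads = torch.mean(attacker_grads, 0)
                mal_update = our_attack_dist(attacker_grads, agg_grads, n_attacker_, dev_type='sign')
            elif at_type == 'min-sum':
                agg_grads = torch.mean(attacker_grads, 0)
                mal_update = our_attack_score(attacker_grads, agg_grads, n_attacker_, dev_type='sign')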

        if not len(mal_updates):
            mal_updates = torch.stack([mal_update] * n_attacker)
        malicious_grads = torch.cat((mal_updates, user_grads), 0)    
        
        if malicious_grads.shape[0] != 60: 
            print('malicious grads shape ', malicious_grads.shape)
            sys.exit()

        multi_k = True if aggregation == 'mkrum' else False
        if epoch_num == 0: print('multi krum is ', multi_k)
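        # With aggregation = 'bulyan', multi_k is False, so this call keeps a single Krum-selected
        # update; the bulyan() function defined above is not called in this cell.
        agg_grads, krum_candidate = multi_krum(malicious_grads, n_attacker, multi_k=multi_k)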

        start_idx=0

        if epoch_num in schedule:
            for param_group in optimizer_fed.param_groups:
                param_group['lr'] *= gamma
                print('New learning rate ', param_group['lr'])

        optimizer_fed.zero_grad()

        model_grads=[]

        for i, param in enumerate(fed_model.parameters()):
            param_=agg_grads[start_idx:start_idx+len(param.data.view(-1))].reshape(param.data.shape)
            start_idx=start_idx+len(param.data.view(-1))
            param_=param_.cuda()
            model_grads.append(param_)

        optimizer_fed.step(model_grads)

        val_loss, val_acc = test(val_data_tensor,val_label_tensor,fed_model,criterion,use_cuda)
        te_loss, te_acc = test(te_data_tensor,te_label_tensor, fed_model, criterion, use_cuda)

        is_best = best_global_acc < val_acc

        best_global_acc = max(best_global_acc, val_acc)

        if is_best:
            best_global_te_acc = te_acc

        if epoch_num % 20 == 0:
            print('%s: at %s at_frac %.1f n_at %d n_mal_sel %d e %d fed_model val loss %.4f val acc %.4f best val_acc %f te_acc %f'%(aggregation, at_type, at_fraction, n_attacker, np.sum(krum_candidate < n_attacker), epoch_num, val_loss, val_acc, best_global_acc,best_global_te_acc))

        epoch_num+=1

==> Initializing global model
multi krum is  False
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 0 fed_model val loss 3.9216 val acc 4.4636 best val_acc 4.463550 te_acc 4.847096
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 20 fed_model val loss 3.8622 val acc 4.9964 best val_acc 9.382722 te_acc 10.296540
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 40 fed_model val loss 3.8286 val acc 15.7280 best val_acc 15.727965 te_acc 17.262150
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 60 fed_model val loss 3.6181 val acc 24.7889 best val_acc 24.788921 te_acc 27.862438
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 80 fed_model val loss 3.2380 val acc 23.3809 best val_acc 28.482805 te_acc 32.215301
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 100 fed_model val loss 2.9159 val acc 32.2925 best val_acc 33.417422 te_acc 37.404757
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 120 fed_model val loss 2.8121 val acc 33.6131 best val_acc 35.507619 te_a

bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 1 e 1180 fed_model val loss 0.8779 val acc 73.1930 best val_acc 74.778624 te_acc 75.241969
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 1 e 1200 fed_model val loss 0.9424 val acc 72.3873 best val_acc 74.778624 te_acc 75.241969
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 1 e 1220 fed_model val loss 0.9748 val acc 71.6922 best val_acc 74.778624 te_acc 75.241969
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 0 e 1240 fed_model val loss 0.9316 val acc 72.7734 best val_acc 74.778624 te_acc 75.241969
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 1 e 1260 fed_model val loss 0.9073 val acc 73.9395 best val_acc 74.778624 te_acc 75.241969
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 1 e 1280 fed_model val loss 0.9382 val acc 73.3294 best val_acc 74.778624 te_acc 75.241969
bulyan: at min-sum at_frac 20.0 n_at 9 n_mal_sel 1 e 1300 fed_model val loss 0.9130 val acc 73.3757 best val_acc 74.778624 te_acc 75.241969
bulyan: at min-sum a