# Semi-honest, multi-point scenario

In this scenario, we assume Bob has multiple data points to contribute to Alice's ML model. Alice now wants to value the dataset as a whole, judging it by its diversity, its uncertainty, and the current model's performance on it.

# Part 0: The setup

In [9]:
#First, we define Alice's model M. We assume a simple CNN model.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim
import matplotlib.pyplot as plt
import os

class LeNet(nn.Module):
    """
    Adaptation of LeNet that uses ReLU activations
    """

    # network architecture:
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3x32x32 -> 6x28x28
        self.pool = nn.MaxPool2d(2, 2)         # halves height and width
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 CIFAR-10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten for the linear layers
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # Raw class logits; apply softmax downstream if probabilities are needed.
        return x
    
model = LeNet()

os.makedirs('data', exist_ok=True)
torch.save(model.state_dict(), 'data/model.pth')

#Next, we define the data loader for CIFAR-10 dataset.
import torchvision
import random
import torchvision.transforms as transforms
import numpy as np

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Note: train=False loads the CIFAR-10 test split, despite the variable name.
trainset = torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)


# Randomly select 100 images as Bob's dataset
indices = random.sample(range(len(trainset)), 100)
selected_images = np.array([trainset[i][0].numpy() for i in indices])
selected_labels = np.array([trainset[i][1]  for i in indices])


# Save images and labels separately
# torch.save(selected_images, 'data/selected_images.pth')
# torch.save(selected_labels, 'data/selected_labels.pth')


Files already downloaded and verified


# Part 1: Clustering

Before submitting points to Alice for evaluation, Bob needs to select a subset of representative data points. To do this, we recommend using K-means clustering to select a diverse set of points, where K is the number of data points Alice wishes to check. Bob can then select the data point closest to the centroid of each cluster. It is ultimately up to Bob to decide which points to submit, even if they are not ideal, so this step does not need to be computed securely.
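The selection step described above can be sketched as follows. This is a minimal illustration, not the notebook's own implementation: it uses a plain NumPy version of Lloyd's algorithm on flattened feature vectors, and the names `assign`, `kmeans`, `select_representatives` and `demo_points` are hypothetical. For Bob's data, one could pass something like `selected_images.reshape(len(selected_images), -1)` instead of the random demo points.

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = assign(points, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return centroids, assign(points, centroids)

def select_representatives(points, k, seed=0):
    """For each non-empty cluster, pick the index of the point nearest its centroid."""
    centroids, labels = kmeans(points, k, seed=seed)
    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        d = ((points[members] - centroids[j]) ** 2).sum(axis=1)
        chosen.append(int(members[d.argmin()]))
    return chosen, labels

# Demo on synthetic data (assumption: 100 points with 16 features each).
demo_points = np.random.default_rng(0).normal(size=(100, 16))
chosen, labels = select_representatives(demo_points, k=5)
print("representative indices:", chosen)
```

Because the chosen indices refer back to Bob's original array, he can submit the corresponding images directly. A library implementation of K-means would work equally well here.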

We further enhance pure K-means selection by trying to select the most uncertain points in each cluster. As determining uncertainty requires model inference, we define a computing budget B, which is the number of points Bob and Alice can afford to evaluate. Bob can then strategically select some points in each cluster, calculate their uncertainty, and submit the points with the highest uncertainty to Alice.
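One way to read the budgeted selection above is sketched below. Several choices here are assumptions rather than the notebook's method: uncertainty is measured as the entropy of the predicted class probabilities, the budget B is split evenly across clusters, and candidates within a cluster are drawn at random. The names `predictive_entropy`, `select_uncertain_per_cluster` and `toy_predict_proba` are hypothetical, and the toy linear model only stands in for inference with Alice's model.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -(probs * np.log(probs)).sum(axis=1)

def select_uncertain_per_cluster(points, labels, predict_proba, budget, seed=0):
    """Evaluate at most `budget` points in total and keep the most uncertain one per cluster."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(labels)
    if budget < len(clusters):
        raise ValueError("budget must cover at least one point per cluster")
    per_cluster = budget // len(clusters)  # even split of B (assumption)
    chosen = []
    for c in clusters:
        members = np.flatnonzero(labels == c)
        n = min(per_cluster, len(members))
        # Only these candidates are run through the model.
        candidates = rng.choice(members, size=n, replace=False)
        scores = predictive_entropy(predict_proba(points[candidates]))
        chosen.append(int(candidates[scores.argmax()]))
    return chosen

# Demo with synthetic data and a toy softmax model (both assumptions).
rng = np.random.default_rng(1)
demo_points = rng.normal(size=(40, 8))
demo_labels = np.repeat(np.arange(4), 10)  # pretend cluster assignments
W = rng.normal(size=(8, 10))

def toy_predict_proba(x):
    logits = x @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

picked = select_uncertain_per_cluster(demo_points, demo_labels, toy_predict_proba, budget=20)
print("most uncertain index per cluster:", picked)
```

With B = 20 and four clusters, five candidates per cluster are scored, so the total number of inferences stays within the budget. Other allocations, such as weighting by cluster size, fit the same interface.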


In [2]:
selected_images

[tensor([[[ 0.9529,  0.9686,  1.0000,  ...,  0.9843,  0.9686,  0.9608],
          [ 0.9608,  0.6941,  0.3882,  ...,  0.2549,  0.4824,  0.9451],
          [ 1.0000,  0.3647, -0.3961,  ..., -0.6784, -0.1451,  0.9137],
          ...,
          [ 0.9765,  0.2863, -0.5922,  ..., -0.6078, -0.1608,  0.8667],
          [ 0.9922,  0.4902, -0.0980,  ..., -0.1451,  0.1843,  0.9059],
          [ 0.9843,  0.9216,  0.8667,  ...,  0.8588,  0.8980,  0.9922]],
 
         [[ 1.0000,  1.0000,  1.0000,  ...,  1.0000,  1.0000,  0.9843],
          [ 1.0000,  0.8275,  0.6627,  ...,  0.3725,  0.5216,  0.9451],
          [ 1.0000,  0.6392,  0.2706,  ..., -0.4510, -0.0510,  0.9294],
          ...,
          [ 1.0000,  0.5529,  0.0353,  ..., -0.1451,  0.1294,  0.9373],
          [ 1.0000,  0.6627,  0.3020,  ...,  0.1608,  0.3804,  0.9373],
          [ 0.9922,  0.9451,  0.9451,  ...,  0.9373,  0.9451,  0.9765]],
 
         [[ 0.9686,  0.9608,  0.9686,  ...,  1.0000,  1.0000,  0.9843],
          [ 0.9373,  0.6784,