# UMAP projection (9, 64, 64)

We examine whether the synthetic dataset preserves privacy in both pixel space and feature space. In this notebook we create the feature extractor (a UMAP transform that reduces each flattened image to a 32-dimensional embedding) and apply it to the relevant images to obtain their corresponding features.

In [4]:
from torch.utils.data import Dataset, DataLoader
from tqdm.notebook import tqdm
from sklearn.metrics import confusion_matrix
import h5py
import pickle
import umap
import numpy as np
from scipy.spatial.distance import cdist

In [5]:
candidate = np.load('./privacy_saves/candidate.npy', allow_pickle=True)

In [6]:
h5f = h5py.File('./privacy_saves/synth_set.h5', 'r')
synth_set = h5f['images'][:]
h5f.close()

In [7]:
h5f = h5py.File('./privacy_saves/test_set.h5', 'r')
test_set = h5f['images'][-1000:]
h5f.close()

## Train UMAP
Optional: this section can be skipped in order to load a pre-trained UMAP instead. Training usually takes around 30 minutes.

In [None]:
h5f = h5py.File('./privacy_saves/test_set.h5', 'r')
temp_test = h5f['images'][333:1333]
# Flatten each (9, 64, 64) image into a single vector
temp_test = np.array(temp_test.reshape([len(temp_test), 9*64*64]))
h5f.close()

In [None]:
h5f = h5py.File('./privacy_saves/val_set.h5', 'r')
temp_val = h5f['images'][333:1833]
# Flatten each (9, 64, 64) image into a single vector
temp_val = np.array(temp_val.reshape([len(temp_val), 9*64*64]))
h5f.close()

In [None]:
h5f = h5py.File('./privacy_saves/val_set.h5', 'r')
temp_train = h5f['images'][1000:3500]
# Flatten each (9, 64, 64) image into a single vector
temp_train = np.array(temp_train.reshape([len(temp_train), 9*64*64]))
h5f.close()

In [None]:
umap_dataset = np.concatenate((temp_test, temp_val, temp_train))

In [None]:
# Training UMAP
# UMAP projects each flattened image into a 32-dimensional embedding space
trans = umap.UMAP(n_neighbors=20, random_state=10, min_dist=0.0,
                  n_components=32).fit(umap_dataset.reshape([len(umap_dataset), 9*64*64]))

In [None]:
# Saving the trained UMAP object using pickle
f_name = './privacy_saves/umap_privacy_5000.sav'
with open(f_name, 'wb') as f:
    pickle.dump(trans, f)
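
Since training is optional when a saved model already exists, the train-or-load choice could be wrapped in a small helper. The following is a minimal sketch, not part of the original notebook: `load_or_fit` and the toy stand-in model are hypothetical names introduced for illustration.

```python
import os
import pickle
import tempfile


def load_or_fit(path, fit_fn):
    """Return the pickled object at `path` if it exists; otherwise call
    `fit_fn()`, pickle its result to `path`, and return it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    model = fit_fn()
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    return model


# Toy demonstration with a stand-in "model" (a plain dict, not a UMAP object).
calls = []

def toy_fit():
    calls.append(1)  # record that the expensive fit was actually run
    return {'n_components': 32}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'toy_model.sav')
    first = load_or_fit(toy_path, toy_fit)   # fits and saves
    second = load_or_fit(toy_path, toy_fit)  # loads from disk, no refit
```

In this notebook the helper would presumably be called with `f_name` and a function that runs the `umap.UMAP(...).fit(...)` call above. Note that pickled models should only be loaded from trusted sources.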

## Load UMAP

In [8]:
f_name = './privacy_saves/umap_privacy_5000.sav'
with open(f_name, 'rb') as f:
    loaded_model = pickle.load(f)
print(type(loaded_model))

<class 'umap.umap_.UMAP'>


## Transform with UMAP

#### Transform Test with UMAP
Used as reference features in the embedding space (test samples can help define the notion of outliers). Only used in the supplemental figure in notebook 4_plot_membership_attacks.ipynb.

In [9]:
test_umap = loaded_model.transform(test_set.reshape([len(test_set), 9*64*64]))
np.save('./privacy_saves/u_test.npy', test_umap)
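
One way reference features might help define outliers is sketched below. This is an assumption for illustration only, not necessarily the criterion used in notebook 4: the threshold is taken as a percentile of the nearest-neighbor distances among the reference embeddings, and `outlier_threshold` and the 95th percentile are hypothetical choices. Toy random data stands in for `test_umap`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def outlier_threshold(reference_feats, percentile=95.0):
    """Percentile of each reference point's distance to its nearest
    other reference point (Euclidean)."""
    d = cdist(reference_feats, reference_feats)
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    nearest = d.min(axis=1)
    return np.percentile(nearest, percentile)


# Toy stand-in for the 32-dimensional test embeddings.
rng = np.random.default_rng(0)
toy_test = rng.normal(size=(200, 32))
threshold = outlier_threshold(toy_test)

# A point placed far from the reference cloud should exceed the threshold.
far_point = np.full((1, 32), 50.0)
far_distance = cdist(far_point, toy_test).min()
is_outlier = far_distance > threshold
```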

#### Transform synth with UMAP
Synthetic images projected to feature space. Used in all privacy attacks that happen in feature space. <br/>
Please note that this step takes a very long time because the synthetic dataset is very large.

In [10]:
synth_umap = loaded_model.transform(synth_set.reshape([len(synth_set), 9*64*64]))
np.save('./privacy_saves/u_synth.npy', synth_umap)
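
Because the synthetic set is large, the transform could also be run in batches so that only one batch is flattened and held in memory at a time. The sketch below is a hypothetical helper, not part of the original notebook. It accepts any object with a `transform` method, and a toy stand-in model is used here in place of the fitted UMAP. With a real UMAP model, batched results may differ slightly from a single `transform` call, since the embedding of new points involves stochastic optimization.

```python
import numpy as np


def transform_in_batches(model, images, batch_size=10_000):
    """Flatten and transform `images` batch by batch, then stack the results.

    `images` can be any sliceable array-like with a length, for example a
    NumPy array or an open h5py dataset.
    """
    parts = []
    for start in range(0, len(images), batch_size):
        batch = np.asarray(images[start:start + batch_size])
        flat = batch.reshape(len(batch), -1)  # (n, 9, 64, 64) -> (n, 9*64*64)
        parts.append(model.transform(flat))
    return np.concatenate(parts)


class _ToyModel:
    """Stand-in for a fitted model: keeps the first two columns, doubled."""
    def transform(self, X):
        return X[:, :2] * 2.0


# Seven tiny (9, 4, 4) "images" processed in batches of three.
toy_images = np.arange(7 * 9 * 4 * 4, dtype=np.float32).reshape(7, 9, 4, 4)
toy_feats = transform_in_batches(_ToyModel(), toy_images, batch_size=3)
```

The loop over `range(...)` can be wrapped in `tqdm` (already imported above) to show progress during the long run.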

#### Transform candidate with UMAP
Candidate images projected to feature space. Used in all privacy attacks that happen in feature space. If a candidate that originated from the training set shares the same features as one or more synthetic samples and can easily be identified as such, it can be considered a privacy threat.

In [11]:
candidate_umap = loaded_model.transform(candidate.reshape([len(candidate), 9*64*64]))
np.save('./privacy_saves/u_candidate.npy', candidate_umap)
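
To make the idea of a candidate "sharing features" with synthetic samples concrete, one simple measure is each candidate's distance to its nearest synthetic sample in the embedding space. The sketch below is an illustration under that assumption and is not the attack implemented in the later notebooks. `nearest_synth_distance` is a hypothetical helper, Euclidean distance is an assumed metric, and toy random data stands in for `candidate_umap` and `synth_umap`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def nearest_synth_distance(candidate_feats, synth_feats, chunk_size=1024):
    """For each candidate, return the Euclidean distance to its closest
    synthetic sample, scanning the synthetic set in chunks to bound memory."""
    candidate_feats = np.asarray(candidate_feats, dtype=np.float64)
    synth_feats = np.asarray(synth_feats, dtype=np.float64)
    best = np.full(len(candidate_feats), np.inf)
    for start in range(0, len(synth_feats), chunk_size):
        d = cdist(candidate_feats, synth_feats[start:start + chunk_size])
        best = np.minimum(best, d.min(axis=1))
    return best


# Toy stand-ins for the 32-dimensional embeddings.
rng = np.random.default_rng(0)
toy_synth = rng.normal(size=(500, 32))
toy_candidates = rng.normal(size=(10, 32))
toy_candidates[0] = toy_synth[42]  # candidate 0 is an exact copy of a synthetic sample

distances = nearest_synth_distance(toy_candidates, toy_synth, chunk_size=128)
```

A candidate with a very small nearest-neighbor distance, such as candidate 0 in the toy data, is the kind of case the paragraph above describes as potentially privacy threatening.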