# Utility Evaluation
This notebook presents reproducible results on how fingerprinting affects ML performance.
We measure utility as classification performance on two tasks (the Adult Census and German Credit datasets) across five classifiers: MLP, Random Forest, Logistic Regression, Gradient Boosting, and Linear SVC.

1. [Adult Census dataset](#1.-Adult-census-dataset)
    - [1.1. Baseline performance](#1.1.-Baseline-performance)    
    - [1.2. Demo utility evaluation process](#1.2.-Demo-utility-effects-evaluation-process)
    - [1.3. Full utility evaluation](#1.3.-Full-evaluation)
2. [German Credit dataset](#2.-German-Credit-data)
    - [2.1. Baseline performance](#2.1.-Baseline-performance)
    - [2.2. Demo utility evaluation process](#2.2.-Demo-utility-effects)
    - [2.3. Full utility evaluation](#2.3.-Full-utility-evaluation)


In [1]:
from sklearn import metrics, preprocessing, model_selection
import pandas as pd
import os
import json
from matplotlib import pyplot as plt
import random
import scipy.stats as stats
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier

In [2]:
import warnings
warnings.filterwarnings('ignore')
# run from the root of the local fingerprinting-toolbox checkout (adjust the path to your machine)
os.chdir('C:/Users/tsarcevic/PycharmProjects/fingerprinting-toolbox')

In [3]:
from datasets import Adult, GermanCredit, Dataset
from scheme import Universal
from utils import fp_cross_val_score

## 1. Adult census dataset
### 1.1. Baseline performance
Baseline accuracy and F1 score on the original, non-fingerprinted dataset, estimated with 10-fold cross-validation.

In [4]:
original_data = Adult()
# clean the data: drop rows with missing values
original_data.dropna()

<datasets._dataset.Adult at 0x1b8b5357940>

In [5]:
# encode categorical features and drop redundant columns
original_data.number_encode_categorical()
original_data = original_data.drop(['fnlwgt','education'], axis=1)

In [6]:
# split into features and target attribute
X = original_data.get_features()
y = original_data.get_target()

# scale features
scaler = preprocessing.StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns) #, index=X.index)
X.shape

(45222, 12)

In [7]:
classifiers = [GradientBoostingClassifier(), LinearSVC(), MLPClassifier(), RandomForestClassifier(), LogisticRegression()]

In [8]:
results_adult = []
# columns = ['classifier', 'gamma', 'accuracy_mean', 'accuracy_std', 'f1_mean', 'f1_std'] 
# gamma -> 0 for original data

In [9]:
# cross validation
for clf in classifiers:
    scores = model_selection.cross_validate(clf, X, y, scoring = ['accuracy', 'f1'], cv=10)
    print(clf)
    print("\tAccuracy: %0.3f (+/- %0.3f)\n\tF1 score: %0.3f (+/- %0.3f)" 
          % (scores['test_accuracy'].mean(), scores['test_accuracy'].std() * 2,
            scores['test_f1'].mean(), scores['test_f1'].std() * 2))
    
    # save scores
    results_adult.append([str(clf), 0, 
                      scores['test_accuracy'].mean(), scores['test_accuracy'].std(),
                      scores['test_f1'].mean(), scores['test_f1'].std()])
    

GradientBoostingClassifier()
	Accuracy: 0.861 (+/- 0.008)
	F1 score: 0.683 (+/- 0.019)
LinearSVC()
	Accuracy: 0.820 (+/- 0.005)
	F1 score: 0.538 (+/- 0.018)
MLPClassifier()
	Accuracy: 0.850 (+/- 0.009)
	F1 score: 0.668 (+/- 0.034)
RandomForestClassifier()
	Accuracy: 0.848 (+/- 0.011)
	F1 score: 0.672 (+/- 0.023)
LogisticRegression()
	Accuracy: 0.821 (+/- 0.006)
	F1 score: 0.559 (+/- 0.019)


### 1.2. Demo utility effects evaluation process

In [10]:
# define fingerprinting scheme
scheme = Universal(gamma=1, fingerprint_bit_length=64)
fp_dataset = scheme.insertion(original_data, secret_key=4370315727, recipient_id=0)

Universal fingerprinting scheme - initialised.
Embedding started...
	gamma: 1
	fingerprint length: 64
	xi: 1
	# recipients: None

	(secret key -- for evaluation purposes): 4370315727

Generated fingerprint for recipient 0: 0010010100000011101010000101110100101111001010111101000111010011
	Inserting a fingerprint into columns: Index(['age', 'workclass', 'education-num', 'marital-status', 'occupation',
       'relationship', 'race', 'sex', 'capital-gain', 'capital-loss',
       'hours-per-week', 'native-country'],
      dtype='object')
Fingerprint inserted.
	marked tuples: ~100.0%
	single fingerprint bit embedded 706 times ("amount of redundancy")
Time: 15 sec.
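
The "amount of redundancy" reported in the log can be checked with simple arithmetic. The sketch below assumes (this is not confirmed by the toolbox source shown here) that roughly one row in every `gamma` rows is marked and that each marked row carries one fingerprint bit, so each bit is embedded about `n_rows / (gamma * fingerprint_bit_length)` times. The helper name `expected_redundancy` is hypothetical.

```python
def expected_redundancy(n_rows, gamma, fingerprint_bit_length):
    """Approximate number of times a single fingerprint bit is embedded.

    Assumption: about n_rows / gamma rows are marked, one bit per marked row,
    and bits are spread evenly over the marked rows.
    """
    marked_rows = n_rows / gamma
    return int(marked_rows // fingerprint_bit_length)


# Adult census: 45222 rows, gamma=1, 64-bit fingerprint
print(expected_redundancy(45222, 1, 64))  # 706, as in the log above
# German Credit: 1000 rows, gamma=1, 32-bit and 64-bit fingerprints
print(expected_redundancy(1000, 1, 32))   # 31
print(expected_redundancy(1000, 1, 64))   # 15
```

These values agree with the redundancy figures printed in the embedding logs of this notebook, which suggests the assumption is at least a reasonable approximation.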


In [11]:
# sanity check fingerprint detection
suspect = scheme.detection(fp_dataset, secret_key=4370315727)

Start detection algorithm...
	gamma: 1
	fingerprint length: 64
Potential fingerprint detected: 0010010100000011101010000101110100101111001010111101000111010011
Detection counts:
[[678, 0], [658, 0], [0, 749], [645, 0], [687, 0], [0, 719], [680, 0], [0, 737], [718, 0], [717, 0], [676, 0], [674, 0], [686, 0], [687, 0], [0, 726], [0, 651], [0, 684], [761, 0], [0, 729], [720, 0], [0, 731], [794, 0], [674, 0], [742, 0], [658, 0], [0, 770], [678, 0], [0, 706], [0, 734], [0, 699], [751, 0], [0, 691], [723, 0], [687, 0], [0, 742], [659, 0], [0, 713], [0, 685], [0, 663], [0, 725], [706, 0], [690, 0], [0, 697], [747, 0], [0, 704], [721, 0], [0, 675], [0, 698], [0, 715], [0, 748], [687, 0], [0, 678], [720, 0], [693, 0], [715, 0], [0, 703], [0, 698], [0, 711], [721, 0], [0, 728], [703, 0], [727, 0], [0, 747], [0, 683]]
Recipient 0 is suspected.
Runtime: 5 sec.


In [12]:
# reproduce preprocessing of the original dataset
X_fp = fp_dataset.get_features()
y_fp = fp_dataset.get_target()

scaler = preprocessing.StandardScaler()
X_fp = pd.DataFrame(scaler.fit_transform(X_fp), columns=X_fp.columns) #, index=X_fp.index)
X_fp.shape

(45222, 12)

In [13]:
# `clf` is still the last classifier from the baseline loop, i.e. LogisticRegression()
fp_scores = fp_cross_val_score(clf, X, y, X_fp, y_fp, cv=10, scoring = ['accuracy', 'f1'])
print("Accuracy: %0.3f (+/- %0.3f)\nF1 score: %0.3f (+/- %0.3f)" 
      % (fp_scores['test_accuracy'].mean(), fp_scores['test_accuracy'].std() * 2,
        fp_scores['test_f1'].mean(), fp_scores['test_f1'].std() * 2))

Accuracy: 0.821 (+/- 0.006)
F1 score: 0.559 (+/- 0.020)
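
The implementation of `fp_cross_val_score` is not shown in this notebook. One plausible reading of its signature is that each fold trains on the fingerprinted rows and is scored on the matching held-out rows of the original data. The sketch below illustrates that idea only: the function `fp_cross_validate_sketch`, the synthetic data, and the noise-based perturbation are all assumptions and stand-ins, not the toolbox's code or the Universal scheme.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold


def fp_cross_validate_sketch(clf, X, y, X_fp, y_fp, cv=10):
    """Hypothetical stand-in: fit on fingerprinted folds, score on original folds.

    Assumes NumPy arrays in which row i of X_fp is the fingerprinted version
    of row i of X.
    """
    folds = StratifiedKFold(n_splits=cv, shuffle=True, random_state=0)
    accuracy, f1 = [], []
    for train_idx, test_idx in folds.split(X, y):
        # train on the fingerprinted training rows
        model = clone(clf).fit(X_fp[train_idx], y_fp[train_idx])
        # evaluate on the original (clean) held-out rows
        predictions = model.predict(X[test_idx])
        accuracy.append(accuracy_score(y[test_idx], predictions))
        f1.append(f1_score(y[test_idx], predictions))
    return {"test_accuracy": np.array(accuracy), "test_f1": np.array(f1)}


# Synthetic data; the small Gaussian perturbation is NOT the Universal scheme.
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X_demo_fp = X_demo + rng.normal(scale=0.01, size=X_demo.shape)

scores = fp_cross_validate_sketch(LogisticRegression(), X_demo, y_demo, X_demo_fp, y_demo, cv=5)
print("Accuracy: %0.3f (+/- %0.3f)"
      % (scores["test_accuracy"].mean(), scores["test_accuracy"].std() * 2))
```

Scoring on the original rows keeps the test data identical across fingerprinted and baseline runs, so any difference in the scores comes from the training data alone.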


### 1.3. Full evaluation  

1. Define gammas
2. Define classifiers
3. Evaluate performance (accuracy, F1 score) for each combination of gamma x classifier via 10-fold cross-validation
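
The three steps above produce one result row per classifier for the baseline plus one per gamma and classifier pair. As a minimal sketch of that bookkeeping (classifier names are plain strings here and no model is trained), the row order below mirrors the order in which the notebook appends to `results_adult`:

```python
from itertools import product

classifier_names = [
    "GradientBoostingClassifier()", "LinearSVC()", "MLPClassifier()",
    "RandomForestClassifier()", "LogisticRegression()",
]
gammas = [1, 1.5, 2]

# Baseline rows come first; gamma = 0 denotes the original data.
rows = [(name, 0) for name in classifier_names]
# Then one row per (gamma, classifier) pair, with gamma in the outer loop.
rows += [(name, g) for g, name in product(gammas, classifier_names)]

for index, (name, g) in enumerate(rows):
    print(index, name, g)
```

This gives 20 rows in total, and the positions match the index column of the sorted results table (for example, index 9 is `LogisticRegression()` at gamma 1).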

In [14]:
gammas = [1, 1.5, 2]
#classifiers = [GradientBoostingClassifier(), LinearSVC(), MLPClassifier(), RandomForestClassifier(), LogisticRegression()]
secret_key = 4370315727

In [15]:
for g in gammas:
    # fingerprint
    scheme = Universal(gamma=g, fingerprint_bit_length=64)
    fp_dataset = scheme.insertion(original_data, secret_key=secret_key, recipient_id=0)
    # split
    X_fp = fp_dataset.get_features()
    y_fp = fp_dataset.get_target()
    # scale
    X_fp = pd.DataFrame(scaler.fit_transform(X_fp), columns=X_fp.columns) #, index=X_fp.index)
    
    # score
    for clf in classifiers:
        print(clf)
        fp_scores = fp_cross_val_score(clf, X, y, X_fp, y_fp, cv=10, scoring = ['accuracy', 'f1'])
        print("Accuracy: %0.3f (+/- %0.3f)\nF1 score: %0.3f (+/- %0.3f)" 
              % (fp_scores['test_accuracy'].mean(), fp_scores['test_accuracy'].std() * 2,
              fp_scores['test_f1'].mean(), fp_scores['test_f1'].std() * 2))
        # save scores
        results_adult.append([str(clf), g, 
                              fp_scores['test_accuracy'].mean(), fp_scores['test_accuracy'].std(),
                              fp_scores['test_f1'].mean(), fp_scores['test_f1'].std()])

Universal fingerprinting scheme - initialised.
Embedding started...
	gamma: 1
	fingerprint length: 64
	xi: 1
	# recipients: None

	(secret key -- for evaluation purposes): 4370315727

Generated fingerprint for recipient 0: 0010010100000011101010000101110100101111001010111101000111010011
	Inserting a fingerprint into columns: Index(['age', 'workclass', 'education-num', 'marital-status', 'occupation',
       'relationship', 'race', 'sex', 'capital-gain', 'capital-loss',
       'hours-per-week', 'native-country'],
      dtype='object')
Fingerprint inserted.
	marked tuples: ~100.0%
	single fingerprint bit embedded 706 times ("amount of redundancy")
Time: 14 sec.
GradientBoostingClassifier()
Accuracy: 0.861 (+/- 0.010)
F1 score: 0.684 (+/- 0.026)
LinearSVC()
Accuracy: 0.820 (+/- 0.005)
F1 score: 0.536 (+/- 0.018)
MLPClassifier()
Accuracy: 0.849 (+/- 0.007)
F1 score: 0.665 (+/- 0.018)
RandomForestClassifier()
Accuracy: 0.848 (+/- 0.008)
F1 score: 0.671 (+/- 0.019)
LogisticRegression()
Accura

In [16]:
results_adult = pd.DataFrame(results_adult,
                             columns=['classifier', 'gamma', 'accuracy_mean', 'accuracy_std', 'f1_mean', 'f1_std'])

In [17]:
results_adult.to_csv('evaluation/utility/ML/ML_utility_results_adult.csv', index=False)

In [18]:
results_adult.sort_values(by=['classifier','gamma'])

| | classifier | gamma | accuracy_mean | accuracy_std | f1_mean | f1_std |
|---|---|---|---|---|---|---|
| 0 | GradientBoostingClassifier() | 0.0 | 0.861373 | 0.003983 | 0.683035 | 0.009664 |
| 5 | GradientBoostingClassifier() | 1.0 | 0.860931 | 0.005213 | 0.684308 | 0.013135 |
| 10 | GradientBoostingClassifier() | 1.5 | 0.861417 | 0.003925 | 0.685365 | 0.010197 |
| 15 | GradientBoostingClassifier() | 2.0 | 0.860975 | 0.004614 | 0.683276 | 0.011179 |
| 1 | LinearSVC() | 0.0 | 0.819889 | 0.002488 | 0.537968 | 0.009158 |
| 6 | LinearSVC() | 1.0 | 0.819822 | 0.002391 | 0.536459 | 0.008824 |
| 11 | LinearSVC() | 1.5 | 0.819756 | 0.002396 | 0.53658 | 0.008794 |
| 16 | LinearSVC() | 2.0 | 0.819756 | 0.002574 | 0.536739 | 0.009099 |
| 4 | LogisticRegression() | 0.0 | 0.820773 | 0.003124 | 0.55943 | 0.009566 |
| 9 | LogisticRegression() | 1.0 | 0.820596 | 0.003207 | 0.558942 | 0.009778 |
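
To read the utility effect directly, the fingerprinted scores can be compared with the gamma = 0 baseline of the same classifier. The sketch below copies the `accuracy_mean` values of two classifiers from the table above into a small DataFrame; the column `accuracy_delta` is a name introduced here for illustration.

```python
import pandas as pd

# accuracy_mean values copied from the Adult census results above
results = pd.DataFrame(
    [
        ("GradientBoostingClassifier()", 0.0, 0.861373),
        ("GradientBoostingClassifier()", 1.0, 0.860931),
        ("GradientBoostingClassifier()", 1.5, 0.861417),
        ("GradientBoostingClassifier()", 2.0, 0.860975),
        ("LinearSVC()", 0.0, 0.819889),
        ("LinearSVC()", 1.0, 0.819822),
        ("LinearSVC()", 1.5, 0.819756),
        ("LinearSVC()", 2.0, 0.819756),
    ],
    columns=["classifier", "gamma", "accuracy_mean"],
)

# baseline accuracy per classifier (gamma == 0 is the original data)
baseline = results[results["gamma"] == 0].set_index("classifier")["accuracy_mean"]

# change in mean accuracy relative to the classifier's own baseline
results["accuracy_delta"] = results["accuracy_mean"] - results["classifier"].map(baseline)
print(results[results["gamma"] > 0])
```

For these rows every difference is below 0.001 in absolute value, which is smaller than the reported fold-to-fold standard deviations.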


## 2. German Credit data

### 2.1. Baseline performance

In [19]:
germancredit_original = GermanCredit()

In [20]:
# encode categorical features
germancredit_original.number_encode_categorical()

<datasets._dataset.GermanCredit at 0x1b8b542d270>

In [21]:
# split into features and target attribute
X = germancredit_original.get_features()
y = germancredit_original.get_target()

# scale features
scaler = preprocessing.StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns) #, index=X.index)
X.shape

(1000, 20)

In [22]:
results_germancredit = []
# columns = ['classifier', 'gamma', 'accuracy_mean', 'accuracy_std', 'f1_mean', 'f1_std'] 
# gamma -> 0 for original data

In [23]:
# cross validation
for clf in classifiers:
    scores = model_selection.cross_validate(clf, X, y, scoring = ['accuracy', 'f1'], cv=10)
    print(clf)
    print("\tAccuracy: %0.3f (+/- %0.3f)\n\tF1 score: %0.3f (+/- %0.3f)" 
          % (scores['test_accuracy'].mean(), scores['test_accuracy'].std() * 2,
            scores['test_f1'].mean(), scores['test_f1'].std() * 2))
    
    # save scores
    results_germancredit.append([str(clf), 0, 
                      scores['test_accuracy'].mean(), scores['test_accuracy'].std(),
                      scores['test_f1'].mean(), scores['test_f1'].std()])
    

GradientBoostingClassifier()
	Accuracy: 0.777 (+/- 0.054)
	F1 score: 0.849 (+/- 0.038)
LinearSVC()
	Accuracy: 0.760 (+/- 0.088)
	F1 score: 0.838 (+/- 0.058)
MLPClassifier()
	Accuracy: 0.742 (+/- 0.072)
	F1 score: 0.819 (+/- 0.055)
RandomForestClassifier()
	Accuracy: 0.769 (+/- 0.054)
	F1 score: 0.847 (+/- 0.033)
LogisticRegression()
	Accuracy: 0.760 (+/- 0.089)
	F1 score: 0.838 (+/- 0.057)


### 2.2. Demo utility effects

In [24]:
# define fingerprinting scheme
scheme = Universal(gamma=1, fingerprint_bit_length=32)
fp_dataset = scheme.insertion(germancredit_original, secret_key=4370315727, recipient_id=0)

Universal fingerprinting scheme - initialised.
Embedding started...
	gamma: 1
	fingerprint length: 32
	xi: 1
	# recipients: None

	(secret key -- for evaluation purposes): 4370315727

Generated fingerprint for recipient 0: 11100010000111101000000101111011
	Inserting a fingerprint into columns: Index(['checking_account', 'duration', 'credit_hist', 'purpose',
       'credit_amount', 'savings', 'employment_since', 'installment_rate',
       'sex_status', 'debtors', 'residence_since', 'property', 'age',
       'installment_other', 'housing', 'existing_credits', 'job',
       'liable_people', 'tel', 'foreign'],
      dtype='object')
Fingerprint inserted.
	marked tuples: ~100.0%
	single fingerprint bit embedded 31 times ("amount of redundancy")
Time: <1 sec.


In [25]:
# sanity check fingerprint detection
suspect = scheme.detection(fp_dataset, secret_key=4370315727)

Start detection algorithm...
	gamma: 1
	fingerprint length: 32
Potential fingerprint detected: 11100010000111101000000101111011
Detection counts:
[[0, 30], [0, 38], [0, 36], [34, 0], [26, 0], [38, 0], [0, 29], [31, 0], [27, 0], [26, 0], [25, 0], [0, 22], [0, 38], [0, 36], [0, 40], [28, 0], [0, 26], [25, 0], [39, 0], [35, 0], [32, 0], [27, 0], [30, 0], [0, 29], [34, 0], [0, 29], [0, 35], [0, 37], [0, 35], [30, 0], [0, 32], [0, 21]]
Recipient 0 is suspected.
Runtime: 0 sec.
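
Each pair in the detection counts appears to hold the votes for bit value 0 and bit value 1 at one fingerprint position, with the detected fingerprint taken as the per-position majority. The sketch below applies that reading to the counts printed above; `decode_fingerprint` is a hypothetical helper, not part of the toolbox.

```python
# detection counts copied from the output above: [votes for 0, votes for 1]
counts = [
    [0, 30], [0, 38], [0, 36], [34, 0], [26, 0], [38, 0], [0, 29], [31, 0],
    [27, 0], [26, 0], [25, 0], [0, 22], [0, 38], [0, 36], [0, 40], [28, 0],
    [0, 26], [25, 0], [39, 0], [35, 0], [32, 0], [27, 0], [30, 0], [0, 29],
    [34, 0], [0, 29], [0, 35], [0, 37], [0, 35], [30, 0], [0, 32], [0, 21],
]


def decode_fingerprint(counts):
    """Majority vote per bit position; '?' marks a tie (undecided bit)."""
    bits = []
    for zeros, ones in counts:
        if zeros == ones:
            bits.append("?")
        else:
            bits.append("1" if ones > zeros else "0")
    return "".join(bits)


print(decode_fingerprint(counts))           # 11100010000111101000000101111011
print(sum(z + o for z, o in counts))        # 1000 votes, one per row
```

The decoded string equals the detected fingerprint, and the votes sum to the 1000 rows of the dataset, consistent with the reported ~100% marked tuples at gamma = 1. Because the dataset was not modified after embedding, every position is unanimous.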


In [26]:
# reproduce preprocessing of the original dataset
X_fp = fp_dataset.get_features()
y_fp = fp_dataset.get_target()

scaler = preprocessing.StandardScaler()
X_fp = pd.DataFrame(scaler.fit_transform(X_fp), columns=X_fp.columns) #, index=X_fp.index)
X_fp.shape

(1000, 20)

In [27]:
c = GradientBoostingClassifier()
fp_scores = fp_cross_val_score(c, X, y, X_fp, y_fp, cv=10, scoring = ['accuracy', 'f1'])
print("Accuracy: %0.3f (+/- %0.3f)\nF1 score: %0.3f (+/- %0.3f)" 
      % (fp_scores['test_accuracy'].mean(), fp_scores['test_accuracy'].std() * 2,
        fp_scores['test_f1'].mean(), fp_scores['test_f1'].std() * 2))

Accuracy: 0.763 (+/- 0.074)
F1 score: 0.840 (+/- 0.049)


### 2.3. Full utility evaluation
1. Define gammas
2. Define classifiers
3. Evaluate performance (accuracy, F1 score) for each combination of gamma x classifier via 10-fold cross-validation

In [28]:
gammas = [1, 1.5, 2]
classifiers = [GradientBoostingClassifier(), LinearSVC(), MLPClassifier(), RandomForestClassifier(), LogisticRegression()]
secret_key = 4370315727

In [29]:
for g in gammas:
    # fingerprint
    scheme = Universal(gamma=g, fingerprint_bit_length=64)
    fp_dataset = scheme.insertion(germancredit_original, secret_key=secret_key, recipient_id=0)
    # split
    X_fp = fp_dataset.get_features()
    y_fp = fp_dataset.get_target()
    # scale
    X_fp = pd.DataFrame(scaler.fit_transform(X_fp), columns=X_fp.columns) #, index=X_fp.index)
    
    # score
    for clf in classifiers:
        print(clf)
        fp_scores = fp_cross_val_score(clf, X, y, X_fp, y_fp, cv=10, scoring = ['accuracy', 'f1'])
        print("Accuracy: %0.3f (+/- %0.3f)\nF1 score: %0.3f (+/- %0.3f)" 
              % (fp_scores['test_accuracy'].mean(), fp_scores['test_accuracy'].std() * 2,
              fp_scores['test_f1'].mean(), fp_scores['test_f1'].std() * 2))
        # save scores
        results_germancredit.append([str(clf), g, 
                              fp_scores['test_accuracy'].mean(), fp_scores['test_accuracy'].std(),
                              fp_scores['test_f1'].mean(), fp_scores['test_f1'].std()])

Universal fingerprinting scheme - initialised.
Embedding started...
	gamma: 1
	fingerprint length: 64
	xi: 1
	# recipients: None

	(secret key -- for evaluation purposes): 4370315727

Generated fingerprint for recipient 0: 0010010100000011101010000101110100101111001010111101000111010011
	Inserting a fingerprint into columns: Index(['checking_account', 'duration', 'credit_hist', 'purpose',
       'credit_amount', 'savings', 'employment_since', 'installment_rate',
       'sex_status', 'debtors', 'residence_since', 'property', 'age',
       'installment_other', 'housing', 'existing_credits', 'job',
       'liable_people', 'tel', 'foreign'],
      dtype='object')
Fingerprint inserted.
	marked tuples: ~100.0%
	single fingerprint bit embedded 15 times ("amount of redundancy")
Time: <1 sec.
GradientBoostingClassifier()
Accuracy: 0.781 (+/- 0.056)
F1 score: 0.852 (+/- 0.039)
LinearSVC()
Accuracy: 0.762 (+/- 0.086)
F1 score: 0.840 (+/- 0.058)
MLPClassifier()
Accuracy: 0.750 (+/- 0.068)
F1 score

In [30]:
results_germancredit = pd.DataFrame(results_germancredit,
                             columns=['classifier', 'gamma', 'accuracy_mean', 'accuracy_std', 'f1_mean', 'f1_std'])

In [31]:
results_germancredit.to_csv('evaluation/utility/ML/ML_utility_results_germancredit.csv', index=False)

In [32]:
results_germancredit.sort_values(by=['classifier','gamma'])

| | classifier | gamma | accuracy_mean | accuracy_std | f1_mean | f1_std |
|---|---|---|---|---|---|---|
| 0 | GradientBoostingClassifier() | 0.0 | 0.777 | 0.027221 | 0.848714 | 0.019172 |
| 5 | GradientBoostingClassifier() | 1.0 | 0.781 | 0.028089 | 0.852248 | 0.019342 |
| 10 | GradientBoostingClassifier() | 1.5 | 0.768 | 0.030594 | 0.842181 | 0.020519 |
| 15 | GradientBoostingClassifier() | 2.0 | 0.765 | 0.025397 | 0.840897 | 0.016818 |
| 1 | LinearSVC() | 0.0 | 0.76 | 0.044045 | 0.837693 | 0.029062 |
| 6 | LinearSVC() | 1.0 | 0.762 | 0.042849 | 0.840021 | 0.028837 |
| 11 | LinearSVC() | 1.5 | 0.76 | 0.043359 | 0.838032 | 0.029086 |
| 16 | LinearSVC() | 2.0 | 0.762 | 0.045122 | 0.839585 | 0.029118 |
| 4 | LogisticRegression() | 0.0 | 0.76 | 0.044272 | 0.837785 | 0.02855 |
| 9 | LogisticRegression() | 1.0 | 0.76 | 0.043818 | 0.838031 | 0.02942 |
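
Note that the table stores the plain standard deviation over the 10 folds, while the printed summaries report twice that value after "+/-". A minimal sketch of the conversion, using the format string from the notebook and the Gradient Boosting rows above (`format_score` is a name introduced here):

```python
def format_score(mean, std):
    """Format a score as in the notebook's print statements: mean (+/- 2 * std)."""
    return "%0.3f (+/- %0.3f)" % (mean, std * 2)


# GradientBoostingClassifier on German Credit, values from the table above
print("gamma=0:", format_score(0.777, 0.027221))  # 0.777 (+/- 0.054)
print("gamma=1:", format_score(0.781, 0.028089))  # 0.781 (+/- 0.056)
```

Both strings match the accuracy lines printed for this classifier in sections 2.1 and 2.3, so the table and the printed output describe the same runs.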
