# Fair Binary Classification with SearchFair on two-dimensional toy data

In this notebook, we show how to use SearchFair to learn fair binary classifiers on two-dimensional toy data.

## Imports

We start by importing SearchFair from the installed package.

In [1]:
from searchfair import SearchFair


Next, we import the scikit-learn utilities we need, as well as NumPy.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
import numpy as np

# Ignore cvxpy's warnings about disciplined convex programming (DCP) rules.
import warnings
warnings.filterwarnings('ignore')
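
The cell above silences every warning for the rest of the session. If you would rather keep other warnings visible, the following sketch limits suppression to a single block with the standard-library `warnings.catch_warnings` context manager. `noisy_step` is a hypothetical stand-in for a call such as `fit` that triggers a solver warning.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a solver call that emits a warning.
    warnings.warn("expression is not DCP-compliant", UserWarning)
    return 42


# Warnings are ignored only inside this block; the previous filter state
# is restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

print(result)
```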

## Loading the toy dataset

In [None]:
from get_synthetic_data import get_gaussian_data
from get_synthetic_data import plot_boundaries
import utils as ut

n_samples = 1500
# Load Data
x_data, y_data, s_data = get_gaussian_data(n_samples=n_samples, plot_data=True)
ut.print_data_stats(s_data, y_data)
# Split 50/50 into training and test sets. Keeping the training set small (750 points) reduces running time.
x_train, x_test, y_train, y_test, s_train, s_test = train_test_split(x_data, y_data, s_data, train_size=0.5, shuffle=True)
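
The helper modules `get_synthetic_data` and `utils` are not included here. To try the split step without them, the sketch below uses a hypothetical generator, `make_toy_fair_data`, which draws 2-D Gaussian points whose sensitive attribute `s` is correlated with the label `y`. Its distribution (labels in {-1, +1}, 80% agreement between `s` and `y`) is our own assumption and not necessarily what `get_gaussian_data` produces. It uses separate `*_toy` names so it does not overwrite the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def make_toy_fair_data(n_samples=1500, seed=0):
    """Hypothetical stand-in for get_gaussian_data.

    Returns x (n_samples, 2), y and s, with y and s in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n_samples)
    # s agrees with y about 80% of the time, so s carries information about y.
    agree = rng.random(n_samples) < 0.8
    s = np.where(agree, y, -y)
    # Class means at (+1, +1) and (-1, -1); s shifts points along the first axis.
    x = (rng.normal(size=(n_samples, 2))
         + y[:, None] * 1.0
         + 0.5 * s[:, None] * np.array([1.0, 0.0]))
    return x, y, s


x_toy, y_toy, s_toy = make_toy_fair_data()
# train_test_split accepts any number of arrays and splits them consistently.
(x_toy_train, x_toy_test, y_toy_train, y_toy_test,
 s_toy_train, s_toy_test) = train_test_split(
    x_toy, y_toy, s_toy, train_size=0.5, shuffle=True, random_state=0)
print(x_toy_train.shape, x_toy_test.shape)  # (750, 2) (750, 2)
```

Passing `random_state` makes the split reproducible; the notebook's cell omits it, so its split changes on every run.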

## Learning a fair classifier with SearchFair
### Demographic Parity

To learn a classifier with SearchFair, we choose a kernel, either 'linear' or 'rbf', and a fairness notion: Demographic Parity (DDP) or Equality of Opportunity (DEO). We start with a linear kernel and Demographic Parity.
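
Both notions compare how often each group defined by the sensitive attribute `s` receives a positive prediction. Demographic Parity compares these rates over all points, while Equality of Opportunity compares them only over points whose true label is positive. The following is a minimal NumPy sketch, assuming that labels and `s` are encoded in {-1, +1} and that each measure is the absolute difference of the two groups' rates. `fairness_gaps` is a hypothetical name; the `ut.compute_fairness_measures` helper used below is not shown and may differ in details.

```python
import numpy as np


def fairness_gaps(y_pred, y_true, s):
    """Return (DDP, DEO) as absolute gaps in positive-prediction rates.

    Assumes y_pred, y_true and s all take values in {-1, +1}.
    """
    y_pred, y_true, s = map(np.asarray, (y_pred, y_true, s))
    pos = y_pred == 1
    # Demographic parity: P(y_pred = 1 | s = 1) vs. P(y_pred = 1 | s = -1).
    ddp = abs(pos[s == 1].mean() - pos[s == -1].mean())
    # Equality of opportunity: the same comparison among truly positive points.
    true_pos = y_true == 1
    deo = abs(pos[true_pos & (s == 1)].mean() - pos[true_pos & (s == -1)].mean())
    return ddp, deo


example_pred = [1, 1, -1, -1, 1, -1]
example_true = [1, -1, 1, -1, 1, 1]
example_s = [1, 1, 1, -1, -1, -1]
print("DDP = %.3f, DEO = %.3f" % fairness_gaps(example_pred, example_true, example_s))
# DDP = 0.333, DEO = 0.000
```

In this example, group s = 1 receives positive predictions at rate 2/3 and group s = -1 at rate 1/3, so DDP = 1/3. Among truly positive points, both groups receive positive predictions at rate 1/2, so DEO = 0.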

In [None]:
fairness_notion = 'DDP' # DDP = Demographic Parity, DEO = Equality of Opportunity. 
kernel = 'linear' # 'linear', 'rbf'
verbose = True # True = SearchFair output, 2 = also show solver progress

# Regularization Parameter beta
reg_beta = 0.0001
linear_model_DDP = SearchFair(reg_beta=reg_beta, kernel=kernel, fairness_notion=fairness_notion, verbose=verbose, reason_points=0.5)
linear_model_DDP.fit(x_train, y_train, s_train=s_train)

In [None]:
def print_clf_stats(model, x_train, x_test, y_train, y_test, s_train, s_test):
    """Print accuracy, DDP and DEO (in percent) on the training and test sets."""
    train_acc = ut.get_accuracy(np.sign(model.predict(x_train)), y_train)
    test_acc = ut.get_accuracy(np.sign(model.predict(x_test)), y_test)
    test_DDP, test_DEO = ut.compute_fairness_measures(model.predict(x_test), y_test, s_test)
    train_DDP, train_DEO = ut.compute_fairness_measures(model.predict(x_train), y_train, s_train)

    print(10*'-'+"Train"+10*'-')
    print("Accuracy: %0.4f%%" % (train_acc * 100))
    print("DDP: %0.4f%%" % (train_DDP * 100), "DEO: %0.4f%%" % (train_DEO * 100))
    print(10*'-'+"Test"+10*'-')
    print("Accuracy: %0.4f%%" % (test_acc * 100))
    print("DDP: %0.4f%%" % (test_DDP * 100), "DEO: %0.4f%%" % (test_DEO * 100))
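
`print_clf_stats` thresholds the model output with `np.sign` before measuring accuracy. Because the `utils` module is not shown, the following is only a minimal sketch of what such an accuracy computation plausibly looks like. `accuracy_from_scores` is a hypothetical name, not the `ut.get_accuracy` implementation.

```python
import numpy as np


def accuracy_from_scores(scores, y_true):
    """Hypothetical helper: threshold real-valued scores with np.sign and
    return the fraction of predictions equal to y_true (labels in {-1, +1})."""
    y_pred = np.sign(np.asarray(scores, dtype=float))
    return np.mean(y_pred == np.asarray(y_true))


example_scores = np.array([2.3, -0.7, 0.0, 1.1])
example_labels = np.array([1, -1, 1, -1])
print(accuracy_from_scores(example_scores, example_labels))  # 0.5
```

Note that `np.sign(0.0)` is `0.0`, so a point lying exactly on the decision boundary matches neither label and counts as an error here.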

Let us check accuracy and fairness (DDP and DEO) on the training and test sets.

In [None]:
print_clf_stats(linear_model_DDP, x_train, x_test, y_train, y_test, s_train, s_test)

Finally, we plot the decision boundary of the learned classifier on the full dataset.

In [None]:
plot_boundaries(linear_model_DDP, x_data, y_data, s_data, num_to_draw=1000)
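
`plot_boundaries` comes from the local `get_synthetic_data` module, and its code is not shown. Decision-boundary plots of this kind are commonly produced by evaluating the classifier on a dense grid and drawing the zero level set of its scores. The sketch below shows only that grid step. The function `decision_grid` and the linear scoring function standing in for `linear_model_DDP.predict` are hypothetical.

```python
import numpy as np


def decision_grid(predict, x, resolution=200, margin=0.5):
    """Evaluate predict on a regular grid covering the 2-D points in x.

    Returns (xx, yy, zz), where zz[i, j] is the score at (xx[i, j], yy[i, j]).
    """
    x_min, y_min = x.min(axis=0) - margin
    x_max, y_max = x.max(axis=0) + margin
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))
    grid = np.c_[xx.ravel(), yy.ravel()]  # shape (resolution**2, 2)
    zz = np.asarray(predict(grid)).reshape(xx.shape)
    return xx, yy, zz


# Hypothetical linear scoring function w.x + b in place of a fitted model.
w, b = np.array([1.0, -1.0]), 0.0
points = np.random.default_rng(0).normal(size=(100, 2))
xx, yy, zz = decision_grid(lambda g: g @ w + b, points)
print(zz.shape)  # (200, 200)
```

With matplotlib, `plt.contour(xx, yy, zz, levels=[0])` would then draw the boundary where the score changes sign.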

### Equality of Opportunity

Next, we use an RBF kernel and target Equality of Opportunity (DEO) instead of Demographic Parity.
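
In kernel methods, choosing an RBF kernel typically means the decision function is a weighted sum of Gaussian kernel evaluations between training points and the query point. This is what allows a curved boundary that a linear kernel cannot produce. The sketch below computes an RBF kernel matrix in NumPy. The bandwidth `gamma` is our own illustrative choice; we do not assume how SearchFair sets its kernel width.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(pts, pts, gamma=0.5)
# Points 0 and 1 are at squared distance 1, so K[0, 1] = exp(-0.5).
print(round(float(K[0, 1]), 3))  # 0.607
```

Smaller `gamma` values give smoother boundaries; larger values let the boundary follow individual points more closely.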

In [None]:
fairness_notion = 'DEO' # DDP = Demographic Parity, DEO = Equality of Opportunity. 
kernel = 'rbf' # 'linear', 'rbf'
verbose = True

# Regularization Parameter beta
reg_beta = 0.0001
rbf_model_DEO = SearchFair(reg_beta=reg_beta, kernel=kernel, fairness_notion=fairness_notion, verbose=verbose)
rbf_model_DEO.fit(x_train, y_train, s_train=s_train)

# Evaluate model
print_clf_stats(rbf_model_DEO, x_train, x_test, y_train, y_test, s_train, s_test)

In [None]:
plot_boundaries(rbf_model_DEO, x_data, y_data, s_data)