# Introduction: Particle Swarm Regression

In this notebook we will walk through how to use [Particle Swarm Optimisation](https://en.wikipedia.org/wiki/Particle_swarm_optimization) to produce optimal Linear Regression models for a range of custom loss metrics. Using a methodology introduced by [Dietterich (1998)](https://sci2s.ugr.es/keel/pdf/algorithm/articulo/dietterich1998.pdf), we will perform a significance test on the holdout predictions to see whether training on custom loss metrics produces significantly better models.

## Model Error

In regression problems the target variable is continuous, and our model can make errors in two ways:
1. Underpredict the target
2. Overpredict the target

The Ordinary Least Squares and Gradient Descent implementations of Linear Regression both minimise Mean Squared Error as the loss function. Mean Squared Error places equal importance on an underprediction and an overprediction. In practice, however, our preferences for the two types of error are rarely the same; they are driven by the business problem at hand. For example, to improve efficiency a utility company might want to know how much electricity they can put through a [transformer](https://en.wikipedia.org/wiki/Transformer) without it overheating. For this use case an overprediction would mean a loss of efficiency, but an underprediction could lead to a failure of the transformer, blackouts for customers and heavy regulatory fines.

One solution to this problem is to add "safety" margins to our predictions. However, a more elegant approach is to encode our preferences in a loss function and use that function directly to optimise our model coefficients.
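To make this concrete, here is a minimal sketch of how an asymmetric loss changes what counts as the "best" prediction. The function name `asymmetric_mse`, the `under_weight` of 10 and the toy data are all illustrative assumptions, not part of twlearn. For simplicity the "model" is a single constant prediction chosen by grid search.

```python
import numpy as np

def asymmetric_mse(predictions, actual, under_weight=10.0):
    """Squared error that costs `under_weight` times more for underpredictions.

    Hypothetical helper for illustration only.
    """
    errors = predictions - actual
    weights = np.where(errors < 0, under_weight, 1.0)
    return np.mean(weights * errors ** 2)

# Toy targets and a grid of constant predictions to try
actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
candidates = np.linspace(0.0, 6.0, 601)

# Score every candidate constant under both losses
mse = [np.mean((c - actual) ** 2) for c in candidates]
asym = [asymmetric_mse(c, actual) for c in candidates]

best_mse = candidates[np.argmin(mse)]
best_asym = candidates[np.argmin(asym)]
print(f"best constant under MSE: {best_mse:.2f}")
print(f"best constant under asymmetric loss: {best_asym:.2f}")
```

Under plain MSE the best constant is the mean of the targets. Under the asymmetric loss it shifts upwards, because the optimiser accepts more overprediction in exchange for less underprediction.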

## Particle Swarm Optimisation
Particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.

I have [coded up an implementation](https://github.com/tonyjward/machine-learning-oop/blob/master/twlearn/ParticleSwarm.py) of PSO based on the above Wiki page, which I'll use throughout this notebook.
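The update rule described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the twlearn implementation: the function name `pso_minimise`, the coefficient values (`inertia`, `cognitive`, `social`) and the initialisation range are all assumptions chosen for the example.

```python
import numpy as np

def pso_minimise(objective, dim, n_particles=30, n_iterations=200,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimal global-best PSO (illustrative sketch, hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    positions = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    velocities = np.zeros((n_particles, dim))

    # Each particle remembers its own best position; the swarm shares one global best
    best_positions = positions.copy()
    best_scores = np.array([objective(p) for p in positions])
    global_best = best_positions[np.argmin(best_scores)].copy()

    for _ in range(n_iterations):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity = momentum + pull toward own best + pull toward swarm best
        velocities = (inertia * velocities
                      + cognitive * r1 * (best_positions - positions)
                      + social * r2 * (global_best - positions))
        positions = positions + velocities

        scores = np.array([objective(p) for p in positions])
        improved = scores < best_scores
        best_positions[improved] = positions[improved]
        best_scores[improved] = scores[improved]
        global_best = best_positions[np.argmin(best_scores)].copy()

    return global_best, float(best_scores.min())

# Usage: find the point closest to a known target
target = np.array([1.0, -2.0])
best, score = pso_minimise(lambda p: np.sum((p - target) ** 2), dim=2)
print(best, score)
```

Because the objective is only ever evaluated, never differentiated, any loss function can be plugged in, which is what makes PSO convenient for custom regression losses.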


In [None]:
# NumPy and pandas for data manipulation
import numpy as np
import pandas as pd

# Modelling
from twlearn import LinearRegression

# Evaluation of the model
import sklearn.model_selection
from twlearn.metrics import Rmse

# Cross-validation settings: 5 repeats of 2-fold cross-validation
NO_FOLDS = 2
NO_REPEATS = 5

## Data
For this notebook we will work with the [Boston Housing dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html), available as part of scikit-learn. The objective is to predict the median value of owner-occupied homes by training a model on past data. This is a supervised machine learning regression task: given past data, we want to train a model to predict a continuous outcome on testing data.

In [None]:
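# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn version
from sklearn.datasets import load_boston
boston = load_boston()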

X = np.array(boston.data)
Y = np.array(boston.target)

# split the data into training and testing
X_train, X_test, Y_train, Y_test = sklearn.model_selection.train_test_split(X, Y, test_size=0.33, random_state = 1)

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")


## Ordinary Least Squares

Now let's train two linear regression models:
1. Using ordinary least squares
2. Using particle swarm optimisation with RMSE as the loss metric (minimising RMSE is equivalent to minimising MSE)

In [None]:
# OLS
OLS_model = LinearRegression()
OLS_model.fit(X_train, Y_train, optimiser = 'OLS')

# Particle Swarm Regression - Rmse
PSO_RMSE_model = LinearRegression()
PSO_RMSE_model.fit(X_train, Y_train, optimiser = 'PSO', loss = Rmse)

OLS_coef = OLS_model.coefficients()['coefficients']
PSO_RMSE_coef = PSO_RMSE_model.coefficients()['coefficients']

print(np.c_[OLS_coef, PSO_RMSE_coef])

Close, but not exact. Increasing the number of iterations and particles from their defaults (500 and 300 respectively) to 5000 and 500 gets us much closer to the OLS coefficients.

In [None]:
PSO_RMSE_model.fit(X_train, Y_train, optimiser = 'PSO', loss = Rmse, num_iterations = 5000, no_particles = 500)
PSO_RMSE_coef = PSO_RMSE_model.coefficients()['coefficients']
print(np.c_[OLS_coef, PSO_RMSE_coef])


OK, let's try optimising a different error metric, mean absolute error, using the same num_iterations and no_particles as above.

In [None]:
def Mae(predictions, actual):
    """
    Calculate Mean Absolute Error

    Arguments:
        predictions: numpy array of size (no_examples, no_solutions)
        actual: numpy array of size (no_examples,) or (no_examples, 1)

    Returns:
        mae: Mean absolute Error for each solution - numpy array of size (1, no_solutions)
    
    Approach:
    predictions can be a 1d array which corresponds to one set of model predictions OR
    it can be a matrix of predictions, where each column represents a set of predictions
    for a specific model. 
    """
    assert(predictions.shape[0] == actual.shape[0])
    if len(predictions.shape) == 1:
        predictions = predictions.reshape(-1, 1)
    if len(actual.shape) == 1:
        actual = actual.reshape(-1, 1)

    absolute_errors = np.abs(predictions - actual)
    assert(absolute_errors.shape == predictions.shape)

    return np.mean(absolute_errors, axis = 0, keepdims = True)

In [None]:
import unittest

class Test_1D_solution(unittest.TestCase):
    """ Test when predictions argument contains one set of solutions"""
    def setUp(self):
        self.predictions = np.array([1, 2, 3])
        self.actuals = np.array([0.9, 2.2, 2.7])

    def test_Mae(self):
        mae = Mae(self.predictions, self.actuals)
        np.testing.assert_allclose(mae, 0.2)
        
class Test_2D_solutions(unittest.TestCase):
    """ Test when predictions argument contains two sets of solutions"""
    def setUp(self):
        self.predictions = np.array([[1,1], [2, 2], [3, 3]])
        self.actuals = np.array([0.9, 2.2, 2.7])

    def test_Mae(self):
        mae = Mae(self.predictions, self.actuals)
        np.testing.assert_allclose(mae, np.array([[0.2, 0.2]]))
        
# run tests
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

In [None]:
PSO_MAE_model = LinearRegression()
PSO_MAE_model.fit(X_train, Y_train, optimiser = 'PSO', loss = Mae, num_iterations = 5000, no_particles = 500)

Now let's compare its predictions with those of the original OLS model using a helper function.

In [None]:
def evaluate(model, X_test, Y_test, loss):
    """ Take a model and some test data and produce test metrics
    Arguments:
        model  -- a fitted model
        X_test -- test features - a numpy array
        Y_test -- test targets - a numpy array
        loss   -- a loss function to assess predictions

    Returns:
        loss - calculated loss
    """
    predictions = model.predict(X_test)
    return loss(predictions, Y_test)

print(f"OLS MAE: {evaluate(OLS_model, X_test, Y_test, Mae)}")
print(f"PSO MAE: {evaluate(PSO_MAE_model, X_test, Y_test, Mae)}")


Cool! The particle swarm regression model optimised for MAE performs better than the Ordinary Least Squares model when both are evaluated on MAE.

## Is this significant?
506 data points is not very many, so the difference could be down to chance. What could we do to assess the significance? Our first thought might be to do 10-fold cross-validation and compare the performance of both algorithms on each fold using Student's paired t-test. However, this would be incorrect, since one of the core assumptions of the t-test (independence) would be violated: the training sets of different folds overlap. A good overview of our options is provided by [Dietterich (1998)](https://sci2s.ugr.es/keel/pdf/algorithm/articulo/dietterich1998.pdf), and we elect for the 5 by 2 cross-validation statistic.

The intuition behind this approach is that we use 50% of the data for training and 50% for testing once to get our base estimate of the treatment effect (the difference between the models). Then we do 5 repeats of 2-fold cross-validation to get an estimate of the variance in the treatment effect, which we use to calculate a t-statistic.
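As a rough sketch of the calculation, following the formula in Dietterich (1998): the numerator is the difference observed on the first fold of the first repeat, and the denominator is the square root of the average per-repeat variance. The function name `five_by_two_t_test` and the differences below are made up for illustration; `twlearn.metrics.five_by_two_cv` may be organised differently.

```python
import numpy as np
from scipy import stats

def five_by_two_t_test(differences):
    """5x2cv paired t-test (illustrative sketch, not the twlearn function).

    differences: array of shape (5, 2), where differences[i, j] is the
    difference in loss between the two models on repeat i, fold j.
    """
    differences = np.asarray(differences, dtype=float)
    assert differences.shape == (5, 2)

    # Per-repeat variance estimate: squared deviations of the two folds from their mean
    repeat_means = differences.mean(axis=1, keepdims=True)
    variances = np.sum((differences - repeat_means) ** 2, axis=1)

    # Base estimate of the effect divided by the pooled standard deviation
    t_statistic = differences[0, 0] / np.sqrt(variances.mean())

    # Two-sided p value from a t distribution with 5 degrees of freedom
    p_value = 2 * stats.t.sf(abs(t_statistic), df=5)
    return t_statistic, p_value

# Made-up differences, purely to show the mechanics
example = [[0.10, 0.14], [0.12, 0.08], [0.09, 0.13], [0.11, 0.07], [0.15, 0.11]]
t_statistic, p_value = five_by_two_t_test(example)
print(f"t = {t_statistic:.2f}, p = {p_value:.3f}")
```

Taking the absolute value of the t statistic before looking up the tail probability keeps the two-sided p value valid whichever model happens to come out ahead.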

In [None]:
def cross_validate(no_repeats, no_folds, loss, kwargs):
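    """
    Run repeated k-fold cross-validation of a LinearRegression model on the Boston data

    Arguments:
        no_repeats -- number of times to repeat the cross-validation
        no_folds   -- number of folds in each repeat
        loss       -- loss function used to evaluate holdout predictions
        kwargs     -- dict of keyword arguments passed to LinearRegression.fit

    Returns:
        cv_results -- nested dict of holdout losses, indexed as cv_results[repeat][fold]

    Approach:
        We want the random numbers for repeat 1 to be different from those for repeat 2, etc.
        However, we also want the random numbers for repeat 1 to be the same
        each time we run the cross-validation, so that we can compare algorithm
        performance. We therefore set the seed to be the repeat number.
    """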
    boston = load_boston()

    X = np.array(boston.data)
    Y = np.array(boston.target)

    no_examples, no_features = X.shape

    cv_results = {}

    for repeat in range(no_repeats):
        print(f"repeat: {repeat}")
        
        np.random.seed(repeat)
        folds = np.random.randint(low = 0, high = no_folds, size = no_examples)
        
        cv_results[repeat] = {}
  
        for fold in range(no_folds):
            print(f"fold: {fold}")

            X_train = X[folds != fold,:]
            X_test  = X[folds == fold,:]
            Y_train = Y[folds != fold]
            Y_test  = Y[folds == fold]           

            # train model
            model = LinearRegression()
            model.fit(X_train, Y_train, **kwargs)

            # evaluate model
            cv_results[repeat][fold] = evaluate(model, X_test, Y_test, loss)

    return cv_results
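Note that drawing each example's fold label independently with `np.random.randint` means the folds are generally not exactly the same size. If an exact 50/50 split is wanted, one possible alternative (a sketch, not what the function above does; `balanced_fold_labels` is a hypothetical helper) is to shuffle a balanced set of labels:

```python
import numpy as np

def balanced_fold_labels(no_examples, no_folds, seed):
    """Assign fold labels so that fold sizes differ by at most one (illustrative)."""
    rng = np.random.default_rng(seed)
    # 0, 1, 0, 1, ... gives balanced labels; shuffling randomises who gets which
    labels = np.arange(no_examples) % no_folds
    return rng.permutation(labels)

labels = balanced_fold_labels(no_examples=506, no_folds=2, seed=0)
print(np.bincount(labels))  # number of examples in each fold
```

Seeding the generator with the repeat number would keep the same reproducibility property described in the docstring above.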

Train the models.

In [None]:
CUSTOM_LOSS = Mae

print("Training OLS Model")
ols_results = cross_validate(no_repeats = NO_REPEATS, no_folds = NO_FOLDS, loss = CUSTOM_LOSS, 
                             kwargs = {"optimiser":"OLS"})

print("Training PSO Model")
pso_results = cross_validate(no_repeats = NO_REPEATS, no_folds = NO_FOLDS, loss = CUSTOM_LOSS,
                             kwargs = {"optimiser":'PSO', "loss":CUSTOM_LOSS, "num_iterations":5000, "no_particles":500})

In [None]:
print("Comparing Models using Mean Absolute Error")
for repeat in range(NO_REPEATS):
    for fold in range(NO_FOLDS):
        print(f"Error for repeat {repeat} fold {fold}: OLS: {np.round(ols_results[repeat][fold],2)} PSO: {np.round(pso_results[repeat][fold],2)} difference : {np.round(ols_results[repeat][fold] - pso_results[repeat][fold],2)}")

In [None]:
from twlearn.metrics import five_by_two_cv
from scipy import stats

t_statistic, average_differences = five_by_two_cv(ols_results, pso_results)
# two-sided p value: use the absolute value so a negative t statistic is handled correctly
p_value = 2 * stats.t.sf(np.abs(t_statistic), df=5)

print(f"The t statistic is {np.round(t_statistic,2)} which has a p value of {p_value}")

Let's have a look at the distribution of errors against actual values. What if we really didn't want to underpredict at the high actual values? <insert image>

In [None]:
def fourth_quadrant(predictions, actual, multiplier = 100):
    """
    Calculate a weighted Mean Squared Error that penalises underpredictions
    of positive actual values more heavily

    Arguments:
        predictions: numpy array of size (no_examples, no_particles)
        actual: numpy array of size (no_examples,) or (no_examples, 1)
        multiplier: int - how much weight to give underpredictions for positive actual values

    Returns:
        weighted mse for each particle - numpy array of size (1, no_particles)
    """
    assert(predictions.shape[0] == actual.shape[0])

    if len(predictions.shape) == 1:
        predictions = predictions.reshape(-1, 1)
    if len(actual.shape) == 1:
        actual = actual.reshape(-1, 1)
     
    errors = predictions - actual
    assert(errors.shape == predictions.shape)
 
    negative_error_index = errors < 0
    positive_actual_index = actual > 0

    extra_weight_index = np.logical_and(negative_error_index, positive_actual_index)

    # cautious adjustment
    adjustment = np.ones(shape = errors.shape)
    adjustment[extra_weight_index] = multiplier

    # squared error with adjustment
    adjusted_squared_error = np.multiply(np.square(errors), adjustment)

    return np.mean(adjusted_squared_error, axis = 0, keepdims = True)
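To see what this weighting does, here is a small worked example. It uses a compact, single-solution re-statement of the same logic so that it runs on its own; the name `weighted_squared_error` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def weighted_squared_error(predictions, actual, multiplier=100):
    """Compact 1D version of the weighting above (hypothetical helper)."""
    errors = predictions - actual
    # Extra weight only where we underpredict AND the actual value is positive
    under_on_positive = np.logical_and(errors < 0, actual > 0)
    weights = np.where(under_on_positive, multiplier, 1.0)
    return float(np.mean(weights * np.square(errors)))

actual = np.array([20.0, 30.0, 40.0])
print(weighted_squared_error(actual + 2.0, actual))  # overpredict by 2  -> 4.0
print(weighted_squared_error(actual - 2.0, actual))  # underpredict by 2 -> 400.0

# Underpredicting a negative actual value gets no extra weight
negative_actual = np.array([-20.0, -30.0, -40.0])
print(weighted_squared_error(negative_actual - 2.0, negative_actual))  # -> 4.0
```

For a target that is always positive, such as house prices, the second condition always holds, so the loss reduces to squared error with every underprediction weighted by `multiplier`.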

In [None]:
# TODO: test or show example

In [None]:
CUSTOM_LOSS = fourth_quadrant

print("Training OLS Model")
ols_results = cross_validate(no_repeats = NO_REPEATS, no_folds = NO_FOLDS, loss = CUSTOM_LOSS, 
                             kwargs = {"optimiser":"OLS"})

print("Training PSO Model")
pso_results = cross_validate(no_repeats = NO_REPEATS, no_folds = NO_FOLDS, loss = CUSTOM_LOSS,
                             kwargs = {"optimiser":'PSO', "loss":CUSTOM_LOSS, "num_iterations":5000, "no_particles":500})

In [None]:
print("Comparing Models using Fourth Quadrant Error")
for repeat in range(NO_REPEATS):
    for fold in range(NO_FOLDS):
        print(f"Error for repeat {repeat} fold {fold}: OLS: {np.round(ols_results[repeat][fold],2)} PSO: {np.round(pso_results[repeat][fold],2)} difference : {np.round(ols_results[repeat][fold] - pso_results[repeat][fold],2)}")

In [None]:
t_statistic, average_differences = five_by_two_cv(ols_results, pso_results)
# two-sided p value: use the absolute value so a negative t statistic is handled correctly
p_value = 2 * stats.t.sf(np.abs(t_statistic), df=5)

print(f"The t statistic is {np.round(t_statistic,2)} which has a p value of {p_value}")

In [None]:
# N.B. We report a two-sided p value rather than a one-sided one because we could not be certain in advance that the PSO model would outperform the OLS model.