# Ridge regression

Suppose we are given a dataset $X=(x_1,...,x_n)^T$ and a set of outputs $y=(y_1,...,y_n)^T$, where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. We wish to model the outputs as a linear function of the inputs, i.e.

$$ y \sim Xw, $$

where $w \in \mathbb{R}^d$. Ridge regression is linear regression with $\ell^2$ regularization: for a given $\lambda > 0$, we wish to find the $w$ that minimizes the loss function

$$ \| y - Xw\|^2 + \lambda\|w\|^2, $$

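where $\|\cdot\|$ is the $\ell^2$ norm. For $\lambda > 0$ this loss is a strictly convex quadratic in $w$, so its minimizer is the unique point where the gradient vanishes:

$$ -2X^T(y - Xw) + 2\lambda w = 0 \quad\Longleftrightarrow\quad (\lambda I + X^T X)\, w = X^T y. $$

Hence the minimizer $w_{RR}$ is

$$ w_{RR} = (\lambda I + X^T X)^{-1}X^T y,$$

where $I$ is the $d \times d$ identity matrix. The matrix $\lambda I + X^T X$ is invertible for every $\lambda > 0$ because $X^T X$ is positive semidefinite.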
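The sketch below offers a quick numerical check of the closed form. It builds a small synthetic problem (random data, used only for illustration) and solves $(\lambda I + X^T X) w = X^T y$. It then confirms that the gradient of the loss vanishes at that point and that a perturbed $w$ has a larger loss. The helper names `ridge_loss`, `ridge_gradient` and `ridge_closed_form` are hypothetical.

```python
import numpy as np

def ridge_loss(X, y, w, lam):
    """Ridge objective ||y - Xw||^2 + lam * ||w||^2."""
    r = y - X @ w
    return r @ r + lam * (w @ w)

def ridge_gradient(X, y, w, lam):
    """Gradient of the ridge objective: -2 X^T (y - Xw) + 2 lam w."""
    return -2 * X.T @ (y - X @ w) + 2 * lam * w

def ridge_closed_form(X, y, lam):
    """Solve (lam I + X^T X) w = X^T y without forming an explicit inverse."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Synthetic data (an assumption for illustration, not the wine data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
lam = 3.0

w_rr = ridge_closed_form(X, y, lam)
# The gradient is zero at the closed-form solution, up to rounding error.
print(np.abs(ridge_gradient(X, y, w_rr, lam)).max())
# Moving away from w_rr increases the strictly convex objective.
w_other = w_rr + 0.01 * rng.normal(size=4)
print(ridge_loss(X, y, w_rr, lam) < ridge_loss(X, y, w_other, lam))
```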

## Application: Wine Quality dataset

In the following we implement ridge regression and apply it to the white-wine portion of the Wine Quality dataset, which can be found at: http://archive.ics.uci.edu/ml/datasets/Wine+Quality.

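We import NumPy and load the data. The file is semicolon-separated and its first line is a header, which `np.genfromtxt` parses as a row of NaNs. For this reason row 0 is skipped throughout:

```python
import numpy as np

# Row 0 of `data` is the non-numeric header line, which genfromtxt reads as NaNs.
data = np.genfromtxt('wine_white.csv', delimiter=";")
print(data[1:])
print(data[1:].shape)
```

```
[[  7.     0.27   0.36 ...,   0.45   8.8    6.  ]
 [  6.3    0.3    0.34 ...,   0.49   9.5    6.  ]
 [  8.1    0.28   0.4  ...,   0.44  10.1    6.  ]
 ..., 
 [  6.5    0.24   0.19 ...,   0.46   9.4    6.  ]
 [  5.5    0.29   0.3  ...,   0.38  12.8    7.  ]
 [  6.     0.21   0.38 ...,   0.32  11.8    6.  ]]
(4898, 12)
```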


Each row of the matrix corresponds to one wine. The first eleven columns are physicochemical properties of the wine (see the link above for details), and the last column is its quality score. Note that we treat the quality, which is an integer score, as a continuous value.

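We split the data into training, validation and test sets, taking rows in file order without shuffling. We then standardize the features using statistics computed on the training set only. The training targets are also centered, so the model needs no intercept term: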

```python
# Contiguous split in file order; row 0 (the NaN header) is excluded.
X_train = data[1:3200, 0:11]
y_train = data[1:3200, 11]
X_val = data[3200:4000, 0:11]
y_val = data[3200:4000, 11]
X_test = data[4000:, 0:11]
y_test = data[4000:, 11]

# Preprocessing for ridge regression: statistics come from the training set only.
dev = np.std(X_train, axis=0)
mean = np.mean(X_train, axis=0)
y_mean = np.mean(y_train)

def preprocess(x):
    """Standardize features with the training-set mean and standard deviation."""
    x_new = (x - mean) / dev
    return x_new

X_train = preprocess(X_train)
y_train = y_train - y_mean  # center targets; y_mean is added back at prediction time
X_val = preprocess(X_val)
X_test = preprocess(X_test)

# Sanity check
print(np.mean(X_train, axis=0))  # should be close to zero
print(np.std(X_train, axis=0))   # should be close to one
```

```
[  1.74952269e-14  -6.00710826e-15   1.55800141e-14  -2.06357803e-15
  -1.34861991e-14   2.04206073e-16   9.01644082e-17   2.07880087e-12
  -2.67910455e-14  -2.82462394e-14  -4.77319549e-14]
[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
```
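Centering the targets and standardizing the features is what allows the model to omit an intercept. A minimal sketch of why, assuming column-centered features, follows. Fitting ridge with an explicit intercept that is *not* penalized gives the same weights as fitting without an intercept on centered targets, and the fitted intercept equals the mean target. The data are synthetic and the function names `ridge_with_intercept` and `ridge_no_intercept` are hypothetical.

```python
import numpy as np

def ridge_with_intercept(X, y, lam):
    """Ridge with an explicit, unpenalized intercept b.

    Minimizes ||y - b - Xw||^2 + lam * ||w||^2 by prepending a column of ones
    to X and leaving that column out of the penalty.
    """
    n, d = X.shape
    A = np.hstack([np.ones((n, 1)), X])
    penalty = lam * np.eye(d + 1)
    penalty[0, 0] = 0.0  # do not shrink the intercept
    coef = np.linalg.solve(penalty + A.T @ A, A.T @ y)
    return coef[0], coef[1:]

def ridge_no_intercept(X, y, lam):
    """Plain ridge without an intercept, matching the closed form above."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Synthetic data with nonzero means in both features and target.
rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(40, 3))
y = 6.0 + X @ np.array([0.3, -0.1, 0.2]) + rng.normal(size=40)
lam = 10.0

Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized, like preprocess()
b, w_intercept = ridge_with_intercept(Xc, y, lam)
w_centered = ridge_no_intercept(Xc, y - y.mean(), lam)

print(np.isclose(b, y.mean()))               # intercept equals the mean target
print(np.allclose(w_intercept, w_centered))  # same weights either way
```

The equivalence holds because, with centered columns, the ones column is orthogonal to every feature column. The normal equations therefore decouple into one equation for the intercept and one for $w$.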


The closed-form solution is implemented below, where `lambda_input` is the regularization strength $\lambda$:

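```python
def rregression(x, y, lambda_input):
    """Return w_RR = (lambda I + X^T X)^{-1} X^T y for design matrix x and targets y."""
    aux = np.dot(x.T, x)
    n = aux.shape[0]  # number of features d
    # Solving the linear system is cheaper and numerically more stable
    # than forming the inverse explicitly.
    return np.linalg.solve(lambda_input * np.identity(n) + aux, np.dot(x.T, y))
```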
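The random search below solves this system once per candidate $\lambda$. A single singular value decomposition $X = U\,\mathrm{diag}(s)\,V^T$ gives the solution for every $\lambda > 0$ as $w(\lambda) = V\,\mathrm{diag}\!\left(s_i/(s_i^2+\lambda)\right)U^T y$. The following sketch uses a hypothetical helper `ridge_path_svd` and checks it against a direct solve on synthetic data. It is not what the notebook itself does.

```python
import numpy as np

def ridge_path_svd(X, y, lambdas):
    """Ridge solutions for several lambdas (all > 0) from one thin SVD of X.

    With X = U diag(s) V^T, w(lam) = V diag(s / (s**2 + lam)) U^T y.
    Returns an array of shape (len(lambdas), d).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y  # computed once and reused for every lambda
    return np.array([Vt.T @ (s / (s**2 + lam) * Uty) for lam in lambdas])

# Synthetic check against solving (lam I + X^T X) w = X^T y directly.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)
lambdas = 10.0 ** np.array([-3.0, 0.0, 2.0, 4.0])

W_svd = ridge_path_svd(X, y, lambdas)
W_direct = np.array([np.linalg.solve(lam * np.eye(5) + X.T @ X, X.T @ y)
                     for lam in lambdas])
print(np.allclose(W_svd, W_direct))
```

With only eleven features each direct solve is already cheap, so the gain here would be modest. The SVD route matters more when $d$ is large or when many values of $\lambda$ are tried.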

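We select $\lambda$ by random search. We draw 1000 values log-uniformly from $[10^{-5}, 10^5]$ and keep the one with the lowest mean absolute error (MAE) on the held-out validation set. This uses a single validation split rather than $k$-fold cross-validation. Because no random seed is set, the selected $\lambda$ varies slightly between runs.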

```python
# Random search for the best lambda on the validation set
best_lam = 0
best_val_error = np.inf
iterations = 1000
for i in range(iterations):
    # print('Iteration ', i + 1, ' / ', iterations)
    lam = 10**np.random.uniform(-5, 5)      # log-uniform sample in [1e-5, 1e5]
    wRR = rregression(X_train, y_train, lam)
    y_ans = np.dot(X_val, wRR) + y_mean     # undo the target centering
    val_error = np.mean(np.absolute(y_ans - y_val))  # mean absolute error
    # print('lambda: ', lam, ' validation MAE: ', val_error)
    if val_error < best_val_error:
        best_lam = lam
        best_val_error = val_error
```
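The search above scores each $\lambda$ on one fixed validation split. An alternative, not used in the original notebook, is $k$-fold cross-validation. The following is a minimal sketch under stated assumptions: synthetic data, $k=5$, a log-spaced grid instead of random sampling, and hypothetical helpers `ridge_fit` and `kfold_mae`. It recomputes the standardization statistics on each training fold to avoid leaking information from the held-out fold.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights (lam I + X^T X)^{-1} X^T y via a linear solve."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

def kfold_mae(X, y, lam, k=5, seed=0):
    """Ridge mean absolute error averaged over k folds.

    Feature mean/std and the target mean are recomputed on each training fold,
    so nothing is learned from the held-out fold. A fixed seed gives the same
    folds for every lambda, making the scores comparable.
    """
    idx = np.random.default_rng(seed).permutation(X.shape[0])
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        tr_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        mu, sd = X[tr_idx].mean(axis=0), X[tr_idx].std(axis=0)
        y_mu = y[tr_idx].mean()
        w = ridge_fit((X[tr_idx] - mu) / sd, y[tr_idx] - y_mu, lam)
        pred = ((X[val_idx] - mu) / sd) @ w + y_mu
        errors.append(np.mean(np.abs(pred - y[val_idx])))
    return float(np.mean(errors))

# Synthetic example: pick lambda from a log-spaced grid by 5-fold CV.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + 0.5 * rng.normal(size=200)
lambdas = 10.0 ** np.linspace(-3, 3, 13)
scores = [kfold_mae(X, y, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]
print(best_lam, min(scores))
```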

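Finally, we print the selected $\lambda$, the validation error and the test error. In the printed output, the values labelled "accuracy" are mean absolute errors, so lower is better: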

```python
print('Best lambda: ', best_lam, ' validation accuracy: ', best_val_error)
wRR = rregression(X_train, y_train, best_lam)
y_ans = np.dot(X_test, wRR) + y_mean  # undo the target centering
test_error = np.mean(np.absolute(y_ans - y_test))  # mean absolute error
print('Test accuracy: ', test_error)
```

```
Best lambda:  175.65646291824032  validation accuracy:  0.562325610163
Test accuracy:  0.558423534613
```
