# Gradient Descent Algorithm

# Firing up graphlab

In [5]:
import graphlab

In [None]:
import numpy as np

# Loading the house sales data

In [46]:
sales = graphlab.SFrame('kc_house_data.gl/')

In [47]:
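def get_numpy_data(data_sframe, features, output):
    data_sframe['constant'] = 1 # add a constant column for the intercept (this modifies the SFrame passed in)
    features = ['constant'] + features # put the constant first so weights[0] is the intercept
    features_sframe = data_sframe[features] # select the feature columns
    feature_matrix = features_sframe.to_numpy() # 2D array: one row per house, one column per feature
    output_sarray = data_sframe[output]
    output_array = output_sarray.to_numpy() # 1D array of target values
    return(feature_matrix, output_array)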
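
For readers without GraphLab, the same idea can be sketched with plain NumPy. The helper name `build_feature_matrix` and the `toy_columns` dictionary are assumptions made for illustration and are not part of the notebook. Only the first row mirrors the values printed in this notebook; the other rows are made up.

```python
import numpy as np

def build_feature_matrix(columns, features, output):
    """Stack a constant column and the named feature columns side by side."""
    n_rows = len(columns[output])
    constant = np.ones(n_rows)  # intercept column, plays the role of data_sframe['constant'] = 1
    feature_columns = [np.asarray(columns[name], dtype=float) for name in features]
    feature_matrix = np.column_stack([constant] + feature_columns)
    output_array = np.asarray(columns[output], dtype=float)
    return feature_matrix, output_array

# Hypothetical toy data (not the real house sales file)
toy_columns = {
    'sqft_living': [1180.0, 2000.0, 800.0],
    'price': [221900.0, 400000.0, 150000.0],
}
toy_features, toy_output = build_feature_matrix(toy_columns, ['sqft_living'], 'price')
print(toy_features[0, :])  # constant followed by sqft_living for the first row
print(toy_output[0])
```

Unlike `get_numpy_data`, this sketch does not modify its input, which makes it safer to call repeatedly on the same data.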

To test the function, let's use the 'sqft_living' feature plus the constant as our features and 'price' as our output:

In [48]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price') 
print example_features[0,:] 
print example_output[0] 

[  1.00000000e+00   1.18000000e+03]
221900.0


# Predicting output given regression weights

Suppose the weights are [1.0, 1.0] and the features are [1.0, 500.0]. The predicted output is then 1.0\*1.0 + 1.0\*500.0 = 501.0. If both are numpy arrays, we can use np.dot() to compute this. The cell below does the same for the first data point, [1.0, 1180.0], which gives 1181.0:

In [49]:
my_weights = np.array([1., 1.]) # arbitrary example weights
my_features = example_features[0, :] # we'll use the first data point
predicted_value = np.dot(my_features, my_weights)
print predicted_value

1181.0


In [50]:
#  predicted value == np.dot(feature matrix, weights)
#  residual == output - predicted value
def predict_output(feature_matrix, weights):   
    predictions = np.dot(feature_matrix, weights)
    return(predictions)
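
Because `np.dot` accepts a 2D matrix on the left, the same function predicts every row at once. A small sketch, with `predict_output` repeated so the block runs on its own and a toy matrix built from the two feature vectors used in the text above:

```python
import numpy as np

def predict_output(feature_matrix, weights):
    # one dot product per row: prediction_i = sum over j of feature_matrix[i, j] * weights[j]
    return np.dot(feature_matrix, weights)

toy_matrix = np.array([[1.0, 500.0],
                       [1.0, 1180.0]])
toy_weights = np.array([1.0, 1.0])
toy_predictions = predict_output(toy_matrix, toy_weights)
print(toy_predictions)  # one prediction per row of toy_matrix
```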

# Computing the Derivative

In [51]:
def feature_derivative(errors, feature):
    derivative = 2*np.dot(errors, feature)
    return(derivative)
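
The formula inside `feature_derivative` is not derived in the notebook. Assuming the cost being minimized is the residual sum of squares (RSS, the quantity computed on the test data later on), it follows from the chain rule. The symbols N (number of rows), x_i (feature row i), y_i (output i) and w_j (weight j) are notation introduced here for illustration:

```latex
\mathrm{RSS}(\mathbf{w}) = \sum_{i=1}^{N} \left( \mathbf{x}_i^{\top}\mathbf{w} - y_i \right)^2
\qquad\Longrightarrow\qquad
\frac{\partial\,\mathrm{RSS}}{\partial w_j}
  = 2 \sum_{i=1}^{N} \left( \mathbf{x}_i^{\top}\mathbf{w} - y_i \right) x_{ij}
```

The sum on the right is the dot product of the errors (predictions minus output) with column j of the feature matrix, which is what `2*np.dot(errors, feature)` computes.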

In [52]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price') 
my_weights = np.array([0., 0.]) # with zero weights every prediction is 0
test_predictions = predict_output(example_features, my_weights) 
errors = test_predictions - example_output # so errors == -example_output
feature = example_features[:,0] # the constant column (all ones)
derivative = feature_derivative(errors, feature)
print derivative
print -np.sum(example_output)*2 # should match the derivative printed above

-23345850022.0
-23345850022.0
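
The two numbers agree because, with zero weights, the errors equal minus the output, and the constant feature is a column of ones, so the derivative reduces to -2 times the sum of the output. Another way to check the derivative, sketched here on hypothetical toy data, is to compare it with a central finite difference of the RSS. The names `rss`, `X`, `y`, `w` and `h` are assumptions for this example only:

```python
import numpy as np

def rss(feature_matrix, output, weights):
    errors = np.dot(feature_matrix, weights) - output
    return np.dot(errors, errors)

def feature_derivative(errors, feature):
    return 2 * np.dot(errors, feature)

# Hypothetical toy data: constant column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 5.0])
w = np.array([0.5, 0.5])

errors = np.dot(X, w) - y
analytic = feature_derivative(errors, X[:, 1])  # derivative with respect to w[1]

# Central difference: nudge w[1] up and down by h and measure the change in RSS
h = 1e-6
w_plus = w.copy()
w_plus[1] += h
w_minus = w.copy()
w_minus[1] -= h
numeric = (rss(X, y, w_plus) - rss(X, y, w_minus)) / (2 * h)

print(analytic)
print(numeric)
```

The two printed values should agree up to floating-point rounding, since the RSS is quadratic in the weights.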


# Gradient Descent

Now I will write a function that performs gradient descent. The basic premise is simple: given a starting point, we update the current weights by moving in the negative gradient direction. Recall that the gradient is the direction of *increase*, so the negative gradient is the direction of *decrease*, and we are trying to *minimize* a cost function.
The amount by which we move in the negative gradient *direction* is called the 'step size'. We stop when we are 'sufficiently close' to the optimum, which we define as the magnitude (length) of the gradient vector being smaller than a fixed 'tolerance'.
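
The update and stopping rule described above can be written compactly. The symbols are notation introduced here: η stands for the step size, ε for the tolerance, and the cost is assumed to be the RSS:

```latex
\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \, \nabla \mathrm{RSS}\!\left(\mathbf{w}^{(t)}\right),
\qquad
\text{stop when } \left\lVert \nabla \mathrm{RSS}\!\left(\mathbf{w}^{(t)}\right) \right\rVert_2 < \varepsilon
```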

In [53]:
from math import sqrt

In [54]:
def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False 
    weights = np.array(initial_weights, dtype=float) # float copy, so in-place updates work even if integers are passed in
    while not converged:
        # computing the predictions based on feature_matrix and weights using the predict_output() function defined above
        predictions = predict_output(feature_matrix, weights)
        # computing the errors as predictions - output
        errors = predictions - output
        gradient_sum_squares = 0 # initialize the gradient sum of squares
        # while we haven't reached the tolerance yet, update each feature's weight
        for i in range(len(weights)): # loop over each weight
            derivative = feature_derivative(errors, feature_matrix[:, i])
            # add the squared value of the derivative to the gradient magnitude (for assessing convergence)
            gradient_sum_squares += (derivative**2)
            # subtracting the step size times the derivative from the current weight
            weights[i] -= (step_size * derivative)
        # computing the square root of the gradient sum of squares to get the gradient magnitude:
        gradient_magnitude = sqrt(gradient_sum_squares)
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)
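
The loop over individual weights can also be written as one matrix product. The following is a sketch of that variant, not the function used in this notebook. The name `gradient_descent_vectorized` and the `max_iterations` guard are assumptions added here. The guard is there because a step size that is too large would otherwise loop forever. This sketch also checks convergence before updating, whereas the function above updates first and then checks.

```python
import numpy as np

def gradient_descent_vectorized(feature_matrix, output, initial_weights,
                                step_size, tolerance, max_iterations=100000):
    weights = np.array(initial_weights, dtype=float)  # float copy, the caller's weights are untouched
    for _ in range(max_iterations):
        errors = np.dot(feature_matrix, weights) - output
        gradient = 2 * np.dot(feature_matrix.T, errors)  # all partial derivatives at once
        if np.sqrt(np.dot(gradient, gradient)) < tolerance:
            break  # gradient magnitude is below the tolerance
        weights -= step_size * gradient
    return weights

# Hypothetical toy data generated from y = 1 + 2*x with no noise
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
fitted = gradient_descent_vectorized(X, y, [0.0, 0.0], step_size=0.02, tolerance=1e-8)
print(fitted)
```

On this toy data the fitted weights should land close to an intercept of 1 and a slope of 2.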

# Running the Gradient Descent as Simple Regression

We split the data into two parts: training data, which is used to fit the model, and test data, which is used to evaluate the trained model. The training data is 80% of the total and the test data is the remaining 20%.

In [55]:
train_data,test_data = sales.random_split(.8,seed=0)
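
To make the idea of a seeded random split concrete, here is a NumPy sketch. It is not GraphLab's implementation and will not reproduce the same rows as `random_split`. It assumes a scheme where each row lands in the first part with probability `fraction`, so the split is only approximately 80/20. The name `random_split_rows` is hypothetical.

```python
import numpy as np

def random_split_rows(n_rows, fraction, seed):
    rng = np.random.RandomState(seed)  # fixed seed makes the split repeatable
    in_first = rng.uniform(size=n_rows) < fraction  # True with probability `fraction`
    indices = np.arange(n_rows)
    return indices[in_first], indices[~in_first]

train_idx, test_idx = random_split_rows(1000, 0.8, seed=0)
print(len(train_idx))
print(len(test_idx))
```

Fixing the seed matters for this notebook: the weights and RSS values reported below depend on which rows end up in each part.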

Although the gradient descent function is designed for multiple regression, the constant is now a feature, so we can use the same function to estimate the parameters of the simple regression on square feet. The following cell sets up the feature_matrix, output, initial weights, step size and tolerance for the first model:

# let's use gradient descent to obtain weights and then I will predict prices using the obtained weights.
simple_features = ['sqft_living']
my_output = 'price'
(simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
initial_weights = np.array([-47000., 1.])
step_size = 7e-12
tolerance = 2.5e7
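
The step size looks tiny because the features are unscaled. For a quadratic cost such as the RSS, gradient descent with a fixed step only converges when the step size is below 2 divided by the largest eigenvalue of the second-derivative (Hessian) matrix. The sketch below computes that bound for a hypothetical three-row matrix with square footage in the thousands. The values are made up and are not the training data.

```python
import numpy as np

# Hypothetical toy feature matrix: constant column plus square footage
X = np.array([[1.0, 1180.0],
              [1.0, 2000.0],
              [1.0, 800.0]])
hessian = 2 * np.dot(X.T, X)  # second derivative of the RSS with respect to the weights
largest_eigenvalue = np.linalg.eigvalsh(hessian).max()
step_size_bound = 2.0 / largest_eigenvalue
print(step_size_bound)
```

Even three rows push the bound far below 1. The bound shrinks further as more rows are summed, which is consistent with the very small step_size chosen for the full training set.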

Next, we run gradient descent with the above parameters.

In [65]:
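test_weight = regression_gradient_descent(simple_feature_matrix, output, initial_weights, step_size, tolerance) # assumed: test_weight is used below but never assigned in the visible cells
(test_simple_feature_matrix, test_output) = get_numpy_data(test_data, simple_features, my_output)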

# Final Answer using model 1 in gradient descent with 1 feature included

In [66]:
test_predictions = predict_output(test_simple_feature_matrix, test_weight)
print test_predictions

[ 356134.44317093  784640.86422788  435069.83652353 ...,  663418.65300782
  604217.10799338  240550.4743332 ]


In [67]:
print test_predictions[0] # predicted price for the 1st house in the TEST data set for model 1 

356134.443171


In [68]:
test_residuals = test_output - test_predictions
test_RSS = (test_residuals * test_residuals).sum()
print test_RSS

2.75400047593e+14
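
The RSS is the sum of the squared differences between actual and predicted outputs. A small reusable sketch, with the hypothetical helper name `residual_sum_of_squares` and toy numbers that are easy to check by hand:

```python
import numpy as np

def residual_sum_of_squares(output, predictions):
    residuals = np.asarray(output, dtype=float) - np.asarray(predictions, dtype=float)
    return np.dot(residuals, residuals)  # sum of squared residuals

# Toy check: residuals are 1 and -2, so the RSS is 1 + 4
toy_rss = residual_sum_of_squares([3.0, 5.0], [2.0, 7.0])
print(toy_rss)
```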


# Running a multiple regression

Let's add one more feature. Earlier the model used only the sqft_living feature. Now it also uses sqft_living15, so we have more than one actual feature.

In [69]:
model_features = ['sqft_living', 'sqft_living15'] # sqft_living15 is the average square footage of the 15 nearest neighbors
my_output = 'price'
(feature_matrix, output) = get_numpy_data(train_data, model_features, my_output)
initial_weights = np.array([-100000., 1., 1.])
step_size = 4e-12
tolerance = 1e9

In [72]:
weight_2 = regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance)
print weight_2

[ -9.99999688e+04   2.45072603e+02   6.52795277e+01]


# Final Answer using model 2 in gradient descent with 2 features included

In [73]:
(test_feature_matrix, test_output) = get_numpy_data(test_data, model_features, my_output)

test_predictions_2 = predict_output(test_feature_matrix, weight_2)
print test_predictions_2

[ 366651.41203656  762662.39786164  386312.09499712 ...,  682087.39928241
  585579.27865729  216559.20396617]


In [74]:
print test_predictions_2[0]   # predicted price for the 1st house in the TEST data set for model 2.

366651.412037


In [75]:
print test_data['price'][0] # actual price for the 1st house in the test data set

310000.0


The RSS for model 2 on the test data is calculated below:

In [76]:
test_residuals_2 = test_output - test_predictions_2
test_RSS_2 = (test_residuals_2**2).sum()
print test_RSS_2

2.70263446465e+14
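
The two test RSS values printed in this notebook can be compared directly. The sketch below copies them by hand from the outputs above. The variable names are introduced here for illustration.

```python
# Test RSS values copied from the printed outputs of model 1 and model 2
test_rss_model_1 = 2.75400047593e+14
test_rss_model_2 = 2.70263446465e+14

# Fraction by which the test RSS drops when sqft_living15 is added
relative_reduction = (test_rss_model_1 - test_rss_model_2) / test_rss_model_1
print(relative_reduction)
```

Model 2 has the lower test RSS, by a little under 2%, so adding sqft_living15 gives a modest improvement on this test set.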
