# Regression Week 2: Multiple Linear Regression Quiz 2

Estimating Multiple Regression Coefficients (Gradient Descent)

If you’re using Python: to do the matrix operations required for gradient descent we will use the popular Python library ‘numpy’, a computational library specialized for operations on arrays. For students unfamiliar with numpy we have created a numpy tutorial (see useful resources). It is common to import numpy under the name ‘np’ for short; to do this execute the cell below, which also imports pandas as ‘pd’ for loading the data:

In [8]:
import numpy as np
import pandas as pd

Next, write a function that takes a data set, a list of features (e.g. [‘sqft_living’, ‘bedrooms’]) to be used as inputs, and the name of the output (e.g. ‘price’). This function should return a features_matrix (2D array) consisting of a column of ones followed by columns containing the values of the input features, in the same order as the input list. It should also return an output_array, an array of the values of the output in the data set (e.g. ‘price’). For example, if you’re using pandas and numpy you can complete the following function:

In [36]:
def get_numpy_data(df, features, output):
    
    #Set constant value
    df['constant'] = 1
    
    #Add constant value to features and create matrix
    features = ['constant'] + features
    features_matrix = df[features].values
    
    #Select output and create array
    output_array = df[output].values
    
    return features_matrix, output_array
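
Note that the function above adds a ‘constant’ column to the DataFrame that is passed in. As a minimal sketch of an alternative, the variant below leaves the caller's DataFrame unchanged by using `DataFrame.assign`, which returns a new DataFrame. The name `get_numpy_data_no_side_effects` and the toy data are assumptions made for illustration only; they are not part of the assignment.

```python
import numpy as np
import pandas as pd

def get_numpy_data_no_side_effects(df, features, output):
    # Hypothetical variant: assign() returns a new DataFrame,
    # so the caller's df does not gain a 'constant' column.
    with_constant = df.assign(constant=1)
    features_matrix = with_constant[['constant'] + list(features)].values
    output_array = df[output].values
    return features_matrix, output_array

# Toy data, invented for this example
toy = pd.DataFrame({'sqft_living': [1000, 2000, 3000],
                    'bedrooms': [2, 3, 4],
                    'price': [200000., 350000., 500000.]})

X, y = get_numpy_data_no_side_effects(toy, ['sqft_living', 'bedrooms'], 'price')
print(X.shape)  # (3, 3): constant, sqft_living, bedrooms
print('constant' in toy.columns)  # False
```

The rest of this notebook keeps using get_numpy_data as defined above; the extra column it creates is harmless here because only the named features are selected.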

If the features matrix (including a column of 1s for the constant) is stored as a 2D array (or matrix) and the regression weights are stored as a 1D array, then the predicted output is just the dot product between the features matrix and the weights (with the weights on the right). Write a function ‘predict_outcome’ which accepts a 2D array ‘feature_matrix’ and a 1D array ‘weights’ and returns a 1D array ‘predictions’. e.g. in Python:

In [5]:
def predict_outcome(feature_matrix, weights):
    predictions = np.dot(feature_matrix, weights)
    return(predictions)
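
To make the dot product concrete, here is a small worked example with made-up numbers (two houses, an intercept and one feature). The values are assumptions chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

# Each row is [constant, sqft_living]; values are illustrative only
feature_matrix = np.array([[1., 1000.],
                           [1., 2000.]])
weights = np.array([-47000., 280.])  # [intercept, slope]

# Row i of the result is intercept + slope * sqft_living[i]
predictions = np.dot(feature_matrix, weights)
print(predictions)  # [233000. 513000.]
```

Each prediction is -47000 + 280 times the square footage, which is exactly what the matrix-vector product computes row by row.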

If we have the values of a single input feature in an array ‘feature’ and the prediction ‘errors’ (predictions - output), then the derivative of the regression cost function with respect to the weight of ‘feature’ is just twice the dot product between ‘feature’ and ‘errors’. Write a function that accepts an ‘errors’ array and a ‘feature’ array and returns the ‘derivative’ (a single number). e.g. in Python:

In [6]:
def feature_derivative(errors, feature):
    derivative = 2*np.dot(feature, errors)
    return(derivative)
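
One way to convince yourself that the formula is right is to compare it with a numerical derivative of the RSS. The sketch below does this on a tiny made-up data set using a central difference; the helper `rss` and all values are assumptions for illustration.

```python
import numpy as np

def rss(feature_matrix, weights, output):
    # Residual sum of squares for the given weights
    errors = np.dot(feature_matrix, weights) - output
    return np.dot(errors, errors)

# Toy data: column 0 is the constant, column 1 is a single feature
feature_matrix = np.array([[1., 2.], [1., 3.], [1., 5.]])
output = np.array([4., 7., 9.])
weights = np.array([0.5, 1.5])

# Analytic derivative with respect to weights[1]: 2 * dot(feature, errors)
errors = np.dot(feature_matrix, weights) - output
analytic = 2 * np.dot(feature_matrix[:, 1], errors)

# Central-difference approximation of the same derivative
h = 1e-6
step = np.array([0., h])
numeric = (rss(feature_matrix, weights + step, output)
           - rss(feature_matrix, weights - step, output)) / (2 * h)

print(analytic)  # -24.0
print(abs(analytic - numeric) < 1e-4)  # True
```

Here the errors are [-0.5, -2, -1], so the analytic value is 2 * (2*(-0.5) + 3*(-2) + 5*(-1)) = -24, and the numerical estimate agrees to within rounding.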

Now we will use our predict_outcome and feature_derivative functions to write a gradient descent function. Although we can compute the derivative for all the features simultaneously (the gradient), we will explicitly loop over the features individually for simplicity. Write a gradient descent function that does the following:

* Accepts a numpy feature_matrix 2D array, a 1D output array, an array of initial weights, a step size and a convergence tolerance.
* While not converged, updates each feature weight by subtracting the step size times the derivative for that feature given the current weights.
* At each step, computes the magnitude/length of the gradient (square root of the sum of squared components).
* When the magnitude of the gradient is smaller than the input tolerance, returns the final weight vector.

In [42]:
def regression_gradient_descent(feature_matrix, output_array, initial_weights, step_size, tolerance):
    converged = False
    weights = np.array(initial_weights, dtype=float)  # float copy, so integer initial weights are not truncated on update
    while not converged:
        predictions = predict_outcome(feature_matrix, weights)
        errors = predictions - output_array
        gradient_sum_squares = 0
        for i in range(len(weights)): # each weight
            
            # compute the derivative for weight[i]:
            derivative = feature_derivative(errors, feature_matrix[:, i]) #feature column of weight[i]

            # subtract the step size times the derivative from the current weight
            weights[i] = weights[i] - step_size*derivative
            
            # calculate gradient sum of squares
            gradient_sum_squares = (derivative * derivative) + gradient_sum_squares
            
        # calculate gradient magnitude and check if it's less than tolerance (outside loop)
        gradient_magnitude = np.sqrt(gradient_sum_squares)
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)
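
Because the errors are computed once per pass, before the inner loop, every derivative in that pass uses the same weights. The loop is therefore equivalent to a single vectorized gradient step. The sketch below shows that form on a tiny synthetic data set. The function name `regression_gradient_descent_vectorized` and the `max_iterations` safety cap are assumptions added for illustration; they are not part of the assignment.

```python
import numpy as np

def regression_gradient_descent_vectorized(feature_matrix, output_array,
                                           initial_weights, step_size,
                                           tolerance, max_iterations=100000):
    weights = np.array(initial_weights, dtype=float)
    for _ in range(max_iterations):  # hypothetical cap to avoid an endless loop
        errors = np.dot(feature_matrix, weights) - output_array
        # All partial derivatives at once: 2 * X^T * errors
        gradient = 2 * np.dot(feature_matrix.T, errors)
        # Update first, then test the gradient used for that update,
        # mirroring the order of operations in the loop version
        weights = weights - step_size * gradient
        if np.linalg.norm(gradient) < tolerance:
            break
    return weights

# Synthetic data generated from y = 1 + 2x, so the exact fit is [1, 2]
x = np.array([0., 1., 2., 3.])
toy_features = np.column_stack([np.ones_like(x), x])
toy_output = 1. + 2. * x

fitted = regression_gradient_descent_vectorized(
    toy_features, toy_output, np.zeros(2), step_size=0.01, tolerance=1e-6)
print(np.round(fitted, 3))
```

On this toy problem the returned weights should be very close to an intercept of 1 and a slope of 2. The step size and tolerance here suit the toy data only; the housing data below needs the much smaller step size given in the assignment.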

Now we will run the regression_gradient_descent function on some actual data. In particular, we will use gradient descent to estimate the model from Week 1 using just an intercept and slope. Use the following parameters:

* features: ‘sqft_living’
* output: ‘price’
* initial weights: -47000, 1 (intercept, sqft_living respectively)
* step_size = 7e-12
* tolerance = 2.5e7

In [10]:
# Load Data
kc_house_train = pd.read_csv('~/Courses/u-wash-machine-learning/regression/data/kc_house_train_data.csv')
kc_house_test  = pd.read_csv('~/Courses/u-wash-machine-learning/regression/data/kc_house_test_data.csv')

In [43]:
simple_features = ['sqft_living']
my_output = 'price'
simple_feature_matrix, output = get_numpy_data(kc_house_train, simple_features, my_output)
initial_weights = np.array([-47000., 1.])
step_size = 7e-12
tolerance = 2.5e7

Use these parameters to estimate the slope and intercept for predicting prices based only on ‘sqft_living’.

In [44]:
simple_weights = regression_gradient_descent(simple_feature_matrix, output, initial_weights, step_size, tolerance)

In [50]:
simple_weights

array([-46999.88716555,    281.91211918])

Quiz Question: What is the value of the weight for sqft_living -- the second element of ‘simple_weights’ (rounded to 1 decimal place)?

In [48]:
round(simple_weights[1],1)

281.9

Now build a corresponding ‘test_simple_feature_matrix’ and ‘test_output’ using test_data. Using ‘test_simple_feature_matrix’ and ‘simple_weights’ compute the predicted house prices on all the test data.

In [49]:
test_simple_feature_matrix, test_output = get_numpy_data(kc_house_test, simple_features, my_output)

In [52]:
test_predictions = predict_outcome(test_simple_feature_matrix, simple_weights)

Quiz Question: What is the predicted price for the 1st house in the Test data set for model 1 (round to nearest dollar)?

In [53]:
test_predictions[0]

356134.4432550024

Now compute RSS on all test data for this model. Record the value and store it for later.

In [57]:
# test_output already holds the test prices, so the training 'output' is not overwritten
residuals = test_predictions - test_output
RSS = (residuals * residuals).sum()
print(RSS)

275400044902128.3
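
Since RSS is computed again for the second model below, it can be convenient to wrap the calculation in a small helper. The following is a minimal sketch; the name `compute_rss` and the toy numbers are assumptions for illustration.

```python
import numpy as np

def compute_rss(feature_matrix, weights, output):
    # Residual sum of squares: sum over rows of (prediction - actual)^2
    residuals = np.dot(feature_matrix, weights) - output
    return (residuals * residuals).sum()

# Toy check: predictions are [1, 2], actual values are [1.5, 1.0]
toy_matrix = np.array([[1., 1.], [1., 2.]])
toy_weights = np.array([0., 1.])
toy_actual = np.array([1.5, 1.0])

toy_rss = compute_rss(toy_matrix, toy_weights, toy_actual)
print(toy_rss)  # 1.25
```

With the variables from this notebook, a call such as compute_rss(test_simple_feature_matrix, simple_weights, test_output) would be expected to reproduce the value printed above.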


Now we will use gradient descent to fit a model with more than 1 predictor variable (and an intercept). Use the following parameters:

In [63]:
model_features = ['sqft_living', 'sqft_living15']
my_output = 'price'
(feature_matrix, output) = get_numpy_data(kc_house_train, model_features, my_output)
initial_weights = np.array([-100000., 1., 1.])
step_size = 4e-12
tolerance = 1e9

Note that ‘sqft_living15’ is the average square footage of the 15 nearest neighbouring houses.

Run gradient descent on a model with ‘sqft_living’ and ‘sqft_living15’ as well as an intercept, using the above parameters. Save the resulting regression weights.

In [65]:
weights = regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance)

Use the regression weights from this second model (using sqft_living and sqft_living15) to predict the prices of all the houses in the TEST data.

In [66]:
test_feature_matrix, test_output = get_numpy_data(kc_house_test, model_features, my_output)

In [67]:
test_predictions = predict_outcome(test_feature_matrix, weights)

Quiz Question: What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?

In [68]:
test_predictions[0]

366651.4116294939

What is the actual price for the 1st house in the Test data set?

In [69]:
test_output[0]

310000.0

Which estimate was closer to the true price for the 1st house on the TEST data set, model 1 or model 2?

In [71]:
print("Model 1 Difference", 356134.4432550024 - 310000.0)
print("Model 2 Difference", 366651.4116294939 - 310000.0)

Model 1 Difference 46134.44325500238
Model 2 Difference 56651.41162949387


Now compute RSS on all test data for the second model. Record the value and store it for later.

In [72]:
# test_output (from get_numpy_data above) already holds the test prices
residuals = test_predictions - test_output
RSS = (residuals * residuals).sum()
print(RSS)

270263443629803.56
