#### Feature vectors represent numeric or symbolic characteristics of data in a mathematical way, as points in a multidimensional vector space. We can perform mathematical operations on them, such as adding them or computing the distance between them. Distance is computed with a vector norm, which defines the size of a vector and is also useful in regularization.
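As a minimal sketch of the operations mentioned above, the snippet below computes the L1 and L2 norms of a vector, the Euclidean distance between two vectors, and their sum. The vectors `a` and `b` are made-up values chosen only for illustration.

```python
import numpy as np

# Two hypothetical feature vectors (values chosen only for illustration)
a = np.array([3.0, 4.0])
b = np.array([0.0, 0.0])

l1_norm = np.linalg.norm(a, ord=1)  # L1 norm: sum of absolute values
l2_norm = np.linalg.norm(a)         # L2 norm: Euclidean length (the default for vectors)
distance = np.linalg.norm(a - b)    # Euclidean distance is the L2 norm of the difference
total = a + b                       # element-wise vector addition

print(l1_norm, l2_norm, distance, total)
```

The L1 and L2 norms are the same quantities that L1 (lasso) and L2 (ridge) regularization penalize when applied to a weight vector.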

#### algebra vs linear algebra 

In [1]:
import numpy as np

# Multiply two arrays 
x = [1,2,3]
y = [2,3,4]
product = []
for i in range(len(x)):
    product.append(x[i]*y[i])
    
# Vectorized NumPy version (typically much faster on large arrays)
x = np.array([1,2,3])
y = np.array([2,3,4])
x * y

array([ 2,  6, 12])
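The speed difference depends on array size and machine, so it is worth measuring rather than assuming. The sketch below times both versions with `timeit`; the helper names `multiply_loop` and `multiply_numpy` and the array length `n` are assumptions made for this example, and no particular speedup is guaranteed.

```python
import timeit
import numpy as np

def multiply_loop(x, y):
    # element-wise product with an explicit Python loop
    product = []
    for i in range(len(x)):
        product.append(x[i] * y[i])
    return product

def multiply_numpy(x, y):
    # element-wise product using NumPy's vectorized multiplication
    return x * y

n = 10_000  # hypothetical array length; larger arrays tend to favor NumPy more
x_list, y_list = list(range(n)), list(range(n))
x_arr = np.arange(n, dtype=np.int64)
y_arr = np.arange(n, dtype=np.int64)

loop_time = timeit.timeit(lambda: multiply_loop(x_list, y_list), number=10)
numpy_time = timeit.timeit(lambda: multiply_numpy(x_arr, y_arr), number=10)

# both versions must agree on the result
same_result = multiply_loop(x_list, y_list) == multiply_numpy(x_arr, y_arr).tolist()

print("loop:  %.5f s" % loop_time)
print("numpy: %.5f s" % numpy_time)
print("same result:", same_result)
```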

#### simple neural network

In [5]:
import numpy as np

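# sigmoid function; with deriv=True, x must already be a sigmoid output,
# so x*(1-x) is the slope of the sigmoid at that point
def nonlin(x, deriv=False):
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))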
    
# input dataset
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])
    
# output dataset            
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for _ in range(10000):

    # forward propagation
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))

    # how much did we miss?
    l1_error = y - l1

    # multiply how much we missed by the slope of the sigmoid at the values in l1
    l1_delta = l1_error * nonlin(l1,True)

    # update weights
    syn0 += np.dot(l0.T,l1_delta)

print ("Output After Training:")
print (l1)

Output After Training:
[[ 0.00966449]
 [ 0.00786506]
 [ 0.99358898]
 [ 0.99211957]]
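The network above relies on the fact that the slope of the sigmoid can be written in terms of its own output, s * (1 - s). As a quick sanity check, the sketch below compares that expression against a central finite-difference estimate. The helper name `sigmoid`, the test points `z`, and the step size `h` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-3, 3, 7)   # a few hypothetical pre-activation values
s = sigmoid(z)

# analytic slope, expressed through the sigmoid output
analytic = s * (1 - s)

# central finite-difference estimate of the same slope
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

max_gap = np.max(np.abs(analytic - numeric))
print(max_gap)
```

A very small `max_gap` indicates that the two agree, which is why the training loop can pass `l1` (already a sigmoid output) into the derivative branch.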


### multiple regression

In [75]:
import pandas as pd
import numpy as np

from math import sqrt

In [12]:
# 'C:\\Users\\1098071\\Documents\\imath\\vector_l1l2'

C:\Users\1098071\Documents\imath\vector_l1l2


In [21]:
sales = pd.read_csv('kc_house_data.csv')

In [22]:
sales.head()

|   | id | date | price | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | ... | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7129300520 | 20141013T000000 | 221900.0 | 3 | 1.0 | 1180 | 5650 | 1.0 | 0 | 0 | ... | 7 | 1180 | 0 | 1955 | 0 | 98178 | 47.5112 | -122.257 | 1340 | 5650 |
| 1 | 6414100192 | 20141209T000000 | 538000.0 | 3 | 2.25 | 2570 | 7242 | 2.0 | 0 | 0 | ... | 7 | 2170 | 400 | 1951 | 1991 | 98125 | 47.721 | -122.319 | 1690 | 7639 |
| 2 | 5631500400 | 20150225T000000 | 180000.0 | 2 | 1.0 | 770 | 10000 | 1.0 | 0 | 0 | ... | 6 | 770 | 0 | 1933 | 0 | 98028 | 47.7379 | -122.233 | 2720 | 8062 |
| 3 | 2487200875 | 20141209T000000 | 604000.0 | 4 | 3.0 | 1960 | 5000 | 1.0 | 0 | 0 | ... | 7 | 1050 | 910 | 1965 | 0 | 98136 | 47.5208 | -122.393 | 1360 | 5000 |
| 4 | 1954400510 | 20150218T000000 | 510000.0 | 3 | 2.0 | 1680 | 8080 | 1.0 | 0 | 0 | ... | 8 | 1680 | 0 | 1987 | 0 | 98074 | 47.6168 | -122.045 | 1800 | 7503 |


In [30]:
print (sales.columns)
type(sales.columns)

Index(['id', 'date', 'price', 'bedrooms', 'bathrooms', 'sqft_living',
       'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade',
       'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'zipcode',
       'lat', 'long', 'sqft_living15', 'sqft_lot15'],
      dtype='object')


pandas.core.indexes.base.Index

Recall that the predicted value given the weights and the features is just the dot product between the feature and weight vector. Similarly, if we put all of the features row-by-row in a matrix then the predicted value for *all* the observations can be computed by right multiplying the "feature matrix" by the "weight vector". 

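The question then becomes one of numpy array multiplication. We use the pandas `.values` attribute to convert the DataFrame into a 2D numpy array (also called a matrix). The older `.as_matrix()` method did the same thing but has been removed from recent pandas versions.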
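A minimal sketch of the conversion and of the matrix-vector product, using a toy DataFrame rather than the real data. The three `sqft_living` values are borrowed from the `head()` output above, the weights are arbitrary, and `to_numpy()` is assumed to be available (pandas 0.24 or later); `.values` returns the same array here.

```python
import numpy as np
import pandas as pd

# toy frame: a constant column plus one feature
df = pd.DataFrame({"constant": [1, 1, 1], "sqft_living": [1180, 2570, 770]})

feature_matrix = df.to_numpy()   # 2D numpy array, one row per observation
weights = np.array([1, 1])       # arbitrary example weights

# prediction for each observation: dot product of its feature row with the weights
row_by_row = np.array([np.dot(row, weights) for row in feature_matrix])

# the same predictions in one step: feature matrix times weight vector
all_at_once = np.dot(feature_matrix, weights)

print(row_by_row)
print(all_at_once)
```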

#### change df to array

Now we will write a function that accepts a pandas DataFrame, a list of feature names (e.g. ['sqft_living', 'bedrooms']) and a target feature (e.g. 'price') and returns two things:
* A numpy matrix whose columns are the desired features plus a constant column (this is how we create an 'intercept')
* A numpy array containing the values of the output

With this in mind, the following function does exactly that (each comment describes what the line of code below it does):

In [40]:
def get_numpy_data(dataframe, features, output):
    # add a constant column of ones so the model can learn an intercept
    # (note: this modifies the DataFrame that was passed in)
    dataframe['constant'] = 1
    # put 'constant' at the front of the features list so it is extracted along with the others
    features = ['constant'] + features  # this is how you combine two lists
    # select the columns named in the features list (now including the constant)
    features_frame = dataframe[features]
    # convert the selected columns into a 2D numpy array
    feature_matrix = features_frame.values
    # select the output column as a pandas Series
    output_series = dataframe[output]
    # convert the Series into a 1D numpy array
    output_array = output_series.values
    return feature_matrix, output_array

In [49]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price') # the [] around 'sqft_living' makes it a list
print (example_features[0,:]) # this accesses the first row of the data the ':' indicates 'all columns'
print (example_output[0]) # and the corresponding output

[   1 1180]
221900.0


#### get predictions given weights and features

np.dot() also works when dealing with a matrix and a vector. 
Recall that the predictions for all the observations are just the RIGHT (as in weights on the right) dot product between the features *matrix* and the weights *vector*. The predict_output function below computes the predictions for an entire matrix of features given the matrix and the weights:

In [54]:
def predict_output(feature_matrix, weights):
    predictions = np.dot(feature_matrix, weights)
    return predictions

In [55]:
# example weights for the prediction test run
my_weights = np.array([1,1])

In [64]:
test_predictions = predict_output(example_features, my_weights)
# slice the first and second prediction results from the numpy array output
test_predictions[[0,1]]

array([1181, 2571], dtype=int64)

#### derivative of the regression cost function 

The cost function is the sum over the data points of the squared difference between an observed output and a predicted output.

Since the derivative of a sum is the sum of the derivatives we can compute the derivative for a single data point and then sum over data points. We can write the squared difference between the observed output and predicted output for a single point as follows:

(w[0]\*[CONSTANT] + w[1]\*[feature_1] + ... + w[i] \*[feature_i] + ... +  w[k]\*[feature_k] - output)^2

Where we have k features and a constant. So the derivative with respect to weight w[i] by the chain rule is:

2\*(w[0]\*[CONSTANT] + w[1]\*[feature_1] + ... + w[i] \*[feature_i] + ... +  w[k]\*[feature_k] - output)\* [feature_i]

The term inside the parentheses is just the error (difference between prediction and output). So we can re-write this as:

2\*error\*[feature_i]

That is, the derivative for the weight for feature i is the sum (over data points) of 2 times the product of the error and the feature itself. In the case of the constant then this is just twice the sum of the errors!

Recall that twice the sum of the element-wise products of two vectors is just twice the dot product of the two vectors. Therefore the derivative for the weight for feature_i is just two times the dot product between the values of feature_i and the current errors.
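The derivation above can be written compactly as follows. The notation is an assumption introduced here for illustration: $N$ data points indexed by $j$, $x_{ji}$ for the value of feature $i$ at point $j$ (with $x_{j0} = 1$ for the constant), $y_j$ for the observed output, and $e_j$ for the error (prediction minus output).

$$
\mathrm{RSS}(w) = \sum_{j=1}^{N} \left( \sum_{i=0}^{k} w_i \, x_{ji} - y_j \right)^2
$$

$$
e_j = \sum_{i=0}^{k} w_i \, x_{ji} - y_j
\qquad\Longrightarrow\qquad
\frac{\partial\, \mathrm{RSS}}{\partial w_i} = 2 \sum_{j=1}^{N} e_j \, x_{ji} = 2 \, (e \cdot x_{:,i})
$$

For the constant, $x_{j0} = 1$, so the derivative reduces to $2 \sum_{j} e_j$, twice the sum of the errors.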

With this in mind, the following function computes the derivative of the cost with respect to one weight, given the values of that feature (over all data points) and the errors (over all data points).

In [66]:
def feature_derivative(errors, feature):
    derivative = 2 * np.dot(errors, feature)
    return derivative

In [74]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price')
test_predictions = predict_output(example_features, my_weights)
errors = test_predictions - example_output

# the constant column (all ones)
features = example_features[:,0]
# checking the correctness of the derivative function: the derivative w.r.t. the constant is just two times the sum of the errors
derivative = feature_derivative(errors, features)
from_error = 2 * sum(errors)
print (derivative)
print (from_error)

-23255901044.0
-23255901044.0
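The same idea extends to all weights at once: stacking the per-feature derivatives gives the full gradient, 2 times the transposed feature matrix times the errors. The sketch below checks that expression against a finite-difference estimate on a tiny made-up dataset. The helper names `rss` and `gradient`, the toy values, and the step size `h` are assumptions for this example, not part of the original notebook.

```python
import numpy as np

def rss(feature_matrix, weights, output):
    # cost: sum of squared errors
    errors = feature_matrix @ weights - output
    return np.dot(errors, errors)

def gradient(feature_matrix, weights, output):
    # derivative w.r.t. every weight at once: 2 * X^T * errors
    errors = feature_matrix @ weights - output
    return 2 * feature_matrix.T @ errors

# toy data: a constant column plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 5.0])
w = np.array([0.5, 0.5])

analytic = gradient(X, w, y)

# central finite differences, one weight at a time
h = 1e-6
numeric = np.array([
    (rss(X, w + h * e, y) - rss(X, w - h * e, y)) / (2 * h)
    for e in np.eye(len(w))
])

print(analytic)
print(numeric)
```

The first entry of `analytic` is the derivative with respect to the constant, which is twice the sum of the errors, matching the check performed above on the housing data.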


#### gradient descent

In [None]:
from math import sqrt
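The notebook's own gradient descent code is not shown at this point, so what follows is only a minimal sketch of how the pieces above could fit together: repeatedly compute the errors, compute the gradient, and step the weights against it until the gradient magnitude falls below a tolerance. The function name `regression_gradient_descent`, its parameters, and the toy data are all assumptions made for this example.

```python
import numpy as np
from math import sqrt

def regression_gradient_descent(feature_matrix, output, initial_weights,
                                step_size, tolerance, max_iterations=100_000):
    # hypothetical helper: plain batch gradient descent on the squared-error cost
    weights = np.array(initial_weights, dtype=float)
    for _ in range(max_iterations):
        # errors = predictions - observed output
        errors = np.dot(feature_matrix, weights) - output
        # gradient of the cost w.r.t. every weight: 2 * X^T * errors
        gradient = 2 * np.dot(feature_matrix.T, errors)
        # stop once the gradient magnitude (its L2 norm) is small enough
        if sqrt(np.sum(gradient ** 2)) < tolerance:
            break
        # move the weights a small step against the gradient
        weights -= step_size * gradient
    return weights

# toy data generated from y = 1 + 2*x, so the ideal weights are [1, 2]
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

weights = regression_gradient_descent(X, y, [0.0, 0.0],
                                      step_size=0.01, tolerance=1e-8)
print(weights)
```

The step size matters: on unscaled features such as `sqft_living`, a value that works for this toy example would likely be far too large, so it needs to be tuned to the data.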