# Waleed Khan

### Dataset: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
#### Objective: Using the dataset from this challenge, which has 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, we want to build a model that can predict home sale prices.
#### Approach: I fit a GLM (Generalized Linear Model; R's `glm` with its default Gaussian family, which is equivalent to ordinary linear regression) 26 times, each time adjusting the attributes used for training, and kept the set of attributes that gave the best prediction accuracy on the training set. I then trained the resulting GLM on a random 90% of the training set and tested it on the remaining 10% to judge accuracy.
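
The accuracy figure used throughout this notebook is 100 minus the total absolute error expressed as a percentage of the total actual sale price. The sketch below shows that metric together with a repeatable 90/10 split. The helper name `accuracy_pct`, the seed value, and the toy data frame are illustrative assumptions, not part of the original notebook.

```r
# Hypothetical helper: 100 - (sum of absolute errors / sum of actual values) * 100
accuracy_pct <- function(actual, predicted) {
  100 - 100 * sum(abs(actual - predicted), na.rm = TRUE) / sum(actual)
}

# Toy stand-in for train.csv (made-up data, for illustration only)
set.seed(42)  # assumed seed so the split is repeatable
toy <- data.frame(area = runif(200, 500, 3000))
toy$price <- 50000 + 100 * toy$area + rnorm(200, sd = 10000)

# Random 90/10 split, mirroring the approach described above
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.9 * nrow(toy)))
fit <- glm(price ~ area, data = toy[train_idx, ])  # gaussian family by default
held_out <- toy[-train_idx, ]

# Accuracy on the rows the model never saw during fitting
print(accuracy_pct(held_out$price, predict(fit, newdata = held_out)))
```

Setting a seed is a design choice: without it, `sample()` draws a different 10% on every run, so the held-out accuracy changes from run to run.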

In [2]:
# Read in all the data
trainingData=read.csv("train.csv")
# trainingData
# str(trainingData)

In [304]:
# Build the training/testing sets: a random 90% of the rows for training, the remaining 10% for testing
# No seed is set, so the split (and the LM27 results) will differ from run to run
train_index <- sample(1:nrow(trainingData), 0.9 * nrow(trainingData))
test_index <- setdiff(1:nrow(trainingData), train_index)
training_90 <- trainingData[train_index,]
# Column 81 is SalePrice, so it is dropped from the test predictors
testing_10 <- trainingData[test_index, -81]

# Here we decide which columns to use in building each model
# Attributes are added or removed from one model to the next to increase accuracy
# The general form is HousePricesLM = lm(SalePrice ~ ., data = trainingData), but this does not work for all attributes,
# so I have combed through all attributes and removed the ones that do not work or that reduce accuracy

HousePricesLM1 = glm(SalePrice ~ 
                    MSZoning
                    , data = trainingData)

HousePricesLM2 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    , data = trainingData)

HousePricesLM3 = glm(SalePrice ~ 
                     Neighborhood
                    , data = trainingData)

HousePricesLM4 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood
                    , data = trainingData)


HousePricesLM5 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType
                    , data = trainingData)

HousePricesLM6 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType
                    +HouseStyle+OverallQual+OverallCond+YearBuilt
                    , data = trainingData)

HousePricesLM7 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    , data = trainingData)

# This model shows that adding MasVnrType makes the model less accurate, which is why we remove it later
HousePricesLM8 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +MasVnrType
                    , data = trainingData)

# This model shows that adding both MasVnrType and MasVnrArea still leaves the model less accurate than LM7, which is why we remove them later
HousePricesLM9 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +MasVnrType+MasVnrArea
                    , data = trainingData)

# Reverted to LM7
HousePricesLM10 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    , data = trainingData)


HousePricesLM11 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    , data = trainingData)

HousePricesLM12 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    , data = trainingData)

HousePricesLM13 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating
                    , data = trainingData)

HousePricesLM14 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    , data = trainingData)

# This model shows that adding Electrical makes the model less accurate, which is why we remove it later
HousePricesLM15 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir+Electrical
                    , data = trainingData)

# Reverted to LM14
HousePricesLM16 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    , data = trainingData)

HousePricesLM17 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    , data = trainingData)

HousePricesLM18 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +MiscVal
                    , data = trainingData)

HousePricesLM19 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    , data = trainingData)

HousePricesLM20 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +MoSold
                    , data = trainingData)

HousePricesLM21 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +MoSold+YrSold
                    , data = trainingData)

HousePricesLM22 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +YrSold
                    , data = trainingData)

HousePricesLM23 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    , data = trainingData)

HousePricesLM24 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +SaleType
                    , data = trainingData)

# Our final model
HousePricesLM25 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +SaleType+SaleCondition
                    , data = trainingData)

# Our final model if we had not removed certain attributes that negatively impacted accuracy
HousePricesLM26 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +MasVnrType+MasVnrArea
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF
                    +Heating+CentralAir+Electrical
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +MiscVal
                    +MoSold+YrSold
                    +SaleType+SaleCondition
                    , data = trainingData)

# Same thing as HousePricesLM25, but it trains on training_90 and will be tested on testing_10
HousePricesLM27 = glm(SalePrice ~ 
                    MSSubClass+MSZoning+LotArea+Street+LotShape+LandContour+Utilities+LotConfig+LandSlope
                    +Neighborhood+BldgType+HouseStyle+OverallQual+OverallCond+YearBuilt
                    +Condition1+Condition2+RoofStyle+RoofMatl
                    +YearRemodAdd+Exterior1st+Exterior2nd+ExterQual+ExterCond
                    +Foundation+BsmtFinSF1+BsmtFinSF2+BsmtUnfSF+TotalBsmtSF+Heating+CentralAir
                    +X1stFlrSF+X2ndFlrSF+LowQualFinSF+GrLivArea+BsmtFullBath+BsmtHalfBath+FullBath+HalfBath
                    +BedroomAbvGr+KitchenAbvGr+KitchenQual+TotRmsAbvGrd+Functional+Fireplaces+GarageCars+GarageArea
                    +PavedDrive+WoodDeckSF+OpenPorchSF+EnclosedPorch+X3SsnPorch+ScreenPorch+PoolArea
                    +SaleType+SaleCondition
                    , data = training_90)

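# Predict with each model. predict() returns NA for a row with a missing predictor value;
# those NAs are set to 0 below, so each such row is scored as an error of its full sale price
predictionData1 = predict(HousePricesLM1, newdata = trainingData)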
predictionData1[is.na(predictionData1)] <- 0
predictionData2 = predict(HousePricesLM2, newdata = trainingData)
predictionData2[is.na(predictionData2)] <- 0
predictionData3 = predict(HousePricesLM3, newdata = trainingData)
predictionData3[is.na(predictionData3)] <- 0
predictionData4 = predict(HousePricesLM4, newdata = trainingData)
predictionData4[is.na(predictionData4)] <- 0
predictionData5 = predict(HousePricesLM5, newdata = trainingData)
predictionData5[is.na(predictionData5)] <- 0
predictionData6 = predict(HousePricesLM6, newdata = trainingData)
predictionData6[is.na(predictionData6)] <- 0
predictionData7 = predict(HousePricesLM7, newdata = trainingData)
predictionData7[is.na(predictionData7)] <- 0
predictionData8 = predict(HousePricesLM8, newdata = trainingData)
predictionData8[is.na(predictionData8)] <- 0
predictionData9 = predict(HousePricesLM9, newdata = trainingData)
predictionData9[is.na(predictionData9)] <- 0
predictionData10 = predict(HousePricesLM10, newdata = trainingData)
predictionData10[is.na(predictionData10)] <- 0
predictionData11 = predict(HousePricesLM11, newdata = trainingData)
predictionData11[is.na(predictionData11)] <- 0
predictionData12 = predict(HousePricesLM12, newdata = trainingData)
predictionData12[is.na(predictionData12)] <- 0
predictionData13 = predict(HousePricesLM13, newdata = trainingData)
predictionData13[is.na(predictionData13)] <- 0
predictionData14 = predict(HousePricesLM14, newdata = trainingData)
predictionData14[is.na(predictionData14)] <- 0
predictionData15 = predict(HousePricesLM15, newdata = trainingData)
predictionData15[is.na(predictionData15)] <- 0
predictionData16 = predict(HousePricesLM16, newdata = trainingData)
predictionData16[is.na(predictionData16)] <- 0
predictionData17 = predict(HousePricesLM17, newdata = trainingData)
predictionData17[is.na(predictionData17)] <- 0
predictionData18 = predict(HousePricesLM18, newdata = trainingData)
predictionData18[is.na(predictionData18)] <- 0
predictionData19 = predict(HousePricesLM19, newdata = trainingData)
predictionData19[is.na(predictionData19)] <- 0
predictionData20 = predict(HousePricesLM20, newdata = trainingData)
predictionData20[is.na(predictionData20)] <- 0
predictionData21 = predict(HousePricesLM21, newdata = trainingData)
predictionData21[is.na(predictionData21)] <- 0
predictionData22 = predict(HousePricesLM22, newdata = trainingData)
predictionData22[is.na(predictionData22)] <- 0
predictionData23 = predict(HousePricesLM23, newdata = trainingData)
predictionData23[is.na(predictionData23)] <- 0
predictionData24 = predict(HousePricesLM24, newdata = trainingData)
predictionData24[is.na(predictionData24)] <- 0
predictionData25 = predict(HousePricesLM25, newdata = trainingData)
predictionData25[is.na(predictionData25)] <- 0
predictionData26 = predict(HousePricesLM26, newdata = trainingData)
predictionData26[is.na(predictionData26)] <- 0
predictionData27 = predict(HousePricesLM27, newdata = testing_10)
predictionData27[is.na(predictionData27)] <- 0
# length(trainingData[,1])
# length(x)
# head(predictionData1)


“prediction from a rank-deficient fit may be misleading”
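
In the cell above, every `NA` prediction is replaced with 0. `predict()` returns `NA` for any row whose predictors include a missing value, so such a row is then charged an error equal to its entire actual price. The sketch below illustrates the effect on made-up data (the data frame `d` and its columns are assumptions for illustration). If an added column contains missing values, this may account for part of the accuracy drop observed when that column is included.

```r
set.seed(1)
d <- data.frame(x = 1:20, z = rnorm(20))
d$y <- 1000 + 50 * d$x + 10 * d$z + rnorm(20)
d$z[3] <- NA                       # one missing predictor value

fit <- glm(y ~ x + z, data = d)    # the row with the NA is dropped during fitting
pred <- predict(fit, newdata = d)  # and it receives an NA prediction here
na_rows <- is.na(pred)

# Scoring the NA as 0 charges the whole actual value as error
pred_zero <- ifelse(na_rows, 0, pred)
err_zero <- sum(abs(d$y - pred_zero))

# Scoring only the rows that actually received a prediction
err_complete <- sum(abs(d$y[!na_rows] - pred[!na_rows]))

print(c(zero_filled = err_zero, complete_rows_only = err_complete))
```

The gap between the two totals is exactly the actual value of the row that could not be predicted, which is unrelated to how well the model fits the remaining rows.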

In [305]:
actualPrices = trainingData$SalePrice
actualPricesTest = trainingData[test_index,]$SalePrice

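# For each model's predictions we now calculate the total absolute error in dollars (errorSum),
# that error as a percentage of the total actual sale price (error), and the accuracy (100 - error)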

setDifference1 = abs(actualPrices - predictionData1)
errorSum1 = sum(setDifference1, na.rm = TRUE)
error1 = (errorSum1 / sum(actualPrices))*100
accuracy1 = 100-error1

setDifference2 = abs(actualPrices - predictionData2)
errorSum2 = sum(setDifference2, na.rm = TRUE)
error2 = (errorSum2 / sum(actualPrices))*100
accuracy2 = 100-error2

setDifference3 = abs(actualPrices - predictionData3)
errorSum3 = sum(setDifference3, na.rm = TRUE)
error3 = (errorSum3 / sum(actualPrices))*100
accuracy3 = 100-error3

setDifference4 = abs(actualPrices - predictionData4)
errorSum4 = sum(setDifference4, na.rm = TRUE)
error4 = (errorSum4 / sum(actualPrices))*100
accuracy4 = 100-error4

setDifference5 = abs(actualPrices - predictionData5)
errorSum5 = sum(setDifference5, na.rm = TRUE)
error5 = (errorSum5 / sum(actualPrices))*100
accuracy5 = 100-error5

setDifference6 = abs(actualPrices - predictionData6)
errorSum6 = sum(setDifference6, na.rm = TRUE)
error6 = (errorSum6 / sum(actualPrices))*100
accuracy6 = 100-error6

setDifference7 = abs(actualPrices - predictionData7)
errorSum7 = sum(setDifference7, na.rm = TRUE)
error7 = (errorSum7 / sum(actualPrices))*100
accuracy7 = 100-error7

setDifference8 = abs(actualPrices - predictionData8)
errorSum8 = sum(setDifference8, na.rm = TRUE)
error8 = (errorSum8 / sum(actualPrices))*100
accuracy8 = 100-error8

setDifference9 = abs(actualPrices - predictionData9)
errorSum9 = sum(setDifference9, na.rm = TRUE)
error9 = (errorSum9 / sum(actualPrices))*100
accuracy9 = 100-error9

setDifference10 = abs(actualPrices - predictionData10)
errorSum10 = sum(setDifference10, na.rm = TRUE)
error10 = (errorSum10 / sum(actualPrices))*100
accuracy10 = 100-error10

setDifference11 = abs(actualPrices - predictionData11)
errorSum11 = sum(setDifference11, na.rm = TRUE)
error11 = (errorSum11 / sum(actualPrices))*100
accuracy11 = 100-error11

setDifference12 = abs(actualPrices - predictionData12)
errorSum12 = sum(setDifference12, na.rm = TRUE)
error12 = (errorSum12 / sum(actualPrices))*100
accuracy12 = 100-error12

setDifference13 = abs(actualPrices - predictionData13)
errorSum13 = sum(setDifference13, na.rm = TRUE)
error13 = (errorSum13 / sum(actualPrices))*100
accuracy13 = 100-error13

setDifference14 = abs(actualPrices - predictionData14)
errorSum14 = sum(setDifference14, na.rm = TRUE)
error14 = (errorSum14 / sum(actualPrices))*100
accuracy14 = 100-error14

setDifference15 = abs(actualPrices - predictionData15)
errorSum15 = sum(setDifference15, na.rm = TRUE)
error15 = (errorSum15 / sum(actualPrices))*100
accuracy15 = 100-error15

setDifference16 = abs(actualPrices - predictionData16)
errorSum16 = sum(setDifference16, na.rm = TRUE)
error16 = (errorSum16 / sum(actualPrices))*100
accuracy16 = 100-error16

setDifference17 = abs(actualPrices - predictionData17)
errorSum17 = sum(setDifference17, na.rm = TRUE)
error17 = (errorSum17 / sum(actualPrices))*100
accuracy17 = 100-error17

setDifference18 = abs(actualPrices - predictionData18)
errorSum18 = sum(setDifference18, na.rm = TRUE)
error18 = (errorSum18 / sum(actualPrices))*100
accuracy18 = 100-error18

setDifference19 = abs(actualPrices - predictionData19)
errorSum19 = sum(setDifference19, na.rm = TRUE)
error19 = (errorSum19 / sum(actualPrices))*100
accuracy19 = 100-error19

setDifference20 = abs(actualPrices - predictionData20)
errorSum20 = sum(setDifference20, na.rm = TRUE)
error20 = (errorSum20 / sum(actualPrices))*100
accuracy20 = 100-error20

setDifference21 = abs(actualPrices - predictionData21)
errorSum21 = sum(setDifference21, na.rm = TRUE)
error21 = (errorSum21 / sum(actualPrices))*100
accuracy21 = 100-error21

setDifference22 = abs(actualPrices - predictionData22)
errorSum22 = sum(setDifference22, na.rm = TRUE)
error22 = (errorSum22 / sum(actualPrices))*100
accuracy22 = 100-error22

setDifference23 = abs(actualPrices - predictionData23)
errorSum23 = sum(setDifference23, na.rm = TRUE)
error23 = (errorSum23 / sum(actualPrices))*100
accuracy23 = 100-error23

setDifference24 = abs(actualPrices - predictionData24)
errorSum24 = sum(setDifference24, na.rm = TRUE)
error24 = (errorSum24 / sum(actualPrices))*100
accuracy24 = 100-error24

setDifference25 = abs(actualPrices - predictionData25)
errorSum25 = sum(setDifference25, na.rm = TRUE)
error25 = (errorSum25 / sum(actualPrices))*100
accuracy25 = 100-error25

setDifference26 = abs(actualPrices - predictionData26)
errorSum26 = sum(setDifference26, na.rm = TRUE)
error26 = (errorSum26 / sum(actualPrices))*100
accuracy26 = 100-error26

setDifference27 = abs(actualPricesTest - predictionData27)
errorSum27 = sum(setDifference27, na.rm = TRUE)
error27 = (errorSum27 / sum(actualPricesTest))*100
accuracy27 = 100-error27

cat("--- Results for LM1 (Only uses MSZoning) ---")
cat("\nTotal Accuracy: ",accuracy1,"%")
cat("\nTotal Error: ",error1,"%")
cat("\nError Amount of Error: $",errorSum1)
cat("\n\n")
cat("--- Results for LM2 ---")
cat("\nTotal Accuracy: ",accuracy2,"%")
cat("\nTotal Error: ",error2,"%")
cat("\nError Amount of Error: $",errorSum2)
cat("\n\n")
cat("--- Results for LM3 (Only uses Neighborhood) ---")
cat("\nTotal Accuracy: ",accuracy3,"%")
cat("\nTotal Error: ",error3,"%")
cat("\nError Amount of Error: $",errorSum3)
cat("\n\n")
cat("--- Results for LM4 ---")
cat("\nTotal Accuracy: ",accuracy4,"%")
cat("\nTotal Error: ",error4,"%")
cat("\nError Amount of Error: $",errorSum4)
cat("\n\n")
cat("--- Results for LM5 ---")
cat("\nTotal Accuracy: ",accuracy5,"%")
cat("\nTotal Error: ",error5,"%")
cat("\nError Amount of Error: $",errorSum5)
cat("\n\n")
cat("--- Results for LM6 ---")
cat("\nTotal Accuracy: ",accuracy6,"%")
cat("\nTotal Error: ",error6,"%")
cat("\nError Amount of Error: $",errorSum6)
cat("\n\n")
cat("--- Results for LM7 ---")
cat("\nTotal Accuracy: ",accuracy7,"%")
cat("\nTotal Error: ",error7,"%")
cat("\nError Amount of Error: $",errorSum7)
cat("\n\n")
cat("--- Results for LM8 (Adding MasVnrType seems to make it less accurate) ---")
cat("\nTotal Accuracy: ",accuracy8,"%")
cat("\nTotal Error: ",error8,"%")
cat("\nError Amount of Error: $",errorSum8)
cat("\n\n")
cat("--- Results for LM9 (Adding MasVnrArea to LM8 makes it more accurate, but still not as accurate as LM7) ---")
cat("\nTotal Accuracy: ",accuracy9,"%")
cat("\nTotal Error: ",error9,"%")
cat("\nError Amount of Error: $",errorSum9)
cat("\n\n")
cat("--- Results for LM10 (Removing both MasVnrArea and MasVnrType and reverting to LM7 is more accurate) ---")
cat("\nTotal Accuracy: ",accuracy10,"%")
cat("\nTotal Error: ",error10,"%")
cat("\nError Amount of Error: $",errorSum10)
cat("\n\n")
cat("--- Results for LM11 ---")
cat("\nTotal Accuracy: ",accuracy11,"%")
cat("\nTotal Error: ",error11,"%")
cat("\nError Amount of Error: $",errorSum11)
cat("\n\n")
cat("--- Results for LM12 ---")
cat("\nTotal Accuracy: ",accuracy12,"%")
cat("\nTotal Error: ",error12,"%")
cat("\nError Amount of Error: $",errorSum12)
cat("\n\n")
cat("--- Results for LM13 ---")
cat("\nTotal Accuracy: ",accuracy13,"%")
cat("\nTotal Error: ",error13,"%")
cat("\nError Amount of Error: $",errorSum13)
cat("\n\n")
cat("--- Results for LM14 ---")
cat("\nTotal Accuracy: ",accuracy14,"%")
cat("\nTotal Error: ",error14,"%")
cat("\nError Amount of Error: $",errorSum14)
cat("\n\n")
cat("--- Results for LM15 (Adding Electrical seems to make it less accurate)  ---")
cat("\nTotal Accuracy: ",accuracy15,"%")
cat("\nTotal Error: ",error15,"%")
cat("\nError Amount of Error: $",errorSum15)
cat("\n\n")
cat("--- Results for LM16 (Removing Electrical and reverting to LM14 is more accurate) ---")
cat("\nTotal Accuracy: ",accuracy16,"%")
cat("\nTotal Error: ",error16,"%")
cat("\nError Amount of Error: $",errorSum16)
cat("\n\n")
cat("--- Results for LM17 ---")
cat("\nTotal Accuracy: ",accuracy17,"%")
cat("\nTotal Error: ",error17,"%")
cat("\nError Amount of Error: $",errorSum17)
cat("\n\n")
cat("--- Results for LM18 (Adding MiscVal seems to make it less accurate) ---")
cat("\nTotal Accuracy: ",accuracy18,"%")
cat("\nTotal Error: ",error18,"%")
cat("\nError Amount of Error: $",errorSum18)
cat("\n\n")
cat("--- Results for LM19 (Removing MiscVal and reverting to LM17 is more accurate) ---")
cat("\nTotal Accuracy: ",accuracy19,"%")
cat("\nTotal Error: ",error19,"%")
cat("\nError Amount of Error: $",errorSum19)
cat("\n\n")
cat("--- Results for LM20 (Adding MoSold seems to make it less accurate) ---")
cat("\nTotal Accuracy: ",accuracy20,"%")
cat("\nTotal Error: ",error20,"%")
cat("\nError Amount of Error: $",errorSum20)
cat("\n\n")
cat("--- Results for LM21 (Adding MoSold and YrSold together seems to make it even less accurate) ---")
cat("\nTotal Accuracy: ",accuracy21,"%")
cat("\nTotal Error: ",error21,"%")
cat("\nError Amount of Error: $",errorSum21)
cat("\n\n")
cat("--- Results for LM22 (Removing MoSold and keeping YrSold is still less accurate than LM19) ---")
cat("\nTotal Accuracy: ",accuracy22,"%")
cat("\nTotal Error: ",error22,"%")
cat("\nError Amount of Error: $",errorSum22)
cat("\n\n")
cat("--- Results for LM23 (Reverting to LM19 is most accurate) ---")
cat("\nTotal Accuracy: ",accuracy23,"%")
cat("\nTotal Error: ",error23,"%")
cat("\nError Amount of Error: $",errorSum23)
cat("\n\n")
cat("--- Results for LM24 ---")
cat("\nTotal Accuracy: ",accuracy24,"%")
cat("\nTotal Error: ",error24,"%")
cat("\nError Amount of Error: $",errorSum24)
cat("\n\n")
cat("--- Results for LM25 (Our final model) ---")
cat("\nTotal Accuracy: ",accuracy25,"%")
cat("\nTotal Error: ",error25,"%")
cat("\nError Amount of Error: $",errorSum25)
cat("\n\n")
cat("--- Results for LM26 (Our model if we had not removed attributes that negatively impacted accuracy) ---")
cat("\nTotal Accuracy: ",accuracy26,"%")
cat("\nTotal Error: ",error26,"%")
cat("\nError Amount of Error: $",errorSum26)
cat("\n\n")
cat("--- Results for LM27 (I took LM25 and trained on a random 90% of data and tested on the remaining 10% of data) ---")
cat("\nTotal Accuracy: ",accuracy27,"%")
cat("\nTotal Error: ",error27,"%")
cat("\nError Amount of Error: $",errorSum27)
# differenceSums = sum(setDifference)
# differenceSums
# actualData[1]
# predictionData[1]

--- Results for LM1 (Only uses MSZoning) ---
Total Accuracy:  70.93624 %
Total Error:  29.06376 %
Error Amount of Error: $ 76770459

--- Results for LM2 ---
Total Accuracy:  72.7728 %
Total Error:  27.2272 %
Error Amount of Error: $ 71919271

--- Results for LM3 (Only uses Neighborhood) ---
Total Accuracy:  80.12211 %
Total Error:  19.87789 %
Error Amount of Error: $ 52506439

--- Results for LM4 ---
Total Accuracy:  81.31508 %
Total Error:  18.68492 %
Error Amount of Error: $ 49355260

--- Results for LM5 ---
Total Accuracy:  81.95444 %
Total Error:  18.04556 %
Error Amount of Error: $ 47666442

--- Results for LM6 ---
Total Accuracy:  85.88539 %
Total Error:  14.11461 %
Error Amount of Error: $ 37283019

--- Results for LM7 ---
Total Accuracy:  86.25277 %
Total Error:  13.74723 %
Error Amount of Error: $ 36312610

--- Results for LM8 (Adding MasVnrType seems to make it less accurate) ---
Total Accuracy:  85.77463 %
Total Error:  14.22537 %
Error Amount of Error: $ 37575590

--- Resul

In [306]:
cat("--- Correlation Coefficients for each LM's price predictions compared to their actual prices ---")
cat("\n1:",cor(predictionData1,actualPrices))
cat("\n2:",cor(predictionData2,actualPrices))
cat("\n3:",cor(predictionData3,actualPrices))
cat("\n4:",cor(predictionData4,actualPrices))
cat("\n5:",cor(predictionData5,actualPrices))
cat("\n6:",cor(predictionData6,actualPrices))
cat("\n7:",cor(predictionData7,actualPrices))
cat("\n8:",cor(predictionData8,actualPrices))
cat("\n9:",cor(predictionData9,actualPrices))
cat("\n10:",cor(predictionData10,actualPrices))
cat("\n11:",cor(predictionData11,actualPrices))
cat("\n12:",cor(predictionData12,actualPrices))
cat("\n13:",cor(predictionData13,actualPrices))
cat("\n14:",cor(predictionData14,actualPrices))
cat("\n15:",cor(predictionData15,actualPrices))
cat("\n16:",cor(predictionData16,actualPrices))
cat("\n17:",cor(predictionData17,actualPrices))
cat("\n18:",cor(predictionData18,actualPrices))
cat("\n19:",cor(predictionData19,actualPrices))
cat("\n20:",cor(predictionData20,actualPrices))
cat("\n21:",cor(predictionData21,actualPrices))
cat("\n22:",cor(predictionData22,actualPrices))
cat("\n23:",cor(predictionData23,actualPrices))
cat("\n24:",cor(predictionData24,actualPrices))
cat("\n25 (Final without unnecessary attributes):",cor(predictionData25,actualPrices))
cat("\n26 (Final with attributes):",cor(predictionData26,actualPrices))
cat("\n27 (Final without unnecessary attributes, trained on a random 90% of data and tested on 10%):",cor(predictionData27,actualPricesTest))


--- Correlation Coefficients for each LM's price predictions compared to their actual prices ---
1: 0.3279629
2: 0.4690955
3: 0.7386305
4: 0.7778006
5: 0.7934401
6: 0.8791134
7: 0.8906607
8: 0.863399
9: 0.8685293
10: 0.8906607
11: 0.9073554
12: 0.9317707
13: 0.9319325
14: 0.9319409
15: 0.930429
16: 0.9319409
17: 0.9572784
18: 0.9572797
19: 0.9572784
20: 0.9573817
21: 0.9574198
22: 0.9573008
23: 0.9572784
24: 0.9588611
25 (Final without unnecessary attributes): 0.9590762
26 (Final with attributes): 0.9307512
27 (Final without unnecessary attributes, trained on a random 90% of data and tested on 10%): 0.9392726
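
The cells above repeat the same fit, predict, and score steps once per model. A minimal sketch of the same workflow as a loop over candidate attribute sets follows. It uses R's built-in `mtcars` data as a stand-in for the housing data, and the names `candidates` and `score_model` are hypothetical. `reformulate()` from the `stats` package builds each formula from a character vector of attribute names.

```r
# Built-in mtcars stands in for the housing data (assumption for illustration)
data(mtcars)

# Hypothetical candidate attribute sets, from smallest to largest
candidates <- list(
  m1 = c("wt"),
  m2 = c("wt", "hp"),
  m3 = c("wt", "hp", "qsec")
)

# Fit one model and return the notebook's accuracy measure and the correlation
score_model <- function(terms, data, response = "mpg") {
  fit <- glm(reformulate(terms, response = response), data = data)
  pred <- predict(fit, newdata = data)
  actual <- data[[response]]
  c(accuracy = 100 - 100 * sum(abs(actual - pred)) / sum(actual),
    correlation = cor(pred, actual))
}

# One row per candidate model, one column per measure
results <- t(sapply(candidates, score_model, data = mtcars))
print(round(results, 4))
```

Keeping the attribute sets in a list means a new candidate model is one extra list entry rather than another copied block.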

# Observations

### I found in LM1 and LM3 that MSZoning and Neighborhood each correlate with the final price on their own. LM1's predictions, using only the MSZoning attribute, had a 0.33 correlation coefficient with the actual prices. LM3's predictions, using only the Neighborhood attribute, had a 0.74 correlation coefficient with the actual prices! That is massive for a single attribute!

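### Some attributes were found to reduce accuracy and were removed to improve the LMs. These are: MasVnrType, MasVnrArea, Electrical, MiscVal, MoSold, and YrSold. SaleType and SaleCondition were kept in the final model (LM25). Note that MiscVal, MoSold, and YrSold slightly raised the correlation coefficient (LM18 and LM20 to LM22 compared with LM19) even though they lowered the accuracy measure.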
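
LM1 to LM26 are scored on the same rows they were fitted on. One way to double-check whether an attribute helps is to fit the model with and without it on a training subset and score both fits on the same held-out rows, keeping only rows where both models produced a prediction. The sketch below does this on the built-in `mtcars` data (a stand-in, not the housing data) and reports mean absolute error rather than the notebook's percentage measure. The seed, the split size, and the choice of `qsec` as the candidate attribute are assumptions.

```r
data(mtcars)
set.seed(7)  # assumed seed so the split is repeatable
test_idx <- sample(seq_len(nrow(mtcars)), size = 8)
train <- mtcars[-test_idx, ]
test <- mtcars[test_idx, ]

# Same model with and without the candidate attribute
base_fit <- glm(mpg ~ wt + hp, data = train)
plus_fit <- glm(mpg ~ wt + hp + qsec, data = train)

base_pred <- predict(base_fit, newdata = test)
plus_pred <- predict(plus_fit, newdata = test)

# Compare on identical rows: drop any row where either prediction is NA
keep <- !is.na(base_pred) & !is.na(plus_pred)
mae <- function(actual, predicted) mean(abs(actual - predicted))

comparison <- c(base = mae(test$mpg[keep], base_pred[keep]),
                with_qsec = mae(test$mpg[keep], plus_pred[keep]))
print(comparison)
```

Scoring both fits on identical rows keeps missing values in the candidate attribute from being mistaken for a loss of predictive power.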

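### In the end, after fine-tuning, the final model achieved an accuracy above 90%. When trained on a random 90% of the data and tested on the remaining 10% (LM27), its predictions had a 0.94 correlation coefficient with the actual prices.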