# Objective: Understand support vector machines (SVMs) for multi-class classification and regression problems.

## Multiclass classification dataset:

This is the Glass Identification Data Set from the UCI Machine Learning Repository. It contains 10 attributes, including the Id. The response is the glass type, a discrete variable with 7 possible values.

### Attribute Information:
1.	Id number: 1 to 214 (removed from CSV file)
2.	RI: refractive index
3.	Na: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10)
4.	Mg: Magnesium
5.	Al: Aluminum
6.	Si: Silicon
7.	K: Potassium
8.	Ca: Calcium
9.	Ba: Barium
10.	Fe: Iron
### Target class
Type of glass: (class attribute)
-- 1 building_windows_float_processed
-- 2 building_windows_non_float_processed
-- 3 vehicle_windows_float_processed
-- 4 vehicle_windows_non_float_processed (none in this database)
-- 5 containers
-- 6 tableware
-- 7 headlamps

## Regression dataset:
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.


### Data fields
1.	MSSubClass: The building class
2.	MSZoning: The general zoning classification
3.	LotFrontage: Linear feet of street connected to property
4.	LotArea: Lot size in square feet
5.	Street: Type of road access
6.	Alley: Type of alley access
7.	LotShape: General shape of property
8.	LandContour: Flatness of the property
9.	Utilities: Type of utilities available
10.	LotConfig: Lot configuration
11.	LandSlope: Slope of property
12.	Neighborhood: Physical locations within Ames city limits
13.	Condition1: Proximity to main road or railroad
14.	Condition2: Proximity to main road or railroad (if a second is present)
15.	BldgType: Type of dwelling
16.	HouseStyle: Style of dwelling
17.	OverallQual: Overall material and finish quality
18.	OverallCond: Overall condition rating
19.	YearBuilt: Original construction date
20.	YearRemodAdd: Remodel date
21.	RoofStyle: Type of roof
22.	RoofMatl: Roof material
23.	Exterior1st: Exterior covering on house
24.	Exterior2nd: Exterior covering on house (if more than one material)
25.	MasVnrType: Masonry veneer type
26.	MasVnrArea: Masonry veneer area in square feet
27.	ExterQual: Exterior material quality
28.	ExterCond: Present condition of the material on the exterior
29.	Foundation: Type of foundation
30.	BsmtQual: Height of the basement
31.	BsmtCond: General condition of the basement
32.	BsmtExposure: Walkout or garden level basement walls
33.	BsmtFinType1: Quality of basement finished area
34.	BsmtFinSF1: Type 1 finished square feet
35.	BsmtFinType2: Quality of second finished area (if present)
36.	BsmtFinSF2: Type 2 finished square feet
37.	BsmtUnfSF: Unfinished square feet of basement area
38.	TotalBsmtSF: Total square feet of basement area
39.	Heating: Type of heating
40.	HeatingQC: Heating quality and condition
41.	CentralAir: Central air conditioning
42.	Electrical: Electrical system
43.	1stFlrSF: First Floor square feet
44.	2ndFlrSF: Second floor square feet
45.	LowQualFinSF: Low quality finished square feet (all floors)
46.	GrLivArea: Above grade (ground) living area square feet
47.	BsmtFullBath: Basement full bathrooms
48.	BsmtHalfBath: Basement half bathrooms
49.	FullBath: Full bathrooms above grade
50.	HalfBath: Half baths above grade
51.	BedroomAbvGr: Number of bedrooms above basement level
52.	KitchenAbvGr: Number of kitchens
53.	KitchenQual: Kitchen quality
54.	TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
55.	Functional: Home functionality rating
56.	Fireplaces: Number of fireplaces
57.	FireplaceQu: Fireplace quality
58.	GarageType: Garage location
59.	GarageYrBlt: Year garage was built
60.	GarageFinish: Interior finish of the garage
61.	GarageCars: Size of garage in car capacity
62.	GarageArea: Size of garage in square feet
63.	GarageQual: Garage quality
64.	GarageCond: Garage condition
65.	PavedDrive: Paved driveway
66.	WoodDeckSF: Wood deck area in square feet
67.	OpenPorchSF: Open porch area in square feet
68.	EnclosedPorch: Enclosed porch area in square feet
69.	3SsnPorch: Three season porch area in square feet
70.	ScreenPorch: Screen porch area in square feet
71.	PoolArea: Pool area in square feet
72.	PoolQC: Pool quality
73.	Fence: Fence quality
74.	MiscFeature: Miscellaneous feature not covered in other categories
75.	MiscVal: Dollar value of miscellaneous feature
76.	MoSold: Month Sold
77.	YrSold: Year Sold
78.	SaleType: Type of sale
79.	SaleCondition: Condition of sale

### Target:
SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict.


Source: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data


# Task 1: Multi-class Support vector machine (SVM)
1.	Load multi-class dataset
2.	Apply pre-processing techniques
3.	Divide dataset into training and testing sets (fraction of your choice)
4.	Build multi-class SVM model (use sklearn)
5.	Evaluate precision and recall
6.	Play with hyper-parameters and find best combination


# Task 2: Support vector regression (SVR)
1.	Load regression dataset
2.	Apply pre-processing techniques
3.	Divide dataset into training and testing sets (fraction of your choice)
4.	Build SVR model (use sklearn)
5.	Evaluate root mean square error
6.	Play with hyper-parameters and find best combination


# Task 3: Play with various SVM kernels such as polynomial, rbf, sigmoid tanh, etc.

### For more details: 
https://scikit-learn.org/stable/modules/svm.html
https://scikit-learn.org/stable/auto_examples/svm/plot_svm_kernels.html
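
For orientation, the four built-in kernels named in Task 3 can be written out directly. The sketch below follows the formulas given in the scikit-learn SVM guide linked above, using `gamma`, `coef0` and `degree` with the same meaning as the `SVC` parameters; the function names and the two sample vectors are illustrative.

```python
import numpy as np

def linear_kernel(x, z):
    # <x, z>
    return float(np.dot(x, z))

def poly_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    # (gamma * <x, z> + coef0) ** degree
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=1.0):
    # exp(-gamma * ||x - z||^2)
    diff = x - z
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    # tanh(gamma * <x, z> + coef0)
    return float(np.tanh(gamma * np.dot(x, z) + coef0))

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
for kernel in (linear_kernel, poly_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, z))
```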




## Task 1: Multi-class Support vector machine (SVM) 

In [1]:
# Load the libraries
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.svm import SVR
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

In [2]:
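# Load the dataset (the Kaggle house-price train and test CSV files)
# Note: the cells in this task work on the house-price data, not the glass data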
data1 = pd.read_csv(r"C:\Users\TANVI\Desktop\Assignments\train.csv")
data2 = pd.read_csv(r"C:\Users\TANVI\Desktop\Assignments\test.csv")

In [3]:
data1.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000


In [4]:
data2.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,1461,20,RH,80.0,11622,Pave,,Reg,Lvl,AllPub,...,120,0,,MnPrv,,0,6,2010,WD,Normal
1,1462,20,RL,81.0,14267,Pave,,IR1,Lvl,AllPub,...,0,0,,,Gar2,12500,6,2010,WD,Normal
2,1463,60,RL,74.0,13830,Pave,,IR1,Lvl,AllPub,...,0,0,,MnPrv,,0,3,2010,WD,Normal
3,1464,60,RL,78.0,9978,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,6,2010,WD,Normal
4,1465,120,RL,43.0,5005,Pave,,IR1,HLS,AllPub,...,144,0,,,,0,1,2010,WD,Normal


In [5]:
print(data1.shape, data2.shape)

(1460, 81) (1459, 80)


In [6]:
# Preprocessing plan:
#   1. Fill missing values (if any)
#   2. Encode categorical variables (if any)
#   3. Scale the features
# Start by counting the missing values per column in the training data
data1.isna().sum()

Id                  0
MSSubClass          0
MSZoning            0
LotFrontage       259
LotArea             0
Street              0
Alley            1369
LotShape            0
LandContour         0
Utilities           0
LotConfig           0
LandSlope           0
Neighborhood        0
Condition1          0
Condition2          0
BldgType            0
HouseStyle          0
OverallQual         0
OverallCond         0
YearBuilt           0
YearRemodAdd        0
RoofStyle           0
RoofMatl            0
Exterior1st         0
Exterior2nd         0
MasVnrType          8
MasVnrArea          8
ExterQual           0
ExterCond           0
Foundation          0
                 ... 
BedroomAbvGr        0
KitchenAbvGr        0
KitchenQual         0
TotRmsAbvGrd        0
Functional          0
Fireplaces          0
FireplaceQu       690
GarageType         81
GarageYrBlt        81
GarageFinish       81
GarageCars          0
GarageArea          0
GarageQual         81
GarageCond         81
PavedDrive

In [7]:
data2.isna().sum()

Id                  0
MSSubClass          0
MSZoning            4
LotFrontage       227
LotArea             0
Street              0
Alley            1352
LotShape            0
LandContour         0
Utilities           2
LotConfig           0
LandSlope           0
Neighborhood        0
Condition1          0
Condition2          0
BldgType            0
HouseStyle          0
OverallQual         0
OverallCond         0
YearBuilt           0
YearRemodAdd        0
RoofStyle           0
RoofMatl            0
Exterior1st         1
Exterior2nd         1
MasVnrType         16
MasVnrArea         15
ExterQual           0
ExterCond           0
Foundation          0
                 ... 
HalfBath            0
BedroomAbvGr        0
KitchenAbvGr        0
KitchenQual         1
TotRmsAbvGrd        0
Functional          2
Fireplaces          0
FireplaceQu       730
GarageType         76
GarageYrBlt        78
GarageFinish       78
GarageCars          1
GarageArea          1
GarageQual         78
GarageCond

In [8]:
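# Shuffle the rows of both frames and rebuild the index
# (no random_state is set, so the row order differs from run to run)
data1 = data1.sample(frac=1).reset_index(drop=True)
data2 = data2.sample(frac=1).reset_index(drop=True)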

In [9]:
# Fill in missing values
from sklearn.base import TransformerMixin

class DataFrameImputer(TransformerMixin):
    """Impute missing values.

    Columns of dtype object are imputed with the most frequent value
    in the column.

    Columns of other types are imputed with the mean of the column.
    """

    def fit(self, X, y=None):
        # One fill value per column: the mode for object columns, the mean otherwise
        self.fill = pd.Series(
            [X[c].value_counts().index[0] if X[c].dtype == np.dtype('O')
             else X[c].mean() for c in X],
            index=X.columns)
        return self

    def transform(self, X, y=None):
        # Replace every NaN with the fill value learned for its column
        return X.fillna(self.fill)
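
The same idea can be applied so that the fill values are learned once from the training frame and then reused on the test frame. The sketch below uses plain pandas rather than the class above; the two small frames and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frames; the values are made up.
train = pd.DataFrame({"LotFrontage": [60.0, np.nan, 80.0],
                      "Alley": ["Grvl", None, "Grvl"]})
test = pd.DataFrame({"LotFrontage": [np.nan, 70.0],
                     "Alley": [None, "Pave"]})

# One fill value per column, learned from the training frame only:
# the mean for numeric columns, the most frequent value otherwise.
fill = {
    col: train[col].mean() if pd.api.types.is_numeric_dtype(train[col])
    else train[col].mode().iloc[0]
    for col in train.columns
}

train_filled = train.fillna(value=fill)
test_filled = test.fillna(value=fill)  # reuses the training statistics
print(test_filled)
```

Reusing the training statistics keeps the two frames on the same footing; fitting a separate imputer per frame would fill the same column with different values.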

In [10]:
# Impute the training frame using fill values computed from data1 itself
X = pd.DataFrame(data1)
data1 = DataFrameImputer().fit_transform(X)

In [11]:
data1.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,172,20,RL,141.0,31770,Pave,Grvl,IR1,Lvl,AllPub,...,0,Gd,MnPrv,Shed,0,5,2010,WD,Normal,215000
1,870,60,RL,80.0,9938,Pave,Grvl,Reg,Lvl,AllPub,...,0,Gd,GdPrv,Shed,0,6,2010,WD,Normal,236000
2,884,75,RL,60.0,6204,Pave,Grvl,Reg,Bnk,AllPub,...,0,Gd,MnPrv,Shed,0,3,2006,WD,Normal,118500
3,812,120,RM,70.049958,4438,Pave,Grvl,Reg,Lvl,AllPub,...,0,Gd,MnPrv,Shed,0,6,2008,ConLD,Normal,144500
4,279,20,RL,107.0,14450,Pave,Grvl,Reg,Lvl,AllPub,...,0,Gd,MnPrv,Shed,0,5,2007,New,Partial,415298


In [12]:
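# Impute the test frame; a new imputer is fitted here, so the fill values
# come from data2 rather than from the training data
X = pd.DataFrame(data2)
data2 = DataFrameImputer().fit_transform(X)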

In [13]:
data2.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,1666,20,RL,129.0,16870,Pave,Grvl,IR1,Lvl,AllPub,...,0,0,Ex,MnPrv,Shed,0,4,2009,WD,Normal
1,2147,190,RL,68.580357,10532,Pave,Grvl,Reg,Lvl,AllPub,...,0,0,Ex,MnPrv,Shed,0,12,2008,WD,Abnorml
2,2416,20,RL,60.0,10950,Pave,Grvl,Reg,Lvl,AllPub,...,0,0,Ex,MnPrv,Shed,0,4,2007,WD,Normal
3,1602,50,RM,50.0,9140,Pave,Grvl,Reg,HLS,AllPub,...,200,0,Ex,MnPrv,Shed,0,4,2010,COD,Normal
4,1950,20,RL,68.580357,10825,Pave,Grvl,IR1,Lvl,AllPub,...,0,0,Ex,MnPrv,Shed,0,7,2008,WD,Normal


In [14]:
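LE = preprocessing.LabelEncoder()
# Caution: exclude="int64" selects every non-integer column, so float columns
# such as LotFrontage, MasVnrArea and GarageYrBlt are picked up as well;
# select_dtypes(include="object") would restrict the list to string columns.
CateList = data1.select_dtypes(exclude="int64").columns
print(CateList)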

Index(['MSZoning', 'LotFrontage', 'Street', 'Alley', 'LotShape', 'LandContour',
       'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1',
       'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl',
       'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual',
       'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure',
       'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir',
       'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
       'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition'],
      dtype='object')


In [15]:
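# Label-encode each selected column in place.
# LE is refitted on every column, so the per-column mappings are not kept.
for i in CateList:
    data1[i] = LE.fit_transform(data1[i])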
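
An alternative, shown here only as a sketch, is to encode the string columns alone and to fit a single encoder on the training frame so that the test frame receives the same codes. `OrdinalEncoder` with `handle_unknown="use_encoded_value"` is a swapped-in technique, not what the notebook uses, and the two small frames are made up.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Tiny illustrative frames; "FV" appears only in the test frame.
train = pd.DataFrame({"MSZoning": ["RL", "RM", "RL"],
                      "LotFrontage": [65.0, 80.0, 68.0]})
test = pd.DataFrame({"MSZoning": ["RM", "FV"],
                     "LotFrontage": [60.0, 84.0]})

# Only non-numeric columns are encoded; float columns are left untouched.
cat_cols = list(train.select_dtypes(exclude="number").columns)

# Categories unseen during fit are mapped to -1 instead of raising an error.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = enc.fit_transform(train[cat_cols])
test_codes = enc.transform(test[cat_cols])

for i, col in enumerate(cat_cols):
    train[col] = train_codes[:, i]
    test[col] = test_codes[:, i]

print(train)
print(test)
```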

In [16]:
data1.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,172,20,3,100,31770,1,0,0,3,0,...,0,2,2,2,0,5,2010,8,4,215000
1,870,60,3,52,9938,1,0,3,3,0,...,0,2,0,2,0,6,2010,8,4,236000
2,884,75,3,31,6204,1,0,3,0,0,...,0,2,2,2,0,3,2006,8,4,118500
3,812,120,4,42,4438,1,0,3,3,0,...,0,2,2,2,0,6,2008,3,4,144500
4,279,20,3,79,14450,1,0,3,3,0,...,0,2,2,2,0,5,2007,6,5,415298


In [17]:
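LE = preprocessing.LabelEncoder()
# Caution: in data2 several numeric columns (e.g. BsmtFinSF1, TotalBsmtSF,
# GarageCars, GarageArea) are stored as floats, so exclude="int64" selects them
# here even though they are not in the list selected for data1.
CateList = data2.select_dtypes(exclude="int64").columns
print(CateList)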

Index(['MSZoning', 'LotFrontage', 'Street', 'Alley', 'LotShape', 'LandContour',
       'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1',
       'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl',
       'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual',
       'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure',
       'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF',
       'TotalBsmtSF', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical',
       'BsmtFullBath', 'BsmtHalfBath', 'KitchenQual', 'Functional',
       'FireplaceQu', 'GarageType', 'GarageYrBlt', 'GarageFinish',
       'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond', 'PavedDrive',
       'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition'],
      dtype='object')


In [18]:
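# The encoders are fitted on data2 independently of data1, so the same
# category can receive different codes in the two frames.
for i in CateList:
    data2[i] = LE.fit_transform(data2[i])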

In [19]:
data2.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,1666,20,3,102,16870,1,0,0,3,0,...,0,0,0,2,2,0,4,2009,8,4
1,2147,190,3,45,10532,1,0,3,3,0,...,0,0,0,2,2,0,12,2008,8,0
2,2416,20,3,36,10950,1,0,3,3,0,...,0,0,0,2,2,0,4,2007,8,4
3,1602,50,4,26,9140,1,0,3,1,0,...,200,0,0,2,2,0,4,2010,0,4
4,1950,20,3,45,10825,1,0,0,3,0,...,0,0,0,2,2,0,7,2008,8,4


In [20]:
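# Scale every column except SalePrice (the last one) to the range [0, 1].
# The Id column is kept and scaled like any other feature.
features = data1.iloc[:, :-1]
mm = MinMaxScaler()
df = pd.DataFrame(mm.fit_transform(features),
                  columns=features.columns, index=features.index)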

In [21]:
df.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,0.117204,0.0,0.75,0.909091,0.14242,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.666667,0.666667,0.0,0.363636,1.0,1.0,0.8
1,0.595613,0.235294,0.75,0.472727,0.040375,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.666667,0.0,0.454545,1.0,1.0,0.8
2,0.605209,0.323529,0.75,0.281818,0.022922,1.0,0.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.666667,0.666667,0.0,0.181818,0.0,1.0,0.8
3,0.55586,0.588235,1.0,0.381818,0.014667,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,1.0,0.666667,0.666667,0.0,0.454545,0.5,0.375,0.8
4,0.190541,0.0,0.75,0.718182,0.061464,1.0,0.0,1.0,1.0,0.0,...,0.295833,0.0,1.0,0.666667,0.666667,0.0,0.363636,0.25,0.75,1.0


In [22]:
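# A second scaler is fitted on the test frame, so its [0, 1] range is based on
# the test minima and maxima rather than on the training ones.
mm = MinMaxScaler()
data2 = pd.DataFrame(mm.fit_transform(data2),
                     columns=data2.columns, index=data2.index)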
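
Min-max scaling maps each column to `(x - min) / (max - min)`. The short NumPy sketch below applies the minima and ranges learned from a training array to a test array, which is the usual way to keep both on one scale; the numbers are made up, and test values outside the training range simply fall outside [0, 1].

```python
import numpy as np

# Made-up values: two feature columns, three training rows, two test rows.
train = np.array([[50.0, 1.0], [100.0, 3.0], [150.0, 5.0]])
test = np.array([[75.0, 2.0], [200.0, 5.0]])

# Statistics come from the training data only.
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range  # same min and range as training

print(train_scaled)
print(test_scaled)
```

With scikit-learn the equivalent is to call `fit_transform` on the training features and `transform` (not `fit_transform`) on the test features, using one `MinMaxScaler` instance.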

In [23]:
data2.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,0.140604,0.0,0.75,0.886957,0.27934,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.666667,1.0,0.0,0.272727,0.75,1.0,0.8
1,0.470508,1.0,0.75,0.391304,0.164375,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.666667,1.0,0.0,1.0,0.5,1.0,0.0
2,0.655007,0.0,0.75,0.313043,0.171957,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.666667,1.0,0.0,0.272727,0.25,1.0,0.8
3,0.096708,0.176471,1.0,0.226087,0.139126,1.0,0.0,1.0,0.333333,0.0,...,0.347222,0.0,0.0,0.666667,1.0,0.0,0.272727,1.0,0.0,0.8
4,0.335391,0.0,0.75,0.391304,0.16969,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.666667,1.0,0.0,0.545455,0.5,1.0,0.8


In [24]:
# Divide the dataset into training and testing sets
X = df
y = data1['SalePrice']
print(X.shape, y.shape)

(1460, 80) (1460,)


In [25]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 123)

In [26]:
# Build the SVM model.
# SVC is a classifier, so every distinct SalePrice value is treated as its own class.
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [27]:
# Evaluate the built model on the training and test splits
pred_t1 = clf.predict(X_train)
pred_t2 = clf.predict(X_test)

# Predicting on Kaggle Test set
pred = clf.predict(data2)

In [28]:
acc_t1 = accuracy_score(y_train, pred_t1)
acc_t2 = accuracy_score(y_test, pred_t2)

In [29]:
# Evaluate training and testing accuracy
print("Training Accuracy:",acc_t1)
print("Testing Accuracy:",acc_t2)

Training Accuracy: 0.8512720156555773
Testing Accuracy: 0.02054794520547945
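
One plausible reading of the gap between the two accuracies, offered as an assumption rather than a diagnosis, is that a price target produces a very large number of classes, many of which occur only once. The sketch below uses hypothetical prices (normally distributed and rounded to the nearest 500, not the real `SalePrice` values) to count the distinct labels and the share of test labels that never appear in training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prices rounded to the nearest 500, standing in for SalePrice.
prices = np.round(rng.normal(180_000, 60_000, size=1000) / 500) * 500

train, test = prices[:700], prices[700:]

# Each distinct price is one class label for a classifier.
n_classes = np.unique(train).size

# Fraction of test labels that the classifier has never seen and can never predict.
unseen = float(np.mean(~np.isin(test, train)))

print(f"{n_classes} classes in training, {unseen:.1%} of test labels unseen")
```

A classifier can only output labels it saw during fitting, so every unseen test label is a guaranteed miss, and exact-match accuracy is a harsh metric for a price target in general.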


## Task 2: Implement support vector regression (SVR)


In [30]:
# Reuse the scaled training features prepared in Task 1
df.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,0.117204,0.0,0.75,0.909091,0.14242,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,1.0,0.666667,0.666667,0.0,0.363636,1.0,1.0,0.8
1,0.595613,0.235294,0.75,0.472727,0.040375,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,1.0,0.0,0.666667,0.0,0.454545,1.0,1.0,0.8
2,0.605209,0.323529,0.75,0.281818,0.022922,1.0,0.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.666667,0.666667,0.0,0.181818,0.0,1.0,0.8
3,0.55586,0.588235,1.0,0.381818,0.014667,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,1.0,0.666667,0.666667,0.0,0.454545,0.5,0.375,0.8
4,0.190541,0.0,0.75,0.718182,0.061464,1.0,0.0,1.0,1.0,0.0,...,0.295833,0.0,1.0,0.666667,0.666667,0.0,0.363636,0.25,0.75,1.0


In [31]:
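# Apply pre-processing techniques
# Apply a feature selection technique to reduce the feature set to 40 columns.
# Note: chi2 scores non-negative features against class labels, so SalePrice
# is treated as a categorical target here.
X_new = SelectKBest(chi2, k=40).fit_transform(df, data1['SalePrice'])
X_new.shape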
X_new.shape

(1460, 40)
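
For a continuous target, one alternative scoring function is `f_regression`, which ranks features by a univariate F-test against the target. The sketch below is an illustration on synthetic data from `make_regression`, not a rerun of the cell above; the sample count, feature count and `k=5` are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic regression data: 20 features, 5 of which carry signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Keep the 5 features with the highest univariate F-scores.
selector = SelectKBest(score_func=f_regression, k=5)
X_new = selector.fit_transform(X, y)

# Column indices of the retained features.
kept = selector.get_support(indices=True)
print(X_new.shape, kept)
```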

In [32]:
X_train, X_test, y_train, y_test = train_test_split(X_new, data1['SalePrice'], test_size = 0.30, random_state = 123)

In [33]:
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(1022, 40) (1022,)
(438, 40) (438,)


In [34]:
# Train an SVR model with the default hyper-parameters
# (RBF kernel, C=1.0, epsilon=0.1)
model = SVR()
model.fit(X_train,y_train)

SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='auto',
  kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)

In [35]:
# Evaluate training and testing root mean square error
y_pred1 = model.predict(X_train)
y_pred2 = model.predict(X_test)

In [36]:
mse1 = mean_squared_error(y_train, y_pred1)
rmse1 = mse1**(1/2)
print("RMSE (Train): ",rmse1)

RMSE (Train):  81165.14005205587


In [37]:
mse2 = mean_squared_error(y_test, y_pred2)
rmse2 = mse2**(1/2)
print("RMSE (Test): ",rmse2)

RMSE (Test):  80598.85845520574
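
Training and test RMSE of roughly 80,000 are what a near-constant prediction would give. One possible cause, stated here as an assumption, is that the default `C=1.0` and `epsilon=0.1` are tiny relative to a target measured in dollars. The sketch below compares a default `SVR` on a raw price-like target with the same model wrapped in `TransformedTargetRegressor`, which standardises the target for fitting and maps predictions back. The data is synthetic and the target's mean and spread are made up.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data with a price-like target (mean 180,000, std 60,000; both assumed).
X, z = make_regression(n_samples=300, n_features=3, noise=5.0, random_state=0)
y = 180_000 + 60_000 * (z - z.mean()) / z.std()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def rmse(model):
    """Fit on the training split and return the RMSE on the test split."""
    model.fit(X_train, y_train)
    return float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))

# Default SVR trained directly on the dollar-scale target.
rmse_raw = rmse(make_pipeline(StandardScaler(), SVR()))

# Same SVR, but the target is standardised for fitting and mapped back afterwards.
rmse_scaled = rmse(TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR()),
    transformer=StandardScaler()))

print(f"raw target: {rmse_raw:.0f}  scaled target: {rmse_scaled:.0f}")
```

Raising `C` by several orders of magnitude is another way to address the same issue without transforming the target.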



## Task 3: Play with various SVM kernels such as polynomial, rbf, sigmoid tanh, etc.


In [38]:
# Play with various SVM kernels such as polynomial, rbf, sigmoid (tanh), etc.
from sklearn.svm import SVC
svm_poly = SVC(kernel='poly', C=100, gamma='auto', degree=3, coef0=1)
# gamma has no effect on the linear kernel
svm_lin = SVC(kernel='linear', C=100, gamma='auto')
svm_rbf = SVC(kernel='rbf', C=100, gamma=0.1)
svm_sig = SVC(kernel='sigmoid', C=100, gamma='auto')
# A precomputed kernel expects a Gram matrix instead of the feature matrix;
# this model is defined here but not fitted in the cells below
svm_precom = SVC(kernel='precomputed', gamma='auto')
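
With `kernel='precomputed'`, `fit` takes the Gram matrix between training samples and `predict` takes the kernel values between test and training samples. The sketch below uses a plain dot product as the kernel on small random data; the array sizes and the labelling rule are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Made-up data: the label depends only on the sign of the first feature.
X_train = rng.normal(size=(40, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(10, 3))

# Linear kernel computed by hand.
gram_train = X_train @ X_train.T   # shape (n_train, n_train)
gram_test = X_test @ X_train.T     # shape (n_test, n_train)

svm_precom = SVC(kernel="precomputed", C=100)
svm_precom.fit(gram_train, y_train)
pred = svm_precom.predict(gram_test)
print(pred)
```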

In [40]:
# Fit each kernel on the Task 2 split (40 selected features, SalePrice as the label)
svm_rbf.fit(X_train, y_train)
svm_lin.fit(X_train, y_train)
svm_poly.fit(X_train, y_train)
svm_sig.fit(X_train, y_train)

SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='sigmoid',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [41]:
pred_rbf = svm_rbf.predict(X_test)
pred_lin = svm_lin.predict(X_test)
pred_poly = svm_poly.predict(X_test)
pred_sig = svm_sig.predict(X_test)

In [42]:
acc_rbf = accuracy_score(y_test, pred_rbf)
acc_lin = accuracy_score(y_test, pred_lin)
acc_poly = accuracy_score(y_test, pred_poly)
acc_sig = accuracy_score(y_test, pred_sig)

In [43]:
print("Accuracy - rbf:",acc_rbf)
print("Accuracy - linear:",acc_lin)
print("Accuracy - poly:",acc_poly)
print("Accuracy - sigmoid:",acc_sig)

Accuracy - rbf: 0.0136986301369863
Accuracy - linear: 0.0182648401826484
Accuracy - poly: 0.0182648401826484
Accuracy - sigmoid: 0.02511415525114155
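
Rather than fixing one setting per kernel by hand, the kernels and their hyper-parameters can be compared with a cross-validated grid search. The sketch below is an illustration on synthetic three-class data, not a rerun of the cells above; the grid values, `cv=3` and the dataset sizes are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for a real classification table.
X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_redundant=0, n_classes=3, random_state=0)

# Scaling is part of the pipeline, so it is refitted inside every CV fold.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# One sub-grid per kernel family, listing only the parameters that kernel uses.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf", "sigmoid"], "svc__C": [0.1, 1, 10],
     "svc__gamma": ["scale", 0.1]},
    {"svc__kernel": ["poly"], "svc__C": [0.1, 1, 10],
     "svc__degree": [2, 3], "svc__coef0": [0, 1]},
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For the regression task the same pattern applies with `SVR` in the pipeline and `scoring="neg_root_mean_squared_error"`.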
