Q1. To predict house prices from several characteristics, such as location, square footage,
number of bedrooms, etc., you are developing an SVM regression model. Which regression metric would be
the best to employ in this situation?


When developing an SVM regression model to predict house prices based on various characteristics, it's important to choose an appropriate regression metric to evaluate the model's performance. The choice of metric depends on the specific requirements and characteristics of your problem. Here are some commonly used regression metrics:

1. Mean Absolute Error (MAE): MAE calculates the average absolute difference between the predicted and actual values. It measures the average magnitude of the errors without considering their direction. Because errors are not squared, MAE is less sensitive to outliers than MSE or RMSE.

2. Mean Squared Error (MSE): MSE calculates the average squared differences between the predicted and actual values. It penalizes large errors more than smaller ones. However, it is sensitive to outliers due to the squaring operation.

3. Root Mean Squared Error (RMSE): RMSE is the square root of MSE. It has the same unit as the dependent variable, so it can be read as a typical prediction error (it equals the standard deviation of the residuals when their mean is zero). RMSE is widely used and easy to interpret.

4. R-squared (R2): R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A value of 1 indicates a perfect fit and 0 means the model does no better than always predicting the mean; on held-out data it can be negative when the model does worse than that baseline. R-squared is useful for comparing different models but may not be the best choice if prediction accuracy in price units is the primary concern.

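5. Mean Absolute Percentage Error (MAPE): MAPE calculates the average absolute percentage difference between the predicted and actual values. It provides a relative measure of accuracy and is useful when the scale of the data varies widely, but it is undefined when an actual value is zero and becomes unstable when actual values are close to zero.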

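6. Median Absolute Error (MedAE): MedAE calculates the median of the absolute differences between the predicted and actual values. It is robust to outliers and describes the typical (median) error rather than the average.

For house prices, RMSE and MAE are usually the most practical choices because both are expressed in the same units as the price: prefer RMSE when large errors are especially costly, and MAE when the data contains outliers that should not dominate the score.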
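As a minimal sketch of the definitions above, the following computes each metric by hand in plain Python on four made-up prices (in thousands of dollars). The data and the helper name `regression_metrics` are illustrative assumptions, not part of the notebook; `sklearn.metrics` provides equivalent functions for real use.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from two equal-length sequences."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)

    mae = sum(abs_errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    medae = statistics.median(abs_errors)
    # MAPE divides by the actual values, so it is undefined if any of them is zero
    mape = sum(a / abs(t) for a, t in zip(abs_errors, y_true)) / n

    # R-squared compares the model's squared error with that of a mean-only baseline
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "R2": r2, "MAPE": mape, "MedAE": medae}


# Hypothetical house prices in thousands of dollars (illustrative only)
y_true = [200, 300, 400, 500]
y_pred = [210, 290, 420, 450]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On this toy data the single 50-unit miss pulls MSE and RMSE up the most, MAE less so, and MedAE least of all.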


Q2. You have built an SVM regression model and are trying to decide between using MSE or R-squared as
your evaluation metric. Which metric would be more appropriate if your goal is to predict the actual price
of a house as accurately as possible?


In [3]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

In [4]:
data = fetch_california_housing()
X = data.data
y = data.target

In [10]:
data

{'data': array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
           37.88      , -122.23      ],
        [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
           37.86      , -122.22      ],
        [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
           37.85      , -122.24      ],
        ...,
        [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
           39.43      , -121.22      ],
        [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
           39.43      , -121.32      ],
        [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
           39.37      , -121.24      ]]),
 'target': array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]),
 'frame': None,
 'target_names': ['MedHouseVal'],
 'feature_names': ['MedInc',
  'HouseAge',
  'AveRooms',
  'AveBedrms',
  'Population',
  'AveOccup',
  'Latitude',
  'Longitude'],
 'DESCR': '.. _california_housing_dataset:\n

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
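# Note: SVR training time grows faster than quadratically with the number of samples and is
# sensitive to feature scale, so this fit can take a long time on the full, unscaled dataset.
svm_regressor = SVR(kernel='linear')
svm_regressor.fit(X_train, y_train)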


In [None]:
y_pred = svm_regressor.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error (MSE):", mse)
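When the goal is to predict the price itself as accurately as possible, MSE (or its square root) is the more direct choice: it measures how far predictions fall from the actual values, whereas R-squared is a relative, unit-free score. The California housing target is expressed in units of $100,000, so an MSE on this dataset can be turned into a dollar figure. A minimal sketch follows, where the helper name `rmse_in_dollars` and the example MSE value are hypothetical and not results from the notebook.

```python
import math

# The California housing target (MedHouseVal) is expressed in units of $100,000.
TARGET_UNIT_DOLLARS = 100_000


def rmse_in_dollars(mse, unit=TARGET_UNIT_DOLLARS):
    """Convert an MSE in squared target units into an RMSE in dollars."""
    return math.sqrt(mse) * unit


# Hypothetical MSE value for illustration only, not an output of the model above
example_mse = 0.25
print(f"RMSE: ${rmse_in_dollars(example_mse):,.0f}")
```

The same conversion can be applied to the `mse` variable computed in the cell above once the model has been fitted.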

Q3. You have a dataset with a significant number of outliers and are trying to select an appropriate
regression metric to use with your SVM model. Which metric would be the most appropriate in this
scenario?


In [2]:
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Generate a synthetic dataset with outliers
X, y = make_regression(n_samples=1000, n_features=1, noise=0.5, random_state=42)
X_outliers, y_outliers = make_regression(n_samples=50, n_features=1, noise=100, random_state=42)
X = np.vstack((X, X_outliers))
y = np.hstack((y, y_outliers))

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the SVM regression model
svm_regressor = SVR(kernel='linear')
svm_regressor.fit(X_train, y_train)


In [4]:
y_pred = svm_regressor.predict(X_test)

# Calculate Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error (MAE):", mae)

Mean Absolute Error (MAE): 5.7100869521795
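With many outliers, MAE (or MedAE) is usually preferred over MSE because squaring lets a few extreme errors dominate the score. The sketch below illustrates this on made-up residuals rather than on the model above; the helper names `mae`, `mse`, and `medae` are illustrative.

```python
import statistics


def mae(residuals):
    """Mean of the absolute residuals."""
    return sum(abs(r) for r in residuals) / len(residuals)


def mse(residuals):
    """Mean of the squared residuals."""
    return sum(r * r for r in residuals) / len(residuals)


def medae(residuals):
    """Median of the absolute residuals."""
    return statistics.median([abs(r) for r in residuals])


# Hypothetical residuals: four small errors, then the same with one extreme outlier
clean = [1, -1, 1, -1]
with_outlier = clean + [100]

for label, res in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{label}: MAE={mae(res):.1f}  MSE={mse(res):.1f}  MedAE={medae(res):.1f}")
```

A single outlier multiplies MSE by roughly two thousand here, MAE by about twenty, and leaves MedAE unchanged.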


Q4. You have built an SVM regression model using a polynomial kernel and are trying to select the best
metric to evaluate its performance. You have calculated both MSE and RMSE and found that both values
are very close. Which metric should you choose to use in this case?



In [6]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error


In [7]:
X, y = make_regression(n_samples=1000, n_features=1, noise=0.5, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [3]:
svm_regressor = SVR(kernel='poly')
svm_regressor.fit(X_train, y_train)


In [4]:
y_pred = svm_regressor.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)


In [8]:
rmse = np.sqrt(mse)

print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)

Mean Squared Error (MSE): 208.42060962375442
Root Mean Squared Error (RMSE): 14.436779752554044
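RMSE is the square root of MSE, so the two always rank models in the same order; their values coincide only when MSE is 0 or 1. RMSE is usually the one to report because it is in the same units as the target. A small sketch of the ranking argument follows, using hypothetical MSE values that are not taken from the notebook.

```python
import math

# Hypothetical MSE values for three models (illustrative only)
mse_by_model = {"model_a": 200.0, "model_b": 1.0, "model_c": 36.0}
rmse_by_model = {name: math.sqrt(m) for name, m in mse_by_model.items()}

# The square root is monotonic, so sorting by either metric gives the same order
rank_by_mse = sorted(mse_by_model, key=mse_by_model.get)
rank_by_rmse = sorted(rmse_by_model, key=rmse_by_model.get)

print("Ranking by MSE: ", rank_by_mse)
print("Ranking by RMSE:", rank_by_rmse)
print("Same ranking:", rank_by_mse == rank_by_rmse)
```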


Q5. You are comparing the performance of different SVM regression models using different kernels (linear,
polynomial, and RBF) and are trying to select the best evaluation metric. Which metric would be most
appropriate if your goal is to measure how well the model explains the variance in the target variable?



In [9]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import r2_score


In [10]:
X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [None]:
kernels = ['linear', 'poly', 'rbf']
for kernel in kernels:
    svm_regressor = SVR(kernel=kernel)
    svm_regressor.fit(X_train, y_train)
    
    # Make predictions on the test set
    y_pred = svm_regressor.predict(X_test)
    
    # Calculate R-squared
    r2 = r2_score(y_test, y_pred)
    
    print("Kernel:", kernel)
    print("R-squared:", r2)
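R-squared fits this goal because it reports the share of the target's variance that the model accounts for, relative to always predicting the mean. A minimal plain-Python sketch of that definition on made-up values follows; the helper `r2` is illustrative, and the cells above use `r2_score` from scikit-learn.

```python
def r2(y_true, y_pred):
    """R-squared: 1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and three sets of predictions (illustrative only)
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y_true, y_true)                 # predictions match the targets exactly
baseline = r2(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean of y_true
reversed_fit = r2(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean baseline

print("Perfect fit:", perfect)
print("Mean baseline:", baseline)
print("Worse than baseline:", reversed_fit)
```

A score near 1 means most of the variance is explained, 0 means no improvement over the mean, and a negative score means the kernel fits worse than that baseline.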