To classify the properties in the dataset by price using a Support Vector Machine (SVM) with the kernel trick, we'll follow these steps:

Preprocess the Data: We need to make the data suitable for an SVM model. This usually involves handling missing values, encoding categorical features, and scaling the features, since SVMs are sensitive to differences in feature magnitude.

Feature Selection: We need to select features relevant for predicting the price classification. Not all features in the dataset might be useful for this purpose.

Splitting the Data: We'll split the dataset into training and testing sets. The training set is used to train the model, and the testing set is used to evaluate its performance.

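Training the SVM Model: We'll use an SVM classifier from a library like scikit-learn. The kernel is chosen through the classifier's kernel parameter, for example 'linear', 'poly', 'rbf', or 'sigmoid'. The nonlinear kernels ('poly', 'rbf', 'sigmoid') are where the kernel trick matters: they let the model learn a nonlinear decision boundary without explicitly computing the higher-dimensional feature mapping, whereas 'linear' keeps the boundary a hyperplane in the original feature space.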

Model Evaluation: After training, we evaluate the model's performance on the testing set to see how well it generalizes to new, unseen data.
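To make the effect of the kernel choice concrete, here is a minimal sketch that compares the four kernels on a toy problem. The data comes from scikit-learn's make_circles (two concentric rings), which is an assumption made purely for illustration and is not the housing dataset; the variable names are likewise hypothetical.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data (an assumption, not the housing dataset): two concentric rings
# that no straight line can separate.
X_toy, y_toy = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=42)
X_toy_train, X_toy_test, y_toy_train, y_toy_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# Fit one scaled SVM per kernel and record its accuracy on the held-out split.
kernel_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    model.fit(X_toy_train, y_toy_train)
    kernel_scores[kernel] = model.score(X_toy_test, y_toy_test)

for kernel, score in kernel_scores.items():
    print(f"{kernel:>8}: test accuracy = {score:.2f}")
```

On data like this, the RBF kernel should separate the rings while the linear kernel cannot. Which kernel works best on the housing features is an empirical question, so a comparison of this kind is worth repeating on the real data.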

In [27]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# Load the dataset
file_path = 'NY-House-Dataset.csv'  # Replace with your file path
data = pd.read_csv(file_path)

# Determine the median price and classify properties
# (strictly above the median = 'High-Value', everything else = 'Low-Value')
price_threshold = data['PRICE'].median()
data['CLASSIFICATION'] = data['PRICE'].apply(lambda x: 'High-Value' if x > price_threshold else 'Low-Value')

# Save the classified dataset
classified_file_path = 'Revised_Classified.csv'  # Replace with your desired file path
data.to_csv(classified_file_path, index=False)

# Handling missing values (if any)
data = data.dropna(subset=['BEDS', 'BATH', 'PROPERTYSQFT'])

# Converting data types if necessary
data['BEDS'] = data['BEDS'].astype(float)
data['BATH'] = data['BATH'].astype(float)
data['PROPERTYSQFT'] = data['PROPERTYSQFT'].astype(float)

# Selecting features and target variable
X = data[['BEDS', 'BATH', 'PROPERTYSQFT']]
y = data['CLASSIFICATION']

# Splitting the dataset into training and test sets
# (stratify=y keeps the High-Value/Low-Value proportions the same in both splits)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# SVM Classifier with RBF kernel
# C sets the regularization strength: smaller C means stronger regularization
svm_classifier = SVC(kernel='rbf', C=1.0)
svm_classifier.fit(X_train_scaled, y_train)
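The model evaluation step described above could look roughly like the following sketch. Because the CSV is not available here, it builds a small synthetic table with the same column names (BEDS, BATH, PROPERTYSQFT, PRICE). The synthetic values, the price formula, and the demo names are all assumptions for illustration, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the housing data (hypothetical values and price formula).
rng = np.random.default_rng(42)
n_rows = 500
demo = pd.DataFrame({
    "BEDS": rng.integers(1, 7, size=n_rows).astype(float),
    "BATH": rng.integers(1, 5, size=n_rows).astype(float),
    "PROPERTYSQFT": rng.uniform(400, 5000, size=n_rows),
})
demo["PRICE"] = (
    150_000 * demo["BEDS"]
    + 100_000 * demo["BATH"]
    + 400 * demo["PROPERTYSQFT"]
    + rng.normal(0, 100_000, size=n_rows)
)

# Same labelling rule as above: strictly above the median is 'High-Value'.
demo_threshold = demo["PRICE"].median()
demo["CLASSIFICATION"] = np.where(
    demo["PRICE"] > demo_threshold, "High-Value", "Low-Value"
)

X_demo = demo[["BEDS", "BATH", "PROPERTYSQFT"]]
y_demo = demo["CLASSIFICATION"]
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split.
demo_scaler = StandardScaler()
Xd_train_scaled = demo_scaler.fit_transform(Xd_train)
Xd_test_scaled = demo_scaler.transform(Xd_test)

demo_clf = SVC(kernel="rbf", C=1.0)
demo_clf.fit(Xd_train_scaled, yd_train)

# Evaluate on the held-out test split.
yd_pred = demo_clf.predict(Xd_test_scaled)
labels = ["High-Value", "Low-Value"]
cm = confusion_matrix(yd_test, yd_pred, labels=labels)  # rows = true, columns = predicted
accuracy = demo_clf.score(Xd_test_scaled, yd_test)

print(cm)
print(classification_report(yd_test, yd_pred, labels=labels))
```

With the real data, the same last few lines would apply to svm_classifier, X_test_scaled, and y_test from the cell above. The confusion matrix shows which class the errors fall on, and the report adds per-class precision, recall, and F1.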


