1)	The Iris dataset is a classic example for demonstrating classification algorithms. It consists of 150 samples of iris flowers belonging to three species: Setosa, Versicolor, and Virginica, with four input features (sepal and petal length/width). Use SVC from sklearn.svm on the Iris dataset and follow the steps below:

a. Load the dataset and perform train–test split (80:20).

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)


X_train shape: (120, 4)
X_test shape: (30, 4)
y_train shape: (120,)
y_test shape: (30,)


b. Train three different SVM models using the following kernels:
Linear, Polynomial (degree=3), RBF

In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

svm_linear = SVC(kernel="linear")
svm_poly = SVC(kernel="poly", degree=3)
svm_rbf = SVC(kernel="rbf")

svm_linear.fit(X_train, y_train)
svm_poly.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

print("Models trained: Linear, Polynomial (degree=3), RBF")


Models trained: Linear, Polynomial (degree=3), RBF


c. Evaluate each model using:
    •	Accuracy
    •	Precision
    •	Recall
    •	F1-Score


In [3]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear": SVC(kernel="linear"),
    "Polynomial": SVC(kernel="poly", degree=3),
    "RBF": SVC(kernel="rbf")
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred, average="macro")
    rec = recall_score(y_test, y_pred, average="macro")
    f1 = f1_score(y_test, y_pred, average="macro")
    print("\nKernel:", name)
    print("Accuracy:", acc)
    print("Precision:", prec)
    print("Recall:", rec)
    print("F1-Score:", f1)



Kernel: Linear
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0

Kernel: Polynomial
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0

Kernel: RBF
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1-Score: 1.0


d. Display the confusion matrix for each kernel.

In [4]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear": SVC(kernel="linear"),
    "Polynomial": SVC(kernel="poly", degree=3),
    "RBF": SVC(kernel="rbf")
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    print("\nKernel:", name)
    print("Confusion Matrix:\n", cm)



Kernel: Linear
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Kernel: Polynomial
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Kernel: RBF
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]


e. Identify which kernel performs the best and why.

In [5]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear": SVC(kernel="linear"),
    "Polynomial": SVC(kernel="poly", degree=3),
    "RBF": SVC(kernel="rbf")
}

f1_scores = {}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    f1 = f1_score(y_test, y_pred, average="macro")
    f1_scores[name] = f1
    print("Kernel:", name, "| F1-Score:", f1)

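# Note: when several kernels tie, max() returns the first key in insertion order,
# so a tie is reported as "Linear" only because it was added to the dict first.
best_kernel = max(f1_scores, key=f1_scores.get)
print("\nBest kernel based on F1-Score:", best_kernel)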


Kernel: Linear | F1-Score: 1.0
Kernel: Polynomial | F1-Score: 1.0
Kernel: RBF | F1-Score: 1.0

Best kernel based on F1-Score: Linear
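
All three kernels score 1.0 on this 30-sample test set, so the "Linear" printed above reflects dictionary order rather than a real difference in performance. One way to separate the kernels is to score them over several splits instead of one. The sketch below is an assumption about how that could be done, not part of the original exercise: it uses 5-fold stratified cross-validation with macro F1, and the names `kernels`, `cv`, and `cv_means` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same three kernels as in the cells above.
kernels = {
    "Linear": SVC(kernel="linear"),
    "Polynomial": SVC(kernel="poly", degree=3),
    "RBF": SVC(kernel="rbf"),
}

# Assumption: 5 stratified folds, shuffled with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_means = {}
for name, model in kernels.items():
    # Macro F1 matches the averaging used in part (c).
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    cv_means[name] = scores.mean()
    print(f"{name}: mean macro-F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

If the cross-validated means are also close, the simplest model (the linear kernel) is a reasonable choice, since Iris is known to be nearly linearly separable in its petal features.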


2)	SVM models are highly sensitive to the scale of input features. When features have different ranges, the algorithm may incorrectly assign higher importance to variables with larger magnitudes, affecting the placement of the separating hyperplane. Feature scaling ensures that all attributes contribute equally to distance-based computations, which is especially crucial for kernels like RBF or polynomial.

A.  Use the Breast Cancer dataset from sklearn.datasets.load_breast_cancer.


In [6]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("X_train:", X_train.shape)
print("X_test:", X_test.shape)


X_train: (455, 30)
X_test: (114, 30)


B. Train an SVM (RBF kernel) model with and without feature scaling (StandardScaler). Compare both results using:
    •	Training accuracy
    •	Testing accuracy

In [7]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = SVC(kernel="rbf")
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print("Training accuracy without scaling:", train_acc)
print("Testing accuracy without scaling:", test_acc)


Training accuracy without scaling: 0.9142857142857143
Testing accuracy without scaling: 0.9473684210526315


Code for SVM WITH Feature Scaling (StandardScaler)

In [8]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = SVC(kernel="rbf")
model.fit(X_train_scaled, y_train)

train_acc = model.score(X_train_scaled, y_train)
test_acc = model.score(X_test_scaled, y_test)

print("Training accuracy with scaling:", train_acc)
print("Testing accuracy with scaling:", test_acc)


Training accuracy with scaling: 0.989010989010989
Testing accuracy with scaling: 0.9824561403508771
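
The scaling and fitting steps in the cell above can also be bundled into a single estimator, so the scaler is always fitted on training data only and cannot leak test statistics. The sketch below is one possible arrangement using `make_pipeline`; the variable names `scaled_svm`, `pipe_train_acc`, and `pipe_test_acc` are illustrative and not from the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Same split as the cells above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() fits the scaler on X_train and then trains the SVC on the scaled data;
# score() applies the already-fitted scaler before predicting.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_svm.fit(X_train, y_train)

pipe_train_acc = scaled_svm.score(X_train, y_train)
pipe_test_acc = scaled_svm.score(X_test, y_test)

print("Pipeline training accuracy:", pipe_train_acc)
print("Pipeline testing accuracy:", pipe_test_acc)
```

A pipeline like this can be passed directly to `cross_val_score` or `GridSearchCV`, which refit the scaler inside every fold.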


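Effect of Feature Scaling on SVM (RBF Kernel)
The RBF kernel depends on the Euclidean distance between samples, so features should be on comparable scales.
Without scaling, features with large ranges dominate the distance calculation, and the model reaches only about 0.914 training and 0.947 testing accuracy.
With StandardScaler, every feature has zero mean and unit variance on the training set, so all features contribute comparably to the kernel.

As a result:
Training accuracy increases from about 0.914 to 0.989
Testing accuracy increases from about 0.947 to 0.982
The unscaled model underfits (its training accuracy is below its testing accuracy), and scaling removes that underfit; the results above show no overfitting to reduce

On datasets whose features span very different ranges, as here, an RBF-kernel SVM usually performs better with scaling than without.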
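
The statement that large-range features dominate the distance calculation can be checked numerically. For two samples drawn at random, the expected squared difference in a feature is twice that feature's variance, so each feature's share of the expected squared Euclidean distance equals its variance divided by the total variance. The sketch below applies that to the Breast Cancer data; the helper `distance_share` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = data.data
names = data.feature_names


def distance_share(features):
    """Fraction of the expected squared Euclidean distance due to each feature."""
    variances = features.var(axis=0)
    return variances / variances.sum()


raw_share = distance_share(X)
scaled_share = distance_share(StandardScaler().fit_transform(X))

# Show the three features that dominate the unscaled distance.
top = np.argsort(raw_share)[::-1][:3]
for j in top:
    print(f"{names[j]}: raw share = {raw_share[j]:.3f}, "
          f"scaled share = {scaled_share[j]:.3f}")
```

After standardization every feature has unit variance, so each of the 30 features accounts for the same 1/30 share of the distance.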