## 2.2. (SVM, 25 points)

### Task Description
You are asked to write a program that constructs support vector machine models with different kernel functions and slack variables.

### Datasets
You are provided with the iris dataset. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. There are four features:
1. Sepal length in cm;
2. Sepal width in cm;
3. Petal length in cm;
4. Petal width in cm.

You need to use these features to classify each iris plant as one of the three possible types.

### What you should do
You should use the SVM function from the Python sklearn package, which provides various forms of SVM functions. For multiclass SVM, you should use the one vs rest strategy. You are recommended to use `sklearn.svm.SVC()` function. You can use numpy for vector manipulation. For the technical report, you should report the results required as mentioned below (e.g., training error, testing error, and so on).

1. **(2 points)** Split training set and test set. Split the data into a training set and a test set. The training set should contain 70% of the samples, while the test set should include 30%. The number of samples from each category in both the training and test sets should reflect this 70-30 split; for each category, the first 70% of the samples will form the training set, and the remaining 30% will form the test set. Ensure that the split maintains the original order of the data. You should report instance ids in the split training set and test set. The output format is as follows:

   **Q2.2.1 Split training set and test set:**
   Training set: xx  
   Test set: xx  
   You should fill up xx in the template. You should write ids for each set in the same line with comma separated, e.g. Training set: [1, 4, 19].

2. **(10 points)** Calculation using Standard SVM Model (Linear Kernel). Employ the standard SVM model with a linear kernel. Train your SVM on the split training dataset and validate it on the testing dataset. Calculate the classification error for both the training and testing datasets, output the weight vector w, the bias b, and the indices of support vectors (start with 0). Note that the scikit-learn package does not offer a function with hard margin, so we will simulate this using C = 1e5. You should first print out the total training error and testing error, where the error is:

   $
   \text{error} = \frac{\text{wrong predictions}}{\text{number of data}}
   $

   Then, print out the results for each class separately (note that you should calculate errors for each class separately in this part). You should also mention in your report which classes are linear separable with SVM without slack. The output format is as follows:

   **Q2.2.2 Calculation using Standard SVM Model:**
   ```{python}
   total training error: xx,  
   total testing error: xx,  
   class setosa:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class versicolor:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class virginica:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   Linear separable classes: xx
   ```

3. **(6 points)** Calculation using SVM with Slack Variables (Linear Kernel). For each C = 0.25 × t, where t = 1, 2, . . . , 4, train your SVM on the training dataset, and subsequently validate it on the testing dataset. Calculate the classification error for both the training and testing datasets, the weight vector w, the bias b, and the indices of support vectors, and the slack variable ζ of support vectors (you may compute it as max(0, 1 − y · f(X)). The output format is as follows:

   **Q2.2.3 Calculation using SVM with Slack Variables (C = 0.25 × t, where t = 1, . . . , 4):**
   ```{python}
   -------------------------------------------  
   C=0.25,  
   total training error: xx,  
   total testing error: xx,  
   class setosa:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   slack variable: xx,  
   class versicolor:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   slack variable: xx,  
   class virginica:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   slack variable: xx,  
   -------------------------------------------  
   C=0.5,  
   <... results for (C=0.5) ...>  
   -------------------------------------------  
   C=0.75,  
   <... results for (C=0.75) ...>  
   -------------------------------------------  
   C=1,  
   <... results for (C=1) ...>
   ```

4. **(7 points)** Calculation using SVM with Kernel Functions. Conduct experiments with different kernel functions for SVM without slack variable. Calculate the classification error for both the training and testing datasets, and the indices of support vectors for each kernel type:

   (a) 2nd-order Polynomial Kernel  
   (b) 3rd-order Polynomial Kernel  
   (c) Radial Basis Function Kernel with σ = 1  
   (d) Sigmoidal Kernel with σ = 1  

   The output format is as follows:

   **Q2.2.4 Calculation using SVM with Kernel Functions:**
   ```{python}
   -------------------------------------------  
   (a) 2nd-order Polynomial Kernel,  
   total training error: xx,  
   total testing error: xx,  
   class setosa:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class versicolor:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class virginica:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   -------------------------------------------  
   (b) 3rd-order Polynomial Kernel,  
   <... results for (b) ...>  
   -------------------------------------------  
   (c) Radial Basis Function Kernel with σ = 1,  
   <... results for (c) ...>  
   -------------------------------------------  
   (d) Sigmoidal Kernel with σ = 1,  
   <... results for (d) ...>
   ```

### Submission
Submit your executable code in a “HW1 yourID Q2.ipynb” Jupyter notebook (”.py” file is also acceptable). Indicate the corresponding question number in the comment for each cell, and ensure that your code can logically produce the required results for each question in the required format. Please note that you need to write clear comments and use appropriate function/variable names. Excessively unreadable code may result in point deductions.


---

### Library Initialization

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# For reproducibility
np.random.seed(3020)

# Define Constants
RANDOM_SEED = 3020
C_MARGIN = 1e5


### Load Data

In [2]:
iris = pd.read_excel('Classification iris.xlsx')

iris.head(5)

Unnamed: 0,instance_id,sepal length,sepal width,petal length,petal width,class
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


### 1. Train-Test Split
- while maintaining order

In [3]:
# # Predictor and Predicted variables
# X = iris.iloc[:, :-1]  # Features 
# y = iris.iloc[:, -1]   # Target variable 

# # 70-30 Train Test split with stratification
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=RANDOM_SEED, stratify=y)

# # Check the instance ids
# train_ids = X_train['instance_id'].tolist()  # Instance ids for training set
# test_ids = X_test['instance_id'].tolist()  # Instance ids for testing set

# print(f"Training set: {train_ids}")
# print(f"Length of training set: {len(train_ids)}")
# print()
# print(f"Test set: {test_ids}")
# print(f"Length of test set: {len(test_ids)}")


In [4]:
# Predictor and Predicted variables
X = iris.iloc[:, :-1]  # Features 
y = iris.iloc[:, -1]   # Target variable 

# 70-30 Train Test split with stratification
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=RANDOM_SEED, stratify=y)

# Reorder the training and test sets based on the instance_id column
X_train = X_train.sort_values(by='instance_id').reset_index(drop=True)
X_test = X_test.sort_values(by='instance_id').reset_index(drop=True)
y_train = y_train.sort_values(ascending=True).reset_index(drop=True)
y_test = y_test.sort_values(ascending=True).reset_index(drop=True)

# Check the instance ids
train_ids = X_train['instance_id'].tolist()  # Instance ids for training set
test_ids = X_test['instance_id'].tolist()  # Instance ids for testing set

print(f"Training set: {train_ids}")
print(f"Length of training set: {len(train_ids)}")
print()
print(f"Test set: {test_ids}")
print(f"Length of test set: {len(test_ids)}")

Training set: [1, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 18, 22, 25, 26, 27, 28, 29, 30, 33, 35, 36, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 52, 54, 55, 57, 58, 59, 60, 62, 63, 67, 68, 69, 70, 71, 72, 73, 76, 78, 80, 82, 83, 84, 85, 86, 87, 88, 89, 92, 93, 94, 95, 96, 97, 98, 100, 101, 102, 103, 105, 106, 107, 108, 110, 112, 113, 117, 118, 119, 120, 122, 123, 126, 127, 128, 129, 130, 131, 132, 133, 136, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150]
Length of training set: 105

Test set: [2, 3, 9, 17, 19, 20, 21, 23, 24, 31, 32, 34, 37, 45, 50, 51, 53, 56, 61, 64, 65, 66, 74, 75, 77, 79, 81, 90, 91, 99, 104, 109, 111, 114, 115, 116, 121, 124, 125, 134, 135, 139, 142, 145, 148]
Length of test set: 45


### 2. Standard SVM Calculation
- Linear Kernel

In [5]:
# Train svm
svm_2 = svm.SVC(kernel='linear', C=C_MARGIN)
svm_2.fit(X_train, y_train)

# Predict
y_train_pred = svm_2.predict(X_train)
y_test_pred = svm_2.predict(X_test)

# Total classification errors
total_training_error = np.mean(y_train_pred != y_train)
total_testing_error = np.mean(y_test_pred != y_test)

# Print total errors
print(f"Q2.2.2 Calculation using Standard SVM Model:")
print(f"Total training error: {total_training_error:.4f}, total testing error: {total_testing_error:.4f}\n")

# Get unique class labels
class_labels = np.unique(y_train)

# Get Xtrain vectors
xtrain_sv = iris.loc[iris['instance_id'].isin(X_train['instance_id'])]

# Initialize list of linearly separable classes
lin_sep_classes = []

# Results for each class
for label in class_labels:
    # Get indices of the current class
    train_class_indices = np.where(y_train == label)[0]
    test_class_indices = np.where(y_test == label)[0]

    # Predictions for the current class
    train_class_pred = y_train_pred[train_class_indices]
    test_class_pred = y_test_pred[test_class_indices]

    # Calculate training and testing errors for the current class
    train_class_error = np.mean(train_class_pred != label)
    test_class_error = np.mean(test_class_pred != label)

    # Get support vectors for the current class
    support_vector_indices = X_train.iloc[svm_2.support_]
    support_vectors = xtrain_sv.loc[xtrain_sv['instance_id'].isin(support_vector_indices['instance_id'])]
    sv_instance_ids = support_vectors.loc[support_vectors['class'] == label]['instance_id'].tolist()

    # Check if the current class is linearly separable
    if train_class_error == 0 and test_class_error == 0:
        lin_sep_classes.append(label)

    # Print results for the current class
    print(f"Class {label[5:]}:")
    print(f"  Training error: {train_class_error:.4f}, Testing error: {test_class_error:.4f}")
    print(f"  w: {svm_2.coef_[0]}, b: {svm_2.intercept_[0]}")
    print(f"  Support vector indices: {train_class_indices}\n")
    print()


print(f"Linear separable classes: {lin_sep_classes}")

Q2.2.2 Calculation using Standard SVM Model:
Total training error: 0.0000, total testing error: 0.0000

Class setosa:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.28368952 -0.10401949  0.04728159 -0.28368952 -0.12293212], b: 15.727268361785079
  Support vector indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34]


Class versicolor:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.28368952 -0.10401949  0.04728159 -0.28368952 -0.12293212], b: 15.727268361785079
  Support vector indices: [35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
 59 60 61 62 63 64 65 66 67 68 69]


Class virginica:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.28368952 -0.10401949  0.04728159 -0.28368952 -0.12293212], b: 15.727268361785079
  Support vector indices: [ 70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87
  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 1

### 3. SVM calculation with slack variables 
- linear kernel

In [6]:
# Train SVM with slack variables for different C values
C_values = [0.25 * t for t in range(1, 5)]  # C = 0.25, 0.5, 0.75, 1.0

for C in C_values:
    # Train SVM model
    svm_model = svm.SVC(kernel='linear', C=C)
    svm_model.fit(X_train, y_train)

    # Predict
    y_train_pred = svm_model.predict(X_train)
    y_test_pred = svm_model.predict(X_test)

    # Total classification errors
    total_training_error = np.mean(y_train_pred != y_train)
    total_testing_error = np.mean(y_test_pred != y_test)

    # Results for each class
    print(f"Q2.2.3 Calculation using SVM with Slack Variables (C = {C}):")
    print(f"Total training error: {total_training_error:.4f}, total testing error: {total_testing_error:.4f}")
    print()

    # Get unique class labels
    class_labels = np.unique(y_train)

    for label in class_labels:
        # Get indices of the current class
        train_class_indices = np.where(y_train == label)[0]
        test_class_indices = np.where(y_test == label)[0]

        # Predictions for the current class
        train_class_pred = y_train_pred[train_class_indices]
        test_class_pred = y_test_pred[test_class_indices]

        # Calculate training and testing errors for the current class
        train_class_error = np.mean(train_class_pred != label)
        test_class_error = np.mean(test_class_pred != label)

        # Encode the class label
        target_label = label
        label_mapping = {target_label: 1}
        for l in class_labels:
            if l != target_label:
                label_mapping[l] = -1

        # Apply the mapping to y_train
        y_train_mapped = y_train.map(label_mapping) 

        # Calculate slack variable ζ for support vectors
        decision_values = np.dot(svm_model.support_, svm_model.coef_[0]) + svm_model.intercept_[0]
        slack_variables = np.maximum(0, 1 - y_train_mapped[svm_model.support_] * decision_values)

        # empty the mapping dictionary
        label_mapping = {}

        # Print results for the current class
        print(f"Class {label}:")
        print(f"  Training error: {train_class_error:.4f}, Testing error: {test_class_error:.4f}")
        print(f"  w: {svm_model.coef_[0]}, b: {svm_model.intercept_[0]}")
        print(f"  Support vector indices: {svm_model.support_}")
        print(f"  Slack variable ζ:")
        print(f"  {slack_variables}")
        print()

    print("---------------------------------------------------------\n")


Q2.2.3 Calculation using SVM with Slack Variables (C = 0.25):
Total training error: 0.0000, total testing error: 0.0000

Class Iris-setosa:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.28368952 -0.10401949  0.04728159 -0.28368952 -0.12293212], b: 15.727268361785079
  Support vector indices: [34 35 69 70 71]
  Slack variable ζ:
  34    23.882875
35     0.000000
69     0.000000
70     0.000000
71     0.000000
Name: class, dtype: float64

Class Iris-versicolor:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.28368952 -0.10401949  0.04728159 -0.28368952 -0.12293212], b: 15.727268361785079
  Support vector indices: [34 35 69 70 71]
  Slack variable ζ:
  34     0.000000
35    23.882875
69    23.882875
70     0.000000
71     0.000000
Name: class, dtype: float64

Class Iris-virginica:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.28368952 -0.10401949  0.04728159 -0.28368952 -0.12293212], b: 15.727268361785079
  Support vector indices: [34 35 69 70 71]
  Slack va

---

### 4. Calculation of SVM using Kernal Functions

(a) 2nd-order Polynomial Kernel  
(b) 3rd-order Polynomial Kernel  
(c) Radial Basis Function Kernel with σ = 1  
(d) Sigmoidal Kernel with σ = 1  

In [23]:
# Function to calculate errors and support vectors based on kernel types
def calculate_svm(kernel, degree=3, C=1e5, gamma=0.0):
    
    if gamma == 1.0:
        svm = SVC(kernel=kernel, C=C, gamma=gamma)
    else:
        svm = SVC(kernel=kernel, C=C, degree=degree)
        
    svm.fit(X_train, y_train)
    
    # Predictions
    y_train_pred = svm.predict(X_train)
    y_test_pred = svm.predict(X_test)
    
    # Calculate errors
    train_error = 1 - accuracy_score(y_train, y_train_pred)
    test_error = 1 - accuracy_score(y_test, y_test_pred)
    
    # Get support vectors
    support_vectors_indices = svm.support_
    
    return train_error, test_error, None, None, support_vectors_indices

# Results dictionary
results = {}

# (a) 2nd-order Polynomial Kernel
results['2nd-order Polynomial Kernel'] = calculate_svm(kernel='poly', degree=2)

# (b) 3rd-order Polynomial Kernel
results['3rd-order Polynomial Kernel'] = calculate_svm(kernel='poly', degree=3)

# (c) Radial Basis Function Kernel with σ = 1
results['Radial Basis Function Kernel'] = calculate_svm(kernel='rbf', gamma=1.0)

# (d) Sigmoidal Kernel with σ = 1
results['Sigmoidal Kernel'] = calculate_svm(kernel='sigmoid', gamma=1.0)

# Output results
for kernel_type, (train_error, test_error, w, b, support_vectors) in results.items():
    print(f"-------------------------------------------")
    print(f"({kernel_type})")
    print(f"total training error: {train_error:.4f},")
    print(f"total testing error: {test_error:.4f},")
    print()
    
    # For each class, we will output the errors and support vectors
    for class_label in np.unique(y):
        class_indices = np.where(y_train == class_label)[0]
        class_train_error = np.mean(y_train[class_indices] != y_train_pred[class_indices])
        class_test_error = np.mean(y_test[y_test == class_label] != y_test_pred[y_test == class_label])
        
        print(f"class {class_label}:")
        print(f"training error: {class_train_error:.4f}, testing error: {class_test_error:.4f}")
        print()
        
        if w is not None and b is not None:
            print(f"w: {w}, b: {b},")
            print()
        
        print(f"support vector indices: {support_vectors.tolist() if support_vectors is not None else 'N/A'},")
        print()


-------------------------------------------
(2nd-order Polynomial Kernel)
total training error: 0.0000,
total testing error: 0.0000,

class Iris-setosa:
training error: 0.0000, testing error: 0.0000

support vector indices: [34, 35, 69, 70, 71],

class Iris-versicolor:
training error: 0.0000, testing error: 0.0000

support vector indices: [34, 35, 69, 70, 71],

class Iris-virginica:
training error: 0.0000, testing error: 0.0000

support vector indices: [34, 35, 69, 70, 71],

-------------------------------------------
(3rd-order Polynomial Kernel)
total training error: 0.0000,
total testing error: 0.0000,

class Iris-setosa:
training error: 0.0000, testing error: 0.0000

support vector indices: [34, 35, 69, 70],

class Iris-versicolor:
training error: 0.0000, testing error: 0.0000

support vector indices: [34, 35, 69, 70],

class Iris-virginica:
training error: 0.0000, testing error: 0.0000

support vector indices: [34, 35, 69, 70],

-------------------------------------------
(Radial 

In [22]:
0.0 == None

False