## 2.2. (SVM, 25 points)

### Task Description
You are asked to write a program that constructs support vector machine models with different kernel functions and slack variables.

### Datasets
You are provided with the iris dataset. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. There are four features:
1. Sepal length in cm;
2. Sepal width in cm;
3. Petal length in cm;
4. Petal width in cm.

You need to use these features to classify each iris plant as one of the three possible types.

### What you should do
You should use the SVM implementation from the Python scikit-learn package, which provides several SVM variants. For multiclass SVM, use the one-vs-rest strategy. The recommended class is `sklearn.svm.SVC`. You may use numpy for vector manipulation. In the technical report, report the results required below (e.g., training error, testing error, and so on).

1. **(2 points)** Split training set and test set. Split the data into a training set and a test set. The training set should contain 70% of the samples, while the test set should include 30%. The number of samples from each category in both the training and test sets should reflect this 70-30 split; for each category, the first 70% of the samples will form the training set, and the remaining 30% will form the test set. Ensure that the split maintains the original order of the data. You should report instance ids in the split training set and test set. The output format is as follows:

   **Q2.2.1 Split training set and test set:**
   Training set: xx  
   Test set: xx  
   You should fill in xx in the template. Write the ids for each set on a single line, comma-separated, e.g. Training set: [1, 4, 19].

2. **(10 points)** Calculation using Standard SVM Model (Linear Kernel). Employ the standard SVM model with a linear kernel. Train your SVM on the split training dataset and validate it on the testing dataset. Calculate the classification error for both the training and testing datasets, output the weight vector w, the bias b, and the indices of support vectors (start with 0). Note that the scikit-learn package does not offer a function with hard margin, so we will simulate this using C = 1e5. You should first print out the total training error and testing error, where the error is:

   $$
   \text{error} = \frac{\text{number of wrong predictions}}{\text{number of samples}}
   $$

   Then, print out the results for each class separately (note that you should calculate the errors for each class separately in this part). You should also state in your report which classes are linearly separable with an SVM without slack. The output format is as follows:

   **Q2.2.2 Calculation using Standard SVM Model:**
   ```{python}
   total training error: xx,  
   total testing error: xx,  
   class setosa:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class versicolor:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class virginica:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   Linear separable classes: xx
   ```

3. **(6 points)** Calculation using SVM with Slack Variables (Linear Kernel). For each C = 0.25 × t, where t = 1, 2, ..., 4, train your SVM on the training dataset and then validate it on the testing dataset. Calculate the classification error for both the training and testing datasets, the weight vector w, the bias b, the indices of the support vectors, and the slack variable ζ of each support vector (you may compute it as max(0, 1 − y · f(X))). The output format is as follows:

   **Q2.2.3 Calculation using SVM with Slack Variables (C = 0.25 × t, where t = 1, . . . , 4):**
   ```{python}
   -------------------------------------------  
   C=0.25,  
   total training error: xx,  
   total testing error: xx,  
   class setosa:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   slack variable: xx,  
   class versicolor:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   slack variable: xx,  
   class virginica:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   slack variable: xx,  
   -------------------------------------------  
   C=0.5,  
   <... results for (C=0.5) ...>  
   -------------------------------------------  
   C=0.75,  
   <... results for (C=0.75) ...>  
   -------------------------------------------  
   C=1,  
   <... results for (C=1) ...>
   ```

4. **(7 points)** Calculation using SVM with Kernel Functions. Conduct experiments with different kernel functions for SVM without slack variables. Calculate the classification error for both the training and testing datasets, and report the indices of the support vectors for each kernel type:

   (a) 2nd-order Polynomial Kernel  
   (b) 3rd-order Polynomial Kernel  
   (c) Radial Basis Function Kernel with σ = 1  
   (d) Sigmoidal Kernel with σ = 1  

   The output format is as follows:

   **Q2.2.4 Calculation using SVM with Kernel Functions:**
   ```{python}
   -------------------------------------------  
   (a) 2nd-order Polynomial Kernel,  
   total training error: xx,  
   total testing error: xx,  
   class setosa:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class versicolor:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   class virginica:  
   training error: xx,  
   testing error: xx,  
   w: xx,  
   b: xx,  
   support vector indices: xx,  
   -------------------------------------------  
   (b) 3rd-order Polynomial Kernel,  
   <... results for (b) ...>  
   -------------------------------------------  
   (c) Radial Basis Function Kernel with σ = 1,  
   <... results for (c) ...>  
   -------------------------------------------  
   (d) Sigmoidal Kernel with σ = 1,  
   <... results for (d) ...>
   ```
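Part (c) specifies the RBF kernel by its width σ, whereas `sklearn.svm.SVC` is parameterized by `gamma` in K(x, x') = exp(−gamma · ||x − x'||²). The sketch below shows the conversion, assuming the common convention K(x, x') = exp(−||x − x'||² / (2σ²)). If the course defines the kernel without the factor of 2, then gamma = 1/σ² instead. All helper names are hypothetical.

```python
import math

def sigma_to_gamma(sigma):
    """gamma for exp(-gamma * d2), ASSUMING K = exp(-d2 / (2 * sigma**2))."""
    return 1.0 / (2.0 * sigma ** 2)

def squared_distance(x, z):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, z))

def rbf_from_sigma(x, z, sigma):
    """RBF kernel written in terms of the width sigma."""
    return math.exp(-squared_distance(x, z) / (2.0 * sigma ** 2))

def rbf_from_gamma(x, z, gamma):
    """RBF kernel written in terms of gamma, as SVC parameterizes it."""
    return math.exp(-gamma * squared_distance(x, z))

# Two iris feature vectors (sepal length, sepal width, petal length, petal width)
x = [5.1, 3.5, 1.4, 0.2]
z = [4.9, 3.0, 1.4, 0.2]

gamma = sigma_to_gamma(1.0)
print(gamma)  # 0.5
print(abs(rbf_from_sigma(x, z, 1.0) - rbf_from_gamma(x, z, gamma)) < 1e-12)  # True
```

Under this assumption, σ = 1 corresponds to `gamma=0.5`. Whichever convention is used should be stated in the report.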

### Submission
Submit your executable code in a "HW1 yourID Q2.ipynb" Jupyter notebook (a ".py" file is also acceptable). Indicate the corresponding question number in a comment for each cell, and ensure that your code produces the required results for each question in the required format. Please write clear comments and use appropriate function and variable names. Unreadable code may result in point deductions.


---

### Library Initialization

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# For reproducibility
np.random.seed(3020)

# Define Constants
RANDOM_SEED = 3020
C_MARGIN = 1e5


### Load Data

In [2]:
iris = pd.read_excel('Classification iris.xlsx')

iris.head(5)

|   | instance_id | sepal length | sepal width | petal length | petal width | class |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### 1. Train-Test Split
- while maintaining order

In [4]:
# Predictor and Predicted variables
X = iris.iloc[:, :-1]  # Features 
y = iris.iloc[:, -1]   # Target variable 

# 70-30 stratified random split (per-class proportions are preserved, but rows
# are drawn at random rather than taken as the first 70% of each class)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=RANDOM_SEED, stratify=y)

# Reorder both sets by instance_id, keeping every label aligned with its row
X_train = X_train.sort_values(by='instance_id')
X_test = X_test.sort_values(by='instance_id')
y_train = y_train.loc[X_train.index].reset_index(drop=True)
y_test = y_test.loc[X_test.index].reset_index(drop=True)
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)

# Check the instance ids
train_ids = X_train['instance_id'].tolist()  # Instance ids for training set
test_ids = X_test['instance_id'].tolist()  # Instance ids for testing set

print(f"Training set: {train_ids}")
print(f"Length of training set: {len(train_ids)}")
print()
print(f"Test set: {test_ids}")
print(f"Length of test set: {len(test_ids)}")

Training set: [1, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 18, 22, 25, 26, 27, 28, 29, 30, 33, 35, 36, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 52, 54, 55, 57, 58, 59, 60, 62, 63, 67, 68, 69, 70, 71, 72, 73, 76, 78, 80, 82, 83, 84, 85, 86, 87, 88, 89, 92, 93, 94, 95, 96, 97, 98, 100, 101, 102, 103, 105, 106, 107, 108, 110, 112, 113, 117, 118, 119, 120, 122, 123, 126, 127, 128, 129, 130, 131, 132, 133, 136, 137, 138, 140, 141, 143, 144, 146, 147, 149, 150]
Length of training set: 105

Test set: [2, 3, 9, 17, 19, 20, 21, 23, 24, 31, 32, 34, 37, 45, 50, 51, 53, 56, 61, 64, 65, 66, 74, 75, 77, 79, 81, 90, 91, 99, 104, 109, 111, 114, 115, 116, 121, 124, 125, 134, 135, 139, 142, 145, 148]
Length of test set: 45
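
The task statement asks for a deterministic split: the first 70% of each class, in file order. The cell above instead draws a stratified random split, which is why the printed ids are not contiguous. The sketch below shows the order-preserving variant on a synthetic stand-in for the id and label columns. The function name `ordered_split` is hypothetical, and this is not the split behind the results reported in this notebook.

```python
from collections import defaultdict

def ordered_split(instance_ids, labels, train_fraction=0.7):
    """Per class, send the first `train_fraction` of ids (in file order) to train."""
    ids_by_class = defaultdict(list)
    for instance_id, label in zip(instance_ids, labels):
        ids_by_class[label].append(instance_id)

    train_ids, test_ids = [], []
    for ids in ids_by_class.values():  # dicts keep first-seen class order
        n_train = round(train_fraction * len(ids))
        train_ids.extend(ids[:n_train])
        test_ids.extend(ids[n_train:])
    return train_ids, test_ids

# Stand-in for the iris layout: ids 1..150, 50 consecutive rows per class
instance_ids = list(range(1, 151))
labels = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50

train_ids, test_ids = ordered_split(instance_ids, labels)
print(len(train_ids), len(test_ids))  # 105 45
print(train_ids[:3], test_ids[:3])    # [1, 2, 3] [36, 37, 38]
```

With 50 rows per class this yields ids 1-35, 51-85 and 101-135 for training and the remaining ids for testing.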


In [5]:
# drop the instance_id column for training and test sets
X_train = X_train.drop(columns='instance_id')
X_test = X_test.drop(columns='instance_id')

# Label encode the target variable in y_train and y_test
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)
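
The per-class models in the following sections are trained on binary one-vs-rest targets derived from these encoded labels. The sketch below shows that mapping in plain Python. The helper name is hypothetical, and the ±1 coding shown here is the form that the slack formula max(0, 1 − y · f(x)) expects.

```python
def one_vs_rest_targets(encoded_labels, positive_class):
    """Map encoded labels to +1 for `positive_class` and -1 for every other class."""
    return [1 if label == positive_class else -1 for label in encoded_labels]

# LabelEncoder sorts class names, so here 0 = setosa, 1 = versicolor, 2 = virginica
encoded = [0, 0, 1, 2, 1, 2]
for class_index in range(3):
    print(class_index, one_vs_rest_targets(encoded, class_index))
# 0 [1, 1, -1, -1, -1, -1]
# 1 [-1, -1, 1, -1, 1, -1]
# 2 [-1, -1, -1, 1, -1, 1]
```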



### 2. Standard SVM Calculation
- Linear Kernel

In [6]:
# Train svm
svm_2 = svm.SVC(kernel='linear', C=C_MARGIN)
svm_2.fit(X_train, y_train)

# Predict
y_train_pred = svm_2.predict(X_train)
y_test_pred = svm_2.predict(X_test)

# Total classification errors
total_training_error = 1 - accuracy_score(y_train, y_train_pred)
total_testing_error = 1 - accuracy_score(y_test, y_test_pred)

# Print total errors
print(f"Q2.2.2 Calculation using Standard SVM Model:")
print(f"Total training error: {total_training_error:.4f}, total testing error: {total_testing_error:.4f}\n")

# Get unique class labels
class_labels = le.classes_

# Initialize list of linearly separable classes
lin_sep_classes = []

# Results for each class
for i, label in enumerate(class_labels):

    # map binary y_train and y_test based on current class vs rest
    y_train_mapped = (y_train == i).astype(int)
    y_test_mapped = (y_test == i).astype(int)

    # Train a binary SVM model for the current class (for-loop local)
    svm2_binary = svm.SVC(kernel='linear', C=C_MARGIN)
    svm2_binary.fit(X_train, y_train_mapped)

    # Predictions for the current class
    train_class_pred = svm2_binary.predict(X_train)
    test_class_pred = svm2_binary.predict(X_test)

    # Calculate training and testing errors for the current class
    train_class_error = 1 - accuracy_score(y_train_mapped, train_class_pred)
    test_class_error = 1 - accuracy_score(y_test_mapped, test_class_pred)

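    # Treat the class as linearly separable only if the near-hard-margin model
    # makes no errors on either the training set or the test set
    if train_class_error == 0 and test_class_error == 0:
        lin_sep_classes.append(label)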

    # Print results for the current class
    print(f"Class {label}:")
    print(f"  Training error: {train_class_error:.4f}, Testing error: {test_class_error:.4f}")
    print(f"  w: {svm2_binary.coef_[0]}, b: {svm2_binary.intercept_[0]}")
    print(f"  Support vector indices: {svm2_binary.support_}\n")
    print()


print(f"Linear separable classes: {lin_sep_classes}")

Q2.2.2 Calculation using Standard SVM Model:
Total training error: 0.0000, total testing error: 0.0444

Class Iris-setosa:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.08757428  0.44848304 -0.85088498 -0.44006999], b: 1.600566373049051
  Support vector indices: [39 15 28]


Class Iris-versicolor:
  Training error: 0.2571, Testing error: 0.2667
  w: [-0.99913856 -3.1328945   1.73950159 -3.13977114], b: 11.166458055426018
  Support vector indices: [  1   6  10  15  16  22  24  28  71  75  76  78  80  82  83  84  85  87
  90  91  95  96  99 100 101 104  35  37  38  39  40  41  42  45  48  51
  52  53  54  55  56  57  58  59  61  62  63  64  66  68  69]


Class Iris-virginica:
  Training error: 0.0000, Testing error: 0.0444
  w: [ -1.16007155 -15.28442522  11.40002276  45.05463381], b: -82.99849854088012
  Support vector indices: [56 75 83 90]


Linear separable classes: ['Iris-setosa']
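
Whether a hyperplane separates one class from the rest can be checked directly: every point must satisfy y_i (w · x_i + b) > 0, which is equivalent to zero classification error for that binary problem. The sketch below uses toy 2-D data. The points, `w`, and `b` are made up for illustration and are not taken from the results above.

```python
def decision_value(x, w, b):
    """f(x) = w . x + b for a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def separates(points, targets, w, b):
    """True if the hyperplane classifies every point correctly (targets in {-1, +1})."""
    return all(y * decision_value(x, w, b) > 0 for x, y in zip(points, targets))

points = [(1.0, 1.0), (2.0, 0.5), (4.0, 4.0), (5.0, 3.5)]
targets = [-1, -1, 1, 1]
w, b = (1.0, 1.0), -5.0  # hypothetical hyperplane x1 + x2 = 5

print(separates(points, targets, w, b))        # True
print(separates(points, [-1, 1, 1, 1], w, b))  # False
```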


### 3. SVM calculation with slack variables 
- linear kernel

In [7]:
# Slack variable function
def compute_slack(X, y, w, b):
    f_x = np.dot(X, w) + b  # Compute f(X) = w.X + b
    slack = np.maximum(0, 1 - y * f_x)  # Compute slack variable as max(0, 1 - y * f(X))
    return slack
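
The value of ζ indicates where a point lies relative to the margin, provided the labels are coded as −1/+1. The sketch below uses a made-up hyperplane to illustrate the three cases. The helper `hinge_slack` is a hypothetical scalar counterpart of `compute_slack`.

```python
def hinge_slack(x, y, w, b):
    """zeta = max(0, 1 - y * (w . x + b)), with y in {-1, +1}."""
    f_x = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * f_x)

w, b = (1.0, 0.0), 0.0  # hypothetical hyperplane x1 = 0, margins at x1 = +/-1

examples = [
    ((2.0, 0.0), 1),   # beyond the margin               -> zeta = 0
    ((1.0, 0.0), 1),   # exactly on the margin           -> zeta = 0
    ((0.5, 0.0), 1),   # inside the margin, correct side -> 0 < zeta < 1
    ((-0.5, 0.0), 1),  # wrong side of the hyperplane    -> zeta > 1
]
slacks = [hinge_slack(x, y, w, b) for x, y in examples]
print(slacks)  # [0.0, 0.0, 0.5, 1.5]
```

A slack above 1 therefore marks a misclassified training point, while a value between 0 and 1 marks a correctly classified point that falls inside the margin.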

In [10]:
# Train SVM with slack variables for different C values
C_values = [0.25 * t for t in range(1, 5)]  # C = 0.25, 0.5, 0.75, 1.0

for C in C_values:
    # Train SVM model
    svm3 = svm.SVC(kernel='linear', C=C)
    svm3.fit(X_train, y_train)

    # Predict
    y_train_pred = svm3.predict(X_train)
    y_test_pred = svm3.predict(X_test)

    # Total classification errors
    total_training_error = 1 - accuracy_score(y_train, y_train_pred)
    total_testing_error = 1 - accuracy_score(y_test, y_test_pred)
    
    # Results for each class
    print(f"Q2.2.3 Calculation using SVM with Slack Variables (C = {C}):")
    print(f"Total training error: {total_training_error:.4f}, total testing error: {total_testing_error:.4f}")
    print()


    for i, label in enumerate(class_labels):

        # map binary y_train and y_test based on current class vs rest
        y_train_mapped = (y_train == i).astype(int) * 2 - 1
        y_test_mapped = (y_test == i).astype(int) * 2 - 1

        # Train a binary SVM model for the current class (for-loop local)
        svm3_binary = svm.SVC(kernel='linear', C=C)
        svm3_binary.fit(X_train, y_train_mapped)

        # Predictions for the current class
        train_class_pred = svm3_binary.predict(X_train)
        test_class_pred = svm3_binary.predict(X_test)

        # Calculate training and testing errors for the current class
        train_class_error = 1 - accuracy_score(y_train_mapped, train_class_pred)
        test_class_error = 1 - accuracy_score(y_test_mapped, test_class_pred)

        # Calculate slack variable ζ for support vectors
        slack_variables = compute_slack(
            X_train.iloc[svm3_binary.support_],
            y_train_mapped[svm3_binary.support_],
            svm3_binary.coef_[0],
            svm3_binary.intercept_[0],
        )

        # Print results for the current class
        print(f"Class {label}:")
        print(f"  Training error: {train_class_error:.4f}, Testing error: {test_class_error:.4f}")
        print(f"  w: {svm3_binary.coef_[0]}, b: {svm3_binary.intercept_[0]}")
        print(f"  Support vector indices: {svm3_binary.support_}")
        print(f"  Slack variable ζ:")
        print(f"  {slack_variables}")
        print()

    print("---------------------------------------------------------\n")


Q2.2.3 Calculation using SVM with Slack Variables (C = 0.25):
Total training error: 0.0286, total testing error: 0.0222

Class Iris-setosa:
  Training error: 0.0000, Testing error: 0.0000
  w: [-0.08005903  0.34791736 -0.80803542 -0.38898819], b: 1.7271981074005207
  Support vector indices: [39 64 15 16 28]
  Slack variable ζ:
  [1.14405493e-01 7.16078542e-02 8.72311025e-02 0.00000000e+00
 6.63229678e-08]

Class Iris-versicolor:
  Training error: 0.3429, Testing error: 0.3111
  w: [-0.29652586 -0.92272353  0.25916813 -0.24939231], b: 2.9813980553423742
  Support vector indices: [  1   6   9  10  16  20  22  24  25  28  29  31  70  71  72  73  74  75
  76  78  79  80  82  83  84  85  86  87  88  89  90  91  93  96  99 101
 102 104  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50
  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
  69]
  Slack variable ζ:
  [9.58099040e-02 3.17913772e-02 1.27799503e-01 1.98311994e-01
 9.53887258e-02 0.00000000e+00 3.1

---

### 4. Calculation of SVM using Kernel Functions

(a) 2nd-order Polynomial Kernel  
(b) 3rd-order Polynomial Kernel  
(c) Radial Basis Function Kernel with σ = 1  
(d) Sigmoidal Kernel with σ = 1  

In [17]:
def train_svm_with_kernel(kernel_type, X_train, y_train, X_test, y_test, class_labels, degree=None, gamma=None):
    class_results = {}

    for i, class_name in enumerate(class_labels):
        # binary classification for the current class (one-vs-rest)
        y_train_binary = (y_train == i).astype(int) * 2 - 1 
        y_test_binary = (y_test == i).astype(int) * 2 - 1  

        # define the SVM model with the kernel and hyperparameters
        if kernel_type == 'poly':
            svm_model = svm.SVC(kernel=kernel_type, degree=degree, C=C_MARGIN)  # Polynomial 
        elif kernel_type == 'rbf':
            svm_model = svm.SVC(kernel=kernel_type, gamma=gamma, C=C_MARGIN)  # RBF 
        elif kernel_type == 'sigmoid':
            svm_model = svm.SVC(kernel=kernel_type, gamma=gamma, C=C_MARGIN)  # Sigmoidal 
        else:
            raise ValueError("Incorrect kernel type provided")

        # Train SVM
        svm_model.fit(X_train, y_train_binary)

        # Predictions
        y_train_pred = svm_model.predict(X_train)
        y_test_pred = svm_model.predict(X_test)

        # Calculate errors
        train_error_class = 1 - accuracy_score(y_train_binary, y_train_pred)
        test_error_class = 1 - accuracy_score(y_test_binary, y_test_pred)


        # Store results for the class
        class_results[class_name] = {
            'training_error': train_error_class,
            'testing_error': test_error_class,
            'support_vector_indices': svm_model.support_
        }

    return class_results


# Function to format and print results for each kernel
def output_kernel_results(kernel_name, class_results):
    print(f"-------------------------------------------")
    print(f"({kernel_name}),")
    
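    # Note: "total" here is the mean of the per-class one-vs-rest errors,
    # not the error of a single multiclass prediction
    total_training_error = np.mean([results['training_error'] for results in class_results.values()])
    total_testing_error = np.mean([results['testing_error'] for results in class_results.values()])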
    
    print(f"Total training error: {total_training_error:.3f}, total testing error: {total_testing_error:.3f}\n")
    
    for class_name, results in class_results.items():
        print(f"Class {class_name}:")
        print(f"training error: {results['training_error']:.3f}, testing error: {results['testing_error']:.3f},")
        print(f"support vector indices: [{', '.join(map(str, results['support_vector_indices']))}]\n")
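
`output_kernel_results` reports the "total" error as the mean of the three binary one-vs-rest errors. Another reading of the task is the multiclass error, where each sample is assigned to the class whose one-vs-rest model gives the largest decision value. The sketch below shows that aggregation with made-up decision values. In the notebook they could come from each model's `decision_function`. The helper names are hypothetical, and this is not how the totals printed in this notebook were computed.

```python
def argmax_predictions(scores_per_class):
    """scores_per_class[k][i] is the class-k decision value for sample i."""
    n_samples = len(scores_per_class[0])
    predictions = []
    for i in range(n_samples):
        sample_scores = [scores[i] for scores in scores_per_class]
        predictions.append(sample_scores.index(max(sample_scores)))
    return predictions

def error_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)

# Made-up one-vs-rest decision values for four samples
scores_per_class = [
    [2.1, -1.5, -3.0, -2.2],  # class 0 vs rest
    [-1.0, 0.8, -0.4, 0.3],   # class 1 vs rest
    [-2.5, -0.9, 0.6, 0.1],   # class 2 vs rest
]
true_labels = [0, 1, 2, 2]

predicted = argmax_predictions(scores_per_class)
print(predicted)                           # [0, 1, 2, 1]
print(error_rate(true_labels, predicted))  # 0.25
```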
    




In [18]:
# Run each kernel and print the results
class_labels = le.classes_

kernels = {
    '(a) 2nd-order Polynomial': {
        'kernel': 'poly', 
        'degree': 2
        },
    '(b) 3rd-order Polynomial': {
        'kernel': 'poly', 
        'degree': 3
        },
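    '(c) RBF with σ=1': {
        'kernel': 'rbf',
        # gamma=1 matches K = exp(-||x - x'||² / σ²) with σ = 1; under the
        # K = exp(-||x - x'||² / (2σ²)) convention, σ = 1 gives gamma = 0.5
        'gamma': 1
        },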
    '(d) Sigmoidal with σ=1': {
        'kernel': 'sigmoid', 
        'gamma': 1
        }
}

for kernel_name, params in kernels.items():
    
    class_results = train_svm_with_kernel(
        kernel_type=params['kernel'],
        X_train=X_train,
        y_train=y_train,
        X_test=X_test,
        y_test=y_test,
        class_labels=class_labels,
        degree=params.get('degree'),
        gamma=params.get('gamma')
    )

    # Print the results for the current kernel
    output_kernel_results(kernel_name, class_results)

-------------------------------------------
((a) 2nd-order Polynomial),
Total training error: 0.000, total testing error: 0.030

Class Iris-setosa:
training error: 0.000, testing error: 0.000,
support vector indices: [39, 15, 28]

Class Iris-versicolor:
training error: 0.000, testing error: 0.044,
support vector indices: [15, 28, 75, 83, 88, 90, 48, 52, 56]

Class Iris-virginica:
training error: 0.000, testing error: 0.044,
support vector indices: [48, 56, 75, 83, 90]

-------------------------------------------
((b) 3rd-order Polynomial),
Total training error: 0.000, total testing error: 0.030

Class Iris-setosa:
training error: 0.000, testing error: 0.000,
support vector indices: [39, 15, 28]

Class Iris-versicolor:
training error: 0.000, testing error: 0.044,
support vector indices: [28, 75, 83, 88, 90, 48, 52, 56]

Class Iris-virginica:
training error: 0.000, testing error: 0.044,
support vector indices: [46, 48, 56, 75, 83, 88, 90]

-------------------------------------------
((c)