
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression. The main idea behind an SVM is to find the hyperplane that separates the data points of different classes with the maximum margin.

In simple terms, an SVM looks for a decision boundary that maximizes the margin between the two classes. The margin is the distance between the decision boundary and the closest points of each class. Those closest points, which lie on the margin, are called support vectors.
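
For reference, the maximum-margin idea can be written as an optimization problem. The notation is an assumption introduced here for illustration: training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, a weight vector $w$, and a bias $b$. For linearly separable data, the standard hard-margin formulation is

$$
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 \quad \text{for all } i.
$$

Under these constraints the distance from the hyperplane $w^\top x + b = 0$ to the closest point of each class is $1 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin. The support vectors are the points for which the constraint holds with equality.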

The SVM algorithm works as follows:

1. Data Preparation: The SVM takes labeled data as input, that is, feature vectors paired with class labels. The features are usually scaled first, because the margin depends on distances between points.

2. Selecting the Kernel Function: The next step is to select a kernel function. A kernel implicitly maps the input data into a higher-dimensional space, where the classes may become linearly separable, by computing inner products in that space without constructing it explicitly. The most commonly used kernels are the linear, polynomial, and radial basis function (RBF) kernels.

3. Optimization of Hyperparameters: The hyperparameters are tuned, typically by cross-validation, to improve classification accuracy. They include the regularization parameter (C), which trades margin width against training errors, and kernel parameters such as the polynomial degree or the RBF width (gamma).

4. Training the SVM Model: Once the kernel and hyperparameters are chosen, the SVM model is trained on the labeled data. Training finds the optimal hyperplane that separates the classes.

5. Prediction: Once the SVM model is trained, it can be used to predict the class of new data points.
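
The five steps above can be sketched end to end with scikit-learn. This is only an illustration under stated assumptions: the parameter grid values are arbitrary, and combining `make_pipeline` with `GridSearchCV` is one common way to organize the steps, not the only one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Step 1: labeled data, split into training and held-out test sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 2-3: candidate kernels and hyperparameters (illustrative values only)
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01],  # ignored by the linear kernel
}

# Step 4: scaling + SVM in one pipeline; 5-fold cross-validation picks the
# best combination, and the pipeline is then refit on the full training set
search = GridSearchCV(make_pipeline(StandardScaler(), SVC()), param_grid, cv=5)
search.fit(X_train, y_train)

# Step 5: predict the class of unseen data points
y_pred = search.predict(X_test)
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Test accuracy:", test_accuracy)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so the validation folds do not influence the scaling statistics.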

The main advantages of the SVM algorithm are its ability to handle non-linearly separable data (through kernels), its typically high accuracy, and its ability to work well with high-dimensional data. However, SVMs can be computationally expensive to train and are sensitive to the choice of kernel function and hyperparameters.
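
To make the three kernel functions named above concrete, here is a minimal NumPy sketch. The function names and default parameter values (`degree`, `coef0`, `gamma`) are assumptions chosen for illustration, and the polynomial form is a simplified variant without a separate scaling factor.

```python
import numpy as np

def linear_kernel(A, B):
    # K[i, j] = <a_i, b_j>
    return A @ B.T

def polynomial_kernel(A, B, degree=3, coef0=1.0):
    # K[i, j] = (<a_i, b_j> + coef0) ** degree
    return (A @ B.T + coef0) ** degree

def rbf_kernel(A, B, gamma=0.5):
    # K[i, j] = exp(-gamma * ||a_i - b_j||^2)
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_lin = linear_kernel(X, X)
K_poly = polynomial_kernel(X, X)
K_rbf = rbf_kernel(X, X)
print(K_rbf.round(3))
```

Each function returns a matrix of pairwise similarities. An SVM only ever needs these values, which is why the higher-dimensional feature space never has to be built explicitly.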

# Using a Prebuilt Package (scikit-learn)
### Example: Classifying Breast Cancer using SVMs
In this example, we'll use SVMs to classify breast cancer as either benign or malignant based on several features, including the radius of the tumor, texture, and more.

### Step 1: Load and preprocess the data
First, we need to load and preprocess the data. We'll use the load_breast_cancer function from scikit-learn to load the data, and then split it into training and testing sets.

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the breast cancer dataset
data = load_breast_cancer()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

# Scale the data to improve SVM performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)



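Note that we're scaling the data with StandardScaler, which subtracts the mean and divides by the standard deviation of each feature. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training. Scaling matters for SVMs because the RBF kernel depends on distances between points, so features with large numeric ranges would otherwise dominate.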
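
As a rough illustration of what the scaling step computes, the sketch below standardizes synthetic data by hand with NumPy. The data and the variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
X_train_demo = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(100, 2))
X_test_demo = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(20, 2))

# Statistics are computed from the training split only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std  # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

After scaling, each training feature has mean 0 and standard deviation 1. The test features are only approximately standardized, because they are transformed with the training statistics.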

### Step 2: Train and evaluate the SVM
Next, we'll create an SVM model using the svm.SVC class and train it on the training data. We'll use the radial basis function (RBF) kernel, which is the default kernel for SVC and a common choice for SVMs.

In [2]:
from sklearn import svm

# Create an SVM classifier with an RBF kernel
clf = svm.SVC(kernel='rbf')

# Train the classifier on the training data
clf.fit(X_train, y_train)

# Evaluate the classifier on the testing data
accuracy = clf.score(X_test, y_test)
print("Accuracy:", accuracy)


Accuracy: 0.965034965034965


When we run this code, we get an accuracy of around 0.97, which is quite good. However, there are some limitations and trade-offs to consider when using SVMs.

### Strengths of SVMs
  - SVMs can perform well even when the number of features is much larger than the number of samples.

  - SVMs can handle non-linear decision boundaries by using different kernel functions.

  - Margin maximization acts as a form of regularization, so SVMs tend to resist overfitting as long as the kernel and hyperparameters such as C are chosen carefully.

  - SVMs can work well even with small datasets, because the decision boundary depends only on the support vectors.
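
The point about non-linear decision boundaries can be illustrated with a small experiment. This sketch uses scikit-learn's `make_circles` to generate two concentric rings, which no straight line can separate. The dataset parameters are arbitrary choices for the demonstration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=400, noise=0.05, factor=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same data, same classifier, only the kernel differs
linear_acc = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)
rbf_acc = SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te)

print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:", rbf_acc)
```

On data like this, the linear kernel should do little better than guessing, while the RBF kernel should separate the rings almost perfectly.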

### Weaknesses of SVMs
  - SVMs can be computationally expensive to train, especially when the number of samples is very large, because the training time of kernel SVMs grows faster than linearly with the sample count.

  - SVMs can be sensitive to the choice of kernel function and other hyperparameters, which can be difficult to tune without a lot of experimentation.

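  - SVMs do not provide probability estimates by default, which can make it difficult to interpret the results. In scikit-learn, SVC produces them only when created with probability=True, which adds an extra calibration step.

  - SVMs can be sensitive to outliers in the data, which can affect the decision boundary. This can be mitigated somewhat by lowering the regularization parameter C, which softens the margin and reduces the influence of individual points.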
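
As a sketch of the probability point above, the example below compares the raw decision scores of an `SVC` with the calibrated probabilities it produces when `probability=True` is set. It reloads the breast cancer data so that it runs on its own, and the pipeline layout is an assumption made for this illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# probability=True fits an additional calibration step using internal
# cross-validation, so training is slower than for a plain SVC
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", probability=True, random_state=0),
)
model.fit(X_train, y_train)

scores = model.decision_function(X_test[:5])  # signed scores; the sign gives the predicted side
proba = model.predict_proba(X_test[:5])       # one column per class, each row sums to 1
print(scores)
print(proba)
```

The decision scores are always available and are often enough for ranking or thresholding. The probabilities cost extra training time.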

# From Scratch

In [None]:
import numpy as np

class SVM:
    def __init__(self, lr=0.01, C=1.0, max_iters=1000, tol=1e-3):
        self.lr = lr
        self.C = C
        self.max_iters = max_iters
        self.tol = tol
        
    def fit(self, X, y):
        # Initialize the parameters
        m, n = X.shape
        self.w = np.zeros(n)
        self.b = 0
        self.support_vectors = None
        
        # Iterate until convergence or max_iters
        for _ in range(self.max_iters):
            # Compute the margin
            margin = y * (X.dot(self.w) + self.b)
            
            # Identify the support vectors (points that violate the margin)
            idx = np.where(margin <= 1)[0]
            support_vectors = X[idx]
            support_labels = y[idx]
            
            # Compute the gradient of the loss function
            grad_w = self.w - self.C * np.sum(support_labels[:, np.newaxis] * support_vectors, axis=0)
            grad_b = -self.C * np.sum(support_labels)
            
            # Update the parameters
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
            
            # Check for convergence
            if np.linalg.norm(grad_w) < self.tol:
                self.support_vectors = support_vectors
                break
        
        if self.support_vectors is None:
            self.support_vectors = X[idx]
            
    def predict(self, X):
        # Compute the predicted class labels (-1 or 1) for a set of inputs X;
        # points exactly on the decision boundary are assigned to class 1
        return np.where(X.dot(self.w) + self.b >= 0, 1, -1)


  - The SVM class is initialized with several hyperparameters, including the learning rate lr, the regularization parameter C, the maximum number of iterations max_iters, and the tolerance tol.
  - The fit method takes in a matrix X of training examples (one example per row) and a vector y of class labels (-1 or 1). It uses the training examples to learn the parameters of the SVM (the weight vector w and the bias b) using gradient descent.
  - The predict method takes in a matrix X of test examples and returns a vector of predicted class labels (-1 or 1).
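
Because fit expects labels in {-1, 1}, while scikit-learn's load_breast_cancer returns targets in {0, 1}, the labels need converting before this class can be trained on that dataset. A minimal sketch follows. The helper names `to_signed_labels` and `to_binary_labels` are hypothetical and not part of the class above.

```python
import numpy as np

def to_signed_labels(y01):
    # Map {0, 1} -> {-1, +1}
    return np.where(np.asarray(y01) == 1, 1, -1)

def to_binary_labels(y_signed):
    # Map {-1, +1} -> {0, 1}
    return (np.asarray(y_signed) > 0).astype(int)

y01 = np.array([0, 1, 1, 0])
y_signed = to_signed_labels(y01)
print(y_signed)  # [-1  1  1 -1]
```

Predictions from the class can be mapped back with `to_binary_labels` before comparing them with the original targets.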

Note that this implementation minimizes the hinge loss using (sub)gradient descent, since the hinge loss is not differentiable at the margin. It also includes an L2 regularization term on w to prevent overfitting, and it uses the norm of the gradient with respect to w as a stopping criterion.
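
Written out, the objective that the fit method descends on, and the subgradient it uses in the update step, are as follows. Here $m$ is the number of training examples and $V = \{\, i : y_i (w^\top x_i + b) \le 1 \,\}$ is the set of points on or inside the margin, the same set the code selects with `margin <= 1`.

$$
J(w, b) = \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{m} \max\bigl(0,\ 1 - y_i (w^\top x_i + b)\bigr)
$$

$$
\frac{\partial J}{\partial w} = w - C \sum_{i \in V} y_i x_i, \qquad \frac{\partial J}{\partial b} = -C \sum_{i \in V} y_i
$$

These two expressions correspond term by term to `grad_w` and `grad_b` in the code.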

### SVM using the dual problem
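
For reference, the soft-margin dual problem for a linear kernel, in its standard textbook form, is

$$
\max_{\alpha} \ \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i}\sum_{j} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j \quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i y_i = 0.
$$

The training points with $\alpha_i > 0$ are the support vectors, and a new point $x$ is classified by the sign of $\sum_i \alpha_i y_i\, x_i^\top x + b$. Because scipy's `minimize` minimizes rather than maximizes, an implementation works with the negated objective.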

In [None]:
import numpy as np
from scipy.optimize import minimize


class SVM:
    def __init__(self, C=1.0):
        self.C = C
        self.alpha = None
        self.support_vectors = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape

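        # Precompute Q[i, j] = y_i * y_j * <x_i, x_j> (linear kernel)
        Q = np.outer(y, y) * (X @ X.T)

        # Define the objective function for the dual problem (negated, because
        # scipy minimizes): 0.5 * alpha^T Q alpha - sum(alpha)
        def objective(alpha):
            return 0.5 * alpha @ Q @ alpha - np.sum(alpha)

        # Gradient of the objective, so the solver need not approximate it
        def gradient(alpha):
            return Q @ alpha - np.ones(n_samples)

        # Equality constraint for the dual problem: sum_i alpha_i * y_i = 0
        def constraints(alpha):
            return np.dot(alpha, y)

        # Set up the optimization problem with box constraints 0 <= alpha_i <= C
        bounds = [(0, self.C) for _ in range(n_samples)]
        cons = {'type': 'eq', 'fun': constraints}
        res = minimize(objective, np.zeros(n_samples), jac=gradient,
                       method='SLSQP', bounds=bounds, constraints=cons)

        # Get the optimal values of the dual variables
        alpha = res.x

        # Identify the support vectors (alpha_i > 0, up to a numerical tolerance)
        idx = alpha > 1e-5
        self.support_vectors = X[idx]
        self.alpha = alpha[idx]
        self.y = y[idx]

        # Compute the bias from support vectors with alpha_i < C, which lie exactly
        # on the margin; fall back to all support vectors if there are none
        on_margin = self.alpha < self.C - 1e-5
        if not np.any(on_margin):
            on_margin = np.ones_like(self.alpha, dtype=bool)
        K_sv = self.support_vectors @ self.support_vectors[on_margin].T
        self.bias = np.mean(self.y[on_margin] - (self.alpha * self.y) @ K_sv)

    def predict(self, X):
        # Decision function: sum_i alpha_i * y_i * <x_i, x> + bias
        scores = (self.alpha * self.y) @ (self.support_vectors @ X.T) + self.bias
        # Return class labels (-1 or 1); boundary points are assigned to class 1
        return np.where(scores >= 0, 1, -1)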
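
As a small sanity check of the dual formulation, the sketch below solves the dual for a toy problem with two points, one per class, and recovers the primal weight vector from the dual variables. The data, the variable names, and the choice of the SLSQP solver are assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Two training points, one per class (toy data chosen for illustration)
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
C = 1.0

# Q[i, j] = y_i * y_j * <x_i, x_j>
Q = np.outer(y, y) * (X @ X.T)

res = minimize(
    fun=lambda a: 0.5 * a @ Q @ a - a.sum(),  # negated dual objective
    x0=np.zeros(len(y)),
    jac=lambda a: Q @ a - 1.0,                # its gradient
    method="SLSQP",
    bounds=[(0.0, C)] * len(y),               # 0 <= alpha_i <= C
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
)
alpha = res.x

# Recover the primal solution from the dual variables
w = (alpha * y) @ X
b = np.mean(y - X @ w)
print("alpha:", alpha.round(3), "w:", w.round(3), "b:", round(float(b), 3))
```

Working this problem by hand gives alpha = (0.25, 0.25), w = (0.5, 0.5) and b = 0, so the numerical result should be close to these values. Both points then sit exactly on the margin, since y_i (w · x_i + b) = 1 for each.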
