# Support Vector Machines (SVM)

Support Vector Machines (SVMs) are a class of supervised machine learning algorithms used for both classification and regression. For classification, an SVM looks for the hyperplane that separates the classes with the largest possible margin in feature space.

## Key Concepts

### Hyperplane

- A hyperplane is a decision boundary that separates data points of different classes in feature space. In 2D it is a line, in 3D it is a plane, and in an n-dimensional space it is a flat (n-1)-dimensional surface.
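
As a minimal sketch of how a hyperplane acts as a decision boundary, the snippet below scores a point with `w . x + b` and uses the sign to pick a side. The weight vector, bias, and sample points are made-up values chosen only for illustration.

```python
def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical 2D hyperplane: the line x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0

print(predict(w, b, [3.0, 2.0]))  # 1  (above the line)
print(predict(w, b, [0.5, 1.0]))  # -1 (below the line)
```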

### Margin

- The margin is the distance between the hyperplane and the nearest data point of any class. SVM aims to maximize this margin, because a wider margin leaves more room between the classes.

### Support Vectors

- Support vectors are the data points that lie closest to the hyperplane. They alone determine its position: removing any other training point leaves the hyperplane unchanged, which is why they are critical for defining the margin.
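
The relationship between margin and support vectors can be sketched for a fixed hyperplane as follows. The hyperplane and the four points are hypothetical, and the code only measures distances to a given hyperplane; it does not search for the best one.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Geometric distance |w . x + b| / ||w|| from point x to the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm


# Hypothetical hyperplane x1 = 2 (w = [1, 0], b = -2) and toy points
w, b = [1.0, 0.0], -2.0
points = [[0.0, 1.0], [1.0, 3.0], [3.0, 0.0], [5.0, 2.0]]

distances = [distance_to_hyperplane(w, b, p) for p in points]
margin = min(distances)

# Points whose distance equals the margin are the ones closest to the hyperplane
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]

print(margin)           # 1.0
print(support_vectors)  # [[1.0, 3.0], [3.0, 0.0]]
```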

### Classification

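- SVM performs classification by finding the hyperplane that maximizes the margin while trading margin width against classification errors on the training data. A new point is then assigned a class according to the side of the hyperplane it falls on.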

### Soft Margin

- When the classes cannot be perfectly separated, a soft-margin SVM allows some points to violate the margin or be misclassified. The regularization parameter "C" controls how soft the margin is: smaller values tolerate more violations.
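
To illustrate how "C" shifts the balance, the sketch below evaluates the common hinge-loss form of the soft-margin objective, `0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b)))`. That form is the usual textbook one and is assumed here rather than taken from the text above; the 1D data and the two candidate hyperplanes are invented for the example.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y_i * (w . x_i + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge_total += max(0.0, 1.0 - yi * score)
    return regularizer + C * hinge_total


# Toy 1D data: the positive point at x = 1 sits close to the negative class
X = [[0.0], [4.0], [1.0]]
y = [-1, 1, 1]

wide = ([0.5], -1.0)    # wide margin, but the point at x = 1 violates it
narrow = ([2.0], -1.0)  # narrow margin that satisfies every constraint

for C in (0.1, 10.0):
    wide_cost = soft_margin_objective(*wide, X, y, C)
    narrow_cost = soft_margin_objective(*narrow, X, y, C)
    preferred = "wide" if wide_cost < narrow_cost else "narrow"
    print(f"C={C}: wide={wide_cost:.3f}, narrow={narrow_cost:.3f}, preferred={preferred}")
```

With the small "C" the wide-margin candidate has the lower cost despite its violation; with the large "C" the violation dominates and the narrow-margin candidate wins.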

## SVM Formulation

SVM finds the optimal hyperplane by solving a constrained optimization problem. Under the constraint below, the margin equals 1 / ||w||, so minimizing ||w|| maximizes the margin. For linearly separable data, the problem is:

Minimize: 0.5 * ||w||^2

Subject to: y_i * (w · x_i + b) >= 1 for every data point i

Here:
- "w" is the weight vector, which is normal to the hyperplane.
- "b" is the bias term.
- "x_i" is the feature vector of the i-th data point.
- "y_i" is the class label of the i-th data point (either 1 or -1).
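
The formulation can be checked by hand on a tiny dataset. The sketch below tests three hypothetical candidates for `(w, b)` against the constraint and reports the objective value for each; the data and candidates are assumptions made for illustration, and no solver is involved.

```python
def objective(w):
    """Hard-margin objective 0.5 * ||w||^2."""
    return 0.5 * sum(wi * wi for wi in w)


def is_feasible(w, b, X, y):
    """True if y_i * (w . x_i + b) >= 1 holds for every data point."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 1.0
        for xi, yi in zip(X, y)
    )


# Toy linearly separable data
X = [[2.0, 0.0], [3.0, 1.0], [0.0, 0.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]

candidates = {
    "too small": ([0.5, 0.0], -0.5),  # violates the constraint
    "tight": ([1.0, 0.0], -1.0),      # feasible, constraints met with equality
    "too large": ([2.0, 0.0], -2.0),  # feasible, but larger ||w||
}
for name, (w_c, b_c) in candidates.items():
    print(name, is_feasible(w_c, b_c, X, y), objective(w_c))
```

Among the feasible candidates, the one with the smallest `||w||` has the lowest objective, which matches the goal of the widest margin.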

## Kernel Trick

SVM can handle data that is not linearly separable by implicitly mapping it into a higher-dimensional feature space where a linear separator may exist. The kernel trick computes dot products in that space directly from the original inputs, without ever constructing the high-dimensional vectors.

Common kernels include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
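
A small sketch of the four kernels, plus a check of the kernel trick for the degree-2 polynomial case: the kernel value computed in the original 2D space should match the dot product after an explicit mapping into 3D. The parameter names (`degree`, `gamma`, `coef0`) and the exact parameterizations follow one common convention and are assumptions here; libraries may define them slightly differently.

```python
import math


def dot(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=2, coef0=0.0):
    return (dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


def explicit_quadratic_map(x):
    """Feature map phi for 2D inputs such that phi(x) . phi(z) == (x . z) ** 2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


x, z = [1.0, 2.0], [3.0, 0.5]

# Same quantity computed two ways: in the input space and in the mapped space
via_kernel = polynomial_kernel(x, z, degree=2)
via_mapping = dot(explicit_quadratic_map(x), explicit_quadratic_map(z))
print(via_kernel, via_mapping)  # both are 16 up to floating-point rounding
```

The kernel version never builds the 3D vectors, which is what makes the approach practical when the mapped space is very large or infinite-dimensional, as with the RBF kernel.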

## Hyperparameters

- **C (Regularization Parameter)**: Controls the trade-off between maximizing the margin and minimizing classification errors. Larger "C" values penalize errors more heavily, giving a narrower margin with fewer training errors, while smaller values accept more training errors in exchange for a wider margin.

- **Kernel Choice**: The choice of kernel function determines the implicit high-dimensional feature space, and therefore the kinds of decision boundary the model can represent.

- **Kernel Parameters**: Some kernels, such as the RBF kernel, have additional parameters like "gamma" that control the shape and complexity of the decision boundary.
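
To give a feel for what "gamma" does, the sketch below evaluates the RBF kernel `exp(-gamma * distance^2)` at a few distances. The helper name and the chosen values are illustrative assumptions.

```python
import math


def rbf_similarity(distance, gamma):
    """RBF kernel value exp(-gamma * distance^2) for two points a given distance apart."""
    return math.exp(-gamma * distance ** 2)


for gamma in (0.1, 1.0, 10.0):
    values = [rbf_similarity(d, gamma) for d in (0.5, 1.0, 2.0)]
    print(gamma, [round(v, 4) for v in values])
```

Larger "gamma" values make the similarity fall off faster with distance, so each training point influences only its close neighborhood and the decision boundary can become more intricate. Smaller values spread the influence out and tend to give smoother boundaries.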

## Training and Predictions

1. Select a kernel and set hyperparameters.
2. Train the SVM on labeled data.
3. Use the trained SVM for classification of new data points.
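
The three steps above can be sketched end to end for the linear kernel. This is a teaching sketch under stated assumptions, not how production libraries train SVMs: it minimizes the hinge-loss objective with plain full-batch subgradient descent, and the function names, learning rate, epoch count, and toy data are all hypothetical choices.

```python
def train_linear_svm(X, y, C=1.0, learning_rate=0.01, epochs=200):
    """Minimize 0.5 * ||w||^2 + C * sum(hinge loss) with full-batch subgradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # this point violates the margin
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Classify x by the side of the learned hyperplane it falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Step 1: linear kernel, C = 1.0. Step 2: train on labeled toy data.
X_train = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y_train = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X_train, y_train, C=1.0)

# Step 3: classify new points.
print(predict(w, b, [4.0, 4.0]))
print(predict(w, b, [0.0, 0.0]))
```

In practice, a library implementation with a dedicated solver is usually the better choice, especially for non-linear kernels.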

## Pros and Cons

**Pros**:
- Effective for high-dimensional data.
- Can handle non-linearly separable data.
- Resists overfitting when "C" is chosen appropriately, because maximizing the margin acts as a form of regularization.

**Cons**:
- Sensitive to the choice of kernel and hyperparameters.
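- Can be computationally expensive for large datasets, since training time for kernel SVMs typically grows faster than linearly with the number of samples.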

## Conclusion

Support Vector Machines are versatile tools for classification and, in adapted forms, regression. Understanding the concepts above and tuning the kernel and its hyperparameters carefully are essential for using them effectively.