### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?


Ans - **Polynomial Functions:**
Polynomial functions are mathematical functions in which variables are raised to non-negative integer powers and combined using addition and multiplication. They can represent a wide range of non-linear relationships between input variables. In machine learning, polynomial functions are often used for non-linear feature mapping: the original input features are transformed into higher-dimensional feature vectors by adding polynomial terms such as squares and pairwise products.
For example, in polynomial regression, a polynomial function is used to model the relationship between the input variables and the target variable. The polynomial terms allow the model to capture non-linear patterns in the data.

**Kernel Functions:**
Kernel functions, on the other hand, are used in machine learning algorithms, particularly in Support Vector Machines (SVMs), to implicitly operate in a high-dimensional feature space without explicitly computing the transformed feature vectors. Kernel functions calculate the inner product between vectors in the feature space without explicitly mapping them.

The relationship between polynomial functions and kernel functions lies in their ability to capture non-linear relationships in machine learning algorithms.
In SVMs, polynomial kernel functions are commonly used to introduce non-linear decision boundaries. The polynomial kernel computes the inner product between two vectors in the feature space created by adding polynomial terms to the input features. This implicitly allows the SVM to operate in a higher-dimensional feature space, capturing non-linear patterns in the data.

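By using a polynomial kernel, an SVM obtains the effect of an explicit polynomial feature mapping without ever computing that mapping. The polynomial kernel K(x, z) = (γ x·z + r)^d, where d is the degree, r is the independent term (`coef0` in Scikit-learn), and γ is a scaling factor, equals the inner product of the transformed feature vectors. Its cost therefore depends on the original input dimensionality rather than on the much larger polynomial feature space.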
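
The equivalence between the kernel and an explicit feature map can be checked numerically. The following is a minimal sketch for the 2-D, degree-2 case with γ = 1 and r = 1; the helper names `poly_kernel` and `explicit_map_degree2` are illustrative and are not part of Scikit-learn.

```python
import numpy as np


def poly_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    return (gamma * np.dot(x, z) + coef0) ** degree


def explicit_map_degree2(x):
    """Explicit feature map matching the 2-D, degree-2, gamma=1, coef0=1 kernel."""
    x1, x2 = x
    s = np.sqrt(2.0)
    # Constant, linear, and quadratic terms; the sqrt(2) factors make the
    # inner product of two mapped vectors expand to (x . z + 1) ** 2.
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, z)  # computed in the original 2-D space
k_explicit = np.dot(explicit_map_degree2(x), explicit_map_degree2(z))  # 6-D space

print(k_implicit, k_explicit)
```

Both values agree, although the kernel never builds the six-dimensional vectors. For higher degrees and more input features the explicit vectors grow quickly, which is where the implicit computation pays off.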

### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?


In [1]:
from sklearn import svm

# Prepare the data
X = [[0, 0], [1, 1], [2, 2]]
y = [0, 1, 0]

# Create an SVM classifier with a polynomial kernel
classifier = svm.SVC(kernel='poly', degree=2, coef0=1, C=1)

# Fit the classifier to the data
classifier.fit(X, y)

# Predict the class labels for new data
X_new = [[0.5, 0.5], [1.5, 1.5]]
y_pred = classifier.predict(X_new)

print(y_pred)


[0 0]


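In this example, we create an SVM classifier with a polynomial kernel of degree 2 and an independent term (`coef0`) of 1. We fit the classifier to the training data and then predict the class labels for two new data points. With this tiny training set, both new points are assigned to class 0, as the output above shows.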

### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?


Ans - **Larger Epsilon:**
When the value of epsilon is increased, the epsilon-insensitive zone (the tube around the regression function) becomes wider. More training examples fall inside this zone, where they incur no penalty and do not become support vectors. The number of support vectors therefore tends to decrease.

**Smaller Epsilon:**
Conversely, when the value of epsilon is decreased, the epsilon-insensitive zone becomes narrower. The SVR model tolerates smaller errors, so more training examples lie on or outside the zone and become support vectors.

In short, increasing epsilon widens the tolerance for errors and usually leads to fewer support vectors and a sparser, simpler model. Decreasing epsilon narrows the tolerance and usually results in more support vectors. The choice of epsilon should be based on the specific problem, the desired balance between model flexibility and strictness, and the amount of acceptable error in the regression task.
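
The effect of epsilon on the number of support vectors can be observed directly. The sketch below assumes a synthetic noisy sine dataset (not part of the original notebook) and an RBF kernel with `C=1.0`; exact counts depend on the data and the other hyperparameters.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (assumed for illustration): noisy sine curve
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

n_support = {}
for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    # support_ holds the indices of the training points used as support vectors
    n_support[eps] = len(model.support_)
    print(f"epsilon={eps}: {n_support[eps]} support vectors out of {len(X)}")
```

Only points on or outside the epsilon tube end up as support vectors, so the count shrinks as the tube widens.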

### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?


Ans - **1. Kernel Function:**
The kernel function determines the shape of the regression function that the SVR model can learn. Common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid. The choice of kernel function depends on the problem's characteristics, such as linearity or non-linearity, and the available prior knowledge.
- **Linear Kernel:** Suitable for linear relationships between features and target values. It is computationally efficient but may not capture complex patterns in the data.
- **Polynomial Kernel:** Useful when the relationship between features and target values is polynomial. The degree of the polynomial can be adjusted to control the complexity of the model.
- **RBF Kernel:** Effective for capturing non-linear and complex relationships. The gamma parameter (discussed below) affects the kernel's flexibility.
- **Sigmoid Kernel:** Suitable for problems with non-linear relationships but less commonly used compared to the RBF kernel.

**2. C Parameter:**
The C parameter controls the trade-off between keeping the regression function simple (flat) and penalizing training errors that exceed epsilon. It acts as a regularization parameter in SVR: the smaller C is, the stronger the regularization.
- **Smaller C:** Tolerates more and larger deviations outside the epsilon zone, giving a smoother, simpler model. It is useful when there are potential outliers or noise in the data.
- **Larger C:** Penalizes deviations outside the epsilon zone heavily, so the model fits the training data more closely and may overfit. It is suitable when the data is expected to be mostly noise-free and the priority is to fit the training data as closely as possible.

**3. Epsilon Parameter:**
The epsilon parameter (ε) determines the width of the epsilon-insensitive zone or margin of tolerance in SVR. It defines the range within which errors are considered acceptable and do not contribute to the loss function.
- **Larger Epsilon:** Increases the tolerance for errors, resulting in a wider margin of tolerance. It is suitable when the target variable has higher noise levels or when a larger deviation from the desired output is acceptable.
- **Smaller Epsilon:** Decreases the tolerance for errors, resulting in a narrower margin of tolerance. It is appropriate when the target variable has low noise levels or when precision is crucial.

**4. Gamma Parameter:**
The gamma parameter influences the flexibility of the SVR model when using kernel functions such as RBF, polynomial, or sigmoid. For the RBF kernel it determines the reach of each training example, affecting the smoothness and complexity of the fitted function.
- **Smaller Gamma:** Each training example influences a wider region, resulting in a smoother fitted function. It is suitable when the training data is spread out or when there is higher noise in the data.
- **Larger Gamma:** Each training example influences only its close neighborhood, leading to a more complex and irregular fitted function that can overfit. It is useful when the training data is dense or when there is less noise.
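
In practice these four settings interact, so they are usually tuned together with cross-validation. The following is a rough sketch on an assumed synthetic dataset; the grid values are illustrative starting points, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic noisy sine data (assumed for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=120)

# Illustrative grid; gamma has no effect when kernel="linear"
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "epsilon": [0.01, 0.1, 0.5],
    "gamma": [0.01, 0.1, 1],
}

search = GridSearchCV(SVR(), param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
print("cross-validated MSE:", -search.best_score_)
```

The scoring is negated mean squared error because `GridSearchCV` maximizes its score, so the sign is flipped back when printing.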

### Q5. Assignment:
- Import the necessary libraries and load the dataset
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization)
- Create an instance of the SVC classifier and train it on the training data
- Use the trained classifier to predict the labels of the testing data
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score)
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance
- Train the tuned classifier on the entire dataset
- Save the trained classifier to a file for future use.

In [2]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

In [3]:
dataset = load_iris()

In [4]:
df = pd.DataFrame(dataset.data, columns = dataset.feature_names)
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [7]:
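# df has no target column, so [:, :-1] drops the last feature (petal width);
# use `X = df` to keep all four features.
X = df.iloc[:, :-1]
y = dataset.target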

In [8]:
# Train test split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =0.25, random_state=42)
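
The preprocessing step from the assignment list could be handled with a scaling pipeline. The following is a sketch, assuming `StandardScaler` as the technique and all four iris features; the variable name `scaled_svc` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# The pipeline fits the scaler on the training data only and reuses those
# statistics when transforming the test data, which avoids data leakage.
scaled_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_svc.fit(X_train, y_train)

test_accuracy = scaled_svc.score(X_test, y_test)
print(test_accuracy)
```

Wrapping the scaler and the classifier in one estimator also means the same object can be passed to `GridSearchCV`, so scaling is refit inside each cross-validation fold.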

In [11]:
from sklearn.svm import SVC
svc = SVC(kernel = 'linear')
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)

In [12]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
1.0


### Hyperparameter tuning

In [17]:
from sklearn.model_selection import GridSearchCV

param_grid = {'C':[0.1, 1, 10, 100, 1000],
              'gamma':[1, 0.1, 0.01, 0.001, 0.0001],
              'kernel':['linear', 'poly', 'rbf']
             }

In [18]:
grid = GridSearchCV(SVC(), param_grid = param_grid, cv =5, verbose =3)

In [19]:
grid.fit(X_train, y_train)


Fitting 5 folds for each of 75 candidates, totalling 375 fits
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.957 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.818 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.864 total time=   0.0s
[CV 1/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.957 total time=   0.0s
[CV 2/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.957 total time=   0.0s
[CV 3/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.818 total time=   0.0s
[CV 4/5] END .......C=0.1, gamma=1, kernel=poly;, score=1.000 total time=   0.0s
[CV 5/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.955 total time=   0.0s
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.957 total time=   0.0s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf

In [20]:
grid.best_params_

{'C': 1000, 'gamma': 1, 'kernel': 'poly'}

In [21]:
y_pred4 = grid.predict(X_test)
print(classification_report(y_test, y_pred4))
print(confusion_matrix(y_test, y_pred4))
print(accuracy_score(y_test,y_pred4))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96        12

    accuracy                           0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

[[15  0  0]
 [ 0 10  1]
 [ 0  0 12]]
0.9736842105263158


In [22]:
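# Save the trained classifier for future use. `svc` is the linear-kernel
# model from In [11]; pass `grid.best_estimator_` to save the tuned model.
import pickle
with open('svc.pkl', 'wb') as f:
    pickle.dump(svc, f)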
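
The remaining assignment steps, training the tuned classifier on the entire dataset and saving it, might look like the sketch below. It assumes the best parameters reported by the grid search above, uses all four iris features, and writes to a hypothetical file name `svc_tuned.pkl` in the system temp directory.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Parameters copied from the grid search result shown earlier
best_params = {"C": 1000, "gamma": 1, "kernel": "poly"}

# Retrain on the entire dataset with the tuned hyperparameters
final_model = SVC(**best_params).fit(X, y)

# Hypothetical output location; adjust the path as needed
model_path = Path(tempfile.gettempdir()) / "svc_tuned.pkl"
with open(model_path, "wb") as f:
    pickle.dump(final_model, f)

# Reload the file and confirm the restored model behaves identically
with open(model_path, "rb") as f:
    restored = pickle.load(f)

same_predictions = (restored.predict(X) == final_model.predict(X)).all()
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same Scikit-learn version that created them.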