#### Q1. What is the relationship between polynomial functions and kernel functions in machine learning algorithms?

Ans: Polynomial functions and kernel functions are closely related in machine learning algorithms, particularly in the context of kernel methods such as Support Vector Machines (SVM). The relationship lies in how kernel functions enable the use of polynomial functions to transform the input data into a higher-dimensional feature space.

Polynomial functions are mathematical functions that involve variables raised to powers, such as x^2, x^3, etc. They can capture non-linear relationships between the input features and the target variable.

Kernel functions, on the other hand, are used in kernel methods to implicitly map the input data to a higher-dimensional feature space without actually computing the transformation explicitly. They allow nonlinear relationships to be captured by defining a similarity measure between pairs of input samples.

The connection is that some kernel functions are themselves polynomials of the inner product between two samples. The Polynomial Kernel, K(x, z) = (γ·xᵀz + r)^d, gives exactly the inner product that the two samples would have after being mapped to a higher-dimensional space whose features are the monomials of the input features up to degree d.

By using a polynomial kernel, SVM and other kernel-based algorithms can capture non-linear relationships in the data while only ever evaluating K(x, z) in the original input space. This is known as the "kernel trick": the transformed features are never computed or stored, which keeps the cost manageable even when the feature space is very large.
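The equivalence behind the kernel trick can be checked numerically. The sketch below is an illustration only, assuming a 2-D input, degree 2, and a kernel of the form (x·z + c)^2; the helper names `poly_kernel` and `explicit_features` are hypothetical and not part of any library.

```python
import numpy as np

def poly_kernel(x, z, c=1.0, degree=2):
    """Polynomial kernel (x . z + c)^degree, evaluated in the input space."""
    return (np.dot(x, z) + c) ** degree

def explicit_features(x, c=1.0):
    """Explicit degree-2 feature map for a 2-D input (illustration only)."""
    x1, x2 = x
    return np.array([
        x1 ** 2,
        x2 ** 2,
        np.sqrt(2) * x1 * x2,
        np.sqrt(2 * c) * x1,
        np.sqrt(2 * c) * x2,
        c,
    ])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Kernel value computed directly from the 2-D inputs.
k_implicit = poly_kernel(x, z)
# Same value computed as an inner product in the 6-D feature space.
k_explicit = np.dot(explicit_features(x), explicit_features(z))

print(k_implicit, k_explicit)
```

Both values agree up to floating-point rounding, but the first never builds the 6-dimensional vectors. For higher degrees and more input features the explicit map grows quickly, which is where the implicit computation pays off.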

#### Q2. How can we implement an SVM with a polynomial kernel in Python using Scikit-learn?

Implementing an SVM with a polynomial kernel in Python using Scikit-learn is straightforward. Scikit-learn provides the SVC class, which supports several kernels, including the polynomial kernel. Here's an example to illustrate the implementation.

In [26]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset or any other dataset of your choice
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [7]:
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [8]:
# Create an SVM classifier with a polynomial kernel
svm_classifier = SVC(kernel='poly', degree=3)

In [9]:
# Train the SVM classifier
svm_classifier.fit(X_train, y_train)

In [10]:
# Make predictions on the test set
y_pred = svm_classifier.predict(X_test)

In [16]:
y_test

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

In [18]:
svm_classifier.predict([[7.7, 2.6, 6.9, 2.3]])

array([2])

Interpretation: In this code, we load the Iris dataset using datasets.load_iris() and split it into training and test sets using train_test_split(). We then create an SVM classifier with a polynomial kernel by setting the kernel parameter to 'poly' and specifying the desired degree of the polynomial using the degree parameter. The degree determines the complexity of the polynomial function.

Next, we train the SVM classifier on the training data by calling fit() and make predictions on the test set using predict(). Lastly, we predict the class of a single new sample by passing its four feature values to predict(), which returns class 2.
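The cells above import accuracy_score but never call it. As a minimal sketch, assuming the same Iris split, the loop below scores the polynomial-kernel classifier on the test set for a few degrees; the choice of degrees 1 to 4 is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit one classifier per polynomial degree and record its test accuracy.
accuracies = {}
for degree in (1, 2, 3, 4):
    clf = SVC(kernel="poly", degree=degree)
    clf.fit(X_train, y_train)
    accuracies[degree] = accuracy_score(y_test, clf.predict(X_test))

for degree, acc in accuracies.items():
    print(f"degree={degree}: test accuracy = {acc:.3f}")
```

On a dataset as small as Iris the differences between degrees may be negligible, so a single split like this is only a rough indication.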

#### Q3. How does increasing the value of epsilon affect the number of support vectors in SVR?

Ans: In Support Vector Regression (SVR), the parameter epsilon (ε) determines the width of the margin or the tube around the predicted function within which errors are tolerated. The epsilon-insensitive loss function is used to determine the support vectors and the regression line.

Increasing the value of epsilon generally decreases the number of support vectors in SVR. Here's how it works:

Larger Epsilon (Wide Tube): A larger epsilon widens the tube, so more training points fall inside it. Points strictly inside the tube incur no loss and do not become support vectors; only points on or outside the tube boundary do. Consequently, a larger epsilon usually yields fewer support vectors and a sparser, smoother model, at the risk of underfitting if the tube is so wide that it hides real structure in the data.

Smaller Epsilon (Narrow Tube): Conversely, a smaller epsilon narrows the tube, so fewer points fit inside it and the model tolerates only smaller errors. More points lie on or outside the tube boundary, so the number of support vectors usually grows and the fit follows the training data more closely.

The number of support vectors is influenced by the trade-off between the margin width (controlled by epsilon) and the regularization parameter (C) in SVR. By adjusting these parameters, you can balance the model's flexibility and complexity, which in turn affects the number of support vectors.

It's important to note that the optimal value of epsilon and its impact on the number of support vectors may vary depending on the specific dataset and problem at hand. It is often recommended to perform cross-validation or grid search techniques to determine the optimal values for both epsilon and C, considering the specific characteristics of your dataset and the desired trade-off between model complexity and performance.
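The effect of epsilon on sparsity can be observed directly. This is a small sketch on synthetic data; the noisy sine curve, the RBF kernel, C=1.0, and the epsilon values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine curve with Gaussian noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Fit one SVR per epsilon and count its support vectors.
support_counts = {}
for epsilon in (0.01, 0.1, 0.5, 1.0):
    model = SVR(kernel="rbf", C=1.0, epsilon=epsilon).fit(X, y)
    support_counts[epsilon] = len(model.support_)
    print(f"epsilon={epsilon}: {support_counts[epsilon]} support vectors")
```

The count should shrink as epsilon grows, because more points end up strictly inside the tube and drop out of the solution.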

#### Q4. How does the choice of kernel function, C parameter, epsilon parameter, and gamma parameter affect the performance of Support Vector Regression (SVR)? Can you explain how each parameter works and provide examples of when you might want to increase or decrease its value?

The choice of kernel function, C parameter, epsilon parameter, and gamma parameter in Support Vector Regression (SVR) can significantly impact the performance and behavior of the model. Let's discuss each parameter and its effect:

- Kernel Function:
The kernel function determines the type of non-linear transformation applied to the input data. Common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid.
Choosing the appropriate kernel function depends on the nature of the data and the complexity of the underlying relationship. For example, if the relationship is expected to be linear, the linear kernel can be used. If the relationship is non-linear, the polynomial or RBF kernel may be more suitable.
The choice of the kernel function can affect the model's ability to capture complex patterns and its generalization performance.

- C Parameter:
The C parameter controls the trade-off between keeping the regression function simple (flat) and penalizing training points that fall outside the epsilon tube. It acts as an inverse regularization parameter in SVR.
A smaller C value imposes stronger regularization, resulting in a simpler, flatter model that tolerates larger deviations from the training targets. This can help prevent overfitting but may lead to underfitting if the model is too constrained.
Increasing the C value allows more flexibility in fitting the training data, potentially leading to better performance on the training set. However, it may increase the risk of overfitting if the model becomes too complex.

- Epsilon (ε) Parameter:
The epsilon parameter defines the width of the epsilon-insensitive tube around the predicted function. Errors smaller than epsilon are ignored by the loss.
A larger epsilon value widens the tube, so more data points fall inside it without incurring a penalty. This typically reduces the number of support vectors and gives a simpler, coarser fit; increase it when the targets are noisy and small deviations do not matter.
A smaller epsilon value narrows the tube and makes the model less tolerant of errors, which typically increases the number of support vectors; decrease it when the fit needs to track the targets closely.

- Gamma Parameter:
The gamma parameter applies to the RBF, polynomial, and sigmoid kernels and is ignored by the linear kernel. For the RBF kernel it sets how far the influence of a single training example reaches.
A smaller gamma value makes the influence of each training example more widespread, resulting in a smoother fitted function. Decrease it when the model overfits.
A larger gamma value makes the influence of each training example more local. This can result in a more irregular fitted function and may lead to overfitting, particularly if the dataset is small or noisy.

The optimal values for these parameters depend on the specific dataset and problem at hand. It's recommended to perform hyperparameter tuning using techniques such as cross-validation or grid search to find the combination of parameter values that maximizes the model's performance on unseen data.
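A joint search over the four settings could look like the sketch below. It assumes synthetic sine data, a StandardScaler step, and a deliberately small grid; none of these choices come from the text above, and the `svr__` prefixes are the step names that make_pipeline generates.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic 1-D regression problem (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=150)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = make_pipeline(StandardScaler(), SVR())

param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [0.1, 1, 10],
    "svr__epsilon": [0.01, 0.1, 0.5],
    "svr__gamma": ["scale", 0.1, 1],  # has no effect for the linear kernel
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated MSE:", -search.best_score_)
```

The scoring string returns negated errors so that larger is better, which is why the sign is flipped when printing.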

#### Q5. Assignment:
- Import the necessary libraries and load the dataset.
- Split the dataset into training and testing sets.
- Preprocess the data using any technique of your choice (e.g. scaling, normalization).
- Create an instance of the SVC classifier and train it on the training data.
- Use the trained classifier to predict the labels of the testing data.
- Evaluate the performance of the classifier using any metric of your choice (e.g. accuracy, precision, recall, F1-score).
- Tune the hyperparameters of the SVC classifier using GridSearchCV or RandomizedSearchCV to improve its performance.
- Train the tuned classifier on the entire dataset.
- Save the trained classifier to a file for future use.

Note: You can use any dataset of your choice for this assignment, but make sure it is suitable for classification and has a sufficient number of features and samples.

In [28]:
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np

dataset = load_iris()
df = pd.DataFrame(dataset.data, columns = dataset.feature_names)
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [29]:
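# df holds only the four feature columns, so iloc[:, :-1] keeps the first three
# and drops the last feature, petal width (cm). Use X = df to keep all four.
X = df.iloc[:, :-1]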
y = dataset.target

In [30]:
# Train test split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =0.25, random_state=42)

In [31]:
from sklearn.svm import SVC
svc = SVC(kernel = 'linear')
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)

In [32]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

[[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
1.0
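The assignment also lists a preprocessing step, which the cells above skip. One way to add it is sketched below, assuming StandardScaler as the technique and all four Iris features; placing the scaler in a pipeline means it is fitted on the training data only.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# The scaler learns its mean and variance from X_train only,
# then the same transformation is applied to X_test at predict time.
scaled_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scaled_svc.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, scaled_svc.predict(X_test))
print(f"test accuracy with scaling: {scaled_accuracy:.3f}")
```

Scaling matters more for the RBF and polynomial kernels than for a linear kernel on Iris, so the benefit here may be small.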


##### Hyperparameter Tuning

In [33]:
from sklearn.model_selection import GridSearchCV

param_grid = {'C':[0.1, 1, 10, 100, 1000],
              'gamma':[1, 0.1, 0.01, 0.001, 0.0001],
              'kernel':['linear', 'poly', 'rbf']
             }
grid = GridSearchCV(SVC(), param_grid = param_grid, cv =5, verbose =3)
grid.fit(X_train, y_train)

Fitting 5 folds for each of 75 candidates, totalling 375 fits
[CV 1/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.957 total time=   0.0s
[CV 3/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.818 total time=   0.0s
[CV 4/5] END .....C=0.1, gamma=1, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END .....C=0.1, gamma=1, kernel=linear;, score=0.864 total time=   0.0s
[CV 1/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.957 total time=   0.0s
[CV 2/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.957 total time=   0.0s
[CV 3/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.818 total time=   0.0s
[CV 4/5] END .......C=0.1, gamma=1, kernel=poly;, score=1.000 total time=   0.0s
[CV 5/5] END .......C=0.1, gamma=1, kernel=poly;, score=0.955 total time=   0.0s
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.957 total time=   0.0s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf

[CV 4/5] END ...C=1, gamma=0.001, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END ...C=1, gamma=0.001, kernel=linear;, score=0.955 total time=   0.0s
[CV 1/5] END .....C=1, gamma=0.001, kernel=poly;, score=0.348 total time=   0.0s
[CV 2/5] END .....C=1, gamma=0.001, kernel=poly;, score=0.348 total time=   0.0s
[CV 3/5] END .....C=1, gamma=0.001, kernel=poly;, score=0.318 total time=   0.0s
[CV 4/5] END .....C=1, gamma=0.001, kernel=poly;, score=0.636 total time=   0.0s
[CV 5/5] END .....C=1, gamma=0.001, kernel=poly;, score=0.455 total time=   0.0s
[CV 1/5] END ......C=1, gamma=0.001, kernel=rbf;, score=0.348 total time=   0.0s
[CV 2/5] END ......C=1, gamma=0.001, kernel=rbf;, score=0.348 total time=   0.0s
[CV 3/5] END ......C=1, gamma=0.001, kernel=rbf;, score=0.318 total time=   0.0s
[CV 4/5] END ......C=1, gamma=0.001, kernel=rbf;, score=0.636 total time=   0.0s
[CV 5/5] END ......C=1, gamma=0.001, kernel=rbf;, score=0.500 total time=   0.0s
[CV 1/5] END ..C=1, gamma=0.

[CV 1/5] END .......C=100, gamma=1, kernel=poly;, score=1.000 total time=   3.9s
[CV 2/5] END .......C=100, gamma=1, kernel=poly;, score=0.957 total time=   1.3s
[CV 3/5] END .......C=100, gamma=1, kernel=poly;, score=0.818 total time=   0.0s
[CV 4/5] END .......C=100, gamma=1, kernel=poly;, score=1.000 total time=   3.7s
[CV 5/5] END .......C=100, gamma=1, kernel=poly;, score=0.955 total time=   0.3s
[CV 1/5] END ........C=100, gamma=1, kernel=rbf;, score=0.913 total time=   0.0s
[CV 2/5] END ........C=100, gamma=1, kernel=rbf;, score=0.913 total time=   0.0s
[CV 3/5] END ........C=100, gamma=1, kernel=rbf;, score=0.818 total time=   0.0s
[CV 4/5] END ........C=100, gamma=1, kernel=rbf;, score=0.909 total time=   0.0s
[CV 5/5] END ........C=100, gamma=1, kernel=rbf;, score=0.955 total time=   0.0s
[CV 1/5] END ...C=100, gamma=0.1, kernel=linear;, score=0.957 total time=   0.0s
[CV 2/5] END ...C=100, gamma=0.1, kernel=linear;, score=0.957 total time=   0.0s
[CV 3/5] END ...C=100, gamma

[CV 3/5] END C=1000, gamma=0.001, kernel=linear;, score=0.818 total time=   0.0s
[CV 4/5] END C=1000, gamma=0.001, kernel=linear;, score=1.000 total time=   0.0s
[CV 5/5] END C=1000, gamma=0.001, kernel=linear;, score=0.955 total time=   0.0s
[CV 1/5] END ..C=1000, gamma=0.001, kernel=poly;, score=0.826 total time=   0.0s
[CV 2/5] END ..C=1000, gamma=0.001, kernel=poly;, score=0.696 total time=   0.0s
[CV 3/5] END ..C=1000, gamma=0.001, kernel=poly;, score=0.818 total time=   0.0s
[CV 4/5] END ..C=1000, gamma=0.001, kernel=poly;, score=1.000 total time=   0.0s
[CV 5/5] END ..C=1000, gamma=0.001, kernel=poly;, score=0.773 total time=   0.0s
[CV 1/5] END ...C=1000, gamma=0.001, kernel=rbf;, score=1.000 total time=   0.0s
[CV 2/5] END ...C=1000, gamma=0.001, kernel=rbf;, score=0.957 total time=   0.0s
[CV 3/5] END ...C=1000, gamma=0.001, kernel=rbf;, score=0.818 total time=   0.0s
[CV 4/5] END ...C=1000, gamma=0.001, kernel=rbf;, score=1.000 total time=   0.0s
[CV 5/5] END ...C=1000, gamm

In [36]:
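# Not the last expression in the cell, so the notebook does not display it;
# wrap it in print() to see the selected parameters.
grid.best_params_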

y_pred4 = grid.predict(X_test)
print(classification_report(y_test, y_pred4))
print(confusion_matrix(y_test, y_pred4))
print(accuracy_score(y_test,y_pred4))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96        12

    accuracy                           0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

[[15  0  0]
 [ 0 10  1]
 [ 0  0 12]]
0.9736842105263158
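In the grid above, gamma is varied for the linear kernel too, although that kernel ignores it, so many of the 75 candidates are duplicates. A possible alternative, sketched here with a smaller grid and a scaling step (both assumptions), is to pass a list of grids so that gamma is searched only for the kernels that use it.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

pipeline = make_pipeline(StandardScaler(), SVC())

# One sub-grid per kernel family: gamma appears only where it has an effect.
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {
        "svc__kernel": ["rbf", "poly"],
        "svc__C": [0.1, 1, 10],
        "svc__gamma": [0.01, 0.1, 1],
    },
]

grid = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
grid.fit(X_train, y_train)

print("best parameters:", grid.best_params_)
test_accuracy = accuracy_score(y_test, grid.predict(X_test))
print(f"test accuracy: {test_accuracy:.3f}")
```

This evaluates 21 candidates (3 linear plus 18 for RBF and polynomial) instead of repeating the linear kernel once per gamma value.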


In [35]:
# Save the trained classifier for future use.
# Note: this saves the untuned `svc`; the tuned model is grid.best_estimator_.
import pickle
with open('svc.pkl', 'wb') as f:
    pickle.dump(svc, f)
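The last two assignment steps (train the tuned classifier on the entire dataset, then save it) could be covered as in the sketch below. The small grid, the temporary directory, and the file name `svc_tuned.pkl` are assumptions made so that the example runs on its own.

```python
import os
import pickle
import tempfile

from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Tune on the training split only.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
grid.fit(X_train, y_train)

# clone() gives an unfitted copy with the best hyperparameters,
# which is then trained on the entire dataset.
final_model = clone(grid.best_estimator_).fit(X, y)

# Hypothetical file location, chosen for illustration.
model_path = os.path.join(tempfile.gettempdir(), "svc_tuned.pkl")
with open(model_path, "wb") as f:
    pickle.dump(final_model, f)

# Reload and confirm the restored model predicts the same labels.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

print((loaded_model.predict(X) == final_model.predict(X)).all())
```

Pickled models should only be loaded from trusted sources and with a compatible scikit-learn version.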