# **`scikit-learn`** Support Vector Machine Model

**`sklearn.svm.SVC`** is a class in the `sklearn.svm` module of the `scikit-learn` library that implements Support Vector Classification.

It supports both binary and multi-class classification; multi-class problems are handled internally with a one-vs-one scheme.

`SVC` accepts a `kernel` parameter. A non-linear kernel implicitly maps the input data into a higher-dimensional space, which makes it possible to separate data that is not linearly separable. The available kernels are 'linear', 'poly' (polynomial), 'rbf' (Radial Basis Function, the default), 'sigmoid', and 'precomputed' (a kernel matrix that you compute yourself).
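
To make the effect of the kernel choice concrete, here is a minimal sketch on synthetic data. It assumes scikit-learn's `make_circles` generator (not used elsewhere in this notebook), which produces two concentric rings that no straight line can separate, and compares the test accuracy of a 'linear' and an 'rbf' kernel.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# synthetic data (an assumption for illustration): two concentric rings
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# fit one classifier per kernel and record its accuracy on the held-out part
scores = {}
for kernel in ('linear', 'rbf'):
    model = SVC(kernel=kernel)
    model.fit(X_tr, y_tr)
    scores[kernel] = model.score(X_te, y_te)

print(scores)
```

On data like this, the 'rbf' kernel should score clearly higher than the 'linear' one. For sparse text features such as the TF-IDF vectors built later, a linear kernel is often a reasonable starting point.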

#### Usage:

* **Import:** Import the `SVC` class:<br>
`from sklearn.svm import SVC`
* **Initialization:** Create an instance of `SVC`, specifying the desired parameters, for example:<br>
`model = SVC(kernel='linear')`
* **Training:** Train the model by calling the `fit` method with the training data and labels:<br>
`model.fit(X_train, y_train)`
* **Prediction:** Make predictions on new data with the `predict` method:<br>
`predictions = model.predict(X_test)`
* **Probability Estimation** (only if the model was created with `probability=True`): Get class probabilities with `predict_proba`:<br>
`probabilities = model.predict_proba(X_test)`
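
The steps above can be put together in one short, runnable sketch. The toy data below is hypothetical (two well-separated clusters of 2-D points rather than text), and `probability=True` is passed so that `predict_proba` is available.

```python
import numpy as np
from sklearn.svm import SVC

# toy data (hypothetical): two well-separated clusters in 2-D
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y_train = np.array(['negative'] * 20 + ['positive'] * 20)
X_test = np.array([[-2.0, -2.0], [2.0, 2.0]])

# probability=True is needed for predict_proba; it makes training slower
model = SVC(kernel='linear', probability=True, random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)  # one column per entry of model.classes_

print(model.classes_)
print(predictions)
print(probabilities.shape)
```

Each row of `probabilities` sums to 1, and its columns follow the order of `model.classes_`.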

# Create Train Dataset and Test Dataset

## Load Dataset

We will use a dataset of movie reviews for binary sentiment classification.

The dataset contains two columns: "review" and "sentiment".

Each value in the "sentiment" column is either "positive" or "negative".

In [None]:
import pandas as pd

# csv file location
url = 'https://github.com/tariqzahratahdi/MachineLearning/raw/refs/heads/main/datasets/movies_reviews.csv'

# making dataframe from csv file
data = pd.read_csv(url)

# show dataframe
data

| | review | sentiment |
|---|---|---|
| 0 | Interesting and short television movie describ... | negative |
| 1 | Insignificant and low-brained (haha!) 80's hor... | negative |
| 2 | Ingrid Bergman, playing dentist Walter Matthau... | positive |
| 3 | Infamous horror films seldom measure up the hy... | negative |
| 4 | Independent film that would make Hollywood pro... | negative |
| ... | ... | ... |
| 1995 | You remember the Spice Girls movie and how bad... | negative |
| 1996 | You should never ever even consider to watch t... | negative |
| 1997 | You wear only the best Italian suits from Arma... | positive |
| 1998 | You'd think you're in for some serious sightse... | positive |


## Check Dataset is Balanced

Check that the dataset contains as many rows labeled "positive" as rows labeled "negative" in the "sentiment" column.

In [None]:
data.value_counts('sentiment')

| sentiment | count |
|---|---|
| negative | 1000 |
| positive | 1000 |
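
The same check can be expressed as proportions instead of raw counts. The sketch below uses a hypothetical four-row dataframe named `df` with the same two columns; the 0.05 tolerance is an arbitrary choice for illustration, not a standard threshold.

```python
import pandas as pd

# hypothetical miniature dataset with the same two columns
df = pd.DataFrame({
    'review': ['great', 'awful', 'loved it', 'boring'],
    'sentiment': ['positive', 'negative', 'positive', 'negative'],
})

# share of each class; a perfectly balanced binary dataset gives 0.5 for both
proportions = df['sentiment'].value_counts(normalize=True)

# treat the dataset as balanced when the class shares differ by less than 5 points
is_balanced = (proportions.max() - proportions.min()) < 0.05

print(proportions)
print(is_balanced)
```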


## Splitting Data into Train and Test

### Import Libraries

In [None]:
# import sklearn train_test_split
from sklearn.model_selection import train_test_split

### Create Train and Test Dataframes


In [None]:
# create train and test dataframes
data_train, data_test = train_test_split(data, test_size=0.2, random_state=42)

# show train dataframe
data_train

| | review | sentiment |
|---|---|---|
| 968 | "Night of the Living Homeless" was a fairly st... | positive |
| 240 | I remember watching American Gothic when it fi... | positive |
| 819 | Although the plot of this film is a bit far-fe... | positive |
| 692 | Chuck Jones's 'Rabbit Seasoning', the second i... | positive |
| 420 | I did not expect much from this film, but boy-... | negative |
| ... | ... | ... |
| 1130 | Must have to agree with the other reviewer. Th... | negative |
| 1294 | Resnais, wow! The genius who brought us Hirosh... | negative |
| 860 | Absolutely nothing is redeeming about this tot... | negative |
| 1459 | The movie with its single set, minimal cast, a... | positive |
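
The split above is purely random, so the two parts are not guaranteed to keep the exact 50/50 class ratio of the full dataset. If that matters, one option (not used in this notebook) is the `stratify` argument of `train_test_split`. A minimal sketch on a hypothetical balanced dataframe:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# hypothetical balanced dataset: 50 positive and 50 negative rows
df = pd.DataFrame({
    'review': [f'review {i}' for i in range(100)],
    'sentiment': ['positive', 'negative'] * 50,
})

# stratify on the label column so both parts keep the original class ratio
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df['sentiment']
)

print(train_df['sentiment'].value_counts())
print(test_df['sentiment'].value_counts())
```

With stratification, the 20 test rows contain 10 reviews of each class and the 80 training rows contain 40 of each.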


## Set Predictor Variable and Response Variable

In [None]:
# set predictor variable and response variable
X_train, y_train = data_train['review'], data_train['sentiment']
X_test, y_test = data_test['review'], data_test['sentiment']

# show predictor variable (a pandas Series of review texts)
X_train

| | review |
|---|---|
| 968 | "Night of the Living Homeless" was a fairly st... |
| 240 | I remember watching American Gothic when it fi... |
| 819 | Although the plot of this film is a bit far-fe... |
| 692 | Chuck Jones's 'Rabbit Seasoning', the second i... |
| 420 | I did not expect much from this film, but boy-... |
| ... | ... |
| 1130 | Must have to agree with the other reviewer. Th... |
| 1294 | Resnais, wow! The genius who brought us Hirosh... |
| 860 | Absolutely nothing is redeeming about this tot... |
| 1459 | The movie with its single set, minimal cast, a... |


# Turn Text Data into Numerical Vectors

## Create an Instance of `TfidfVectorizer`

### Import Library and Create an Instance of `TfidfVectorizer`

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

# create an instance of TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')

### Transform Text into Sparse Matrix

The first call, `tfidf.fit_transform()`, learns the vocabulary and IDF weights from the training data and transforms that data into a sparse matrix.

After that, we use `tfidf.transform()` to transform the test data into a sparse matrix with the same fitted vocabulary.

In [None]:
# transform train text into sparse matrix
X_train_vector = tfidf.fit_transform(X_train)

# transform test text into sparse matrix
X_test_vector = tfidf.transform(X_test)  # use transform() instead of fit_transform() so the test data does not alter the fitted vocabulary

X_train_vector

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 140612 stored elements and shape (1600, 22363)>
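
The difference between `fit_transform()` and `transform()` is easier to see on a tiny corpus. The sketch below uses hypothetical example sentences and a separate `vectorizer` object with default settings (no stop-word list), so it does not touch the `tfidf` instance fitted above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ['a wonderful film', 'a terrible film']  # hypothetical training texts
test_docs = ['a wonderful masterpiece']               # 'masterpiece' never appears in training

vectorizer = TfidfVectorizer()

# fit_transform learns the vocabulary and IDF weights, then vectorizes the texts
train_matrix = vectorizer.fit_transform(train_docs)

# transform reuses the learned vocabulary; words unseen during fitting are ignored
test_matrix = vectorizer.transform(test_docs)

print(sorted(vectorizer.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
```

Both matrices have the same number of columns, one per vocabulary word learned from the training texts. The single-letter token "a" is dropped by the default token pattern, and "masterpiece" contributes nothing to the test vector because it was not in the fitted vocabulary.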

# The Classification Model `SVC`

**`sklearn.svm.SVC`** is a class within the `sklearn.svm` module in the `scikit-learn` library that implements Support Vector Classification.

## Create an Instance of `SVC`

In [None]:
# import SVC
from sklearn.svm import SVC

# create an instance of SVC
clf = SVC(kernel='linear')

## Train the Model

In [None]:
# train the model
clf.fit(X_train_vector, y_train)

## Make Predictions

Once the model is trained, we can use it to make predictions on unseen data. New text must first be vectorized with the same fitted `tfidf` object.

#### Example

Predict the sentiment of two short, unseen reviews:

In [None]:
# make prediction
clf.predict(tfidf.transform(['a good movie']))

array(['positive'], dtype=object)

In [None]:
# make prediction
clf.predict(tfidf.transform(['a bad movie']))

array(['negative'], dtype=object)
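
The two cells above call `tfidf.transform` by hand before every prediction. One common alternative, not used in this notebook, is to bundle the vectorizer and the classifier with scikit-learn's `make_pipeline`, so that raw strings can be passed straight to `fit` and `predict`. A minimal sketch on a hypothetical six-sentence training set:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# hypothetical miniature training set
texts = [
    'a good movie', 'a great film', 'good acting and a great plot',
    'a bad movie', 'a terrible film', 'bad acting and a terrible plot',
]
labels = ['positive'] * 3 + ['negative'] * 3

# the pipeline vectorizes the text, then passes the vectors to the classifier
pipeline = make_pipeline(TfidfVectorizer(), SVC(kernel='linear'))
pipeline.fit(texts, labels)

# raw strings go in; the fitted vectorizer is applied automatically
result = pipeline.predict(['a good film', 'a bad film'])
print(result)
```

A pipeline also removes the risk of accidentally calling `fit_transform` on test data, because only `pipeline.fit` ever fits the vectorizer.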

### Make Prediction with Test Data

We now predict the sentiment of every review in the test set, reusing the `X_test_vector` matrix computed earlier.

In [None]:
# predict labels for the vectorized test reviews
y_pred = clf.predict(X_test_vector)

## Evaluate the Model

To assess the model's performance we use various classification metrics available in `sklearn.metrics`.

Common metrics include:
* **Accuracy:** `accuracy_score(y_test, y_pred)`
* **Precision, Recall, F1-score:** `classification_report(y_test, y_pred)`
* **Confusion Matrix:** `confusion_matrix(y_test, y_pred)`
* **ROC AUC Score:** `roc_auc_score(y_test, y_score)` (for binary classification; `y_score` must hold continuous scores, such as the output of `decision_function` or positive-class probabilities, not predicted labels)
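
The accuracy score and classification report are demonstrated on the review data in this notebook. For the other two metrics, here is a minimal sketch on hypothetical synthetic data (two overlapping clusters of 2-D points). It assumes the ROC AUC is computed from `decision_function` scores, which avoids having to train the model with `probability=True`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# toy data (hypothetical): two overlapping clusters in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array(['negative'] * 100 + ['positive'] * 100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = SVC(kernel='linear').fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# rows are true classes, columns are predicted classes, in the order given by labels
cm = confusion_matrix(y_te, y_pred, labels=['negative', 'positive'])

# decision_function returns a signed distance to the separating hyperplane;
# larger values lean towards model.classes_[1], which is 'positive' here
y_score = model.decision_function(X_te)
auc = roc_auc_score(y_te, y_score)

print(cm)
print(f"ROC AUC: {auc:.2f}")
```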

**Example:** accuracy score:

In [None]:
# import
from sklearn.metrics import accuracy_score

# evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.84


**Example:** classification report:

In [None]:
# import
from sklearn.metrics import classification_report

# print the classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


Classification Report:
              precision    recall  f1-score   support

    negative       0.87      0.80      0.84       202
    positive       0.81      0.88      0.84       198

    accuracy                           0.84       400
   macro avg       0.84      0.84      0.84       400
weighted avg       0.84      0.84      0.84       400
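
To see how the figures in such a report relate to each other, the arithmetic can be reproduced by hand. The counts below are illustrative assumptions chosen so that they round to the figures printed above; the notebook does not show its actual confusion matrix, so they should not be read as the model's real results.

```python
# illustrative counts (an assumption, not the notebook's actual confusion matrix)
tn, fp = 162, 40   # truly negative reviews: predicted negative / predicted positive
fn, tp = 24, 174   # truly positive reviews: predicted negative / predicted positive

# metrics for the 'positive' class
precision_pos = tp / (tp + fp)   # share of predicted positives that are correct
recall_pos = tp / (tp + fn)      # share of true positives that were found
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

# overall accuracy: correct predictions over all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision_pos:.2f} recall={recall_pos:.2f} "
      f"f1={f1_pos:.2f} accuracy={accuracy:.2f}")
```

The support column is simply the number of true samples per class: `tn + fp` for "negative" and `fn + tp` for "positive".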

