#Sentiment analysis of customer comments with models based on transfer learning (BERT language model)

The purpose of this project is to analyze the sentiment of IMDB user reviews with the help of models based on transfer learning. We fine-tune the BERT language model so that it can determine whether a review is positive or negative.

The link to the data page on the Kaggle website: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

##BERT (Bidirectional Encoder Representations from Transformers) Language Model

BERT is an open-source language representation model for natural language processing (NLP). It is designed to help computers understand the meaning of ambiguous language in text by using the surrounding text to establish context. The original BERT models were pre-trained on text from English Wikipedia and BooksCorpus, and they can be fine-tuned on downstream tasks such as question answering or, as in this project, sentiment classification.

BERT, which stands for Bidirectional Encoder Representations from Transformers, is based on Transformers, a deep learning model in which every output element is connected to every input element, and the weightings between them are dynamically calculated based upon their connection. (In NLP, this process is called attention.)
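To make the idea of attention more concrete, here is a minimal pure-Python sketch of scaled dot-product attention. It is an illustration only: the toy vectors are made up, and a real BERT layer additionally applies learned query, key, and value projections and uses several attention heads in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # One score per input position: every output looks at every input.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is the weighted average of all value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Three toy 2-dimensional token vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
```

Each row of `mixed` is a blend of all three input vectors, with the blend weights computed from the inputs themselves rather than fixed in advance.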

Historically, language models could only read text input sequentially -- either left-to-right or right-to-left -- but couldn't do both at the same time. BERT is different because it is designed to read in both directions at once. This capability, enabled by the introduction of Transformers, is known as bidirectionality.

Using this bidirectional capability, BERT is pre-trained on two different, but related, NLP tasks: Masked Language Modeling and Next Sentence Prediction.

The objective of Masked Language Model (MLM) training is to hide a word in a sentence and then have the program predict what word has been hidden (masked) based on the hidden word's context. The objective of Next Sentence Prediction training is to have the program predict whether two given sentences have a logical, sequential connection or whether their relationship is simply random.
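The masking step can be sketched in a few lines of Python. This is a simplified illustration, not BERT's actual pre-processing: the function name `mask_tokens` and its parameters are hypothetical, and the real procedure selects about 15% of the tokens and sometimes substitutes a random token or leaves the token unchanged instead of always inserting `[MASK]`.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, targets); targets hold the hidden words, else None."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # the model must predict this word
        else:
            masked.append(tok)
            targets.append(None)  # this position is not scored
    return masked, targets

sentence = "the movie was surprisingly good and well acted".split()
masked, targets = mask_tokens(sentence, mask_prob=0.3, seed=1)
```

During pre-training the model sees `masked` as input and is trained to recover the entries of `targets` that are not `None`.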

##Import libraries and dataset

1. **Pandas (import pandas as pd):** Pandas is a powerful data manipulation and analysis library for Python. It provides data structures like DataFrame, which is a two-dimensional table, and tools for data cleaning, exploration, and manipulation. Pandas is used to read the dataset from a CSV file, perform data manipulations (e.g., splitting the dataset), and create DataFrames for easy handling of tabular data.

2. **scikit-learn (from sklearn.model_selection import train_test_split):** Scikit-learn is a machine learning library that provides simple and efficient tools for data analysis and modeling. Its train_test_split function divides the dataset into a portion for training the model and a portion for evaluating it.

3. **from sklearn.preprocessing import LabelEncoder:** The LabelEncoder is a utility class provided by scikit-learn that encodes categorical labels (strings or integers) as integers. It assigns a unique integer to each unique label, in sorted order. In the provided code, LabelEncoder converts the 'sentiment' column from string labels to numeric labels ('negative' becomes 0 and 'positive' becomes 1). This is necessary for training the model, which expects numerical labels.

4. **from sklearn.metrics import accuracy_score, classification_report:** accuracy_score computes the accuracy of the model's predictions by comparing them with the true labels; it is a common metric for classification tasks. classification_report generates a text report of the main classification metrics (precision, recall, and F1-score) for each class. After making predictions on the test set, these functions are used to evaluate the performance of the model.

5. **Transformers (from transformers import BertTokenizer, BertForSequenceClassification, AdamW):** Transformers is a library by Hugging Face that provides pre-trained models for natural language processing (NLP) tasks. It includes tools for working with popular transformer-based models like BERT, GPT, and others. In this code, BertTokenizer tokenizes the input text, BertForSequenceClassification is the pre-trained BERT model with a sequence classification head, and AdamW is the optimizer used for training. (Newer releases of transformers deprecate this AdamW class in favor of torch.optim.AdamW.)

6. **PyTorch (import torch):** PyTorch is an open-source machine learning library used for tasks such as deep learning and neural network training. It provides a flexible and dynamic computational graph, making it popular for research and development. PyTorch is used for various tasks, including creating tensors, defining and training neural networks, and moving data between CPU and GPU.

7. **Torch DataLoader (from torch.utils.data import DataLoader, TensorDataset, random_split):** The DataLoader class from PyTorch loads batches of data efficiently during model training. TensorDataset is a PyTorch dataset wrapper for tensors, and random_split can divide a dataset into subsets such as training and validation sets (the code shown here performs its split with train_test_split instead). These components prepare the data in a format suitable for training the BERT model.

8. **tqdm (from tqdm import tqdm):** tqdm is a library for adding progress bars to loops in Python. It provides a visual representation of the progress of tasks, which is particularly helpful for time-consuming operations. tqdm is used to create progress bars during the training and evaluation loops to monitor the progress of the model.

These libraries collectively facilitate the implementation of a sentiment analysis model using transfer learning with BERT. They cover data manipulation, model training, and evaluation, as well as providing pre-trained models and tools for working with natural language processing tasks.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report
from transformers import BertTokenizer, BertForSequenceClassification, AdamW
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split
from tqdm import tqdm

###import dataset

In [None]:
df = pd.read_csv('/content/drive/MyDrive/ml/Sentiment-analysis-of-customer-comments/IMDB Dataset.csv')

###information about dataset

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   review     50000 non-null  object
 1   sentiment  50000 non-null  object
dtypes: object(2)
memory usage: 781.4+ KB


##Data Preparation
The code uses scikit-learn's LabelEncoder to transform the categorical labels in the 'sentiment' column of the DataFrame (df) into numeric labels. This is a common preprocessing step when working with machine learning models, as many algorithms expect numerical input for the target variable.

Here's a breakdown of what each line does:

1. **label_encoder = LabelEncoder():** Creates an instance of the LabelEncoder class.

2. **df['label'] = label_encoder.fit_transform(df['sentiment']):** The fit_transform method of LabelEncoder is applied to the 'sentiment' column in the DataFrame (df). This method fits the encoder to the unique labels in the 'sentiment' column and transforms them into numerical values. The transformed labels are then assigned to a new column named 'label' in the DataFrame. LabelEncoder numbers the classes in sorted order, so 'negative' is mapped to 0 and 'positive' to 1. The resulting DataFrame has a new column 'label' containing these numeric representations.

This transformation is essential when training machine learning models because most algorithms require numerical labels. In the context of sentiment analysis, it enables the model to learn the patterns associated with positive and negative sentiments during the training process. After training, the model can then make predictions on new data, providing numeric labels representing the predicted sentiments.
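The mapping that LabelEncoder produces can be reproduced with a short pure-Python sketch. The helper names `fit_label_mapping` and `encode` are hypothetical and only illustrate the behavior; the notebook itself uses scikit-learn's class.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, numbering classes in sorted order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}

def encode(labels, mapping):
    """Replace every label by its integer code."""
    return [mapping[label] for label in labels]

sentiments = ["positive", "negative", "positive", "negative"]
mapping = fit_label_mapping(sentiments)   # {'negative': 0, 'positive': 1}
encoded = encode(sentiments, mapping)     # [1, 0, 1, 0]
```

Because the classes are sorted alphabetically, 'negative' always receives 0 and 'positive' always receives 1, regardless of which label appears first in the data.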


In [None]:
label_encoder = LabelEncoder()
df['label'] = label_encoder.fit_transform(df['sentiment'])

###split data
The code below uses scikit-learn's train_test_split function to split the dataset into training and testing sets. This is a common step in machine learning to evaluate the performance of the model on unseen data. Here's a breakdown of the code:

1. **df['review'].values:** Extracts the values from the 'review' column of the DataFrame (df). This column contains the user comments or reviews about movies.

2. **df['label'].values:** Extracts the values from the 'label' column of the DataFrame (df). This column was created in a previous step using LabelEncoder to convert the categorical sentiment labels ('positive' and 'negative') into numerical labels.

**train_test_split Function:** The first parameter is the array of features (user reviews), and the second parameter is the array of labels (numerical sentiments).
1. test_size=0.2: Specifies that 20% of the data should be reserved for the test set, and the remaining 80% will be used for training.
2. random_state=42: Sets a seed for the random number generator to ensure reproducibility. The same seed will always produce the same split.

**Return Values:** Four sets of data are returned:
1. train_texts: The training set of user reviews.
2. test_texts: The test set of user reviews.
3. train_labels: The corresponding labels (sentiments) for the training set.
4. test_labels: The corresponding labels (sentiments) for the test set.

So, after running this code, you have separate arrays (train_texts, test_texts, train_labels, test_labels) that represent the training and testing sets of reviews and their corresponding sentiment labels. These sets are then used for training and evaluating the machine learning model.
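The idea behind a seeded split can be sketched as follows. This is a simplified stand-in, assuming a plain shuffle followed by a cut; the function `simple_train_test_split` is hypothetical and will not reproduce the exact rows that scikit-learn selects for the same seed.

```python
import random

def simple_train_test_split(texts, labels, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed, then cut off the test portion."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(texts) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(items, idx):
        return [items[i] for i in idx]

    # Same return order as in the notebook: texts first, then labels.
    return (pick(texts, train_idx), pick(texts, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

reviews = [f"review {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
train_x, test_x, train_y, test_y = simple_train_test_split(reviews, labels)
```

Texts and labels are picked with the same indices, so every review stays paired with its own label, and the fixed seed makes the split repeatable.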

In [None]:
train_texts, test_texts, train_labels, test_labels = train_test_split(
    df['review'].values,
    df['label'].values,
    test_size=0.2,
    random_state=42
)

##Create Model

In the code below, the project uses the Hugging Face transformers library to work with a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model for sentiment analysis. Let's break down each line:

1. **tokenizer = BertTokenizer.from_pretrained('bert-base-uncased'):** Initializes a BERT tokenizer with the vocabulary and settings of the 'bert-base-uncased' pre-trained model. The tokenizer is responsible for converting text into tokens that can be understood by the BERT model. Later in the code, tokenizer will be used to tokenize user reviews before feeding them into the BERT model.

2. **model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2):** Initializes a BERT model for sequence classification tasks. In this case, the sentiment analysis task involves classifying user reviews into two classes: positive and negative. The model is the core BERT model used for processing sequences (user reviews) and making predictions. It is configured for a binary classification task with num_labels=2, where the two labels represent positive and negative sentiments.

In summary, these lines of code set up the BERT tokenizer and model for sentiment analysis. The tokenizer will be used to preprocess the text data, and the model is configured for binary sequence classification, making it suitable for the sentiment analysis task with two classes. These pre-trained components are part of the transfer learning approach, where a model pre-trained on a large corpus is fine-tuned on a specific task (sentiment analysis in this case).

####Explaining the warning
**warning:** Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

The message you're seeing is a warning, not an error. It's informing you that some of the weights in the BertForSequenceClassification model were not initialized from the pre-trained checkpoint (in this case, 'bert-base-uncased'). Specifically, the weights associated with the classifier layer (named 'classifier') are being newly initialized.

The reason for this is that BertForSequenceClassification is a pre-trained BERT model that has a classification head (classifier) added on top. When you load the model using from_pretrained('bert-base-uncased', num_labels=2), it initializes the pre-trained BERT weights and adds a new classifier for binary sequence classification with two labels.

The warning suggests that if you intend to use this model for predictions or inference, you should consider training it on a downstream task. Fine-tuning on a specific task can help adapt the model to the characteristics of your data and improve performance.

If you're using the model for a specific task like sentiment analysis with your own dataset, you would typically proceed with fine-tuning the model on your dataset. Fine-tuning involves training the model on your task-specific dataset for a few additional epochs to adjust its weights to your specific data distribution and task requirements.

If you skip fine-tuning, the model will still run, but its classifier head holds randomly initialized weights, so its sentiment predictions will be close to chance. In this project the warning is expected: the training loop later in the notebook fine-tunes the model before it is used for prediction.

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


###Encoding data

In the code below, the project uses the previously initialized BERT tokenizer (tokenizer) to encode the text data (user reviews) for both the training and testing sets. Let's break down each line:

1. **train_encodings = tokenizer(train_texts.tolist(), truncation=True, padding=True, max_length=128, return_tensors='pt'):** Tokenizes and encodes the text data in the training set using the BERT tokenizer.
 * Parameters:
    1. train_texts.tolist(): Converts the training set of text data (user reviews) to a list.
    2. truncation=True: Truncates the sequences to a specified maximum length (max_length) if they exceed it.
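    3. padding=True: Pads shorter sequences to the length of the longest sequence in the call (after truncation, at most max_length). Padding every sequence to exactly max_length would require padding='max_length'.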
    4. max_length=128: Specifies the maximum length of the tokenized sequences.
    5. return_tensors='pt': Returns PyTorch tensors as the output.
  * Result: train_encodings is a dictionary containing the tokenized and encoded representations of the training set, suitable for input to a BERT model. It includes keys like 'input_ids', 'attention_mask', etc.

2. **test_encodings = tokenizer(test_texts.tolist(), truncation=True, padding=True, max_length=128, return_tensors='pt'):** Similar to the training set, tokenizes and encodes the text data in the testing set using the BERT tokenizer.
  * Parameters: Similar to the parameters used for the training set.
  * Result: test_encodings is a dictionary containing the tokenized and encoded representations of the testing set, with the same structure as train_encodings.

These lines of code prepare the text data in a format that can be fed into the BERT model. The encoded output includes input IDs (the vocabulary IDs of the WordPiece tokens, which may be whole words or sub-word pieces) and attention masks (1 for real tokens, 0 for padding). This preprocessed data is used as input during the training and evaluation of the sentiment analysis model.
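The interplay of truncation, padding, and the attention mask can be sketched without the tokenizer. The helper `pad_and_truncate` and the token IDs below are made up for illustration; the real tokenizer also keeps the closing [SEP] token when it truncates, which this plain slice does not.

```python
def pad_and_truncate(sequences, max_length, pad_id=0):
    """Truncate to max_length, then pad every sequence to the longest one left."""
    truncated = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in truncated)
    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = width - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up token ids standing in for two tokenized reviews.
batch = pad_and_truncate([[101, 7, 8, 9, 10, 102], [101, 5, 102]], max_length=4)
# batch["input_ids"]      -> [[101, 7, 8, 9], [101, 5, 102, 0]]
# batch["attention_mask"] -> [[1, 1, 1, 1], [1, 1, 1, 0]]
```

After this step every row has the same length, which is what allows the batch to be stored as a single rectangular tensor.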

In [None]:
train_encodings = tokenizer(train_texts.tolist(), truncation=True, padding=True, max_length=128, return_tensors='pt')
test_encodings = tokenizer(test_texts.tolist(), truncation=True, padding=True, max_length=128, return_tensors='pt')

###convert label
In the code snippet below, the project converts the labels of the training and testing sets to PyTorch tensors. Let's break down each line:

1. **train_labels = torch.tensor(train_labels):** Converts the training labels from a NumPy array to a PyTorch tensor. The original labels are the numeric values (0 and 1) that LabelEncoder produced from the categorical sentiment labels ('negative' and 'positive'). The conversion is necessary because PyTorch models work with tensors.

2. **test_labels = torch.tensor(test_labels):** Converts the testing labels to a PyTorch tensor in the same way, so that the labels use one consistent data type throughout training and evaluation.

In summary, these lines of code ensure that the labels for both the training and testing sets are represented as PyTorch tensors. This is important because PyTorch tensors are the preferred data type for labels when working with PyTorch models, and it allows seamless integration of the labels with the PyTorch training pipeline.

In [None]:
train_labels = torch.tensor(train_labels)
test_labels = torch.tensor(test_labels)

###Create TensorDataset

In the code snippet below, the project creates PyTorch TensorDataset objects for the training and testing sets. Let's break down each line:

1. **train_dataset = TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], train_labels):** Combines the tokenized and encoded training data (train_encodings) with the corresponding labels (train_labels) into a PyTorch TensorDataset. TensorDataset wraps tensors that share the same first dimension; indexing the dataset at position i returns the i-th row of each tensor, here the input IDs, attention mask, and label of one review.

2. **test_dataset = TensorDataset(test_encodings['input_ids'], test_encodings['attention_mask'], test_labels):** Performs the same operation for the testing set, combining test_encodings with test_labels.

The resulting TensorDataset objects (train_dataset and test_dataset) provide an organized way to access individual samples. A DataLoader can then group these samples into batches, each containing input IDs, attention masks, and labels, ready to be fed into a PyTorch model.
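Conceptually, such a dataset is little more than several equally long columns indexed together. The class below is a hypothetical pure-Python stand-in (lists instead of tensors, made-up values) meant only to show that indexing returns one aligned tuple per sample.

```python
class SimpleTensorDataset:
    """Bundle equally long columns; item i is a tuple with one entry per column."""

    def __init__(self, *columns):
        if any(len(col) != len(columns[0]) for col in columns):
            raise ValueError("all columns need the same number of rows")
        self.columns = columns

    def __len__(self):
        return len(self.columns[0])

    def __getitem__(self, index):
        return tuple(col[index] for col in self.columns)

# Made-up values standing in for input_ids, attention_mask and labels.
input_ids = [[101, 7, 102], [101, 9, 102]]
attention_mask = [[1, 1, 1], [1, 1, 1]]
labels = [1, 0]
dataset = SimpleTensorDataset(input_ids, attention_mask, labels)
sample = dataset[1]  # ([101, 9, 102], [1, 1, 1], 0)
```

Keeping the three columns in one object guarantees that a review can never be separated from its mask or its label.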

In [None]:
train_dataset = TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], train_labels)
test_dataset = TensorDataset(test_encodings['input_ids'], test_encodings['attention_mask'], test_labels)

###Setting up the DataLoaders

In the code snippet below, the project sets up PyTorch DataLoader objects for the training and testing datasets. Let's break down each line:

1. **train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True):** Creates a PyTorch DataLoader for the training dataset (train_dataset).
  * Parameters:
    1. train_dataset: The PyTorch TensorDataset containing the tokenized input IDs, attention masks, and labels for the training set.
    2. batch_size=16: Specifies the number of samples in each batch during training. In this case, each batch will contain 16 samples.
    3. shuffle=True: Shuffles the training data at the beginning of each epoch. Shuffling is beneficial during training to ensure that the model doesn't learn patterns specific to the order of the data.

2. **test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False):** Similar to the training set, creates a PyTorch DataLoader for the testing dataset (test_dataset).
  * Parameters:
    1. test_dataset: The PyTorch TensorDataset containing the tokenized input IDs, attention masks, and labels for the testing set.
    2. batch_size=16: Similar to the training set, each batch during testing will contain 16 samples.
    3. shuffle=False: No need to shuffle the testing data. Shuffling is typically done during training but not during testing or evaluation.

The DataLoader is a PyTorch utility that provides an iterable over the dataset in batches. It is used during the training loop to feed batches of data to the model. The specified batch size controls how many samples are processed together in each iteration, and shuffling helps prevent the model from memorizing the order of the data.

In summary, these lines of code set up PyTorch DataLoader objects for both the training and testing datasets, making it easier to iterate over batches of data during the training and evaluation phases of the machine learning model.
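A rough pure-Python sketch of what batching and shuffling amount to is shown below. The generator `iterate_batches` and the helper `num_batches` are hypothetical; the real DataLoader additionally stacks samples into tensors and can load data in parallel.

```python
import math
import random

def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of samples, optionally visiting them in shuffled order."""
    order = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]

def num_batches(n_samples, batch_size):
    """Batches per epoch when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)

# Toy dataset of (input, label) pairs.
toy = [(i, i % 2) for i in range(10)]
batches = list(iterate_batches(toy, batch_size=4))  # batch sizes 4, 4, 2
```

With 50,000 reviews and an 80/20 split, `num_batches(40000, 16)` gives 2500 training batches and `num_batches(10000, 16)` gives 625 test batches, which matches the progress bars shown in the training and evaluation output.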

In [None]:
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

###configuration parameters for train set

The code snippet below sets up the optimizer for training the model and specifies the number of epochs for the training process. Let's break down each line:

1. **optimizer = AdamW(model.parameters(), lr=5e-5):** Initializes the AdamW optimizer for updating the parameters of the model during training.
  * Parameters:
    1. model.parameters(): Specifies the parameters (weights and biases) of the model that the optimizer will update during training.
    2. lr=5e-5: Sets the learning rate for the optimizer. The learning rate controls the size of the step the optimizer takes during each parameter update.
  * Explanation: AdamW (Adam with weight decay) is a variant of the Adam optimizer that includes a weight decay term to help with regularization. It is commonly used for fine-tuning pre-trained models. The learning rate of 5e-5 is a commonly used value for fine-tuning BERT models.

2. **epochs = 3:** Specifies the number of training epochs, where one epoch corresponds to one complete pass through the entire training dataset. During each epoch, the model goes through all batches of the training data and updates its parameters using backpropagation and the optimizer. (The training loop in this notebook does not evaluate on a validation set between epochs.) Setting the number of epochs controls the duration and extent of the training process.

In summary, these lines of code configure the training parameters for the machine learning model. The optimizer (AdamW) is set up to update the model parameters based on the training data, and the number of epochs determines how many times the entire training dataset will be used to train the model. Adjusting the learning rate and the number of epochs can impact the training process and the final performance of the model.
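To show what a single AdamW update does, here is a sketch for one scalar parameter. The function `adamw_step` is hypothetical, and the default values for betas, eps, and weight_decay are assumptions taken from the defaults of torch.optim.AdamW; the optimizer class imported in the notebook may use different defaults.

```python
import math

def adamw_step(param, grad, state, lr=5e-5, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """Apply one AdamW update to a single float parameter and return it."""
    beta1, beta2 = betas
    state["t"] += 1
    # Decoupled weight decay: shrink the parameter directly.
    param -= lr * weight_decay * param
    # Running averages of the gradient and of its square.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0}
weight = adamw_step(1.0, grad=0.5, state=state, lr=0.1, weight_decay=0.0)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly lr (here from 1.0 to about 0.9) regardless of the gradient's magnitude. This is one reason small learning rates such as 5e-5 are used for fine-tuning.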

In [None]:
optimizer = AdamW(model.parameters(), lr=5e-5)
epochs = 3



###configuration model for train

The code snippet below determines the device on which the model will be trained and moves the model to that device. It also sets the model to training mode. Let's break down each line:

1. **device = torch.device('cuda' if torch.cuda.is_available() else 'cpu'):** Determines the device (GPU or CPU) available for training. If a GPU is available (torch.cuda.is_available() is True), the model will be trained on the GPU; otherwise, it will use the CPU. PyTorch allows you to utilize GPUs for faster training if they are available. The torch.device function is used to create a device object, and the model will be moved to this device in the next line.

2. **model.to(device):** Moves the entire model (including its parameters) to the specified device (GPU or CPU). This step is necessary to ensure that the model's computations are performed on the chosen device. PyTorch provides this flexibility to seamlessly move models between different devices.

3. **model.train():** Sets the model to training mode. This matters because certain layers, such as dropout, behave differently during training and evaluation: in training mode dropout randomly zeroes activations, while in evaluation mode it is disabled. model.train() does not itself compute gradients or update parameters; that happens in the training loop.

In summary, this code segment ensures that the model is moved to the available GPU (if present) or CPU for training and is set to training mode. This device configuration is crucial for efficient training, and setting the model to training mode ensures the correct behavior of layers during the training phase.
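The difference between training and evaluation mode can be illustrated with a small dropout sketch. The function below is a hypothetical pure-Python version of inverted dropout; its `training` flag plays the role that model.train() and model.eval() play for the dropout layers inside the model.

```python
import random

def dropout(values, p=0.1, training=True, seed=None):
    """Inverted dropout: active only while training."""
    if not training:
        return list(values)  # evaluation mode: pass inputs through unchanged
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - p)
    # Zero each element with probability p and rescale the survivors.
    return [0.0 if rng.random() < p else v * keep_scale for v in values]

activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, training=True, seed=0)
eval_out = dropout(activations, training=False)
```

In training mode some activations are dropped and the rest are scaled up so the expected value stays the same; in evaluation mode the input passes through untouched, which keeps predictions deterministic.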

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.train()

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

##Model training

The code snippet below is the training loop for the model. It iterates through the specified number of epochs and processes batches of data using the training DataLoader. Let's break down each part:

1. **Outer Loop (for epoch in range(epochs):):** Iterates over the specified number of epochs (epochs), representing the number of times the entire training dataset is processed. The training loop repeats the inner loop for each epoch, allowing the model to learn from the entire training dataset multiple times.

2. **Inner Loop (for batch in tqdm(train_loader, desc=f'Epoch {epoch + 1}/{epochs}'):):** Iterates over batches of data from the training DataLoader (train_loader). Each batch contains a set of input IDs, attention masks, and labels for the model to process. The tqdm function is used to create a progress bar for better visualization of training progress.
  1. Data Preparation (input_ids, attention_mask, labels = batch): Unpacks the elements from the batch. The batch is a tuple containing input IDs, attention masks, and labels. This line unpacks these elements into separate variables for easier handling.
  2. Move Data to Device (input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)): Moves the batch data to the specified device (GPU or CPU). Ensures that the data is on the same device as the model for computation.
  3. Optimizer Step (optimizer.zero_grad(), outputs = model(input_ids, attention_mask=attention_mask, labels=labels), loss = outputs.loss, loss.backward(), optimizer.step()): Performs a single optimization step on the model using the current batch. Explanation:
    * optimizer.zero_grad(): Clears the gradients of all model parameters before computing gradients for the current batch.
    * outputs = model(...): Feeds the input data through the model and obtains the model's predictions and associated information.
    * loss = outputs.loss: Extracts the loss value from the model's output.
    * loss.backward(): Computes the gradients of the loss with respect to the model parameters using backpropagation.
    * optimizer.step(): Updates the model parameters using the computed gradients and the optimizer's update rule.

This training loop is a fundamental part of the machine learning process, where the model learns to make better predictions by adjusting its parameters based on the training data. The outer loop controls the number of passes over the entire training dataset (epochs), and the inner loop processes the data in batches, updating the model parameters after each batch. The use of tqdm provides a progress bar for better monitoring during training.
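The same forward, loss, backward, update pattern can be seen in miniature on a one-feature logistic regression. This is a toy sketch with made-up data and hand-written gradients, not the BERT training code; in the notebook, PyTorch's autograd and the optimizer perform the gradient and update steps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=20, lr=0.5):
    """Fit p = sigmoid(w * x + b); return w, b and the mean loss per epoch."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):                      # outer loop: one pass per epoch
        total = 0.0
        for x, y in samples:                     # inner loop: one sample per step
            p = sigmoid(w * x + b)               # forward pass
            loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
            grad_w, grad_b = (p - y) * x, p - y  # gradients of the loss
            w, b = w - lr * grad_w, b - lr * grad_b  # parameter update
            total += loss
        history.append(total / len(samples))
    return w, b, history

toy = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, history = train(toy)
```

The mean loss stored in `history` falls from epoch to epoch, which is the behavior one would also want to monitor when fine-tuning the real model.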

In [None]:
for epoch in range(epochs):
    for batch in tqdm(train_loader, desc=f'Epoch {epoch + 1}/{epochs}'):
        input_ids, attention_mask, labels = batch
        input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()

Epoch 1/3: 100%|██████████| 2500/2500 [15:16<00:00,  2.73it/s]
Epoch 2/3: 100%|██████████| 2500/2500 [15:22<00:00,  2.71it/s]
Epoch 3/3: 100%|██████████| 2500/2500 [15:22<00:00,  2.71it/s]


##Model Evaluation

The code snippet below is part of the evaluation process for the model. It sets the model to evaluation mode, makes predictions on the test dataset using the provided DataLoader (test_loader), and collects the predicted labels. Let's break down each part:

1. **model.eval():** Sets the model to evaluation mode. Layers such as dropout behave differently in training and evaluation; in evaluation mode dropout is disabled, so predictions are deterministic.

2. **predictions = []:** Initializes an empty list to store the model's predictions. The model generates predictions for each batch of the test dataset, and these predictions are collected in this list.

3. **with torch.no_grad():** Disables gradient tracking inside the block. Evaluation involves no parameter updates, so skipping the gradient bookkeeping saves memory and time.

4. **Loop Over Test DataLoader (for batch in tqdm(test_loader, desc='Evaluating'):):** Iterates over batches of the test dataset. Similar to the training loop, this loop processes batches of test data using the test_loader.

5. **Data Preparation (input_ids, attention_mask, labels = batch):** Unpacks the batch, which holds input IDs, attention masks, and labels, into separate variables for easier handling.

6. **Move Data to Device (input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)):** Moves the batch data to the specified device (GPU or CPU), so that the data is on the same device as the model.

7. **Model Inference (outputs = model(input_ids, attention_mask=attention_mask), logits = outputs.logits):** Obtains model predictions on the test batch. The input data is fed through the model, and the raw logits (the scores before softmax is applied) are read from outputs.logits.

8. **Generate Predictions (predictions.extend(torch.argmax(logits, dim=1).cpu().numpy())):** Extends the predictions list with the predicted labels for the current batch. torch.argmax(logits, dim=1) returns, for each sample, the index of the class with the highest logit. .cpu().numpy() moves the result to the CPU and converts it to a NumPy array before it is added to the list.

In summary, this code sets up the model for evaluation, iterates over batches in the test dataset, makes predictions, and collects these predictions in a list (predictions). This list can then be used to assess the model's performance on the test set, compare predictions with true labels, and generate evaluation metrics.

In [None]:
model.eval()
predictions = []

with torch.no_grad():
    for batch in tqdm(test_loader, desc='Evaluating'):
        input_ids, attention_mask, labels = batch
        input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)

        outputs = model(input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        predictions.extend(torch.argmax(logits, dim=1).cpu().numpy())

Evaluating: 100%|██████████| 625/625 [01:20<00:00,  7.72it/s]
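
To make the argmax step more concrete, here is a minimal sketch of how a batch of logits turns into predicted classes and class probabilities. It is written in plain Python so it runs without PyTorch, and the logit values are made up for illustration. The helper names `softmax` and `argmax` are hypothetical; in the notebook, torch.softmax(logits, dim=1) and torch.argmax(logits, dim=1) play these roles.

```python
import math

def softmax(row):
    """Convert one row of raw logits into probabilities that sum to 1."""
    largest = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(row):
    """Return the index of the largest value in the row."""
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical logits for a batch of three reviews.
# Column 0 = negative, column 1 = positive.
logits = [
    [2.0, -1.0],
    [0.3, 1.2],
    [-0.8, 0.1],
]

predicted = [argmax(row) for row in logits]
probabilities = [softmax(row) for row in logits]

print(predicted)  # [0, 1, 1]
for probs in probabilities:
    print([round(p, 3) for p in probs])
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits gives the same class as taking the argmax of the probabilities. This is why the evaluation loop can skip softmax entirely.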


### Calculate accuracy and the classification report (precision, recall, F1-Score, support)

The code snippet below evaluates the performance of the model on the test set and prints the accuracy along with a detailed classification report. Let's break down each part:

1. **accuracy = accuracy_score(test_labels.numpy(), predictions):** Computes the accuracy of the model's predictions on the test set. The accuracy_score function from scikit-learn is used to compare the true labels (test_labels.numpy()) with the predicted labels (predictions) and calculate the accuracy. The accuracy is the ratio of correctly predicted samples to the total number of samples.

2. **print(f'Accuracy: {accuracy:.2f}'):** Prints the calculated accuracy. The accuracy is printed with two decimal places for better readability.

3. **print(classification_report(test_labels.numpy(), predictions)):** Prints a detailed classification report. The classification_report function from scikit-learn generates a text report containing precision, recall, F1-score, and support for each class (positive and negative). It provides a more detailed understanding of the model's performance beyond just accuracy.

In summary, these lines of code assess and report the model's performance on the test set. The accuracy gives a general measure of correct predictions, while the classification report provides precision, recall, and F1-score for each class. Comparing the per-class values shows whether the model favors one sentiment over the other.

In [None]:
accuracy = accuracy_score(test_labels.numpy(), predictions)
print(f'Accuracy: {accuracy:.2f}')
print(classification_report(test_labels.numpy(), predictions))

Accuracy: 0.89
              precision    recall  f1-score   support

           0       0.87      0.91      0.89      4961
           1       0.91      0.87      0.89      5039

    accuracy                           0.89     10000
   macro avg       0.89      0.89      0.89     10000
weighted avg       0.89      0.89      0.89     10000
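
The columns of the report can be reproduced by hand from counts of correct and incorrect predictions. The following is a minimal sketch in plain Python, assuming a tiny made-up set of ten labels rather than the real test set. The helper `class_metrics` is hypothetical and only mirrors the standard definitions of precision, recall, F1-score, and support.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall, F1 and support for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "support": tp + fn}

# Hypothetical labels: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positive = class_metrics(y_true, y_pred, label=1)
negative = class_metrics(y_true, y_pred, label=0)

print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.80
print(positive)
print(negative)
```

Support is the number of true samples of a class, which is why the two support values in the report above add up to the 10,000 test reviews.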



## Sample inputs for checking the model

In [24]:
from transformers import BertTokenizer, BertForSequenceClassification
import torch

#input_text = "This movie was amazing! The plot was captivating, and the acting was superb. I highly recommend it."

#input_text = "The movie was very stupid. It didn't help at all and had no content. The picture and sound quality was very low."

input_text = "The movie was very nice, but the ending was not good."

# Tokenize once so that input_ids and attention_mask come from the same
# encoding and share the same truncation
encoded = tokenizer(input_text, return_tensors='pt', truncation=True, padding=True)
input_ids = encoded['input_ids']
attention_mask = encoded['attention_mask']

# Ensure both input and model are on the same device
input_ids = input_ids.to(model.device)
attention_mask = attention_mask.to(model.device)

# Make the prediction
with torch.no_grad():
    logits = model(input_ids, attention_mask=attention_mask).logits

# Get the predicted class (0 for negative, 1 for positive);
# dim=1 selects the highest-scoring class for the single input row
predicted_class = torch.argmax(logits, dim=1).item()

# Map the predicted class back to the original sentiment labels
predicted_sentiment = label_encoder.inverse_transform([predicted_class])[0]

# Display the result
print(f"Predicted Sentiment: {predicted_sentiment}")

Predicted Sentiment: negative
