# Text Classification Task
In this task, you need to classify BBC News text into 5 classes: ['business' 'entertainment' 'politics' 'sport' 'tech']. A code skeleton is provided, and you have to write your code in the #TODO parts.

## Importing relevant libraries 
If any of the libraries listed below is not already installed, use "pip install #library_name" to install it.

In [1]:
!pip install torch==1.6.0



In [2]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,f1_score
from sklearn.feature_extraction.text import CountVectorizer
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

## Importing BBC News Dataset
The source data is a public dataset of BBC news articles:
D. Greene and P. Cunningham. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006.

http://mlg.ucd.ie/datasets/bbc.html

A cleaned-up version of the dataset is provided as CSV files with the assignment.

In [3]:
data_train = pd.read_csv("bbc-text_train.csv")
data_test= pd.read_csv("bbc-text_test.csv")

In [4]:
data_train.head()

Unnamed: 0,category,text
0,entertainment,farrell due to make us tv debut actor colin fa...
1,business,china continues rapid growth china s economy h...
2,business,ebbers aware of worldcom fraud former worldc...
3,entertainment,school tribute for tv host carson more than 1 ...
4,tech,broadband fuels online expression fast web acc...


In [5]:
data_train['category'].value_counts()

sport            413
business         409
politics         334
tech             319
entertainment    305
Name: category, dtype: int64

## Splitting training data into Train and validation set
Note: The validation set is a surrogate for the test set. While training the network, we evaluate the model on the validation set.

In [6]:
train_x_df,val_x_df,train_y_df,val_y_df = train_test_split(data_train['text'],data_train['category'],test_size=0.2,random_state=42)
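
The split above is a plain random split. Since the class counts are not perfectly balanced, one option (not what this notebook does) is to pass `stratify=` to `train_test_split` so that each class keeps roughly the same proportion in both parts. A minimal sketch on made-up data; `toy_texts` and `toy_cats` are hypothetical stand-ins for the real columns:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 50 documents in 3 unevenly sized classes
toy_texts = [f"doc {i}" for i in range(50)]
toy_cats = ["sport"] * 20 + ["tech"] * 20 + ["politics"] * 10

# stratify=toy_cats keeps the class proportions the same in both splits
tr_x, va_x, tr_y, va_y = train_test_split(
    toy_texts, toy_cats, test_size=0.2, random_state=42, stratify=toy_cats
)

print(len(tr_x), len(va_x))
print(Counter(va_y))
```

With 20% held out, the validation part here contains the three classes in the same 2:2:1 ratio as the full toy set.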

## Encoding prediction classes/labels into integers


In [7]:
le =LabelEncoder()
le.fit(train_y_df)
print(le.classes_)
train_y=le.transform(train_y_df)
val_y=le.transform(val_y_df)
test_y=le.transform(data_test['category'])

['business' 'entertainment' 'politics' 'sport' 'tech']
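
As a small illustration of how the encoding works, the sketch below fits a `LabelEncoder` on a handful of made-up labels (`toy_labels` is a hypothetical stand-in for `train_y_df`) and maps integers back to class names with `inverse_transform`, which is useful later for reading model predictions:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy labels standing in for the 'category' column
toy_labels = ["tech", "sport", "business", "sport", "politics", "entertainment"]

toy_le = LabelEncoder()
toy_le.fit(toy_labels)

# Classes are stored in sorted order, so the integer i refers to classes_[i]
print(toy_le.classes_)

# Encode class names to integers
encoded = toy_le.transform(["sport", "tech", "business"])
print(encoded)

# Decode integers (e.g. model predictions) back to class names
decoded = toy_le.inverse_transform(encoded)
print(list(decoded))
```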


## Converting news text into numerical vectors using CountVectorizer

In [10]:
vectorizer = CountVectorizer()
# Learn the vocabulary from the training text only
vectorizer.fit(train_x_df)
train_x=vectorizer.transform(train_x_df)
val_x=vectorizer.transform(val_x_df)
test_x=vectorizer.transform(data_test['text'])

In [11]:
train_x.toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 4, 0, ..., 0, 0, 0],
       [0, 1, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
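
To make the count matrix easier to read, here is a minimal sketch of what `CountVectorizer` produces on two made-up sentences (`toy_docs` is hypothetical). Each column corresponds to one vocabulary word, and each entry is how often that word occurs in the document. The number of columns is the vocabulary size, which is the input size the first layer of the network has to accept:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents
toy_docs = ["the match was a great match", "the market fell"]

toy_vectorizer = CountVectorizer()
toy_counts = toy_vectorizer.fit_transform(toy_docs)

# vocabulary_ maps each word to its column index; order the words by column
vocab = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
print(vocab)

# One row per document, one column per vocabulary word
print(toy_counts.toarray())

# Number of features = vocabulary size
print(toy_counts.shape[1])
```

Note that the default tokenizer ignores single-character tokens, so "a" does not appear in the vocabulary.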

In [13]:
class ClassificationNet(nn.Module):

    def __init__(self):
        super(ClassificationNet, self).__init__()
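        # Define the layers of the neural network.
        # The input size (24295) must equal the number of CountVectorizer
        # features, i.e. train_x.shape[1]; the output size (5) is the number
        # of classes.
        self.fc1 = nn.Linear(24295, 64)
        self.fc2 = nn.Linear(64, 5)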


    def forward(self, x):
        """The forward pass of the classifier
        
        Args:
            x (torch.Tensor): an input data tensor. 
                x.shape should be (data_points, num_features)
        Returns:
            torch.Tensor: raw class scores (logits) with shape (data_points, 5).
        """
        x = torch.sigmoid(self.fc1(x))
        x = self.fc2(x)
        
        return x


net =ClassificationNet()

#define learning rate
learning_rt=0.5

#Construct an optimizer object
optimizer = optim.SGD(net.parameters(), lr=learning_rt)

#Construct a loss/criterion object
criterion=nn.CrossEntropyLoss()

#define number of epochs/ number of training iteration
epochs=300

#converting train and validation set arrays to tensor
train_x_tensor=torch.tensor(train_x.toarray()).float()
train_y_tensor=torch.tensor(train_y)
val_x_tensor=torch.tensor(val_x.toarray()).float()
val_y_tensor=torch.tensor(val_y)


def evaluation_metrics(predict_y,ground_truth_y):
    '''
    Returns accuracy and f1 score metrics for evaluation
    '''
    accuracy=accuracy_score(ground_truth_y,predict_y)
    f1score=f1_score(ground_truth_y,predict_y,average='macro')
    
    return (accuracy,f1score)
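
The network's `forward` returns raw scores without a softmax. That works with `nn.CrossEntropyLoss` because this loss applies log-softmax internally before computing the negative log-likelihood. The sketch below checks this on random, made-up scores (`logits` and `targets` are hypothetical), and also shows that taking the argmax of the raw scores gives the same predicted class as taking it after softmax:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw scores for 4 samples and 5 classes
logits = torch.randn(4, 5)
# Integer class indices, as produced by LabelEncoder
targets = torch.tensor([0, 3, 1, 4])

# CrossEntropyLoss takes raw scores and integer targets directly
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent computation done in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss_ce, loss_manual))

# Softmax is monotonic, so the predicted class is unchanged by it
probs = F.softmax(logits, dim=1)
same_argmax = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))
print(same_argmax)
```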

## Training Loop

In [14]:
for i in range(epochs):
    # the training routine is these 5 steps:
    
    # step 1. zero the gradients
    optimizer.zero_grad()
    
    # step 2. compute the output
    output = net(train_x_tensor)
    
    # step 3. compute the loss
    loss = criterion(output, train_y_tensor)
    
    # step 4. use loss to produce gradients
    loss.backward()
    
    # step 5. use optimizer to take gradient step
    optimizer.step() 
    
    with torch.no_grad():
        # validation set evaluation:
        
        # compute the output
        output_val=net(val_x_tensor)
        
        # compute the loss
        loss_val = criterion(output_val, val_y_tensor)
        
        # compute the prediction
        predict_y = output_val.argmax(dim=1)
        
        # Use the "evaluation_metrics" function to find accuracy and f1 score
        accuracy,f1score=evaluation_metrics(predict_y,val_y_tensor)
        
        print('Epoch %d/%d - Loss_train: %.3f   loss_val: %.3f   accuracy_val: %.3f f1score_val: %.3f   '% \
              (i + 1, epochs,loss.item(),loss_val.item(),accuracy,f1score))

Epoch 1/300 - Loss_train: 1.628   loss_val: 1.781   accuracy_val: 0.163 f1score_val: 0.056   
Epoch 2/300 - Loss_train: 1.748   loss_val: 1.750   accuracy_val: 0.312 f1score_val: 0.179   
Epoch 3/300 - Loss_train: 1.789   loss_val: 1.563   accuracy_val: 0.194 f1score_val: 0.077   
Epoch 4/300 - Loss_train: 1.556   loss_val: 1.500   accuracy_val: 0.596 f1score_val: 0.505   
Epoch 5/300 - Loss_train: 1.491   loss_val: 1.448   accuracy_val: 0.520 f1score_val: 0.387   
Epoch 6/300 - Loss_train: 1.435   loss_val: 1.398   accuracy_val: 0.525 f1score_val: 0.404   
Epoch 7/300 - Loss_train: 1.383   loss_val: 1.372   accuracy_val: 0.458 f1score_val: 0.359   
Epoch 8/300 - Loss_train: 1.352   loss_val: 1.571   accuracy_val: 0.233 f1score_val: 0.076   
Epoch 9/300 - Loss_train: 1.562   loss_val: 1.810   accuracy_val: 0.230 f1score_val: 0.075   
Epoch 10/300 - Loss_train: 1.784   loss_val: 1.533   accuracy_val: 0.295 f1score_val: 0.225   
Epoch 11/300 - Loss_train: 1.521   loss_val: 1.439   ...

Epoch 88/300 - Loss_train: 0.235   loss_val: 0.351   accuracy_val: 0.868 f1score_val: 0.861   
Epoch 89/300 - Loss_train: 0.263   loss_val: 0.411   accuracy_val: 0.846 f1score_val: 0.843   
Epoch 90/300 - Loss_train: 0.298   loss_val: 0.499   accuracy_val: 0.809 f1score_val: 0.794   
Epoch 91/300 - Loss_train: 0.392   loss_val: 0.567   accuracy_val: 0.787 f1score_val: 0.783   
Epoch 92/300 - Loss_train: 0.441   loss_val: 0.551   accuracy_val: 0.795 f1score_val: 0.767   
Epoch 93/300 - Loss_train: 0.464   loss_val: 0.469   accuracy_val: 0.851 f1score_val: 0.831   
Epoch 94/300 - Loss_train: 0.395   loss_val: 0.345   accuracy_val: 0.888 f1score_val: 0.884   
Epoch 95/300 - Loss_train: 0.253   loss_val: 0.331   accuracy_val: 0.893 f1score_val: 0.885   
Epoch 96/300 - Loss_train: 0.260   loss_val: 0.408   accuracy_val: 0.854 f1score_val: 0.849   
Epoch 97/300 - Loss_train: 0.284   loss_val: 0.421   accuracy_val: 0.862 f1score_val: 0.848   
Epoch 98/300 - Loss_train: 0.367   loss_val: 0.391   ...

Epoch 177/300 - Loss_train: 0.033   loss_val: 0.131   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 178/300 - Loss_train: 0.032   loss_val: 0.131   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 179/300 - Loss_train: 0.032   loss_val: 0.130   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 180/300 - Loss_train: 0.031   loss_val: 0.130   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 181/300 - Loss_train: 0.031   loss_val: 0.129   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 182/300 - Loss_train: 0.031   loss_val: 0.129   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 183/300 - Loss_train: 0.030   loss_val: 0.129   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 184/300 - Loss_train: 0.030   loss_val: 0.128   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 185/300 - Loss_train: 0.030   loss_val: 0.128   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 186/300 - Loss_train: 0.029   loss_val: 0.128   accuracy_val: 0.955 f1score_val: 0.953   
Epoch 187/300 - Loss_train: 0.029   ...

Epoch 267/300 - Loss_train: 0.014   loss_val: 0.114   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 268/300 - Loss_train: 0.014   loss_val: 0.114   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 269/300 - Loss_train: 0.014   loss_val: 0.114   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 270/300 - Loss_train: 0.014   loss_val: 0.113   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 271/300 - Loss_train: 0.014   loss_val: 0.113   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 272/300 - Loss_train: 0.014   loss_val: 0.113   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 273/300 - Loss_train: 0.014   loss_val: 0.113   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 274/300 - Loss_train: 0.014   loss_val: 0.113   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 275/300 - Loss_train: 0.014   loss_val: 0.113   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 276/300 - Loss_train: 0.014   loss_val: 0.113   accuracy_val: 0.958 f1score_val: 0.956   
Epoch 277/300 - Loss_train: 0.014   ...
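
The log shows the validation loss moving up and down during training (for example between epochs 88 and 98) before it settles. One common way to guard against ending on a bad epoch is to keep a copy of the weights from the epoch with the lowest validation loss and restore them afterwards. The sketch below illustrates this on synthetic data. It is not part of the assignment code, and `toy_net`, `toy_opt`, `toy_criterion` and the random tensors are all hypothetical stand-ins:

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical synthetic data: 200 samples, 20 features, 5 classes
x = torch.randn(200, 20)
y = torch.randint(0, 5, (200,))
x_tr, y_tr, x_va, y_va = x[:160], y[:160], x[160:], y[160:]

# Small stand-in network with the same structure (linear, sigmoid, linear)
toy_net = nn.Sequential(nn.Linear(20, 16), nn.Sigmoid(), nn.Linear(16, 5))
toy_opt = optim.SGD(toy_net.parameters(), lr=0.5)
toy_criterion = nn.CrossEntropyLoss()

best_val_loss = float("inf")
best_epoch = -1
best_state = None
history = []

for epoch in range(50):
    # Same 5 training steps as in the loop above (full-batch update)
    toy_opt.zero_grad()
    loss = toy_criterion(toy_net(x_tr), y_tr)
    loss.backward()
    toy_opt.step()

    # Validation loss after this update
    with torch.no_grad():
        val_loss = toy_criterion(toy_net(x_va), y_va).item()
    history.append(val_loss)

    # Keep a deep copy of the weights whenever validation loss improves
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_epoch = epoch
        best_state = copy.deepcopy(toy_net.state_dict())

# Restore the best weights before evaluating on the test set
toy_net.load_state_dict(best_state)
with torch.no_grad():
    restored_val_loss = toy_criterion(toy_net(x_va), y_va).item()

print(best_epoch, best_val_loss, restored_val_loss)
```

The deep copy matters: `state_dict()` returns references to the live parameters, so without it the saved weights would keep changing as training continues.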

## Test set Prediction and Evaluation

In [15]:
test_x_tensor=torch.tensor(test_x.toarray()).float()
test_y_tensor=torch.tensor(test_y)

with torch.no_grad():
    # Test set evaluation:
    
    # compute the output
    output_test=net(test_x_tensor)
    
    # compute the prediction
    predict_test_y = output_test.argmax(dim=1)
    
    # Use the "evaluation_metrics" function to find accuracy and f1 score
    accuracy,f1score=evaluation_metrics(predict_test_y,test_y_tensor)
    print('Accuracy_test: %.3f f1score_test: %.3f   '% (accuracy,f1score))

Accuracy_test: 0.951 f1score_test: 0.951   
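
A single accuracy or macro F1 number hides which classes get confused with each other. As a possible follow-up, the sketch below computes a confusion matrix and per-class F1 scores with scikit-learn. The labels in `toy_true` and `toy_pred` are made up for illustration and are not the notebook's actual predictions:

```python
from sklearn.metrics import confusion_matrix, f1_score

class_names = ['business', 'entertainment', 'politics', 'sport', 'tech']

# Hypothetical ground truth and predictions (integer-encoded classes)
toy_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
toy_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 0]

labels = list(range(len(class_names)))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(toy_true, toy_pred, labels=labels)
print(cm)

# average=None returns one F1 score per class
per_class_f1 = f1_score(toy_true, toy_pred, average=None, labels=labels)
for name, score in zip(class_names, per_class_f1):
    print(f"{name:15s} f1={score:.3f}")

# Macro F1 (used in evaluation_metrics) is the unweighted mean of these
macro_f1 = f1_score(toy_true, toy_pred, average='macro')
print(f"macro f1={macro_f1:.3f}")
```

With the real data, `toy_true` and `toy_pred` would be replaced by the test labels and the model's predicted class indices.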
