# Natural Language Processing Analysis & Binary Classification Using PyTorch

This notebook introduces how to document an NLP model using the ValidMind Developer Framework. The use case is sentiment analysis of COVID-19 tweets, classified as "positive" or "negative"; the model is a binary text classifier built with the PyTorch library.

We will train a sample model and demonstrate the following documentation functionalities:

- Initializing the ValidMind Developer Framework
- Using a sample dataset provided by the library to train a simple NLP classification model with PyTorch
- Running various tests to quickly generate documentation about the data and model

## Before you begin

::: {.callout-tip}
### New to ValidMind? 
For access to all features available in this notebook, create a free ValidMind account. 

Signing up is FREE — [**Sign up now!**](https://app.prod.validmind.ai)
:::

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).

## Install the client library

In [None]:
%pip install -q validmind

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **`NLP-based Text Classification`**, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:

In [None]:
# Replace with your code snippet
import validmind as vm

vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="...",
    api_secret="...",
    project="..."
)

## 1. Exploratory Data Analysis of Covid tweets data
This section focuses on in-depth analysis and preprocessing of the text data (tweets). We introduce the manually tagged COVID-19 tweets, whose labels range from Extremely Negative to Extremely Positive across five distinct classes. In this Exploratory Data Analysis (EDA), these five classes will be simplified to two classes: Positive and Negative.



### Load library

In [None]:
%set_env PYTORCH_MPS_HIGH_WATERMARK_RATIO 0.8

import pandas as pd
import numpy as np
import os
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import torch
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# Force CPU for this demo (overrides the device detected above)
device = "cpu"

train_model = False

###  Load covid-19 tweets data

In [None]:
from validmind.datasets.nlp import twitter_covid_19 as demo_data
df = demo_data.load_data()
df.head(10)

### Run text data quality test plan
In this section we use the ValidMind Developer Framework to run various data quality checks on the dataset, and send the results to the model document on the ValidMind Platform UI.

In [None]:
vm_ds = vm.init_dataset(dataset=df, type="generic",
                        text_column='OriginalTweet', target_column="Sentiment")

In [None]:
config = {
    "class_imbalance": {"min_percent_threshold": 3}
}
text_data_test_suite = vm.run_test_suite("text_data_quality",
                                         dataset=vm_ds,
                                         config=config)

## 2. Preprocess data

### Handle class bias 
One way to handle class bias is to merge a class with a related class. 
Here, we copy the text and class labels into separate columns so that the original values remain available for comparison.

In [None]:
print("Original Classes:", df.Sentiment.unique())

df['text'] = df.OriginalTweet
df["text"] = df["text"].astype(str)


def classes_def(x):
    if x == "Extremely Positive":
        return "positive"
    elif x == "Extremely Negative":
        return "negative"
    elif x == "Negative":
        return "negative"
    elif x == "Positive":
        return "positive"
    else:
        return "neutral"


df['sentiment'] = df['Sentiment'].apply(lambda x: classes_def(x))
target = df['sentiment']

print(df.sentiment.value_counts(normalize=True))
print("Modified Classes:", df.sentiment.unique())

### Remove neutral class

In [None]:
df = df[df["sentiment"] != "neutral"]
print(df.sentiment.unique())
print(df.sentiment.value_counts(normalize=True))
print(df.shape)

In [None]:
df

### Remove urls and html links

In [None]:
# Remove Urls and HTML links
import re


def remove_urls(text):
    url_remove = re.compile(r'https?://\S+|www\.\S+')
    return url_remove.sub(r'', text)


df['text'] = df['text'].apply(lambda x: remove_urls(x))


def remove_html(text):
    html = re.compile(r'<.*?>')
    return html.sub(r'', text)


df['text'] = df['text'].apply(lambda x: remove_html(x))

### Convert text to lower case 


In [None]:
# Lower casing
def lower(text):
    low_text = text.lower()
    return low_text


df['text'] = df['text'].apply(lambda x: lower(x))

### Remove numbers 

In [None]:
# Number removal
def remove_num(text):
    remove = re.sub(r'\d+', '', text)
    return remove


df['text'] = df['text'].apply(lambda x: remove_num(x))

### Remove stopwords 

In [None]:
# Remove stopwords
import nltk
from nltk.corpus import stopwords

# Download the stopword list if it is not already available locally
nltk.download('stopwords', quiet=True)
STOPWORDS = set(stopwords.words('english'))


def remove_stopwords(text):
    """custom function to remove the stopwords"""
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])


df['text'] = df['text'].apply(lambda x: remove_stopwords(x))

### Remove Punctuations 

In [None]:
# Remove Punctuations

def punct_remove(text):
    punct = re.sub(r"[^\w\s\d]", "", text)
    return punct


df['text'] = df['text'].apply(lambda x: punct_remove(x))

### Remove mentions 

In [None]:
# Remove mentions
def remove_mention(x):
    text = re.sub(r'@\w+', '', x)
    return text


df['text'] = df['text'].apply(lambda x: remove_mention(x))

### Remove hashtags 

In [None]:
# Remove hashtags

def remove_hash(x):
    text = re.sub(r'#\w+', '', x)
    return text


df['text'] = df['text'].apply(lambda x: remove_hash(x))

### Remove extra white space left while removing stuff

In [None]:
# Remove extra white space left while removing stuff
def remove_space(text):
    space_remove = re.sub(r"\s+", " ", text).strip()
    return space_remove


df['text'] = df['text'].apply(lambda x: remove_space(x))

In [None]:
df

### Run text data quality tests again
Here, we check data quality again by rerunning the data quality tests to verify that the preprocessing worked and that the tests pass according to our requirements.

In [None]:
vm_ds = vm.init_dataset(dataset=df, type="generic",
                        text_column='text', target_column="sentiment")

config = {
    "class_imbalance": {"min_percent_threshold": 3}
}
text_data_test_suite = vm.run_test_suite("text_data_quality",
                                         dataset=vm_ds,
                                         config=config)

## 3. Feature Engineering 

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our tweets into integers so they can be passed into the network.

Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
Also, convert the tweets to integers and store the tweets in a new list called `tweets_ints`. 
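
To make the mapping concrete, here is a minimal sketch on two made-up tweets. The `toy_*` names are hypothetical and exist only for this illustration; the cells that follow apply the same steps to `df.text`.

```python
from collections import Counter

# Two made-up tweets (assumption: already cleaned and lower-cased)
toy_tweets = ["stay home stay safe", "wash hands stay safe"]

# Count every word across all tweets
toy_words = " ".join(toy_tweets).split()
toy_counts = Counter(toy_words)

# Most frequent words get the smallest integers; numbering starts at 1
# so that 0 stays free for padding
toy_vocab = sorted(toy_counts, key=toy_counts.get, reverse=True)
toy_vocab_to_int = {word: i for i, word in enumerate(toy_vocab, 1)}

# Replace each word with its integer
toy_ints = [[toy_vocab_to_int[w] for w in t.split()] for t in toy_tweets]
print(toy_vocab_to_int)
print(toy_ints)
```

Because "stay" is the most frequent word, it maps to 1, and no word maps to 0.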

#### Text to words

In [None]:
all_text = ' '.join(df.text)
# create a list of words
words = all_text.split()

#### Build dictionary and map words to integers

In [None]:
# feel free to use this import
from collections import Counter

# Build a dictionary that maps words to integers
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}
vocab[1:10]

#### Tokenize each tweet 

In [None]:
# use the dict to tokenize each tweet in df.text
# store the tokenized tweets in tweets_ints
tweets_ints = []
for tweet in df.text:
    tweets_ints.append([vocab_to_int[word] for word in tweet.split()])

In [None]:
# stats about vocabulary
print('Unique words: ', len((vocab_to_int)))  # should ~ 74000+
print()

# print tokens in first tweet
print('Tokenized tweet: \n', tweets_ints[:1])
print(len(tweets_ints))

### Encoding the labels

Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to 0 and 1.

Convert labels from `positive` and `negative` to 1 and 0, respectively, and place those in a new list, `encoded_labels`.

In [None]:
# 1=positive, 0=negative label conversion
import numpy as np

labels_split = df.sentiment.values
encoded_labels = np.array([1 if label == 'positive' else 0 for label in labels_split])
print(len(encoded_labels))

#### Padding sequences

To deal with both short and very long tweets, we'll pad or truncate all our tweets to a specific length. For tweets shorter than `seq_length`, we'll left-pad with 0s. For tweets longer than `seq_length`, we can truncate them to the first `seq_length` words. This notebook uses a `seq_length` of 100.

Define a function that returns an array `features` that contains the padded data, of a standard size, that we'll pass to the network. 
* The data should come from `tweets_ints`, since we want to feed integers to the network. 
* Each row should be `seq_length` elements long. 
* For tweets longer than `seq_length`, use only the first `seq_length` words as the feature vector.

**Your final `features` array should be a 2D array, with as many rows as there are tweets, and as many columns as the specified `seq_length`.**

This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.
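
As a small illustration of the padding rule, the sketch below pads or truncates a single sequence. The helper name `pad_left` is hypothetical and is not part of the notebook; it only shows the behavior expected for one row.

```python
import numpy as np


def pad_left(seq, seq_length):
    """Left-pad seq with zeros, or keep only its first seq_length tokens."""
    row = np.zeros(seq_length, dtype=int)
    trimmed = seq[:seq_length]
    if trimmed:  # an empty sequence stays all zeros
        row[-len(trimmed):] = trimmed
    return row


print(pad_left([5, 8], 4))              # short sequence: zeros on the left
print(pad_left([1, 2, 3, 4, 5, 6], 4))  # long sequence: first 4 tokens kept
print(pad_left([], 4))                  # empty sequence: all zeros
```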

In [None]:
def pad_features(tweets_ints, seq_length):
    ''' Return features of tweets_ints, where each tweet is left-padded with 0's
        or truncated to the input seq_length.
    '''
    # getting the correct rows x cols shape
    features = np.zeros((len(tweets_ints), seq_length), dtype=int)

    # copy each tweet into the right-hand side of its row
    for i, row in enumerate(tweets_ints):
        if len(row) == 0:
            continue  # tweet is empty after preprocessing; leave the row as zeros
        features[i, -len(row):] = np.array(row)[:seq_length]

    return features

In [None]:
# Test your implementation!

seq_length = 100

features = pad_features(tweets_ints, seq_length=seq_length)

## test statements - do not change - ##
assert len(features) == len(
    tweets_ints), "Your features should have as many rows as tweets."
assert len(
    features[0]) == seq_length, "Each feature row should contain seq_length values."

# print the last 25 columns of the first 10 rows
print(features[:10, -25:])
# drop the last 23 rows from both arrays so features and labels stay aligned
features = features[0:len(features) - 23]
encoded_labels = encoded_labels[0:len(encoded_labels) - 23]
print(len(features), len(encoded_labels))

## 4. Modeling 

### Training, validation, test

With our data in nice shape, we'll split it into training, validation, and test sets.

Create the training, validation, and test sets. 
* You'll need to create sets for the features and the labels, `train_x` and `train_y`, for example. 
* Define a split fraction, `split_frac` as the fraction of data to **keep** in the training set. Usually this is set to 0.8 or 0.9. 
* Whatever data is left will be split in half to create the validation and *testing* data.
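
The steps above can be sketched on toy arrays as follows. This is only an illustration of a `split_frac`-driven split: the `toy_*` names are hypothetical, and the notebook cell below uses a hard-coded split index instead.

```python
import numpy as np

# Toy data: 10 samples with 2 features each, and one label per sample
toy_features = np.arange(20).reshape(10, 2)
toy_labels = np.arange(10)

split_frac = 0.8
split_idx = int(len(toy_features) * split_frac)

# Keep the first split_frac of the rows for training
toy_train_x, toy_rest_x = toy_features[:split_idx], toy_features[split_idx:]
toy_train_y, toy_rest_y = toy_labels[:split_idx], toy_labels[split_idx:]

# Split the remainder in half: validation first, then test
half = len(toy_rest_x) // 2
toy_val_x, toy_test_x = toy_rest_x[:half], toy_rest_x[half:]
toy_val_y, toy_test_y = toy_rest_y[:half], toy_rest_y[half:]

print(toy_train_x.shape, toy_val_x.shape, toy_test_x.shape)
```

Slicing in order like this assumes the rows are not sorted by label; if they were, shuffling before the split would be needed.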

In [None]:
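split_frac = 0.8  # target training fraction (not used below; the split index is hard-coded)

# split data into training, validation, and test data (features and labels, x and y)
split_idx = 25000  # the first 25,000 rows form the training set
train_x, remaining_x = features[:split_idx], features[split_idx:]
train_y, remaining_y = encoded_labels[:split_idx], encoded_labels[split_idx:]

# about 53% of the remaining rows go to validation, the rest to test
test_idx = int(len(remaining_x) * 0.53449)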
val_x, test_x = remaining_x[:test_idx], remaining_x[test_idx:]
val_y, test_y = remaining_y[:test_idx], remaining_y[test_idx:]

# print out the shapes of your resultant feature data
print("\t\t\tFeatures Shapes:")
print("Train set: \t\t{}".format(train_x.shape),
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))


### Dataloaders and batching

After creating training, test, and validation data, we can create DataLoaders for this data by following two steps:
1. Create a known format for accessing our data, using [TensorDataset](https://pytorch.org/docs/stable/data.html#) which takes in an input set of data and a target set of data with the same first dimension, and creates a dataset.
2. Create DataLoaders and batch our training, validation, and test Tensor datasets.

```
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
train_loader = DataLoader(train_data, batch_size=batch_size)
```

This is an alternative to creating a generator function for batching our data into full batches.

In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(train_x).to(
    device), torch.from_numpy(train_y).to(device))
valid_data = TensorDataset(torch.from_numpy(val_x).to(device),
                           torch.from_numpy(val_y).to(device))
test_data = TensorDataset(torch.from_numpy(test_x).to(device),
                          torch.from_numpy(test_y).to(device))

# dataloaders
batch_size = 50

# make sure to SHUFFLE your data
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=True, batch_size=batch_size)

In [None]:
# obtain one batch of training data
dataiter = iter(train_loader)
sample_x, sample_y = next(dataiter)

print('Sample input size: ', sample_x.size())  # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size())  # batch_size
print('Sample label: \n', sample_y)

### Sentiment network with PyTorch

Below is where we define the network.

### Network architecture

The architecture for this network is shown below.

```mermaid
graph LR
    A["Input (Word Tokens)"] --> B["Embedding Layer"] --> C["LSTM Layer"] --> D["Fully-Connected Layer"] --> E["Sigmoid Activation"] --> F["Output (Last Sigmoid)"]
```

First, we pass the words to an embedding layer. We need an embedding layer because we have tens of thousands of words, so we need a more efficient representation for our input data than one-hot encoded vectors. You could train an embedding with the Skip-gram Word2Vec model and use those embeddings as input here. However, it's good enough to just have an embedding layer and let the network learn its own embedding table. 

After the input words pass through the embedding layer, the resulting embeddings are passed to LSTM cells. The LSTM cells add *recurrent* connections to the network and let us include information about the *sequence* of words in the COVID-19 Twitter data. 

Finally, the LSTM outputs go to a sigmoid output layer. We use a sigmoid function because positive and negative are encoded as 1 and 0, respectively, and a sigmoid outputs predicted sentiment values between 0 and 1. 

We don't care about the sigmoid outputs except for the **very last one**; we can ignore the rest. We'll calculate the loss by comparing the output at the last time step and the training label (pos or neg).


The layers are as follows:
1. An [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) that converts our word tokens (integers) into embeddings of a specific size.
2. An [LSTM layer](https://pytorch.org/docs/stable/nn.html#lstm) defined by a hidden state size and a number of layers
3. A fully-connected output layer that maps the LSTM layer outputs to a desired output_size
4. A sigmoid activation layer which turns all outputs into a value 0-1; return **only the last sigmoid output** as the output of this network.
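
To illustrate what "only the last sigmoid output" means in terms of tensor shapes, here is a minimal sketch. The values are made up and stand in for per-time-step sigmoid outputs; only the reshape and indexing pattern matters.

```python
import torch

batch_size, seq_len = 2, 3

# Made-up sigmoid outputs, one per time step, flattened to (batch_size * seq_len, 1)
flat_outputs = torch.tensor([[0.1], [0.2], [0.9],
                             [0.4], [0.5], [0.3]])

# Reshape so that each row holds the outputs of one sequence
per_step = flat_outputs.view(batch_size, -1)   # shape: (2, 3)

# Keep only the output at the last time step of each sequence
last_step = per_step[:, -1]                    # shape: (2,)
print(per_step.shape, last_step)
```

The first sequence contributes its third value and the second sequence its sixth, so each sample ends up with exactly one prediction.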

### The embedding layer

We need to add an [embedding layer](https://pytorch.org/docs/stable/nn.html#embedding) because there are 53000+ words in our vocabulary. It is massively inefficient to one-hot encode that many classes. So, instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using Word2Vec, then load it here. But, it's fine to just make a new layer, using it for only dimensionality reduction, and let the network learn the weights.
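
The lookup-table idea can be checked with a small sketch: indexing an `nn.Embedding` gives the same result as multiplying one-hot vectors by the embedding weight matrix, without ever building the one-hot vectors. The sizes here are toy values chosen for illustration, not the notebook's hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

toy_vocab_size, toy_embedding_dim = 10, 4
embedding = nn.Embedding(toy_vocab_size, toy_embedding_dim)

# One sequence of three token ids (0 is the padding id)
tokens = torch.tensor([[0, 3, 7]])

# Direct lookup: each id selects one row of the weight matrix
looked_up = embedding(tokens)                                   # shape: (1, 3, 4)

# Equivalent but wasteful: one-hot encode, then matrix-multiply
one_hot = F.one_hot(tokens, num_classes=toy_vocab_size).float() # shape: (1, 3, 10)
via_matmul = one_hot @ embedding.weight                         # shape: (1, 3, 4)

print(torch.allclose(looked_up, via_matmul))
```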


### The LSTM layer(s)

We'll create an [LSTM](https://pytorch.org/docs/stable/nn.html#lstm) to use in our recurrent network, which takes in an input_size, a hidden_dim, a number of layers, a dropout probability (for dropout between multiple layers), and a batch_first parameter.
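
As a quick sketch of how these parameters affect tensor shapes, the example below builds a small `nn.LSTM` and runs a random batch through it. All sizes are toy values chosen for illustration only.

```python
import torch
import torch.nn as nn

toy_input_size, toy_hidden_dim, toy_layers = 4, 8, 2
lstm = nn.LSTM(toy_input_size, toy_hidden_dim, toy_layers,
               dropout=0.5, batch_first=True)

batch, seq_len = 3, 5
x = torch.randn(batch, seq_len, toy_input_size)   # batch_first: (batch, seq, features)

# Hidden and cell states are (n_layers, batch, hidden_dim), even with batch_first=True
h0 = torch.zeros(toy_layers, batch, toy_hidden_dim)
c0 = torch.zeros(toy_layers, batch, toy_hidden_dim)

out, (h, c) = lstm(x, (h0, c0))
print(out.shape)  # one hidden vector per time step
print(h.shape)    # final hidden state of each layer
```

Note that `batch_first=True` changes the layout of the input and output tensors only; the hidden and cell states keep the layer dimension first.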

Most of the time, your network will perform better with more layers, typically 2 to 3. Adding more layers allows the network to learn more complex relationships. 

The `SentimentRNN` model class below implements the `__init__`, `forward`, and `init_hidden` functions.

Note: `init_hidden` initializes the hidden and cell states of the LSTM layer to all zeros and moves those states to the GPU, if available.

In [None]:
if (lower(device) == "gpu"):
    print('Training on GPU.')
elif (lower(device) == "mps"):
    print('Training on mps.')
else:
    print('No GPU available, training on CPU.')

In [None]:
import torch.nn as nn


class SentimentRNN(nn.Module):
    """
    The RNN model that will be used to perform Sentiment analysis.
    """

    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.
        """
        super(SentimentRNN, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim

        # embedding and LSTM layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=drop_prob, batch_first=True)

        # dropout layer
        self.dropout = nn.Dropout(0.5)

        # linear and sigmoid layer
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()

    def forward(self, x, hidden):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        batch_size = x.size(0)

        # embeddings and lstm_out
        embeds = self.embedding(x)
        lstm_out, hidden = self.lstm(embeds, hidden)

        # stack up lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

        # dropout and fully connected layer
        out = self.dropout(lstm_out)
        out = self.fc(out)

        # sigmoid function
        sig_out = self.sig(out)

        # reshape to be batch_size first
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1]  # get last batch of labels

        # return last sigmoid output and hidden state
        return sig_out, hidden

    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data

        if (lower(device) == "gpu"):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        elif (lower(device) == "mps"):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device))
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())

        return hidden

    def predict(self, x_data):

        test_loader = DataLoader(x_data, shuffle=False, batch_size=batch_size)

        # init hidden state
        h = self.init_hidden(batch_size)

        self.eval()
        predictions = torch.empty((0), dtype=torch.float32)

        # iterate over test data
        for inputs in test_loader:

            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.data for each in h])

            if (lower(device) == "gpu"):
                inputs = inputs.cuda()
            if (lower(device) == "mps"):
                inputs = inputs.to(device)
            # get predicted outputs
            output, h = self(inputs, h)

            # convert output probabilities to predicted class (0 or 1)
            pred = torch.round(output.squeeze())  # rounds to the nearest integer

            # compare predictions to true label
            # correct_tensor = pred.eq(labels.float().view_as(pred))
            if (lower(device) == "mps"):
                pred = pred.cpu()
            elif lower(device) == "gpu":
                pred = pred
            else:
                pred = pred.cpu()
            predictions = torch.cat((predictions, pred), 0)

        return predictions.detach().numpy()

#### Weights and hyperparameters

In [None]:
# Instantiate the model w/ hyperparams
vocab_size = len(vocab_to_int) + 1  # +1 for zero padding + our word tokens
output_size = 1
embedding_dim = 400
hidden_dim = 256
n_layers = 4

net = SentimentRNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)
print(net)

#### Loss and optimization functions

In [None]:
# loss and optimization functions
lr = 0.001

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

In [None]:
# Check if the model exists before training it
if os.path.exists('./model/model.pt'):
    net = torch.load(os.path.join('./model/', 'model.pt'))
    net = net.to(device)
    net.eval()
else:
    print("Training the model will take some time")
    # training params

    epochs = 4  # 3-4 is approx where I noticed the validation loss stop decreasing

    counter = 0
    print_every = 100
    clip = 5  # gradient clipping

    # move model to GPU, if available
    if (lower(device) == "gpu"):
        net.cuda()
    if (lower(device) == "mps"):
        net = net.to(device)

    net.train()
    # train for some number of epochs
    for e in range(epochs):
        # initialize hidden state
        h = net.init_hidden(batch_size)

        # batch loop
        for inputs, labels in train_loader:
            counter += 1

            if (lower(device) == "gpu"):
                inputs, labels = inputs.cuda(), labels.cuda()
            if (lower(device) == "mps"):
                inputs, labels = inputs.to(device), labels.to(device)

            # Creating new variables for the hidden state, otherwise
            # we'd backprop through the entire training history
            h = tuple([each.data for each in h])

            # zero accumulated gradients
            net.zero_grad()

            # get the output from the model
            output, h = net(inputs, h)

            # calculate the loss and perform backprop
            loss = criterion(output.squeeze(), labels.float())
            loss.backward()
            # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
            nn.utils.clip_grad_norm_(net.parameters(), clip)
            optimizer.step()

            # loss stats
            if counter % print_every == 0:
                # Get validation loss
                val_h = net.init_hidden(batch_size)
                val_losses = []
                net.eval()
                for inputs, labels in valid_loader:

                    # Creating new variables for the hidden state, otherwise
                    # we'd backprop through the entire training history
                    val_h = tuple([each.data for each in val_h])

                    if (lower(device) == "gpu"):
                        inputs, labels = inputs.cuda(), labels.cuda()
                    if (lower(device) == "mps"):
                        inputs, labels = inputs.to(device), labels.to(device)

                    output, val_h = net(inputs, val_h)
                    val_loss = criterion(output.squeeze(), labels.float())

                    val_losses.append(val_loss.item())

                net.train()
                print("Epoch: {}/{}...".format(e + 1, epochs),
                      "Step: {}...".format(counter),
                      "Loss: {:.6f}...".format(loss.item()),
                      "Val Loss: {:.6f}".format(np.mean(val_losses)))

    # make sure the target directory exists before saving
    os.makedirs('./model/', exist_ok=True)
    torch.save(net, os.path.join('./model/', 'model.pt'))

#### Training accuracy

In [None]:
def compute_accuracy(net, data_loader, batch_size, device, criterion):
    # Get test data loss and accuracy

    test_losses = []  # track loss
    num_correct = 0

    # init hidden state
    h = net.init_hidden(batch_size)

    net.eval()
    # iterate over test data
    for inputs, labels in data_loader:

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        if (lower(device) == "gpu"):
            inputs, labels = inputs.cuda(), labels.cuda()
        if (lower(device) == "mps"):
            inputs, labels = inputs.to(device), labels.to(device)
        # get predicted outputs
        output, h = net(inputs, h)

        # calculate loss
        test_loss = criterion(output.squeeze(), labels.float())
        test_losses.append(test_loss.item())

        # convert output probabilities to predicted class (0 or 1)
        pred = torch.round(output.squeeze())  # rounds to the nearest integer

        # compare predictions to true label
        correct_tensor = pred.eq(labels.float().view_as(pred))
        if (lower(device) == "mps"):
            correct = np.squeeze(correct_tensor.cpu().numpy())
        elif lower(device) == "gpu":
            correct = np.squeeze(correct_tensor.numpy())
        else:
            correct = np.squeeze(correct_tensor.cpu().numpy())

        num_correct += np.sum(correct)

    # -- stats! -- ##
    # avg test loss
    avg_loss = np.mean(test_losses)
    test_acc = num_correct / len(data_loader.dataset) * 100
    return test_acc, avg_loss

In [None]:
if train_model:
    training_accuracy, avg_loss = compute_accuracy(
        net, train_loader, batch_size, device, criterion)
    print(f"Training loss: {avg_loss}")
    print(f"Training accuracy: {training_accuracy}")

#### Test accuracy

In [None]:
if train_model:
    test_accuracy, avg_loss = compute_accuracy(
        net, test_loader, batch_size, device, criterion)
    print(f"Test loss: {avg_loss}")
    print(f"Test accuracy: {test_accuracy}")

## 5. Testing and Documenting the model
### Initialize validmind objects
First, we initialize the dataset and model objects required to run ValidMind's model metrics and validation test plans.

In [None]:
train_data = TensorDataset(torch.from_numpy(train_x[:15000]).to(
    device), torch.from_numpy(train_y[:15000]).to(device))

vm_train_ds = vm.init_dataset(dataset=train_data, type="generic")

In [None]:
vm_test_ds = vm.init_dataset(dataset=test_data, type="generic")

In [None]:

vm_model = vm.init_model(net, train_ds=vm_train_ds, test_ds=vm_test_ds)

#### Run model metrics test plan
Since we are working on a binary classification task, we can run the `binary_classifier_metrics` test plan to generate various artefacts that automatically appear in the model documentation section of the ValidMind UI.

In [None]:
model_metrics_test_suite = vm.run_test_suite("binary_classifier_metrics",
                                             model=vm_model
                                             )

#### Run model validation test plan
Similarly, we can run the `binary_classifier_validation` test plan to validate the model and generate artefacts. 

In [None]:

model_validation_test_suite = vm.run_test_suite("binary_classifier_validation",
                                                model=vm_model
                                                )