# CNN-based Text Classification

## Imports

Here are the packages we need to import.

In [1]:
from nlpmodels.models import text_cnn
from nlpmodels.utils import train, utils, text_cnn_dataset
from argparse import Namespace
utils.set_seed_everywhere()


## Sentiment Analysis with CNNs

Following the logic in Kim's paper, we use an embedding layer followed by a convolutional layer
to perform sentiment analysis.
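
To make the convolutional step concrete, here is a toy, pure-Python sketch of a single filter sliding over token embeddings, followed by max-over-time pooling. It is not the `nlpmodels` implementation: the function names, the ReLU non-linearity, and all numbers are assumptions chosen for illustration.

```python
# Toy illustration of convolution + max-over-time pooling for text.
# All names and values are hypothetical, chosen only for illustration.

def conv1d_over_tokens(embeddings, kernel):
    """Slide a (window x dim) kernel over a (seq_len x dim) list of token embeddings."""
    window = len(kernel)
    outputs = []
    for start in range(len(embeddings) - window + 1):
        total = 0.0
        for offset in range(window):
            token_vec = embeddings[start + offset]
            kernel_row = kernel[offset]
            total += sum(e * k for e, k in zip(token_vec, kernel_row))
        outputs.append(max(0.0, total))  # ReLU non-linearity (an assumption)
    return outputs


def max_over_time(feature_map):
    """Keep only the strongest activation, regardless of where it occurred."""
    return max(feature_map)


# 5 tokens, embedding dimension 2
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
# One filter spanning a window of 3 tokens
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

feature_map = conv1d_over_tokens(embeddings, kernel)  # one value per window position
pooled = max_over_time(feature_map)                   # one scalar per filter
print(feature_map, pooled)
```

A real model applies many such filters per window size and concatenates the pooled scalars into one feature vector before the classification layer.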

### Hyper-parameters

These are the data processing and model training hyper-parameters for this run. Note that we are running a smaller model
than the one cited in the paper, for fewer iterations, and on a CPU. This is meant merely to demonstrate that it works.

In [2]:
args = Namespace(
        # Model hyper-parameters
        max_sequence_length=400,  # important parameter; makes a big difference to the output
        dim_model=50,  # embedding size; I tried 300 -> 50
        num_filters=100,  # output filters from each convolution
        window_sizes=[3, 5],  # filter widths; total number of filters is len(window_sizes) * num_filters
        num_classes=2,  # binary classification problem
        dropout=0.5,  # 0.5 in the original implementation, which is fairly high
        # Training hyper-parameters
        num_epochs=3,  # 30 in the original implementation
        learning_rate=1.e-4,  # choosing the LR is important; often paired with a scheduler that changes it over time
        batch_size=64  # from the original implementation
)
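
As a quick sanity check on these numbers, the sketch below works out the sizes they imply. It assumes stride-1 convolutions with no padding, which may not match what `text_cnn.TextCNN` does internally; `conv_output_length` is a hypothetical helper.

```python
def conv_output_length(sequence_length, window_size):
    """Number of window positions for a stride-1 convolution without padding (assumed)."""
    return sequence_length - window_size + 1


max_sequence_length = 400
num_filters = 100
window_sizes = [3, 5]

# Length of each feature map before max-over-time pooling
feature_map_lengths = {w: conv_output_length(max_sequence_length, w) for w in window_sizes}

# After pooling, each filter contributes one scalar, so the classifier
# sees len(window_sizes) * num_filters features per example.
pooled_feature_size = len(window_sizes) * num_filters

print(feature_map_lengths)   # {3: 398, 5: 396}
print(pooled_feature_size)   # 200
```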

In [3]:
train_loader, vocab = text_cnn_dataset.TextCNNDataset.get_training_dataloader(args)
model = text_cnn.TextCNN(vocab_size=len(vocab),
                        dim_model=args.dim_model,
                        num_filters=args.num_filters,
                        window_sizes=args.window_sizes,
                        num_classes=args.num_classes,
                        dropout=args.dropout)

trainer = train.TextCNNTrainer(args, vocab.mask_index, model, train_loader, vocab)

25000lines [00:01, 15721.50lines/s]
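
The loader receives `max_sequence_length` through `args`. How `TextCNNDataset` applies it is not shown here; a common approach, assumed in this sketch, is to truncate long reviews and pad short ones so that every row in a batch has the same length. `pad_or_truncate` and `pad_index=0` are hypothetical, not part of `nlpmodels`.

```python
def pad_or_truncate(token_ids, max_length, pad_index=0):
    """Return a list of exactly max_length token ids.

    Longer inputs are cut off at max_length; shorter ones are right-padded
    with pad_index (a hypothetical padding id).
    """
    clipped = token_ids[:max_length]
    return clipped + [pad_index] * (max_length - len(clipped))


short_review = [5, 6, 7]
long_review = [1, 2, 3, 4, 5, 6]

print(pad_or_truncate(short_review, 5))  # [5, 6, 7, 0, 0]
print(pad_or_truncate(long_review, 4))   # [1, 2, 3, 4]
```

With a small `max_length`, most of a long review is discarded, which is one plausible reason the setting has such a large effect on accuracy.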


Let's run this.

In [4]:
trainer.run()

  correct += (np.round(F.softmax(y_hat)[:,1].detach().numpy()).reshape(y_hat.shape[0],1) == target.data.numpy()).sum()
[Epoch 0]: 100%|██████████| 342/342 [02:47<00:00,  2.04it/s, accuracy=42.8, loss=0.0469]
[Epoch 1]: 100%|██████████| 342/342 [02:34<00:00,  2.22it/s, accuracy=42.8, loss=0.00377]
[Epoch 2]: 100%|██████████| 342/342 [02:35<00:00,  2.20it/s, accuracy=42.8, loss=0.000307]


Finished Training...


### Review

The goal is just to show how this works - you can play with the hyper-parameters as you see fit.
In an ideal situation, we would evaluate the model on a held-out validation or test set to diagnose performance.
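
A minimal sketch of what such a check could look like, using only the standard library. The helpers `train_validation_split` and `accuracy`, and the toy data, are hypothetical; this run does not do this, and the trainer may compute accuracy differently.

```python
import random


def train_validation_split(examples, validation_fraction=0.1, seed=0):
    """Shuffle the examples and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predictions, targets):
    """Percentage of predictions that match the targets."""
    correct = sum(int(p == t) for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)


# Toy (text, label) pairs standing in for the real reviews
examples = [(f"review {i}", i % 2) for i in range(20)]
train_set, val_set = train_validation_split(examples, validation_fraction=0.25)

print(len(train_set), len(val_set))          # 15 5
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 75.0
```

Comparing accuracy on `val_set` with accuracy on the training data is what reveals whether a falling training loss reflects learning that generalizes.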

#### Parameter importance

In playing with the model's hyper-parameters, there are a few things to note:

- *max_sequence_length*: This parameter makes a very big difference to the model's accuracy. If it is too small, the model does
not have enough context to learn the target. I started off with something far too small and the accuracy
never exceeded 6%.
- *num_filters*: This parameter was originally set to 100, but I varied it and found that it could go below 10.
- *learning_rate*: The learning rate is a pretty important aspect of training a model. I set the parameter to be static,
but it often makes sense to use a scheduler that allows larger parameter changes initially and then fine-tunes over later updates.
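
To illustrate the scheduler idea, here is a small sketch of a step-decay schedule in plain Python. The function name and the values for `step_size` and `gamma` are assumptions for illustration and were not used in this run; in PyTorch, `torch.optim.lr_scheduler.StepLR` implements the same rule.

```python
def step_decay(initial_lr, epoch, step_size=10, gamma=0.5):
    """Multiply the learning rate by gamma once every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Larger steps early on, smaller steps later for fine-tuning
schedule = [step_decay(1e-3, epoch) for epoch in range(30)]

for epoch in (0, 9, 10, 20, 29):
    print(epoch, schedule[epoch])
```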

