# General Lessons about Natural Language Processing

- Word embeddings represent each word as a vector, so that words with similar meanings end up close to each other in the vector space
- In PyTorch, we define an embedding layer with `nn.Embedding`, declared in the constructor of the model, similar to what we did before with CNNs
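
To make the two points above concrete, here is a minimal sketch of an embedding lookup. The three-word vocabulary and the 4-dimensional vector size are made up for illustration and are not from these notes. The embeddings here are untrained, so the similarity value is random; only after training would similar words move closer together.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy vocabulary; the indices are arbitrary.
vocab = {"good": 0, "great": 1, "terrible": 2}

# One learnable 4-dimensional vector per word in the vocabulary.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)

# Look up vectors by integer index (the input must be an integer tensor).
indices = torch.tensor([vocab["good"], vocab["great"]])
vectors = embedding(indices)
print(vectors.shape)  # torch.Size([2, 4])

# Cosine similarity measures how close two word vectors are.
# Untrained embeddings are random, so this value is not meaningful yet.
similarity = F.cosine_similarity(vectors[0], vectors[1], dim=0)
print(similarity.item())
```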





## Here's a sample model build-out


In [None]:
import torch
from torch import nn


class SentimentAnalysisModel(nn.Module):
    def __init__(self, vocab_size):
        # nn.Module must be initialized before any layers are assigned.
        super().__init__()

        # Each word index maps to a learnable 16-dimensional vector. The extra
        # dimensions give the model room to encode relationships between words
        # that it picks up during training.
        self.embeddings = nn.Embedding(vocab_size, 16)

        self.linear = nn.Linear(in_features=16, out_features=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (batch_size, sequence_length) tensor of word indices
        word_embeddings = self.embeddings(x)            # (batch_size, sequence_length, 16)
        averaged = torch.mean(word_embeddings, dim=1)   # (batch_size, 16)

        return self.sigmoid(self.linear(averaged))      # (batch_size, 1)
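
To see where the output shape comes from, the sketch below traces the same three steps (embed, average, classify) with standalone layers. The vocabulary size of 100, the batch of 2 sentences, and the sentence length of 5 are assumptions picked only for illustration.

```python
import torch
from torch import nn

# Hypothetical sizes chosen only for illustration.
vocab_size, embedding_dim = 100, 16
batch_size, sequence_length = 2, 5

embeddings = nn.Embedding(vocab_size, embedding_dim)
linear = nn.Linear(in_features=embedding_dim, out_features=1)

# A batch of 2 sentences, each encoded as 5 word indices.
x = torch.randint(0, vocab_size, (batch_size, sequence_length))

word_embeddings = embeddings(x)                  # (2, 5, 16): one vector per word
averaged = torch.mean(word_embeddings, dim=1)    # (2, 16): one vector per sentence
probabilities = torch.sigmoid(linear(averaged))  # (2, 1): one score per sentence

print(word_embeddings.shape, averaged.shape, probabilities.shape)
```

Averaging over `dim=1` collapses the word axis, so sentences of any length end up as a single 16-dimensional vector before the linear layer.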

## What is dropout?


Dropout is a technique for combating overfitting. Overfitting happens when a model has more capacity than the training data can support (for example, too many layers or too many nodes), so it memorizes irrelevant details of the training set and becomes less accurate on new data.


Implementing dropout is simple: just add an `nn.Dropout()` layer and pass the drop probability as its `p` parameter (for example, `p=0.2`).

During training, each activation that passes through the layer is set to 0 independently with probability `p`, and the remaining activations are scaled by `1 / (1 - p)`. Because the model cannot rely on any single node, it is pushed to learn more robust features. In evaluation mode (`model.eval()`), dropout is turned off.
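
A small sketch of how `nn.Dropout` behaves in training mode versus evaluation mode. The value `p=0.5` and the all-ones input are chosen only to make the effect easy to see.

```python
import torch
from torch import nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.0.
dropout.train()
train_out = dropout(x)
print(train_out.unique())                # the values are either 0.0 or 2.0
print((train_out == 0).float().mean())   # fraction dropped, roughly p

# Evaluation mode: dropout passes the input through unchanged.
dropout.eval()
eval_out = dropout(x)
print(torch.equal(eval_out, x))
```

The scaling keeps the expected sum of activations the same in both modes, which is why no adjustment is needed at test time.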



## Building a Digit Classifier Model (with sample data)

In [1]:
import torch
from torch import nn


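class DigitClassifierModel01(nn.Module):

    def __init__(self):
        super().__init__()
        self.layer_stack = nn.Sequential(
            # 784 input features, e.g. a flattened 28 x 28 image
            nn.Linear(in_features=784, out_features=512),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            # Output raw logits for the 10 digit classes. No final activation:
            # nn.CrossEntropyLoss applies log-softmax internally.
            nn.Linear(in_features=512, out_features=10),
        )

    def forward(self, x):
        return self.layer_stack(x)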
        



In [None]:
## Training and Testing

# Assume sample_data_train and sample_data_test are iterables (e.g. DataLoaders)
# that yield (X, y) batches, fully formatted and on the same device as the model.

epochs = 5

model01 = DigitClassifierModel01()

loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(params=model01.parameters(), lr=0.01)


# Training loop
for epoch in range(epochs):

    model01.train()  # enable dropout

    # Iterate over the batches directly; enumerate() would yield (index, batch).
    for X, y in sample_data_train:

        # forward pass
        y_predictions = model01(X)

        # calculate loss
        loss = loss_fn(y_predictions, y)

        # reset the gradients left over from the previous step
        optimizer.zero_grad()

        # backpropagation
        loss.backward()

        # optimizer step (update the weights)
        optimizer.step()


# Testing loop: a single pass over the test data with no weight updates
model01.eval()  # disable dropout

test_loss = 0.0

with torch.no_grad():  # gradients are not needed for evaluation

    for X, y in sample_data_test:

        # forward pass
        y_predictions = model01(X)

        # accumulate loss
        test_loss += loss_fn(y_predictions, y).item()

test_loss /= len(sample_data_test)  # average loss per batch
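
Since the datasets above are placeholders, here is a self-contained sketch of the same train-then-test flow that runs end to end. The random tensors, the 192/64 split, the batch size of 32, and the 2 epochs are all assumptions for illustration; with random labels the accuracy should hover around chance, so the point is the structure of the loops, not the result.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (an assumption): 256 random "images" of 784 values each.
features = torch.randn(256, 784)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(features[:192], labels[:192]),
                          batch_size=32, shuffle=True)
test_loader = DataLoader(TensorDataset(features[192:], labels[192:]),
                         batch_size=32)

model = nn.Sequential(
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(512, 10),  # raw logits; nn.CrossEntropyLoss applies log-softmax itself
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training: dropout active, weights updated after every batch.
for epoch in range(2):
    model.train()
    for X, y in train_loader:
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Testing: dropout disabled, no gradients, no weight updates.
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for X, y in test_loader:
        predictions = model(X).argmax(dim=1)  # most likely class per sample
        correct += (predictions == y).sum().item()
        total += y.size(0)

accuracy = correct / total
print(f"test accuracy: {accuracy:.2f}")
```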






