
In [9]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, SimpleRNN, GRU, LSTM, Bidirectional, Embedding, Attention, AdditiveAttention

In [8]:
!pip install transformers



# Neural networks for language processing

In [2]:
model = Sequential([
    # 6 time steps, 20 features per step
    Input((6, 20)),
    # regression: activation=None keeps the 64 outputs linear (unbounded)
    SimpleRNN(64, activation=None),
])

## RNN architectures
- One to one: one input, one output. This is a standard feed-forward network. Ex: one picture of a cat -> one class, "cat".
- One to many: one real input, after which the remaining time steps are fed zero vectors, and the model produces many outputs (completion model). Ex: picture of a cat -> a sequence of words: "A cat is sitting". The single input passes through an RNN, which generates one output per time step.
- Many to one: the input is a sequence and the model returns one output (sentiment analysis). Ex: input: the sequence of words "I love this movie", output: one label ("Positive"). The whole sequence is processed, and the last hidden state is used to generate the output.
- Many to many: a sequence of inputs and a sequence of outputs:
1. Many to many, unequal length: the model first reads the input, and the outputs at those steps are ignored (they do not participate in the loss function). It is then fed zero vectors and generates the output sequence (generative model). Ex: machine translation ("How are you?" → "Как си?").
2. Many to many, equal length: the model returns one output per input step, which is what return_sequences=True does. Ex: part-of-speech (POS) tagging.
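To make the many-to-one and equal-length many-to-many cases concrete, here is a minimal NumPy sketch of a simple recurrent layer's forward pass. The function name `simple_rnn_forward`, the random weights, and the tanh nonlinearity are illustrative assumptions. It is not the Keras implementation, only the same basic recurrence, h_t = tanh(x_t W + h_{t-1} U + b).

```python
import numpy as np

def simple_rnn_forward(x, W, U, b):
    """Run a basic tanh RNN over one sequence.

    x: (steps, features), W: (features, units), U: (units, units), b: (units,)
    Returns the hidden state at every time step, shape (steps, units).
    """
    steps = x.shape[0]
    units = U.shape[0]
    h = np.zeros(units)                  # initial hidden state
    states = np.empty((steps, units))
    for t in range(steps):
        h = np.tanh(x[t] @ W + h @ U + b)
        states[t] = h
    return states

rng = np.random.default_rng(0)
steps, features, units = 6, 20, 64       # same shapes as Input((6, 20)) and SimpleRNN(64)
x = rng.normal(size=(steps, features))
W = rng.normal(scale=0.1, size=(features, units))
U = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)

all_states = simple_rnn_forward(x, W, U, b)
many_to_many = all_states                # one output per step (like return_sequences=True)
many_to_one = all_states[-1]             # only the last hidden state (the default)

print(many_to_many.shape)  # (6, 64)
print(many_to_one.shape)   # (64,)
```

The only difference between the two cases is which hidden states are kept: all of them, or just the final one.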

In [3]:
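model1 = Sequential([
    # 6 steps, 20 features
    Input((6, 20)),
    # return_sequences=True on every layer keeps one output per time step
    SimpleRNN(128, return_sequences=True),
    SimpleRNN(64, return_sequences=True),
    SimpleRNN(64, return_sequences=True),
    # softmax gives a distribution over 64 classes at each step;
    # a Dense(64, activation="softmax") head is the more common choice
    SimpleRNN(64, return_sequences=True, activation="softmax"),
])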

In [4]:
model1.summary()

## Improved models
- GRU (Gated Recurrent Unit)
- LSTM (Long Short-Term Memory)

In [5]:
model_lstm = Sequential([
    Input((6, 20)),
    LSTM(128, activation=None),
])
# summary() returns None, so call it separately to keep the model in model_lstm
model_lstm.summary()
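One way to see what the gated layers add is to count trainable parameters. The sketch below is plain arithmetic with hypothetical helper names. It assumes Keras defaults (biases enabled, and `reset_after=True` for GRU, which keeps two bias vectors per weight set), so the numbers should match what `summary()` reports under those assumptions.

```python
def simple_rnn_params(input_dim, units):
    # one set of input weights, recurrent weights, and biases
    return units * (input_dim + units + 1)

def lstm_params(input_dim, units):
    # four sets: input gate, forget gate, cell candidate, output gate
    return 4 * units * (input_dim + units + 1)

def gru_params(input_dim, units):
    # three sets: update gate, reset gate, candidate state
    # assumes reset_after=True, i.e. two bias vectors per set
    return 3 * units * (input_dim + units + 2)

print(simple_rnn_params(20, 64))  # 5440
print(lstm_params(20, 128))       # 76288
print(gru_params(20, 128))        # 57600
```

For the same input size and number of units, an LSTM therefore has roughly four times the parameters of a simple RNN, and a GRU roughly three times.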

### Bidirectional RNN
A bidirectional recurrent neural network processes the input in two directions: from beginning to end (forward pass) and from end to beginning (backward pass). Each direction has its own recurrent layer, and their outputs are merged (for example, summed or concatenated). This lets the network use context from both the past and the future at each time step.

In [6]:
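model_bidir = Sequential([
    Input((6, 20)),
    # merge_mode="sum" adds the forward and backward outputs: 15 units, not 30
    Bidirectional(SimpleRNN(15), merge_mode="sum"),
])
# summary() returns None, so call it separately to keep the model in model_bidir
model_bidir.summary()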
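The effect of the two passes and of `merge_mode` can be sketched in NumPy. The helpers `last_state` and `init_weights` are hypothetical, the weights are random, and the tanh recurrence is an assumption standing in for the Keras layer. The sketch only illustrates that each direction has its own weights and that the results are merged afterwards.

```python
import numpy as np

def last_state(x, W, U, b):
    """Final hidden state of a tanh RNN run over x with shape (steps, features)."""
    h = np.zeros(U.shape[0])
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U + b)
    return h

def init_weights(rng, features, units):
    # hypothetical helper: small random weights for one direction
    return (rng.normal(scale=0.1, size=(features, units)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units))

rng = np.random.default_rng(0)
steps, features, units = 6, 20, 15
x = rng.normal(size=(steps, features))

fwd = init_weights(rng, features, units)   # each direction has its own weights
bwd = init_weights(rng, features, units)

h_forward = last_state(x, *fwd)            # reads steps 0..5
h_backward = last_state(x[::-1], *bwd)     # reads steps 5..0

merged_sum = h_forward + h_backward                      # like merge_mode="sum"
merged_concat = np.concatenate([h_forward, h_backward])  # like merge_mode="concat"

print(merged_sum.shape)     # (15,)
print(merged_concat.shape)  # (30,)
```

Summing keeps the output size equal to the number of units, while concatenation (the Keras default) doubles it.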

## Representing tokens

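An embedding maps each token index to a dense, trainable vector. Compared with a one-hot vector over the whole vocabulary (for example, 20,000 dimensions), this acts as dimensionality reduction (for example, to 128 dimensions).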

In [7]:
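context_size = 500  # number of tokens per input sequence
model_srnn = Sequential([
    # each sample is a sequence of context_size integer token ids
    Input((context_size,)),
    # vocabulary of 20,000 tokens, each mapped to a 128-dimensional vector
    Embedding(20_000, 128),
    SimpleRNN(15),
])
model_srnn.summary()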
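An embedding layer is essentially a lookup table. The sketch below uses a tiny vocabulary and random values (both illustrative assumptions) to show that selecting rows by token id gives the same result as multiplying one-hot vectors by the embedding matrix, without ever building the large one-hot vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4            # tiny stand-ins for 20_000 and 128
table = rng.normal(size=(vocab_size, embed_dim))   # the trainable embedding matrix

token_ids = np.array([3, 7, 3, 1])       # one sequence of token indices

# Embedding lookup: pick one row per token
embedded = table[token_ids]              # shape (4, 4)

# Equivalent one-hot formulation: (4, 10) @ (10, 4)
one_hot = np.eye(vocab_size)[token_ids]
embedded_via_one_hot = one_hot @ table

print(embedded.shape)                               # (4, 4)
print(np.allclose(embedded, embedded_via_one_hot))  # True
```

Note that the repeated token id 3 maps to the same vector both times, since the lookup depends only on the index.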

## Attention

Attention is a technique that lets a model focus on the parts of the input that matter most for the current task.
A standard RNN processes the sequence step by step, and a CNN sees only a local window, so the meaning of more distant words may be lost. With attention, the model estimates which words in the input are most important for the current task and gives them "weights" according to their importance. Each word is represented as a vector and is converted into three different components:
- Query (a vector that searches for relevant information),
- Key (a vector that acts as an index for each word, against which queries are matched) and
- Value (a vector that contains the information itself).
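The query, key, and value roles described above can be sketched as dot-product attention in NumPy. The function names and random inputs are illustrative assumptions. Dividing the scores by the square root of the vector size follows the Transformer convention and is also an assumption here (the Keras `Attention` layer does not scale by default).

```python
import numpy as np

def softmax(scores):
    # subtract the row maximum for numerical stability
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_attention(query, key, value):
    """query: (Tq, d), key: (Tv, d), value: (Tv, dv).

    Returns the attended output (Tq, dv) and the attention weights (Tq, Tv).
    """
    scores = query @ key.T / np.sqrt(query.shape[-1])  # query-key similarity
    weights = softmax(scores)                          # importance of each word
    return weights @ value, weights                    # weighted sum of values

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))   # 3 query positions, vectors of size 8
k = rng.normal(size=(5, 8))   # 5 input words
v = rng.normal(size=(5, 8))

output, weights = dot_product_attention(q, k, v)
print(output.shape)   # (3, 8)
print(weights.shape)  # (3, 5)
```

Each row of `weights` sums to 1, so every output vector is a weighted average of the value vectors.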

In [None]:
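attention = Attention()

# placeholders: the Embedding arguments and inputs are left unspecified
q = Embedding()(...)
v = Embedding()(...)

# the layer takes a list [query, value]; the key defaults to the value
q_v = attention([q, v])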

### Transformers