# Week 5: Sequence-to-Sequence Models

- Introduction to sequence-to-sequence models and their architecture
- Understanding of Encoder-Decoder Models and their variants
- Introduction to attention mechanism and its role in sequence-to-sequence models
- Understanding of Beam Search and its application in sequence-to-sequence models
- Implementing machine translation models using PyTorch or TensorFlow
- Understanding of evaluation metrics for machine translation
- Understanding of transfer learning and fine-tuning pre-trained models for machine translation tasks
- Introduction to unsupervised machine translation and its techniques
- Understanding of Multilingual models and their application in NLP tasks
- Understanding the concept of zero-shot learning and its application in machine translation tasks
- Understanding the concept of back-translation and its application in machine translation tasks
- Understanding the concept of ensembling in machine translation tasks
- Understanding the concept of language model pre-training and its application in machine translation tasks

## Sequence-to-sequence models and their architecture

Sequence-to-sequence (often abbreviated to seq2seq) models are a special class of Recurrent Neural Network architectures that are typically used (though not exclusively) to solve complex language problems such as machine translation, question answering, chatbots, and text summarization.

<img src="images/sequence.jpg" width ="600px" height ="600px">

Image source: [Link to source](https://miro.medium.com/v2/resize:fit:669/0*iDgmgGnrzq65dPXy.jpg)


<img src="images/sequence 1.png" width ="600px" height ="600px">

Image source: [Link to source](https://miro.medium.com/v2/resize:fit:700/1*y4D1XNJQmx-Gii1oHeHy_A.png)



## Encoder-Decoder Models and their variants

#### The Encoder-Decoder Network

This network has been applied to a very wide range of tasks, including machine translation, text summarisation, question answering, and dialogue. Let's try to understand the idea underlying encoder-decoder networks. The encoder takes the input sequence and creates a contextual representation of it (also called the context). The decoder takes this contextual representation as input and generates the output sequence.

<img src="images/encoder decoder core.png" width ="600px" height ="600px">

Image source: [Link to source](https://medium.com/nerd-for-tech/nlp-theory-and-code-encoder-decoder-models-part-11-30-e686bcb61dc7)

### Encoder:

The encoder takes the input sequence and generates a context, which captures the essence of the input and is passed to the decoder.

<img src="images/encoder.png" width ="600px" height ="600px">

Image source: [Link to source](https://medium.com/nerd-for-tech/nlp-theory-and-code-encoder-decoder-models-part-11-30-e686bcb61dc7)

The entire purpose of the encoder is to generate a contextual representation (the context) of the input sequence.
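As a minimal sketch of this idea, the snippet below runs a tiny Elman-style RNN over a sequence of token ids and returns its final hidden state as the context. The sizes, the random weights, and the names `make_matrix`, `rnn_step`, and `encode` are illustrative assumptions, not part of any library; a real model would learn its weights and would normally be built with PyTorch or TensorFlow.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained, for illustration)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def rnn_step(x, h, W_xh, W_hh):
    """One Elman RNN step: h_new = tanh(W_xh @ x + W_hh @ h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_xh[i], x))
            + sum(w * hi for w, hi in zip(W_hh[i], h))
        )
        for i in range(len(h))
    ]

def encode(token_ids, embeddings, W_xh, W_hh):
    """Run the RNN over the whole input and return the final hidden state (the context)."""
    h = [0.0] * len(W_hh)  # initial hidden state
    for t in token_ids:
        h = rnn_step(embeddings[t], h, W_xh, W_hh)
    return h

rng = random.Random(0)
VOCAB, EMB, HID = 10, 4, 3  # hypothetical vocabulary, embedding and hidden sizes
embeddings = make_matrix(VOCAB, EMB, rng)
W_xh = make_matrix(HID, EMB, rng)
W_hh = make_matrix(HID, HID, rng)

context = encode([1, 5, 2, 7], embeddings, W_xh, W_hh)
print(len(context))  # one value per hidden unit
```

Whatever the length of the input, the context has a fixed size (one value per hidden unit). That fixed size is what lets the decoder consume it directly.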

### Decoder:

The decoder takes the context as input and generates an output sequence. When an RNN is used as the decoder, the context is the final hidden state of the RNN encoder.

<img src="images/decoder.png" width ="600px" height ="600px">

Image source: [Link to source](https://medium.com/nerd-for-tech/nlp-theory-and-code-encoder-decoder-models-part-11-30-e686bcb61dc7)

The first decoder RNN cell takes the context as its prior hidden state. The decoder then generates output tokens until it produces the end-of-sequence marker.
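The decoding loop described above can be sketched as follows. This is a toy under stated assumptions: the weights are random and untrained, the `EOS` and `SOS` ids are hypothetical, the next token is picked greedily (highest score), and a `max_len` cap is added so the loop always terminates even if the end-of-sequence marker never appears.

```python
import math
import random

EOS = 0  # hypothetical id for the end-of-sequence token
SOS = 1  # hypothetical id for the start-of-sequence token

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def decode(context, embeddings, W_xh, W_hh, W_out, max_len=10):
    """Greedy decoding: start from the context, stop at EOS or after max_len tokens."""
    h = list(context)  # the context acts as the decoder's prior hidden state
    token = SOS
    output = []
    for _ in range(max_len):
        # RNN step on the previously generated token
        pre = [a + b for a, b in zip(matvec(W_xh, embeddings[token]), matvec(W_hh, h))]
        h = [math.tanh(p) for p in pre]
        # Score every vocabulary item and pick the best one
        logits = matvec(W_out, h)
        token = max(range(len(logits)), key=lambda i: logits[i])
        output.append(token)
        if token == EOS:
            break
    return output

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random, untrained weights (for illustration only)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

VOCAB, EMB, HID = 6, 4, 3  # hypothetical sizes
embeddings = rand_matrix(VOCAB, EMB)
W_xh = rand_matrix(HID, EMB)
W_hh = rand_matrix(HID, HID)
W_out = rand_matrix(VOCAB, HID)

context = [0.2, -0.4, 0.7]  # stand-in for an encoder's final hidden state
output = decode(context, embeddings, W_xh, W_hh, W_out)
print(output)
```

Because the weights are untrained, the generated ids carry no meaning. The point is the control flow: each step feeds the previous output token back in, and generation ends at `EOS`.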

The complete encoder-decoder model:

<img src="images/complete encoder decoder.png" width ="600px" height ="600px">

Image source: [Link to source](https://medium.com/nerd-for-tech/nlp-theory-and-code-encoder-decoder-models-part-11-30-e686bcb61dc7)

<img src="images/encoder decoder 2.png" width ="600px" height ="600px">

Image source: [Link to source](https://towardsdatascience.com/what-is-an-encoder-decoder-model-86b3d57c5e1a)

### Types of Encoders and Decoders

There are two main types of encoder and decoder: 
- Linear 
- Nonlinear

#### Linear encoders and decoders:

These are the most common type. They convert an input signal into an output signal that is proportional to the input, so doubling the input doubles the output.

#### Nonlinear encoders and decoders:

These are less common but more versatile. They convert an input signal into an output signal that is not proportional to the input.
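The difference can be checked numerically. In the sketch below, `linear_encode` and `nonlinear_encode` are hypothetical one-dimensional stand-ins: the first multiplies by a fixed weight, the second applies `tanh` on top. The weight `3.0` is an arbitrary assumption.

```python
import math

def linear_encode(x, weight=3.0):
    """Linear map: the output is proportional to the input."""
    return weight * x

def nonlinear_encode(x, weight=3.0):
    """The same map followed by tanh, which breaks proportionality."""
    return math.tanh(weight * x)

x = 0.5
# Doubling the input doubles the output only in the linear case.
linear_is_proportional = linear_encode(2 * x) == 2 * linear_encode(x)
nonlinear_is_proportional = math.isclose(
    nonlinear_encode(2 * x), 2 * nonlinear_encode(x)
)
print(linear_is_proportional)     # True
print(nonlinear_is_proportional)  # False
```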

## Introduction to attention mechanism and its role in sequence-to-sequence models

<img src="images/attention mechanism.png" width ="600px" height ="600px">

Image source: [Link to source](https://miro.medium.com/v2/resize:fit:1022/1*qhOlQHLdtfZORIXYuoCtaA.png)



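A Seq2Seq model with an attention mechanism consists of an encoder, a decoder, and an attention layer.

The attention layer consists of:

- Alignment layer: scores how well each encoder hidden state matches the current decoder state
- Attention weights: the alignment scores normalized (typically with a softmax) so that they sum to 1
- Context vector: the sum of the encoder hidden states, weighted by the attention weights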
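The three parts of the attention layer can be sketched in a few lines. The example assumes dot-product scoring for the alignment step, which is only one common choice (additive scoring is another). The function names and the toy vectors are illustrative.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(decoder_state, encoder_states):
    # 1. Alignment: score each encoder state against the decoder state (dot product)
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # 2. Attention weights: normalize the scores
    weights = softmax(scores)
    # 3. Context vector: weighted sum of the encoder states
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)
    ]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per input token
decoder_state = [2.0, 0.0]                             # current decoder hidden state

weights, context = attention(decoder_state, encoder_states)
print(weights)
print(context)
```

Here the first and third encoder states align equally well with the decoder state, so they receive equal weight, while the second receives much less. Unlike the single fixed context of the plain encoder-decoder, this context vector is recomputed at every decoding step.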

## Understanding of Beam Search and its application in sequence-to-sequence models


<img src="images/beam1.png" width ="600px" height ="600px">

Image source: [Link to source](https://medium.com/@dhartidhami/beam-search-in-seq2seq-model-7606d55b21a5)



<img src="images/beam2.png" width ="600px" height ="600px">

Image source: [Link to source](https://medium.com/@dhartidhami/beam-search-in-seq2seq-model-7606d55b21a5)



<img src="images/beam3.png" width ="600px" height ="600px">

Image source: [Link to source](https://medium.com/@dhartidhami/beam-search-in-seq2seq-model-7606d55b21a5)


### Applications

Beam search is most often used to keep large systems tractable when there is not enough memory to store the entire search tree. For example:
- It has been used in many machine translation systems.
- To select the best translation, each part of the sentence is processed, and many different ways of translating the words are generated.
- The top-scoring translations, ranked according to their sentence structures, are kept and the rest are discarded. The translator then evaluates the remaining translations against a given criterion and chooses the one that best meets the goals.
- The first use of beam search was in the Harpy Speech Recognition System (CMU, 1976).
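To make the keep-the-best, discard-the-rest idea concrete, here is a minimal sketch of beam search over a made-up next-token model. The probability table `TABLE`, the `<s>` and `</s>` markers, and the function names are all assumptions for illustration. A real system would query a trained decoder instead, and would usually add length normalization, which this sketch omits.

```python
import math

EOS = "</s>"

# Hypothetical toy model: P(next token | last token)
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "x": 0.1},
    "x": {EOS: 1.0},
    "y": {EOS: 1.0},
    "z": {EOS: 1.0},
}

def next_log_probs(prefix):
    """Log-probabilities of each possible next token given the prefix."""
    return {tok: math.log(p) for tok, p in TABLE[prefix[-1]].items()}

def beam_search(beam_width, max_len=5):
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, sequence)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token
        candidates = []
        for score, seq in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((score + lp, seq + [tok]))
        # Keep only the top `beam_width` candidates; discard the rest
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == EOS else beams).append((score, seq))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[0])

greedy_score, greedy_seq = beam_search(beam_width=1)
beam_score, beam_seq = beam_search(beam_width=2)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

With a beam width of 1 the search is greedy: it commits to `a` because it is the most likely first token, and ends with a sequence of probability 0.3. With a beam width of 2 it also keeps `b` alive and finds `b z`, whose overall probability (0.36) is higher.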