# Signal echoing

 - Practicum: [Week6](https://www.youtube.com/watch?v=8cAffg2jaT0) - [Instant](https://youtu.be/8cAffg2jaT0?t=3019) 


Echoing a signal by `n` steps is an example of a synchronized many-to-many task.

Echoing a signal means the network has to remember what it saw in previous steps. In this case, we are going to use a simple RNN.

In [1]:
from res.sequential_tasks import EchoData
import torch
import torch.nn as nn
# import torch.nn.functional as F
import torch.optim as optim
import numpy as np

torch.manual_seed(1);

## Generate dataset

To generate the dataset we are going to use the utility [`EchoData`](https://github.com/Atcold/pytorch-Deep-Learning/blob/6a8a86013de961acea1d1fd6dc0c712db607974d/res/sequential_tasks.py#L75).

This utility generates `batch_size` sequences of `series_length` binary digits each. The expected output is the input shifted `echo_step` positions to the right. The first `echo_step` positions of the output are filled with zeros, as the toy example and the printed targets show.

For example, for this toy sequence:

```
x = [1 0 1 1 1 0 0]
```

the expected response with 2 echo steps is:

```
y = [0 0 1 0 1 1 1]
```
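
The toy example can be reproduced with a few lines of NumPy. This is a minimal sketch of the shifting idea, not a copy of the `EchoData` code; `make_echo_target` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def make_echo_target(x, echo_step):
    """Shift x to the right by echo_step positions, padding with zeros."""
    y = np.roll(x, echo_step)   # circular shift to the right
    y[:echo_step] = 0           # overwrite the digits that wrapped around
    return y

x = np.array([1, 0, 1, 1, 1, 0, 0])
y = make_echo_target(x, echo_step=2)
print(y)  # [0 0 1 0 1 1 1]
```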

In [2]:
batch_size = 5 
echo_step = 3 # We use a short number of echoing steps
series_length = 20_000 
BPTT_T = 20

train_data = EchoData(
    echo_step=echo_step,
    batch_size=batch_size,
    series_length=series_length,
    truncated_length=BPTT_T,
)
train_size = len(train_data)

test_data = EchoData(
    echo_step=echo_step,
    batch_size=batch_size,
    series_length=series_length,
    truncated_length=BPTT_T,
)
test_size = len(test_data)

The `len` function of an `EchoData` object returns the number of available batches, which is equal to `series_length / T`, where `T` is the truncated length (`BPTT_T`).

In [3]:
print(train_size)
print(int(series_length/BPTT_T))

1000
1000


Let's print the first 20 timesteps of the first sequence to see the echo data:

In [4]:
print('(1st input sequence)  x:', *train_data.x_batch[0, :20], '... ')
print('(1st target sequence) y:', *train_data.y_batch[0, :20], '... ')

(1st input sequence)  x: 1 0 0 1 1 1 0 1 1 0 0 1 0 1 1 1 1 1 0 1 ... 
(1st target sequence) y: 0 0 0 1 0 0 1 1 1 0 1 1 0 0 1 0 1 1 1 1 ... 


`batch_size` different sequences are created:

In [5]:
print('x_batch:', *(str(d)[1:-1] + ' ...' for d in train_data.x_batch[:, :20]), sep='\n')
print('x_batch size:', train_data.x_batch.shape)
print()
print('y_batch:', *(str(d)[1:-1] + ' ...' for d in train_data.y_batch[:, :20]), sep='\n')
print('y_batch size:', train_data.y_batch.shape)

x_batch:
1 0 0 1 1 1 0 1 1 0 0 1 0 1 1 1 1 1 0 1 ...
0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 ...
1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1 0 0 0 ...
0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 0 1 0 0 ...
0 0 1 1 1 1 0 0 1 0 0 1 1 1 0 0 1 0 0 0 ...
x_batch size: (5, 20000)

y_batch:
0 0 0 1 0 0 1 1 1 0 1 1 0 0 1 0 1 1 1 1 ...
0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 ...
0 0 0 1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1 ...
0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 0 ...
0 0 0 0 0 1 1 1 1 0 0 1 0 0 1 1 1 0 0 1 ...
y_batch size: (5, 20000)


To feed the RNN, the data is organized into temporal chunks of size `[batch_size, T, feature_dim]`:

In [6]:
print('x_chunk:', *train_data.x_chunks[0].squeeze(), sep='\n')
print('1st x_chunk size:', train_data.x_chunks[0].shape)
print()
print('y_chunk:', *train_data.y_chunks[0].squeeze(), sep='\n')
print('1st y_chunk size:', train_data.y_chunks[0].shape)

x_chunk:
[1 0 0 1 1 1 0 1 1 0 0 1 0 1 1 1 1 1 0 1]
[0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1]
[1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1 0 0 0]
[0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 0 1 0 0]
[0 0 1 1 1 1 0 0 1 0 0 1 1 1 0 0 1 0 0 0]
1st x_chunk size: (5, 20, 1)

y_chunk:
[0 0 0 1 0 0 1 1 1 0 1 1 0 0 1 0 1 1 1 1]
[0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0]
[0 0 0 1 0 1 1 0 1 1 0 1 0 1 0 1 0 0 1 1]
[0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 0]
[0 0 0 0 0 1 1 1 1 0 0 1 0 0 1 1 1 0 0 1]
1st y_chunk size: (5, 20, 1)
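
The chunking step can be sketched with plain NumPy. The sizes below are small illustrative values, and the use of `np.expand_dims` and `np.split` is an assumption about one reasonable way to get this layout, not necessarily what `EchoData` does internally.

```python
import numpy as np

batch_size, series_length, T = 5, 100, 20  # small illustrative sizes

# Random binary series of shape (batch_size, series_length)
rng = np.random.default_rng(0)
x_batch = rng.integers(0, 2, size=(batch_size, series_length))

# Add a trailing feature dimension, then cut the time axis into chunks of length T
x = np.expand_dims(x_batch, axis=-1)                 # (5, 100, 1)
x_chunks = np.split(x, series_length // T, axis=1)   # list of 5 arrays

print(len(x_chunks), x_chunks[0].shape)  # 5 (5, 20, 1)
```

Each chunk keeps all `batch_size` sequences and covers `T` consecutive timesteps, so consecutive chunks continue the same sequences in time.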


## Defining the Recurrent Neural Network (RNN) model

In this case, we need a model that remembers previously processed information (we are echoing an earlier step). An RNN gives us some capacity to remember.

In [7]:
class SimpleRNN(nn.Module):
    def __init__(self, input_size, rnn_hidden_size, output_size):
        super().__init__()
        self.rnn_hidden_size = rnn_hidden_size
        self.rnn = torch.nn.RNN(
            input_size=input_size,
            hidden_size=rnn_hidden_size,
            num_layers=1,
            nonlinearity='relu',
            batch_first=True
        )
        self.linear = torch.nn.Linear(
            in_features=rnn_hidden_size,
            out_features=output_size  # use the constructor argument instead of a hard-coded 1
        )

    def forward(self, x, hidden):
        x, hidden = self.rnn(x, hidden)  
        x = self.linear(x)
        return x, hidden
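
To see where the memory of the model above comes from, the recurrence computed by `torch.nn.RNN` can be unrolled by hand and compared with the library output. This is a minimal sketch with arbitrary small sizes; it uses the parameter names that PyTorch exposes for a single-layer RNN (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`).

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=1, hidden_size=4, nonlinearity='relu', batch_first=True)

x = torch.rand(2, 5, 1)   # (batch, time, feature)
h = torch.zeros(2, 4)     # zero initial state, which is what passing hidden=None means

# Unroll h_t = relu(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
manual_outputs = []
with torch.no_grad():
    for t in range(x.shape[1]):
        h = torch.relu(
            x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )
        manual_outputs.append(h)
    manual = torch.stack(manual_outputs, dim=1)   # (batch, time, hidden)
    reference, last_hidden = rnn(x)

print(torch.allclose(manual, reference, atol=1e-5))
```

The only information carried from one timestep to the next is the hidden vector `h`, so everything the network remembers has to be encoded there.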

In [8]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Defining the Training Loop

Define the training routine. Remember that the training process includes the following 5 steps:
 1. Forward pass: `output = model(data)`
 2. Compute the loss: `loss = criterion(output, target)`
 3. Clear the gradient buffers: `optimizer.zero_grad()`
 4. Compute the gradient (the partial derivatives of the loss with respect to the network parameters): `loss.backward()`
 5. Perform a training step (a step in the direction opposite to the gradient): `optimizer.step()`

If any of these steps is missed, the training is going to go wrong!
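
The five steps can be seen in isolation on a toy problem. The sketch below fits a straight line with a single linear layer; the model, the data, and the hyperparameters are illustrative assumptions and are unrelated to the echo task.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(1, 1)                      # toy model, not the RNN
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

data = torch.linspace(-1, 1, 20).unsqueeze(1)      # (20, 1)
target = 3 * data + 1                              # line to fit

losses = []
for _ in range(100):
    output = model(data)                 # 1. forward pass
    loss = criterion(output, target)     # 2. compute the loss
    optimizer.zero_grad()                # 3. clear the gradient buffers
    loss.backward()                      # 4. compute the gradient
    optimizer.step()                     # 5. update the parameters
    losses.append(loss.item())

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```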

In [9]:
def train(hidden):
    
    # Set the model to training mode.
    model.train()
       
    # Store the number of digits that were predicted correctly
    correct = 0
    
    # Iterate over every batch of sequences.
    for batch_idx in range(train_size):
        
        # Request a batch of sequences and class labels, convert them into tensors
        # of the correct type, and then send them to the appropriate device.
        data, target = train_data[batch_idx]
        data, target = torch.from_numpy(data).float().to(device), torch.from_numpy(target).float().to(device)
        
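        # Clear the gradient buffers of the optimized parameters.
        optimizer.zero_grad()
        
        # Detach the hidden state so that gradients do not flow back into the previous chunk (truncated BPTT).
        if hidden is not None: hidden.detach_()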
        
        # Perform the forward pass of the model
        logits, hidden = model(data, hidden)
        
        # Compute the loss
        loss = criterion(logits, target)
        
        # Calculate the gradient
        loss.backward()
        
        # Perform a training step
        optimizer.step()
        
        # Check which predictions are correct: a digit is predicted as 1 if its
        # sigmoid output is greater than 0.5, and as 0 otherwise.
        pred = (torch.sigmoid(logits) > 0.5)
        correct += (pred == target.byte()).int().sum().item()
        
    return correct, loss.item(), hidden
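
The training loop above carries the hidden state from one chunk to the next and detaches it first, so that backpropagation stops at the chunk boundary. The sketch below illustrates this on random data with arbitrary sizes; it uses the out-of-place `detach()` instead of `detach_()`, which has the same effect for this purpose.

```python
import torch

torch.manual_seed(0)
rnn = torch.nn.RNN(input_size=1, hidden_size=4, batch_first=True)
chunks = torch.rand(3, 2, 5, 1)   # 3 consecutive chunks of shape (batch, T, feature)

hidden = None
for chunk in chunks:
    if hidden is not None:
        # Keep the values but drop the autograd history of the previous chunk.
        hidden = hidden.detach()
    rnn.zero_grad()
    out, hidden = rnn(chunk, hidden)
    out.sum().backward()          # gradients flow through the current chunk only

carried = hidden.detach()
print(hidden.requires_grad, carried.requires_grad)  # True False
```

The values stored in the hidden state survive the detach, so the network still sees information from earlier chunks in the forward pass; only the gradient path is cut.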

## Defining the Testing Loop

During testing, it is important to disable gradient calculations, since they are not needed for inference: `with torch.no_grad():`

In [10]:
def test(hidden):
    
    # Set the model to evaluation mode.
    model.eval()   
    
    # Store the number of digits that were predicted correctly
    correct = 0
    
    # Avoid the gradient calculations during inference
    with torch.no_grad():
        
        # Iterate over every batch of sequences
        for batch_idx in range(test_size):
            
            # Request a batch of sequences and class labels, convert them into tensors
            # of the correct type, and then send them to the appropriate device.
            data, target = test_data[batch_idx]
            data, target = torch.from_numpy(data).float().to(device), torch.from_numpy(target).float().to(device)
            
            # Perform the forward pass of the model
            logits, hidden = model(data, hidden)
            
            # Compute if the prediction was correct.
            pred = (torch.sigmoid(logits) > 0.5)
            correct += (pred == target.byte()).int().sum().item()

    return correct
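
The effect of `torch.no_grad()` can be checked on any module. The following sketch uses a small linear layer as a stand-in for the RNN (an assumption made only to keep the example short).

```python
import torch

model = torch.nn.Linear(3, 1)
x = torch.rand(4, 3)

y_train = model(x)          # autograd records the operations
with torch.no_grad():
    y_eval = model(x)       # nothing is recorded here

print(y_train.requires_grad, y_eval.requires_grad)  # True False
```

The computed values are the same in both cases; inside the `no_grad` block PyTorch simply skips building the graph, which saves memory and time.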

## Putting it All Together

In [11]:
feature_dim = 1 #since we have a scalar series
h_units = 4

model = SimpleRNN(
    input_size=1,
    rnn_hidden_size=h_units,
    output_size=feature_dim
).to(device)
hidden = None
        
# BCEWithLogitsLoss: This loss combines a Sigmoid layer and the BCELoss in one single class.
#  - https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
criterion = torch.nn.BCEWithLogitsLoss()

# RMSprop - https://pytorch.org/docs/stable/optim.html
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
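
As a quick check of the comment above, `BCEWithLogitsLoss` on raw logits gives the same value as applying a sigmoid followed by `BCELoss`. The tensors below are random stand-ins for the network output and the targets.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 20, 1)                      # raw network outputs
target = torch.randint(0, 2, (5, 20, 1)).float()    # binary targets

fused = torch.nn.BCEWithLogitsLoss()(logits, target)
two_step = torch.nn.BCELoss()(torch.sigmoid(logits), target)

print(fused.item(), two_step.item())
```

The fused version is generally preferred because it is more numerically stable for large positive or negative logits, which is why the model returns logits and has no final sigmoid layer.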

Execute training

In [12]:
n_epochs = 5
epoch = 0

while epoch < n_epochs:
    correct, loss, hidden = train(hidden)
    epoch += 1
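    # Each batch holds batch_size * BPTT_T = 100 digits, so correct / train_size is already a percentage.
    train_accuracy = float(correct) / train_size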
    print(f'Train Epoch: {epoch}/{n_epochs}, loss: {loss:.3f}, accuracy {train_accuracy:.1f}%')

#test    
correct = test(hidden)
test_accuracy = float(correct) / test_size
print(f'Test accuracy: {test_accuracy:.1f}%')

## Depending on the initialization of the network, the model may or may not reach the maximum accuracy.
## Re-run this cell if you don't get 100% accuracy.

Train Epoch: 1/5, loss: 0.488, accuracy 56.8%
Train Epoch: 2/5, loss: 0.052, accuracy 87.0%
Train Epoch: 3/5, loss: 0.001, accuracy 99.9%
Train Epoch: 4/5, loss: 0.000, accuracy 100.0%
Train Epoch: 5/5, loss: 0.000, accuracy 100.0%
Test accuracy: 100.0%
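
Note that `correct / train_size` reads as a percentage only because every batch contains `batch_size * BPTT_T = 5 * 20 = 100` digits. A more general computation could look like the sketch below; `accuracy_percent` is a hypothetical helper and the count of 87,000 is an illustrative number, not a value taken from the run above.

```python
def accuracy_percent(correct, n_batches, batch_size, chunk_length):
    """Percentage of correctly predicted digits over all batches."""
    total_digits = n_batches * batch_size * chunk_length
    return 100.0 * correct / total_digits

# With batch_size=5 and chunk_length=20 there are 100 digits per batch,
# so the result coincides with correct / n_batches.
print(accuracy_percent(87_000, n_batches=1000, batch_size=5, chunk_length=20))  # 87.0
```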


Execute the model with a random sequence

In [13]:
# Generate the input sequence
my_input = torch.empty(1, 100, 1).random_(2).to(device)
print("Input sequence: ")
print(my_input.view(1, -1).byte())

Input sequence: 
tensor([[1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1,
         0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1,
         0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0,
         0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0,
         1, 1, 0, 0]], device='cuda:0', dtype=torch.uint8)


In [14]:
# Apply the model to the sequence
hidden = None
my_out, _ = model(my_input, hidden)

In [15]:
# Visualize the result
print("Echoing sequence:")
response = (my_out > 0).view(1, -1).to(int)
response

Echoing sequence:


tensor([[1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
         1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1,
         1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1,
         1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
         1, 0, 0, 1]], device='cuda:0')
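
The threshold `my_out > 0` used above is the same decision rule as `torch.sigmoid(logits) > 0.5` in the training loop, because the sigmoid maps 0 to exactly 0.5 and is increasing. A small sketch on random logits (stand-ins for the model output) shows this; the two rules can differ only through floating-point rounding for logits extremely close to zero.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1, 100, 1)

by_logit = logits > 0
by_probability = torch.sigmoid(logits) > 0.5

print(torch.equal(by_logit, by_probability))
```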

We check if the echoing was correct.

In [16]:
expected_result = my_input.detach().view(1, -1).to(int).cpu().numpy()
expected_result = np.append(expected_result[:,-3:], expected_result[:,0:-3]).astype(int)

obtained_result = (my_out > 0).view(1, -1).to(int).cpu().numpy()

print(f"Number of correctly echoed digits out of {len(expected_result[3:])}:")
(obtained_result[0][3:] == expected_result[3:]).sum().item()

Number of correctly echoed digits out of 97:


95

## Play with configs

A plain RNN is adequate for tasks that do not need much memory.

In the main example, the number of echo steps was small. **What happens when the RNN model has to learn to echo over more steps?**

* Define a new dataset with more echoing steps

In [17]:
batch_size = 5 
echo_step = 10 # We increase the number of echoing steps
series_length = 20_000 
BPTT_T = 20

train_data = EchoData(
    echo_step=echo_step,
    batch_size=batch_size,
    series_length=series_length,
    truncated_length=BPTT_T,
)
train_size = len(train_data)

test_data = EchoData(
    echo_step=echo_step,
    batch_size=batch_size,
    series_length=series_length,
    truncated_length=BPTT_T,
)
test_size = len(test_data)

* Re-use the same model configuration

In [18]:
feature_dim = 1 #since we have a scalar series
h_units = 4

model = SimpleRNN(
    input_size=1,
    rnn_hidden_size=h_units,
    output_size=feature_dim
).to(device)
hidden = None
        
# BCEWithLogitsLoss: This loss combines a Sigmoid layer and the BCELoss in one single class.
#  - https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
criterion = torch.nn.BCEWithLogitsLoss()

# RMSprop - https://pytorch.org/docs/stable/optim.html
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)

* Run the training (with some more epochs to help the model)

In [19]:
n_epochs = 20 # We increase the number of epochs
epoch = 0

while epoch < n_epochs:
    correct, loss, hidden = train(hidden)
    epoch += 1
    train_accuracy = float(correct) / train_size
    print(f'Train Epoch: {epoch}/{n_epochs}, loss: {loss:.3f}, accuracy {train_accuracy:.1f}%')

#test    
correct = test(hidden)
test_accuracy = float(correct) / test_size
print(f'Test accuracy: {test_accuracy:.1f}%')

Train Epoch: 1/20, loss: 0.693, accuracy 49.7%
Train Epoch: 2/20, loss: 0.693, accuracy 49.9%
Train Epoch: 3/20, loss: 0.694, accuracy 50.1%
Train Epoch: 4/20, loss: 0.692, accuracy 50.2%
Train Epoch: 5/20, loss: 0.693, accuracy 50.1%
Train Epoch: 6/20, loss: 0.694, accuracy 49.6%
Train Epoch: 7/20, loss: 0.693, accuracy 50.1%
Train Epoch: 8/20, loss: 0.693, accuracy 50.3%
Train Epoch: 9/20, loss: 0.693, accuracy 50.0%
Train Epoch: 10/20, loss: 0.694, accuracy 50.0%
Train Epoch: 11/20, loss: 0.693, accuracy 49.9%
Train Epoch: 12/20, loss: 0.694, accuracy 50.4%
Train Epoch: 13/20, loss: 0.693, accuracy 50.2%
Train Epoch: 14/20, loss: 0.693, accuracy 50.1%
Train Epoch: 15/20, loss: 0.693, accuracy 49.9%
Train Epoch: 16/20, loss: 0.693, accuracy 50.1%
Train Epoch: 17/20, loss: 0.693, accuracy 50.1%
Train Epoch: 18/20, loss: 0.693, accuracy 50.1%
Train Epoch: 19/20, loss: 0.693, accuracy 50.0%
Train Epoch: 20/20, loss: 0.693, accuracy 50.2%
Test accuracy: 49.9%


The RNN can't learn to echo the sequence: with a longer echo delay, it needs more memory.

**What happens when we increase the number of hidden units? Does it increase the RNN memory?**

In [20]:
feature_dim = 1 #since we have a scalar series
h_units = 60 ## We increase the number of units

model = SimpleRNN(
    input_size=1,
    rnn_hidden_size=h_units,
    output_size=feature_dim
).to(device)
hidden = None
        
# BCEWithLogitsLoss: This loss combines a Sigmoid layer and the BCELoss in one single class.
#  - https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
criterion = torch.nn.BCEWithLogitsLoss()

# RMSprop - https://pytorch.org/docs/stable/optim.html
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)

In [21]:
n_epochs = 20 # Same number of epochs as in the previous run
epoch = 0

while epoch < n_epochs:
    correct, loss, hidden = train(hidden)
    epoch += 1
    train_accuracy = float(correct) / train_size
    print(f'Train Epoch: {epoch}/{n_epochs}, loss: {loss:.3f}, accuracy {train_accuracy:.1f}%')

#test    
correct = test(hidden)
test_accuracy = float(correct) / test_size
print(f'Test accuracy: {test_accuracy:.1f}%')

Train Epoch: 1/20, loss: 0.693, accuracy 50.2%
Train Epoch: 2/20, loss: 0.693, accuracy 49.8%
Train Epoch: 3/20, loss: 0.694, accuracy 49.8%
Train Epoch: 4/20, loss: 0.693, accuracy 50.2%
Train Epoch: 5/20, loss: 0.692, accuracy 50.1%
Train Epoch: 6/20, loss: 0.694, accuracy 49.8%
Train Epoch: 7/20, loss: 0.693, accuracy 49.9%
Train Epoch: 8/20, loss: 0.694, accuracy 49.9%
Train Epoch: 9/20, loss: 0.693, accuracy 50.2%
Train Epoch: 10/20, loss: 0.693, accuracy 49.9%
Train Epoch: 11/20, loss: 0.693, accuracy 50.2%
Train Epoch: 12/20, loss: 0.694, accuracy 50.0%
Train Epoch: 13/20, loss: 0.693, accuracy 49.9%
Train Epoch: 14/20, loss: 0.693, accuracy 50.2%
Train Epoch: 15/20, loss: 0.693, accuracy 50.1%
Train Epoch: 16/20, loss: 0.693, accuracy 49.9%
Train Epoch: 17/20, loss: 0.692, accuracy 50.1%
Train Epoch: 18/20, loss: 0.693, accuracy 50.0%
Train Epoch: 19/20, loss: 0.694, accuracy 49.9%
Train Epoch: 20/20, loss: 0.693, accuracy 50.0%
Test accuracy: 49.8%


It seems that increasing the number of hidden units does not improve the long-term memory.

**What happens if we add more hidden layers? Does it increase the RNN memory?**

In [22]:
class LongerRNN(nn.Module):
    def __init__(self, input_size, rnn_hidden_size, output_size):
        super().__init__()
        self.rnn_hidden_size = rnn_hidden_size
        self.rnn = torch.nn.RNN(
            input_size=input_size,
            hidden_size=rnn_hidden_size,
            num_layers=6,
            nonlinearity='relu',
            batch_first=True
        )
        self.linear = torch.nn.Linear(
            in_features=rnn_hidden_size,
            out_features=output_size  # use the constructor argument instead of a hard-coded 1
        )

    def forward(self, x, hidden):
        x, hidden = self.rnn(x, hidden)  
        x = self.linear(x)
        return x, hidden
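
With `num_layers=6` the RNN keeps one hidden state per layer, while the output still contains only the hidden states of the top layer. The sketch below checks the resulting shapes on a random input of the same size as one chunk; it is an illustration of the `torch.nn.RNN` interface, not part of the experiment.

```python
import torch

rnn = torch.nn.RNN(input_size=1, hidden_size=60, num_layers=6,
                   nonlinearity='relu', batch_first=True)
x = torch.rand(5, 20, 1)   # (batch, T, feature)

output, hidden = rnn(x)
print(output.shape)   # torch.Size([5, 20, 60]): top-layer hidden state at every timestep
print(hidden.shape)   # torch.Size([6, 5, 60]): last hidden state of each of the 6 layers
```

Because the returned `hidden` already has the right shape, the same `train` and `test` functions can carry it between chunks without changes.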

In [23]:
feature_dim = 1 #since we have a scalar series
h_units = 60 ## We keep the increased number of units

model = LongerRNN(
    input_size=1,
    rnn_hidden_size=h_units,
    output_size=feature_dim
).to(device)
hidden = None
        
# BCEWithLogitsLoss: This loss combines a Sigmoid layer and the BCELoss in one single class.
#  - https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
criterion = torch.nn.BCEWithLogitsLoss()

# RMSprop - https://pytorch.org/docs/stable/optim.html
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)

In [24]:
n_epochs = 20 # Same number of epochs as in the previous run
epoch = 0

while epoch < n_epochs:
    correct, loss, hidden = train(hidden)
    epoch += 1
    train_accuracy = float(correct) / train_size
    print(f'Train Epoch: {epoch}/{n_epochs}, loss: {loss:.3f}, accuracy {train_accuracy:.1f}%')

#test    
correct = test(hidden)
test_accuracy = float(correct) / test_size
print(f'Test accuracy: {test_accuracy:.1f}%')

Train Epoch: 1/20, loss: 0.690, accuracy 49.9%
Train Epoch: 2/20, loss: 0.693, accuracy 50.2%
Train Epoch: 3/20, loss: 0.694, accuracy 49.9%
Train Epoch: 4/20, loss: 0.694, accuracy 50.1%
Train Epoch: 5/20, loss: 0.691, accuracy 50.1%
Train Epoch: 6/20, loss: 0.694, accuracy 49.9%
Train Epoch: 7/20, loss: 0.693, accuracy 49.9%
Train Epoch: 8/20, loss: 0.693, accuracy 50.0%
Train Epoch: 9/20, loss: 0.693, accuracy 50.1%
Train Epoch: 10/20, loss: 0.694, accuracy 49.8%
Train Epoch: 11/20, loss: 0.692, accuracy 49.9%
Train Epoch: 12/20, loss: 0.694, accuracy 50.0%
Train Epoch: 13/20, loss: 0.693, accuracy 49.9%
Train Epoch: 14/20, loss: 0.693, accuracy 50.1%
Train Epoch: 15/20, loss: 0.692, accuracy 50.2%
Train Epoch: 16/20, loss: 0.692, accuracy 49.9%
Train Epoch: 17/20, loss: 0.693, accuracy 50.1%
Train Epoch: 18/20, loss: 0.693, accuracy 50.0%
Train Epoch: 19/20, loss: 0.695, accuracy 50.0%
Train Epoch: 20/20, loss: 0.693, accuracy 50.0%
Test accuracy: 49.8%


That doesn't work either. **How can we improve the long-term memory of an RNN?**