# Retention Time Prediction

This notebook is meant to be run in Google [Colaboratory](https://colab.research.google.com/). To train the model faster, switch the Colab runtime to a hardware accelerator (GPU or TPU) under *Runtime > Change runtime type*.

This is an extension of the original walkthrough example available [here](https://github.com/wilhelm-lab/dlomix-resources/blob/main/notebooks/Example_RTModel_Walkthrough_colab.ipynb).

### Task 3: Model Architecture
As in the initial notebook, we will initialize our model and train it. The goal here is to experiment with different values for the model-architecture parameters and observe how they affect the performance of the trained model.

In [1]:
# install the dlomix package in the current environment using pip

!python -m pip install -q git+https://github.com/wilhelm-lab/dlomix



In [3]:
import numpy as np
import pandas as pd
import dlomix
from dlomix.models import RetentionTimePredictor
import tensorflow as tf
from dlomix.eval import TimeDeltaMetric
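
The training cell later passes `TimeDeltaMetric` from `dlomix.eval` to `model.compile`. As far as we understand, it follows the Δt95 idea common in retention-time work: the absolute error within which 95% of the predictions fall, so smaller is better. The pure-Python function `time_delta` below is a hypothetical, simplified sketch of that idea on plain lists. It is not the dlomix implementation, which works batch-wise on tensors and may handle normalization and the window definition differently.

```python
import math


def time_delta(y_true, y_pred, percentage=0.95):
    """Smallest absolute error that covers at least `percentage` of the points.

    Hypothetical, simplified sketch of a Δt95-style metric on plain lists;
    not the dlomix TimeDeltaMetric implementation.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if not 0 < percentage <= 1:
        raise ValueError("percentage must be in (0, 1]")
    # Absolute errors in ascending order.
    errors = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    # Number of points that must fall inside the window (at least one).
    count = max(1, math.ceil(len(errors) * percentage))
    return errors[count - 1]
```

With `percentage=0.95`, a value of 2.0 would mean that 95% of the predicted retention times lie within 2.0 units of the true ones.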

In [4]:
from dlomix.data import RetentionTimeDataset

TRAIN_DATAPATH = 'https://raw.githubusercontent.com/wilhelm-lab/dlomix/develop/example_dataset/proteomTools_train_val.csv'
BATCH_SIZE = 64

rtdata = RetentionTimeDataset(data_source=TRAIN_DATAPATH,
                              seq_length=30, batch_size=BATCH_SIZE, val_ratio=0.2, test=False)

The code below creates a model and trains it. Try out different architecture settings and observe their impact on training. Please refer to the initial notebook for how to analyze the results.

Hint: Change `embedding_dim` and `encoder` and explore how this changes the model performance. The parameter `embedding_dim` is the size of the vector representing each amino acid. Larger values give the embedding more representational capacity, at the cost of more trainable parameters. The `encoder` parameter determines the type of encoder that follows the embedding layer.
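
To make `embedding_dim` concrete, here is a minimal pure-Python sketch of an embedding lookup. It assumes a hypothetical vocabulary of the 20 standard amino acids with index 0 reserved for padding (dlomix defines its own alphabet). Each peptide is padded or truncated to 30 residues to match `seq_length=30`. Every residue is replaced by one row of length `embedding_dim`, so the output has shape (30, `embedding_dim`) and the number of embedding weights grows linearly with `embedding_dim`. In the real model these weights are learned during training; here they are just random.

```python
import random

# Hypothetical vocabulary: 20 standard amino acids, index 0 reserved for padding.
# This is an assumption for illustration; dlomix defines its own alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
SEQ_LENGTH = 30


def encode(peptide, seq_length=SEQ_LENGTH):
    """Map residues to integer indices, then pad with 0 (or truncate) to seq_length."""
    ids = [VOCAB[aa] for aa in peptide[:seq_length]]
    return ids + [0] * (seq_length - len(ids))


def make_embedding_table(vocab_size, embedding_dim, seed=0):
    """One row of length embedding_dim per token index (randomly initialized here)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
            for _ in range(vocab_size)]


def embed(peptide, table):
    """Look up one vector per position; result has shape (seq_length, embedding_dim)."""
    return [table[i] for i in encode(peptide)]


for dim in (10, 50, 100):
    table = make_embedding_table(len(VOCAB) + 1, dim)  # +1 row for padding
    x = embed("LGGNEQVTR", table)
    n_weights = len(table) * dim
    print(f"embedding_dim={dim}: output {len(x)} x {len(x[0])}, {n_weights} embedding weights")

# embedding_dim=10: output 30 x 10, 210 embedding weights
# embedding_dim=50: output 30 x 50, 1050 embedding weights
# embedding_dim=100: output 30 x 100, 2100 embedding weights
```

The `encoder` then processes this (30, `embedding_dim`) matrix, so a larger `embedding_dim` also widens the encoder's input.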

Change one thing at a time to see how it affects the model. Possible values are:
- `embedding_dim`: use values from the range $10$ to $100$
- `encoder`: use one of the values `'conv1d'` or `'lstm'` 
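
The "one thing at a time" advice can be turned into a small list of configurations. The sketch below assumes the training cell's current settings (`DIM = 10`, `ENCODER = "conv1d"`) as the baseline. It uses 10, 25, 50 and 100 as sample points from the suggested range; the intermediate values are arbitrary choices, not recommendations. `one_factor_at_a_time` is a hypothetical helper, not part of dlomix.

```python
# Baseline matching the training cell; the sample values are illustrative assumptions.
BASELINE = {"embedding_dim": 10, "encoder": "conv1d"}
SEARCH_SPACE = {
    "embedding_dim": [10, 25, 50, 100],
    "encoder": ["conv1d", "lstm"],
}


def one_factor_at_a_time(baseline, search_space):
    """Return the baseline plus every config that changes exactly one parameter."""
    configs = [dict(baseline)]
    for name, values in search_space.items():
        for value in values:
            if value != baseline[name]:
                config = dict(baseline)
                config[name] = value
                configs.append(config)
    return configs


for config in one_factor_at_a_time(BASELINE, SEARCH_SPACE):
    print(config)
```

The keys match the keyword arguments used in the training cell, so each config could be passed as `RetentionTimePredictor(seq_length=30, **config)`.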

In [None]:
DIM = 10
ENCODER = "conv1d"

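# create the model; embedding_dim and encoder are the architecture parameters to vary
model = RetentionTimePredictor(seq_length=30, embedding_dim=DIM, encoder=ENCODER)

# mean squared error as the training loss (kept fixed while the architecture changes)
loss_function = "mse"

# newer Keras optimizers (TF 2.11+) reject `decay`; InverseTimeDecay with decay_steps=1
# reproduces the legacy schedule lr = 1e-4 / (1 + 1e-7 * step)
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=1e-4, decay_steps=1, decay_rate=1e-7)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)

# compile the model with the optimizer and the metrics we want to use; we can add our custom time-delta metric
model.compile(optimizer=optimizer,
              loss=loss_function, metrics=[TimeDeltaMetric()])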

history = model.fit(rtdata.train_data, validation_data=rtdata.val_data, epochs=15)
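
Once you have trained several configurations, comparing runs by eye gets tedious. In Keras, `model.fit` returns a `History` object. Its `history` attribute is a dict that maps each loss or metric name (validation entries prefixed with `val_`) to a list of per-epoch values. The helpers below, `summarize_history` and `compare_runs` (hypothetical names of our own), pick the best epoch of each run for a monitored key and rank the runs by it. They default to `val_loss`. The exact key for the time-delta metric depends on the metric's name, so check `history.history.keys()` first.

```python
def summarize_history(history_dict, monitor="val_loss"):
    """Return (best_epoch, best_value) for the monitored key; epochs are 1-based."""
    values = history_dict[monitor]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


def compare_runs(runs, monitor="val_loss"):
    """Rank runs (label -> history dict) by their best monitored value, lowest first."""
    rows = []
    for label, history_dict in runs.items():
        epoch, value = summarize_history(history_dict, monitor)
        rows.append((value, label, epoch))
    rows.sort()
    return [(label, epoch, value) for value, label, epoch in rows]


# Usage in this notebook (collect after each model.fit call):
# runs = {}
# runs[f"{ENCODER}-dim{DIM}"] = history.history
# for label, epoch, value in compare_runs(runs):
#     print(f"{label}: best val_loss {value:.4f} at epoch {epoch}")
```

Ranking by the best epoch rather than the last one avoids penalizing a run that starts to overfit near the end of its 15 epochs.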