# Fragment Ion Intensity Prediction

This notebook is designed to run in Google [Colaboratory](https://colab.research.google.com/). To train the model faster, change the Colab runtime to use a hardware accelerator, either GPU or TPU.

This is an extension of the original walkthrough example available [here](https://github.com/wilhelm-lab/dlomix-resources/tree/tasks/intensity/notebooks/Intensity/Example_IntensityModel_Walkthrough_colab.ipynb).

### Task 2: Loss Function
Similar to the initial notebook, we will initialize our model and train it. The goal here is to experiment with different loss functions and observe the performance of the trained model. The loss function is our optimization objective: it quantifies how well or poorly the model is performing during training, and the optimizer uses it to find a set of parameters that performs better on the task at hand.
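To make the idea of a loss function concrete, here is a minimal NumPy sketch of the two distances used later in this notebook. It is not the dlomix implementation. It assumes that entries equal to `-1` in the target mark fragment ions that cannot occur and should be ignored, that the spectral distance is the normalized spectral angle, and that the Pearson distance is `1 - r`. The names `MASK_VALUE`, `spectral_distance` and `pearson_distance` are made up for this illustration, and the library versions may differ in detail.

```python
import numpy as np

MASK_VALUE = -1.0  # assumption: marks fragment ions that cannot occur


def spectral_distance(y_true, y_pred, eps=1e-7):
    """Normalized spectral angle in [0, 1]; 0 means identical direction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    valid = y_true != MASK_VALUE
    # zero out masked positions in both vectors so they do not contribute
    t = np.where(valid, y_true, 0.0)
    p = np.where(valid, y_pred, 0.0)
    # L2-normalize each spectrum, guarding against division by zero
    t = t / np.maximum(np.linalg.norm(t, axis=-1, keepdims=True), eps)
    p = p / np.maximum(np.linalg.norm(p, axis=-1, keepdims=True), eps)
    cosine = np.clip(np.sum(t * p, axis=-1), -1.0, 1.0)
    return 2.0 * np.arccos(cosine) / np.pi


def pearson_distance(y_true, y_pred, eps=1e-7):
    """1 - Pearson correlation, computed over the unmasked positions only."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    valid = y_true != MASK_VALUE
    n_valid = np.maximum(valid.sum(axis=-1, keepdims=True), 1)
    t_mean = np.where(valid, y_true, 0.0).sum(axis=-1, keepdims=True) / n_valid
    p_mean = np.where(valid, y_pred, 0.0).sum(axis=-1, keepdims=True) / n_valid
    # center the valid entries; masked entries stay at zero
    t = np.where(valid, y_true - t_mean, 0.0)
    p = np.where(valid, y_pred - p_mean, 0.0)
    denom = np.maximum(np.linalg.norm(t, axis=-1) * np.linalg.norm(p, axis=-1), eps)
    return 1.0 - np.sum(t * p, axis=-1) / denom


# two toy spectra: the first is orthogonal after masking, the second is a perfect match
y_true = np.array([[1.0, 0.0, -1.0], [0.2, 0.8, 0.4]])
y_pred = np.array([[0.0, 1.0, 0.7], [0.2, 0.8, 0.4]])
print(spectral_distance(y_true, y_pred))
print(pearson_distance(y_true, y_pred))
```

Both functions return one value per spectrum, and lower is better. A perfect prediction gives a distance close to 0 under either definition, while the two distances penalize imperfect predictions differently, which is why swapping the loss can change what the model learns.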

In [1]:
# install the dlomix package in the current environment using pip

!python -m pip install -q git+https://github.com/wilhelm-lab/dlomix.git@feature/intensity_tutorial



In [None]:
!python -m pip install wandb

In [1]:
import numpy as np
import pandas as pd
import dlomix
from dlomix.models import PrositIntensityPredictor
import tensorflow as tf
from dlomix.losses import masked_spectral_distance, masked_pearson_correlation_distance
tf.get_logger().setLevel('ERROR')

import wandb
from wandb.keras import WandbCallback

In [2]:
# enter project name for weights and biases
project_name = 'dlomix_intensity'

In [4]:
from dlomix.data import IntensityDataset

TRAIN_DATAPATH = 'https://raw.githubusercontent.com/wilhelm-lab/dlomix-resources/main/example_datasets/Intensity/proteomeTools_train_val.csv'
BATCH_SIZE = 64

int_data = IntensityDataset(
    data_source=TRAIN_DATAPATH,
    seq_length=30,
    collision_energy_col='collision_energy',
    batch_size=BATCH_SIZE,
    val_ratio=0.2,
    test=False,
)

The code below creates a model and trains it. You should try out different loss functions and observe the impact on the training. Please refer to the initial notebook to analyze the results.

Hint: Change the `embedding_output_dim` size and the recurrent layer sizes, and explore how this changes the model performance. The parameter `embedding_output_dim` is the size of the vector representing each amino acid; the higher it is, the more representational power it has. The recurrent layer sizes are the numbers of units in the two GRU layers of the model encoder; the higher they are, the more parameters the model has, which can help with detecting complex patterns but can also lead to overfitting.

Change one thing at a time to see how it affects the model. Possible values are:

- `embedding_output_dim`: use values from the range $1$ to $25$
- `recurrent_layers_sizes`: use powers of two from $32$ to $512$

The values currently set in the code are the defaults.
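One way to keep such experiments organized is to generate the configurations up front, varying a single parameter per run. The sketch below is only a suggestion. The helper names `one_at_a_time_configs` and `run_name` are hypothetical, the list of embedding sizes is an arbitrary subset of the suggested range, and the defaults are copied from the training cell in this notebook.

```python
# defaults copied from the training cell of this notebook
DEFAULTS = {"embedding_output_dim": 16, "recurrent_layers_sizes": (256, 512)}

EMBEDDING_DIMS = [1, 4, 8, 16, 25]                # arbitrary subset of the range 1..25
RECURRENT_SIZES = [2 ** k for k in range(5, 10)]  # 32, 64, 128, 256, 512


def one_at_a_time_configs():
    """Return configurations that differ from DEFAULTS in exactly one value."""
    candidates = []
    for dim in EMBEDDING_DIMS:
        candidates.append({**DEFAULTS, "embedding_output_dim": dim})
    first_default, second_default = DEFAULTS["recurrent_layers_sizes"]
    for size in RECURRENT_SIZES:
        # vary the first GRU layer, then the second, keeping the other at its default
        candidates.append({**DEFAULTS, "recurrent_layers_sizes": (size, second_default)})
        candidates.append({**DEFAULTS, "recurrent_layers_sizes": (first_default, size)})
    configs = []
    for config in candidates:
        # skip the default configuration itself and any repeated entries
        if config != DEFAULTS and config not in configs:
            configs.append(config)
    return configs


def run_name(config):
    """Build a distinct Weights & Biases run name for a configuration."""
    first, second = config["recurrent_layers_sizes"]
    return f"model_arc_emb{config['embedding_output_dim']}_rnn{first}-{second}"


configs = one_at_a_time_configs()
for config in configs:
    print(run_name(config), config)
```

Each dictionary can be unpacked into the model constructor, for example `PrositIntensityPredictor(seq_length=30, **config)`, and the matching `run_name(config)` passed to `wandb.init` so that every run is distinguishable in the dashboard.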

In [5]:
# Enter weights and biases run name. Make sure that different models have different run names.
wandb.init(project=project_name, name='model_arc_')

# create model
model = PrositIntensityPredictor(
    seq_length=30,
    embedding_output_dim=16,
    recurrent_layers_sizes=(256, 512),
)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# compile the model with the optimizer, the loss to minimize, and the metrics to report
model.compile(
    optimizer=optimizer,
    loss=masked_spectral_distance,
    metrics=[masked_pearson_correlation_distance],
)

# train the model and log the loss and metrics of each epoch to Weights & Biases
history = model.fit(
    int_data.train_data,
    validation_data=int_data.val_data,
    epochs=30,
    callbacks=[WandbCallback(save_model=False)],
)

# Mark the run as finished
wandb.finish()

wandb: Currently logged in as: wassimg (prosit-compms). Use `wandb login --relogin` to force relogin

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15



Run history:

| Metric | Trend over epochs |
| --- | --- |
| `epoch` | ▁▁▂▃▃▃▄▅▅▅▆▇▇▇█ |
| `loss` | █▆▆▅▄▃▃▃▂▂▂▂▁▁▁ |
| `masked_pearson_correlation_distance` | █▅▆▄▂▂▂▂▂▂▁▁▁▁▁ |
| `val_loss` | ██▇▅▅▄▄▃▃▃▂▂▂▁▁ |
| `val_masked_pearson_correlation_distance` | ██▇▄▃▃▂▂▂▂▁▂▂▁▁ |

Run summary:

| Metric | Value |
| --- | --- |
| `best_epoch` | 14.0 |
| `best_val_loss` | 0.45027 |
| `epoch` | 14.0 |
| `loss` | 0.4701 |
| `masked_pearson_correlation_distance` | 0.377 |
| `val_loss` | 0.45027 |
| `val_masked_pearson_correlation_distance` | 0.34612 |
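
Values such as `best_epoch` and `best_val_loss` can also be recovered locally from the `history.history` dictionary that Keras returns from `model.fit`, which maps each metric name to a list with one value per epoch. The following is a small sketch of that lookup. The helper `best_epoch` is hypothetical, and the numbers in `example_history` are placeholders for illustration only, not results from this notebook.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return the 0-based epoch index with the lowest value of `metric`, and that value."""
    values = history_dict[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index, values[index]


# placeholder values for illustration only; replace with history.history after training
example_history = {
    "loss": [0.90, 0.70, 0.60, 0.55],
    "val_loss": [0.85, 0.72, 0.58, 0.61],
}

epoch, value = best_epoch(example_history)
print(f"best epoch (0-based): {epoch}, val_loss: {value}")
```

With the trained model, calling `best_epoch(history.history)` would apply the same lookup to the real training curves. Comparing this value across runs with different loss functions only makes sense when the compared quantity is the same metric in every run.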
