# Making Predictions with _ExpressYeaself_

### Introduction

* This interactive notebook takes you through the process of loading a pre-existing model. This can either be one that you have trained yourself, or one of _ExpressYeaself_'s saved models.
* All you need is:
    1. **If you have trained your own model**, the absolute path of the file containing your saved model weights.
    1. The absolute path of the file containing the sequences whose expression you would like to predict.
* Run the relevant cells as instructed.
* At any time you can restart the process by selecting 'Kernel' > 'Restart & Clear Output' in the toolbar of the notebook.

### Import relevant packages

In [None]:
import context
import os

ROOT_DIR = os.getcwd()[:os.getcwd().rfind('Express')] + 'ExpressYeaself/'
construct = context.construct_neural_net

## Choose which model you want to load

* Here you can choose whether to use one of _ExpressYeaself_'s pre-trained models, or use one of your own.
* Only run cells below option a) **or** below option b). Do not run both.

#### EITHER a) Use one of _ExpressYeaself_'s models ...

* Choose from: ``'1d_cnn_classifier'``, ``'1d_cnn_sequential'``, ``'1d_cnn_parallel'``, ``'1d_loccon_classifier'``. 
* Input your choice on the first line: ``model_to_use = ``.
* Do not edit the second line.

In [None]:
model_to_use = 
saved_model = construct.get_saved_model_path(model_to_use)
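
* A misspelled model name is easy to catch before the path lookup. The sketch below is not part of _ExpressYeaself_: ``PRETRAINED_MODELS`` and ``check_model_name`` are hypothetical names, and the tuple simply repeats the four choices listed above.

```python
# The four model names listed in this notebook.
PRETRAINED_MODELS = (
    '1d_cnn_classifier',
    '1d_cnn_sequential',
    '1d_cnn_parallel',
    '1d_loccon_classifier',
)


def check_model_name(name):
    """Hypothetical helper: return `name` if it is a listed model, else raise."""
    if name not in PRETRAINED_MODELS:
        raise ValueError(
            'Unknown model {!r}; choose from: {}'.format(
                name, ', '.join(PRETRAINED_MODELS)))
    return name
```

* For example, ``check_model_name('1d_cnn_parallel')`` returns the name unchanged, while any other string raises a ``ValueError`` that lists the valid choices.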

#### OR b) Use your own saved model

* Here you need to specify the absolute path of the file containing your saved model (a .hdf5 file).

In [None]:
saved_model = 

* The next cell will check that the file you have specified exists.
* If an error is thrown, check your file path and run the cells again.

In [None]:
assert os.path.exists(saved_model), 'Input file for saved model (.hdf5 extension) does not exist.'
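
* If the assertion fails, it does not say which requirement was missed. A more detailed check might look like the sketch below. ``check_saved_model_path`` is a hypothetical helper, not an _ExpressYeaself_ function, and it assumes that your saved model file name ends in ``.hdf5``.

```python
import os


def check_saved_model_path(path):
    """Hypothetical helper: raise a descriptive error for an unusable path."""
    if not os.path.isabs(path):
        raise ValueError('Expected an absolute path, got: ' + path)
    if not path.endswith('.hdf5'):
        raise ValueError('Expected a .hdf5 file, got: ' + path)
    if not os.path.isfile(path):
        raise FileNotFoundError('No such file: ' + path)
    return path
```

* Each failure mode raises its own message, so a typo in the path can be told apart from a wrong extension.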

# Make predictions!

* Now you are ready to predict the contribution that sequences make to the expression of genes in yeast.
* Please define the absolute path of the file containing the sequences that you would like predictions for in the first line of the cell below, after ``input_sequences = ``.
* This input file must contain **sequences only**, with each sequence on a separate line.
* Each sequence must be the same length as the sequences that the model was trained on.
* If you are using _ExpressYeaself_'s pretrained models, sequences **must be 80 nucleotides in length**.

In [None]:
input_sequences = 
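
* The format rules above (one bare sequence per line, all of the expected length) can be checked before predicting. The following is a minimal sketch and not part of _ExpressYeaself_: ``find_invalid_lines`` is a hypothetical helper, and treating "sequences only" as "letters only" is an assumption.

```python
def find_invalid_lines(path, expected_length=80):
    """Hypothetical helper: return (line_number, reason) for each bad line."""
    problems = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            sequence = line.rstrip('\r\n')
            if not sequence.isalpha():
                # Catches empty lines, tabs, commas and numeric columns.
                problems.append((line_number, 'not a bare sequence'))
            elif len(sequence) != expected_length:
                problems.append(
                    (line_number, 'length is {}, expected {}'.format(
                        len(sequence), expected_length)))
    return problems
```

* An empty list means every line passed. If your own model was trained on sequences of a different length, pass that length as ``expected_length``.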

* The results of the prediction will be returned in the form of a data frame. If you would like this data frame to be sorted, so that the sequences with the highest predicted expression levels are at the top, set ``sort = True`` in the cell below. If you do so, the original position of each sequence in the input file will be indicated in the column 'index' of the resulting data frame.

In [None]:
sort = 
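
* The idea behind the sorted output can be shown in plain Python. This is only an illustration and not how _ExpressYeaself_ builds its data frame: ``sort_by_prediction`` is a hypothetical helper, and the three prediction values are made up for the example.

```python
def sort_by_prediction(predictions):
    """Pair each value with its original position, then sort highest first."""
    indexed = list(enumerate(predictions))
    return sorted(indexed, key=lambda pair: pair[1], reverse=True)


# Made-up predictions for three sequences, in input-file order.
rows = sort_by_prediction([0.2, 0.9, 0.5])
# rows == [(1, 0.9), (2, 0.5), (0, 0.2)]
```

* The first element of each pair plays the role of the 'index' column: it records where the sequence sat in the input file before sorting.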

* If you would like the results to be written to an output file, specify ``write_to_file = True``. 
* Otherwise, set to ``False``. 
* If ``write_to_file`` is ``True``, the absolute path of the output file will be printed in this notebook. The long number in the file name is a unique time stamp, so that you can keep track of prediction outputs created at different times without overwriting existing files.

In [None]:
write_to_file = 
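
* A time stamp in a file name can be produced as in the sketch below. This only illustrates the general technique: the helper ``timestamped_path``, the ``predictions`` prefix and the ``.txt`` extension are assumptions, and the naming scheme that _ExpressYeaself_ actually uses may differ.

```python
import os
import time


def timestamped_path(directory, prefix='predictions', extension='.txt'):
    """Hypothetical helper: build an output path that embeds the current time."""
    stamp = int(time.time() * 1000)  # milliseconds since the Unix epoch
    return os.path.join(directory, '{}_{}{}'.format(prefix, stamp, extension))
```

* Two calls within the same millisecond would produce the same name, so this approach avoids overwriting only for runs that are separated in time.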

* Now just run the next cell to get the prediction results!

In [None]:
construct.get_predictions_for_input_file(input_sequences, model_to_use, sort_df=sort, write_to_file=write_to_file)