# Demo notebook for preparing input fasta for APPRAISE

Author: Xiaozhe Ding

Email: xding@caltech.edu, dingxiaozhe@gmail.com

## Introduction

This demo notebook shows a few examples for preparing input fasta files for ColabFold prediction.

## Environment preparation

#### Check the environment

APPRAISE 1.2 was tested with the following environment. We suggest using versions equal to or higher than these for best compatibility:

For input file preparation and data analysis:

 - MacOS 10.14.6

 - Python 3.6.10

 - PyMOL 2.3.3 (Schrodinger LLC.)

 - Python packages: 
 
    - scipy 1.4.1

    - numpy 1.18.2

    - pandas 1.1.5

    - matplotlib 3.2.1

    - seaborn 0.11.2

For structural modeling:

- alphafold-colabfold 2.1.14 (accessed through Google Colaboratory; notebook available [here](https://github.com/sokrypton/ColabFold))
    - AlphaFold model version: AlphaFold-multimer-v2
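
To compare a local installation against the package versions listed above, a small check like the following can be run first. This is a minimal sketch and not part of APPRAISE: the helper names `version_tuple` and `check_environment` are made up for illustration, and the comparison only looks at the leading numeric part of each version component.

```python
import importlib
import re

# Versions from the tested environment listed above.
TESTED_VERSIONS = {
    "scipy": "1.4.1",
    "numpy": "1.18.2",
    "pandas": "1.1.5",
    "matplotlib": "3.2.1",
    "seaborn": "0.11.2",
}


def version_tuple(version):
    """Turn '1.18.2' (or '1.18.2rc1') into (1, 18, 2) for comparison."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(tested=TESTED_VERSIONS):
    """Return {package: status} without raising when a package is missing."""
    report = {}
    for name, minimum in tested.items():
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any import failure is reported rather than raised.
            report[name] = "not importable"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "unknown version"
        elif version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok ({})".format(installed)
        else:
            report[name] = "older than tested ({} < {})".format(installed, minimum)
    return report


for package, status in check_environment().items():
    print("{}: {}".format(package, status))
```

PyMOL and the ColabFold side of the setup are not covered by this check.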


#### APPRAISE package

If you haven't installed the appraise package yet, run the following cell to install it. ***You'll need to restart the kernel after installation***.

Skip this cell if the package has already been installed.

In [None]:
!pip install -e ..

### Example 1 - Prepare input fasta files for pairwise competition in ColabFold

In [21]:
# Import necessary modules
import appraise
from appraise.utilities import *
from appraise.input_fasta_prep import *

The peptide sequences should be provided in a .csv table with two columns titled "peptide_name" and "peptide_seq", respectively.

You can find example peptide lists in the folder ./data/manuscript_example_sequences.
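
As an illustration of the expected table layout, the sketch below writes a three-row .csv with the two required column titles and reads it back. The peptide names and sequences are hypothetical placeholders invented for this example, and the file is written to a temporary folder so nothing in the repository is touched.

```python
import csv
import os
import tempfile

# Hypothetical peptides, invented for illustration only.
example_peptides = [
    ("pep_A", "ACDEFGHIK"),
    ("pep_B", "LMNPQRSTV"),
    ("pep_C", "WYACDEFGH"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example_peptide_list.csv")

    # Write the two-column table with the required header row.
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["peptide_name", "peptide_seq"])
        writer.writerows(example_peptides)

    # Read it back to confirm the header and the number of rows.
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames
        loaded = list(reader)

print(header)
print(len(loaded), "peptides")
```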

#### Generate input fasta for pairwise matrix

In [None]:
csv_file_path_default = '../demo_100AAV_screening/stage_1/APPRAISE1.2_selected_top_18_peptides.csv'

csv_file_path = interactive_input('csv_file_path',csv_file_path_default) #@param {type:"string"}

folder_path_for_fastas = interactive_input('folder_path_for_fastas', './demo_stage_2_input_fasta/')#@param {type:"string"}

receptor_name = interactive_input('receptor_name', 'Ly6a') #@param {type:"string"}

receptor_seq = interactive_input('receptor_seq', "LECYQCYGVPFETSCPSITCPYPDGVCVTQEAAVIVDSQTRKVKNNLCLPICPPNIESMEILGTKVNVKTSCCQEDLCNVAVP") #@param {type:"string"}

peptide_names, peptide_seqs = load_peptides(csv_file_path)

list_query_sequence, list_jobname = get_complex_fastas(
    receptor_name,
    receptor_seq,
    peptide_names,
    peptide_seqs,
    mode='pairwise',
    square_matrix=True,
    folder_path=folder_path_for_fastas,
)
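
To make the idea of a pairwise competition concrete, here is a rough sketch of how such queries could be enumerated. It is not the implementation of `get_complex_fastas`. The function name `sketch_pairwise_queries`, the job-name format, the chain order, and the reading of a "square matrix" as all ordered pairs of distinct peptides are assumptions made for illustration. The only outside fact relied on is that ColabFold accepts multi-chain queries with chains separated by `:`.

```python
from itertools import combinations, permutations


def sketch_pairwise_queries(receptor_name, receptor_seq, names, seqs, ordered=True):
    """Enumerate peptide pairs and build one multi-chain query per pair.

    ordered=True  -> both (A, B) and (B, A) are produced (assumed "square" layout)
    ordered=False -> each unordered pair is produced once
    """
    pair_iter = permutations if ordered else combinations
    queries = []
    for (name_a, seq_a), (name_b, seq_b) in pair_iter(list(zip(names, seqs)), 2):
        # Hypothetical job-name format, chosen only for readability.
        jobname = "{}_and_{}_vs_{}".format(receptor_name, name_a, name_b)
        # Chains are joined with ':' as in ColabFold multimer input.
        query = ":".join([receptor_seq, seq_a, seq_b])
        queries.append((jobname, query))
    return queries


# Placeholder receptor and peptides, invented for illustration only.
demo = sketch_pairwise_queries(
    "receptorX", "MKTAYIAK",
    ["pep_A", "pep_B", "pep_C"],
    ["ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH"],
)
for jobname, query in demo:
    print(jobname, query)
```

With N peptides this gives N × (N − 1) queries when `ordered=True` and half as many otherwise, which is worth keeping in mind when budgeting ColabFold runs.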

### Example 2 - Prepare input fasta files for pooled competition in ColabFold (for HT-APPRAISE stage 1)

In [1]:
# Import necessary modules
import appraise
from appraise.utilities import *
from appraise.input_fasta_prep import *




The peptide sequences should be provided in a .csv table with two columns titled "peptide_name" and "peptide_seq", respectively.

You can find example peptide lists in the folder ./data/manuscript_example_sequences.

Generate pooled fastas with random grouping 1 (4 variants per group):

In [None]:
csv_file_path = interactive_input('csv_file_path','../demo_100AAV_screening/stage_1/AAV_mock_selection_100_peptide_list.csv')

folder_path_for_fastas = interactive_input('folder_path_for_fastas', './demo_stage_1_grouping_1_input_fasta_for_colabfold/')

receptor_name = interactive_input('receptor_name', 'Ly6a')

receptor_seq = interactive_input('receptor_seq', "LECYQCYGVPFETSCPSITCPYPDGVCVTQEAAVIVDSQTRKVKNNLCLPICPPNIESMEILGTKVNVKTSCCQEDLCNVAVP")

pool_size = interactive_input('pool_size', 4)

peptide_names, peptide_seqs = load_peptides(csv_file_path)

list_query_sequence, list_jobname = get_complex_fastas(
    receptor_name,
    receptor_seq,
    peptide_names,
    peptide_seqs,
    mode='pooled',
    pool_size=pool_size,
    folder_path=folder_path_for_fastas,
)

Generate pooled fastas with random grouping 2 (4 variants per group):

In [None]:
csv_file_path = interactive_input('csv_file_path','../demo_100AAV_screening/stage_1/AAV_mock_selection_100_peptide_list.csv')

folder_path_for_fastas = interactive_input('folder_path_for_fastas', './demo_stage_1_grouping_2_input_fasta_for_colabfold/')

receptor_name = interactive_input('receptor_name', 'Ly6a')

receptor_seq = interactive_input('receptor_seq', "LECYQCYGVPFETSCPSITCPYPDGVCVTQEAAVIVDSQTRKVKNNLCLPICPPNIESMEILGTKVNVKTSCCQEDLCNVAVP")

pool_size = interactive_input('pool_size', 4)

peptide_names, peptide_seqs = load_peptides(csv_file_path)

list_query_sequence, list_jobname = get_complex_fastas(
    receptor_name,
    receptor_seq,
    peptide_names,
    peptide_seqs,
    mode='pooled',
    pool_size=pool_size,
    folder_path=folder_path_for_fastas,
)
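
The two cells above produce two independent random groupings of the same peptide list. As a rough picture of what random grouping into fixed-size pools means, the sketch below shuffles a list of names and cuts it into groups of `pool_size`. It is an assumption-laden illustration, not APPRAISE's own grouping code: the function `sketch_random_pools` and its `seed` argument are hypothetical, and the peptide names are placeholders.

```python
import random


def sketch_random_pools(names, pool_size, seed):
    """Shuffle the names with a seeded RNG and split them into pools."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    # The last pool is smaller when the list does not divide evenly.
    return [shuffled[i:i + pool_size] for i in range(0, len(shuffled), pool_size)]


# Ten placeholder names, invented for illustration only.
names = ["pep_{:03d}".format(i) for i in range(1, 11)]

grouping_1 = sketch_random_pools(names, pool_size=4, seed=1)
grouping_2 = sketch_random_pools(names, pool_size=4, seed=2)

for label, grouping in (("grouping 1", grouping_1), ("grouping 2", grouping_2)):
    print(label)
    for pool in grouping:
        print("  ", pool)
```

With a 100-peptide list and `pool_size = 4`, a split like this would give 25 pools per grouping. Using a different seed for each grouping presumably lets every peptide compete against a different set of rivals in each round.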

### Example 3 - Prepare input fasta files for single peptide-receptor complex modeling in ColabFold (for comparison with APPRAISE)

In [1]:
# Import necessary modules
import appraise
from appraise.utilities import *
from appraise.input_fasta_prep import *

In [None]:
csv_file_path = interactive_input('csv_file_path','../demo_100AAV_screening/stage_1/AAV_mock_selection_100_peptide_list.csv')

folder_path_for_fastas = interactive_input('folder_path_for_fastas', './single_peptide_receptor_modeling/')

receptor_name = interactive_input('receptor_name', 'Ly6a')

receptor_seq = interactive_input('receptor_seq', "LECYQCYGVPFETSCPSITCPYPDGVCVTQEAAVIVDSQTRKVKNNLCLPICPPNIESMEILGTKVNVKTSCCQEDLCNVAVP")

peptide_names, peptide_seqs = load_peptides(csv_file_path)

list_query_sequence, list_jobname = get_complex_fastas(
    receptor_name,
    receptor_seq,
    peptide_names,
    peptide_seqs,
    mode='single',
    folder_path=folder_path_for_fastas,
)