# Run a test of hypedsearch with generated data
The following steps describe how the test works:
1. Load a fasta database
2. Generate
    1. Hybrid proteins
    2. Peptides
    3. Hybrid peptides from the hybrid proteins
3. Generate spectra for all the peptides created
4. Run hypedsearch with the .fasta file (no hybrid proteins included) and the spectra files
5. Load the summary.json file that hypedsearch creates
6. Determine how many alignments were correct

## 1. Load fasta database

In [1]:
import os
import sys
# make the parent directory and its parent importable from this notebook
for rel_path in ('..', '../..'):
    module_path = os.path.abspath(rel_path)
    if module_path not in sys.path:
        sys.path.append(module_path)

from src.file_io import fasta

fasta_file = '../data/databases/4prots.fasta'
database = fasta.read(fasta_file)
# index the database entries by protein name
database = {x['name']: x for x in database}

## 2. Generate the hybrid proteins, peptides and hybrid peptides

In [2]:
from sequence_generation import proteins, peptides

num_hybs = 5
min_length = 5
max_length = 35
num_peptides = 50

# make hybrid proteins
hyb_prots = proteins.generate_hybrids(list(database.values()), num_hybs, min_contribution=max_length)
# create non-hybrid peptides
non_hybrid_peps = peptides.gen_peptides(list(database.values()), num_peptides, min_length=min_length, max_length=max_length, digest='random', dist='beta')
# create hybrid peptides
hyb_peps = peptides.gen_peptides(hyb_prots, num_hybs, min_length=min_length, max_length=max_length, digest='random', hybrid_list=True)

all_proteins_raw = list(database.values()) + hyb_prots
all_peptides_raw = non_hybrid_peps + hyb_peps


Generating hybrid protein 0/5[0%]
Generating hybrid protein 1/5[20%]
Generating hybrid protein 2/5[40%]
Generating hybrid protein 3/5[60%]
Generating hybrid protein 4/5[80%]
Finished generating hybrid proteins
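
For intuition about what "hybrid" means in this test, here is a minimal sketch. It assumes a hybrid protein is a stretch of one parent protein followed by a stretch of another, and that a hybrid peptide is a subsequence spanning the junction between them. It does not reproduce `proteins.generate_hybrids` or `peptides.gen_peptides`; the function names, dictionary keys and example sequences below are hypothetical.

```python
import random


def make_hybrid_protein(left_seq, right_seq, min_contribution, rng):
    """Join a prefix of left_seq to a suffix of right_seq (illustrative only).

    Each parent contributes at least min_contribution residues,
    capped at the length of that parent.
    """
    left_len = rng.randint(min(min_contribution, len(left_seq)), len(left_seq))
    right_len = rng.randint(min(min_contribution, len(right_seq)), len(right_seq))
    left_part = left_seq[:left_len]
    right_part = right_seq[len(right_seq) - right_len:]
    return {
        'sequence': left_part + right_part,
        'left': left_part,
        'right': right_part,
        'junction': len(left_part),  # index of the first residue from the right parent
    }


def hybrid_peptide(hybrid, left_len, right_len):
    """Cut a peptide that takes left_len residues before the junction
    and right_len residues after it."""
    j = hybrid['junction']
    if not (1 <= left_len <= j and 1 <= right_len <= len(hybrid['sequence']) - j):
        raise ValueError('peptide must take residues from both sides of the junction')
    return hybrid['sequence'][j - left_len:j + right_len]


# hypothetical parent sequences, seeded so the example is repeatable
example_rng = random.Random(0)
example_hybrid = make_hybrid_protein('MALWMRLLPL', 'GIVEQCCTSI', 3, example_rng)
example_hybrid_pep = hybrid_peptide(example_hybrid, 2, 2)
```

Because the peptide always draws residues from both sides of the junction, it cannot be found in a database that holds only the parent proteins, which is why the search in step 4 is run without the hybrid proteins.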


## 3. Generate spectra

In [3]:
from src.spectra import gen_spectra
from src.utils import utils
from sequence_generation import write_spectra

test_directory = '../data/testing_output/'
utils.make_dir(test_directory)

# generate one theoretical spectrum per peptide, keeping only what the writer needs
spectra = []
for pep in all_peptides_raw:
    cont = gen_spectra.gen_spectrum(pep['sequence'])
    spectra.append({'spectrum': cont['spectrum'], 'precursor_mass': cont['precursor_mass']})

# write all spectra to a single mzML file in the test directory
write_spectra.write_mzml('testSpectraFile', spectra, output_dir=test_directory)


Determination of memory status is not supported on this 
 platform, measuring for memoryleaks will never fail


'../data/testing_output/testSpectraFile.mzML'
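
`gen_spectra.gen_spectrum` is used above as a black box that returns a spectrum and a precursor mass. As a rough illustration of what a theoretical spectrum contains, the sketch below computes singly charged b and y ion m/z values from standard monoisotopic residue masses. This is not the hypedsearch implementation: the function name `theoretical_spectrum` is hypothetical, the returned keys only mirror the ones used above, and the real code may include other charge states, include full-length ions, or define the precursor differently.

```python
# Standard monoisotopic residue masses in daltons
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def theoretical_spectrum(sequence):
    """Return singly charged b and y ion m/z values for a peptide.

    Assumption: only fragments of length 1..n-1 are included, and
    'precursor_mass' is the neutral monoisotopic mass of the peptide.
    """
    residues = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(residues)
    # b ions: N-terminal fragments carrying one proton
    b_ions = [sum(residues[:i]) + PROTON for i in range(1, n)]
    # y ions: C-terminal fragments carrying water plus one proton
    y_ions = [sum(residues[n - i:]) + WATER + PROTON for i in range(1, n)]
    return {
        'spectrum': sorted(b_ions + y_ions),
        'precursor_mass': sum(residues) + WATER,
    }


example_spectrum = theoretical_spectrum('GAS')
```

A peptide of length n yields 2(n - 1) peaks under these assumptions, so the tripeptide in the example produces four.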

## 4. Run hypedsearch

In [4]:
from src import runner
from time import time

args = {
    'spectra_folder': test_directory,
    'database_file': fasta_file,
    'output_dir': test_directory
}
st = time()
runner.run(args)
print('\nTotal runtime: {} seconds'.format(time() - st))

Analyzing spectrum 294/295[99%]
Total runtime: 39.35865497589111 seconds


## 5. Load the summary json

In [6]:
import json

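# use a context manager so the file handle is closed after loading
with open(os.path.join(test_directory, 'summary.json'), 'r') as summary_file:
    summary = json.load(summary_file)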

## 6. Determine how many alignments were correct
This needs to be broken down into hybrid and non-hybrid peptides to get some statistics on how well hypedsearch is doing.
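
A minimal sketch of that breakdown follows. The layout of summary.json is not shown in this notebook, so the sketch works on a plain list of records with hypothetical keys `truth`, `predicted` and `is_hybrid`. Mapping the hypedsearch output and the generated peptides into that shape is left open, and exact string equality is only one possible definition of a correct alignment.

```python
from collections import defaultdict


def alignment_accuracy(records):
    """Count correct alignments separately for hybrid and non-hybrid peptides.

    Each record is a dict with hypothetical keys:
      'truth'     : the sequence the spectrum was generated from
      'predicted' : the top alignment reported for that spectrum
      'is_hybrid' : True if the source peptide is a hybrid
    """
    counts = defaultdict(lambda: {'correct': 0, 'total': 0})
    for rec in records:
        group = 'hybrid' if rec['is_hybrid'] else 'non_hybrid'
        counts[group]['total'] += 1
        if rec['predicted'] == rec['truth']:
            counts[group]['correct'] += 1
    # every group present has total >= 1, so the division is safe
    return {
        group: {**c, 'accuracy': c['correct'] / c['total']}
        for group, c in counts.items()
    }


# made-up records for illustration, not results from the run above
example_records = [
    {'truth': 'MALWAR', 'predicted': 'MALWAR', 'is_hybrid': False},
    {'truth': 'PEPTIDE', 'predicted': 'PEPTIDE', 'is_hybrid': False},
    {'truth': 'GGLVEK', 'predicted': 'GGLVQK', 'is_hybrid': False},
    {'truth': 'DLQTLAL', 'predicted': 'DLQTLAL', 'is_hybrid': True},
    {'truth': 'EVEDPQV', 'predicted': 'EVEDPQ', 'is_hybrid': True},
]
example_stats = alignment_accuracy(example_records)
```

Keeping the two groups separate matters because hybrid peptides are absent from the database by construction, so their accuracy measures something different from that of the non-hybrid peptides.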