# Issue with missing ion peaks
For spectra with missing ions, we still want to be able to sequence them. With non-hybrid sequences this is usually not much of a problem, since the b ions hopefully make up for some of the missing y ions and vice versa. It does become a problem, however, when the ions are unbalanced, that is, when one ion type is well represented and the other is mostly missing. The picture shows the issue:

![](../unbalancedIons.png)

In this case, there are enough b ions to describe the left k-mer but not enough y ions to describe the right k-mer. After getting the top-scoring k-mers, we need to try new extensions that can account for the b ions found all the way on the right.
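
To make the lopsided case concrete, here is a minimal, self-contained sketch that builds singly charged b and y ions for a toy peptide and then drops most of the y ions. It does not use hypedsearch's `gen_spectra`; the helpers `b_ions`, `y_ions`, and `unbalanced`, the toy peptide, and the small mass table are assumptions made up for illustration, using standard monoisotopic residue masses.

```python
# Monoisotopic residue masses (Da) for the few amino acids used below.
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'L': 113.08406, 'K': 128.09496,
    'E': 129.04259, 'R': 156.10111,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(seq):
    """Singly charged b-ion m/z for every prefix of seq, shortest first."""
    total, out = 0.0, []
    for aa in seq:
        total += RESIDUE_MASS[aa]
        out.append(total + PROTON)
    return out


def y_ions(seq):
    """Singly charged y-ion m/z for every suffix of seq, shortest first."""
    total, out = 0.0, []
    for aa in reversed(seq):
        total += RESIDUE_MASS[aa]
        out.append(total + WATER + PROTON)
    return out


def unbalanced(seq, keep_b, keep_y):
    """Keep only the keep_b shortest b ions and the keep_y shortest y ions."""
    return sorted(b_ions(seq)[:keep_b] + y_ions(seq)[:keep_y])


peptide = 'GASPVTLKER'  # hypothetical toy peptide
full = sorted(b_ions(peptide) + y_ions(peptide))
# All b ions, but only the two shortest y ions: the right-hand side of the
# peptide is supported almost entirely by long b ions.
lopsided = unbalanced(peptide, keep_b=len(peptide), keep_y=2)
print(len(full), len(lopsided))
```

In the lopsided spectrum, the only evidence for most of the right-hand residues is the set of long b ions, which is the situation the extension step has to handle.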

# Testing steps
We want to load some proteins, create both non-hybrid and hybrid peptides, make their spectra lopsided like the picture, then run hypedsearch and see how well it does.
1. Load fasta file
2. Make peptides
    1. Make non-hybrid peptides
    2. Make hybrid peptides
3. Remove b or y ion peaks to make spectrum unbalanced
4. Run hypedsearch
5. Load results and see what alignments were made

### Imports

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from modules.sequence_generation import write_spectra
from collections import namedtuple
from src.spectra import gen_spectra
from pyteomics.mzxml import read as mzxmlread
from src.file_io import fasta
import random
import json

Determination of memory status is not supported on this 
 platform, measuring for memoryleaks will never fail


## 1. Load fasta file

In [2]:
fasta_file = '../../testing framework/data/databases/4prots.fasta'
db = {x['name']: x for x in fasta.read(fasta_file)}


## 2. Make peptides 
We will make 4 peptides with 6 spectra:
* Non-hybrid peptide with all ions
* 2 cases of another non-hybrid peptide:
    * Missing lots of b ions
    * Missing lots of y ions
* Hybrid peptide with all ions
* 2 cases of another hybrid peptide:
    * Missing lots of b ions
    * Missing lots of y ions
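
As a rough sketch of how a hybrid peptide can be sampled from two proteins, the snippet below picks one random fragment from each parent and joins them, keeping a `-` at the junction in the annotated form. The function names `sample_fragment` and `make_hybrid` and both protein strings are made up for illustration; the cell below does the equivalent inline, and this version assumes each protein is at least `min_len` residues long.

```python
import random


def sample_fragment(protein, min_len, max_len, rng):
    """Pick a random contiguous slice of protein, min_len..max_len residues."""
    # Choose the length first so the slice can never run past the end.
    length = rng.randint(min_len, min(max_len, len(protein)))
    start = rng.randint(0, len(protein) - length)
    return protein[start:start + length]


def make_hybrid(left_protein, right_protein, min_len, max_len, rng):
    """Return (plain sequence, sequence with '-' marking the junction)."""
    left = sample_fragment(left_protein, min_len, max_len, rng)
    right = sample_fragment(right_protein, min_len, max_len, rng)
    return left + right, left + '-' + right


rng = random.Random(0)  # seeded so the example is repeatable
left_protein = 'MKTAYIAKQRQISFVKSHFSRQ'   # hypothetical protein sequence
right_protein = 'GSHMLEDPVDAFKLNWQTRE'    # hypothetical protein sequence
sequence, hybrid_sequence = make_hybrid(left_protein, right_protein, 4, 8, rng)
print(sequence, hybrid_sequence)
```

Passing in a seeded `random.Random` makes a failing test case reproducible, which the module-level `random` calls do not guarantee.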

In [3]:
# There are only 4 proteins, so use the first 2 to make the first 3 spectra and the second 2 for the other 3
peptide = namedtuple('peptide', ['hybrid', 'left', 'right', 'sequence', 'hybrid_sequence'])
min_len = 7
max_len = 15
hp_min_len = 4
hp_max_len = 8
protkeys = list(db.keys())

get_start = lambda protlen: random.randint(0, protlen - max_len)
p1s = get_start(len(db[protkeys[0]]['sequence']))
p2s = get_start(len(db[protkeys[1]]['sequence']))
p1 = peptide(False, protkeys[0], protkeys[0], db[protkeys[0]]['sequence'][p1s: p1s + random.randint(min_len, max_len)], '')
p2 = peptide(False, protkeys[1], protkeys[1], db[protkeys[1]]['sequence'][p2s: p2s + random.randint(min_len, max_len)], '')

hp1ls = get_start(len(db[protkeys[2]]['sequence']))
hp1rs = get_start(len(db[protkeys[3]]['sequence']))
hp1seq = db[protkeys[2]]['sequence'][hp1ls: hp1ls + random.randint(hp_min_len, hp_max_len)] + '-' + db[protkeys[3]]['sequence'][hp1rs: hp1rs + random.randint(hp_min_len, hp_max_len)]
hp1 = peptide(True, protkeys[2], protkeys[3], hp1seq.replace('-', ''), hp1seq)

hp2ls = get_start(len(db[protkeys[2]]['sequence']))
hp2rs = get_start(len(db[protkeys[3]]['sequence']))
hp2seq = db[protkeys[2]]['sequence'][hp2ls: hp2ls + random.randint(hp_min_len, hp_max_len)] + '-' + db[protkeys[3]]['sequence'][hp2rs: hp2rs + random.randint(hp_min_len, hp_max_len)]
hp2 = peptide(True, protkeys[2], protkeys[3], hp2seq.replace('-', ''), hp2seq)



## 3. Make unbalanced spectra

In [4]:
max_small_count = 4
spec1 = gen_spectra.gen_spectrum(p1.sequence[:-1])['spectrum']
spec2b = gen_spectra.gen_spectrum(p2.sequence[:-1], ion='b')['spectrum'] + gen_spectra.gen_spectrum(p2.sequence[len(p2.sequence) - random.randint(1, max_small_count+1):], ion='y')['spectrum']
spec2y = gen_spectra.gen_spectrum(p2.sequence[:random.randint(0, max_small_count)], ion='b')['spectrum'] + gen_spectra.gen_spectrum(p2.sequence[:-1], ion='y')['spectrum']
spec3 = gen_spectra.gen_spectrum(hp1.sequence[:-1])['spectrum']
spec4b = gen_spectra.gen_spectrum(hp2.sequence[:-1], ion='b')['spectrum'] + gen_spectra.gen_spectrum(hp2.sequence[len(hp2.sequence) - random.randint(1, max_small_count+1):], ion='y')['spectrum']
spec4y = gen_spectra.gen_spectrum(hp2.sequence[:random.randint(0, max_small_count)], ion='b')['spectrum'] + gen_spectra.gen_spectrum(hp2.sequence[:-1], ion='y')['spectrum']


In [5]:
spectra = [
    {'spectrum': spec1, 'precursor_mass': gen_spectra.gen_spectrum(p1.sequence[:-1])['precursor_mass']},
    {'spectrum': spec2b, 'precursor_mass': gen_spectra.gen_spectrum(p2.sequence[:-1])['precursor_mass']},
    {'spectrum': spec2y, 'precursor_mass': gen_spectra.gen_spectrum(p2.sequence[:-1])['precursor_mass']},
    {'spectrum': spec3, 'precursor_mass': gen_spectra.gen_spectrum(hp1.sequence[:-1])['precursor_mass']}, 
    {'spectrum': spec4b, 'precursor_mass': gen_spectra.gen_spectrum(hp2.sequence[:-1])['precursor_mass']},
    {'spectrum': spec4y, 'precursor_mass': gen_spectra.gen_spectrum(hp2.sequence[:-1])['precursor_mass']}
]

In [6]:
write_spectra.write_mzml('unbalancedSpectra', spectra, output_dir='../../testing framework/data/testing_output/')

'../../testing framework/data/testing_output/unbalancedSpectra.mzML'

## 4. Run hypedsearch

In [7]:
from src import runner

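# Note: step 3 wrote unbalancedSpectra.mzML under
# '../../testing framework/data/testing_output/', which is not spec_dir;
# confirm which folder this run is meant to read from.
test_directory = '../data/testing_output/'
spec_dir = '../data/spectra/'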

args = {
    'spectra_folder': spec_dir,
    'database_file': fasta_file,
    'output_dir': test_directory,
    'min_peptide_len': 3,
    'max_peptide_len': 35,
}
runner.run(args)


Loading database...
Done. Indexing database...
1426 unique kmers
Done.
Number of 3-mers found in the database: 1426
Analyzing spectra file 0/1[0%]

Finished search. Writting results to .../data/testing_output/...


## 5. Load results

In [8]:
summary_file = '../data/testing_output/summary.json'

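# Use a context manager so the file handle is closed after loading
with open(summary_file, 'r') as summary_handle:
    summ = json.load(summary_handle)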

for file_scanno, result in summ.items():
    print(result['alignments'][0])

FileNotFoundError: [Errno 2] No such file or directory: '../data/testing_output/summary.json'
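
The `FileNotFoundError` above only says that `summary.json` was not at the path being read. One thing worth checking (not verified here) is that the paths used across steps 3 to 5 point at the same folders. As a small debugging aid, the sketch below loads the summary defensively and reports what the output directory actually contains when the file is missing. The helper name `load_summary` and the demo data (`scan_0`, `PLACEHOLDER`) are hypothetical; only the `alignments` key comes from the cell above.

```python
import json
import tempfile
from pathlib import Path


def load_summary(path):
    """Return the parsed summary, or None (with a hint) if it is missing."""
    path = Path(path)
    if not path.is_file():
        parent = path.parent
        if parent.is_dir():
            present = sorted(p.name for p in parent.iterdir())
            print(f'{path.name} not found; {parent} contains: {present}')
        else:
            print(f'output directory {parent} does not exist')
        return None
    with path.open() as handle:
        return json.load(handle)


# Demo in a temporary directory with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'summary.json'
    missing = load_summary(target)           # file not written yet
    fake = {'scan_0': {'alignments': ['PLACEHOLDER']}}
    target.write_text(json.dumps(fake))
    loaded = load_summary(target)            # now it loads

if loaded is not None:
    for file_scanno, result in loaded.items():
        print(result['alignments'][0])
```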