# Task_01. In Silico PCR for Covid-19 diagnosis.
The CDC (Centers for Disease Control and Prevention) has published the sequences of the RT-PCR primers and probes it uses to detect Covid-19 (see [this page](https://www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes.html) for details). In this task, we examine how these primers and probes work.

If you need more info about "PCR-based diagnosis", see [this video](https://www.youtube.com/watch?v=fkUDu042xic).

## Data files
<ul>
    <li> Covid-19 genomes: '../data/2019nCoV_genomes.2020_02_03.fa'
    <li> Primers for Covid-19 detection: '../data/2019nCoV_primers.fa'
</ul>
    
## Procedures
<ol>
<li> Read 2019nCoV primers from a FASTA file (**see below**).
<li> Read 2019nCoV genomes from a FASTA file (**see below**).
<li> Find the positions of the forward (F) and reverse (R) primers on each genome sequence.
<li> Calculate the length of the PCR amplicon produced by each primer pair.
</ol>
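
Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the notebook's reference solution: it assumes uppercase A/C/G/T sequences, exact primer matches, and one binding site per primer, and the helper `find_amplicon` and the toy sequences are hypothetical. The key point is that the forward primer appears in the genome as written, while the reverse primer is complementary to the plus strand, so its *reverse complement* is what appears in the genome sequence.

```python
# Minimal sketch: assumes uppercase A/C/G/T sequences and exact primer matches.
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_amplicon(genome, fwd, rev):
    """Return (start, end, amplicon) for one primer pair, or None if a primer is missing.

    Coordinates are 0-based and `end` is exclusive, so len(amplicon) == end - start.
    """
    start = genome.find(fwd)  # forward primer matches the plus strand as written
    if start == -1:
        return None
    # The reverse primer is complementary to the plus strand, so its reverse
    # complement is what appears in the plus-strand sequence, downstream of F.
    rev_site = genome.find(revcomp(rev), start + len(fwd))
    if rev_site == -1:
        return None
    end = rev_site + len(rev)
    return start, end, genome[start:end]


# Toy example (hypothetical sequences, not real primers)
genome = 'TTTTACGTACGGGGGGCATGCATTTT'
print(find_amplicon(genome, 'ACGTAC', 'TGCATG'))
# (4, 22, 'ACGTACGGGGGGCATGCA')
```

The amplicon length is then simply `end - start` (18 in the toy example), which is the quantity asked for in step 4.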

## Questions
<ol>
    <li> What is the length of the amplicons generated by the N1, N2, and N3 primer sets? Does it vary among the genomes?
    <li> What are the amplicon sequences?
    <li> Can these primers detect all Covid-19 genomes?
    <li> Can these primers detect MERS genomes? What about SARS genomes? (These genomes are also in the '../data' directory.)
</ol>
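
Questions 3 and 4 depend on what counts as a match. `str.find` reports only exact matches, so a single mismatch makes a primer look absent, even though PCR can often tolerate a few mismatches, particularly away from the primer's 3' end. One way to quantify this is to report the best-matching window and its number of mismatches (Hamming distance). The helpers `hamming` and `best_hit` below are hypothetical and illustrative only; degenerate bases in a primer (IUPAC codes) would simply be counted as mismatches here.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_hit(genome, primer):
    """Return (position, mismatches) of the best-matching window.

    Returns None if the genome is shorter than the primer.
    Brute force: O(len(genome) * len(primer)), fine for ~30 kb genomes.
    """
    k = len(primer)
    best = None
    for i in range(len(genome) - k + 1):
        mm = hamming(genome[i:i + k], primer)
        if best is None or mm < best[1]:
            best = (i, mm)
            if mm == 0:
                break  # an exact match cannot be beaten
    return best


print(best_hit('AAACGTTAAA', 'CGAT'))  # (3, 1)
```

As with exact matching, a reverse primer should be searched as its reverse complement on the genome sequence.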

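```python
# The function to get the reverse complement of a DNA sequence.
# IUPAC ambiguity codes (e.g. R = A/G, Y = C/T, N = any base) are included
# so that degenerate primers/probes and genomes containing N do not raise KeyError.
rc = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G',
      'R': 'Y', 'Y': 'R', 'S': 'S', 'W': 'W', 'K': 'M', 'M': 'K',
      'B': 'V', 'V': 'B', 'D': 'H', 'H': 'D', 'N': 'N'}

def revcomp(tmp_seq):
    # Walk the (upper-cased) input from its end to its beginning and
    # replace each nucleotide with its complement.
    #
    # The single line at the end of this function is equivalent to:
    # rv = []
    # for tmp_n in tmp_seq.upper()[::-1]:
    #     rv.append(rc[tmp_n])
    # return ''.join(rv)
    return ''.join([rc[x] for x in tmp_seq.upper()[::-1]])

print(revcomp('AATTGGCC'))  # GGCCAATT
```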
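
For whole genomes (about 30 kb each), the same mapping can be applied with `str.translate`, which does the per-character lookup in C instead of a Python list comprehension. This is an alternative sketch, not a required change; `revcomp_fast` is a hypothetical name. One design difference: `translate` leaves unknown characters unchanged instead of raising `KeyError`, which is more forgiving but can hide unexpected input.

```python
# Same complement pairs as the dictionary version, including IUPAC codes.
_COMPLEMENT = str.maketrans('ACGTRYSWKMBVDHN', 'TGCAYRSWMKVBHDN')


def revcomp_fast(seq):
    """Reverse complement via str.translate; unknown characters pass through unchanged."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


print(revcomp_fast('AATTGGCC'))  # GGCCAATT
print(revcomp_fast('GAATTC'))    # GAATTC (the EcoRI site is its own reverse complement)
```

A quick sanity check for any reverse-complement function is that applying it twice returns the original sequence.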


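```python
# The function to read sequences from a FASTA file.
# Returns a dict mapping each header line (without '>') to its sequence.
def read_fasta(tmp_filename):
    rv = dict()
    tmp_h = None
    with open(tmp_filename, 'r') as f:  # the file is closed automatically
        for line in f:
            if line.startswith('>'):
                tmp_h = line.strip().lstrip('>')
                rv[tmp_h] = ''
            elif tmp_h is not None:  # skip any text before the first header
                # Upper-case so that str.find matches primers regardless of case.
                rv[tmp_h] += line.strip().replace(' ', '').upper()
    return rv
```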
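
To test FASTA parsing without the data files, it helps to have a parser that accepts any iterable of lines, such as an open file or a plain list. The sketch below is a hypothetical variant (`parse_fasta` is not part of the original notebook); it also collects sequence lines in a list and joins them once per record rather than concatenating strings line by line, and it rejects sequence data that appears before the first header instead of silently skipping it.

```python
def parse_fasta(lines):
    """Parse FASTA text from any iterable of lines (an open file, a list, ...).

    Returns {header: sequence}; a repeated header overwrites the earlier record.
    """
    chunks = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            header = line[1:]
            chunks[header] = []
        elif header is None:
            raise ValueError('sequence data found before the first FASTA header')
        else:
            chunks[header].append(line.replace(' ', ''))
    # Join once per record instead of concatenating strings line by line.
    return {h: ''.join(parts) for h, parts in chunks.items()}


print(parse_fasta(['>seq1 test', 'ACGT', 'AC GT', '>seq2', 'TTTT']))
# {'seq1 test': 'ACGTACGT', 'seq2': 'TTTT'}
```

With a real file, the same function can be called as `with open(filename_primers) as f: parse_fasta(f)`.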

```python
# Read the primers and print each header with its sequence
filename_primers = '../data/2019nCoV_primers.fa'
primer_list = read_fasta(filename_primers)
for p_h, p_seq in primer_list.items():
    print(p_h, p_seq)
```
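
To compute one amplicon per primer set (N1, N2, N3), the forward and reverse primers must first be paired. The exact header format of `2019nCoV_primers.fa` is not shown here, so the sketch below *assumes* hypothetical headers containing a set name such as `N1` followed by `F`, `R`, or `P` (for example `2019-nCoV_N1-F`); the regular expression should be adjusted to the real headers.

```python
import re

# Hypothetical header pattern: set name (N1, N2, ...) then F, R or P,
# optionally separated by '-' or '_'. Adjust to the real file's headers.
HEADER_RE = re.compile(r'(N\d+)[-_]?([FRP])\b')


def group_primers(primers):
    """Turn {header: seq} into {set_name: {'F': seq, 'R': seq, 'P': seq}}."""
    sets = {}
    for header, seq in primers.items():
        m = HEADER_RE.search(header)
        if m is None:
            continue  # header does not follow the assumed naming scheme
        set_name, role = m.groups()
        sets.setdefault(set_name, {})[role] = seq
    return sets


# Toy input with hypothetical headers and sequences
toy = {'2019-nCoV_N1-F': 'ACGTAC', '2019-nCoV_N1-R': 'TGCATG', 'unrelated': 'GG'}
print(group_primers(toy))  # {'N1': {'F': 'ACGTAC', 'R': 'TGCATG'}}
```

Headers that do not match the pattern are skipped rather than guessed, so an empty result is a sign that the pattern needs adjusting.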

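```python
# Read the genomes and print each accession (first word of the header)
# with the first 25 bases of its sequence
filename_genomes = '../data/2019nCoV_genomes.2020_02_03.fa'
genome_list = read_fasta(filename_genomes)
for g_h, g_seq in genome_list.items():
    print(g_h.split()[0], g_seq[:25])
```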
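
Before concluding that a primer "cannot detect" a genome (question 3), it is worth checking whether the genome contains characters other than A, C, G, and T, such as N (an undetermined base) or other IUPAC ambiguity codes. An exact `str.find` fails on such sites even if the underlying virus would match. The helper `ambiguous_bases` below is a hypothetical sketch.

```python
from collections import Counter


def ambiguous_bases(seq):
    """Count characters other than A, C, G, T (case-insensitive)."""
    counts = Counter(seq.upper())
    return {base: n for base, n in counts.items() if base not in 'ACGT'}


print(ambiguous_bases('ACGTNNRacgt'))  # {'N': 2, 'R': 1}
```

Applied to the notebook's data, it could be used as `for g_h, g_seq in genome_list.items(): print(g_h.split()[0], len(g_seq), ambiguous_bases(g_seq))`.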

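```python
# Example of sequence matching: str.find() vs. str.index()
seq1 = 'AATTGGCCAATTGGCCAATTGGCCAATTGGCC'
seq2 = 'CCAAT'
print(seq1.find(seq2))   # 6: index of the first match
print(seq1.index(seq2))  # 6: same result when the pattern is present
seq3 = 'CCTTA'
print(seq1.find(seq3))   # -1: find() returns -1 when there is no match
try:
    print(seq1.index(seq3))
except ValueError as err:  # index() raises ValueError instead
    print('ValueError:', err)
```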
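
Both `find()` and `index()` return only the *first* occurrence. If a primer matched a genome at more than one site, using only the first hit could give a misleading amplicon, so it can be useful to list every occurrence. `str.find` accepts a start position as its second argument, which allows a simple loop; `find_all` is a hypothetical helper name.

```python
def find_all(seq, sub):
    """Return every start index of sub in seq, including overlapping hits."""
    hits = []
    pos = seq.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(sub, pos + 1)  # resume one base after the last hit
    return hits


seq1 = 'AATTGGCCAATTGGCCAATTGGCCAATTGGCC'
print(find_all(seq1, 'CCAAT'))  # [6, 14, 22]
print(find_all(seq1, 'CCTTA'))  # []
```

Advancing by one base (rather than by `len(sub)`) keeps overlapping matches, e.g. `find_all('AAAA', 'AA')` gives `[0, 1, 2]`.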