# A Futuristic Approach for the Design of Primers - **'designPRI'**



---




The course "Python Applied to Biomedical Sciences", co-coordinated by investigators Sofia Seabra and Luís Filipe Lopes, covered the basic principles of Python programming. It is aimed at graduate students and professionals from various fields, regardless of their prior programming experience. During the course, Wilson Tavares and David Silva developed the **'designPRI'** tool to simplify primer design for sequences deposited in GenBank. Using the tool is straightforward: enter a GenBank accession ID, define the primer parameters, and voilà! You receive a list of primer pairs for your laboratory analysis. The code relies on the **Biopython** module to retrieve sequences from GenBank and on the **primer3** module (primer3-py, the Python bindings for Primer3), which performs the primer design itself.

**Install the required modules**

---



```python
!pip install biopython
```



```python
!pip install primer3-py
```

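Because the primer3-py API has changed between releases (newer versions expose the snake_case `primer3.bindings.design_primers`, while older notebooks may use the camelCase `designPrimers`), it can help to record which versions are installed. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical, and the pip distribution names `biopython`, `primer3-py` and `pandas` are assumed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the versions of the packages this notebook depends on
for dist in ("biopython", "primer3-py", "pandas"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```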


**Import the required modules**

---



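```python
import pandas as pd
import primer3
from Bio import Entrez, SeqIO
from IPython.display import display  # display() is used below to show tables

# NCBI asks for a contact e-mail with every Entrez request.
# Entrez.email must be a single string (assigning two addresses creates a tuple).
Entrez.email = "a21001766@ihmt.unl.pt"  # alternative contact: wtavares@ihmt.unl.pt
```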

**Retrieve sequence information from the GenBank database**


---



```python
# Define the GenBank accession ID
our_id = "MW015936.1"
our_id
```

'MW015936.1'

```python
# Fetch the full GenBank record for the accession ID from the NCBI nucleotide database
handle = Entrez.efetch(db="nucleotide", id=our_id, rettype="gb", retmode="text")
gb_record = SeqIO.read(handle, "genbank")
handle.close()
gb_record
```

SeqRecord(seq=Seq('AGTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCT...TCT'), id='MW015936.1', name='MW015936', description='Zika virus isolate Zika virus/H.sapiens-tc/THA/2006/CVD_06-020, complete genome', dbxrefs=[])

```python
# Keep only the nucleotide sequence of the record, converted to a plain string
SEQUENCE_TEMPLATE = str(gb_record.seq)
print(SEQUENCE_TEMPLATE)
```

AGTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAGGTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAGAAGAAATCCGGAGGATTCCGGATTGTCAATATGCTAAAACGCGGAGTAGCCCGTGTGAGCCCCTTTGGGGGCTTGAAGAGGCTGCCAGCCGGACTTCTGCTGGGTCATGGGCCCATCAGGATGGTCTTGGCGATTCTAGCCTTTTTGAGATTCACGGCAATCAAGCCATCACTGGGTCTCATCAATAGATGGGGTTCAGTGGGGAAAAAAGAGGCTATGGAAATAATAAAGAAGTTCAAGAAAGATCTGGCTGCCATGCTGAGAATAATCAATGCTAGGAAGGAGAAGAAGAGACGAGGCACAGATACTAGTGTCGGAATTGTTGGCCTCCTGCTGACCACAGCCATGGCAGTGGAGGTCACTAGACGTGGGAGTGCATACTATATGTACTTGGACAGAAGTGATGCTGGGGAGGCCATATCTTTTCCAACCACACTGGGGATGAATAAGTGTTATATACAGATCATGGATCTTGGACACATGTGTGATGCCACCATGAGCTATGAATGCCCTATGCTGGATGAGGGGGTAGAACCAGATGACGTCGATTGTTGGTGCAACACGACGTCAACTTGGGTTGTGTACGGAACCTGCCACCACAAAAAAGGTGAAGCACGGAGATCTAGAAGAGCTGTGACGCTCCCCTCCCATTCCACTAGGAAGCTGCAAACGCGGTCGCAGACCTGGTTGGAATCAAGAGAATACACAAAGCACTTGATTAGAGTCGAAAATTGGATATTCAGGAACCCTGGCTTCGCGTTAGCAGCAGCTGCCATCGCTTGGCTTTTGGGAAGCTCAACGAGCCAAAAAGTCATATACTTGGTCATGATACTGCTGATTGCCCCGGCATACAGCATCAGGTGCATAGGAGTCAGCAA

**Define the sequence arguments**

---



```python
# Sequence-specific arguments passed to primer3 as "seq_args"
seq_args = {
    'SEQUENCE_ID': our_id,
    'SEQUENCE_TEMPLATE': SEQUENCE_TEMPLATE,
    # Primer search region given as [start, length], with a 0-based start
    'SEQUENCE_INCLUDED_REGION': [36, 10000],
}
print(seq_args)
```

{'SEQUENCE_ID': 'MW015936.1', 'SEQUENCE_TEMPLATE': 'AGTTGTTGATCTGTGTGAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAGGTTTTATTTTGGATTTGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAGAAGAAATCCGGAGGATTCCGGATTGTCAATATGCTAAAACGCGGAGTAGCCCGTGTGAGCCCCTTTGGGGGCTTGAAGAGGCTGCCAGCCGGACTTCTGCTGGGTCATGGGCCCATCAGGATGGTCTTGGCGATTCTAGCCTTTTTGAGATTCACGGCAATCAAGCCATCACTGGGTCTCATCAATAGATGGGGTTCAGTGGGGAAAAAAGAGGCTATGGAAATAATAAAGAAGTTCAAGAAAGATCTGGCTGCCATGCTGAGAATAATCAATGCTAGGAAGGAGAAGAAGAGACGAGGCACAGATACTAGTGTCGGAATTGTTGGCCTCCTGCTGACCACAGCCATGGCAGTGGAGGTCACTAGACGTGGGAGTGCATACTATATGTACTTGGACAGAAGTGATGCTGGGGAGGCCATATCTTTTCCAACCACACTGGGGATGAATAAGTGTTATATACAGATCATGGATCTTGGACACATGTGTGATGCCACCATGAGCTATGAATGCCCTATGCTGGATGAGGGGGTAGAACCAGATGACGTCGATTGTTGGTGCAACACGACGTCAACTTGGGTTGTGTACGGAACCTGCCACCACAAAAAAGGTGAAGCACGGAGATCTAGAAGAGCTGTGACGCTCCCCTCCCATTCCACTAGGAAGCTGCAAACGCGGTCGCAGACCTGGTTGGAATCAAGAGAATACACAAAGCACTTGATTAGAGTCGAAAATTGGATATTCAGGAACCCTGGCTTCGCGTTAGCAGCAGCTGCCATCGCTTGGCTTTTGGGAAGCTCAACGAGCCAAAAAGTCATATACTTGGTCA
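In primer3, `SEQUENCE_INCLUDED_REGION` is written as `[start, length]`, and `start` is counted from 0 by default (`PRIMER_FIRST_BASE_INDEX` = 0). So `[36, 10000]` limits primers to template positions 36 to 10035 (0-based), which are bases 37 to 10036 in GenBank's 1-based numbering. The sketch below, built around a hypothetical helper `check_included_region`, checks that a region fits inside the template before primer3 is called and converts it to GenBank-style coordinates.

```python
def check_included_region(template, region):
    """Validate a primer3-style [start, length] region (0-based start).

    Returns the 1-based, inclusive (first, last) base numbers of the region,
    as they would appear in a GenBank record.
    """
    start, length = region
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length  # 0-based, exclusive
    if end > len(template):
        raise ValueError(
            f"region ends at {end} but the template has only {len(template)} bases"
        )
    return start + 1, end


# Example with a short made-up template of 40 bases
print(check_included_region("ACGT" * 10, [5, 20]))  # (6, 25)
```

With the real data, `check_included_region(SEQUENCE_TEMPLATE, [36, 10000])` would raise an error early if the genome were shorter than 10036 bases.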

**Define primer design parameters**

---



```python
# Global primer-design settings passed to primer3 (its "global_args" argument).
# Sizes are in bases, Tm in degrees Celsius, GC content in percent, and
# PRIMER_MAX_POLY_X is the longest allowed run of a single repeated base.
primer_params = {
    'PRIMER_OPT_SIZE': 20,
    'PRIMER_MIN_SIZE': 15,
    'PRIMER_MAX_SIZE': 25,
    'PRIMER_OPT_TM': 60.0,
    'PRIMER_MIN_TM': 50.0,
    'PRIMER_MAX_TM': 63.0,
    'PRIMER_MIN_GC': 20.0,
    'PRIMER_MAX_GC': 80.0,
    'PRIMER_MAX_POLY_X': 3,
}
print(primer_params)
```

{'PRIMER_OPT_SIZE': 20, 'PRIMER_MIN_SIZE': 15, 'PRIMER_MAX_SIZE': 25, 'PRIMER_OPT_TM': 60.0, 'PRIMER_MIN_TM': 50.0, 'PRIMER_MAX_TM': 63.0, 'PRIMER_MIN_GC': 20.0, 'PRIMER_MAX_GC': 80.0, 'PRIMER_MAX_POLY_X': 3}
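To make these settings concrete, the sketch below screens a single primer against the same size, GC and poly-X limits. It is not how primer3 works internally: primer3 computes Tm with nearest-neighbor thermodynamics, whereas this sketch uses the much cruder Wallace rule (Tm ≈ 2·(A+T) + 4·(G+C) °C, intended for short oligos) purely for illustration. All function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of one repeated base (the poly-X run)."""
    best = current = 1
    for prev, base in zip(seq, seq[1:]):
        current = current + 1 if base == prev else 1
        best = max(best, current)
    return best


def wallace_tm(seq):
    """Crude Tm estimate (degrees C) by the Wallace rule; assumes only A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 2 * (len(seq) - gc) + 4 * gc


def passes_basic_filters(seq, params):
    """Check size, GC% and poly-X limits taken from a primer3-style settings dict."""
    return (
        params["PRIMER_MIN_SIZE"] <= len(seq) <= params["PRIMER_MAX_SIZE"]
        and params["PRIMER_MIN_GC"] <= gc_percent(seq) <= params["PRIMER_MAX_GC"]
        and longest_run(seq) <= params["PRIMER_MAX_POLY_X"]
    )


params = {"PRIMER_MIN_SIZE": 15, "PRIMER_MAX_SIZE": 25,
          "PRIMER_MIN_GC": 20.0, "PRIMER_MAX_GC": 80.0, "PRIMER_MAX_POLY_X": 3}
primer = "AGCCTGGGAAGATGGGATCT"  # forward primer reported for MW015936.1
print(gc_percent(primer), longest_run(primer), wallace_tm(primer))  # 55.0 3 62
print(passes_basic_filters(primer, params))  # True
```

The GC value of 55.0 matches the `PRIMER_LEFT_0_GC_PERCENT` reported by primer3 for this primer; the Tm does not need to match, since primer3 uses a different model.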


```python
# Run primer3 and store the returned results dictionary in 'primers'.
# Newer primer3-py releases name this function design_primers
# (older code may use the deprecated camelCase designPrimers).
primers = primer3.bindings.design_primers(seq_args, primer_params)
```



```python
print(primers)
```

{'PRIMER_LEFT_EXPLAIN': 'considered 88421, GC content failed 350, low tm 14275, high tm 17675, long poly-x seq 2642, ok 53479', 'PRIMER_RIGHT_EXPLAIN': 'considered 88794, GC content failed 343, low tm 14246, high tm 17828, high hairpin stability 2, long poly-x seq 2594, ok 53781', 'PRIMER_PAIR_EXPLAIN': 'considered 385, unacceptable product size 377, ok 8', 'PRIMER_LEFT_NUM_RETURNED': 5, 'PRIMER_RIGHT_NUM_RETURNED': 5, 'PRIMER_INTERNAL_NUM_RETURNED': 0, 'PRIMER_PAIR_NUM_RETURNED': 5, 'PRIMER_PAIR_0_PENALTY': 0.06085219309989043, 'PRIMER_LEFT_0_PENALTY': 0.029266998437663005, 'PRIMER_RIGHT_0_PENALTY': 0.03158519466222742, 'PRIMER_LEFT_0_SEQUENCE': 'AGCCTGGGAAGATGGGATCT', 'PRIMER_RIGHT_0_SEQUENCE': 'CGACCGTCAGTTGAACTCCA', 'PRIMER_LEFT_0': [2632, 20], 'PRIMER_RIGHT_0': [2753, 20], 'PRIMER_LEFT_0_TM': 60.02926699843766, 'PRIMER_RIGHT_0_TM': 59.96841480533777, 'PRIMER_LEFT_0_GC_PERCENT': 55.0, 'PRIMER_RIGHT_0_GC_PERCENT': 55.0, 'PRIMER_LEFT_0_SELF_ANY_TH': 0.0, 'PRIMER_RIGHT_0_SELF_ANY_TH':
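primer3 returns one flat dictionary in which each pair is spread over numbered keys. `PRIMER_LEFT_i` holds `[start, length]` of the forward primer, while the first number of `PRIMER_RIGHT_i` is the rightmost template base covered by the reverse primer, so the amplicon length is `right - left + 1`. Because no `PRIMER_PRODUCT_SIZE_RANGE` was set, primer3's default range (100 to 300 bp) applies, which is presumably why `PRIMER_PAIR_EXPLAIN` reports 377 of the 385 considered pairs as having an unacceptable product size. The hypothetical helper `primer_pairs` below regroups the keys into one record per pair.

```python
def primer_pairs(primers):
    """Collect the numbered primer3 output keys into one dict per primer pair."""
    pairs = []
    # Loop only over the pairs actually returned (may be fewer than requested)
    for i in range(primers["PRIMER_PAIR_NUM_RETURNED"]):
        left_start, _ = primers[f"PRIMER_LEFT_{i}"]
        right_end, _ = primers[f"PRIMER_RIGHT_{i}"]
        pairs.append({
            "pair": i,
            "forward": primers[f"PRIMER_LEFT_{i}_SEQUENCE"],
            "reverse": primers[f"PRIMER_RIGHT_{i}_SEQUENCE"],
            # Amplicon spans the first base of the forward primer to the
            # rightmost base of the reverse primer, both inclusive
            "product_size": right_end - left_start + 1,
        })
    return pairs


# Values copied from pair 0 of the MW015936.1 output above
example = {
    "PRIMER_PAIR_NUM_RETURNED": 1,
    "PRIMER_LEFT_0": [2632, 20],
    "PRIMER_RIGHT_0": [2753, 20],
    "PRIMER_LEFT_0_SEQUENCE": "AGCCTGGGAAGATGGGATCT",
    "PRIMER_RIGHT_0_SEQUENCE": "CGACCGTCAGTTGAACTCCA",
}
print(primer_pairs(example)[0]["product_size"])  # 122
```

With the real results, `pd.DataFrame(primer_pairs(primers))` would give one row per pair, and driving the loop by `PRIMER_PAIR_NUM_RETURNED` avoids a `KeyError` when primer3 finds fewer than five pairs.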

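```python
# Tabulate the whole results dictionary: one row per primer3 output key.
# (pd.DataFrame(primers) can fail because the values mix scalars and lists.)
data_primers = pd.Series(primers, name='value').to_frame()
```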

```python
# Result keys holding the forward (left) primer sequences
left = ('PRIMER_LEFT_0_SEQUENCE', 'PRIMER_LEFT_1_SEQUENCE', 'PRIMER_LEFT_2_SEQUENCE',
        'PRIMER_LEFT_3_SEQUENCE', 'PRIMER_LEFT_4_SEQUENCE')

# Result keys holding the reverse (right) primer sequences
right = ('PRIMER_RIGHT_0_SEQUENCE', 'PRIMER_RIGHT_1_SEQUENCE', 'PRIMER_RIGHT_2_SEQUENCE',
         'PRIMER_RIGHT_3_SEQUENCE', 'PRIMER_RIGHT_4_SEQUENCE')

# Copy the sequences for those keys out of the 'primers' results dictionary
left_dictionary = {key: primers[key] for key in left}
display(left_dictionary)
right_dictionary = {key: primers[key] for key in right}
display(right_dictionary)

# Build one-row DataFrames from the two dictionaries
left2 = pd.DataFrame([left_dictionary])
display(left2)
right2 = pd.DataFrame([right_dictionary])

# Transpose so each primer becomes a row, then move the key names into a column
left3 = left2.transpose()
display(left3)
resetl3 = left3.reset_index()
display(resetl3)
right3 = right2.transpose()
display(right3)
reset_rl3 = right3.reset_index()
display(reset_rl3)

# Place the forward and reverse tables side by side
result = pd.concat([resetl3, reset_rl3], axis=1)

# Give the columns unique, descriptive names (both sequences are written 5'->3')
new_column_names = ['Primer forward', 'Forward sequence', 'Primer reverse', 'Reverse sequence']
result.columns = new_column_names
display(result)
```

{'PRIMER_LEFT_0_SEQUENCE': 'AGCCTGGGAAGATGGGATCT',
 'PRIMER_LEFT_1_SEQUENCE': 'AGCCTGGGAAGATGGGATCT',
 'PRIMER_LEFT_2_SEQUENCE': 'AGCCTGGGAAGATGGGATCT',
 'PRIMER_LEFT_3_SEQUENCE': 'AGCCTGGGAAGATGGGATCT',
 'PRIMER_LEFT_4_SEQUENCE': 'AGCCTGGGAAGATGGGATCT'}

{'PRIMER_RIGHT_0_SEQUENCE': 'CGACCGTCAGTTGAACTCCA',
 'PRIMER_RIGHT_1_SEQUENCE': 'ACGACCGTCAGTTGAACTCC',
 'PRIMER_RIGHT_2_SEQUENCE': 'CAGTGTGTCACCATCCACGA',
 'PRIMER_RIGHT_3_SEQUENCE': 'TCAGTGTGTCACCATCCACG',
 'PRIMER_RIGHT_4_SEQUENCE': 'AGTGTGTCACCATCCACGAC'}

| | PRIMER_LEFT_0_SEQUENCE | PRIMER_LEFT_1_SEQUENCE | PRIMER_LEFT_2_SEQUENCE | PRIMER_LEFT_3_SEQUENCE | PRIMER_LEFT_4_SEQUENCE |
|---|---|---|---|---|---|
| 0 | AGCCTGGGAAGATGGGATCT | AGCCTGGGAAGATGGGATCT | AGCCTGGGAAGATGGGATCT | AGCCTGGGAAGATGGGATCT | AGCCTGGGAAGATGGGATCT |


| | 0 |
|---|---|
| PRIMER_LEFT_0_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| PRIMER_LEFT_1_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| PRIMER_LEFT_2_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| PRIMER_LEFT_3_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| PRIMER_LEFT_4_SEQUENCE | AGCCTGGGAAGATGGGATCT |


| | index | 0 |
|---|---|---|
| 0 | PRIMER_LEFT_0_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| 1 | PRIMER_LEFT_1_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| 2 | PRIMER_LEFT_2_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| 3 | PRIMER_LEFT_3_SEQUENCE | AGCCTGGGAAGATGGGATCT |
| 4 | PRIMER_LEFT_4_SEQUENCE | AGCCTGGGAAGATGGGATCT |


| | 0 |
|---|---|
| PRIMER_RIGHT_0_SEQUENCE | CGACCGTCAGTTGAACTCCA |
| PRIMER_RIGHT_1_SEQUENCE | ACGACCGTCAGTTGAACTCC |
| PRIMER_RIGHT_2_SEQUENCE | CAGTGTGTCACCATCCACGA |
| PRIMER_RIGHT_3_SEQUENCE | TCAGTGTGTCACCATCCACG |
| PRIMER_RIGHT_4_SEQUENCE | AGTGTGTCACCATCCACGAC |


| | index | 0 |
|---|---|---|
| 0 | PRIMER_RIGHT_0_SEQUENCE | CGACCGTCAGTTGAACTCCA |
| 1 | PRIMER_RIGHT_1_SEQUENCE | ACGACCGTCAGTTGAACTCC |
| 2 | PRIMER_RIGHT_2_SEQUENCE | CAGTGTGTCACCATCCACGA |
| 3 | PRIMER_RIGHT_3_SEQUENCE | TCAGTGTGTCACCATCCACG |
| 4 | PRIMER_RIGHT_4_SEQUENCE | AGTGTGTCACCATCCACGAC |


| | Primer forward | Forward sequence | Primer reverse | Reverse sequence |
|---|---|---|---|---|
| 0 | PRIMER_LEFT_0_SEQUENCE | AGCCTGGGAAGATGGGATCT | PRIMER_RIGHT_0_SEQUENCE | CGACCGTCAGTTGAACTCCA |
| 1 | PRIMER_LEFT_1_SEQUENCE | AGCCTGGGAAGATGGGATCT | PRIMER_RIGHT_1_SEQUENCE | ACGACCGTCAGTTGAACTCC |
| 2 | PRIMER_LEFT_2_SEQUENCE | AGCCTGGGAAGATGGGATCT | PRIMER_RIGHT_2_SEQUENCE | CAGTGTGTCACCATCCACGA |
| 3 | PRIMER_LEFT_3_SEQUENCE | AGCCTGGGAAGATGGGATCT | PRIMER_RIGHT_3_SEQUENCE | TCAGTGTGTCACCATCCACG |
| 4 | PRIMER_LEFT_4_SEQUENCE | AGCCTGGGAAGATGGGATCT | PRIMER_RIGHT_4_SEQUENCE | AGTGTGTCACCATCCACGAC |
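The primer sequences in the table above are written 5'→3', the orientation in which the oligos would be ordered. A reverse primer is therefore the reverse complement of the top-strand template stretch it covers. The short sketch below, using a hypothetical `reverse_complement` helper for uppercase DNA, recovers that template stretch from a reverse primer; Biopython's `Seq.reverse_complement()` does the same job.

```python
# Map each base to its Watson-Crick partner (uppercase DNA only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


reverse_primer = "CGACCGTCAGTTGAACTCCA"  # PRIMER_RIGHT_0_SEQUENCE above
print(reverse_complement(reverse_primer))  # TGGAGTTCAACTGACGGTCG
```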


```python
def designPRI(our_id, include_region):
    """Fetch a GenBank sequence and design primer pairs for it with primer3.

    our_id         -- GenBank accession ID, e.g. 'NC_001512.1'
    include_region -- [start, length] of the region where primers may be placed
                      (0-based start)
    Returns a DataFrame with one row per primer pair returned by primer3.
    """
    # Fetch the GenBank record for the accession ID
    handle = Entrez.efetch(db="nucleotide", id=our_id, rettype="gb", retmode="text")
    gb_record = SeqIO.read(handle, "genbank")
    handle.close()

    # Keep only the nucleotide sequence, as a plain string
    SEQUENCE_TEMPLATE = str(gb_record.seq)
    print(SEQUENCE_TEMPLATE)

    # Sequence-specific arguments ("seq_args") for primer3
    seq_args = {
        'SEQUENCE_ID': our_id,
        'SEQUENCE_TEMPLATE': SEQUENCE_TEMPLATE,
        'SEQUENCE_INCLUDED_REGION': include_region,
    }

    # Global primer-design settings ("primer_params")
    primer_params = {
        'PRIMER_OPT_SIZE': 20,
        'PRIMER_MIN_SIZE': 15,
        'PRIMER_MAX_SIZE': 25,
        'PRIMER_OPT_TM': 60.0,
        'PRIMER_MIN_TM': 50.0,
        'PRIMER_MAX_TM': 63.0,
        'PRIMER_MIN_GC': 20.0,
        'PRIMER_MAX_GC': 80.0,
        'PRIMER_MAX_POLY_X': 3,
    }

    # Run primer3; the results come back as one flat dictionary
    primers = primer3.bindings.design_primers(seq_args, primer_params)

    # Build one row per pair actually returned (primer3 may return fewer than five).
    # primer3 reports both the forward and the reverse primer 5'->3'.
    rows = []
    for i in range(primers['PRIMER_PAIR_NUM_RETURNED']):
        rows.append({
            'Primer forward': f'PRIMER_LEFT_{i}_SEQUENCE',
            "Forward sequence (5'->3')": primers[f'PRIMER_LEFT_{i}_SEQUENCE'],
            'Primer reverse': f'PRIMER_RIGHT_{i}_SEQUENCE',
            "Reverse sequence (5'->3')": primers[f'PRIMER_RIGHT_{i}_SEQUENCE'],
        })
    return pd.DataFrame(rows)
```

```python
# GenBank accession ID of the sequence for which you want to design primers
our_id = 'NC_001512.1'  # https://www.ncbi.nlm.nih.gov/nuccore/NC_001512.1
# Region of the sequence where primers may be placed, as [start, length] (0-based start)
include_region = [36, 3000]
designPRI(our_id, include_region)
```

ATAGCTGCGTGATACACACACGCAGCTTACGGGTTTCATACTGCTCTACTCTGCATTGCAAGAGATTAAAGTACCCATCATGGATTCAGTGTATGTAGACATAGATGCTGACAGCGCGTTTCTGAAGGCGTTGCAGCAAGCATACCCCATGTTTGAGGTGGAACCAAAGCAGGTCACGCCAAATGACCATGCAAACGCTAGAGCATTTTCGCATCTAGCAATAAAACTGATAGAGCAGGAAATTGATCCAGACTCAACCATTCTAGACATTGGTAGCGCACCAGCTAGGAGGATGATGTCTGATAGAAAATACCACTGCGTCTGCCCGATGCGCAGCGCAGAAGACCCTGAGAGGCTCGCGAATTACGCGAGAAAACTTGCGTCAGCCGCTGGAAAGGTGACAGATAAAAACATCTCCGGAAAAATTAATGATCTACAAGCTGTGATGGCCGTACCGAATATGGAAACATCCACATTCTGCCTACACACTGATGCTACATGCAAACAAAGAGGAGACGTCGCCATTTATCAAGACGTCTACGCCGTCCATGCACCTACCTCGCTGTACCATCAGGCGATTAAAGGAGTCCGCGTGGCATACTGGATAGGGTTCGATACGACACCTTTCATGTACAATGCAATGGCTGGCGCATACCCATCATATTCAACAAACTGGGCTGATGAGCAGGTACTGAAAGCTAAGAACATAGGGCTGTGTTCAACAGACCTATCTGAGGGTAGACGAGGCAAACTATCCATCATGAGAGGCAAAAAATTGAAGCCATGCGACCGAGTGCTATTCTCGGTCGGCTCAACACTCTACCCTGAAAGTCGTAAACTTCTACAAAGCTGGCATTTACCATCGGTATTTCATCTGAAGGGTAAACTCAGCTTCACCTGCCGCTGTGACACGATCGTCTCATGCGAAGGATACGTTGTCAAGAGAGTGACCATGAGTCCAGGCATCTACGGAAAGACATCGGGGTATGCTGTAACTCAT

| | Primer forward | Forward sequence (5'->3') | Primer reverse | Reverse sequence (5'->3') |
|---|---|---|---|---|
| 0 | PRIMER_LEFT_0_SEQUENCE | TCTGAGGGTAGACGAGGCAA | PRIMER_RIGHT_0_SEQUENCE | TCCTTCGCATGAGACGATCG |
| 1 | PRIMER_LEFT_1_SEQUENCE | TCTGAGGGTAGACGAGGCAA | PRIMER_RIGHT_1_SEQUENCE | TCCGTAGATGCCTGGACTCA |
| 2 | PRIMER_LEFT_2_SEQUENCE | TGCACGTACAGTCGACTCAC | PRIMER_RIGHT_2_SEQUENCE | CGCGGTCTAACCATGGCTAT |
| 3 | PRIMER_LEFT_3_SEQUENCE | CACGCCAAATGACCATGCAA | PRIMER_RIGHT_3_SEQUENCE | ATCTGTCACCTTTCCAGCGG |
| 4 | PRIMER_LEFT_4_SEQUENCE | CGTCAGACCTTGTTGTCGGA | PRIMER_RIGHT_4_SEQUENCE | CATCGTATGCTTCAACCGCG |
