# Designing primers
This notebook illustrates the various functions contained in the `designer` class, which can be used to generate primers for extension PCR and DNA assembly.

In [1]:
import plasmid as pge
import importlib
importlib.reload(pge)

<module 'plasmid' from '/home/peguin/python/lib/python3.9/site-packages/plasmid/__init__.py'>

<a class="anchor" id="xtPCR"></a>
## Extension PCR
[Extension PCR](https://en.wikipedia.org/wiki/Overlap_extension_polymerase_chain_reaction) uses primers with overlapping annealing sites to assemble a larger DNA fragment. This method is useful for adding promoters, ribosome binding sites, or cloning overhangs onto a larger gene fragment. 

The `Designer` class has the following parameters and functions associated with extension PCR.

In [2]:
pcr = pge.Designer()
help(pcr.xtPCR)
print('parameters for extension PCR')
print(pcr.params['xtPCR'])

Help on method xtPCR in module plasmid.designer:

xtPCR(fL, seq, fR=None, padding=[2, 2], niter=3, w=[10, 100, 1, 1, 2], get_cost=False) method of plasmid.designer.Designer instance
    Find primers which can seed and extend a PCR fragment
    fL = flanking sequence on 5' end
    seq = sequence on 3' end which gets amplified
    fR = flanking sequence on 3' end
    padding = number of extra primers to try
    w = weights for cost function
    method = optimization method
    returns list of primers

parameters for extension PCR
{'Tm': 50, 'len': [15, 60], 'nM': [20, 500], 'Na': 50}


`Tm` is the target melting temperature for all primers.

`len` is min and max primer lengths allowed in the design.

`nM` is the concentration of seed and finishing primers used in the reaction.

`Na` is the salt concentration of the PCR reaction.

These parameters are used to compute the thermodynamic properties of the PCR reaction such as annealing temperature. In the backend, several optimization algorithms can be used to find the optimal primer sequences that minimize deviation from the target annealing temperature and off target binding.

<a class="anchor" id="running_xtPCR"></a>
### Running extension PCR
The following code blocks generate a generic gene fragment from RFP, RBS, and pLac promoters. Extension PCR is used to add the pLac promoter and ribosome binding site sequences onto the RFP gene.

In [3]:
# slice out the RFP gene
RFP = pge.read_genbank('../data/dcas9_RFP.gb')
RFP = RFP[RFP['locus_tag'].str.contains('mRFP')].splice()
# slice out the ribosome binding site
RBS = pge.read_genbank('../data/xRFP.gb')
RBS = RBS[RBS['locus_tag'].str.contains('BBa_B0034')].splice()
# slice out the promoter
pLac = pge.read_genbank('../data/xRFP.gb')
pLac = pLac[pLac['locus_tag'].str.contains('pLac')].splice()

# assemble the promoter, rbs, and mRFP
df = pLac + 'gagacc' + RBS + 'ggtctc' + RFP

reading  ../data/dcas9_RFP.gb  as genbank file
reading  ../data/xRFP.gb  as genbank file
reading  ../data/xRFP.gb  as genbank file


In [4]:
pcr = pge.Designer()
pcr.params['xtPCR']['Tm'] = 55         # target annealing temperature for xtPCR
pcr.params['xtPCR']['len'] = [15, 60]  # defines the [min, max] primer lengths
pcr.params['xtPCR']['Nm'] = [25,500]   # defines the seed and finishing primer concentration in nM
pcr.params['verbose'] = False

insert = pLac + 'gagacc' + RBS + 'ggtctc'
res = pcr.xtPCR(insert, RFP, ' ')
print(res)

running fwd


running rev


  locus_tag         Tm                                           sequence  \
0       0_F  56.437456  ACAgagaccAAAGAGGAGAAAggtctc ATGGCGAGTAGCGAAGAC...   
1     fin_F  55.207985  AATTGACAATGTGAGCGAGTAACAAGATACTGAGC ACAgagaccA...   
0     fin_R  56.777386                                 TTAAGCACCGGTGGAGTG   

                   annealed  strand  
0  ATGGCGAGTAGCGAAGACGTTATC       1  
1     ACAgagaccAAAGAGGAGAAA       1  
0        TTAAGCACCGGTGGAGTG      -1  


In the above output dataframe, `fin_F` and `fin_R` are the finishing primers (used at 500nM), which are the last set of primers to amplify in the PCR reaction. `0_F` is a forward primer (used at 25nM) associated with seeding and extending the sequence from the 5' end of the oligo.

In an extension PCR reaction, all these primers would be mixed together to generate the desired oligo at the target annealing tempertaure.

<a class="anchor" id="gibson"></a>
## Gibson Assembly
[Gibson Assembly](https://www.neb.com/applications/cloning-and-synthetic-biology/dna-assembly-and-cloning/nebuilder-hifi-dna-assembly) uses exonucleases to reveal basepair overlaps between multiple DNA oligos. These overlaps can anneal and assemble into a longer DNA fragment. The following shows how to design primers for amplifying individual DNA fragments that can later be assembled into a longer construct. 

In the design process, optimal overlaps are first found via optimization. Next [xtPCR](#xtPCR) is used to design primers which can add the proper overlaps to the DNA oligos.

In [5]:
pcr = pge.Designer()
help(pcr.Gibson)
print(pcr.params['gibson'])

Help on method Gibson in module plasmid.designer:

Gibson(seqlist, w=[10, 1], method='differential_evolution', circular=True) method of plasmid.designer.Designer instance
    Design primers for gibson assembly
    seqlist = list of sequences to assemble via gibson in order 
    circular = assemble fragments into a circular construct
    returns list of primers

{'Tm': 50, 'nM': 500, 'len': 30, 'window': 40}


`Tm` is the target melting temperature of the overlaps.

`nM` is the concentration of the oligos.

`len` is the overlap length.

`window` is the window of sequences from the fragment edges from which to check for optimal overlaps.

These parameters are used to compute the annealing temperature of the overlap such that the target annealing temperature can be reached and overlaps minimize off target binding.

### Running gibson assembly
The following code blocks generate a new construct from LacI, RFP, and plasmid vector. Gibson is used to generate primers and oligos with optimal annealing overlaps.

In [6]:
def get_parts():
    # slice out the LacI gene
    LacI = pge.read_genbank('../data/xRFP.gb')
    LacI = LacI[LacI['locus_tag'].str.contains('LacI')].splice()

    # slice out the RFP gene
    RFP = pge.read_genbank('../data/dcas9_RFP.gb')
    RFP = RFP[RFP['locus_tag'].str.contains('mRFP')].splice()

    # slice out the origin of replication
    df = pge.read_genbank('../data/xRFP.gb')
    vec = df[df['locus_tag'].str.contains('pSC101')]
    start = vec['start'][0]
    stop = vec['end'][0]
    vec = df[start:stop]
    return LacI, RFP, vec

In [7]:
# Generate the parts which need to be combined
LacI, RFP, vec = get_parts()
seq = []
seq+= [[' ',LacI,'AAAActttt']] # add the LacI gene with 3' flanking sequences
seq+= [[' ',RFP,'CGCCctttt']]  # add the RFP gene with 3' flanking sequences 
seq+= [['',vec,'']]  # add the vector

pcr = pge.Designer()
pcr.params['gibson']['Tm'] = 50     # target annealing temperature of gibson fragments    
pcr.params['gibson']['window'] = 30 # +/i window in bp around frag edges to look for gibson overlap
pcr.params['gibson']['len'] = 20    # length of gibson overlap

pcr.params['xtPCR']['Tm'] = 55         # target annealing temperature for xtPCR
pcr.params['xtPCR']['len'] = [15, 60]  # defines the [min, max] primer lengths
pcr.params['xtPCR']['nM'] = [20, 500]  # defines the [seed, finisher] primer conc in nM
pcr.params['verbose'] = False

res = pcr.Gibson(seq)
print(res)

reading  ../data/xRFP.gb  as genbank file
reading  ../data/dcas9_RFP.gb  as genbank file
reading  ../data/xRFP.gb  as genbank file


res.x [11.51496424  1.67750846 15.96703809]
res.fun -54.0
exclude: []
overlaps: ['CGGGCAGTAAAAAActttt ', 'GTCACTCCACCGGTGCTTAA', 'GATCGACAATGTAAC ATGG']
Tm overlap: [45.828042650749126, 53.47392347753578, 45.13818370718934]
processing primers for frag 0
running fwd


running rev


processing primers for frag 1
running fwd


running rev


parasail_sg_flags_scan_avx2_256_16: s2Len must be > 0


RuntimeError: The map-like callable must be of the form f(func, iterable), returning a sequence of numbers the same length as 'iterable'

In the above output, `fin_F` and `fin_R` are the finishing primers (used at 500nM) for each gibson fragment. These are used independently in their own extension PCR reaction. After amplification, the PCR products are combined with gibson reaction mix to join the DNA oligos.

seq0, seq1, and seq2 are full length fragments with the overlaps which can be order as gene fragments from vendors like Twist or IDT.

<a class="anchor" id="ggate"></a>
## Goldengate Assembly
[Goldengate](https://www.neb.com/applications/cloning-and-synthetic-biology/dna-assembly-and-cloning/golden-gate-assembly) assembly is another method of DNA assembly which relies on restriction enzymes to generate overlaps that can be annealed and ligated. This assembly process is more efficient than gibson assembly, but requires that the DNA oligos do not contain the restriction enzyme sites. Only the edges of the DNA fragments can contain the restriction sites.

The goal of primer design is to find a set of primers which can artificially add restriction enzyme sites and overlaps to a set of DNA oligos that need to be assembled.

In [None]:
pcr = pge.Designer()
help(pcr.GoldenGate)
print(pcr.params['goldengate'])

`Tm` is the target melting temperature of the overlaps.

`nM` is the concentration of the oligos or DNA fragments.

`padding` the extra nucleotide sequence that needs to be added to the 5' and 3' ends of the oligo after the restriction site so the enzymes can cut DNA properly.

`ggsite` is the restriction enzyme to use.


`ggN` is the overlap length generated by the restriction enzyme.

`window` is the window of sequences from the fragment edges from which to check for optimal overlaps.

These parameters are used to compute the annealing temperature of the overlap and primers such that oligos with the proper goldengate sites and overlaps can be generated for cloning.

### Running golden gate assembly
The following code blocks generate a new construct from LacI, RFP, and plasmid vector. Goldengate is used to generate primers that add goldengate sites to the flanks of the oligos. Optimization works by first finding optimal overlaps for annealing the oligos. xtPCR is then used to add the proper goldengate sites to the oligos.

In [10]:
LacI, RFP, vec = get_parts()
seq = []
seq+= [['',LacI,'AAAActttt']]
seq+= [['',RFP,'CGCCctttt']]
seq+= [['',vec,'GGGGctttt']]

pcr = pge.Designer()
pcr.params['goldengate']['window'] = 20 # +/i window in bp around frag edges to look for overlap
pcr.params['goldengate']['ggN'] = 4     # length of golden gate overlap
pcr.params['goldengate']['ggsite'] = 'GGTCTCc'     # golden gate enzyme site
pcr.params['goldengate']['padding'] = 'atatatatgg' # padding around the golden gate site
pcr.params['xtPCR']['len'] = [15, 60]  # defines the [min, max] primer lengths
pcr.params['xtPCR']['nM'] = [20, 500] # defines the [seed, finisher] primer conc in nM
pcr.params['xtPCR']['Tm'] = 55 # defines the [seed, finisher] primer conc in nM

res = pcr.GoldenGate(seq)
print(res)

reading  ../data/xRFP.gb  as genbank file
reading  ../data/dcas9_RFP.gb  as genbank file
reading  ../data/xRFP.gb  as genbank file
res.x [10.34275508  4.10592271  9.27857462]
res.fun -12.0
exclude: []
overlaps: ['AAAA', 'TGCT', 'ACGG']
Tm overlap: [-79.98814282127228, -58.96562554863294, -46.79908254371483]
processing primers for frag 0
running fwd
running rev


  df = fun(x) - f0


processing primers for frag 1
running fwd
running rev
processing primers for frag 2
running fwd
running rev
     locus_tag         Tm                                           sequence   
0  frag0_fin_F  55.851352  atatatatggGGTCTCcACGGGGctttt ATGGTGAATGTGAAACC...  \
1  frag0_fin_R  56.106442             atatatatggGGTCTCcTTT TTACTGCCCGCTTTCCA   
2  frag1_fin_F  55.335316      atatatatggGGTCTCcAAAAActttt ATGGCGAGTAGCGAAGA   
3  frag1_fin_R  56.119961                  atatatatggGGTCTCc AGCACCGGTGGAGTG   
4  frag2_fin_F  55.363272  atatatatggGGTCTCcTGCTTAACGCCctttt CTGTCAGACCAA...   
5  frag2_fin_R  54.627626        atatatatggGGTCTCcCC GTTACATTGTCGATCTGTTCATG   
6         seq0        NaN  atatatatggGGTCTCcACGGGGcttttATGGTGAATGTGAAACCA...   
7         seq1        NaN  atatatatggGGTCTCcAAAAActtttATGGCGAGTAGCGAAGACG...   
8         seq2        NaN  atatatatggGGTCTCcTGCTTAACGCCcttttCTGTCAGACCAAG...   

                  annealed  strand  
0  ATGGTGAATGTGAAACCAGTAAC     1.0  
1        TTACTGCC

  df = fun(x) - f0


In the above output, `fin_F` and `fin_R` are the finishing primers (used at 500nM) for each goldengate fragment. These primers add BsaI sites to each gene fragment. These primers are used independently in their own extension PCR reaction. After amplification, the PCR products are combined with the golden gate enzyme and ligase to join the DNA oligos.

seq0, seq1, and seq2 are full length fragments with the overlaps which can be order as gene fragments from vendors like Twist or IDT.