# Demo for Python gene editor

In [1]:
import gene_editor as pge

## Manipulating and displaying genbank files
### Reading genbank or fasta files

GenBank, FASTA, and plain-text files can be read with the `plasmid` dataframe class. Its `read` method is analogous to `read_csv` in pandas. The class uses Biopython in the backend to store GenBank information such as the sequence and its genomic features or tags. The underlying Biopython data is presented in a more user-friendly format for splicing, sorting, and concatenating, much like a pandas dataframe.

In [2]:
df = pge.plasmid.read('xRFP.gb')
df

reading xRFP.gb as genbank file


<class 'gene_editor.plasmid'> at 0x7f35cc64a3b0
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
                              locus_tag          type        location  length  \
0                         AmpR promoter      promoter       [0:29](+)      29   
1                              AmpR RBS           RBS      [29:70](+)      41   
2                          SpectomycinR           CDS     [70:868](+)     798   
3                             BBa_B0053    terminator    [878:948](+)      70   
4   J23108, 0.51, Constitutive Promoter      promoter  [1032:1067](+)      35   
5          RBS 1.00 strength, BBa_B0034           RBS  [1086:1098](+)      12   
6                       LacI, BBa_C0012           CDS  [1104:2196](+)    1092   
7                             Esp3I_fix  misc_feature  [2072:2078](+)       6   
8         Forward Terminator, BBa_B1002

The raw sequence with color annotations can be displayed with `.print()`. The dataframe can also be directly cast as a string to obtain the raw sequence.

In [3]:
df.print()

[38;2;15;127;254mTTCAAATATCTATCCGCTCATGAGACAAT[39m[38;2;0;255;255mAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAAT[39m[38;2;212;0;55mATGAGTGAAAAAGTGCCCGCCGAGATTTCGGTGCAACTATCACAAGCACTCAACGTCATCGGGCGCCACTTGGAGTCGACGTTGCTGGCCGTGCATTTGTACGGCTCCGCACTGGATGGCGGATTGAAACCGTACAGTGATATTGATTTGCTGGTGACTGTAGCTGCACCGCTCAATGATGCCGTGCGGCAAGCCCTGCTCGTCGATCTCTTGGAGGTTTCAGCTTCCCCTGGCCAAAACAAGGCACTCCGCGCCTTGGAAGTGACCATCGTCGTGCACAGTGACATCGTACCTTGGCGTTATCCGGCCAGGCGGGAACTGCAGTTCGGAGAGTGGCAGCGCAAAGACATCCTTGCGGGCATCTTCGAGCCCGCCACAACCGATTCTGACTTGGCGATTCTGCTAACAAAGGCAAAGCAACATAGCGTCGTCTTGGCAGGTTCAGCAGCGAAGGATCTCTTCAGCTCAGTCCCAGAAAGCGATCTATTCAAGGCACTGGCCGATACTCTGAAGCTATGGAACTCGCCGCCAGATTGGGCGGGCGATGAGCGGAATGTAGTGCTTACTTTGTCTCGTATCTGGTACACCGCAGCAACCGGCAAGATCGCGCCAAAGGATGTTGCTGCCACTTGGGCAATGGCACGCTTGCCAGCTCAACATCAGCCCATCCTGTTGAATGCCAAGCGGGCTTATCTTGGGCAAGAAGAAGATTATTTGCCCGCTCGTGCGGATCAGGTGGCGGCGCTCATTAAATTCGTGAAGTATGAAGCAGTTAAACTGCTTGGTGCCAGCCAATGA[39mTAATACTAGC[38;2;157;27;28mTCCGGCAAAAAAACGGGCAAGGTGTCACCACCCTGCCCT

The underlying sequence can be sliced like a string.

In [4]:
df[0:2411]

<class 'gene_editor.plasmid'> at 0x7f35cc4d74f0
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
                              locus_tag          type        location  length  \
0                         AmpR promoter      promoter       [0:29](+)      29   
1                              AmpR RBS           RBS      [29:70](+)      41   
2                          SpectomycinR           CDS     [70:868](+)     798   
3                             BBa_B0053    terminator    [878:948](+)      70   
4   J23108, 0.51, Constitutive Promoter      promoter  [1032:1067](+)      35   
5          RBS 1.00 strength, BBa_B0034           RBS  [1086:1098](+)      12   
6                       LacI, BBa_C0012           CDS  [1104:2196](+)    1092   
7                             Esp3I_fix  misc_feature  [2072:2078](+)       6   
8         Forward Terminator, BBa_B1002

In [238]:
df[0:2411].print()

[38;2;15;127;254mTTCAAATATCTATCCGCTCATGAGACAAT[39m[38;2;0;255;255mAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAAT[39m[38;2;212;0;55mATGAGTGAAAAAGTGCCCGCCGAGATTTCGGTGCAACTATCACAAGCACTCAACGTCATCGGGCGCCACTTGGAGTCGACGTTGCTGGCCGTGCATTTGTACGGCTCCGCACTGGATGGCGGATTGAAACCGTACAGTGATATTGATTTGCTGGTGACTGTAGCTGCACCGCTCAATGATGCCGTGCGGCAAGCCCTGCTCGTCGATCTCTTGGAGGTTTCAGCTTCCCCTGGCCAAAACAAGGCACTCCGCGCCTTGGAAGTGACCATCGTCGTGCACAGTGACATCGTACCTTGGCGTTATCCGGCCAGGCGGGAACTGCAGTTCGGAGAGTGGCAGCGCAAAGACATCCTTGCGGGCATCTTCGAGCCCGCCACAACCGATTCTGACTTGGCGATTCTGCTAACAAAGGCAAAGCAACATAGCGTCGTCTTGGCAGGTTCAGCAGCGAAGGATCTCTTCAGCTCAGTCCCAGAAAGCGATCTATTCAAGGCACTGGCCGATACTCTGAAGCTATGGAACTCGCCGCCAGATTGGGCGGGCGATGAGCGGAATGTAGTGCTTACTTTGTCTCGTATCTGGTACACCGCAGCAACCGGCAAGATCGCGCCAAAGGATGTTGCTGCCACTTGGGCAATGGCACGCTTGCCAGCTCAACATCAGCCCATCCTGTTGAATGCCAAGCGGGCTTATCTTGGGCAAGAAGAAGATTATTTGCCCGCTCGTGCGGATCAGGTGGCGGCGCTCATTAAATTCGTGAAGTATGAAGCAGTTAAACTGCTTGGTGCCAGCCAATGA[39mTAATACTAGC[38;2;157;27;28mTCCGGCAAAAAAACGGGCAAGGTGTCACCACCCTGCCCT

Plasmids are circular, not linear, so any linear view of a plasmid starts from a chosen origin or reference position. A new reference position can be set with `.set_origin`.

In [5]:
x = df[0:2411]
print(x.__repr__())
x.print()

<class 'gene_editor.plasmid'> at 0x7f35cc36caf0
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
                              locus_tag          type        location  length  \
0                         AmpR promoter      promoter       [0:29](+)      29   
1                              AmpR RBS           RBS      [29:70](+)      41   
2                          SpectomycinR           CDS     [70:868](+)     798   
3                             BBa_B0053    terminator    [878:948](+)      70   
4   J23108, 0.51, Constitutive Promoter      promoter  [1032:1067](+)      35   
5          RBS 1.00 strength, BBa_B0034           RBS  [1086:1098](+)      12   
6                       LacI, BBa_C0012           CDS  [1104:2196](+)    1092   
7                             Esp3I_fix  misc_feature  [2072:2078](+)       6   
8         Forward Terminator, BBa_B1002

In [6]:
x = x.set_origin(1032)
print(x.__repr__())
x.print()

<class 'gene_editor.plasmid'> at 0x7f35cc36cc10
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
                              locus_tag          type        location  length  \
0   J23108, 0.51, Constitutive Promoter      promoter       [0:35](+)      35   
1          RBS 1.00 strength, BBa_B0034           RBS      [54:66](+)      12   
2                       LacI, BBa_C0012           CDS    [72:1164](+)    1092   
3                             Esp3I_fix  misc_feature  [1040:1046](+)       6   
4         Forward Terminator, BBa_B1002    terminator  [1164:1198](+)      34   
5   pLacO2, single operon pLac ZC082818      promoter  [1198:1236](+)      38   
6                                 LacO2  protein_bind  [1206:1223](+)      17   
7                  RFP targeting region  misc_binding  [1236:1256](+)      20   
8          spec/1_xRFPg1_0_xRFPg1_0_.gb
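
The coordinate arithmetic behind a change of origin can be sketched in plain Python. This is a minimal illustration and not `gene_editor`'s implementation: the helper `rotate_circular` and the tuple layout are hypothetical, and features that would span the new origin are not split.

```python
def rotate_circular(sequence, features, origin):
    """Rotate a circular sequence so that `origin` becomes position 0.

    `features` is a list of (name, start, end) tuples with half-open
    coordinates. Hypothetical helper, for illustration only.
    """
    n = len(sequence)
    rotated = sequence[origin:] + sequence[:origin]
    shifted = []
    for name, start, end in features:
        new_start = (start - origin) % n       # wrap around the circle
        new_end = new_start + (end - start)    # keep the feature length
        shifted.append((name, new_start, new_end))
    shifted.sort(key=lambda f: f[1])           # order by new start position
    return rotated, shifted


# Coordinates copied from the feature tables in this demo (total length 2411).
features = [
    ("BBa_B0053", 878, 948),
    ("J23108 promoter", 1032, 1067),
    ("LacI", 1104, 2196),
]
placeholder = "N" * 2411  # stand-in sequence; only its length matters here
_, moved = rotate_circular(placeholder, features, origin=1032)
for name, start, end in moved:
    print(f"{name}: [{start}:{end}]")
```

The shifted values agree with the tables printed in this demo: the J23108 promoter moves to `[0:35]`, LacI to `[72:1164]`, and BBa_B0053 wraps around to `[2257:2327]`.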

Genes can also be selected and sliced out of the dataframe via boolean selection, as in pandas.
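
As a rough mental model, selecting a feature and then calling `.slice()` behaves like filtering feature rows and extracting the matching substring with its coordinates re-based to zero. The sketch below uses plain dictionaries and a made-up toy sequence. `select` and `slice_feature` are hypothetical names and not part of `gene_editor`.

```python
def select(features, **criteria):
    """Return the features whose fields equal all given criteria."""
    return [f for f in features if all(f[k] == v for k, v in criteria.items())]


def slice_feature(sequence, feature):
    """Cut a feature out of `sequence` and re-base its coordinates to 0."""
    start, end = feature["start"], feature["end"]
    part = sequence[start:end]
    rebased = dict(feature, start=0, end=end - start)
    return part, rebased


# Toy construct (illustrative data, not taken from xRFP.gb).
sequence = "TTGACA" + "AAAGAGGAGAAA" + "ATGGTGAATTAA"
features = [
    {"locus_tag": "toy promoter", "type": "promoter", "start": 0, "end": 6},
    {"locus_tag": "toy RBS", "type": "RBS", "start": 6, "end": 18},
    {"locus_tag": "toy CDS", "type": "CDS", "start": 18, "end": 30},
]

cds = select(features, type="CDS")[0]
part, rebased = slice_feature(sequence, cds)
print(part, rebased["start"], rebased["end"])
```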

In [9]:
y = x[x['type']=='terminator']
print(y.__repr__())
y.print()

<class 'gene_editor.plasmid'> at 0x7f35cc385f00
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
                       locus_tag        type        location  length    color
0  Forward Terminator, BBa_B1002  terminator  [1164:1198](+)      34  #804000
1             S. pyog terminator  terminator  [1298:1339](+)      41  #804000
2  Forward Terminator, BBa_B1010  terminator  [1339:1379](+)      40  #804000
3                      BBa_B0053  terminator  [2257:2327](+)      70  #9d1b1c
total length:2411

CTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGCAAAAGAATTCAAAAGATCTAAAGAGGAGAAAGGATCTATGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTG

In [10]:
y = x[x['locus_tag'].str.contains('LacI')]
y = y.slice()
print(y.__repr__())
y.print()

<class 'gene_editor.plasmid'> at 0x7f35cc386ec0
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
         locus_tag type     location  length    color
0  LacI, BBa_C0012  CDS  [0:1092](+)    1092  #fc6665
total length:1092

[38;2;252;102;101mATGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGC

DNA sequences can be translated to amino acid sequences via `.translate()`.

In [11]:
y = x[x['locus_tag'].str.contains('LacI')].slice()
y.print()
y.translate()

[38;2;252;102;101mATGGTGAATGTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGCCAGCCACGTTTCTGCGAAAACGCGGGAAAAAGTGGAAGCGGCGATGGCGGAGCTGAATTACATTCCCAACCGCGTGGCACAACAACTGGCGGGCAAACAGTCGTTGCTGATTGGCGTTGCCACCTCCAGTCTGGCCCTGCACGCGCCGTCGCAAATTGTCGCGGCGATTAAATCTCGCGCCGATCAACTGGGTGCCAGCGTGGTGGTGTCGATGGTAGAACGAAGCGGCGTCGAAGCCTGTAAAGCGGCGGTGCACAATCTTCTCGCGCAACGCGTCAGTGGGCTGATCATTAACTATCCGCTGGATGACCAGGATGCCATTGCTGTGGAAGCTGCCTGCACTAATGTTCCGGCGTTATTTCTTGATGTCTCTGACCAGACACCCATCAACAGTATTATTTTCTCCCATGAAGACGGTACGCGACTGGGCGTGGAGCATCTGGTCGCATTGGGTCACCAGCAAATCGCGCTGTTAGCGGGCCCATTAAGTTCTGTCTCGGCGCGTCTGCGTCTGGCTGGCTGGCATAAATATCTCACTCGCAATCAAATTCAGCCGATAGCGGAACGGGAAGGCGACTGGAGTGCCATGTCCGGTTTTCAACAAACCATGCAAATGCTGAATGAGGGCATCGTTCCCACTGCGATGCTGGTTGCCAACGATCAGATGGCGCTGGGCGCAATGCGCGCCATTACCGAGTCCGGGCTGCGCGTTGGTGCGGATATCTCGGTAGTGGGATACGACGATACCGAAGACAGCTCATGTTATATCCCGCCGTTAACCACCATCAAACAGGATTTTCGCCTGCTGGGGCAAACCAGCGTGGACCGCTTGCTGCAACTCTCTCAGGGCCAGGCGGTGAAGGGCAATCAGCTGTTGCCCGTTTCACTGGTG

'MVNVKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQQLAGKQSLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSGVEACKAAVHNLLAQRVSGLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFSHEDGTRLGVEHLVALGHQQIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWSAMSGFQQTMQMLNEGIVPTAMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSSCYIPPLTTIKQDFRLLGQTSVDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTRTASPRALADSLMQLARQVSRLESGQ*'
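
For reference, translation with the standard genetic code can be sketched without any dependencies. This is an illustrative stand-in, not the code path `gene_editor` uses (which presumably delegates to Biopython). The function name `translate` and the choice of `X` for unknown codons are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 0; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    # Codons with ambiguous bases are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 24 bases of the LacI CDS shown above.
print(translate("ATGGTGAATGTGAAACCAGTAACG"))  # MVNVKPVT
```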

### Editing genes and generating new constructs
New genes and constructs can be generated by combining genetic parts via concatenation.

In [12]:
# slice out the RFP gene
RFP = pge.plasmid.read('dcas9_RFP.gb')
RFP = RFP[RFP['locus_tag'].str.contains('mRFP')].slice()
print(RFP.__repr__())
# slice out the ribosome binding site
RBS = pge.plasmid.read('xRFP.gb')
RBS = RBS[RBS['locus_tag'].str.contains('BBa_B0034')].slice()
print(RBS.__repr__())
# slice out the promoter
pLac = pge.plasmid.read('xRFP.gb')
pLac = pLac[pLac['locus_tag'].str.contains('pLac')].slice()
print(pLac.__repr__())

# assemble the promoter, rbs, and mRFP
df = pLac + 'gagacc' + RBS + 'ggtctc' + RFP
print(df.__repr__())
df.print()

reading dcas9_RFP.gb as genbank file
<class 'gene_editor.plasmid'> at 0x7f35cc4d7640
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
                         locus_tag type    location  length    color
0  mRFP, uniprot drFP583, pdb 2H5O  CDS  [0:678](+)     678  #800040
total length:678

reading xRFP.gb as genbank file
<class 'gene_editor.plasmid'> at 0x7f35cc64a0b0
molecule_type:DNA
topology:circular
data_file_division:   
date:05-DEC-2022
accessions:['<unknown', 'id>']
keywords:['']
source:
organism:. .
taxonomy:[]
comment:
ApEinfo:methylated:1
                      locus_tag type   location  length color
0  RBS 1.00 strength, BBa_B0034  RBS  [0:12](+)      12  cyan
total length:12

reading xRFP.gb as genbank file
<class 'gene_editor.plasmid'> at 0x7f35ccdeca90
molecule_type:DNA
topology:circular
data_file_d
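
The bookkeeping that concatenation implies, where each part's features are shifted by the length of everything placed before it, can be sketched as follows. `concatenate` is a hypothetical helper, and the mRFP sequence is replaced by a placeholder of the same length (678 bp, as reported above).

```python
def concatenate(parts):
    """Join (name, sequence) pairs; unnamed parts (name=None) act as spacers.

    Returns the joined sequence and a list of (name, start, end) features
    with half-open coordinates. Hypothetical helper, for illustration only.
    """
    sequence = ""
    features = []
    for name, seq in parts:
        if name is not None:
            features.append((name, len(sequence), len(sequence) + len(seq)))
        sequence += seq
    return sequence, features


parts = [
    ("pLacO2", "AATTGACAATGTGAGCGAGTAACAAGATACTGAGCACA"),
    (None, "gagacc"),
    ("RBS", "AAAGAGGAGAAA"),
    (None, "ggtctc"),
    ("mRFP", "N" * 678),  # placeholder with the length of the mRFP CDS
]
construct, features = concatenate(parts)
print(len(construct))
for name, start, end in features:
    print(f"{name}: [{start}:{end}]")
```

Under these assumptions the construct is 740 bp long, with the promoter at `[0:38]`, the RBS at `[44:56]`, and the CDS at `[62:740]`.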

## Annotating features and writing genbank files
New genes or features can be annotated with the `.annotate` function. The data can be written to GenBank files via `.to_genbank`.

In [246]:
help(df.annotate)

Help on method annotate in module gene_editor:

annotate(file='', label='', sequence='', pos=[], feature='unknown', color=['cyan', 'dodgerblue'], circular=True, inplace=False) method of gene_editor.plasmid instance
    Adds annotations to a plasmid using a parts library
    label = name of the gene
    sequence = DNA sequence of the feature, which will be used for matching on the plasmid
    pos = position of the genetic feature [start, end, strand]
    feature = type of genetic feature such as cds, mRNA, primer_bind
    color = [fwd_color, rev_color] to use
    If sequence is not provided, start and end are used. Otherwise, sequence overrides start, end, and strand options.
    inplace = performs modifications inplace
    returns a modified plasmid dataframe



In [13]:
print(df.__repr__())
df.print()

<class 'gene_editor.plasmid'> at 0x7f35cc384970
molecule_type:DNA
topology:circular
                             locus_tag      type     location  length    color
0  pLacO2, single operon pLac ZC082818  promoter    [0:38](+)      38  #008080
1         RBS 1.00 strength, BBa_B0034       RBS   [44:56](+)      12     cyan
2      mRFP, uniprot drFP583, pdb 2H5O       CDS  [62:740](+)     678  #800040
total length:740

[38;2;0;128;128mAATTGACAATGTGAGCGAGTAACAAGATACTGAGCACA[39mgagacc[38;2;0;255;255mAAAGAGGAGAAA[39mggtctc[38;2;128;0;64mATGGCGAGTAGCGAAGACGTTATCAAAGAGTTCATGCGTTTCAAAGTTCGTATGGAAGGTTCCGTTAACGGTCACGAGTTCGAAATCGAAGGTGAAGGTGAAGGTCGTCCGTACGAAGGTACCCAGACCGCTAAACTGAAAGTTACCAAAGGTGGTCCGCTGCCGTTCGCTTGGGACATCCTGTCCCCGCAGTTCCAGTACGGTTCCAAAGCTTACGTTAAACACCCGGCTGACATCCCGGACTACCTGAAACTGTCCTTCCCGGAAGGTTTCAAATGGGAACGTGTTATGAACTTCGAAGACGGTGGTGTTGTTACCGTTACCCAGGACTCCTCCCTGCAAGACGGTGAGTTCATCTACAAAGTTAAACTGCGTGGTACCAACTTCCCGTCCGA

In [14]:
df = df.annotate(label='BsaI', sequence='GGTCTC', color=['red','orange'], feature='protein_bind')
df = df.drop_duplicates()
print(df.__repr__())
df.print()
df.to_genbank('demo_RFP.gb')

<class 'gene_editor.plasmid'> at 0x7f35cc3d6d10
molecule_type:DNA
topology:circular
                             locus_tag          type     location  length  \
0  pLacO2, single operon pLac ZC082818      promoter    [0:38](+)      38   
1                                 BsaI  protein_bind   [38:44](-)       6   
2         RBS 1.00 strength, BBa_B0034           RBS   [44:56](+)      12   
3                                 BsaI  protein_bind   [56:62](+)       6   
4      mRFP, uniprot drFP583, pdb 2H5O           CDS  [62:740](+)     678   

     color  
0  #008080  
1   orange  
2     cyan  
3      red  
4  #800040  
total length:740

[38;2;0;128;128mAATTGACAATGTGAGCGAGTAACAAGATACTGAGCACA[39m[38;2;255;165;0mgagacc[39m[38;2;0;255;255mAAAGAGGAGAAA[39m[38;2;255;0;0mggtctc[39m[38;2;128;0;64mATGGCGAGTAGCGAAGACGTTATCAAAGAGTTCATGCGTTTCAAAGTTCGTATGGAAGGTTCCGTTAACGGTCACGAGTTCGAAAT

In [15]:
with open('demo_RFP.gb','r') as f:
    text = f.read()
print(text)

LOCUS       .                        740 bp    DNA     circular UNK 01-JAN-1980
DEFINITION  .
ACCESSION   <unknown id>
VERSION     <unknown id>
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     promoter        1..38
                     /locus_tag="pLacO2, single operon pLac ZC082818"
                     /ApEinfo_label="pLacO2, single operon pLac ZC082818"
                     /ApEinfo_fwdcolor="#008080"
                     /ApEinfo_revcolor="#008080"
     protein_bind    complement(39..44)
                     /locus_tag="BsaI"
                     /ApEinfo_fwdcolor="red"
                     /ApEinfo_revcolor="orange"
                     /ApEinfo_label="BsaI"
     RBS             45..56
                     /locus_tag="RBS 1.00 strength, BBa_B0034"
                     /ApEinfo_label="RBS 1.00 strength, BBa_B0034"
                     /ApEinfo_fwdcolor="cyan"
                     /ApEinfo_revcolor="cyan"
     protein_bind    57..
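
The result above shows `BsaI` annotated on both strands: the forward site `ggtctc` and its reverse complement `gagacc`. A case-insensitive two-strand search of that kind can be sketched as below. `find_sites` is a hypothetical helper, not the matching routine `.annotate` uses internally, and only the first 18 bases of the mRFP CDS are included.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif):
    """Return sorted (start, end, strand) hits of `motif` on both strands."""
    seq = sequence.upper()
    motif = motif.upper()
    queries = [(1, motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic site would otherwise be reported twice
        queries.append((-1, rc))
    hits = []
    for strand, query in queries:
        pos = seq.find(query)
        while pos != -1:
            hits.append((pos, pos + len(query), strand))
            pos = seq.find(query, pos + 1)
    return sorted(hits)


construct = (
    "AATTGACAATGTGAGCGAGTAACAAGATACTGAGCACA"  # pLacO2 promoter
    + "gagacc"
    + "AAAGAGGAGAAA"                           # RBS
    + "ggtctc"
    + "ATGGCGAGTAGCGAAGAC"                     # start of the mRFP CDS
)
print(find_sites(construct, "GGTCTC"))  # [(38, 44, -1), (56, 62, 1)]
```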

`.annotate` can also locate a feature from its amino acid sequence.
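
One way to picture a peptide search is to translate each forward reading frame and map a protein-level hit back to nucleotide coordinates. The sketch below does exactly that and nothing more. `find_peptide` is a hypothetical helper, the reverse strand is ignored, and how `gene_editor` performs the search internally is not shown in this demo.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate in frame 0, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def find_peptide(dna, peptide):
    """Return sorted half-open (start, end) DNA coordinates encoding `peptide`."""
    dna = dna.upper()
    hits = []
    for frame in range(3):  # forward strand only
        protein = translate(dna[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            hits.append((start, start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return sorted(hits)


# Two filler bases, the first six codons of the peptide region, one more codon.
dna = "GA" + "GACGGTGCTCTGAAAGGT" + "GAA"
print(find_peptide(dna, "DGALKG"))  # [(2, 20)]
```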

In [16]:
print(RFP.translate())

df = pLac + 'gagacc' + RBS + 'ggtctc' + RFP
df = df.annotate(label='peptide', sequence='DGALKGEIKMRLKLKDG', color='orange')
df = df.drop_duplicates()
df.print()

MASSEDVIKEFMRFKVRMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFQYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASTERMYPEDGALKGEIKMRLKLKDGGHYDAEVKTTYMAKKPVQLPGAYKTDIKLDITSHNEDYTIVEQYERAEGRHSTGA*
[38;2;0;128;128mAATTGACAATGTGAGCGAGTAACAAGATACTGAGCACA[39mgagacc[38;2;0;255;255mAAAGAGGAGAAA[39mggtctc[38;2;128;0;64mATGGCGAGTAGCGAAGACGTTATCAAAGAGTTCATGCGTTTCAAAGTTCGTATGGAAGGTTCCGTTAACGGTCACGAGTTCGAAATCGAAGGTGAAGGTGAAGGTCGTCCGTACGAAGGTACCCAGACCGCTAAACTGAAAGTTACCAAAGGTGGTCCGCTGCCGTTCGCTTGGGACATCCTGTCCCCGCAGTTCCAGTACGGTTCCAAAGCTTACGTTAAACACCCGGCTGACATCCCGGACTACCTGAAACTGTCCTTCCCGGAAGGTTTCAAATGGGAACGTGTTATGAACTTCGAAGACGGTGGTGTTGTTACCGTTACCCAGGACTCCTCCCTGCAAGACGGTGAGTTCATCTACAAAGTTAAACTGCGTGGTACCAACTTCCCGTCCGACGGTCCGGTTATGCAGAAAAAAACCATGGGTTGGGAAGCTTCCACCGAACGTATGTACCCGGAA[39m[38;2;255;165;0mGACGGTGCTCTGAAAGGTGAAATCAAAATGCGTCTGAAACTGAAAGACGGT[39m[38;2;128;0;64mGGTCACTACGACGCTGAAGTTAAAACCACCTACATGGCTAAAAAACCGGTTCAGCTGCCGGGTGCTTACAAAACCGACATCAAACTGGACATCACCTC



## Extension PCR
This library provides functionality for designing primers for cloning the DNA constructs using the `Design` class.

In [18]:
pcr = pge.Design()
help(pcr.xtPCR)

Help on method xtPCR in module gene_editor:

xtPCR(fL, seq, fR=None, padding=[2, 2], niter=3, w=[10, 100, 1, 1, 2], verbose=False, get_cost=False) method of gene_editor.Design instance
    Find primers which can seed and extend a PCR fragment
    fL = flanking sequence on 5' end
    seq = sequence on 3' end which gets amplified
    fR = flanking sequence on 3' end
    padding = number of extra primers to try
    w = weights for cost function
    method = optimization method
    returns list of primers



The following shows how to obtain extension PCR primers that add promoter and RBS sequences to the RFP gene.

In [19]:
# slice out the RFP gene
RFP = pge.plasmid.read('dcas9_RFP.gb')
RFP = RFP[RFP['locus_tag'].str.contains('mRFP')].slice()
# slice out the ribosome binding site
RBS = pge.plasmid.read('xRFP.gb')
RBS = RBS[RBS['locus_tag'].str.contains('BBa_B0034')].slice()
# slice out the promoter
pLac = pge.plasmid.read('xRFP.gb')
pLac = pLac[pLac['locus_tag'].str.contains('pLac')].slice()

# assemble the promoter, rbs, and mRFP
df = pLac + 'gagacc' + RBS + 'ggtctc' + RFP
print(df.__repr__())
df.print()

reading dcas9_RFP.gb as genbank file
reading xRFP.gb as genbank file
reading xRFP.gb as genbank file
<class 'gene_editor.plasmid'> at 0x7f35cc3f25f0
molecule_type:DNA
topology:circular
                             locus_tag      type     location  length    color
0  pLacO2, single operon pLac ZC082818  promoter    [0:38](+)      38  #008080
1         RBS 1.00 strength, BBa_B0034       RBS   [44:56](+)      12     cyan
2      mRFP, uniprot drFP583, pdb 2H5O       CDS  [62:740](+)     678  #800040
total length:740

[38;2;0;128;128mAATTGACAATGTGAGCGAGTAACAAGATACTGAGCACA[39mgagacc[38;2;0;255;255mAAAGAGGAGAAA[39mggtctc[38;2;128;0;64mATGGCGAGTAGCGAAGACGTTATCAAAGAGTTCATGCGTTTCAAAGTTCGTATGGAAGGTTCCGTTAACGGTCACGAGTTCGAAATCGAAGGTGAAGGTGAAGGTCGTCCGTACGAAGGTACCCAGACCGCTAAACTGAAAGTTACCAAAGGTGGTCCGCTGCCGTTCGCTTGGGACATCCTGTCCCCGCAGTTCCAGTACGGTTCCAAAGCTTACGTTAAACACCCGGCTGACATCCCGGACTACCTGAAACTGTCCTTCCCGGAAGGTTTCAAATGGGAACGTGTTATGAAC

In [21]:
pcr = pge.Design()
pcr.params['xtPCR']['Tm'] = 55         # target annealing temperature for xtPCR
pcr.params['xtPCR']['len'] = [15, 60]  # defines the [min, max] primer lengths

insert = pLac + 'gagacc' + RBS + 'ggtctc'
res = pcr.xtPCR(insert, RFP, ' ')
print(res)
print(res.values)

running fwd
running rev
  locus_tag         Tm                                           sequence  \
0       0_F  56.437456  ACAgagaccAAAGAGGAGAAAggtctc ATGGCGAGTAGCGAAGAC...   
1     fin_F  55.207985  AATTGACAATGTGAGCGAGTAACAAGATACTGAGC ACAgagaccA...   
0     fin_R  56.777386                                 TTAAGCACCGGTGGAGTG   

                   annealed  strand  
0  ATGGCGAGTAGCGAAGACGTTATC       1  
1     ACAgagaccAAAGAGGAGAAA       1  
0        TTAAGCACCGGTGGAGTG      -1  
[['0_F' 56.43745579420619
  'ACAgagaccAAAGAGGAGAAAggtctc ATGGCGAGTAGCGAAGACGTTATC'
  'ATGGCGAGTAGCGAAGACGTTATC' 1]
 ['fin_F' 55.207984641427004
  'AATTGACAATGTGAGCGAGTAACAAGATACTGAGC ACAgagaccAAAGAGGAGAAA'
  'ACAgagaccAAAGAGGAGAAA' 1]
 ['fin_R' 56.777386483231 '  TTAAGCACCGGTGGAGTG' 'TTAAGCACCGGTGGAGTG' -1]]
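
To see how the two forward primers build up the flanking sequence, the extension can be replayed on strings. The sketch assumes that the space in each primer's `sequence` separates the 5' overhang from the annealing region, which is inferred from the `annealed` column above rather than documented. `extend_with_primers` is a hypothetical helper, and only the first 30 bases of the mRFP CDS serve as the template.

```python
def extend_with_primers(template, forward_primers):
    """Apply forward primers in order, each adding its overhang to the 5' end."""
    product = template
    for primer in forward_primers:
        overhang, annealing = primer.split(" ")
        if not product.upper().startswith(annealing.upper()):
            raise ValueError(f"primer does not anneal at the 5' end: {primer}")
        product = overhang + product
    return product


template = "ATGGCGAGTAGCGAAGACGTTATCAAAGAG"  # start of the mRFP CDS
primers = [
    "ACAgagaccAAAGAGGAGAAAggtctc ATGGCGAGTAGCGAAGACGTTATC",       # 0_F (seed)
    "AATTGACAATGTGAGCGAGTAACAAGATACTGAGC ACAgagaccAAAGAGGAGAAA",  # fin_F (finisher)
]
product = extend_with_primers(template, primers)

# The intended insert: pLacO2 promoter + gagacc + RBS + ggtctc.
insert = "AATTGACAATGTGAGCGAGTAACAAGATACTGAGCACA" + "gagacc" + "AAAGAGGAGAAA" + "ggtctc"
print(product == insert + template)  # True
```

The seed primer anneals to the gene and adds the RBS-proximal part of the flank. The finisher then anneals to that newly added sequence and completes the promoter.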


## Gibson assembly
The following shows how to design primers for Gibson assembly.

In [254]:
help(pcr.Gibson)

Help on method Gibson in module gene_editor:

Gibson(seqlist, w=[10, 1], method='differential_evolution', circular=True) method of gene_editor.Design instance
    Design primers for gibson assembly
    seqlist = list of sequences to assemble via gibson in order 
    circular = assemble fragments into a circular construct
    returns list of primers



In [255]:
# slice out the LacI gene
LacI = pge.plasmid.read('xRFP.gb')
LacI = LacI[LacI['locus_tag'].str.contains('LacI')].slice()

# slice out the RFP gene
RFP = pge.plasmid.read('dcas9_RFP.gb')
RFP = RFP[RFP['locus_tag'].str.contains('mRFP')].slice()

# slice out the origin of replication
df = pge.plasmid.read('xRFP.gb')
vec = df[df['locus_tag'].str.contains('pSC101')]
start = vec['start'][0]
stop = vec['end'][0]
vec = df[start:stop]

reading xRFP.gb as genbank file
reading dcas9_RFP.gb as genbank file
reading xRFP.gb as genbank file


In [256]:
seq = []
seq+= [[' ',LacI,'AAAActttt']]
seq+= [[' ',RFP,'CGCCctttt']]
seq+= [[' ',vec,'GGGGctttt']]

pcr = pge.Design()
pcr.params['gibson']['Tm'] = 50     # target annealing temperature of gibson fragments    
pcr.params['gibson']['window'] = 30 # +/- window in bp around frag edges to look for gibson overlap
pcr.params['gibson']['len'] = 20    # length of gibson overlap

pcr.params['xtPCR']['Tm'] = 55         # target annealing temperature for xtPCR
pcr.params['xtPCR']['len'] = [15, 60]  # defines the [min, max] primer lengths
pcr.params['xtPCR']['nM'] = [20, 500] # defines the [seed, finisher] primer conc in nM

res = pcr.Gibson(seq)
print(res)

Invalid input provided for ggsite
res.x [10.17460988 14.62532845 27.85966033]
res.fun -57.0
exclude: []
overlaps: ['GCGGGCAGTAAAAAActttt', 'TGCTTAACGCCctttt CTG', 'ttt ATGGTGAATGTGAAAC']
Tm overlap: [49.290031925644485, 49.73623253267522, 42.289058891654236]
processing primers for frag 0
running fwd
running rev
processing primers for frag 1
running fwd
running rev
processing primers for frag 2
running fwd
running rev
     locus_tag         Tm                                           sequence  \
0  frag0_fin_F  55.851352                       ttt  ATGGTGAATGTGAAACCAGTAAC   
1  frag0_fin_R  56.106442                        aaaagTTTT TTACTGCCCGCTTTCCA   
2  frag1_fin_F  55.335316            GCGGGCAGTAAAAAActttt  ATGGCGAGTAGCGAAGA   
3  frag1_fin_R  56.777386                   CAG aaaagGGCG TTAAGCACCGGTGGAGTG   
4  frag2_fin_F  55.363272            TGCTTAACGCCctttt  CTGTCAGACCAAGTTTACGAG   
5  frag2_fin_R  54.627626  GTTTCACATTCACCAT aaaagCCCC GTTACATTGTCGATCTGTT...   
6         seq0     
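
The reported overlaps are fixed-length windows positioned near each fragment junction. Extracting such a window can be sketched as below. This shows only the windowing step, not the optimizer that chooses the position. `overlap_at` and its `shift` parameter are hypothetical, and the left fragment is assumed to be the tail of LacI followed by its right flank `AAAActttt`.

```python
def overlap_at(left, right, length=20, shift=0):
    """Return a `length`-bp window around the left/right junction.

    shift=0 centers the window on the junction; negative values move it
    into `left`, positive values into `right`.
    """
    joined = left + right
    start = len(left) - length // 2 + shift
    end = start + length
    if start < 0 or end > len(joined):
        raise ValueError("window extends past the available sequence")
    return joined[start:end]


left = "TGGAAAGCGGGCAGTAA" + "AAAActttt"  # end of LacI + right flank
right = "ATGGCGAGTAGCGAAGA"               # start of mRFP
print(overlap_at(left, right, length=20, shift=-10))  # GCGGGCAGTAAAAAActttt
```

With `shift=-10` the window ends exactly at the junction and reproduces the first overlap listed in the output above.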

## Golden gate assembly
The following shows how to design primers for Golden Gate assembly.

In [257]:
help(pcr.GoldenGate)

Help on method GoldenGate in module gene_editor:

GoldenGate(seqlist, exclude=[], w=[0, 1], circular=True) method of gene_editor.Design instance
    Design primers for goldengate assembly
    seqlist = list of sequences to assemble
    exclude = sites to exclude
    circular = assemble fragments into a circular construct
    verbose = print out assembled construct and highlight plasmid locations
    returns list of primers



In [258]:
# slice out the LacI gene
LacI = pge.plasmid.read('xRFP.gb')
LacI = LacI[LacI['locus_tag'].str.contains('LacI')].slice()

# slice out the RFP gene
RFP = pge.plasmid.read('dcas9_RFP.gb')
RFP = RFP[RFP['locus_tag'].str.contains('mRFP')].slice()

# slice out the origin of replication
df = pge.plasmid.read('xRFP.gb')
vec = df[df['locus_tag'].str.contains('pSC101')]
start = vec['start'][0]
stop = vec['end'][0]
vec = df[start:stop]

reading xRFP.gb as genbank file
reading dcas9_RFP.gb as genbank file
reading xRFP.gb as genbank file


In [259]:
seq = []
seq+= [['',LacI,'AAAActttt']]
seq+= [['',RFP,'CGCCctttt']]
seq+= [['',vec,'GGGGctttt']]

pcr = pge.Design()
pcr.params['goldengate']['window'] = 20 # +/- window in bp around frag edges to look for overlap
pcr.params['goldengate']['ggN'] = 4     # length of golden gate overlap
pcr.params['goldengate']['ggsite'] = 'GGTCTCc'     # golden gate enzyme site
pcr.params['goldengate']['padding'] = 'atatatatgg' # padding around the golden gate site
pcr.params['xtPCR']['len'] = [15, 60]  # defines the [min, max] primer lengths
pcr.params['xtPCR']['nM'] = [20, 500] # defines the [seed, finisher] primer conc in nM
pcr.params['xtPCR']['Tm'] = 55 # target annealing temperature for xtPCR

res = pcr.GoldenGate(seq)
print(res)

res.x [16.93129484 22.60276769 10.20414393]
res.fun -12.0
exclude: []
overlaps: ['tttt', 'GTCA', 'CGGG']
Tm overlap: [-79.98814282127228, -59.96603763048182, -39.688627672698374]
processing primers for frag 0
running fwd
running rev
processing primers for frag 1
running fwd
running rev
processing primers for frag 2
running fwd
running rev
     locus_tag         Tm                                           sequence  \
0  frag0_fin_F  55.851352  atatatatggGGTCTCcCGGGGctttt ATGGTGAATGTGAAACCA...   
1  frag0_fin_R  56.106442       atatatatggGGTCTCcaaaagTTTT TTACTGCCCGCTTTCCA   
2  frag1_fin_F  55.335316            atatatatggGGTCTCctttt ATGGCGAGTAGCGAAGA   
3  frag1_fin_R  56.777386  atatatatggGGTCTCcTGACAGaaaagGGCG TTAAGCACCGGTG...   
4  frag2_fin_F  57.814294           atatatatggGGTCTCc GTCAGACCAAGTTTACGAGCTC   
5  frag2_fin_R  54.627626       atatatatggGGTCTCcCCC GTTACATTGTCGATCTGTTCATG   
6         seq0        NaN  atatatatggGGTCTCcCGGGGcttttATGGTGAATGTGAAACCAG...   
7         seq1     
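
The primers above follow the pattern padding + enzyme site + 4-nt overhang + annealing region. The sketch below rebuilds one of them and adds a common sanity check for Golden Gate overhangs: they should be distinct, non-palindromic, and not reverse complements of one another. Both helpers are hypothetical, and the check is a general design rule rather than something the `GoldenGate` method is documented to perform.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_are_compatible(overhangs):
    """True if no overhang is palindromic or clashes with another one."""
    seen = set()
    for overhang in (o.upper() for o in overhangs):
        rc = reverse_complement(overhang)
        if overhang == rc:                  # palindrome: would self-ligate
            return False
        if overhang in seen or rc in seen:  # duplicate or complementary pair
            return False
        seen.add(overhang)
    return True


def golden_gate_primer(annealing, overhang, site="GGTCTCc", padding="atatatatgg"):
    """Assemble a primer as padding + enzyme site + overhang + annealing region."""
    return padding + site + overhang + annealing


print(overhangs_are_compatible(["tttt", "GTCA", "CGGG"]))  # True
print(golden_gate_primer("ATGGCGAGTAGCGAAGA", "tttt"))
```

The second line printed matches `frag1_fin_F` in the table above, apart from the space that marks the start of the annealing region.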

