In [1]:
import os
import sys
sys.path.append("../../")
from QUEEN.queen import *
if "output" not in os.listdir("./"):
    os.mkdir("output")

----
#### Example code 1: Create a QUEEN class object (blunt-ends) 
A blunt-ended `QUEEN_object` is created by providing its top-strand sequence (5′-to-3′). By default, the DNA topology is linear.

In [2]:
dna = QUEEN(seq="CCGGTATGCGTCGA")

-----
#### Example code 2: Create a QUEEN class object (sticky-end)
The left and right strings separated by `"/"` give the top and bottom strand sequences of the generated `QUEEN_object`, respectively. The top strand is written in the 5′-to-3′ direction from left to right, whereas the bottom strand is written in the 3′-to-5′ direction from left to right. Single-stranded regions are specified by placing `"-"` at the corresponding nucleotide positions on the opposite strand. Outside the single-stranded positions, the two strings must follow the A:T and G:C base-pairing rules.

In [3]:
dna = QUEEN(seq="CCGGTATGCG----/----ATACGCAGCT") 
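
The pairing rule described above can be checked column by column. The following is a minimal sketch in plain Python; `is_valid_dsdna` is a hypothetical helper written for illustration and is not part of QUEEN, and the rule that a column cannot be `"-"` on both strands is an assumption.

```python
# Hypothetical helper (not a QUEEN function): validates a "top/bottom" string.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def is_valid_dsdna(seq):
    """Return True if the top and bottom strings pair correctly column by column."""
    top, bottom = seq.split("/")
    if len(top) != len(bottom):
        return False
    for t, b in zip(top, bottom):
        if t == "-" and b == "-":
            return False  # assumption: every column carries at least one nucleotide
        if t == "-" or b == "-":
            continue      # single-stranded position, no pairing required
        if COMPLEMENT.get(t) != b:
            return False  # violates the A:T / G:C rule
    return True

print(is_valid_dsdna("CCGGTATGCG----/----ATACGCAGCT"))  # True
print(is_valid_dsdna("CCGGTATGCG----/----ATACGGAGCT"))  # False (G opposite G)
```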

----
#### Example code 3: Create a circular QUEEN object
The sequence topology of the generated `QUEEN_object` can be specified as `"linear"` or `"circular"`.

In [4]:
dna = QUEEN(seq="CCGGTATGCGTCGA", topology="circular") 

----
#### Example code 4.1: Create a QUEEN class object from a GenBank file in a local directory 
A GenBank file can be loaded by specifying its local file path.

In [5]:
plasmid  = QUEEN(record="input/pX330.gbk")

#### Example code 4.2: Create a QUEEN class object using an NCBI accession number
A `QUEEN_object` can be generated from an NCBI accession number with `dbtype="ncbi"`.

In [6]:
pUC19 = QUEEN(record="M77789.2", dbtype="ncbi")

#### Example code 4.3: Create a QUEEN class object using an Addgene plasmid ID
A `QUEEN_object` can be generated from an Addgene plasmid ID with `dbtype="addgene"`.

In [7]:
pUC19 = QUEEN(record="50005", dbtype="addgene")

#### Example code 4.4: Create a QUEEN class object from a Benchling share link
A `QUEEN_object` can be generated from a Benchling share link with `dbtype="benchling"`.

In [8]:
plasmid = QUEEN(record="https://benchling.com/s/seq-U4pePb09KHutQzjyOPQV", dbtype="benchling")

The example above loads the pX330 plasmid, which encodes a Cas9 gene and a gRNA expression unit. The `QUEEN_object` generated here is used in the remaining examples in this document.

----
#### Example code 5: Print a dsDNA object

In [9]:
fragment = QUEEN(seq="CCGGTATGCG----/----ATACGCAGCT") 
fragment.printsequence(display=True)

5' CCGGTATGCG---- 3'
3' ----ATACGCAGCT 5'



'CCGGTATGCG----/----ATACGCAGCT'

----
#### Example code 6: Print DNA features in a well-formatted table

In [10]:
plasmid.printfeature()

feature_id  feature_type   qualifier:label     start  end   strand  
1           source         source              0      8484  +       
100         primer_bind    hU6-F               0      21    +       
200         promoter       U6 promoter         0      241   +       
300         primer_bind    LKO.1 5'            171    191   +       
400         misc_RNA       gRNA scaffold       267    343   +       
500         enhancer       CMV enhancer        439    725   +       
600         intron         hybrid intron       983    1211  +       
700         regulatory     Kozak sequence      1222   1232  +       
800         CDS            3xFLAG              1231   1297  +       
900         CDS            SV40 NLS            1303   1324  +       
1000        CDS            Cas9                1348   5449  +       
1100        CDS            nucleoplasmin NLS   5449   5497  +       
1200        primer_bind    BGH-rev             5524   5542  -       
1300        polyA_signal   bGH pol

----
#### Example code 7: Search for a DNA sequence motif with regular expression

In [11]:
match_list = plasmid.searchsequence(query="G[ATGC]{19}GGG")
plasmid.printfeature(match_list, seq=True, attribute=["start", "end", "strand"])

start  end   strand  sequence                 
115    138   +       GTAGAAAGTAATAATTTCTTGGG  
523    546   +       GACTTTCCATTGACGTCAATGGG  
816    839   +       GTGCAGCGATGGGGGCGGGGGGG  
1372   1395  +       GACATCGGCACCAACTCTGTGGG  
1818   1841  +       GGCCCACATGATCAAGTTCCGGG  
3097   3120  +       GATCGGTTCAACGCCTCCCTGGG  
3300   3323  +       GCGGCGGAGATACACCGGCTGGG  
3336   3359  +       GAAGCTGATCAACGGCATCCGGG  
3529   3552  +       GGCAGCCCCGCCATTAAGAAGGG  
3577   3600  +       GACGAGCTCGTGAAAGTGATGGG  
3640   3663  +       GAGAACCAGACCACCCAGAAGGG  
3697   3720  +       GAAGAGGGCATCAAAGAGCTGGG  
3783   3806  +       GTACTACCTGCAGAATGGGCGGG  
3915   3938  +       GACCAGAAGCGACAAGAACCGGG  
4303   4326  +       GCCTACCTGAACGCCGTCGTGGG  
4552   4575  +       GGGGAGATCGTGTGGGATAAGGG  
4701   4724  +       GATCGCCAGAAAGAAGGACTGGG  
4777   4800  +       GTGGTGGCCAAAGTGGAAAAGGG  
5217   5240  +       GTCCGCCTACAACAAGCACCGGG  
5653   5676  +       GTAGGTGTCATTCTATTCTGGGG  
5679   5702  

-----
#### Example code 8: Search DNA sequences with fuzzy matching
Search for `"AAAAAAAA"` sequence, permitting a single nucleotide mismatch.

In [12]:
match_list = plasmid.searchsequence(query="(?:AAAAAAAA){s<=1}")
plasmid.printfeature(match_list, seq=True)

feature_id  feature_type  qualifier:label  start  end   strand  sequence  
null        misc_feature  null             5484   5492  +       AAAAAAGA  
null        misc_feature  null             6369   6377  +       AACAAAAA  
null        misc_feature  null             7872   7880  +       AAACAAAA  
null        misc_feature  null             346    354   -       AAAACAAA  
null        misc_feature  null             799    807   -       AAAAAATA  
null        misc_feature  null             1201   1209  -       GAAAAAAA  
null        misc_feature  null             6716   6724  -       AAAAATAA  
null        misc_feature  null             7844   7852  -       AGAAAAAA  
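
To make the "single mismatch" criterion concrete, here is a minimal plain-Python sketch that scans one strand for windows differing from the query by at most one substitution. `fuzzy_find` is a hypothetical helper, not a QUEEN function. Unlike the search above, it reports overlapping windows and does not look at the reverse strand.

```python
def fuzzy_find(sequence, query, max_substitutions=1):
    """Return (start, end, window) for every window within the allowed number of substitutions."""
    hits = []
    n = len(query)
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        mismatches = sum(1 for a, b in zip(window, query) if a != b)
        if mismatches <= max_substitutions:
            hits.append((start, start + n, window))
    return hits

# Toy sequence (not taken from pX330)
print(fuzzy_find("GGAAAAAAGACC", "AAAAAAAA"))  # [(2, 10, 'AAAAAAGA')]
```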



---- 
#### Example code 9: Search for a DNA sequence with the IUPAC nucleotide code

In [13]:
match_list = plasmid.searchsequence(query="SWSWSWDSDSBHBRHH")
plasmid.printfeature(match_list, seq=True)

feature_id  feature_type  qualifier:label  start  end   strand  sequence          
null        misc_feature  null             4098   4114  +       GAGACAGCTGGTGGAA  
null        misc_feature  null             3550   3566  -       CTGTCTGCAGGATGCC  
null        misc_feature  null             5239   5255  -       CTCTGATGGGCTTATC  
null        misc_feature  null             6415   6431  -       GAGAGTGCACCATAAA  
null        misc_feature  null             8357   8373  -       GTCAGAGGTGGCGAAA  
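
An IUPAC query can be thought of as shorthand for a regular expression with one character class per ambiguity code. The sketch below uses only the standard `re` module; `iupac_to_regex` is a hypothetical helper, and how QUEEN translates IUPAC codes internally is not shown here.

```python
import re

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def iupac_to_regex(query):
    """Translate an IUPAC query into an equivalent regular expression."""
    return "".join(IUPAC[base] for base in query)

pattern = iupac_to_regex("SWSWSWDSDSBHBRHH")
# The first top-strand hit listed above satisfies the pattern
print(re.fullmatch(pattern, "GAGACAGCTGGTGGAA") is not None)  # True
```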



----
#### Example code 10: Search for sequence features having specific attribute values   
Search for `DNAfeature_objects` with the feature type `"primer_bind"`, and then narrow the result to those whose `"qualifier:label"` matches a specific pattern.

In [14]:
feature_list = plasmid.searchfeature(key_attribute="feature_type", query="primer_bind")
plasmid.printfeature(feature_list)
sub_feature_list = plasmid.searchfeature(key_attribute="qualifier:label", query=".+-R$", source=feature_list)
plasmid.printfeature(sub_feature_list)

feature_id  feature_type  qualifier:label  start  end   strand  
100         primer_bind   hU6-F            0      21    +       
300         primer_bind   LKO.1 5'         171    191   +       
1200        primer_bind   BGH-rev          5524   5542  -       
1700        primer_bind   F1ori-R          6048   6068  -       
1800        primer_bind   F1ori-F          6258   6280  +       
1900        primer_bind   pRS-marker       6433   6453  -       
2000        primer_bind   pGEX 3'          6552   6575  +       
2100        primer_bind   pBRforEco        6612   6631  -       
2400        primer_bind   Amp-R            7021   7041  -       
2600        primer_bind   pBR322ori-F      8323   8343  +       

feature_id  feature_type  qualifier:label  start  end   strand  
1700        primer_bind   F1ori-R          6048   6068  -       
2400        primer_bind   Amp-R            7021   7041  -       



----
#### Example code 11: Cut pX330 plasmid at multiple positions
Cut the circular plasmid pX330 at three different positions, generating three fragments. Then, cut one of the three fragments again.

In [15]:
print(plasmid)
fragments = cutdna(plasmid, 1000, 2000, 4000)
print(fragments)
fragment3, fragment4 = cutdna(fragments[1], 500)
print(fragment3)
print(fragment4)

<queen.QUEEN object; project='pX330_0', length='8484 bp', topology='circular'>
[<queen.QUEEN object; project='pX330_1', length='1000 bp', topology='linear'>, <queen.QUEEN object; project='pX330_2', length='2000 bp', topology='linear'>, <queen.QUEEN object; project='pX330_3', length='5484 bp', topology='linear'>]
<queen.QUEEN object; project='pX330_4', length='500 bp', topology='linear'>
<queen.QUEEN object; project='pX330_5', length='1500 bp', topology='linear'>
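
The fragment lengths reported above follow from simple arithmetic on a circular coordinate system: the last fragment wraps around the origin. A minimal sketch, where `circular_fragment_lengths` is a hypothetical helper that assumes blunt cuts:

```python
def circular_fragment_lengths(total_length, cut_positions):
    """Lengths of the linear fragments produced by blunt cuts on a circular molecule."""
    cuts = sorted(cut_positions)
    lengths = []
    for i, start in enumerate(cuts):
        end = cuts[(i + 1) % len(cuts)]
        # The modulo handles the fragment that spans the origin;
        # a single cut yields one fragment of full length.
        lengths.append((end - start) % total_length or total_length)
    return lengths

print(circular_fragment_lengths(8484, [1000, 2000, 4000]))  # [1000, 2000, 5484]
```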


If an invalid cut pattern is provided, an error message will be returned.

In [16]:
#fragments = cutdna(plasmid, *["50/105", "100/55", "120/110"])

-----
#### Example code 12: Digest pX330 plasmid by EcoRI
Digestion of pX330 plasmid with EcoRI can be simulated as follows.
1. Search for EcoRI recognition sites in pX330 with its cut motif and obtain the `DNAfeature_objects` representing its cut position(s) and motif.
2. Use the `DNAfeature_objects` to cut pX330 by `cutdna()`.

In [17]:
sites     = plasmid.searchsequence("G^AATT_C")
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)

<queen.QUEEN object; project='pX330_6', length='8488 bp', topology='linear'>
5' AATTCCTAGA...AGTAAG---- 3'
3' ----GGATCT...TCATTCTTAA 5'
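
Judging from the output above, `^` in the cut motif appears to mark the cut on the top strand and `_` the cut on the bottom strand. Under that assumption, a motif string can be split into its recognition sequence and two cut offsets. `parse_cut_motif` is a hypothetical helper, not a QUEEN function.

```python
def parse_cut_motif(motif):
    """Split e.g. "G^AATT_C" into (recognition sequence, top cut offset, bottom cut offset)."""
    sequence = ""
    top_cut = bottom_cut = None
    for char in motif:
        if char == "^":
            top_cut = len(sequence)      # assumed: cut position on the top strand
        elif char == "_":
            bottom_cut = len(sequence)   # assumed: cut position on the bottom strand
        else:
            sequence += char
    return sequence, top_cut, bottom_cut

sequence, top_cut, bottom_cut = parse_cut_motif("G^AATT_C")
print(sequence, top_cut, bottom_cut)   # GAATTC 1 5
print(abs(bottom_cut - top_cut))       # 4, the overhang length seen in the output
```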



QUEEN provides a library of restriction enzyme motifs (as described on the New England Biolabs website).

In [18]:
from QUEEN import cutsite #Import a restriction enzyme library 
sites = plasmid.searchsequence(cutsite.lib["EcoRI"])
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10) 

<queen.QUEEN object; project='pX330_7', length='8488 bp', topology='linear'>
5' AATTCCTAGA...AGTAAG---- 3'
3' ----GGATCT...TCATTCTTAA 5'



-----
#### Example code 13: Digest pX330 plasmid by Type-IIS restriction enzyme BbsI  

In [19]:
sites = plasmid.searchsequence("GAAGAC(2/6)")
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)

<queen.QUEEN object; project='pX330_8', length='8466 bp', topology='linear'>
5' GTTTTAGAGC...ACGAAA---- 3'
3' ----ATCTCG...TGCTTTGTGG 5'

<queen.QUEEN object; project='pX330_9', length='26 bp', sequence='CACCGGGTCTTCGAGAAGACCTGTTT', topology='linear'>
5' CACCGGGTCT...AGACCT---- 3'
3' ----CCCAGA...TCTGGACAAA 5'



Here, the BbsI recognition motif can also be represented by "(6/2)GTCTTC", "GAAGACNN^NNNN_" or "^NNNN_NNGTCTTC".
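
The equivalence between the offset notation and the `^`/`_` notation can be made explicit with a small converter. This is a sketch only: `expand_offset_notation` is a hypothetical helper, and it assumes that the two numbers are the top-strand and bottom-strand cut distances from the motif, which is consistent with the four forms listed above.

```python
import re

def _flank(top, bottom):
    """Build a run of N's with "^" after `top` bases and "_" after `bottom` bases."""
    length = max(top, bottom)
    out = []
    for i in range(length + 1):
        if i == top:
            out.append("^")
        if i == bottom:
            out.append("_")
        if i < length:
            out.append("N")
    return "".join(out)

def expand_offset_notation(motif):
    """Rewrite "SEQ(a/b)" or "(a/b)SEQ" using N, "^" and "_"."""
    downstream = re.fullmatch(r"([A-Z]+)\((\d+)/(\d+)\)", motif)
    upstream = re.fullmatch(r"\((\d+)/(\d+)\)([A-Z]+)", motif)
    if downstream:
        seq, top, bottom = downstream.groups()
        return seq + _flank(int(top), int(bottom))
    if upstream:
        top, bottom, seq = upstream.groups()
        # Offsets are counted leftward from the motif, so mirror the flank.
        return _flank(int(top), int(bottom))[::-1] + seq
    raise ValueError("unsupported motif: " + motif)

print(expand_offset_notation("GAAGAC(2/6)"))   # GAAGACNN^NNNN_
print(expand_offset_notation("(6/2)GTCTTC"))   # ^NNNN_NNGTCTTC
```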
 
The BbsI recognition motif is also available from the library of restriction enzyme motifs. 

In [20]:
from QUEEN import cutsite #Import a restriction enzyme library 
sites = plasmid.searchsequence(cutsite.lib["BbsI"])
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10) 

<queen.QUEEN object; project='pX330_10', length='8466 bp', topology='linear'>
5' GTTTTAGAGC...ACGAAA---- 3'
3' ----ATCTCG...TGCTTTGTGG 5'

<queen.QUEEN object; project='pX330_11', length='26 bp', sequence='CACCGGGTCTTCGAGAAGACCTGTTT', topology='linear'>
5' CACCGGGTCT...AGACCT---- 3'
3' ----CCCAGA...TCTGGACAAA 5'



Additionally, the BbsI cut site can also be imported from "Queen/RE.py" as follows.

----
#### Example code 14: Crop a fragmented dna object in a specific region 
If only the second fragment of "Example code 11" is needed for further manipulation, `cropdna()` is convenient.

In [21]:
fragment = cropdna(plasmid, 2000, 4000)

If a start position is larger than an end position, an error message will be returned. 

In [22]:
#fragment = cropdna(fragment, 1500, 1000)

----
#### Example code 15: Trim nucleotides from a blunt-ended dsDNA to generate a sticky-ended dsDNA
Sticky ends can be generated by trimming nucleotides from each end. The structure of each end is given by a pair of strings for the top and bottom strands, separated by "/" and composed of "*" and "-". A "-" marks a nucleotide to be trimmed, and a "*" marks one to be kept.

In [23]:
fragment = cropdna(plasmid, 100, 120)
fragment.printsequence(display=True)
fragment = modifyends(fragment, "-----/*****", "**/--")
fragment.printsequence(display=True)

5' TACAAAATACGTGACGTAGA 3'
3' ATGTTTTATGCACTGCATCT 5'

5' -----AATACGTGACGTAGA 3'
3' ATGTTTTATGCACTGCAT-- 5'



'-----AATACGTGACGTAGA/ATGTTTTATGCACTGCAT--'
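
The effect of the two end patterns can be reproduced on plain strings. In the sketch below, `trim_ends` is a hypothetical helper (not a QUEEN function) that takes the top and bottom strands as displayed above, with the bottom strand written 3′-to-5′ from left to right.

```python
def trim_ends(top, bottom, left="*/*", right="*/*"):
    """Replace trimmed positions with "-" according to "top/bottom" end patterns."""
    left_top, left_bottom = left.split("/")
    right_top, right_bottom = right.split("/")

    def apply(strand, left_pattern, right_pattern):
        chars = list(strand)
        # The left pattern is aligned to the left end of the strand
        for i, mark in enumerate(left_pattern):
            if mark == "-":
                chars[i] = "-"
        # The right pattern is aligned to the right end of the strand
        for i, mark in enumerate(reversed(right_pattern)):
            if mark == "-":
                chars[len(chars) - 1 - i] = "-"
        return "".join(chars)

    return apply(top, left_top, right_top) + "/" + apply(bottom, left_bottom, right_bottom)

print(trim_ends("TACAAAATACGTGACGTAGA", "ATGTTTTATGCACTGCATCT", "-----/*****", "**/--"))
# -----AATACGTGACGTAGA/ATGTTTTATGCACTGCAT--
```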

The following code achieves the same manipulation.

In [24]:
fragment = cropdna(plasmid, '105/100', '120/118')
fragment.printsequence(display=True)

5' -----AATACGTGACGTAGA 3'
3' ATGTTTTATGCACTGCAT-- 5'



'-----AATACGTGACGTAGA/ATGTTTTATGCACTGCAT--'

A regex-like format can also be used.

In [25]:
fragment = modifyends(fragment, "-{5}/*{5}","*{2}/-{2}")
fragment.printsequence(display=True)

5' -----AATACGTGACGTAGA 3'
3' ATGTTTTATGCACTGCAT-- 5'



'-----AATACGTGACGTAGA/ATGTTTTATGCACTGCAT--'
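
The regex-like form is a compact way of writing repeated characters. A minimal sketch of the expansion using the standard `re` module follows; `expand_repeats` is a hypothetical helper, and QUEEN's own parsing may differ.

```python
import re

def expand_repeats(pattern):
    """Expand "x{n}" into n copies of x, e.g. "-{5}/*{5}" into "-----/*****"."""
    return re.sub(r"(.)\{(\d+)\}", lambda m: m.group(1) * int(m.group(2)), pattern)

print(expand_repeats("-{5}/*{5}"))  # -----/*****
print(expand_repeats("*{2}/-{2}"))  # **/--
```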

----
#### Example code 16: Add adapter sequences  
`modifyends()` can also add adapter sequences to DNA ends.

In [26]:
#Add blunt-ended dsDNA sequences to both ends
fragment = cropdna(plasmid, 100, 120)
fragment = modifyends(fragment,"TACATGC","TACGATG")
fragment.printsequence(display=True)

#Add sticky-ended dsDNA sequences to both ends
fragment = cropdna(plasmid, 100, 120)
fragment = modifyends(fragment,"---ATGC/ATGTACG","TACG---/ATGCTAC")
fragment.printsequence(display=True)

5' TACATGCTACAAAATACGTGACGTAGATACGATG 3'
3' ATGTACGATGTTTTATGCACTGCATCTATGCTAC 5'

5' ---ATGCTACAAAATACGTGACGTAGATACG--- 3'
3' ATGTACGATGTTTTATGCACTGCATCTATGCTAC 5'



'---ATGCTACAAAATACGTGACGTAGATACG---/ATGTACGATGTTTTATGCACTGCATCTATGCTAC'

----- 
#### Example code 17: Clone an EGFP fragment into pX330
1. Generate a QUEEN class object for an EGFP fragment,
2. Add EcoRI sites to both ends of the EGFP fragment,
3. Digest the EGFP fragment and pX330 by EcoRI, and
4. Assemble the EGFP fragment and linearized pX330.

In [27]:
EGFP     = QUEEN(record="input/EGFP.fasta")
EGFP     = modifyends(EGFP, cutsite.lib["EcoRI"].seq, cutsite.lib["EcoRI"].seq)
sites    = EGFP.searchsequence(cutsite.lib["EcoRI"]) 
insert   = cutdna(EGFP, *sites)[1]

insert.printsequence(display=True, hide_middle=10)
sites    = plasmid.searchsequence(cutsite.lib["EcoRI"])
backbone = cutdna(plasmid, *sites)[0]

backbone.printsequence(display=True, hide_middle=10)
pEGFP    = joindna(backbone, insert, topology="circular") 
print(backbone)
print(insert)
print(pEGFP)

5' AATTCGGCAG...ACAAGG---- 3'
3' ----GCCGTC...TGTTCCTTAA 5'

5' AATTCCTAGA...AGTAAG---- 3'
3' ----GGATCT...TCATTCTTAA 5'

<queen.QUEEN object; project='pX330_22', length='8488 bp', topology='linear'>
<queen.QUEEN object; project='EGFP_2', length='787 bp', topology='linear'>
<queen.QUEEN object; project='pX330_25', length='9267 bp', topology='circular'>


If the end structures of the input `QUEEN_objects` to be joined are not compatible, an error message will be returned.

In [28]:
#EGFP     = QUEEN(record="input/EGFP.fasta")
#EGFP     = modifyends(EGFP, cutsite.lib["BamHI"].seq, cutsite.lib["BamHI"].seq)
#sites    = EGFP.searchsequence(cutsite.lib["BamHI"]) 
#insert   = cutdna(EGFP, *sites)[1]
#insert.printsequence(display=True, hide_middle=10)
#pEGFP    = joindna(backbone, insert, topology="circular")
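
What "compatible" means for two sticky ends can be sketched on `"top/bottom"` strings: the overhangs must have the same length, each strand must be supplied by exactly one fragment at every overhang position, and the resulting pairs must be complementary. The helpers below are hypothetical and only illustrate the idea; the actual check performed by `joindna()` is not shown in this document.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def _overhang(columns):
    """Collect leading columns until the first fully double-stranded column."""
    out = []
    for t, b in columns:
        if t != "-" and b != "-":
            break
        out.append((t, b))
    return out

def ends_compatible(left_fragment, right_fragment):
    """Can the right end of left_fragment anneal to the left end of right_fragment?"""
    l_top, l_bottom = left_fragment.split("/")
    r_top, r_bottom = right_fragment.split("/")
    a = _overhang(zip(reversed(l_top), reversed(l_bottom)))[::-1]
    b = _overhang(zip(r_top, r_bottom))
    if len(a) != len(b):
        return False
    for (t1, b1), (t2, b2) in zip(a, b):
        # Each strand must come from exactly one of the two fragments
        if (t1 != "-") == (t2 != "-") or (b1 != "-") == (b2 != "-"):
            return False
        top = t1 if t1 != "-" else t2
        bottom = b1 if b1 != "-" else b2
        if COMPLEMENT.get(top) != bottom:
            return False
    return True

# Toy fragments with EcoRI-like and BamHI-like ends
print(ends_compatible("CCG----/GGCTTAA", "AATTCGG/----GCC"))  # True
print(ends_compatible("CCG----/GGCTTAA", "GATCCGG/----GCC"))  # False
```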

----
#### Example code 18: Create a gRNA expression plasmid
pX330 serves as a standard gRNA expression backbone plasmid. A gRNA spacer can simply be cloned into a BbsI-digested destination site of pX330 as follows:
1. Generate a QUEEN object for a sticky-ended gRNA spacer dsDNA,
2. Digest pX330 by BbsI, and
3. Assemble the spacer with the BbsI-digested pX330.

In [29]:
gRNA      = QUEEN(seq="CACCGACCATTGTTCAATATCGTCC----/----CTGGTAACAAGTTATAGCAGGCAAA") 
sites     = plasmid.searchsequence(cutsite.lib["BbsI"])
fragments = cutdna(plasmid, *sites)
backbone  = fragments[0] if len(fragments[0].seq) > len(fragments[1].seq) else fragments[1]
pgRNA     = joindna(gRNA, backbone, topology="circular", product="pgRNA")
print(backbone)
print(insert)
print(pgRNA) 

<queen.QUEEN object; project='pX330_26', length='8466 bp', topology='linear'>
<queen.QUEEN object; project='EGFP_2', length='787 bp', topology='linear'>
<queen.QUEEN object; project='pgRNA', length='8487 bp', topology='circular'>


----
#### Example code 19: Flip the ampicillin-resistant gene in pX330
1. Search for the ampicillin-resistant gene in pX330,
2. Cut pX330 with start and end positions of the ampicillin-resistant gene,
3. Flip the ampicillin-resistant gene fragment, and 
4. Join it with the other fragment.

In [30]:
site         = plasmid.searchfeature(query="^AmpR$")[0]
fragments    = cutdna(plasmid, site.start, site.end)
fragments[0] = flipdna(fragments[0])
new_plasmid  = joindna(*fragments, topology="circular")
plasmid.printfeature(plasmid.searchfeature(query="^AmpR$"))
new_plasmid.printfeature(new_plasmid.searchfeature(query="^AmpR$")) 

feature_id  feature_type  qualifier:label  start  end   strand  
2300        CDS           AmpR             6803   7664  +       

feature_id  feature_type  qualifier:label  start  end   strand  
2400        CDS           AmpR             6803   7664  -       
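
The strand change in the output suggests that flipping a fragment corresponds to taking its reverse complement. Assuming that interpretation, the operation on a plain string looks like this; `reverse_complement` is a hypothetical helper, not QUEEN's `flipdna()`.

```python
# Translation table mapping each base to its complement
COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")

def reverse_complement(seq):
    """Return the reverse complement of a top-strand sequence."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]

print(reverse_complement("ATGCC"))                      # GGCAT
print(reverse_complement(reverse_complement("ATGCC")))  # ATGCC (flipping twice restores the input)
```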



#### Example code 20: Insert an EGFP sequence into pX330
The EGFP insertion into the EcoRI site demonstrated in Example code 17 can be written more simply using `editsequence()`.

In [31]:
EGFP  = QUEEN(record="input/EGFP.fasta")
pEGFP = editsequence(plasmid, "({})".format(cutsite.lib["EcoRI"].seq), r"\1{}\1".format(EGFP.seq))
print(plasmid)
print(pEGFP)

<queen.QUEEN object; project='pX330_0', length='8484 bp', topology='circular'>
<queen.QUEEN object; project='pX330_35', length='9267 bp', topology='circular'>
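
The search and replacement strings above use ordinary regular-expression group syntax: the site is captured as group 1 and written back on both sides of the insert. The same substitution on a plain string, using the standard `re` module, is sketched below. The function name and the toy sequences are hypothetical.

```python
import re

def insert_into_site(sequence, site, insert):
    """Replace each occurrence of `site` with site + insert + site."""
    return re.sub("({})".format(site), r"\1{}\1".format(insert), sequence)

edited = insert_into_site("CCCGAATTCGGG", "GAATTC", "ATGGTGAGC")
print(edited)  # CCCGAATTCATGGTGAGCGAATTCGGG
# The product grows by the insert length plus one extra copy of the site
print(len(edited) - len("CCCGAATTCGGG"))  # 15
```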


-----
#### Example code 21: Insert a DNA string "AAAAA" at the 5′ end of every CDS

In [32]:
new_plasmid  = editfeature(plasmid, key_attribute="feature_type", query="CDS", strand=1, target_attribute="sequence", operation=replaceattribute(r"(.+)", r"AAAAA\1"))
for feat in new_plasmid.searchfeature(key_attribute="feature_type", query="CDS", strand=1):
    print(feat.start, feat.end, new_plasmid.printsequence(feat.start, feat.start+20, strand=1), feat.qualifiers["label"][0], sep="\t")

1231	1302	AAAAAGACTATAAGGACCAC	3xFLAG
1308	1334	AAAAACCAAAGAAGAAGCGG	SV40 NLS
1358	5464	AAAAAGACAAGAAGTACAGC	Cas9
5464	5517	AAAAAAAAAGGCCGGCGGCC	nucleoplasmin NLS
6823	7689	AAAAAATGAGTATTCAACAT	AmpR
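
The shifted coordinates printed above can be reproduced with simple bookkeeping: every 5-nt insertion moves all downstream positions by 5, and each feature absorbs the insertion at its own start. The sketch below is a hypothetical model of that arithmetic, not QUEEN's implementation; the input coordinates are the top-strand CDS positions listed earlier in this document.

```python
def shift_after_insertions(features, insert_length):
    """Recompute (label, start, end) after inserting at the start of every feature."""
    starts = [start for _, start, _ in features]
    shifted = []
    for label, start, end in features:
        # Insertions strictly upstream of a coordinate push it downstream
        new_start = start + insert_length * sum(1 for s in starts if s < start)
        new_end = end + insert_length * sum(1 for s in starts if s < end)
        shifted.append((label, new_start, new_end))
    return shifted

cds = [
    ("3xFLAG", 1231, 1297),
    ("SV40 NLS", 1303, 1324),
    ("Cas9", 1348, 5449),
    ("nucleoplasmin NLS", 5449, 5497),
    ("AmpR", 6803, 7664),
]
for label, start, end in shift_after_insertions(cds, 5):
    print(start, end, label, sep="\t")
```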


-----
#### Example code 22:  Convert the feature type of every annotation from "CDS" to "gene"

In [33]:
new_plasmid = editfeature(plasmid, key_attribute="feature_type", query="CDS", target_attribute="feature_type", operation=replaceattribute("gene"))
new_plasmid.printfeature()

feature_id  feature_type   qualifier:label     start  end   strand  
1           source         source              0      8484  +       
100         primer_bind    hU6-F               0      21    +       
200         promoter       U6 promoter         0      241   +       
300         primer_bind    LKO.1 5'            171    191   +       
400         misc_RNA       gRNA scaffold       267    343   +       
500         enhancer       CMV enhancer        439    725   +       
600         intron         hybrid intron       983    1211  +       
700         regulatory     Kozak sequence      1222   1232  +       
800         gene           3xFLAG              1231   1297  +       
900         gene           SV40 NLS            1303   1324  +       
1000        gene           Cas9                1348   5449  +       
1100        gene           nucleoplasmin NLS   5449   5497  +       
1200        primer_bind    BGH-rev             5524   5542  -       
1300        polyA_signal   bGH pol

----
#### Example code 23: Add single cutter annotations to pX330
1. Search for all single-cutter restriction enzymes in pX330 using the library of restriction enzymes listed on the New England Biolabs website.
2. Add the single cutter annotations to pX330.

In [34]:
unique_cutters = []
for key, enzyme in cutsite.lib.items():  # `enzyme` avoids shadowing the standard-library name `re`
    sites = plasmid.searchsequence(enzyme.cutsite)
    if len(sites) == 1:  # keep enzymes that cut exactly once
        unique_cutters.append(sites[0])
new_plasmid = editfeature(plasmid, source=unique_cutters, target_attribute="feature_id", operation=createattribute("RE"))
new_plasmid = editfeature(new_plasmid, key_attribute="feature_id", query="RE", target_attribute="feature_type", operation=replaceattribute("misc_bind"))
features    = new_plasmid.searchfeature(key_attribute="feature_type", query="misc_bind")
new_plasmid.printfeature(features, seq=True)

feature_id  feature_type  qualifier:label  start  end   strand  sequence      
RE-1        misc_bind     Acc65I           433    439   +       GGTACC        
RE-2        misc_bind     AgeI             1216   1222  +       ACCGGT        
RE-3        misc_bind     ApaI             2700   2706  +       GGGCCC        
RE-4        misc_bind     BglII            1595   1601  +       AGATCT        
RE-5        misc_bind     BsaBI            4839   4849  +       GATCACCATC    
RE-6        misc_bind     BseRI            1098   1104  -       GAGGAG        
RE-7        misc_bind     BsmI             4979   4985  +       GAATGC        
RE-8        misc_bind     CspCI            4127   4139  +       CAAAGCACGTGG  
RE-9        misc_bind     EcoRI            5500   5506  +       GAATTC        
RE-10       misc_bind     EcoRV            3196   3202  +       GATATC        
RE-11       misc_bind     FseI             5472   5480  +       GGCCGGCC      
RE-12       misc_bind     FspI             7365   73