In [1]:
from Bio import motifs

Creating a simple DNA Motif
Let's create a DNA motif from a list of aligned sequences of equal length:

In [2]:
from Bio.Seq import Seq
instances = [
    Seq("TACAA"),
    Seq("TACGC"),
    Seq("TACAC"),
    Seq("TACCC"),
    Seq("AACCC"),
    Seq("AATGC"),
    Seq("AATGC")
]

In [3]:
m = motifs.create(instances)

In [4]:
print(m)

TACAA
TACGC
TACAC
TACCC
AACCC
AATGC
AATGC


The instances are stored in the motif's .instances attribute:

In [5]:
m.instances



[Seq('TACAA'),
 Seq('TACGC'),
 Seq('TACAC'),
 Seq('TACCC'),
 Seq('AACCC'),
 Seq('AATGC'),
 Seq('AATGC')]

The Motif object has an attribute .counts containing the counts of each nucleotide at each position. Printing this counts matrix shows it in an easily readable format:

In [6]:
# Table of how many A, C, G, and T occur at positions 0-4 of the sequences above

print(m.counts)

        0      1      2      3      4
A:   3.00   7.00   0.00   2.00   1.00
C:   0.00   0.00   5.00   2.00   6.00
G:   0.00   0.00   0.00   3.00   0.00
T:   4.00   0.00   2.00   0.00   0.00
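
To see where these numbers come from, here is a minimal pure-Python sketch that tallies the same table by hand. It is only an illustration, not how Biopython builds the matrix; the names alphabet, length, and counts are chosen for this example, and plain strings stand in for the Seq objects.

```python
# Plain strings stand in for the Seq objects used in the notebook.
instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
alphabet = "ACGT"
length = len(instances[0])

# One row per nucleotide, one column per position, as in print(m.counts).
counts = {letter: [0] * length for letter in alphabet}
for sequence in instances:
    for position, letter in enumerate(sequence):
        counts[letter][position] += 1

for letter in alphabet:
    print(letter, counts[letter])
```

Each column adds up to 7, the number of instances, which is a quick sanity check on any counts matrix.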



We can access these counts as a dictionary:

In [7]:
# Counts of one specific nucleotide (A) at each position
m.counts['A']

[3.0, 7.0, 0.0, 2.0, 1.0]

We can also directly access columns of the counts matrix:

In [8]:
# Counts of each nucleotide at position 3
m.counts[:, 3]

{'A': 2.0, 'C': 2.0, 'G': 3.0, 'T': 0.0}
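
The two kinds of access can be mimicked with an ordinary dictionary of lists. The sketch below is a plain-Python stand-in for the counts matrix (the values are copied from the table above, as integers); row_a and column_3 are names made up for this example.

```python
# Plain-Python stand-in for m.counts: one list of per-position counts per letter.
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}

# Row access: all positions for one nucleotide, like m.counts['A'].
row_a = counts["A"]

# Column access: all nucleotides at one position, like m.counts[:, 3].
column_3 = {letter: row[3] for letter, row in counts.items()}

print(row_a)
print(column_3)
```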

The motif has an associated consensus sequence, which takes at each position the letter with the largest value in the corresponding column of the .counts matrix:

In [9]:
# Sequence built from the most frequent nucleotide at each position
m.consensus

Seq('TACGC')

The motif also has an anticonsensus sequence, which takes at each position the letter with the smallest value in the corresponding column of the .counts matrix:

In [10]:
# Sequence built from the least frequent nucleotide at each position
m.anticonsensus

Seq('CCATG')
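
Both definitions amount to a per-column maximum or minimum. The following sketch reproduces them on a plain dictionary of counts; the helper pick is hypothetical, and the tie-breaking (first letter in A, C, G, T order) is an assumption of this sketch that happens to match the output above, not a documented guarantee of Biopython.

```python
# Counts copied from the table printed by print(m.counts).
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}
alphabet = "ACGT"
length = len(counts["A"])

def pick(position, chooser):
    # chooser is the built-in max or min; on ties both return the
    # first candidate, i.e. the earliest letter in alphabet order.
    return chooser(alphabet, key=lambda letter: counts[letter][position])

consensus = "".join(pick(i, max) for i in range(length))
anticonsensus = "".join(pick(i, min) for i in range(length))

print(consensus)
print(anticonsensus)
```

In position 0, for example, both C and G have a count of zero, so a different tie-breaking rule could just as well yield G there.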

Note that the consensus and anticonsensus sequences are not uniquely defined when several nucleotides share the maximum or minimum count in a column. You can also ask for a degenerate consensus sequence, in which ambiguous nucleotides are used for positions where there are multiple nucleotides with high counts:

In [11]:
m.degenerate_consensus

Seq('WACVC')

Here, W and V follow the IUPAC nucleotide ambiguity codes: W is either A or T, and V is A, C, or G.
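
One way to arrive at such a sequence is sketched below. We assume Cavener-style thresholds: a single letter wins if it holds more than half of the counts and more than twice the runner-up; otherwise the top two letters are combined if together they exceed 75%; otherwise the top three are combined if the fourth never occurs; otherwise N. Treat these thresholds and the helper degenerate_letter as assumptions of this example rather than a statement of Biopython's exact implementation.

```python
# IUPAC ambiguity codes, keyed by the sorted set of letters they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}

# Counts copied from the table printed by print(m.counts).
counts = {
    "A": [3, 7, 0, 2, 1],
    "C": [0, 0, 5, 2, 6],
    "G": [0, 0, 0, 3, 0],
    "T": [4, 0, 2, 0, 0],
}

def degenerate_letter(column):
    # Rank the four letters from most to least frequent in this column.
    ranked = sorted(column, key=column.get, reverse=True)
    values = [column[letter] for letter in ranked]
    total = sum(values)
    if values[0] > total / 2 and values[0] > 2 * values[1]:
        chosen = ranked[:1]   # one clearly dominant letter
    elif 4 * (values[0] + values[1]) > 3 * total:
        chosen = ranked[:2]   # top two letters exceed 75% together
    elif values[3] == 0:
        chosen = ranked[:3]   # exactly three letters occur
    else:
        chosen = ranked       # no clear preference: N
    return IUPAC["".join(sorted(chosen))]

length = len(counts["A"])
degenerate = "".join(
    degenerate_letter({letter: counts[letter][i] for letter in counts})
    for i in range(length)
)
print(degenerate)
```

Under these assumptions, position 0 (T: 4, A: 3) becomes W because neither letter dominates but together they account for all counts, and position 3 (G: 3, A: 2, C: 2, T: 0) becomes V because only T is absent.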