## BLAST (Basic Local Alignment Search Tool)
BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches [¹](https://blast.ncbi.nlm.nih.gov/Blast.cgi).

Author: Luiz Carlos Vieira

### BLAST of the protein sequences

In [16]:
# Import the modules needed to run BLAST and parse its XML output
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

In [None]:
prot_list = [
    'MRTSCSDTSTSKSCRRRATRWRRRNARSTPRRRRCGRIRRCNRTFMTRCNRSRRPA',
    'MKPENLVACHECDLLFWRPPRLRALAAHCPRCRARVGGSAHGRPALDRRCAIALRS',
    'MLPLDLPEPEIRPRSRWIPSLVWIVPLVCALIGLALVYRGIAATGPTITVTFANPK*',
]

In [21]:
# Run a protein BLAST (blastp) for the second protein in the list against the pdb database
blast_handle = NCBIWWW.qblast('blastp', 'pdb', prot_list[1])

# Parse the XML result into blast_record, then release the handle
blast_record = NCBIXML.read(blast_handle)
blast_handle.close()

# Print every high-scoring segment pair (HSP) of each alignment
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        print('******************************************Alignment******************************************')
        print('sequence:', alignment.title)
        print('length:', alignment.length)
        print('e value:', hsp.expect)
        print(hsp.query)
        print(hsp.match)
        print(hsp.sbjct)
        print()

******************************************Alignment******************************************
sequence: pdb|7ANE|at Chain at, mS66 [Leishmania major]
length: 397
e value: 1.2372
PRLRALAAHCPRCRARVGGSAHGRPALDRRCAIALRS
PR+   +AHCP C  R   +A GR A +    + L +
PRITEWSAHCPACAWRTNMTAIGRKAQEEGQYLGLET

******************************************Alignment******************************************
sequence: pdb|7D5K|A Chain A, Cellulose synthase [Gossypium hirsutum] >pdb|7D5K|B Chain B, Cellulose synthase [Gossypium hirsutum] >pdb|7D5K|C Chain C, Cellulose synthase [Gossypium hirsutum]
length: 1042
e value: 4.73142
LVACHECDLLFWRPP---RLRALAAHCPRCRAR
 VAC+EC     RP      R     CP+C+ R
FVACNECGFPVCRPCYEYERREGTQQCPQCKTR
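
Both E-values above are greater than 1. The E-value is the number of hits with an equal or better score expected by chance in a database of this size, so neither alignment is statistically significant. The following is a minimal sketch of keeping only the HSPs below a cutoff. The helper name `significant_hits` and the default cutoff of 0.001 are illustrative assumptions, not part of this notebook, and the stand-in record built with `SimpleNamespace` only mimics the attributes used above so that the example runs without querying NCBI.

```python
from types import SimpleNamespace


def significant_hits(blast_record, e_value_threshold=0.001):
    """Return (title, e-value) pairs for HSPs whose E-value is below the threshold.

    Works on any object exposing .alignments, .hsps, .title and .expect,
    such as the record returned by NCBIXML.read. The threshold is an
    illustrative assumption; choose one suited to your analysis.
    """
    hits = []
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < e_value_threshold:
                hits.append((alignment.title, hsp.expect))
    return hits


# Stand-in record: the first entry reuses the 7ANE E-value printed above,
# the second is a hypothetical strong hit added only for the demonstration.
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(title='pdb|7ANE|at Chain at, mS66 [Leishmania major]',
                    hsps=[SimpleNamespace(expect=1.2372)]),
    SimpleNamespace(title='hypothetical strong hit',
                    hsps=[SimpleNamespace(expect=1e-30)]),
])

print(significant_hits(demo_record))
```

With a real query, `significant_hits(blast_record)` could be called on the record parsed above. For these results it would presumably return an empty list, since no hit falls below the cutoff.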



## Defining a function to run BLAST

In [18]:
def blast(blast_type, database, protein_sequence):
    # Run the BLAST search on the NCBI servers
    blast_handle = NCBIWWW.qblast(blast_type, database, protein_sequence)

    # Parse the XML result into blast_record, then release the handle
    blast_record = NCBIXML.read(blast_handle)
    blast_handle.close()

    # Print every high-scoring segment pair (HSP) of each alignment
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print('******************************************Alignment******************************************')
            print('sequence:', alignment.title)
            print('length:', alignment.length)
            print('e value:', hsp.expect)
            print(hsp.query)
            print(hsp.match)
            print(hsp.sbjct)
            print()
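
The `blast` function above only prints its results, so they cannot be sorted, filtered, or stored afterwards. One possible variation, sketched here under the assumption that a list of dictionaries is a convenient shape, is to collect the same fields and return them. The name `collect_hits` and the dictionary keys are hypothetical, and the demo record is a `SimpleNamespace` stand-in that mimics a parsed BLAST record so that the example runs offline.

```python
from types import SimpleNamespace


def collect_hits(blast_record):
    """Return one dictionary per HSP with the fields printed by blast()."""
    rows = []
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            rows.append({
                'title': alignment.title,
                'length': alignment.length,
                'e_value': hsp.expect,
                'query': hsp.query,
                'match': hsp.match,
                'sbjct': hsp.sbjct,
            })
    return rows


# Stand-in for a parsed record; the title is a placeholder for illustration.
demo_record = SimpleNamespace(alignments=[
    SimpleNamespace(
        title='demo hit',
        length=305,
        hsps=[SimpleNamespace(
            expect=0.378018,
            query='ACHECDLLFWRPPRLRALAAHCPRCRAR',
            match='AC ECD ++WR    R   + C RCR +',
            sbjct='ACKECDYMWWRRVPQRKEVSRCQRCRKK',
        )],
    ),
])

rows = collect_hits(demo_record)
# Sort by E-value so the best hit comes first
rows.sort(key=lambda row: row['e_value'])
print(rows[0]['title'], rows[0]['e_value'])
```

Returning data rather than printing it keeps the search and the presentation separate, which makes it easier to reuse the results in later cells.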

In [19]:
# Run a protein BLAST for the second protein in the list against the swissprot database
blast('blastp', 'swissprot', prot_list[1])

******************************************Alignment******************************************
sequence: sp|Q6GMD3.1| RecName: Full=Shiftless antiviral inhibitor of ribosomal frameshifting protein homolog; Short=SHFL; AltName: Full=Repressor of yield of DENV protein homolog [Xenopus laevis]
length: 305
e value: 0.378018
ACHECDLLFWRPPRLRALAAHCPRCRAR
AC ECD ++WR    R   + C RCR +
ACKECDYMWWRRVPQRKEVSRCQRCRKK

******************************************Alignment******************************************
sequence: sp|O55005.1| RecName: Full=Roundabout homolog 1; Flags: Precursor [Rattus norvegicus]
length: 1651
e value: 0.392471
PRLRALAAHCPRCRARVGGSAHGRPALDRRCAIALRS
P+L ++ A   R   R GGS  GR ALD R    LR+
PKLASIEARADRSSDRKGGSYKGREALDGRQVTDLRT

******************************************Alignment******************************************
sequence: sp|O89026.1| RecName: Full=Roundabout homolog 1; Flags: Precursor [Mus musculus]
length: 1612
e value: 0.502584
PRLRALAAHCPRCRARVGGSAHGRPALDRRCAIALRS
P
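
Every call to `blast` sends a new query to the NCBI servers, which is slow, and the result handle can be read only once. A common pattern is to write the raw XML to disk first and parse the saved file afterwards. The sketch below assumes a helper named `save_blast_xml` (a hypothetical name) and uses an `io.StringIO` object as a stand-in for the handle returned by `NCBIWWW.qblast`, so that it runs without network access.

```python
import io
import os
import tempfile


def save_blast_xml(result_handle, path):
    """Write the raw XML from a BLAST result handle to path and close the handle."""
    with open(path, 'w') as out_file:
        out_file.write(result_handle.read())
    result_handle.close()
    return path


# Stand-in handle with placeholder XML; a real one would come from NCBIWWW.qblast
demo_handle = io.StringIO('<BlastOutput></BlastOutput>')
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_blast.xml')
save_blast_xml(demo_handle, demo_path)

with open(demo_path) as saved_file:
    print(saved_file.read())
```

A saved file from a real search can then be opened with `open(path)` and passed to `NCBIXML.read` as many times as needed without repeating the search.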

In [20]:
# Run a protein BLAST against the pdb database for every sequence in the list
for protein_sequence in prot_list:
    blast("blastp", "pdb", protein_sequence)

******************************************Alignment******************************************
sequence: pdb|6QB7|A Structure of the H1 domain of human KCTD16 [Homo sapiens] >pdb|6QB7|B Structure of the H1 domain of human KCTD16 [Homo sapiens] >pdb|6QB7|C Structure of the H1 domain of human KCTD16 [Homo sapiens] >pdb|6QB7|D Structure of the H1 domain of human KCTD16 [Homo sapiens] >pdb|6QB7|E Structure of the H1 domain of human KCTD16 [Homo sapiens]
length: 163
e value: 4.95091
TPRRRRCGRIRRCNRTFMTRCNRSRRP
 PR   CGRI      F    N SR P
VPRILVCGRISLAKEVFGETLNESRDP

******************************************Alignment******************************************
sequence: pdb|7ANE|at Chain at, mS66 [Leishmania major]
length: 397
e value: 1.2372
PRLRALAAHCPRCRARVGGSAHGRPALDRRCAIALRS
PR+   +AHCP C  R   +A GR A +    + L +
PRITEWSAHCPACAWRTNMTAIGRKAQEEGQYLGLET

******************************************Alignment******************************************
sequence: pdb|7D5K|A Chain A, Cellulose syntha