# Searching Sanger reads against genomes using BLAST
## Unzipping data
I used the `!` shell escape to unzip the `server_data.zip` file that had been uploaded to my server.

In [1]:
!unzip server_data.zip

Archive:  server_data.zip
  inflating: blast_indexing.csv      
  inflating: fgr0023_f01_trimmed.fasta  
  inflating: fgr0023_f17_trimmed.fasta  
  inflating: fgr0023_r01_trimmed.fasta  
  inflating: fgr0023_r17_trimmed.fasta  
  inflating: fgr0027_f01_trimmed.fasta  
  inflating: fgr0027_f17_trimmed.fasta  
  inflating: fgr0027_r01_trimmed.fasta  
  inflating: fgr0027_r17_trimmed.fasta  
  inflating: fgr0072_f01_trimmed.fasta  
  inflating: fgr0072_f17_trimmed.fasta  
  inflating: fgr0072_r01_trimmed.fasta  
  inflating: fgr0072_r17_trimmed.fasta  
  inflating: fgr1122_f01_trimmed.fasta  
  inflating: fgr1122_f17_trimmed.fasta  
  inflating: fgr1122_r01_trimmed.fasta  
  inflating: fgr1122_r17_trimmed.fasta  
 extracting: fgr1149_f01_trimmed.fasta  
  inflating: fgr1149_f17_trimmed.fasta  
  inflating: fgr1149_r01_trimmed.fasta  
  inflating: fgr1149_r17_trimmed.fasta  
  inflating: fgr1183_f01_trimmed.fasta  
  inflating: fgr1183_f17_trimmed.fasta  
  inflating: fgr1183_r01_trimmed.f
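
The same extraction can be done from Python with the standard-library `zipfile` module, which may be handy when `unzip` is not installed on the server. This is only a sketch: the helper name `extract_archive` and its `destination` argument are assumptions, not part of the original workflow.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, destination="."):
    """Extract every member of a zip archive and return the member names."""
    destination = Path(destination)
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        archive.extractall(destination)
    return names


# Example call (commented out so the cell has no side effects):
# extract_archive("server_data.zip")
```

The returned list of names can be compared with the `unzip` listing above to confirm that nothing is missing.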

## Importing packages

In [2]:
from Bio.Blast.Applications import NcbiblastnCommandline
from Bio.Blast.Applications import NcbiblastformatterCommandline
import pandas as pd
import os

## Inspecting `blast_indexing.csv`
The `blast_indexing.csv` file links each query file to the database it will be searched against.

In [None]:
blast_index = pd.read_csv('blast_indexing.csv')
blast_index
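
The search loop later in this notebook reads the `reads` column (the query FASTA file) and the `genomes` column (the genome FASTA file it is searched against). Below is a minimal sketch of that layout using only the standard library. The two rows are pairs printed later in this notebook, and the helper name `load_pairs` is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("reads", "genomes")


def load_pairs(handle):
    """Return (read_file, genome_file) pairs from an index CSV."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    return [(row["reads"], row["genomes"]) for row in reader]


# In-memory stand-in for blast_indexing.csv
example = io.StringIO(
    "reads,genomes\n"
    "fgr0023_f01_trimmed.fasta,genome40.fasta\n"
    "fgr0027_f01_trimmed.fasta,genome3.fasta\n"
)
pairs = load_pairs(example)
```

Checking the column names up front gives a clear error before any BLAST job starts, rather than a `KeyError` partway through the loop.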

## Creating BLAST databases

In [4]:
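# Build one nucleotide database per distinct genome listed in the index
for genome in blast_index['genomes'].unique():
    !makeblastdb -in {genome} -out {genome[0:-6]}_blast_database -dbtype nucl -parse_seqids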




Building a new DB, current time: 07/09/2020 23:34:20
New DB name:   /home/gabriel/projects/functional_genomics_of_resistome/results/2020.07.09/genome3_blast_database
New DB title:  genome3.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 466 sequences in 0.076216 seconds.


Building a new DB, current time: 07/09/2020 23:34:21
New DB name:   /home/gabriel/projects/functional_genomics_of_resistome/results/2020.07.09/genome36_blast_database
New DB title:  genome36.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 275 sequences in 0.0717859 seconds.


Building a new DB, current time: 07/09/2020 23:34:22
New DB name:   /home/gabriel/projects/functional_genomics_of_resistome/results/2020.07.09/genome40_blast_database
New DB title:  genome40.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 160 sequences in
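
The log above shows one database per genome. One way to produce that, sketched here under assumptions, is to build a `makeblastdb` argument list for each distinct genome file and pass it to `subprocess.run`. The flags are the ones used in the cell above. The helper name is illustrative, and the `_blast_database` suffix is taken from the database names in the log.

```python
import subprocess
from pathlib import Path


def makeblastdb_command(genome_fasta):
    """Build the makeblastdb argument list for one genome FASTA file."""
    db_name = Path(genome_fasta).stem + "_blast_database"
    return ["makeblastdb", "-in", genome_fasta, "-out", db_name,
            "-dbtype", "nucl", "-parse_seqids"]


genomes = ["genome40.fasta", "genome3.fasta", "genome40.fasta"]
# dict.fromkeys drops duplicates while keeping first-seen order
commands = [makeblastdb_command(g) for g in dict.fromkeys(genomes)]

# To actually build the databases (requires BLAST+ on the PATH):
# for command in commands:
#     subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a file name ever contains spaces.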

In [5]:
# (file extension, blast_formatter outfmt code), in the order the files are written
formats = [('xml', 5), ('txt', 0), ('csv', 10), ('tsv', 6)]

for line in blast_index.index:
    read = blast_index['reads'][line]
    genome = blast_index['genomes'][line]
    # Strip the '.fasta' suffix to build the database and output names
    database = genome[0:-6] + '_blast_database'
    prefix = read[0:-6] + '_blastn_result'
    # Run blastn once and save a BLAST archive (outfmt 11) that can be reformatted later
    blast_run = NcbiblastnCommandline(query = read,
                                      db = database,
                                      num_threads = 12, outfmt = 11,
                                      out = prefix + '.asn')
    print(read + ' will now be blasted against ' + genome)
    blast_run()
    # Convert the archive to each report format with blast_formatter
    for extension, outfmt in formats:
        print('Creating .' + extension + ' file')
        convert = NcbiblastformatterCommandline(archive = prefix + '.asn',
                                                outfmt = outfmt,
                                                out = prefix + '.' + extension)
        convert()
print('Job finished')

fgr0023_f01_trimmed.fasta will now be blasted against genome40.fasta
Creating .xml file
Creating .txt file
Creating .csv file
Creating .tsv file
fgr0023_f17_trimmed.fasta will now be blasted against genome40.fasta
Creating .xml file
Creating .txt file
Creating .csv file
Creating .tsv file
fgr0023_r01_trimmed.fasta will now be blasted against genome40.fasta
Creating .xml file
Creating .txt file
Creating .csv file
Creating .tsv file
fgr0023_r17_trimmed.fasta will now be blasted against genome40.fasta
Creating .xml file
Creating .txt file
Creating .csv file
Creating .tsv file
fgr0027_f01_trimmed.fasta will now be blasted against genome3.fasta
Creating .xml file
Creating .txt file
Creating .csv file
Creating .tsv file
fgr0027_f17_trimmed.fasta will now be blasted against genome3.fasta
Creating .xml file
Creating .txt file
Creating .csv file
Creating .tsv file
fgr0027_r01_trimmed.fasta will now be blasted against genome3.fasta
Creating .xml file
Creating .txt file
Creating .csv file
Creatin
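
The `.tsv` files written with output format 6 hold one hit per line in BLAST's default twelve tabular columns (`qseqid`, `sseqid`, `pident`, `length`, `mismatch`, `gapopen`, `qstart`, `qend`, `sstart`, `send`, `evalue`, `bitscore`). Below is a minimal sketch for picking the highest-scoring hit from such a file. The function name and the two sample rows are made up for illustration and are not results from this run.

```python
import csv
import io

COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit(handle):
    """Return the hit with the highest bit score, or None if there are no hits."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    hits = list(reader)
    if not hits:
        return None
    return max(hits, key=lambda hit: float(hit["bitscore"]))


# Invented rows, only to show the layout
sample = io.StringIO(
    "read_A\tcontig_1\t99.50\t600\t3\t0\t1\t600\t1200\t1799\t0.0\t1090\n"
    "read_A\tcontig_7\t85.00\t200\t30\t0\t10\t209\t50\t249\t1e-50\t200\n"
)
top = best_hit(sample)
```

With a real result, the handle would come from `open(...)` on one of the `_blastn_result.tsv` files. A read with no hits produces an empty file, for which the function returns `None`.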

## Organizing and compressing results

In [6]:
%%bash
mkdir blastn_results
mkdir blastn_results/asn_files
mkdir blastn_results/txt_files
mkdir blastn_results/tsv_files
mkdir blastn_results/csv_files
mkdir blastn_results/xml_files
mv *.asn blastn_results/asn_files
mv *.txt blastn_results/txt_files
mv *.tsv blastn_results/tsv_files
mv *.csv blastn_results/csv_files
mv *.xml blastn_results/xml_files
7z a blastn_results blastn_results


7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,12 CPUs Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz (50654),ASM,AES-NI)

Scanning the drive:
6 folders, 161 files, 638401 bytes (624 KiB)

Creating archive: blastn_results.7z

Items to compress: 167


Files read from disk: 157
Archive size: 38326 bytes (38 KiB)
Everything is Ok
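
The same sorting step can be written in Python with `pathlib` and `shutil`. This sketch is an alternative, not what was run above, and the function name and the `skip` parameter are assumptions. `skip` lets input files such as `blast_indexing.csv` stay where they are, whereas the `mv *.csv` glob moves every CSV file in the directory.

```python
import shutil
from pathlib import Path

EXTENSIONS = ("asn", "txt", "tsv", "csv", "xml")


def organize_results(workdir, skip=("blast_indexing.csv",)):
    """Move result files into blastn_results/<ext>_files and return the moved names."""
    workdir = Path(workdir)
    moved = []
    for extension in EXTENSIONS:
        target = workdir / "blastn_results" / (extension + "_files")
        target.mkdir(parents=True, exist_ok=True)
        # Only files directly inside workdir are matched, not those in subfolders
        for path in sorted(workdir.glob("*." + extension)):
            if path.name in skip:
                continue
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    return moved


# Example call (commented out so the cell has no side effects):
# organize_results(".")
```

Because the folders are created with `exist_ok=True`, the function can be rerun after a new batch of searches without failing on directories that already exist.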
