# Elastic-BLAST Quickstart in a Jupyter notebook


## Make sure that Elastic-BLAST and awscli are installed
When run on _mybinder.org_, this notebook comes with Elastic-BLAST and the awscli tool pre-installed. If you are running it in another environment, make sure that both tools are installed in the virtual environment that runs Jupyter Notebook. You can install them using the [requirements.txt](https://github.com/boratyng/elastic-blast-notebook/blob/main/requirements.txt) file. <br>
The cells below should show version information for both tools.

In [None]:
!elastic-blast --version

In [None]:
!aws --version
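
If you prefer a check that does not depend on reading the version output, the sketch below looks for both executables on the `PATH` using Python's standard `shutil.which`. The helper name `missing_tools` is made up for this example and is not part of Elastic-BLAST.

```python
import shutil

def missing_tools(names=("elastic-blast", "aws")):
    """Return the names of command-line tools that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# An empty list means both tools were found in the current environment.
print("Missing tools:", missing_tools() or "none")
```

If either tool is reported as missing, install it in the same virtual environment that runs the notebook before continuing.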

## Set up AWS credentials
You need to provide credentials for your AWS user account so that Elastic-BLAST can use cloud resources. Generating and providing user credentials is described here: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html. There are two steps to this process:
1. Create an access key in the AWS IAM console: https://console.aws.amazon.com/iam/
1. Paste the AWS access key ID and the AWS secret access key into the code below (remember to use quotes, as these are Python strings)

In [None]:
import os
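os.environ['AWS_ACCESS_KEY_ID'] = ''      # paste your AWS access key ID between the quotes
os.environ['AWS_SECRET_ACCESS_KEY'] = ''  # paste your AWS secret access key between the quotes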
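
Before moving on, it can help to confirm that both variables are actually set without printing the secrets themselves. The following is a minimal sketch; `missing_credentials` is a hypothetical helper written for this notebook, and it only checks that the values are non-empty, not that AWS accepts them.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]

# Report only the variable names, never the secret values.
missing = missing_credentials()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("Both credential variables are set.")
```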

## Create results bucket (if one does not exist)
Elastic-BLAST saves results in a cloud bucket. If you already have a cloud bucket in AWS, you can just provide its name.

### Name the results bucket
Select a name for your results bucket or provide the name of an existing bucket. Remember that bucket names must be globally unique. You can either edit the _YOURNAME_ variable or change the value of the _RESULTS_BUCKET_ variable.

In [None]:
from uuid import uuid4
YOURNAME = str(uuid4())[:8]
RESULTS_BUCKET = f'elasticblast-{YOURNAME}'
print(f'Your results bucket: s3://{RESULTS_BUCKET}')
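
If you edit the name by hand, a quick local check can catch names that S3 would reject outright. The sketch below covers only the core S3 naming rules (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit; no consecutive dots). It cannot tell whether the name is already taken by someone else, and `looks_like_valid_bucket_name` is a hypothetical helper, not an AWS API.

```python
import re

# Lowercase letters, digits, dots, and hyphens; first and last characters
# must be a letter or digit; total length between 3 and 63 characters.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")

def looks_like_valid_bucket_name(name):
    """Check a bucket name against a subset of the S3 naming rules."""
    return _BUCKET_RE.fullmatch(name) is not None and ".." not in name

# Example names, made up for illustration.
for candidate in ("elasticblast-1a2b3c4d", "ElasticBlast", "my_bucket"):
    print(candidate, looks_like_valid_bucket_name(candidate))
```

In the notebook you would call `looks_like_valid_bucket_name(RESULTS_BUCKET)` after the cell above.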

### Create results bucket
Skip if the bucket already exists.

In [None]:
!aws s3 mb s3://{RESULTS_BUCKET}

## Elastic-BLAST config
Below are the contents of an Elastic-BLAST configuration file, borrowed from the [Elastic-BLAST AWS Quickstart](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html), and code that writes them to a file named _BDQA.ini_.

In [None]:
conf_file = 'BDQA.ini'
conf = f"""[cloud-provider]
aws-region = us-east-1

[cluster]
num-nodes = 2
labels = owner={YOURNAME}

[blast]
program = blastp
db = refseq_protein
queries = s3://elasticblast-test/queries/BDQA01.1.fsa_aa
results = s3://{RESULTS_BUCKET}
options = -task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"
"""

with open(conf_file, 'w') as f:
    print(conf, file=f)
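
Because the configuration is plain INI text, you can read it back to confirm that the f-string substitutions produced what you expected. The sketch below uses Python's standard `configparser` as a stand-in; Elastic-BLAST's own parsing may be stricter. The bucket name `elasticblast-example` and the label `owner=example` are placeholders for this illustration.

```python
import configparser

# A copy of the configuration above with placeholder values filled in.
sample = """[cloud-provider]
aws-region = us-east-1

[cluster]
num-nodes = 2
labels = owner=example

[blast]
program = blastp
db = refseq_protein
queries = s3://elasticblast-test/queries/BDQA01.1.fsa_aa
results = s3://elasticblast-example
options = -task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"
"""

def summarize_config(text):
    """Parse INI text and return a few key settings for a quick sanity check."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "region": parser.get("cloud-provider", "aws-region"),
        "num_nodes": parser.getint("cluster", "num-nodes"),
        "program": parser.get("blast", "program"),
        "results": parser.get("blast", "results"),
    }

summary = summarize_config(sample)
print(summary)
```

To check the file written by the notebook, pass `open(conf_file).read()` to `summarize_config` instead of `sample`.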

## Submit Elastic-BLAST search
Run the cell below to submit the Elastic-BLAST search.

In [None]:
!elastic-blast submit --cfg {conf_file}

## Check search status
The cell below checks the search status. Elastic-BLAST splits query sequences into parts. The _elastic-blast status_ command shows how many of these parts are pending, running, completed, or failed. When the whole search is done, you will see only the message "Your Elastic-BLAST search succeeded ..." or "Your Elastic-BLAST search failed ..."

In [None]:
!elastic-blast status --cfg {conf_file}
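
Instead of re-running the status cell by hand, you could poll until one of the final messages appears. This is a rough sketch that assumes the output contains the "search succeeded" or "search failed" wording quoted above; the exact text may differ between Elastic-BLAST versions. The function names `classify_status` and `wait_for_search` are made up for this example.

```python
import subprocess
import time

def classify_status(output):
    """Map status output to 'succeeded', 'failed', or 'in progress'."""
    text = output.lower()
    if "search succeeded" in text:
        return "succeeded"
    if "search failed" in text:
        return "failed"
    return "in progress"

def wait_for_search(cfg, interval=60, max_checks=120):
    """Run `elastic-blast status` repeatedly until the search finishes."""
    for _ in range(max_checks):
        result = subprocess.run(
            ["elastic-blast", "status", "--cfg", cfg],
            capture_output=True,
            text=True,
        )
        state = classify_status(result.stdout + result.stderr)
        if state != "in progress":
            return state
        time.sleep(interval)  # wait before asking again
    return "in progress"
```

Calling `wait_for_search(conf_file)` blocks the notebook until the search ends or the check limit is reached, so choose `interval` and `max_checks` to match how long you are willing to wait.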

## Download results
This search should take about ??? minutes. When it is done, download results.

In [None]:
!aws s3 cp s3://{RESULTS_BUCKET}/ . --exclude "*" --include "*.out.gz" --recursive

## Uncompress results