# Assignment 9: Blastomatic: Parsing Delimited Text Files

Delimited text files are a standard way to encode columnar data. You are likely familiar with spreadsheets like Microsoft Excel or Google Sheets, where each worksheet may hold a data set with columns across the top and records running down. You can export this data to a text file where the columns are delimited, or separated, by a character. Quite often the delimiter is a comma, and the file has the extension .csv; this format is called CSV, for comma-separated values. When the delimiter is a tab, the extension may be .tab, .txt, or .tsv (for tab-separated values).

The first line of the file usually contains the names of the columns. Notably, this is not the case with the tabular output from BLAST (Basic Local Alignment Search Tool), one of the most popular tools in bioinformatics, which is used to compare sequences. This assignment has you parse BLAST output and merge the BLAST results with metadata from another delimited text file using the csv and pandas modules.
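
Because BLAST tabular output has no header row, the column names have to be supplied when the file is read. The following is a minimal sketch of one way to do that with `csv.DictReader`, assuming the standard twelve columns of `-outfmt 6` and comma-delimited input. The sample rows and the `read_hits` helper are invented for illustration and are not part of the assignment data or template.

```python
import csv
import io

# Standard column names for BLAST tabular output (-outfmt 6),
# which does not include a header row of its own.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Made-up, comma-delimited hits for illustration only.
SAMPLE_HITS = (
    "query1,subjectA,100.000,60,0,0,1,60,1,60,1e-25,111\n"
    "query2,subjectB,93.500,60,4,0,1,60,1,60,1e-18,89.8\n"
)


def read_hits(handle, delimiter=","):
    """Return each BLAST hit as a dict keyed by column name (hypothetical helper)."""
    reader = csv.DictReader(handle, fieldnames=BLAST_COLUMNS, delimiter=delimiter)
    return list(reader)


hits = read_hits(io.StringIO(SAMPLE_HITS))
for hit in hits:
    # The csv module returns every field as a string, so convert before comparing.
    print(hit["qseqid"], float(hit["pident"]))
```

Passing `fieldnames` tells `DictReader` to treat the first line as data rather than as a header, which is the behavior needed for BLAST output.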

In this homework, you will:

* Learn how to use the csv and pandas modules to parse delimited text files

Write a program called `blastomatic.py` that selects BLAST hits at or above a given minimum percent ID, merges them with the annotations, and writes the query sequence ID, the percent ID, the depth, and the lat/lon to an output file.
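
The select-then-merge step might look something like the sketch below, which uses pandas on small in-memory tables. It is not the reference solution. The join column name `seq_id` in the annotations is an assumption (check the header of `tests/inputs/meta.csv` for the real name), and all of the sample values are made up.

```python
import io

import pandas as pd

# Standard -outfmt 6 column names; BLAST output has no header row.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# Made-up data for illustration only.
HITS_TEXT = (
    "query1,subjectA,100.000,60,0,0,1,60,1,60,1e-25,111\n"
    "query2,subjectB,93.500,60,4,0,1,60,1,60,1e-18,89.8\n"
)
# ASSUMPTION: the annotations identify each sequence in a column named "seq_id".
META_TEXT = (
    "seq_id,depth,lat_lon\n"
    'query1,12,"41.5,-71.4"\n'
    'query2,32,"41.1,-71.6"\n'
)

# names= supplies the missing header for the BLAST hits.
hits = pd.read_csv(io.StringIO(HITS_TEXT), names=BLAST_COLUMNS)
meta = pd.read_csv(io.StringIO(META_TEXT))

# Keep hits whose percent identity is at least the minimum.
min_pctid = 99.0
selected = hits[hits["pident"] >= min_pctid]

# An inner join keeps only hits that have a matching annotation.
merged = selected.merge(meta, left_on="qseqid", right_on="seq_id", how="inner")
result = merged[["qseqid", "pident", "depth", "lat_lon"]]

print(result.to_csv(index=False))
```

An inner join silently drops hits that have no annotation. Whether that matches the expected output is something the test suite will tell you.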

In [None]:
# Go to your working directory on the HPC
%cd ~/be434-Spring2025
!git pull

## Getting Started with new.py

Let's start out by using new.py to create a program template for us.


In [None]:
# Generate the `blastomatic.py` using `new.py`
%cd ~/be434-Spring2025/assignments/09_blastomatic
!../../bin/new.py -p 'Parse blast file' blastomatic.py

You should see the following:

```
$ new.py -p 'Parse blast file' blastomatic.py
Done, see new script "blastomatic.py."
```

## Instructions

### Usage and Arguments

```
$ ./blastomatic.py -a tests/inputs/meta.csv -b tests/inputs/hits1.csv -p 99
Exported 22 to "out.csv".
$ head out.csv
qseqid,pident,depth,lat_lon
"JCVI_READ_1095913011720","100.000","12","41.485832,-71.35111"
"JCVI_READ_1095900076806","100.000","32","41.09111,-71.60222"
"JCVI_READ_1095900076806","100.000","32","41.09111,-71.60222"
"JCVI_READ_1095901257294","100.000","32","41.09111,-71.60222"
"JCVI_READ_1095899227776","100.000","25","38.946945,-76.41722"
"JCVI_READ_1093012135235","100.000","20","36.003887,-75.39472"
"JCVI_READ_1093012135235","100.000","20","36.003887,-75.39472"
"JCVI_READ_1093012135235","100.000","20","36.003887,-75.39472"
"JCVI_READ_1093012135235","100.000","20","36.003887,-75.39472"
```

The program should print a usage statement when run with `-h`:

```
$ ./blastomatic.py -h
usage: blastomatic.py [-h] -b FILE -a FILE [-o FILE] [-d DELIM] [-p PCTID]

Annotate BLAST output

optional arguments:
  -h, --help            show this help message and exit
  -b FILE, --blasthits FILE
                        BLAST -outfmt 6 (default: None)
  -a FILE, --annotations FILE
                        Annotations file (default: None)
  -o FILE, --outfile FILE
                        Output file (default: out.csv)
  -d DELIM, --delimiter DELIM
                        Output field delimiter (default: )
  -p PCTID, --pctid PCTID
                        Minimum percent identity (default: 0.0)
```
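
One way to define these arguments is sketched below with `argparse`. The flags, metavars, help strings, and defaults are copied from the usage above. The file check and its error message are assumptions, and the test suite may expect different wording or use `argparse.FileType` instead, so treat this as a starting point rather than the expected implementation.

```python
import argparse
import os


def get_args(argv=None):
    """Parse command-line arguments (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Annotate BLAST output",
        # Appends "(default: ...)" to each help string.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "-b", "--blasthits", metavar="FILE", required=True, help="BLAST -outfmt 6"
    )
    parser.add_argument(
        "-a", "--annotations", metavar="FILE", required=True, help="Annotations file"
    )
    parser.add_argument(
        "-o", "--outfile", metavar="FILE", default="out.csv", help="Output file"
    )
    parser.add_argument(
        "-d", "--delimiter", metavar="DELIM", default="", help="Output field delimiter"
    )
    parser.add_argument(
        "-p",
        "--pctid",
        metavar="PCTID",
        type=float,
        default=0.0,
        help="Minimum percent identity",
    )

    args = parser.parse_args(argv)

    # ASSUMPTION: reject missing input files here; the message text is hypothetical.
    for path in (args.blasthits, args.annotations):
        if not os.path.isfile(path):
            parser.error(f"No such file: '{path}'")

    return args
```

Calling `get_args(["-b", "hits.csv", "-a", "meta.csv", "-p", "99"])` with two existing files returns a namespace in which `pctid` is the float `99.0`, while `outfile` and `delimiter` keep their defaults.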

## Time to write some code!

Open the script in VS Code through the HPC app to edit the code. Write the code to match the instructions above. Note that you must follow the instructions exactly, including all spaces and punctuation!

## Testing

As you write your code, you can test it along the way to make sure that you are passing all of the tests for the homework. 

We will use the test suite that is included with the assignment to test that you are meeting all of the requirements in the instructions above.

The steps to test your code are below. You can run these commands from a shell within the VS Code GUI, or you can run them in the cells here.

In [None]:
# Format your code to make it beautiful (black is a code formatter)
%cd ~/be434-Spring2025/assignments/09_blastomatic
!apptainer run /xdisk/bhurwitz/bh_class/biosystems/biosystems.sif black blastomatic.py

In [None]:
# Now run the tests on your code
%cd ~/be434-Spring2025/assignments/09_blastomatic
!apptainer run /xdisk/bhurwitz/bh_class/biosystems/biosystems.sif make test

A passing test suite looks like this:


```
$ make test
python3 -m pytest -xv --disable-pytest-warnings --flake8 --pylint --mypy blastomatic.py tests/*_test.py
============================= test session starts ==============================
...
collected 15 items

blastomatic.py::FLAKE8 SKIPPED                                           [  6%]
blastomatic.py::mypy PASSED                                              [ 12%]
tests/blastomatic_test.py::FLAKE8 SKIPPED                                [ 18%]
tests/blastomatic_test.py::mypy PASSED                                   [ 25%]
tests/blastomatic_test.py::test_exists PASSED                            [ 31%]
tests/blastomatic_test.py::test_usage PASSED                             [ 37%]
tests/blastomatic_test.py::test_bad_annotations PASSED                   [ 43%]
tests/blastomatic_test.py::test_bad_input_file PASSED                    [ 50%]
tests/blastomatic_test.py::test_good_input PASSED                        [ 56%]
tests/blastomatic_test.py::test_delimiter PASSED                         [ 62%]
tests/blastomatic_test.py::test_guess_delimiter PASSED                   [ 68%]
tests/blastomatic_test.py::test_pctid PASSED                             [ 75%]
tests/unit_test.py::FLAKE8 SKIPPED                                       [ 81%]
tests/unit_test.py::mypy PASSED                                          [ 87%]
tests/unit_test.py::test_guess_delimiter PASSED                          [ 93%]
::mypy PASSED                                                            [100%]
===================================== mypy =====================================

Success: no issues found in 3 source files
======================== 13 passed, 3 skipped in 2.84s =========================
```

Your grade is whatever percentage of tests your code passes.

## Uploading your code to GitHub

Once you have written the code for your assignment, and are passing all of the tests above, you are ready to submit the assignment for grading. Use the steps below to submit your code to GitHub.

* Note: if you are having trouble passing the tests and need help, you can still submit your code, using a commit message like the following:

```
git commit -m "help!"
```

Once you have done that, send me a private Slack message (@bhurwitz) to let me know that you submitted code and need help.


In [None]:
# Submit your code to GitHub
%cd
%cd be434-Spring2025
!git add -A && git commit -m "Submitting 09_blastomatic for grading"
!git push

Great job! You are done with this assignment.

## Authors

Bonnie Hurwitz <bhurwitz@arizona.edu> and Ken Youens-Clark <kyclark@gmail.com>