# Assignment 6: IUPAC

## Expanding DNA IUPAC Codes into Regular Expressions

In this assignment, you will write a Python program called `iupac.py` that translates an
IUPAC-encoded (https://www.bioinformatics.org/sms/iupac.html) string of DNA into a regular expression
that matches every DNA string the encoded sequence can represent.

The IUPAC codes are as follows:

```
+------------+------+
| IUPAC code | Base |
|------------+------|
| A          | A    |
| C          | C    |
| G          | G    |
| T          | T    |
| U          | U    |
| R          | AG   |
| Y          | CT   |
| S          | GC   |
| W          | AT   |
| K          | GT   |
| M          | AC   |
| B          | CGT  |
| D          | AGT  |
| H          | ACT  |
| V          | ACG  |
| N          | ACGT |
+------------+------+
```

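For instance, the pattern `AYG` would match both `ACG` and `ATG`, so the regular expression would be `^A[CT]G$`. The `^` and `$` anchors require the whole string to match; the program output described under Output omits them.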

In [5]:
# You can verify that this works:

import re

re.search('^A[CT]G$', 'ACG')
# <re.Match object; span=(0, 3), match='ACG'>

re.search('^A[CT]G$', 'ATG')
# <re.Match object; span=(0, 3), match='ATG'>

'OK' if re.search('^A[CT]G$', 'ACG') else 'NO'

# You should see 'OK'

'OK'
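
The translation described above can be sketched as a dictionary lookup. This is a minimal illustration, not the required solution: the names `IUPAC_BASES` and `iupac_to_regex` are hypothetical, and the sketch assumes every character of the input appears in the table.

```python
import re

# Lookup table copied from the IUPAC table above.
# The name IUPAC_BASES is a hypothetical choice for this sketch.
IUPAC_BASES = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T', 'U': 'U',
    'R': 'AG', 'Y': 'CT', 'S': 'GC', 'W': 'AT', 'K': 'GT', 'M': 'AC',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}


def iupac_to_regex(seq):
    """Return a regex fragment for an IUPAC sequence (hypothetical helper)."""
    parts = []
    for code in seq:
        bases = IUPAC_BASES[code]  # raises KeyError for an unknown code
        # A single base stays literal; several bases become a character class
        parts.append(bases if len(bases) == 1 else f'[{bases}]')
    return ''.join(parts)


pattern = iupac_to_regex('AYG')
print(pattern)                              # A[CT]G
print(bool(re.fullmatch(pattern, 'ATG')))   # True
```

`re.fullmatch` is used here so that the pattern must cover the whole string without adding `^` and `$` by hand.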

In [None]:
# Run this cell to make sure this assignment is up to date
%cd ~/be434-Spring2024
!git pull --no-edit upstream main

## Getting Started with new.py

Let's start by using `new.py` to create a program template.


In [None]:
# Generate the `iupac.py` using `new.py`
%cd ~/be434-Spring2024/assignments/06_iupac
!new.py -p 'convert to IUPAC' iupac.py

You should see the following:

```
$ new.py -p 'convert to IUPAC' iupac.py
Done, see new script "iupac.py."
```

## Instructions

### Usage
Your program should accept the following arguments:

1. One or more sequences as positional arguments
2. An optional output filename (`-o|--outfile`). By default, output is printed to `STDOUT`.

When run with no arguments, the program should print a brief usage statement:

```
$ ./iupac.py
usage: iupac.py [-h] [-o FILE] SEQ [SEQ ...]
iupac.py: error: the following arguments are required: SEQ
```

When run with `-h|--help`, it should print a more verbose help document:

```
$ ./iupac.py -h
usage: iupac.py [-h] [-o FILE] SEQ [SEQ ...]

Expand IUPAC codes

positional arguments:
  SEQ                   Input sequence(s)

optional arguments:
  -h, --help            show this help message and exit
  -o FILE, --outfile FILE
                        Output filename (default: <_io.TextIOWrapper
                        name='<stdout>' mode='w' encoding='utf-8'>)
```
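
One way to get an interface like the one above is with the standard `argparse` module. The following is a sketch under assumptions, not the template that `new.py` generates: the function name `get_args` and its `argv` parameter are illustrative, and the use of `ArgumentDefaultsHelpFormatter` is inferred from the default value shown in the help text.

```python
import argparse
import sys


def get_args(argv=None):
    """Parse command-line arguments (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        description='Expand IUPAC codes',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    # nargs='+' requires at least one positional sequence
    parser.add_argument('seq', metavar='SEQ', nargs='+',
                        help='Input sequence(s)')

    # FileType('wt') opens the named file for writing text;
    # without -o, the default is the already-open sys.stdout
    parser.add_argument('-o', '--outfile', metavar='FILE',
                        type=argparse.FileType('wt'),
                        default=sys.stdout,
                        help='Output filename')

    return parser.parse_args(argv)


args = get_args(['MCG', 'GWC'])
print(args.seq)  # ['MCG', 'GWC']
```

When no positional argument is supplied, `argparse` prints the short usage and the "required" error by itself and exits with a non-zero status, so no extra code is needed for that case.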

### Output

For each input sequence, the program should print the sequence, a space, and the regular expression for that sequence:

```
$ ./iupac.py MCG GWC
MCG [AC]CG
GWC G[AT]C
```

When the output filename is given, the preceding output should be printed to the given filename and the `STDOUT` of the program should include a statement of where the output was printed:

```
$ ./iupac.py KCM BDA -o out.txt
Done, see output in "out.txt"
```

The preceding command should have created a file called _out.txt_ that has the following contents:

```
$ cat out.txt
KCM [GT]C[AC]
BDA [CGT][AGT]A
```
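
The two output modes can be handled with a single function that writes to whatever file handle it is given. This is a rough sketch, assuming the regular expressions have already been computed; the name `write_results` and the `(sequence, regex)` pair format are hypothetical choices for illustration.

```python
import io
import sys


def write_results(pairs, out_fh):
    """Write 'SEQ REGEX' lines to out_fh (hypothetical helper)."""
    for seq, regex in pairs:
        # print() joins its arguments with a single space by default
        print(seq, regex, file=out_fh)

    # Only announce the location when output did not go to STDOUT
    if out_fh is not sys.stdout:
        name = getattr(out_fh, 'name', '<buffer>')
        print(f'Done, see output in "{name}"')


# An in-memory buffer stands in for a real output file here
buffer = io.StringIO()
write_results([('KCM', '[GT]C[AC]'), ('BDA', '[CGT][AGT]A')], buffer)
print(buffer.getvalue(), end='')
```

Passing an open file handle rather than a filename keeps the writing code identical for both cases; only the final message depends on where the output went.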

## Time to write some code!

Open the script in VS Code at be434-Spring2024 -> assignments -> 06_iupac -> iupac.py

Write/edit the code using the instructions above.

## Testing

As you write your code, you can test it along the way to make sure that you are passing all of the tests for the homework.
The test suite included with the assignment checks that your program meets all of the requirements in the instructions above.
The steps below show how to test your code. Note that you can run these commands from a "shell" within the VS Code GUI, or you can run them here.

## Starting up a virtual environment for testing

In order to run the tests for each assignment, we need access to several Python packages (pytest, flake8, and pylint).
These packages have been installed in a virtual environment for you.
You can activate the environment with the `conda activate` command shown in the sample session further below.

In [None]:
# Format your code to make it beautiful (black is a code formatter; pylint and flake8 do the linting)
!black ~/be434-Spring2024/assignments/06_iupac/iupac.py

In [None]:
# Now run the tests on your code
%cd ~/be434-Spring2024/assignments/06_iupac
!make test

A passing test suite looks like the following:

```
$ conda activate /groups/bhurwitz/bh_class/be434/be434-conda
$ make test
pytest -xv --pylint --disable-warnings test.py iupac.py
============================= test session starts ==============================
...
collected 21 items
--------------------------------------------------------------------------------
Linting files
..
--------------------------------------------------------------------------------

test.py::PYLINT PASSED                                                   [  4%]
test.py::FLAKE8 PASSED                                                   [  9%]
test.py::test_exists PASSED                                              [ 14%]
test.py::test_usage PASSED                                               [ 19%]
test.py::test_no_args PASSED                                             [ 23%]
test.py::test_1 PASSED                                                   [ 28%]
test.py::test_2 PASSED                                                   [ 33%]
test.py::test_3 PASSED                                                   [ 38%]
test.py::test_4 PASSED                                                   [ 42%]
test.py::test_5 PASSED                                                   [ 47%]
test.py::test_6 PASSED                                                   [ 52%]
test.py::test_7 PASSED                                                   [ 57%]
test.py::test_1_outfile PASSED                                           [ 61%]
test.py::test_2_outfile PASSED                                           [ 66%]
test.py::test_3_outfile PASSED                                           [ 71%]
test.py::test_4_outfile PASSED                                           [ 76%]
test.py::test_5_outfile PASSED                                           [ 80%]
test.py::test_6_outfile PASSED                                           [ 85%]
test.py::test_7_outfile PASSED                                           [ 90%]
iupac.py::PYLINT PASSED                                                  [ 95%]
iupac.py::FLAKE8 PASSED                                                  [100%]
```


Your grade is whatever percentage of tests your code passes.

## Uploading your code to GitHub

Once you have written the code for your assignment, and are passing all of the tests above, you are ready to submit the assignment for grading. Use the steps below to submit your code to GitHub.

* Note: if you are having trouble passing tests and need help, you can still submit the code, using a commit message like the following.

```
git commit -m "test_XXX failing for 06_iupac"
```

Once you have done that, send me a private Slack message (@bhurwitz) to let me know you submitted code and need help.


In [None]:
# Submit your code to GitHub
%cd
%cd be434-Spring2024
!git add -A && git commit -m "Submitting 06_iupac for grading"
!git push

Great job! You are done with this assignment.

## Authors

Bonnie Hurwitz <bhurwitz@arizona.edu> and Ken Youens-Clark <kyclark@gmail.com>