# 1. 

Update your sequence alignment code to run as an array job in Slurm. Your program should partition the database based on the number of nodes you are using, then run your query against each partition. When all the jobs have completed, collate the results into a single file.

You should use a separate helper script to take care of the partitioning and submitting.
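A minimal sketch of such a helper, assuming the database is a FASTA file, one partition per node, and that your search program can be run as `python search.py <query> <database>` (a hypothetical command line; substitute your own). The file names `db_part_<i>.fasta` and `result_<i>.txt` are placeholders too. Records are dealt round-robin so partition sizes differ by at most one, and each Slurm array task uses `SLURM_ARRAY_TASK_ID` to pick its partition.

```python
"""Hypothetical helper for Question 1: partition a FASTA database, build a
Slurm array script, and collate per-task results."""
import subprocess
from pathlib import Path


def parse_fasta(text):
    """Split FASTA text into records, each starting with its '>' header line."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def partition(records, n_parts):
    """Deal records round-robin so partition sizes differ by at most one."""
    return [records[i::n_parts] for i in range(n_parts)]


def write_partitions(db_path, n_parts, out_dir="."):
    """Write db_part_<i>.fasta files (placeholder names) and return their paths."""
    records = parse_fasta(Path(db_path).read_text())
    paths = []
    for i, part in enumerate(partition(records, n_parts)):
        path = Path(out_dir) / f"db_part_{i}.fasta"
        path.write_text("".join(record + "\n" for record in part))
        paths.append(path)
    return paths


def array_script(n_parts, query="query.fasta"):
    """Return an sbatch script with one array task (on one node) per partition."""
    return "\n".join([
        "#!/bin/bash",
        "#SBATCH --job-name=seqsearch",
        f"#SBATCH --array=0-{n_parts - 1}",
        "#SBATCH --nodes=1",
        "#SBATCH --output=slurm_%A_%a.out",
        # SLURM_ARRAY_TASK_ID selects the partition this task searches.
        f"python search.py {query} db_part_${{SLURM_ARRAY_TASK_ID}}.fasta"
        " > result_${SLURM_ARRAY_TASK_ID}.txt",
        "",
    ])


def submit_and_wait(script_path):
    """Submit the array job; `sbatch --wait` returns only after all tasks end."""
    subprocess.run(["sbatch", "--wait", str(script_path)], check=True)


def collate(n_parts, out_path="results.txt", result_dir="."):
    """Concatenate result_<i>.txt files, in partition order, into one file."""
    with open(out_path, "w") as out:
        for i in range(n_parts):
            out.write((Path(result_dir) / f"result_{i}.txt").read_text())
```

One way to wire this together: call `write_partitions`, save `array_script(n)` to a file, run `submit_and_wait` on it, then call `collate(n)`. Submitting the collation step as its own job with `--dependency=afterok:<jobid>` is an alternative to blocking on `--wait`.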

# 2.

We have seen firsthand that shared computing resources have both good and bad points. Fortunately, anyone can now have a private computing cluster (as long as they are willing to pay for it).

Follow this codelab from Google Cloud Platform to deploy your own HPC cluster with Slurm: https://codelabs.developers.google.com/codelabs/hpc-slurm-on-gcp/#0

Once you have run the sample, get your sequence alignment code to run on it. Provide the sbatch script used to conduct a database search using 4 nodes.

# 3.

Identify a gene you are interested in studying for your gene presentation. Ideally, your gene should have a three-dimensional model so that you can do a full analysis of it. However, if it does not, there are options we will discuss in the next class.

Provide a link to the gene in NCBI:

Provide a link to the structure in PDB:

# 4. 

## Visualizing Proteins 

The goal of this question is to familiarize yourself with protein structures and to visualize 3D models of proteins using PyMOL.

* Download and install PyMOL (https://www.pymol.org/) on your computer. You can download the installer or use Anaconda: `conda install -c schrodinger pymol`. You will not need a license to run it.
* Consult the PyMOL wiki (https://pymolwiki.org) for answers to all your questions.

Download a protein from the PDB in the PyMOL interactive console using the following command, then answer the questions below:
```
PyMOL> fetch 1mbn
```
- What is this protein? 
- What organism does it come from? 
- Provide the URL for the protein on the PDB website.
- Take a screenshot and include it below.

When you load a structure, PyMOL shows it as a `cartoon` representation by default. You can highlight the secondary structure elements by coloring them with `C -> by ss -> Helix Sheet Loop`. Answer the following questions:

- How many alpha-helices does the protein have? 
- How many beta sheets does it have?
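Counting elements by eye is error-prone, so a small sketch can tally them from PyMOL's per-residue `ss` property (`H` for helix, `S` for strand). `count_segments` counts contiguous runs of one code; `pymol_ss_codes` is a hypothetical helper that only works inside PyMOL's Python environment. Two caveats: a run of `S` is a single strand, not a sheet (a sheet is several hydrogen-bonded strands, which this does not group), and helices whose assignments are split or merged by PyMOL will be counted accordingly.

```python
from itertools import groupby


def count_segments(ss_codes, code):
    """Count maximal runs of `code` (e.g. 'H' helix, 'S' strand) in per-residue codes."""
    return sum(1 for key, _ in groupby(ss_codes) if key == code)


def pymol_ss_codes(selection="polymer and name CA"):
    """Collect PyMOL's per-residue `ss` property (one CA per residue)."""
    from pymol import cmd  # imported lazily so count_segments works without PyMOL
    codes = []
    cmd.iterate(selection, "codes.append(ss)", space={"codes": codes})
    return codes
```

Inside PyMOL, after `fetch 1mbn`, something like `count_segments(pymol_ss_codes("1mbn and polymer and name CA"), "H")` should report the helix count under PyMOL's current assignment.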

While this visualization makes for great magazine-cover images, it does not provide the detail we need for analysis. In particular, the `sticks` (licorice) view shows an all-atom representation that lets us see the side chains. Change the view in the GUI by selecting `H -> everything` and then `S -> sticks`.

If you prefer, you can also control the depiction from the command line.
```
PyMOL> hide everything, 1mbn
PyMOL> show sticks, 1mbn
```
Take a screenshot and include it below.

Insert image here.

# 5.

## Comparing Proteins

Download a different protein from the PDB using the following commands.
```
PyMOL> fetch 1ase
PyMOL> hide everything 
PyMOL> show cartoon
```

Answer the following questions.

- What is this protein?
- How many alpha-helices does the protein have?
- How many beta sheets does it have?
- What ligands (small molecules that are not part of the protein) are bound to it?

This protein was part of a functional study in which residues were mutated across multiple protein models. This allowed the researchers to pinpoint which amino acids were functionally active.

Load another version of the protein in which residue 226 was mutated.

```
PyMOL> fetch 1asf
PyMOL> align 1ase, 1asf
PyMOL> select mutant, resi 226
PyMOL> show sticks, mutant
```

- What is the mutation (i.e. what is the change) that was made in the protein? 
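The mutation can also be cross-checked without the viewer by comparing residue names at position 226 in the two entries. This sketch assumes both structures are available as PDB-format text (PyMOL's `fetch` may save mmCIF instead, so downloading the `.pdb` files from the RCSB site is one option) and reads only `ATOM` records using the fixed PDB column layout.

```python
def residue_names(pdb_text):
    """Map (chain, residue number, insertion code) -> residue name from ATOM records."""
    names = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            # Columns 22, 23-26 and 27 hold chain, residue number and insertion code.
            key = (line[21], int(line[22:26]), line[26:27].strip())
            names[key] = line[17:20].strip()
    return names


def mutations_at(wild_text, mutant_text, resseq):
    """List (chain, wild-type name, mutant name) where residue `resseq` differs."""
    wild, mutant = residue_names(wild_text), residue_names(mutant_text)
    return [(key[0], wild[key], mutant[key])
            for key in sorted(wild)
            if key[1] == resseq and key in mutant and wild[key] != mutant[key]]
```

For example, `mutations_at(open("1ase.pdb").read(), open("1asf.pdb").read(), 226)` returns one tuple per chain in which residue 226 differs.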

# 6. 

PyRosetta is a powerful molecular modeling toolkit that we will use later in the course. Visualization is key to interpreting and analyzing molecular models, so the PyRosetta developers provide a detailed PyMOL tutorial as part of their instruction manual.

Complete the following PyMOL Tutorial from the PyRosetta group: https://graylab.jhu.edu/pyrosetta/downloads/documentation/pyrosetta4_online_format/PyRosetta4_Workshop1_PyMOL.pdf 

There are 8 questions in the tutorial; answer them below:

# 7.

One of the most common tasks in structure analysis is structure alignment, which quantifies the similarity between proteins. You will write a program to perform both global and local structure alignments.

There are many approaches to structure alignment; consult [this approach to structure alignment and RMSD](http://boscoh.com/protein/rmsd-root-mean-square-deviation.html) for the technical details of performing the alignments.

Despite the complexity of the underlying approach, you will find that it boils down to a couple of matrix commands in `numpy`. As is often the case in bioinformatics, you will be building on what others have contributed to the field, and understanding other people's code is an important part of this. There are many complex bioinformatics packages in the wild, but they often require a substantial amount of tinkering to work properly.
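A sketch of those few matrix commands, using the standard Kabsch/SVD formulation: center both coordinate sets, take the SVD of their covariance matrix, correct for reflections, and compute the RMSD after rotation. It is written independently of the linked `rmsd.py`, so names and conventions may differ from that code. Coordinates are row vectors here, so the rotation is applied as `coords @ R`; if you report the matrix in a column-vector convention (`R_col @ x`), print `R.T`.

```python
import numpy as np


def kabsch(reference, mobile):
    """Optimal rotation superposing `mobile` onto `reference` (both N x 3 arrays).

    Returns (rmsd, rotation, ref_centroid, mob_centroid) such that
    (mobile - mob_centroid) @ rotation + ref_centroid best matches reference.
    """
    reference = np.asarray(reference, dtype=float)
    mobile = np.asarray(mobile, dtype=float)
    ref_c, mob_c = reference.mean(axis=0), mobile.mean(axis=0)
    P, Q = mobile - mob_c, reference - ref_c
    # Covariance matrix between the two centered coordinate sets.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last singular direction if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    diff = P @ R - Q
    rmsd = float(np.sqrt((diff ** 2).sum() / len(P)))
    return rmsd, R, ref_c, mob_c
```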

You can (and should) use and modify the [code from the article](https://github.com/boscoh/pdbremix/blob/master/pdbremix/rmsd.py) to help parse PDB files and perform the alignments.  We are only interested in the core PDB parsing and RMSD calculations in this code.

For each question below, you only need to consider the alpha carbon (Cα) coordinates, which approximate the position of each amino acid. That is, use only one atom to represent each amino acid.
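One way to extract the alpha carbons, sketched under the assumption that you read PDB text directly rather than reusing pdbremix's parser. It slices the fixed PDB columns, skips `HETATM` records (so a calcium ion, whose atom name is also `CA`, is not picked up), keeps only the first alternate location of each residue, and stops at the first `ENDMDL` in multi-model files.

```python
import numpy as np


def read_ca(pdb_text, chain=None):
    """Return (labels, coords) for alpha carbons in ATOM records.

    labels are (chain, resname, resseq) tuples; coords is an N x 3 array.
    """
    labels, coords, seen = [], [], set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model of multi-model files
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        ch, resseq, icode = line[21], int(line[22:26]), line[26:27].strip()
        if chain is not None and ch != chain:
            continue
        key = (ch, resseq, icode)
        if key in seen:
            continue  # skip alternate locations of a residue already read
        seen.add(key)
        labels.append((ch, line[17:20].strip(), resseq))
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    return labels, np.array(coords, dtype=float).reshape(-1, 3)
```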

Your program should take input as follows:
```
align.py -reference file1.pdb -mobile file2.pdb [-local window] -out outfile.pdb
```
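The command line above maps onto Python's standard `argparse` module, which accepts single-dash multi-character options such as `-reference`. A sketch, assuming `-local` takes an integer window length and that omitting it means a global alignment:

```python
import argparse


def parse_args(argv=None):
    """Parse the align.py command line using single-dash options."""
    parser = argparse.ArgumentParser(
        prog="align.py", description="Global or local C-alpha structure alignment.")
    parser.add_argument("-reference", required=True, help="PDB file kept fixed")
    parser.add_argument("-mobile", required=True, help="PDB file that is rotated")
    parser.add_argument("-local", type=int, metavar="window",
                        help="window length for local alignment; omit for global")
    parser.add_argument("-out", required=True, help="output PDB for the moved structure")
    return parser.parse_args(argv)
```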

Each solution should print the following to standard output:

* RMSD calculation of the alignment
* The rotation matrix that gives the best RMSD
* A list of aligned residues in the following format (`A1->B1`,`A2->B2`, etc.)

The `outfile.pdb` should be a new file (in PDB format) of the aligned structure.  Note that in an alignment, one structure is kept as reference and one is mobile.  The file should represent the mobile structure.  _You should apply the rotation matrix to the entire PDB, not just the alpha carbon atoms._
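Applying the transform to every atom amounts to rewriting the coordinate columns of each `ATOM` and `HETATM` line. This sketch assumes the row-vector convention `x' = (x - mobile_centroid) @ R + reference_centroid`, where the centroids are those of the alpha carbons used in the fit; adapt it if your rotation matrix follows a different convention. All other records (including `ANISOU`) and all other columns are copied unchanged.

```python
import numpy as np


def transform_pdb(pdb_text, rotation, mob_centroid, ref_centroid):
    """Move every ATOM/HETATM record: x' = (x - mob_centroid) @ rotation + ref_centroid."""
    R = np.asarray(rotation, dtype=float)
    mob_c = np.asarray(mob_centroid, dtype=float)
    ref_c = np.asarray(ref_centroid, dtype=float)
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            xyz = np.array([float(line[30:38]), float(line[38:46]), float(line[46:54])])
            x, y, z = (xyz - mob_c) @ R + ref_c
            # Rewrite columns 31-54 in the PDB %8.3f format; keep everything else.
            line = f"{line[:30]}{x:8.3f}{y:8.3f}{z:8.3f}{line[54:]}"
        out.append(line)
    return "\n".join(out) + "\n"
```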

# 8.

Find the global RMSD between the following myoglobin structures: 1mbn, 1np4, 3qm9, 4nos. Include the output from your program below.
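RMSD needs a one-to-one correspondence between atoms, and these entries need not contain the same set of residues. A simple assumption is to pair alpha carbons by residue number, which only makes sense when the entries share a numbering scheme; for homologs with insertions or deletions, a correspondence taken from a sequence alignment is more defensible. A sketch of the residue-number pairing, plus the unordered pairs of entries to compare:

```python
from itertools import combinations

import numpy as np


def paired_coords(ca_a, ca_b):
    """Pair two {residue_number: (x, y, z)} maps on their shared residue numbers.

    Returns (shared_numbers, coords_a, coords_b) with rows in matching order.
    """
    shared = sorted(set(ca_a) & set(ca_b))
    a = np.array([ca_a[k] for k in shared], dtype=float).reshape(-1, 3)
    b = np.array([ca_b[k] for k in shared], dtype=float).reshape(-1, 3)
    return shared, a, b


def all_pairs(ids):
    """Every unordered pair of structure IDs, e.g. for a pairwise RMSD table."""
    return list(combinations(ids, 2))
```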

# 9.

Find the best local RMSD alignment between the following myoglobin structures (1mbn, 1np4, 3qm9, 4nos) by comparing segments of length 10. Apply the resulting rotation matrix to the entire protein.
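One reading of "segments of length 10" is to superpose every 10-residue window of the reference onto every 10-residue window of the mobile structure and keep the pair with the lowest RMSD. The sketch below does exactly that with its own compact Kabsch RMSD, one superposition per window pair, which is cheap for proteins of a few hundred residues. Other interpretations (for example, comparing only windows at matching residue numbers) are equally plausible.

```python
import numpy as np


def superposed_rmsd(a, b):
    """RMSD between N x 3 arrays after optimal superposition (Kabsch/SVD)."""
    P, Q = a - a.mean(axis=0), b - b.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))  # avoid improper rotations (reflections)
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return float(np.sqrt(((P @ R - Q) ** 2).sum() / len(P)))


def best_local_window(ref, mob, window=10):
    """Compare every length-`window` segment of ref with every segment of mob.

    Returns (rmsd, ref_start, mob_start) for the lowest-RMSD pair of windows.
    """
    ref, mob = np.asarray(ref, dtype=float), np.asarray(mob, dtype=float)
    best = (np.inf, -1, -1)
    for i in range(len(ref) - window + 1):
        for j in range(len(mob) - window + 1):
            r = superposed_rmsd(ref[i:i + window], mob[j:j + window])
            if r < best[0]:
                best = (r, i, j)
    return best
```

The rotation for the winning window pair can then be recomputed and applied to the whole mobile protein, as the question requires.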

# 10.

Find the best structural alignment between residues in the heme binding pockets. Use both global and local approaches to find the best alignment.

Use the [HEM binding pocket from 1mbn](https://googledrive.com/host/0B3XzcKIiWyccaUFjZ2RjTndyd2c//MPCS56420/2014-Autumn/2014-Autumn-Assignments/1mbn.HEM_A_155.pdb) as the query binding pocket.  Compare the query against each of the following binding pockets. 

* [HEM binding pocket from 1np4](https://googledrive.com/host/0B3XzcKIiWyccaUFjZ2RjTndyd2c//MPCS56420/2014-Autumn/2014-Autumn-Assignments/1np4.HEM_A_185.pdb)

* [HEM binding pocket from 3qm9](https://googledrive.com/host/0B3XzcKIiWyccaUFjZ2RjTndyd2c//MPCS56420/2014-Autumn/2014-Autumn-Assignments/3qm9.HEM_A_201.pdb)

* [HEM binding pocket from 4nos](https://googledrive.com/host/0B3XzcKIiWyccaUFjZ2RjTndyd2c//MPCS56420/2014-Autumn/2014-Autumn-Assignments/4nos.HEM_A_510.pdb)

How does the similarity between the heme binding pockets compare to the overall sequence and structural similarity of the proteins? Report the sequence similarity in addition to the structural similarity.
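For the sequence side of the comparison, one option is to align the sequences globally and report percent identity. This sketch uses Needleman-Wunsch with a toy scoring scheme (match +1, mismatch -1, linear gap -2) rather than a substitution matrix such as BLOSUM62, so it measures identity, not similarity; your own alignment code from earlier assignments may be the better tool. Identity is computed over columns where neither sequence has a gap, which is one of several common definitions. Note that sequences read from `ATOM` records omit unresolved residues.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C", "GLN": "Q",
    "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I", "LEU": "L", "LYS": "K",
    "MET": "M", "PHE": "F", "PRO": "P", "SER": "S", "THR": "T", "TRP": "W",
    "TYR": "Y", "VAL": "V",
}


def to_one_letter(resnames):
    """Convert three-letter residue names to a one-letter string ('X' if unknown)."""
    return "".join(THREE_TO_ONE.get(name, "X") for name in resnames)


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical aligned positions divided by columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0
```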