# Table of Contents
<ol>
 <li><a href="#introduction">Introduction</a></li>
 <ol>
  <li><a href="#using-the-notebook">Using the notebook</a></li>
 </ol>
    <li><a href="#the-proteins">The Proteins</a></li>   
  <ol>
    <li><a href="#1c8q">1C8Q: Human salivary amylase</a></li>
    <li><a href="#4m6u">4M6U: <em>P. putida</em> mandelate racemase</a></li>
    <li><a href="#4ads">4ADS: Plasmodial PLP synthase complex</a></li>
    <li><a href="#1ubq">1UBQ: Ubiquitin (1.8 Å resolution)</a></li>
    <li><a href="#1zqa">1ZQA: DNA polymerase complexed with DNA</a></li>
  </ol>
    <li><a href="#the-code-cell">The code cell</a></li>
</ol>

# Introduction <a class="anchor" id="introduction"></a>
The Protein Data Bank (PDB) stores data on many proteins, each designated by a four-character alphanumeric ID. Some of this data describes 3-D structure, which matters because a protein's structure determines its function.
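
As a rough illustration of the ID format, the sketch below checks whether a string has the shape of a PDB ID. The function name `looks_like_pdb_id` is hypothetical, and the rule that the first character is a digit is an assumption drawn from the five IDs used in this notebook, not something stated above.

```python
import re

# Assumption: four alphanumeric characters, the first being a digit
# (true of the five IDs in this notebook; this is a shape check only).
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")

def looks_like_pdb_id(text):
    """Return True if text has the shape of a four-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None

for candidate in ["1UBQ", "4m6u", "UBQ", "1UBQ1"]:
    print(candidate, looks_like_pdb_id(candidate))
```

A match only means the string is well-formed; it does not confirm that an entry with that ID exists in the archive.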

So what is the PDB? Taken from their [website](https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/introduction):

> The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to each other in the molecule. They then deposit this information, which is then annotated and publicly released into the archive by the wwPDB.

Even so, predicting the shape of a protein remains challenging, though huge breakthroughs have been made in this field using [machine learning](https://pubmed.ncbi.nlm.nih.gov/34015749/). This notebook serves as a brief exploration of proteins and protein modeling.
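
To make "atomic coordinates" a little more concrete, here is a minimal sketch of how one atom appears in a legacy `.pdb` text file and how its fields could be sliced out by column. The `Atom` tuple and `parse_atom_line` function are hypothetical helpers, and the sample line is hand-written for illustration rather than read from the files this notebook uses. In practice, the code cell below relies on Biopython's `PDBParser` for this job.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float

def parse_atom_line(line):
    """Slice one fixed-width ATOM record into named fields."""
    return Atom(
        name=line[12:16].strip(),        # columns 13-16: atom name
        residue=line[17:20].strip(),     # columns 18-20: residue name
        chain=line[21],                  # column 22: chain identifier
        residue_number=int(line[22:26]), # columns 23-26: residue number
        x=float(line[30:38]),            # columns 31-38: x in angstroms
        y=float(line[38:46]),            # columns 39-46: y in angstroms
        z=float(line[46:54]),            # columns 47-54: z in angstroms
    )

# Illustrative sample line in the fixed-column ATOM layout (assumed values).
sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

atom = parse_atom_line(sample)
print(atom)
```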


### Using the Notebook <a class="anchor" id="using-the-notebook"></a>
Execute the code cell below and use the dropdown box that appears to select a protein.

To execute the cell, click on it and press ```Shift + Enter```.

Alternatively, click on the cell and click ```Run > Run Selected Cells``` or the play button in the notebook toolbar.

After selecting a protein, click and drag to move the protein and scroll to zoom. Different representations can be selected by navigating to the ```Extra > Quick``` tab in the display and choosing from the list of representations.

# The Proteins <a class="anchor" id="the-proteins"></a>

Proteins are all over and do all manner of important things, and that's an understatement. They're so important, in fact, that one of DNA's central roles is to serve as an instruction set for making proteins.

Proteins are chains of amino acids arranged in a particular order. The way these amino acids are arranged creates the protein's structure, and this structure determines what the protein can do.

For instance, a certain sequence of amino acids might create a protein that has a small pocket for a chemical substrate to fit into, thus facilitating some reaction. In fact, an enzyme is just a protein that promotes some such reaction, though not necessarily in that way. Proteins with names ending in -ase are generally enzymes.
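
Since a protein is a chain of amino acids, it can be written as a string, one letter per amino acid. The sketch below converts three-letter residue names (the form used in PDB files) to the standard one-letter codes. The helper `to_one_letter` and the five-residue example chain are hypothetical, chosen only for illustration.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def to_one_letter(residues):
    """Join residue names into a one-letter sequence; unknown names become 'X'."""
    return "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residues)

# Hypothetical five-residue chain, used only for illustration.
chain = ["MET", "GLN", "ILE", "PHE", "VAL"]
print(to_one_letter(chain))  # MQIFV
```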

The proteins available here are:

<ul>
<li><strong>1C8Q: Human salivary amylase</strong><a class="anchor" id="1c8q"></a></li>
    
- Amylase is an enzyme that aids digestion by breaking down starches into sugars. It's made by both the pancreas and the salivary glands; its presence in the latter explains why foods high in starch (rice, for instance) may taste slightly sweet while being chewed.
    
<li><strong>4M6U: <em>P. putida</em> mandelate racemase</strong><a class="anchor" id="4m6u"></a></li>
    
- The bacterium *Pseudomonas putida*, from which this protein comes, was the subject of the U.S. Supreme Court case *Diamond v. Chakrabarty* and ultimately became the first patented living organism. This particular protein essentially creates a mirror image of its substrate, a molecule called mandelic acid.

<li><strong>4ADS: Plasmodial PLP synthase complex</strong><a class="anchor" id="4ads"></a></li>
    
- The parasite *Plasmodium berghei* causes malaria in some rodents and is a model organism for studying malaria in humans. Protein 4ADS is a complex (meaning it's composed of multiple individual proteins) that synthesizes the molecule pyridoxal phosphate (PLP), the active form of vitamin B<sub>6</sub>.

<li><strong>1UBQ: Ubiquitin (1.8 Å resolution)</strong><a class="anchor" id="1ubq"></a></li>
    
- Ubiquitin is so named because it is ubiquitous in eukaryotes (a group that includes animals, plants, and fungi). In fact, it's so useful that four different genes encode it in humans. Importantly, ubiquitin is attached to other proteins in a process called ubiquitination, which can greatly alter a protein's function.

<li><strong>1ZQA: DNA polymerase complexed with DNA</strong><a class="anchor" id="1zqa"></a></li>
    
- DNA polymerases are a class of proteins that are crucial in the replication and repair of DNA. This particular variant is heavily involved in the DNA repair process in humans. Here, it's pictured with a very small segment of DNA.
</ul>


# The code cell <a class="anchor" id="the-code-cell"></a>

In [None]:
# To display the protein selection dropdown box,
# click on this cell and press the play button in the toolbar above

import warnings
from Bio import BiopythonWarning
from Bio.PDB import PDBParser
from IPython.display import display
import ipywidgets as widgets
import nglview as nv

# Ignore Biopython warnings (e.g. PDBConstructionWarning for discontinuous protein chains)
warnings.simplefilter('ignore', BiopythonWarning)

widget_output = widgets.Output() # Capture widget output for display
protein_selector = widgets.Dropdown(
    options=[
        'Select a protein...', 
        '1C8Q: Amylase',
        '4M6U: Racemase', 
        '4ADS: Synthase', 
        '1UBQ: Ubiquitin', 
        '1ZQA: Polymerase'
    ],
    value='Select a protein...',
    description='PDB ID:',
    disabled=False
)

def visualize_protein(change):
    # Show a prompt if the user re-selects the placeholder entry
    if change['new'] == 'Select a protein...':
        with widget_output:
            widget_output.clear_output(wait=True) # Replace any previous output
            print('Please select a protein to display.')
    else:
        pdb_id = change['new'][0:4] # Get ID from selector widget
        parser = PDBParser()
        structure = parser.get_structure(pdb_id, f'pdb_data/{pdb_id}.pdb')
        view = nv.show_biopython(structure, gui=True)

        with widget_output:
            widget_output.clear_output(wait=True) # Replace any previous output
            display(view)

protein_selector.observe(visualize_protein, names='value') # Watch for widget change

display(widgets.VBox([protein_selector, widget_output])) # Display selector widget, output