# Introduction
The Protein Data Bank (PDB) stores data for a wide range of proteins, each designated by a four-character alphanumeric ID. Some of this data describes 3-D structure, which is important because the structure of a protein determines its function.
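
As a rough illustration of the ID format, the sketch below checks whether a string looks like a four-character PDB ID and builds the matching download URL. It assumes the classic convention of a digit from 1 to 9 followed by three alphanumeric characters. The helper names `is_pdb_id` and `pdb_download_url` are hypothetical, and the URL pattern is the one used by the code cell later in this notebook.

```python
import re

# Assumption: classic PDB IDs are a digit 1-9 followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(text):
    """Return True if text looks like a classic four-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


def pdb_download_url(pdb_id):
    """Build the download URL for a PDB-format file (hypothetical helper)."""
    if not is_pdb_id(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


# Print whether each candidate string passes the format check.
for candidate in ["1UBQ", "4ads", "UBQ1", "1UBQX"]:
    print(candidate, is_pdb_id(candidate))
```

A check like this only validates the shape of the ID; it says nothing about whether an entry with that ID actually exists.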

Even so, predicting the shape of a protein remains challenging, though huge breakthroughs have been made in this field using [machine learning](https://pubmed.ncbi.nlm.nih.gov/34015749/). This notebook serves as a brief exploration of proteins and protein modeling.

# The Proteins

The proteins listed below are available for viewing.

## 1C8Q: Human salivary amylase
Amylase is an enzyme that aids digestion by breaking down starches into sugars. It's made by both the pancreas and the salivary glands; its presence in saliva explains why foods high in starch (rice, for instance) may taste slightly sweet while being chewed.

## 4M6U: *P. putida* mandelate racemase
The bacterium *Pseudomonas putida*, from which this protein comes, was the first patented living organism and the subject of the U.S. Supreme Court case *Diamond v. Chakrabarty*. This particular protein essentially converts its substrate, a molecule called mandelic acid, into its mirror image.

## 4ADS: Plasmodial PLP synthase complex
The parasite *Plasmodium berghei* causes malaria in some rodents and is a model organism for studying malaria in humans. Protein 4ADS is a complex (meaning it's composed of multiple individual proteins) that synthesizes the molecule pyridoxal phosphate (PLP), the active form of vitamin B<sub>6</sub>.


## 1UBQ: Ubiquitin (1.8 Å resolution)
Ubiquitin is so named because of its ubiquitous nature in most eukaryotes (basically animals, plants, and fungi). In fact, it's so useful that four different genes encode it in humans. Importantly, ubiquitin drives a process called ubiquitination, in which ubiquitin is attached to another protein and can greatly alter that protein's function.


## 1ZQA: DNA polymerase complexed with DNA
DNA polymerases are a class of proteins that are crucial in the replication and repair of DNA. This particular variant is heavily involved in the DNA repair process in humans. Here, it's pictured with a very small segment of DNA.

# GUI Usage

While much of the GUI functionality is geared toward creating custom representations, the model representation can most easily be changed by navigating to Extra > Quick tab and selecting a representation. Representations can stack.

"Cartoon" is typically the default, as it makes it easiest to get a sense of the motifs that make up the protein.

The "contact" view represents distances between amino acids and has become increasingly useful for machine learning computations, as these values do not change when the model is rotated or translated.
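
To make the point about transformations concrete, here is a minimal sketch that computes pairwise distances for a few points, rotates and shifts the points, and checks that the distances are unchanged. The coordinates are made up for illustration and do not come from a real PDB entry; the helper names are hypothetical, and this is not a description of how the viewer computes its contact view.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3-D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rotate_z_and_shift(point, angle, shift):
    """Rotate a point about the z-axis by `angle` radians, then translate it."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


# Made-up coordinates (in angstroms) standing in for four residues.
residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.5, 3.4, 0.0), (4.0, 6.0, 2.5)]
moved = [rotate_z_and_shift(p, math.radians(40), (10.0, -2.0, 7.5)) for p in residues]

before = distance_matrix(residues)
after = distance_matrix(moved)

# Compare every pair of entries, allowing for floating-point rounding.
unchanged = all(
    math.isclose(before[i][j], after[i][j], abs_tol=1e-9)
    for i in range(len(residues))
    for j in range(len(residues))
)
print("distances unchanged:", unchanged)
```

The raw coordinates differ between `residues` and `moved`, but the distance matrices agree, which is why distance-based features are a convenient input when a model should not care how the protein is oriented.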

"Spacefill" and "ball-and-stick" are both common representations of molecules in chemistry, though the latter is largely ignored by this GUI as most proteins would be too large to fit on screen if modeled accurately this way.

In [2]:
# Press Shift 

import os
import urllib.request
import warnings
from Bio import BiopythonWarning
from Bio.PDB import PDBParser
from IPython.display import display
import ipywidgets as widgets
import nglview as nv

# Ignore Biopython warnings (e.g. PDBConstructionWarning for discontinuous protein chains)
warnings.simplefilter('ignore', BiopythonWarning)

def visualize_protein(change):
    
    # Prompt again if the user re-selects the default value
    if change['new'] == 'Select a protein...':
        with widget_output:
            print('Please select a protein to display.')
            widget_output.clear_output(wait=True)
        
    else:
        pdb_id = change['new']
        
        # Fetch PDB data into the local cache directory (created if missing)
        os.makedirs('pdb_data', exist_ok=True)
        urllib.request.urlretrieve(f'https://files.rcsb.org/download/{pdb_id}.pdb',
                                   f'pdb_data/{pdb_id}.pdb')
        # Create parser instance
        parser = PDBParser()

        # Read in protein structure from stored PDB file
        structure = parser.get_structure(pdb_id, f'pdb_data/{pdb_id}.pdb')
    
        # Visualize structure with accompanying GUI
        view = nv.show_biopython(structure, gui=True)
    
        # Display widget output; clear display when output changes
        with widget_output:
            display(view)
            widget_output.clear_output(wait=True)

# Capture the output of widgets
widget_output = widgets.Output()

# Selector widget
protein_selector = widgets.Dropdown(
    options=['Select a protein...', '1C8Q', '4M6U', '1UBQ', '4ADS', '1ZQA'],
    value='Select a protein...',
    description='PDB ID:',
    disabled=False)

# Handle PDB ID selection from widget
protein_selector.observe(visualize_protein, names='value')

# Display selector widget, output
widgets.VBox([protein_selector, widget_output])

VBox(children=(Dropdown(description='PDB ID:', options=('Select a protein...', '1C8Q', '4M6U', '1UBQ', '4ADS',…
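
The handler above downloads the file on every selection, even when a copy already sits in `pdb_data/`. One possible way to make the cache effective is sketched below. The helper name `cached_pdb_path` and its `fetch` parameter are hypothetical, introduced here only so the logic can be tried without network access.

```python
from pathlib import Path
from urllib.request import urlretrieve


def cached_pdb_path(pdb_id, cache_dir="pdb_data", fetch=urlretrieve):
    """Return the local path of a PDB file, downloading it only if missing.

    `fetch` is a hypothetical injection point: any callable taking
    (url, filename) works, which makes the helper easy to test offline.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)  # create the cache folder if needed
    target = cache / f"{pdb_id}.pdb"
    if not target.exists():
        # Only hit the network when no cached copy is present.
        fetch(f"https://files.rcsb.org/download/{pdb_id}.pdb", str(target))
    return target
```

Inside `visualize_protein`, the `urlretrieve` call could then be swapped for `path = cached_pdb_path(pdb_id)`, with `str(path)` passed to `parser.get_structure`.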