# Protein Structure

## Overview

Our goal is to build familiarity with protein structure and sequence files and to gain some intuition for what protein structures can tell us. In Part 1, we will visualize protein structures and see whether we can infer anything about each protein's function. In Part 2, we will search for the proteins by their amino acid sequences to check their actual function.

In this notebook we will take a look at several protein structures. These structures were predicted automatically by AlphaFold and were downloaded from the [AlphaFold database](https://alphafold.ebi.ac.uk/).

We will work with two types of files here: .pdb and .fasta. Both are plain text files (you can open them in any text editor), and each follows a standard format for the information it represents.
* .pdb files: protein structure. This file specifies the coordinates of each atom in each amino acid of the protein.
* .fasta files: protein sequence. This file lists the amino acid sequence of the protein. (Note: .fasta files can contain either amino acid or nucleotide sequences.)
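To make the two formats concrete, here is a minimal sketch that reads one record of each kind using only the Python standard library. The atom line and the FASTA record are made up for illustration, and the helper names `parse_atom_line` and `parse_fasta` are hypothetical. In the rest of the notebook, Biopython does this parsing for us. The column positions follow the fixed-width layout of a PDB `ATOM` record. In AlphaFold models, the B-factor column holds the per-residue confidence score (pLDDT) rather than an experimental B-factor.

```python
# A made-up ATOM record, assembled field by field so the fixed columns are visible.
atom_line = (
    "ATOM  "      # columns 1-6: record type
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: blank
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: blank
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue number
    "    "        # columns 27-30: insertion code + blanks
    " -12.345"    # columns 31-38: x coordinate (angstroms)
    "   4.500"    # columns 39-46: y coordinate
    "   7.890"    # columns 47-54: z coordinate
    "  1.00"      # columns 55-60: occupancy
    " 85.20"      # columns 61-66: B-factor (pLDDT in AlphaFold files)
    "          "  # columns 67-76: blank
    " N"          # columns 77-78: element symbol
)


def parse_atom_line(line):
    """Slice the fixed-width fields out of a single ATOM record."""
    return {
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "b_factor": float(line[60:66]),
    }


# A made-up FASTA record: a header line starting with ">" followed by sequence lines.
fasta_text = """>example_protein hypothetical record for illustration
MKTAYIAKQR
QISFVKSHFS
"""


def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


atom = parse_atom_line(atom_line)
records = parse_fasta(fasta_text)
print(atom)
print(records)
```

The point of the sketch is that a .pdb file is read by column position while a .fasta file is read line by line. This is why the two formats need different parsers (`Bio.PDB` and `Bio.SeqIO` in this notebook).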

We will first load the .pdb files and visualize the protein structures. Next, we will load the .fasta files and search the NCBI database via BLAST to identify each protein.

In [1]:
# Import cell (RUN THIS CELL TO TEST YOUR INSTALL/ENVIRONMENT)
# part 1
from Bio.PDB import PDBParser
import nglview as nv
import ipywidgets
# part 2
from Bio import SeqIO
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML



## Part 1: Visualize Structures

In this section we will load and visualize the protein structure files. Replace the .pdb file name in the code block (for example, "P1_structure.pdb") with the name of each protein .pdb file and run the code block. Once the visualization is generated, you can click and drag to rotate the protein and scroll to zoom in and out. Copy the code block and repeat this for each of the protein files.

Here we use a simple approach to view the proteins with the Biopython (Bio.PDB) and nglview packages. Many more sophisticated tools let you visualize proteins and run analyses on them, such as molecular docking. If you are interested in exploring protein structure analysis further, a popular and more powerful software tool is [PyMOL](https://pymol.org/2/).

In [7]:
# LDH-A structure
pdb_parser = PDBParser()
structure = pdb_parser.get_structure("P", "AF-Q6ZMR3-F1-model_v4.pdb")
view = nv.show_biopython(structure)
view

NGLWidget()

In [8]:
# Load the predicted structure of LDH-A
pdb_parser = PDBParser()
structure = pdb_parser.get_structure("P", "AF-Q6ZMR3-F1-model_v4.pdb")

# Get the first model of the structure
model = structure[0]

# Get the chain containing the active site residues
chain_id = "A"
chain = model[chain_id]

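# Look for a bound lactate ligand (LAC); AlphaFold predictions normally contain no ligands, so this list is usually empty
substrate_residues = [residue for residue in chain.get_residues() if residue.get_resname() == "LAC"]
# Select candidate active site residues by residue number (163-167), not by distance
active_site_residues = [residue for residue in chain.get_residues() if residue.get_id()[1] in range(163, 168)]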

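# Visualize the structure and highlight the active site residues
view = nv.show_biopython(structure)
# NGL selection syntax uses ":A" for chain A
view.add_representation("cartoon", selection=f":{chain_id}")
# NGL selections take residue numbers, so extract them from the Residue objects
active_site_numbers = " or ".join(str(residue.get_id()[1]) for residue in active_site_residues)
view.add_representation("ball+stick", selection=f"({active_site_numbers}) and :{chain_id}")
# Matches nothing unless the file actually contains a LAC ligand
view.add_representation("ball+stick", selection="LAC")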
view


NGLWidget()
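The cell above picks residues by residue number. If a reference position were available (for example, the location of a bound ligand), residues could instead be selected by distance. The sketch below shows that idea with the standard library only. The coordinates, the ligand position, and the helper name `residues_within` are all hypothetical, and the 5 angstrom cutoff is an arbitrary choice for illustration. With a real structure, Biopython's `Bio.PDB.NeighborSearch` provides an equivalent and faster search over atom coordinates.

```python
import math

# Hypothetical C-alpha coordinates (angstroms), keyed by residue number.
# These values are made up for illustration, not taken from a real structure.
ca_coords = {
    163: (1.0, 0.0, 0.0),
    164: (4.0, 0.0, 0.0),
    165: (0.0, 3.0, 0.0),
    200: (20.0, 5.0, 1.0),
}


def residues_within(coords, center, cutoff):
    """Return sorted residue numbers whose coordinate lies within `cutoff` of `center`."""
    return sorted(
        resnum
        for resnum, xyz in coords.items()
        if math.dist(xyz, center) <= cutoff
    )


ligand_center = (0.0, 0.0, 0.0)  # hypothetical ligand position
nearby = residues_within(ca_coords, ligand_center, cutoff=5.0)
print(nearby)
```

The resulting residue numbers could then be joined into an NGL selection string in the same way as the active site residues above.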