<a href="https://colab.research.google.com/github/tekpinar/correlationplus/blob/master/correlationplus.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install correlationplus version 0.2.1

In [1]:
#@title Install correlationplus {run: "auto"}
!pip install correlationplus

Collecting correlationplus
  Downloading correlationplus-0.2.1-py3-none-any.whl.metadata (5.7 kB)
Collecting biopython==1.76 (from correlationplus)
  Downloading biopython-1.76.tar.gz (16.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.3/16.3 MB 24.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting MDAnalysis>=1.1.0 (from correlationplus)
  Downloading MDAnalysis-2.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (108 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 108.5/108.5 kB 5.3 MB/s eta 0:00:00
Collecting matplotlib<=3.3.4,>=3.2.2 (from correlationplus)
  Downloading matplotlib-3.3.4.tar.gz (37.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 37.9/37.9 MB 11.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting prody (from correlationplus)
  Download

In [1]:
#@title Upload your PDB and trajectory files {run: "auto"}
from google.colab import files

print("Upload your PDB file:")
uploaded_pdb = files.upload()
pdb_filename = list(uploaded_pdb.keys())[0]

print("\nUpload your trajectory file (xtc, trr, or dcd):")
uploaded_traj = files.upload()
traj_filename = list(uploaded_traj.keys())[0]

print(f"\nPDB file uploaded: {pdb_filename}")
print(f"Trajectory file uploaded: {traj_filename}")

Upload your PDB file:


Saving aces_human_monomer_wt-nojump-prot-prod-sim1-ref.pdb to aces_human_monomer_wt-nojump-prot-prod-sim1-ref.pdb

Upload your trajectory file (xtc, trr, or dcd):


Saving aces_human_monomer_wt-nojump-prot-prod-sim1.xtc to aces_human_monomer_wt-nojump-prot-prod-sim1.xtc

PDB file uploaded: aces_human_monomer_wt-nojump-prot-prod-sim1-ref.pdb
Trajectory file uploaded: aces_human_monomer_wt-nojump-prot-prod-sim1.xtc


Now, let's select the calculation type. Choose nlmi to calculate normalized linear mutual information (nlmi), or ndcc to calculate normalized dynamical cross-correlations (ndcc).
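To make the ndcc option more concrete, here is a minimal NumPy sketch of the textbook normalized dynamical cross-correlation, C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>). It is not correlationplus's own implementation. The function name `ndcc` and the toy coordinates are made up for illustration, and the coordinates are assumed to be already superposed onto a reference structure.

```python
import numpy as np

def ndcc(coords):
    """Normalized dynamical cross-correlation for coords of shape (n_frames, n_atoms, 3)."""
    # Displacement of every atom from its time-averaged position.
    disp = coords - coords.mean(axis=0)
    # Covariance of displacement vectors, <dr_i . dr_j>, averaged over frames.
    cov = np.einsum("fix,fjx->ij", disp, disp) / coords.shape[0]
    # Normalize by sqrt(<dr_i^2> <dr_j^2>) so that the diagonal becomes 1.
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)

# Toy trajectory (hypothetical data): 200 frames, 5 atoms.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 5, 3))
toy[:, 1] = toy[:, 0] + 0.01 * rng.normal(size=(200, 3))  # atom 1 follows atom 0
toy[:, 2] = -toy[:, 0]                                    # atom 2 mirrors atom 0
c = ndcc(toy)
```

In this toy case atoms 0 and 1 are almost perfectly correlated and atoms 0 and 2 are perfectly anti-correlated, which is the kind of signed information ndcc carries.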

In [2]:
#@title Select Calculation Type {run: "auto"}
calculationType = "nlmi" #@param ["nlmi", "ndcc"]

In [3]:
#@title Calculate the correlation metric you selected {run: "auto"}
!correlationplus calculate -t $calculationType -p $pdb_filename -f $traj_filename -o $calculationType".dat"



|------------------------------Correlation Plus------------------------------|
|                                                                            |
|        A Python package to calculate, visualize and analyze protein        |
|                           correlation maps.                                |
|               Copyright (C) Mustafa Tekpinar, 2017-2018                    |
|                   Copyright (C) CNRS-UMR3528, 2019                         |
|             Copyright (C) Institut Pasteur Paris, 2020-2021                |
|                         Author: Mustafa Tekpinar                           |
|                       Email: tekpinar@buffalo.edu                          |
|                           Licence: GNU LGPL V3                             |
|     Please cite us: https://pubs.acs.org/doi/10.1021/acs.jcim.1c00742      |
|                              Version: 0.2.1                                |
|-------------------------------------------------

We should now have an nlmi.dat or ndcc.dat file in our folder. We will use it to generate images of the correlation matrix, along with tcl files for VMD and pml files for PyMOL. Two filter parameters are passed here. The '-d 10' parameter tells the program to project only interactions between amino acids that are more than 10 Angstrom apart. The '-v 0.625' parameter tells the program to project only interactions with an nlmi (or ndcc) value of 0.625 or more.
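As a rough illustration of what a distance threshold and a value threshold select together, the sketch below filters residue pairs with NumPy. It is only an assumption about the logic, not the code correlationplus runs. The function `select_pairs`, the strict `>` on distance, the `>=` on value, and the toy data are all hypothetical.

```python
import numpy as np

def select_pairs(corr, ca_coords, min_dist, min_value):
    """Return (i, j) index pairs with i < j that pass both thresholds."""
    # Pairwise C-alpha distances in Angstrom.
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    keep = (dist > min_dist) & (corr >= min_value)
    # Upper triangle only, so each pair is reported once and i != j.
    i, j = np.where(np.triu(keep, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Three toy residues on a line, 8 Angstrom apart (hypothetical data).
coords = np.array([[0.0, 0.0, 0.0], [8.0, 0.0, 0.0], [16.0, 0.0, 0.0]])
corr = np.array([[1.0, 0.9, 0.7],
                 [0.9, 1.0, 0.2],
                 [0.7, 0.2, 1.0]])
pairs = select_pairs(corr, coords, min_dist=10.0, min_value=0.625)
```

Only the pair (0, 2) survives here: residues 0 and 1 are strongly correlated but too close, and residues 1 and 2 fail both tests.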

In [5]:
#@title Generate 2D visualizations and projections on protein structure {run: "auto"}

!correlationplus visualize -i $calculationType".dat" -t $calculationType -p $pdb_filename -d 10 -v 0.625




**Let's see the interactive correlation map first.**

In [33]:
#@title Generate interactive correlation map as a heatmap. {run: "auto"}

# TODO: The x and y axis labels start from zero. Use the real residue numbers on both axes
#       so that interacting residues can be identified by their correct residue IDs.
import plotly.graph_objects as go
import numpy as np
corrFile = calculationType + ".dat"
# The matrix written by the 'calculate' step is expected in the current working directory.
try:
    data = np.loadtxt(corrFile)
except FileNotFoundError:
    raise SystemExit(f"Error: '{corrFile}' not found. Please make sure the file exists in the current directory.")

fig = go.Figure(data=go.Heatmap(
                   z=data,
                   colorscale='Turbo',
                   showscale=True
                   ))

fig.update_layout(
    title=calculationType.upper()+' Heatmap',
    xaxis_title='Residues',
    yaxis_title='Residues',
    xaxis = dict(
        scaleanchor = "y",
        scaleratio = 1,
    ),
    width=800,  # Set the width of the plot
    height=800,  # Set the height of the plot
    plot_bgcolor='rgba(0,0,0,0)' # transparent background
)

fig.show()
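One possible way to address the TODO about axis labels is to read the residue numbers of the C-alpha atoms from the PDB file. The sketch below assumes the correlation matrix has one row per C-alpha atom in file order, which should be checked against the matrix size before use. The helper name `ca_residue_labels` and the example records are hypothetical.

```python
def ca_residue_labels(pdb_lines):
    """Return one 'chain:resSeq' label per C-alpha atom, in file order."""
    labels = []
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16,
        # chain ID in column 22, residue number in columns 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            labels.append(f"{line[21]}:{line[22:26].strip()}")
    return labels

# Hand-written example records (not taken from the uploaded structure).
example = [
    "ATOM      1  N   GLU A   4      10.000  10.000  10.000  1.00  0.00",
    "ATOM      2  CA  GLU A   4      11.000  10.000  10.000  1.00  0.00",
    "ATOM      9  CA  GLY A   5      14.000  10.000  10.000  1.00  0.00",
]
labels = ca_residue_labels(example)
```

With the real file, the labels could be built with `with open(pdb_filename) as fh: labels = ca_residue_labels(fh)` and, if `len(labels)` equals the matrix dimension, passed to `go.Heatmap` as `x=labels, y=labels`.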

In [34]:
#@title Build a network and calculate centralities {run: "auto"}

!correlationplus analyze -i $calculationType".dat" -t $calculationType -p $pdb_filename -d 100




In [35]:
#@title Plot interactive 2D plots of 'Current Flow Betweenness' centrality. {run: "auto"}

import plotly.graph_objects as go
import pandas as pd

# Assuming 'correlation_current_flow_betweenness_value_filter0.30.dat' is in the current directory
# Replace with the actual path if needed
df = pd.read_csv('correlation_current_flow_betweenness_value_filter0.30.dat', sep=r'\s+', header=None, names=['Residues', 'Current Flow Betweenness'])
#print(df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['Residues'], y=df['Current Flow Betweenness'], mode='lines+markers'))

fig.update_layout(
    title="Interactive Current Flow Betweenness Plot",
    xaxis_title="Residues",
    yaxis_title="Current Flow Betweenness",
)

fig.show()

In [36]:
#@title Plot interactive 2D plots of 'Current Flow Closeness' centrality. {run: "auto"}

import plotly.graph_objects as go
import pandas as pd

# Assuming 'correlation_current_flow_closeness_value_filter0.30.dat' is in the current directory
# Replace with the actual path if needed
df = pd.read_csv('correlation_current_flow_closeness_value_filter0.30.dat', sep=r'\s+', header=None, names=['Residues', 'Current Flow Closeness'])
#print(df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['Residues'], y=df['Current Flow Closeness'], mode='lines+markers'))

fig.update_layout(
    title="Current Flow Closeness Centrality",
    xaxis_title="Residues",
    yaxis_title="Current Flow Closeness",
)
fig.update_traces(line_color='red', line_width=2)
fig.show()

In [37]:
#@title Plot interactive 2D plots of 'Eigenvector' centrality. {run: "auto"}

import plotly.graph_objects as go
import pandas as pd

# Assuming 'correlation_eigenvector_value_filter0.30.dat' is in the current directory
# Replace with the actual path if needed
df = pd.read_csv('correlation_eigenvector_value_filter0.30.dat', sep=r'\s+', header=None, names=['Residues', 'Eigenvector'])
#print(df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['Residues'], y=df['Eigenvector'], mode='lines+markers'))

fig.update_layout(
    title="Interactive Eigenvector Centrality Plot",
    xaxis_title="Residues",
    yaxis_title="Eigenvector",
)
fig.update_traces(line_color='orange', line_width=2)
fig.show()
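Besides plotting, it can help to list the residues with the highest centrality. The following sketch ranks rows of a two-column, whitespace-separated file, which is the layout the plotting cells above assume. The function `top_residues` and the sample values are hypothetical and are not real output of this run.

```python
def top_residues(lines, n=5):
    """Return the n (residue, value) pairs with the largest value."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and comment/header lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        rows.append((fields[0], float(fields[1])))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

# Made-up sample in the assumed 'residue value' layout.
sample = ["10 0.12", "11 0.45", "12 0.08", "13 0.31"]
best = top_residues(sample, n=2)
```

For a real file, something like `with open('correlation_eigenvector_value_filter0.30.dat') as fh: print(top_residues(fh))` should work, provided the file really has this layout.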

Let's try to visualize the centralities on the protein.

**Next, let's run some sedy calculations.**

In [55]:
!git clone https://gitlab.com/tekpinar/sedy.git

Cloning into 'sedy'...
remote: Enumerating objects: 408, done.
remote: Counting objects: 100% (49/49), done.
remote: Compressing objects: 100% (49/49), done.
remote: Total 408 (delta 28), reused 0 (delta 0), pack-reused 359 (from 1)
Receiving objects: 100% (408/408), 257.68 MiB | 31.14 MiB/s, done.
Resolving deltas: 100% (247/247), done.


In [56]:
!cd sedy && pip install -e .

Obtaining file:///content/sedy
  Preparing metadata (setup.py) ... done
Installing collected packages: sedy
  Running setup.py develop for sedy
Successfully installed sedy-0.1.3


In [40]:
!sedy dfi -p $pdb_filename -t $traj_filename -o dfi.dat



| sedy       :  A Python toolkit to investigate relations between protein sequences and dynamics. 
|                                                                                                 
| Copyright   (C) Mustafa Tekpinar 2021-2024                                                           
| Address      :  Department of Physics, Van YYU, 65080, Van, Turkey.                   
| Email        :  tekpinar@buffalo.edu                                                            
| Licence      :  GNU LGPL V3                                                                     
|                                                                                                 
| Documentation:                                                                                  
| Citation     : .................................................................................
| Version      : 0.1.3                                                                            


@> Calculat

In [41]:
!sedy loadit -p $pdb_filename -i dfi.dat -o dfi.pdb -s 100





@> 9457 ato

In [42]:
#@title Plot interactive 2D plots of DFI. {run: "auto"}

import plotly.graph_objects as go
import pandas as pd

# Assuming 'dfi.dat' is in the current directory
# Replace with the actual path if needed
df = pd.read_csv('dfi.dat', sep=r'\s+')
#print(df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['#Resid'], y=df['Value'], mode='lines+markers', marker_symbol='square' , line={'dash': 'dot', 'color': 'green', 'width':2}))

fig.update_layout(
    title="Interactive DFI Plot",
    xaxis_title="Residues",
    yaxis_title="DFI",
)

fig.show()

In [43]:
!pip install py3Dmol

Collecting py3Dmol
  Downloading py3Dmol-2.4.2-py2.py3-none-any.whl.metadata (1.9 kB)
Downloading py3Dmol-2.4.2-py2.py3-none-any.whl (7.0 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.4.2


In [44]:
#@title Display 3D structure {run: "auto"}
import py3Dmol
import pandas as pd
import re

def get_bfactor_range(pdb_file):
    """
    Extract B-factor values from PDB file and return their range

    Parameters:
    pdb_file (str): Path to the PDB file

    Returns:
    tuple: (minimum B-factor, maximum B-factor)
    """
    bfactors = []

    with open(pdb_file, 'r') as f:
        for line in f:
            if line.startswith('ATOM') or line.startswith('HETATM'):
                try:
                    # B-factor is typically in columns 61-66
                    bfactor = float(line[60:66].strip())
                    bfactors.append(bfactor)
                except (ValueError, IndexError):
                    continue

    if not bfactors:
        return (0, 100)  # default range if no B-factors found

    return (min(bfactors), max(bfactors))

def visualize_protein_bfactor(pdb_file):
    """
    Visualize protein structure in cartoon representation colored by B-factor
    using an automatically determined range and a blue-white-red gradient

    Parameters:
    pdb_file (str): Path to the PDB file
    """

    # Create a py3Dmol view instance
    view = py3Dmol.view()

    # Get B-factor range from the file
    bfactor_min, bfactor_max = get_bfactor_range(pdb_file)

    # Load the PDB file
    with open(pdb_file, 'r') as f:
        pdb_data = f.read()

    # Add the molecule to the viewer
    view.addModel(pdb_data, "pdb")

    # Set cartoon representation with a blue-white-red gradient based on B-factor
    view.setStyle({'cartoon': {
        'colorscheme': {
            'prop': 'b',
            'gradient': 'linear',  # Linear interpolation between the colors listed below
            'min': bfactor_min,
            'max': bfactor_max,
            'colors': ["blue", "white", "red"]
        }
    }})

    # Center and zoom the view
    view.zoomTo()

    # Add legend for B-factor coloring
    view.addPropertyLabels(
        prop='b',
        gradient='bwr',
        min=bfactor_min,
        max=bfactor_max,
        legend={'x': 0.85, 'y': 0.5}
    )

    # Add text showing the B-factor range
    view.addLabel(f"B-factor range: {bfactor_min:.2f} - {bfactor_max:.2f}",
                 {'position': {'x': -20, 'y': -20, 'z': 0},
                  'backgroundColor': 'white',
                  'fontColor': 'black'})

    return view

# Replace with your PDB file path
pdb_file = "dfi.pdb"
view = visualize_protein_bfactor(pdb_file)
view.show()

# Optional: Save the visualization as HTML
# view.save('protein_visualization.html')
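The viewer above colors by the B-factor column, so per-residue values have to be written into that column first. The sketch below shows one way this could be done by hand. It is an illustration of the mechanism only, not the implementation of `sedy loadit`. The helper `write_bfactors` and the example record are hypothetical, and the sketch assumes plain integer residue numbers and values that fit the six-character field.

```python
def write_bfactors(pdb_lines, value_by_resseq, default=0.0):
    """Return new PDB lines with columns 61-66 set from a {resSeq: value} map."""
    out = []
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith(("ATOM", "HETATM")):
            resseq = int(line[22:26])
            value = value_by_resseq.get(resseq, default)
            # Pad short records so the B-factor field always exists.
            body = line.ljust(66)
            line = f"{body[:60]}{value:6.2f}{body[66:]}"
        out.append(line)
    return out

# Hand-written example record (not taken from the uploaded structure).
example = ["ATOM      2  CA  GLU A   4      11.000  10.000  10.000  1.00  0.00"]
updated = write_bfactors(example, {4: 12.5})
```

The rewritten record keeps its first 60 columns untouched, so coordinates and occupancy are unchanged and `get_bfactor_range` would read back 12.5 for this atom.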

In [57]:
!sedy schlitter -p $pdb_filename -t $traj_filename -o schlitter.dat





@> Calculat

In [58]:
#@title Plot interactive 2D plots of Schlitter Entropy. {run: "auto"}

import plotly.graph_objects as go
import pandas as pd

# Assuming 'schlitter.dat' is in the current directory
# Replace with the actual path if needed
df = pd.read_csv('schlitter.dat', sep=r'\s+')
#print(df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['#Resid'], y=df['Value'], mode='lines+markers', marker_symbol='square' , line={'dash': 'dot', 'color': 'orange', 'width':2}))

fig.update_layout(
    title="Interactive Schlitter Entropy Plot",
    xaxis_title="Residues",
    yaxis_title="Schlitter Entropy",
)

fig.show()

In [59]:
!sedy loadit -p $pdb_filename -i schlitter.dat -o schlitter.pdb





@> 9457 ato

In [60]:
pdb_file = "schlitter.pdb"
view = visualize_protein_bfactor(pdb_file)
view.show()

In [61]:
!sedy rmsf -p $pdb_filename -t $traj_filename -o rmsf.dat





@> Calculat

In [62]:
#@title Plot interactive 2D plots of RMSF. {run: "auto"}

import plotly.graph_objects as go
import pandas as pd

# Assuming 'rmsf.dat' is in the current directory
# Replace with the actual path if needed
df = pd.read_csv('rmsf.dat', sep=r'\s+')
#print(df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['#Resid'], y=df['Value'], mode='lines+markers', marker_symbol='square' , line={'dash': 'dot', 'color': 'blue', 'width':2}))

fig.update_layout(
    title="Interactive RMSF Plot",
    xaxis_title="Residues",
    yaxis_title="RMSF",
)

fig.show()
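For reference, RMSF is the root of the mean squared deviation of each atom from its average position over the trajectory. The NumPy sketch below computes it for a toy trajectory. It is not sedy's code. The function name `rmsf` and the toy coordinates are made up, and the frames are assumed to be already superposed onto a reference.

```python
import numpy as np

def rmsf(coords):
    """Per-atom RMSF for coords of shape (n_frames, n_atoms, 3)."""
    # Displacement of every atom from its time-averaged position.
    disp = coords - coords.mean(axis=0)
    # Mean squared displacement per atom over frames, then the square root.
    return np.sqrt((disp ** 2).sum(axis=-1).mean(axis=0))

# Toy trajectory: atom 0 is fixed, atom 1 alternates between x = -1 and x = +1.
toy = np.zeros((4, 2, 3))
toy[:, 1, 0] = [-1.0, 1.0, -1.0, 1.0]
values = rmsf(toy)
```

The fixed atom gets an RMSF of 0 and the oscillating atom an RMSF of 1, in the same length unit as the input coordinates.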

In [63]:
!sedy loadit -p $pdb_filename -i rmsf.dat -o rmsf.pdb





@> 9457 ato

In [64]:
pdb_file = "rmsf.pdb"
view = visualize_protein_bfactor(pdb_file)
view.show()