This notebook loads the required data from the milestoning simulation and saves the cleaned data as two tables: one for the milestoning coordinates and one for the restraint parameters.

If you are running this on Google Colab (or another cloud-based service), you will need to clone the repository by uncommenting and running the cell below.

In [None]:
#!git clone https://github.com/wesleymsmith/Milestoning_Analysis.git

In [1]:
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
import tqdm
import copy
import gc
import sys
import os
import f90nml

import bokeh
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, CDSView, GroupFilter, HoverTool
from bokeh.plotting import figure, show
from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral11

from bokeh.models.mappers import CategoricalColorMapper

import ipywidgets as widgets
from ipywidgets import interact, interact_manual

import scipy as sp
from scipy.sparse import linalg

In [2]:
dataDir='test_md_data'
#dataDir='Milestoning_Analysis/test_md_data' #uncomment this line if you cloned the repo
dataFiles=np.sort(os.listdir(dataDir))
dataFiles

array(['window_00.rest', 'window_00_rest.dat', 'window_01.rest',
       'window_01_rest.dat', 'window_02.rest', 'window_02_rest.dat',
       'window_03.rest', 'window_03_rest.dat'], dtype='|S18')

The files ending in '.rest' hold the restraint parameters for milestoning and are formatted as Fortran 90 namelists. The files ending in '.dat' are the output files for the simulation restraints and are stored as whitespace-delimited tables.

The '.dat' files do not include column names, so we need to define them ourselves. Each file also contains some dummy columns, which are present only to make it more human readable. We will label them accordingly so they can be easily removed later.

This simulation required a number of other restraints that are not relevant to the milestoning analysis. For instance, some restraints keep magnesium bound to key sites along the protein through which the ligand is being pulled, so that the channel remains in its open state. For current purposes, we only need the Z coordinates of the water-ligand restraints, which are labeled 'W1L_Z' and 'W2L_Z'. More specifically, we only need 'W2L_Z', since this particular subset of the full milestoning data set used only water two to define its soft-wall restraints.

The cell below shows an example of what we get when reading in a '.dat' file.

In [3]:
datColNames=np.concatenate([
     ['Frame'],
     ['MG%g'%iiMG for iiMG in np.arange(6)],
     ['DUMMYC_X','C_X','DUMMYC_Y','C_Y','DUMMYC_Z','C_Z','R1'],
     ['DUMMYW1L_X','W1L_X','DUMMYW1L_Y','W1L_Y','DUMMYW1L_Z','W1L_Z','W1L_R'],
     ['DUMMYW2L_X','W2L_X','DUMMYW2L_Y','W2L_Y','DUMMYW2L_Z','W2L_Z','W2L_R'],
    ])
# sep=r'\s+' splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
pd.read_csv(dataDir+'/'+dataFiles[1],sep=r'\s+',
            names=datColNames).head()

Unnamed: 0,Frame,MG0,MG1,MG2,MG3,MG4,MG5,DUMMYC_X,C_X,DUMMYC_Y,...,DUMMYW1L_Z,W1L_Z,W1L_R,DUMMYW2L_X,W2L_X,DUMMYW2L_Y,W2L_Y,DUMMYW2L_Z,W2L_Z,W2L_R
0,0,1.959,1.873,1.94,1.902,2.015,1.946,x:,-13.329,y:,...,z:,110.14,112.844,x:,10.766,y:,26.174,z:,-15.195,32.123
1,500,1.958,1.946,1.963,1.899,1.988,1.888,x:,-13.094,y:,...,z:,110.548,113.265,x:,10.342,y:,26.464,z:,-14.744,32.01
2,1000,1.935,2.071,1.951,1.895,1.986,1.95,x:,-13.316,y:,...,z:,110.709,113.389,x:,10.483,y:,26.262,z:,-14.688,31.864
3,1500,1.969,1.922,1.99,1.912,1.984,1.942,x:,-12.631,y:,...,z:,110.696,113.418,x:,10.123,y:,26.661,z:,-14.631,32.052
4,2000,1.883,1.915,1.894,1.895,2.094,1.984,x:,-12.681,y:,...,z:,111.143,113.829,x:,10.083,y:,26.527,z:,-14.23,31.746
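
As a minimal sketch of how the labeled dummy columns can be removed later, the example below reads a two-row stand-in for a '.dat' file and filters columns by the 'DUMMY' prefix. The W2L values are copied from the first two rows of the preview above; `sample_text`, `sample_cols`, and `cleaned` are illustrative names, and the sample omits the MG, C, and W1L columns for brevity.

```python
import io
import pandas as pd

# Two rows in the same "label value" layout as the '.dat' files,
# reduced to the Frame and W2L columns for illustration.
sample_text = """\
0   x: 10.766 y: 26.174 z: -15.195 32.123
500 x: 10.342 y: 26.464 z: -14.744 32.010
"""
sample_cols = ['Frame',
               'DUMMYW2L_X', 'W2L_X', 'DUMMYW2L_Y', 'W2L_Y',
               'DUMMYW2L_Z', 'W2L_Z', 'W2L_R']

sample = pd.read_csv(io.StringIO(sample_text), sep=r'\s+', names=sample_cols)

# Keep every column whose name does not start with the 'DUMMY' prefix.
keep_cols = [col for col in sample.columns if not col.startswith('DUMMY')]
cleaned = sample[keep_cols]
print(cleaned)
```

The same prefix filter should apply unchanged to a table read with the full `datColNames` list.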


In addition to the reaction coordinate data from the '.dat' files, we also need the parameters of the restraints that were used in each window. As mentioned before, these are found in the '.rest' files, which contain only soft-wall restraint settings.

These Fortran 90 namelist files can be loaded using the 'f90nml' package. We show an example in the cell below.

The relevant parameters are 'r2' and 'r3', which define the flat-bottom portion of the milestoning restraint, and 'rk2' and 'rk3', which define the force constants of the pseudo-harmonic walls.

In [4]:
f90nml.read(dataDir+'/'+dataFiles[0])

Namelist([('rst',
           [Namelist([('iat', [-1, -1]),
                      ('r1', 95.0),
                      ('r2', 109.0),
                      ('r3', 111.0),
                      ('r4', 135.0),
                      ('rk2', 0.0),
                      ('rk3', 0.0),
                      ('iresid', 0),
                      ('fxyz', [0, 0, 1]),
                      ('outxyz', 1),
                      ('igr1', 102701),
                      ('igr2', 22402)]),
            Namelist([('iat', [-1, -1]),
                      ('r1', 5.0),
                      ('r2', 14.0),
                      ('r3', 16.0),
                      ('r4', 40.0),
                      ('rk2', 100.0),
                      ('rk3', 100.0),
                      ('iresid', 0),
                      ('fxyz', [0, 0, 1]),
                      ('outxyz', 1),
                      ('igr1', 136412),
                      ('igr2', 22402)])])])
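
To make the roles of 'r2', 'r3', 'rk2', and 'rk3' concrete, here is a rough sketch of a flat-bottom penalty. The functional form is an assumption: it uses a quadratic wall `rk * (x - r)**2` on each side with no factor of 1/2, and it ignores 'r1' and 'r4', so it may not match what the simulation engine actually applies. The function name `soft_wall_energy` is hypothetical; the parameter values are those of the second restraint printed above.

```python
def soft_wall_energy(x, r2, r3, rk2, rk3):
    """Assumed flat-bottom penalty: zero inside [r2, r3], quadratic outside."""
    if x < r2:
        # Below the flat region: lower wall with force constant rk2
        return rk2 * (x - r2) ** 2
    if x > r3:
        # Above the flat region: upper wall with force constant rk3
        return rk3 * (x - r3) ** 2
    # Inside the flat-bottom region there is no penalty
    return 0.0

# Second restraint in window_00.rest: r2=14.0, r3=16.0, rk2=rk3=100.0
for x in (13.5, 15.0, 16.2):
    print(x, soft_wall_energy(x, 14.0, 16.0, 100.0, 100.0))
```

Under this assumed form, the first restraint in the same file contributes nothing, because its 'rk2' and 'rk3' are both 0.0.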

We are now ready to start loading in our data. In doing so, we will need to keep track of which window each data file came from.

We will produce two combined tables: one for the '.dat' files and another for the '.rest' files.
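
One way to keep track of the window for each file is to parse the index from the file name. The sketch below assumes every file name starts with 'window_' followed by the index digits, as in the directory listing above; `WINDOW_PATTERN` and `window_index` are hypothetical helper names, not part of the notebook.

```python
import re

# Matches the leading 'window_<digits>' shared by both file naming schemes.
WINDOW_PATTERN = re.compile(r'^window_(\d+)')

def window_index(file_name):
    """Return the integer window index encoded in a data file name."""
    match = WINDOW_PATTERN.match(file_name)
    if match is None:
        raise ValueError('no window index found in %r' % file_name)
    return int(match.group(1))

for name in ['window_00_rest.dat', 'window_03.rest']:
    print(name, window_index(name))
```

A single pattern handles both the '.dat' and '.rest' names, so the two tables end up with matching 'Window' values.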

In [5]:
# Split the directory listing into coordinate ('.dat') and restraint ('.rest') files
datFiles=[datFile for datFile in dataFiles if datFile.endswith('.dat')]
restFiles=[restFile for restFile in dataFiles if restFile.endswith('.rest')]
print('reaction coordinate data files: %s'%datFiles)
print('soft-wall restraint parameter files: %s'%restFiles)

windowDataTables=[]

# Namelist group name and the parameters to extract from each restraint entry
restParmNMLname='rst'
restParmNames=['r2','r3','rk2','rk3']
restParmSubsetInds=[0,1] # entry 0 -> 'W1L', entry 1 -> 'W2L'
parmData={}
parmData['Window']=[]
for parmInd in restParmSubsetInds:
    for parmName in restParmNames:
        parmData['W%gL_'%(parmInd+1)+parmName]=[]

for datFile in datFiles:
    print('loading milestone coordinate data from %s'%datFile)
    datFilePath=dataDir+'/'+datFile
    # File names look like 'window_00_rest.dat'; the second token is the window index
    window=int(datFile.split('_')[1])
    tempTable=pd.read_csv(datFilePath,sep=r'\s+',
                          names=datColNames)
    tempTable['Window']=window
    windowTable=tempTable[['Window']].copy()
    windowTable['Time']=tempTable['Frame']
    # The milestoning coordinate X is the absolute value of W2L_Z
    windowTable['X']=tempTable['W2L_Z'].abs()
    windowDataTables.append(windowTable.copy())
simData=pd.concat(windowDataTables)
print(simData.head())
simData.to_csv("Simulation_Milestone_Coordinate_Data.csv",index=False)

for restFile in restFiles:
    print('loading milestone restraint data from %s'%restFile)
    restFilePath=dataDir+'/'+restFile
    # File names look like 'window_00.rest'; strip the extension, then take the last token
    window=int(restFile.split('.')[0].split('_')[-1])
    tempNMLdata=f90nml.read(restFilePath)
    parmData['Window'].append(window)
    for restParmInd in restParmSubsetInds:
        for restParmName in restParmNames:
            parmData['W%gL_'%(restParmInd+1)+restParmName].append(tempNMLdata[
                restParmNMLname][restParmInd][restParmName])
windowRestraintTable=pd.DataFrame(parmData)
print(windowRestraintTable)
windowRestraintTable.to_csv("Simulation_Milestone_Restraint_Data.csv",index=False)

reaction coordinate data files: ['window_00_rest.dat', 'window_01_rest.dat', 'window_02_rest.dat', 'window_03_rest.dat']
soft-wall restraint parameter files: ['window_00.rest', 'window_01.rest', 'window_02.rest', 'window_03.rest']
loading milestone coordinate data from window_00_rest.dat
loading milestone coordinate data from window_01_rest.dat
loading milestone coordinate data from window_02_rest.dat
loading milestone coordinate data from window_03_rest.dat
   Window  Time       X
0       0     0  15.195
1       0   500  14.744
2       0  1000  14.688
3       0  1500  14.631
4       0  2000  14.230
loading milestone restraint data from window_00.rest
loading milestone restraint data from window_01.rest
loading milestone restraint data from window_02.rest
loading milestone restraint data from window_03.rest
   W1L_r2  W1L_r3  W1L_rk2  W1L_rk3  W2L_r2  W2L_r3  W2L_rk2  W2L_rk3  Window
0   109.0   111.0      0.0      0.0    14.0    16.0    100.0    100.0       0
1   107.0   109.0      0.