Ensemble
========

<hr style="height:2.5px">

This tutorial shows how to use the `biceps.Ensemble` class to construct an ensemble and apply the data restraints prepared in the previous tutorial ([Preparation](https://biceps.readthedocs.io/en/latest/examples/Tutorials/Prep_Rest_Post_Ana/preparation.html)). **Please note that computing relative free energies requires sampling with at least two lambda values.**

<hr style="height:2.5px">

In [1]:
import numpy as np
import pandas as pd
import biceps

BICePs - Bayesian Inference of Conformational Populations, Version 2.0


In [2]:
print(f"Possible input data extensions: {biceps.toolbox.list_possible_extensions()}")

Possible input data extensions: ['H', 'Ca', 'N', 'J', 'noe', 'pf']


In [3]:
####### Data and Output Directories #######
energies = np.loadtxt('cineromycin_B/cineromycinB_QMenergies.dat')*627.509  # convert from hartrees to kcal/mol
energies = energies/0.5959   # divide by kT (0.5959 kcal/mol, roughly 300 K) to get reduced free energies F = f/kT
energies -= energies.min()  # set ground state to zero, just in case

# Point to directory that contains input files 
#input_data = biceps.toolbox.sort_data('cineromycin_B/J_NOE')
input_data = biceps.toolbox.sort_data("J_NOE")
print(f"Input data: {biceps.toolbox.list_extensions(input_data)}")

# Make a new directory if we have to
outdir = 'results'
biceps.toolbox.mkdir(outdir)

Input data: ['J', 'noe']
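
The unit handling in the cell above can be checked on a small example. The following is a minimal sketch that reuses the same two constants (627.509 and 0.5959) on made-up energies; the helper name `to_reduced_energies` and the example values are assumptions for illustration and are not part of BICePs.

```python
import numpy as np

HARTREE_TO_KCAL_PER_MOL = 627.509  # same conversion factor as in the cell above
KT_KCAL_PER_MOL = 0.5959           # same kT value as in the cell above


def to_reduced_energies(energies_hartree):
    """Convert hartrees to reduced energies (units of kT), with the minimum set to zero.

    Hypothetical helper for illustration only; not a BICePs function.
    """
    energies = np.asarray(energies_hartree, dtype=float)
    reduced = energies * HARTREE_TO_KCAL_PER_MOL / KT_KCAL_PER_MOL
    return reduced - reduced.min()


# Made-up energies for three hypothetical states (not the cineromycin B data)
example_hartree = np.array([-1.000, -0.999, -1.002])
reduced = to_reduced_energies(example_hartree)
print(reduced)
```

After the shift, the lowest-energy state has a reduced energy of exactly zero and every other state is expressed in units of kT above it.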


Another key parameter in BICePs setup is the type of reference potential used for each experimental observable. More information on reference potentials can be found [here](https://biceps.readthedocs.io/en/latest/theory.html).

Three reference potentials are supported in BICePs: uniform ('uniform'), exponential ('exp'), Gaussian ('gau').  

As we found in previous research, the exponential reference potential is useful in most cases. Some higher-level tasks may require more care in selecting the reference potential (e.g., [force field parametrization](https://pubs.acs.org/doi/10.1021/acs.jpcb.7b11871)).

**(Note: It is helpful to print the order of the experimental observables included in BICePs sampling, as shown above.)**

The order of the parameters below must follow the order returned by `biceps.toolbox.list_extensions(input_data)`. Therefore, our parameters will be a list of dictionaries in the order `['J', 'noe']`. Recall that in the last section we saved the J coupling files as `*.pkl` files and the NOE distances as `*.csv` files. **If a format other than the default (`*.pkl` files) is used, it must be specified with the `fmt` key inside the corresponding dictionary.**

In [4]:
n_lambdas = 2
lambda_values = np.linspace(0.0, 1.0, n_lambdas)
parameters = [
        dict(ref='uniform', sigma=(0.05, 20.0, 1.02), fmt="pickle"),
        dict(ref='exp', sigma=(0.05, 5.0, 1.02), gamma=(0.2, 5.0, 1.01), fmt="csv")
        ]
pd.DataFrame(parameters)

       ref               sigma     fmt             gamma
0  uniform  (0.05, 20.0, 1.02)  pickle               NaN
1      exp   (0.05, 5.0, 1.02)     csv  (0.2, 5.0, 1.01)
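
Because the dictionaries carry no observable name, the pairing between observables and parameters is purely positional. A minimal sketch of a sanity check follows; it hard-codes the extension list printed earlier (`['J', 'noe']`) instead of calling BICePs, and the `pairing` variable is introduced here only for illustration.

```python
# Extension order copied from the "Input data" output earlier in this tutorial
extensions = ['J', 'noe']

# Same parameter dictionaries as in the cell above
parameters = [
    dict(ref='uniform', sigma=(0.05, 20.0, 1.02), fmt="pickle"),
    dict(ref='exp', sigma=(0.05, 5.0, 1.02), gamma=(0.2, 5.0, 1.01), fmt="csv"),
]

# One dictionary is needed per observable type
assert len(extensions) == len(parameters), "need one parameter dictionary per observable"

# Pair each observable with its dictionary by position and print the result
pairing = dict(zip(extensions, parameters))
for ext, params in pairing.items():
    print(f"{ext:>4}: {params}")
```

Printing the pairing before building the ensemble makes it easy to spot a dictionary that ended up next to the wrong observable.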


Let's print out the allowed $\sigma_{J}$ space when `sigma=(0.05, 20.0, 1.02)`.

In [5]:
allowed_sigma = np.exp(np.arange(np.log(0.05), np.log(20.0), np.log(1.02)))
print(allowed_sigma)

[ 0.05        0.051       0.05202     0.0530604   0.05412161  0.05520404
  0.05630812  0.05743428  0.05858297  0.05975463  0.06094972  0.06216872
  0.06341209  0.06468033  0.06597394  0.06729342  0.06863929  0.07001207
  0.07141231  0.07284056  0.07429737  0.07578332  0.07729898  0.07884496
  0.08042186  0.0820303   0.08367091  0.08534432  0.08705121  0.08879223
  0.09056808  0.09237944  0.09422703  0.09611157  0.0980338   0.09999448
  0.10199437  0.10403425  0.10611494  0.10823724  0.11040198  0.11261002
  0.11486222  0.11715947  0.11950266  0.12189271  0.12433056  0.12681718
  0.12935352  0.13194059  0.1345794   0.13727099  0.14001641  0.14281674
  0.14567307  0.14858653  0.15155826  0.15458943  0.15768122  0.16083484
  0.16405154  0.16733257  0.17067922  0.17409281  0.17757466  0.18112616
  0.18474868  0.18844365  0.19221253  0.19605678  0.19997791  0.20397747
  0.20805702  0.21221816  0.21646252  0.22079177  0.22520761  0.22971176
  0.234306    0.23899212  0.24377196  0.2486474   0
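
The `(minimum, maximum, spacing)` tuple therefore describes a grid in which each value is larger than the previous one by a constant factor. The sketch below wraps the same NumPy expression in a helper to report the grid size for both tuples used in this tutorial. The helper name `log_spaced_grid` is hypothetical, and treating the `gamma` tuple the same way as the `sigma` tuple is an assumption; how BICePs builds its grids internally is not shown here.

```python
import numpy as np


def log_spaced_grid(minimum, maximum, ratio):
    """Return values starting at `minimum`, each `ratio` times the previous, all below `maximum`.

    Hypothetical helper: this is the expression from the cell above, not a BICePs function.
    """
    return np.exp(np.arange(np.log(minimum), np.log(maximum), np.log(ratio)))


sigma_J = log_spaced_grid(0.05, 20.0, 1.02)   # sigma tuple used for the J couplings
gamma_noe = log_spaced_grid(0.2, 5.0, 1.01)   # gamma tuple used for the NOEs (assumed analogous)

# Ratio between neighbouring grid points; constant by construction
ratios = sigma_J[1:] / sigma_J[:-1]

print(f"sigma_J grid: {len(sigma_J)} points from {sigma_J[0]:.4f} to {sigma_J[-1]:.4f}")
print(f"gamma grid:   {len(gamma_noe)} points from {gamma_noe[0]:.4f} to {gamma_noe[-1]:.4f}")
```

A spacing closer to 1 gives a finer grid and therefore more allowed values to sample.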

<h1 style="text-align: left;font-size: 18pt;">Quick note on lambda values:</h1>

We need to specify which lambda value(s) to use in BICePs sampling. Briefly, lambda values are similar to the parameters used in free energy perturbation (FEP) and affect the BICePs score. A lambda value represents how much prior information from computational modeling is included in BICePs sampling (1.0 means all, 0.0 means none). As we explained in [this work](https://pubs.acs.org/doi/10.1021/acs.jpcb.7b11871), the BICePs score can be considered the relative free energy change between different models. Using more lambda values provides more samples for the [multistate Bennett acceptance ratio (MBAR)](http://www.alchemistry.org/wiki/Multistate_Bennett_Acceptance_Ratio) estimates of free energy changes and populations. However, more lambda values also slow down the whole BICePs workflow (because more sampling runs are needed), so it is important to balance accuracy and efficiency. To successfully complete a BICePs calculation, lambda values of 0.0 and 1.0 are necessary. Based on our experience, we suggest three lambda values: 0.0, 0.5, and 1.0.
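
To make "how much prior information is included" concrete, the sketch below assumes that lambda simply multiplies the reduced prior energies, so that the prior populations are proportional to exp(-lambda * F). That scaling is an assumption used for illustration, and the energies and the helper `prior_populations` are made up; none of this is BICePs code.

```python
import numpy as np


def prior_populations(reduced_energies, lam):
    """Populations proportional to exp(-lam * F) for reduced energies F.

    Hypothetical helper illustrating an assumed lambda scaling of the prior.
    """
    weights = np.exp(-lam * np.asarray(reduced_energies, dtype=float))
    return weights / weights.sum()


# Made-up reduced energies (units of kT) for three hypothetical states
example_energies = np.array([0.0, 1.0, 2.0])

# The three suggested lambda values: 0.0, 0.5, 1.0
suggested_lambdas = np.linspace(0.0, 1.0, 3)

for lam in suggested_lambdas:
    populations = prior_populations(example_energies, lam)
    print(f"lambda = {lam:.1f}: {np.round(populations, 3)}")
```

Under this assumption, lambda = 0.0 gives every state the same prior weight (no prior information), while lambda = 1.0 uses the full prior energies.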

In [6]:
for lam in lambda_values:
    print(f"lambda: {lam}")
    ensemble = biceps.Ensemble(lam, energies)
    ensemble.initialize_restraints(input_data, parameters)
    # Save each ensemble as a pickle file
    print(f"Saving ensemble_{lam}.pkl ...")
    biceps.toolbox.save_object(ensemble, outdir+"/ensemble_%s.pkl"%lam)

lambda: 0.0
Saving ensemble_0.0.pkl ...
lambda: 1.0
Saving ensemble_1.0.pkl ...


<h1 style="text-align: left;font-size: 18pt;"> Let's take a look at the ensemble (lam=1.0)...</h1>

The ensemble is a list with one entry per state, and each entry is a list of 2 restraint objects (one for J couplings and one for NOE distances). Here we show the first 10 states.

In [7]:
print(ensemble.to_list()[:10])

[[<biceps.Restraint.Restraint_J object at 0x7fa139011438>, <biceps.Restraint.Restraint_noe object at 0x7fa13ce13e48>], [<biceps.Restraint.Restraint_J object at 0x7fa139478470>, <biceps.Restraint.Restraint_noe object at 0x7fa13d086470>], [<biceps.Restraint.Restraint_J object at 0x7fa139478240>, <biceps.Restraint.Restraint_noe object at 0x7fa1394752e8>], [<biceps.Restraint.Restraint_J object at 0x7fa13ce235f8>, <biceps.Restraint.Restraint_noe object at 0x7fa13ce5c5c0>], [<biceps.Restraint.Restraint_J object at 0x7fa13ce23cc0>, <biceps.Restraint.Restraint_noe object at 0x7fa13ce235c0>], [<biceps.Restraint.Restraint_J object at 0x7fa13ce796d8>, <biceps.Restraint.Restraint_noe object at 0x7fa13ce5ca90>], [<biceps.Restraint.Restraint_J object at 0x7fa13ce5c908>, <biceps.Restraint.Restraint_noe object at 0x7fa13ce79da0>], [<biceps.Restraint.Restraint_J object at 0x7fa13ce79710>, <biceps.Restraint.Restraint_noe object at 0x7fa13ce5cd68>], [<biceps.Restraint.Restraint_J object at 0x7fa13ce5ca58
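
The nested layout printed above (one inner list per state, one restraint object per observable type) can be summarized with a short loop. The sketch below uses two empty stand-in classes and an arbitrary number of states so that it runs without BICePs; the class names and `n_states` are hypothetical. The same loop should apply to the list returned by `ensemble.to_list()`, assuming it has the structure shown in the output.

```python
from collections import Counter


class StandInRestraintJ:
    """Hypothetical placeholder for a J-coupling restraint object."""


class StandInRestraintNOE:
    """Hypothetical placeholder for an NOE restraint object."""


n_states = 5  # arbitrary number of states for this illustration

# Mimic the nested structure: one [J, NOE] pair per state
ensemble_list = [[StandInRestraintJ(), StandInRestraintNOE()] for _ in range(n_states)]

# Count how many restraint objects of each type appear across all states
type_counts = Counter(type(restraint).__name__
                      for state in ensemble_list
                      for restraint in state)

print(f"Number of states: {len(ensemble_list)}")
for name, count in type_counts.items():
    print(f"{name}: {count}")
```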

<h1 style="text-align: center;font-size: 18pt;">Conclusion</h1>

In this tutorial, we explained how to construct an [ensemble](https://biceps.readthedocs.io/en/latest/biceps.html#ensemble) (one for each lambda value) of [restraints](https://biceps.readthedocs.io/en/latest/biceps.html#restraint) for each state, which we saved as a pickle file. In the next tutorial, [PosteriorSampler](https://biceps.readthedocs.io/en/latest/examples/Tutorials/Prep_Rest_Post_Ana/posteriorsampler.html), we will sample the posterior distribution using the `biceps.PosteriorSampler` class.

<h6 style="align: justify;font-size: 12pt"># <span style="color:red;">NOTE</span>: The following cell is for pretty notebook rendering</h6>

In [8]:
from IPython.display import HTML
def css_styling():
    # Read the notebook theme and return it as HTML so Jupyter applies it
    with open("../../../theme.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()