# Questions about free energy calculations 

In this notebook, we describe our questions about free energy calculations using the block bootstrap method. In a nutshell, we are mainly wondering about the best strategies for deciding how much of a simulation we should truncate and over what fraction of the simulation we should average the biasing potential for reweighting. Below, we first demonstrate our workflow for free energy calculations and then show the results for two different systems we plan to present in our paper. The relevant files are stored in the folders `System_2` and `System_3`.

## Our protocol for free energy calculations 

To provide more details about our protocol for free energy calculations, here we take the solvation free energy calculation (200 ns) from the 2D alchemical metadynamics simulation of System 2 as an example. System 2 is composed of 4 interaction sites, as shown below. Its two metastable states can be distinguished by the only dihedral angle in the system, so we biased this dihedral in the 2D simulation.

<center>
<img src=https://i.imgur.com/PmJsmoj.png width=350>
</center>

In the 2D alchemical metadynamics simulation, the following PLUMED input file was used:

In [2]:
%%bash 
cat System_2/plumed.dat

theta: TORSION ATOMS=1,2,3,4
lambda: EXTRACV NAME=lambda

METAD ...
ARG=theta,lambda 
SIGMA=0.5,0.0001     # small SIGMA ensure that the Gaussian approaximate a delta function
HEIGHT=4.955418079891953    # kJ/mol
PACE=500        
GRID_MIN=-pi,0   # index of alchemical states starts from 0
GRID_MAX=pi,19   # we have 20 states in total
GRID_BIN=100,19
TEMP=298
BIASFACTOR=120
LABEL=metad    
FILE=HILLS_2D
... METAD

PRINT STRIDE=10 ARG=theta,lambda,metad.bias FILE=COLVAR


To perform free energy calculations for such a simulation, we can simply run the following command in the folder `System_2`:
```
python calculate_free_energy.py -d ./ -n 200 500 1000 2000 -hh HILLS_2D -t 0.3 -a 0.2
```

The method used in `calculate_free_energy.py` is the same as the protocol suggested by Dr. Bussi in [Check.ipynb](https://github.com/wehs7661/lambda_MetaD_questions/blob/master/archived_questions/210314_questions/Method_1/Check.ipynb). `calculate_free_energy.py` is longer only because it also monitors memory usage, prints organized results, makes the calculation more memory efficient, fixes corrupted PLUMED output files, etc. With the arguments passed in the command shown above, `calculate_free_energy.py` carries out the following steps:
- **Step 1**: Modify the `HILLS_2D` file and save as `HILLS_2D_modified`
  - With `-a 0.2`, we average the biasing potential over the last 20% of the simulation using the function `average_bias` (which is basically the same as the function `time_average` in `Check.ipynb`). The function updates the second-to-last column (`height`) of the dataframe read from `HILLS_2D` (the HILLS file of the simulation). We then write the updated dataframe to `HILLS_2D_modified`.
- **Step 2**: Calculate the unbiasing weight based on the time-averaged biasing potential
  - The code first runs `plumed driver` with the input file `plumed_sum_bias.dat` to sum up the Gaussians in `HILLS_2D_modified`. Specifically, `plumed driver` reads `COLVAR` and `HILLS_2D_modified`, evaluates the biasing potential averaged over the last 20% of the simulation (by summing up the Gaussians with the rescaled heights in `HILLS_2D_modified`), and writes `COLVAR_SUM_BIAS`. In `COLVAR_SUM_BIAS`, the last column (`metad.bias`) is the time-averaged bias. Its maximum is later subtracted from it in the function `block_bootstrap`, and the shifted bias is then used as the unbiasing weight for reweighting.
  - For System 2, below is the content of `plumed_sum_bias.dat`:

In [3]:
%%bash
cat System_2/plumed_sum_bias.dat

theta: READ FILE=COLVAR VALUES=theta IGNORE_TIME IGNORE_FORCES
lambda: READ FILE=COLVAR VALUES=lambda IGNORE_TIME IGNORE_FORCES

METAD ...
ARG=theta,lambda 
SIGMA=0.5,0.0001     # small SIGMA ensure that the Gaussian approaximate a delta function
HEIGHT=0     
PACE=50000000    
GRID_MIN=-pi,0   # index of alchemical states starts from 0
GRID_MAX=pi,19   # we have 20 states in total
GRID_BIN=100,19
TEMP=298
BIASFACTOR=60
LABEL=metad    
FILE=HILLS_2D_modified
RESTART=YES
... METAD

PRINT STRIDE=1 ARG=theta,lambda,metad.bias FILE=COLVAR_SUM_BIAS
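
The idea behind Steps 1 and 2 can be checked numerically. Averaging the accumulated bias over the snapshots in the last fraction of the simulation gives the same result as a single sum over all hills with rescaled heights: hills deposited before the averaging window keep a weight of 1, and later hills get a weight that decreases linearly with their deposition index. The sketch below illustrates this on synthetic 1D Gaussians. Note that `averaging_weights` and `demo` are hypothetical helpers written for this illustration, not the actual `average_bias` function.

```python
import numpy as np


def averaging_weights(n_hills, avg_frac):
    """Per-hill scaling factors that turn a plain sum over hills into the
    bias averaged over the last `avg_frac` of the deposition steps.
    (Hypothetical helper for illustration only.)"""
    n_avg = max(1, int(round(avg_frac * n_hills)))  # snapshots in the average
    n0 = n_hills - n_avg                            # first averaged snapshot
    weights = np.ones(n_hills)
    # Hill i >= n0 is present only in snapshots i, ..., n_hills - 1,
    # i.e. in (n_hills - i) of the n_avg averaged snapshots.
    weights[n0:] = (n_hills - np.arange(n0, n_hills)) / n_avg
    return weights


def demo(n_hills=50, avg_frac=0.2, seed=0):
    """Compare the brute-force time average with the rescaled-height sum
    on synthetic hills (all values here are made up)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-np.pi, np.pi, 40)
    centers = rng.uniform(-np.pi, np.pi, n_hills)
    heights = rng.uniform(0.1, 1.0, n_hills)
    gaussians = np.exp(-0.5 * ((grid[None, :] - centers[:, None]) / 0.5) ** 2)

    # Brute force: bias after each deposition, averaged over the last snapshots.
    snapshots = np.cumsum(heights[:, None] * gaussians, axis=0)
    n_avg = max(1, int(round(avg_frac * n_hills)))
    brute_force = snapshots[-n_avg:].mean(axis=0)

    # Shortcut: one sum over all hills with rescaled heights.
    rescaled = (averaging_weights(n_hills, avg_frac) * heights) @ gaussians
    return brute_force, rescaled


brute_force, rescaled = demo()
print(np.allclose(brute_force, rescaled))
```

This is why a single pass of `plumed driver` over a HILLS file with modified heights is enough to obtain the time-averaged bias, without storing the bias at every deposition step.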


- **Step 3**: Perform block bootstrap to calculate the free energy difference and the corresponding uncertainty
  - Lastly, the code performs block bootstrap by running the function `block_bootstrap`, which is basically the same as the function `analyze` in `Check.ipynb`.
  - In this case (`-t 0.3`), the function first discards the first 30% of the CV time series and then performs bootstrapping.
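Step 3 can be sketched as follows. This is a minimal illustration, not the actual `block_bootstrap` function: the names `delta_f` and `block_bootstrap_sketch`, the parameters `n_blocks` and `n_boot`, and the synthetic data are all assumptions made for this example. It assumes that the time-averaged bias has already been divided by kT and that each sample is weighted by the exponential of the bias shifted by its maximum.

```python
import numpy as np


def delta_f(state, weights, state_a, state_b):
    """Free energy difference f_b - f_a (in kT) from weighted populations."""
    p_a = weights[state == state_a].sum()
    p_b = weights[state == state_b].sum()
    return -np.log(p_b / p_a)


def block_bootstrap_sketch(state, bias_kt, state_a, state_b,
                           n_blocks=50, n_boot=200, truncation=0.0, seed=0):
    """Reweight with a static bias and estimate the uncertainty by
    resampling contiguous blocks with replacement (illustrative only)."""
    state = np.asarray(state)
    bias_kt = np.asarray(bias_kt, dtype=float)

    # Discard the first `truncation` fraction of the time series.
    start = int(truncation * len(state))
    state, bias_kt = state[start:], bias_kt[start:]

    # Unbiasing weights; subtracting the maximum avoids overflow.
    weights = np.exp(bias_kt - bias_kt.max())

    # Trim the tail so the series splits evenly into blocks.
    block_len = len(state) // n_blocks
    n_used = block_len * n_blocks
    state_blocks = state[:n_used].reshape(n_blocks, block_len)
    weight_blocks = weights[:n_used].reshape(n_blocks, block_len)

    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        picked = rng.integers(0, n_blocks, size=n_blocks)
        # A resample with no visits to one of the states would give inf here;
        # that case is not handled in this sketch.
        estimates[i] = delta_f(state_blocks[picked].ravel(),
                               weight_blocks[picked].ravel(),
                               state_a, state_b)

    best = delta_f(state[:n_used], weights[:n_used], state_a, state_b)
    return best, estimates.std(ddof=1)


# Synthetic example: the biased run visits states 0 and 19 equally often,
# and the bias is ln(3) kT higher in state 19.
rng = np.random.default_rng(1)
state = rng.choice([0, 19], size=20000)
bias_kt = np.where(state == 19, np.log(3.0), 0.0)
print(block_bootstrap_sketch(state, bias_kt, 0, 19, truncation=0.3))
```

In this synthetic case, the reweighted populations of the two states differ by a factor of 3, so the estimate should come out close to -ln 3 (about -1.10 kT). With real data, the block length has to be longer than the correlation time of the time series for the bootstrap uncertainty to be meaningful.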

## Free energy calculations of System 2

As a result, we have the following free energy estimates given different average fractions and truncation fractions.

| Attempt | Avg fraction | Truncation | Free energy difference |
|---------|--------------|------------|------------------------|
|    1    |      0.2     |      0     |  131.656 +/- 0.099 kT  |
|    2    |      0.3     |      0     |  132.693 +/- 0.101 kT  |
|    3    |      0.4     |      0     |  132.490 +/- 0.105 kT  |
|    4    |      0.5     |      0     |  132.302 +/- 0.103 kT  |
|    5    |      0.2     |     0.1    |  131.657 +/- 0.111 kT  |
|    6    |      0.3     |     0.1    |  132.694 +/- 0.110 kT  |

In this case, I chose the truncation fraction to be either 0 or 0.1 because the Gaussian height became stationary very quickly, as shown below. (I assume that the truncated fraction should simply correspond to the region where the Gaussian biasing potential has not yet become stationary.)

<center>
<img src=https://i.imgur.com/dlnZPsW.png width=350>
</center>
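
One possible way to turn this visual check into a number is sketched below. It reads a PLUMED output file using its `#! FIELDS` header and reports the fraction of the run after which the block-averaged Gaussian height stays within a tolerance of its final value. This is only a heuristic under stated assumptions: the helper names, the `window` and `rel_tol` parameters, and the field names in the toy file are hypothetical, and corrupted lines are not handled.

```python
import io

import numpy as np


def read_plumed_table(source):
    """Read a PLUMED output file into {field: array} using the '#! FIELDS'
    header. `source` is a path or an open text file."""
    fields, rows = None, []
    handle = open(source) if isinstance(source, str) else source
    with handle:
        for line in handle:
            if line.startswith("#! FIELDS"):
                fields = line.split()[2:]
            elif line.strip() and not line.startswith("#"):
                rows.append([float(x) for x in line.split()])
    data = np.array(rows)
    return {name: data[:, i] for i, name in enumerate(fields)}


def stationary_fraction(height, window=100, rel_tol=0.1):
    """Smallest fraction of the run after which the block-averaged height
    stays within `rel_tol` of its final block average (heuristic)."""
    height = np.asarray(height, dtype=float)
    n_win = len(height) // window
    block_mean = height[:n_win * window].reshape(n_win, window).mean(axis=1)
    final = block_mean[-1]
    outside = np.abs(block_mean - final) > rel_tol * abs(final)
    if not outside.any():
        return 0.0
    last_bad = np.nonzero(outside)[0][-1]
    return (last_bad + 1) * window / len(height)


# Toy HILLS-like file (made-up values and assumed field names).
toy = io.StringIO(
    "#! FIELDS time theta lambda sigma_theta sigma_lambda height biasf\n"
    "#! SET min_theta -pi\n"
    "1.0 0.1 0 0.5 0.0001 4.9 120\n"
    "2.0 0.2 1 0.5 0.0001 4.8 120\n"
)
print(read_plumed_table(toy)["height"])

# Synthetic height that decays toward a plateau.
t = np.arange(1000)
print(stationary_fraction(1.0 + 4.0 * np.exp(-t / 50.0), window=50))
```

The result depends on the chosen `window` and `rel_tol`, so it gives a reproducible starting point rather than a rigorous criterion.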

As can be seen from the table, different average fractions and truncation fractions led to somewhat different results. The estimates span about 1 kT (from 131.656 kT to 132.694 kT), which is larger than the individual uncertainties of about 0.1 kT, and the spread comes mainly from the average fraction: changing the truncation fraction from 0 to 0.1 shifts the estimate by only 0.001 kT. **With this, we are wondering if there is a more rigorous strategy to determine what average or truncation fractions to use, i.e., which single value makes the most sense to report in the paper?**
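One rough way to quantify how well two estimates agree is to divide their difference by their combined uncertainty. The sketch below does this for every pair of attempts, using the values from the table above. It treats the uncertainties as independent, which is an assumption: all attempts reuse the same trajectory, so the numbers are only a guide. The helper `z_score` is a hypothetical name introduced for this example.

```python
import itertools
import math

# (estimate, uncertainty) in kT, copied from the table above, keyed by attempt.
estimates = {
    1: (131.656, 0.099),
    2: (132.693, 0.101),
    3: (132.490, 0.105),
    4: (132.302, 0.103),
    5: (131.657, 0.111),
    6: (132.694, 0.110),
}


def z_score(a, b):
    """Difference between two estimates in units of their combined
    uncertainty, assuming independent errors."""
    return abs(a[0] - b[0]) / math.hypot(a[1], b[1])


for i, j in itertools.combinations(sorted(estimates), 2):
    print(f"attempts {i} and {j}: z = {z_score(estimates[i], estimates[j]):.2f}")
```

Pairs with a z-score well above 2 or 3 differ by more than their uncertainties can explain, which would suggest that the choice of parameters contributes a systematic difference that the bootstrap uncertainty does not capture.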

## Free energy calculations of System 3

In System 3 (a host-guest binding complex), we used 2D alchemical metadynamics (200 ns) to calculate the free energy difference of the alchemical process of decoupling the interactions between the guest molecule and the host molecule, which is one part of the thermodynamic cycle for calculating the binding free energy of the complex. In the 2D simulation, the configurational CV was the number of water molecules in the binding cavity (denoted as $N$). In the bound and unbound states, there are typically 0.7-5.3 and 4.5-10.5 water molecules in the binding cavity, respectively. Therefore, we placed a lower wall at $N=0.7$ and an upper wall at $N=10.5$. Since sampling in the regions $N<0.7$ or $N>10.5$ is likely to be unphysical, we discarded the samples in these regions. (Specifically, we ignore the unphysical samples when calculating the unbiasing weights for reweighting using `average_bias`, and we also discard them from `COLVAR_SUM_BIAS` when constructing the histograms.)
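As a minimal sketch of what discarding the unphysical samples means in practice, the snippet below builds a boolean mask from the water-number CV and applies it to every column, so that the rows stay aligned. The helper names and the toy values are hypothetical. The column name `n` is taken from the label of the `COORDINATION` action in the PLUMED input, and the bounds are the wall positions.

```python
import numpy as np


def physical_mask(n_water, lower=0.7, upper=10.5):
    """True for samples inside the physical range lower <= N <= upper."""
    n_water = np.asarray(n_water, dtype=float)
    return (n_water >= lower) & (n_water <= upper)


def filter_samples(columns, cv_name="n", lower=0.7, upper=10.5):
    """Apply the same mask to every column so that rows stay aligned."""
    mask = physical_mask(columns[cv_name], lower, upper)
    return {name: np.asarray(values)[mask] for name, values in columns.items()}


# Toy COLVAR-like data (made-up values).
colvar = {"n": [0.5, 0.7, 3.2, 10.5, 11.0], "lambda": [0, 1, 2, 39, 39]}
print(filter_samples(colvar))
```

Here the first and last rows fall outside the walls and are dropped, while samples exactly at $N=0.7$ and $N=10.5$ are kept.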

Below is the content of the PLUMED input file for the simulation. 

In [4]:
%%bash
cat System_3/plumed.dat

center: CENTER ATOMS=1-144            # geometric center of the host molecule
water_group: GROUP ATOMS=207-6656:3   # oxygen atom of the water molecules
n: COORDINATION GROUPA=center GROUPB=water_group R_0=0.35
lambda: EXTRACV NAME=lambda

METAD ...
ARG=lambda,n
SIGMA=0.01,0.05    # small SIGMA ensure that the Gaussian approaximate a delta function
HEIGHT=12.394781044629076
PACE=10
GRID_MIN=0,0     # index of alchemical states starts from 0
GRID_MAX=39,20     # we have 40 states in total
GRID_BIN=39,100
TEMP=298
BIASFACTOR=150
LABEL=metad
FILE=HILLS_2D
... METAD

UPPER_WALLS ...
 ARG=n
 AT=10.5
 KAPPA=200.0
 EXP=2
 EPS=1
 OFFSET=0
 LABEL=uwall
... UPPER_WALLS

LOWER_WALLS ...
 ARG=n
 AT=0.7
 KAPPA=200.0
 EXP=2
 EPS=1
 OFFSET=0
 LABEL=lwall
... LOWER_WALLS

PRINT STRIDE=10 ARG=* FILE=COLVAR


To get a better sense of how much of the simulation we should truncate and over what fraction we should average the biasing potential, we first examine the Gaussian bias as a function of time. Notably, in the left figure, the spikes come from the rarely visited, unphysical region. After discarding the data with $N<0.7$ or $N>10.5$, the decrease of the Gaussian bias is smooth, as shown in the right figure.
<center>
<img src=https://i.imgur.com/j7XVSnu.png width=800>
</center>

The Gaussian bias became more stationary at around 60-100 ns, so in addition to a truncation fraction of 0, we also tried 0.3 and 0.5. For the average fraction, we tried values such as 0.2, 0.3, and 0.4. Below we tabulate the estimates of the free energy difference.