# Method 1 (PLUMED masterclass 21-2)

## 1. Setting things up

We first clear any previous outputs so that this notebook can be rerun from scratch.

In [1]:
%%bash  
rm -r histograms/* COLVAR* HILLS* fes* bck* weights.dat *yaml plumed_reweight.dat || true  # ignore the error if the files do not exist
cp ../input_files/* . || true  # to ignore error
ls

COLVAR
HILLS_LAMBDA
Method_1.ipynb
histograms
plumed.dat


rm: histograms/*: No such file or directory
rm: fes*: No such file or directory
rm: bck*: No such file or directory
rm: weights.dat: No such file or directory
rm: *yaml: No such file or directory
rm: plumed_reweight.dat: No such file or directory


Below we import the required packages.

In [2]:
import os
import glob
import numpy as np

## 2. Reweight the data and generate histograms for block averaging

As a reference, the following is the content of the PLUMED input file used to run the alchemical metadynamics simulation.

In [3]:
%%bash 
cat plumed.dat

lambda: EXTRACV NAME=lambda
  
METAD ...
ARG=lambda
SIGMA=0.01     # small SIGMA ensures that the Gaussian approximates a delta function
HEIGHT=1.2388545199729883   # 0.5 kT
PACE=10        # should be nstexpanded
GRID_MIN=0     # index of alchemical states starts from 0
GRID_MAX=5     # we have 6 states in total
GRID_BIN=5     # 5 bins between 6 states
TEMP=298       # same as ref_t
BIASFACTOR=50   
LABEL=metad    
FILE=HILLS_LAMBDA
... METAD

PRINT STRIDE=10 ARG=lambda,metad.bias FILE=COLVAR
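
The `HEIGHT` value above is annotated as 0.5 kT. As a quick cross-check, here is a minimal sketch that recomputes it, assuming the gas constant in kJ/(mol K) given below (the constant actually used to generate the input file is not stated, so the last digits may differ):

```python
# Assumption: CODATA 2018 gas constant, converted to kJ/(mol K)
R_KJ_PER_MOL_K = 0.008314462618
TEMP = 298.0                      # K, same as TEMP in plumed.dat

kT = R_KJ_PER_MOL_K * TEMP        # thermal energy in kJ/mol
height = 0.5 * kT                 # Gaussian height of 0.5 kT, in kJ/mol
print(f'kT = {kT:.4f} kJ/mol, 0.5 kT = {height:.4f} kJ/mol')
```

The result agrees with `HEIGHT=1.2388545199729883` to within rounding of the constant.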


Now we prepare the PLUMED input file for reweighting and generating histograms. As shown below, the block size is set by the `CLEAR` keyword in the `HISTOGRAM` action together with the `STRIDE` keyword in the `DUMPGRID` action (both 1000 frames here).

In [4]:
%%bash 

cat > "plumed_reweight.dat" << EOF
lambda: READ FILE=COLVAR VALUES=lambda IGNORE_TIME IGNORE_FORCES
  
METAD ...

ARG=lambda
SIGMA=0.01
HEIGHT=0     # kJ/mol
PACE=50000000        # large enough that no new Gaussians are deposited
GRID_MIN=0
GRID_MAX=5
GRID_BIN=5
TEMP=298
BIASFACTOR=50
LABEL=metad
FILE=HILLS_LAMBDA  # read in the HILLS file
RESTART=YES
... METAD

PRINT STRIDE=1 ARG=lambda,metad.bias FILE=COLVAR_REWEIGHT

rw: REWEIGHT_BIAS TEMP=298
PRINT ARG=lambda,rw FILE=weights.dat

HISTOGRAM ...
ARG=lambda
LOGWEIGHTS=rw
GRID_MIN=0
GRID_MAX=5
GRID_BIN=5
CLEAR=1000
NORMALIZATION=true
KERNEL=DISCRETE
LABEL=hhh
... HISTOGRAM

DUMPGRID GRID=hhh FILE=histograms/hist.dat STRIDE=1000

EOF

With `plumed_reweight.dat` in place, we run the PLUMED driver, which generates `COLVAR_REWEIGHT`, `weights.dat`, and the histogram files in the pre-existing folder `histograms`.

In [5]:
%%bash
source /Users/Wei-TseHsu/Documents/Software/PLUMED/plumed2/sourceme.sh
export PLUMED_MAXBACKUP=10000

plumed driver --plumed plumed_reweight.dat --noatoms

PLUMED: PLUMED is starting
PLUMED: Version: 2.8.0-dev (git: 63008b018) compiled on Nov 21 2020 at 02:44:56
PLUMED: Please cite these papers when using PLUMED [1][2]
PLUMED: For further information see the PLUMED web page at http://www.plumed.org
PLUMED: Root: /Users/Wei-TseHsu/Documents/Software/PLUMED/plumed2/
PLUMED: For installed feature, see /Users/Wei-TseHsu/Documents/Software/PLUMED/plumed2//src/config/config.txt
PLUMED: Molecular dynamics engine: driver
PLUMED: Precision of reals: 8
PLUMED: Running over 1 node
PLUMED: Number of threads: 1
PLUMED: Cache line size: 512
PLUMED: Number of atoms: 0
PLUMED: File suffix: 
PLUMED: FILE: plumed_reweight.dat
PLUMED: Action READ
PLUMED:   with label lambda
PLUMED:   with stride 1
PLUMED:   reading data from file COLVAR
PLUMED:   reading value lambda and storing as lambda
PLUMED: Action METAD
PLUMED:   with label metad
PLUMED:   with arguments lambda
PLUMED:   added component to this action:  metad.bias 
PLUMED:   Gaussian width  0.010000  

## 3. Perform block averaging on the histograms

At this point, there are 250 histogram files stored in the folder `histograms`. The 5 ns simulation has 2500000 steps (dt = 0.002 ps). Since `COLVAR` was written every 10 steps (the `STRIDE` of `PRINT` in the metadynamics input) and each block spans 1000 `COLVAR` frames, each block covers 10000 simulation steps, which gives 250 blocks.
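
The block count can be checked with a few lines of arithmetic. This is only a sketch: the variable names are ours, and the values are the ones quoted in the text.

```python
n_steps = 2_500_000      # total simulation steps (5 ns at dt = 0.002 ps)
dt_ps = 0.002            # time step in ps
colvar_stride = 10       # PRINT STRIDE used in the metadynamics input
clear = 1000             # CLEAR / DUMPGRID STRIDE, counted in COLVAR frames

steps_per_block = colvar_stride * clear        # simulation steps per block
n_blocks = n_steps // steps_per_block          # number of complete blocks
block_length_ps = steps_per_block * dt_ps      # block length in ps
print(f'{n_blocks} blocks of {steps_per_block} steps ({block_length_ps:.0f} ps each)')
```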

As mentioned in the masterclass, with the weighted histogram of each block, the expectation and variance of the weighted average can be expressed as below:

$$\mathbb{E}(\overline{X}_w) = \overline{X} \qquad \textrm{and} \qquad \textrm{var}(\overline{X}_w) = \frac{\sum_{i=1}^N w_i^2 (X_i - \overline{X}_w)^2 }{(\sum_{i=1}^N w_i)^2}
$$
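
To make the formula concrete, here is a minimal sketch of the weighted average and its variance for a single quantity. The function name and the input values are hypothetical and serve only as an illustration; they are not simulation data.

```python
import numpy as np

def weighted_mean_and_variance(x, w):
    """Return the weighted average of x and the variance of that average."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w_sum = w.sum()
    mean = np.sum(w * x) / w_sum
    var = np.sum(w ** 2 * (x - mean) ** 2) / w_sum ** 2
    return mean, var

# Hypothetical block values and block weights, for illustration only
x_blocks = [0.20, 0.30, 0.25, 0.25]
w_blocks = [1.0, 1.0, 2.0, 2.0]
mean, var = weighted_mean_and_variance(x_blocks, w_blocks)
print(f'mean = {mean:.4f}, uncertainty = {np.sqrt(var):.4f}')
```

In the analysis below, `x` corresponds to the normalized histogram value of one alchemical state in each block, and `w` to the normalisation (sum of weights) of that block.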

To calculate the expectation and the variance (hence the uncertainty, which is the square root of the variance), we defined the following two functions.

In [6]:
def read_histogram(hist_file):
    data = np.loadtxt(hist_file)
    with open(hist_file, "r") as f:
        for line in f:
            if line.startswith('#! SET normalisation'):
                norm = float(line.split()[-1])
    hist = data[:, -1]  # normalized probability: the last column
    
    return norm, hist

In [7]:
def calculate_free_energy(hist_dir, hist_files):
    # Step 1: Calculate the average of the weighted histogram for each gridded CV value
    w_sum, avg = 0, 0
    for f in hist_files:
        norm, hist = read_histogram(f'{hist_dir}/{f}')
        w_sum += norm
        avg += norm * hist
    avg = avg / w_sum
    
    # Step 2: Calculate the uncertainty of each gridded CV value
    error = 0
    for f in hist_files:
        norm, hist = read_histogram(f'{hist_dir}/{f}')
        error += norm * norm * (hist - avg) ** 2
    error = np.sqrt(error / (w_sum **2))
        
    # Step 3: Convert the averaged histogram and its uncertainty to free energy
    fes = -np.log(avg)    # units: kT
    f_err = error / avg   # units: kT (first-order error propagation of -ln(avg))
    
    return fes, f_err

Then, we calculate the free energy difference and the corresponding uncertainty as below. (Block size: 10000 simulation steps, or 20 ps.)

In [8]:
files = glob.glob('histograms/*hist.dat')
fes, f_err = calculate_free_energy('.', files)

In [9]:
n_CVs = 1
CV_points = []
for i in range(n_CVs):  
    CV_points.append(np.transpose(np.loadtxt('histograms/hist.dat'))[i])

with open('fes_blocks.dat', 'w') as output:
    for i in range(len(CV_points[0])):
        CV_str = ''
        for j in range(n_CVs):
            CV_str += f'{CV_points[j][i]: .3f}  '
        # Write one line per grid point, after all CV columns are assembled
        output.write(f'{CV_str}   {fes[i]: .6f}   {f_err[i]: .6f}\n')

In [10]:
%%bash
cat fes_blocks.dat

 0.000      3.728798    0.057156
 1.000      3.831584    0.053884
 2.000      3.770377    0.048995
 3.000      3.159967    0.029283
 4.000      1.206905    0.004479
 5.000      0.528123    0.007227


In [11]:
print(f'The free energy difference is {fes[-1]-fes[0]: .6f} +/- {np.sqrt(f_err[-1] ** 2 + f_err[0] ** 2): .6f} kT.')

The free energy difference is -3.200675 +/-  0.057611 kT.


## 4. Questions

We repeated the 5 ns 1D alchemical metadynamics simulation of the argon atom disappearing from water 20 times. Above is the data analysis of the first repetition. To assess the influence of the block size on the uncertainty, for each repetition we varied the values of `STRIDE` and `CLEAR` and calculated the uncertainty of the free energy difference. Below are the results for all the repetitions.

<img src=https://i.imgur.com/X0Z3mqN.jpg width=600>
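
The scan above was produced by rerunning the driver with different `STRIDE` and `CLEAR` values. As a post-processing alternative (not the procedure we used), consecutive blocks can be merged into larger ones. The sketch below assumes that the `normalisation` field of each histogram file is the sum of weights in that block, so that merged blocks are weight-averaged histograms. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def merge_blocks(norms, hists, k):
    """Merge every k consecutive blocks; trailing incomplete groups are dropped.

    norms: (n_blocks,) sum of weights per block
    hists: (n_blocks, n_bins) normalized histogram per block
    """
    norms = np.asarray(norms, dtype=float)
    hists = np.asarray(hists, dtype=float)
    n_groups = len(norms) // k
    norms = norms[: n_groups * k].reshape(n_groups, k)
    hists = hists[: n_groups * k].reshape(n_groups, k, -1)
    merged_norms = norms.sum(axis=1)
    merged_hists = (norms[:, :, None] * hists).sum(axis=1) / merged_norms[:, None]
    return merged_norms, merged_hists

def block_error(norms, hists):
    """Weighted-average histogram and its uncertainty for each bin."""
    avg = (norms[:, None] * hists).sum(axis=0) / norms.sum()
    var = (norms[:, None] ** 2 * (hists - avg) ** 2).sum(axis=0) / norms.sum() ** 2
    return avg, np.sqrt(var)

# Synthetic data for illustration only: 240 blocks, 6 alchemical states
rng = np.random.default_rng(0)
norms = rng.uniform(0.5, 1.5, size=240)
hists = rng.dirichlet(np.ones(6) * 50, size=240)

for k in (1, 2, 4, 8):
    m_norms, m_hists = merge_blocks(norms, hists, k)
    avg, err = block_error(m_norms, m_hists)
    print(f'k = {k}: {len(m_norms)} blocks, error of state 0 = {err[0]:.5f}')
```

The averaged histogram is unchanged by merging; only the error estimate depends on the block size.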

As the block-size scan shows, 20 ps seems to be the most reasonable block size. Estimates from larger blocks are noisier. Therefore, for each repetition, we used 20 ps as the block size and calculated the free energy difference and its uncertainty. In the figure below, we use Gaussians to show the overlap between different repetitions, where the center and the width of each Gaussian are the free energy difference and its uncertainty, respectively.

<img src=https://i.imgur.com/56UI6s5.png width=600>

The mean and the standard deviation of the Gaussian centers shown above are -3.120 kT and 0.415 kT, respectively. Since the uncertainty estimated from each repetition was only about 0.05 kT, the estimates from different repetitions are statistically inconsistent with each other: the spread between repetitions is roughly eight times larger than the per-repetition uncertainty. As a reference, the free energy difference estimated from a 5 ns expanded ensemble simulation, which should perform roughly the same as the 1D alchemical metadynamics, was -3.137 +/- 0.135 kT. In our experience, neither truncating the equilibration regime nor using `UPDATE_UNTIL` in `METAD` to ensure a rigorously static bias at the end of the simulation improved the situation. We are therefore wondering if you have suggestions for resolving this statistical inconsistency. Thank you so much for your input!
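
One common way to quantify this kind of inconsistency is to compare the spread between repetitions with the per-repetition uncertainty, for example through a reduced chi-square. The sketch below is only an illustration of that diagnostic, not part of the analysis above. The function name is hypothetical, and the three-point input is a toy example rather than our data.

```python
import numpy as np

def consistency_summary(estimates, errors):
    """Compare the spread of independent estimates with their quoted errors."""
    estimates = np.asarray(estimates, dtype=float)
    errors = np.asarray(errors, dtype=float)
    spread = estimates.std(ddof=1)                  # standard deviation between repetitions
    typical_err = np.sqrt(np.mean(errors ** 2))     # RMS of the quoted uncertainties
    w = 1.0 / errors ** 2
    w_mean = np.sum(w * estimates) / np.sum(w)      # inverse-variance weighted mean
    chi2_red = np.sum(((estimates - w_mean) / errors) ** 2) / (len(estimates) - 1)
    return spread, typical_err, chi2_red

# Toy input for illustration only (not simulation data)
spread, typical_err, chi2_red = consistency_summary([-1.0, 0.0, 1.0], [1.0, 1.0, 1.0])
print(f'spread = {spread:.3f}, typical error = {typical_err:.3f}, reduced chi2 = {chi2_red:.3f}')

# Ratio implied by the numbers quoted above (0.415 kT spread, about 0.05 kT per repetition)
print(f'spread / uncertainty = {0.415 / 0.05:.1f}')
```

A reduced chi-square close to 1 indicates that the quoted uncertainties explain the spread between repetitions; values much larger than 1 indicate underestimated uncertainties.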