## Problem 1: Inconsistent free energy estimates from 2D alchemical metadynamics started from different torsional states

## 1. Introduction

As described in the README of the repository, the system of interest is a molecule composed of 4 vdW sites with zero net charge. The force constant of the only dihedral in this molecule was increased to 20 kJ/mol to raise the free energy barrier in configurational space. I designed this system to show that 2D alchemical metadynamics can estimate the free energy difference between the coupled and uncoupled states given sufficient sampling in configurational space. My workflow for demonstrating this is as follows.

- I ran a metadynamics simulation biasing only the torsional angle to extract configurations at two different torsional states, which I call state A (dihedral around 180 degrees) and state B (dihedral around 0 degrees).
- I launched a 100 ns expanded ensemble (EXE) simulation from each of these states to show that alchemical sampling starting from different torsional states leads to different free energy estimates because of insufficient sampling in configurational space.
  - EXE starting from state A:
    - The free energy barrier prevented the system from sampling state B.
    - The free energy difference between $\lambda=1$ and $\lambda=0$, estimated by MBAR, was **-2.544 $\pm$ 0.040 kT**.
  - EXE starting from state B:
    - The free energy barrier prevented the system from sampling state A.
    - The free energy difference between $\lambda=1$ and $\lambda=0$, estimated by MBAR, was **-4.949 $\pm$ 0.047 kT**.
- Then, I performed a 100 ns 2D alchemical metadynamics simulation from each of the two torsional states, with the only dihedral in the molecule as the configurational CV. Ideally, 2D alchemical metadynamics starting from different torsional states should give consistent estimates of the free energy difference.

In every simulation above in which the alchemical space was biased, the same 8 alchemical states were defined to decouple only the van der Waals interactions.
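
For reference, the value that a fully converged calculation should return can be related to the two single-basin EXE estimates above. As a minimal sketch, assume that the torsional space splits cleanly into basins A and B, that the free energy differences are in kT, and that $\Delta F = F(\lambda=1) - F(\lambda=0)$. The partition functions of the two basins then add, so $\Delta F = -\ln\left(w_A e^{-\Delta F_A} + w_B e^{-\Delta F_B}\right)$, where $w_A$ and $w_B = 1 - w_A$ are the basin populations at $\lambda=0$. These populations are not reported here, so the value of `w_a` below is purely hypothetical, as is the helper name `combine_basins`.

```python
import numpy as np

def combine_basins(df_a, df_b, w_a):
    """Combine per-basin free energy differences (in kT) into the full one.

    w_a is the equilibrium population of basin A at lambda = 0 (hypothetical input).
    """
    with np.errstate(divide="ignore"):  # allow w_a = 0 or 1, where log(0) = -inf
        log_terms = np.log([w_a, 1.0 - w_a]) - np.array([df_a, df_b])
    # -ln(w_A * exp(-dF_A) + w_B * exp(-dF_B)), evaluated stably
    return -np.logaddexp(log_terms[0], log_terms[1])

# EXE estimates quoted above (kT); w_a = 0.5 is an illustrative assumption
df_full = combine_basins(-2.544, -4.949, w_a=0.5)
print(f"combined estimate for w_a = 0.5: {df_full:.3f} kT")
```

Whatever the populations are, the combined value lies between the two single-basin values, so two converged runs should agree on a single number in that range.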

## 2. Description of the problem

However, I found that the free energy differences obtained from the two 2D alchemical metadynamics simulations were not consistent with each other. To examine this, I use the same method as in `lambda_MetaD_questions/archived_questions/Method_1/Check.ipynb` below, keeping the code as similar as possible to the one used previously.

First, I define the following functions, which were used in `Check.ipynb`.

In [1]:
import plumed
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1994) # makes notebook reproducible
kBT = 2.478956208925815

def analyze(traj, n_blocks, discard=0):
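    """
    Estimate the free energy difference and its error with a block bootstrap.

    Parameters
    ----------
    traj (pandas.DataFrame): trajectory data (content of COLVAR)
    n_blocks (int): number of blocks
    discard (float): fraction of the trajectory discarded from the start

    Returns
    -------
    (float, float): free energy difference in kBT and its bootstrap standard deviation
    """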
    n = int(len(traj) * (1.0 - discard))   # number of data points considered
    # make sure the number of frames is a multiple of n_blocks (the earliest frames are dropped)
    n = (n // n_blocks) * n_blocks
    bias = np.array(traj["metad.bias"])
    bias -= np.max(bias) # avoid overflows
    w = np.exp(bias / kBT)[-n:].reshape((n_blocks, -1)) # shape: (nblocks, nframes in one block), weight for each point
    
    # A: coupled state, B: uncoupled state
    isA = np.int_(traj["lambda"] == 0)[-n:].reshape((n_blocks, -1)) # 1 if in A (np.int_ converts bool to 0 or 1)
    isB = np.int_(traj["lambda"] == np.max(traj["lambda"]))[-n:].reshape((n_blocks, -1)) # 1 if in B
    
    B = 200 # number of bootstrap iterations
    boot = np.random.choice(n_blocks, size=(B, n_blocks))  # draw block indices from np.arange(n_blocks) with replacement; size is the output shape
    popA = np.average(isA[boot], axis=(1,2), weights=w[boot])  # Note that isA[boot] is a 3D array
    popB = np.average(isB[boot], axis=(1,2), weights=w[boot])  # shapes of popA and popB: (B,)

    df = np.log(popA / popB) # this is in kBT units
    popA0 = np.average(isA, weights=w)
    popB0 = np.average(isB, weights=w)
    return np.log(popA0 / popB0), np.std(df)

# time-averaged potential: hills deposited in the final (1 - t0) fraction are tapered (final 25% by default)
def time_average(hills, t0=0.75):
    n0 = int(len(hills) * t0)   # number of hills deposited before the averaging window
    w = np.hstack((np.ones(n0), np.linspace(1, 0, len(hills) - n0)))  # weight 1 for the first n0 hills, then a linear ramp from 1 to 0
    hills = hills.copy()
    hills.height *= w
    return hills
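
As a sanity check of the reweighting in `analyze`, the estimator can be run on synthetic data with a known answer. The sketch below is a self-contained restatement of the same logic. The toy trajectory and the bias values are made up for illustration, and the `n_boot` and `seed` arguments are additions for this example. Each frame is weighted by $e^{V/k_BT}$, so a state that carried a higher bias is up-weighted.

```python
import numpy as np
import pandas as pd

kBT = 2.478956208925815  # kJ/mol, same value as in the notebook

def analyze(traj, n_blocks, discard=0.0, n_boot=200, seed=1994):
    """Reweighted ln(p_A / p_B) with a block-bootstrap error (same logic as above)."""
    n = int(len(traj) * (1.0 - discard))
    n = (n // n_blocks) * n_blocks              # keep a multiple of n_blocks
    bias = np.array(traj["metad.bias"], dtype=float)
    bias -= bias.max()                          # avoid overflow in exp
    w = np.exp(bias / kBT)[-n:].reshape((n_blocks, -1))
    lam = np.array(traj["lambda"])
    is_a = (lam == 0).astype(float)[-n:].reshape((n_blocks, -1))
    is_b = (lam == lam.max()).astype(float)[-n:].reshape((n_blocks, -1))
    rng = np.random.default_rng(seed)
    boot = rng.integers(n_blocks, size=(n_boot, n_blocks))   # resample whole blocks
    pop_a = np.average(is_a[boot], axis=(1, 2), weights=w[boot])
    pop_b = np.average(is_b[boot], axis=(1, 2), weights=w[boot])
    best = np.log(np.average(is_a, weights=w) / np.average(is_b, weights=w))
    return best, np.std(np.log(pop_a / pop_b))

# Toy trajectory: state 0 is visited twice as often as state 7 in every block.
lam = np.tile([0, 0, 0, 0, 7, 7], 100)
flat = pd.DataFrame({"lambda": lam, "metad.bias": np.zeros(lam.size)})
# A bias that is kBT*ln(2) higher in state 7 should exactly undo the 2:1 ratio.
biased = flat.assign(**{"metad.bias": np.where(lam == 7, kBT * np.log(2), 0.0)})

df_flat, err_flat = analyze(flat, n_blocks=50)
df_biased, err_biased = analyze(biased, n_blocks=50)
print(f"no bias: {df_flat:.3f} kT, with bias: {df_biased:.3f} kT")
```

Without a bias the estimate is simply the log ratio of the visit counts, $\ln 2$. With the bias it drops to zero, because the frames in state 7 count twice as much. The bootstrap error vanishes in both cases because every block is identical by construction.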

For the parameters, I reweight the CV time series using the bias averaged over the last 20% of the simulation, and I use 50 blocks. To calculate the average bias, I use the function `time_average` as shown below. The `metad_bias` function in `Check.ipynb` seems applicable only to 1D alchemical metadynamics, so I instead use the PLUMED driver to sum the hills returned by `time_average` and obtain the average bias.
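
The linear taper in `time_average` follows from the fact that a hill deposited at time $t_i$ only contributes to the bias at later times. Averaging the bias over the final window therefore gives each hill a weight equal to the fraction of the window that comes after its deposition: 1 for hills deposited before the window, falling linearly to 0 at the end. Below is a small check on a made-up hills table. The column values are illustrative, and the function is restated with an explicit column assignment so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def time_average(hills, t0=0.75):
    """Scale hill heights so that their sum equals the bias averaged over the final (1 - t0) fraction."""
    n0 = int(len(hills) * t0)                                   # hills deposited before the window
    w = np.hstack((np.ones(n0), np.linspace(1, 0, len(hills) - n0)))
    hills = hills.copy()                                        # leave the input untouched
    hills["height"] = hills["height"] * w
    return hills

# Ten toy hills of unit height; with t0 = 0.8 only the last two fall in the window
toy_hills = pd.DataFrame({"time": np.arange(10.0), "height": np.ones(10)})
scaled = time_average(toy_hills, t0=0.8)
print(scaled["height"].to_numpy())
```

In a real HILLS file the window contains many hills, so the ramp from 1 to 0 is much smoother than in this ten-row example.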

Below I first calculate the free energy difference for the simulation starting from state A.

In [2]:
hills = plumed.read_as_pandas('state_A/HILLS_2D')
hills_avg = time_average(hills, t0=0.8)
plumed.write_pandas(hills_avg, 'state_A/HILLS_2D_modified')

After writing out the output from `time_average` to `HILLS_2D_modified`, I use the plumed driver with the input file `plumed_sum_bias.dat`.

In [96]:
%%bash
source /home/wei-tse/Documents/Software/PLUMED/plumed2/sourceme.sh
cd state_A/
plumed driver --plumed plumed_sum_bias.dat --noatoms

PLUMED: PLUMED is starting
PLUMED: Version: 2.8.0-dev (git: 9991c4c14) compiled on Nov 12 2020 at 15:24:46
PLUMED: Please cite these papers when using PLUMED [1][2]
PLUMED: For further information see the PLUMED web page at http://www.plumed.org
PLUMED: Root: /home/wei-tse/Documents/Software/PLUMED/plumed2/
PLUMED: For installed feature, see /home/wei-tse/Documents/Software/PLUMED/plumed2//src/config/config.txt
PLUMED: Molecular dynamics engine: driver
PLUMED: Precision of reals: 8
PLUMED: Running over 1 node
PLUMED: Number of threads: 1
PLUMED: Cache line size: 512
PLUMED: Number of atoms: 0
PLUMED: File suffix: 
PLUMED: FILE: plumed_sum_bias.dat
PLUMED: Action READ
PLUMED:   with label theta
PLUMED:   with stride 1
PLUMED:   reading data from file COLVAR
PLUMED:   reading value theta and storing as theta
PLUMED: Action READ
PLUMED:   with label lambda
PLUMED:   with stride 1
PLUMED:   reading data from file COLVAR
PLUMED:   reading value lambda and storing as lambda
PLUMED: Action ME

The content of the PLUMED input file `plumed_sum_bias.dat` is shown below.

In [94]:
%%bash
cat state_A/plumed_sum_bias.dat

theta: READ FILE=COLVAR VALUES=theta IGNORE_TIME IGNORE_FORCES
lambda: READ FILE=COLVAR VALUES=lambda IGNORE_TIME IGNORE_FORCES

METAD ...
ARG=theta,lambda 
SIGMA=0.5,0.0001     # a small SIGMA ensures that the Gaussian approximates a delta function
HEIGHT=0
PACE=500000000        # should be nstexpanded
GRID_MIN=-pi,0   # index of alchemical states starts from 0
GRID_MAX=pi,7    # we have 8 states in total
GRID_BIN=100,7
TEMP=298
BIASFACTOR=60
LABEL=metad    
FILE=HILLS_2D_modified
RESTART=YES
... METAD

PRINT STRIDE=1 ARG=theta,lambda,metad.bias FILE=COLVAR_SUM_BIAS


After getting `COLVAR_SUM_BIAS`, where the last column is the average bias, I calculate the free energy difference using `analyze` as below. Here I discard the first 50% of the simulation.

In [3]:
results = analyze(plumed.read_as_pandas('state_A/COLVAR_SUM_BIAS'), n_blocks=50, discard=0.5)
print(f'The free energy difference obtained from the 2D alchemical metadynamics starting from state A is {results[0]:.3f} +/- {results[1]:.3f}kT.')

The free energy difference obtained from the 2D alchemical metadynamics starting from state A is -2.518 +/- 0.043kT.


I repeat the same workflow for the simulation starting from state B below.

In [6]:
hills = plumed.read_as_pandas('state_B/HILLS_2D')
hills_avg = time_average(hills, t0=0.8)
plumed.write_pandas(hills_avg, 'state_B/HILLS_2D_modified')

In [7]:
%%bash
source /home/wei-tse/Documents/Software/PLUMED/plumed2/sourceme.sh
cd state_B/
plumed driver --plumed plumed_sum_bias.dat --noatoms

PLUMED: PLUMED is starting
PLUMED: Version: 2.8.0-dev (git: 9991c4c14) compiled on Nov 12 2020 at 15:24:46
PLUMED: Please cite these papers when using PLUMED [1][2]
PLUMED: For further information see the PLUMED web page at http://www.plumed.org
PLUMED: Root: /home/wei-tse/Documents/Software/PLUMED/plumed2/
PLUMED: For installed feature, see /home/wei-tse/Documents/Software/PLUMED/plumed2//src/config/config.txt
PLUMED: Molecular dynamics engine: driver
PLUMED: Precision of reals: 8
PLUMED: Running over 1 node
PLUMED: Number of threads: 1
PLUMED: Cache line size: 512
PLUMED: Number of atoms: 0
PLUMED: File suffix: 
PLUMED: FILE: plumed_sum_bias.dat
PLUMED: Action READ
PLUMED:   with label theta
PLUMED:   with stride 1
PLUMED:   reading data from file COLVAR
PLUMED:   reading value theta and storing as theta
PLUMED: Action READ
PLUMED:   with label lambda
PLUMED:   with stride 1
PLUMED:   reading data from file COLVAR
PLUMED:   reading value lambda and storing as lambda
PLUMED: Action ME

In [8]:
results = analyze(plumed.read_as_pandas('state_B/COLVAR_SUM_BIAS'), n_blocks=50, discard=0.5)
print(f'The free energy difference obtained from the 2D alchemical metadynamics starting from state B is {results[0]:.3f} +/- {results[1]:.3f}kT.')

The free energy difference obtained from the 2D alchemical metadynamics starting from state B is -4.994 +/- 0.032kT.


As shown above, the free energy differences obtained from the two simulations are not consistent with each other, contrary to what was expected.
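
To make the size of the disagreement concrete, the estimates can be compared in units of their combined standard error. This is a minimal sketch that assumes the reported uncertainties are independent standard errors. `z_score` is a helper introduced here for illustration, and the numbers are copied from the results reported above.

```python
import math

def z_score(x1, s1, x2, s2):
    """Difference between two estimates in units of their combined standard error."""
    return abs(x1 - x2) / math.hypot(s1, s2)

# (estimate, uncertainty) in kT, copied from the results reported above
exe_a, exe_b = (-2.544, 0.040), (-4.949, 0.047)
metad_a, metad_b = (-2.518, 0.043), (-4.994, 0.032)

print(f"2D MetaD A vs 2D MetaD B: {z_score(*metad_a, *metad_b):.1f}")
print(f"2D MetaD A vs EXE A:      {z_score(*metad_a, *exe_a):.1f}")
print(f"2D MetaD B vs EXE B:      {z_score(*metad_b, *exe_b):.1f}")
```

With these numbers, the two 2D runs differ by far more than their combined uncertainty, whereas each 2D run lies within one combined standard error of the EXE run started from the same state.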

## 3. Attempts at troubleshooting the problem

To my understanding, the inconsistency in the free energy differences indicates that the two simulations sampled very different conformational ensembles. I therefore checked the distributions of all bond lengths and angles in the two simulations, which should cover the most important remaining degrees of freedom of the system. The figures below are based on the data in `configuration.dat`, which was obtained by running the PLUMED driver with the following input file, `plumed_configuration.dat`:

In [9]:
%%bash 
cat state_A/plumed_configuration.dat

d1: DISTANCE ATOMS=1,2
d2: DISTANCE ATOMS=2,3
d3: DISTANCE ATOMS=3,4

t1: ANGLE ATOMS=1,2,3
t2: ANGLE ATOMS=2,3,4

PRINT ARG=d1,d2,d3,t1,t2 STRIDE=1 FILE=configuration.dat



![image_1](analysis_results/bond_length_hist.png)

<img src=analysis_results/angle_hist.png width=650>
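
To put a number on how similar two such histograms are, one option is an overlap coefficient between the normalized histograms of the same quantity in the two runs. This is a minimal sketch on synthetic samples with arbitrary values. `histogram_overlap` is a hypothetical helper, and in practice the inputs would be columns such as `d1` or `t1` read from the `configuration.dat` file of each run.

```python
import numpy as np

def histogram_overlap(x_a, x_b, bins=50):
    """Overlap coefficient (0 = disjoint, 1 = identical) of two normalized histograms."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    # Shared bin edges covering both samples
    edges = np.linspace(min(x_a.min(), x_b.min()), max(x_a.max(), x_b.max()), bins + 1)
    p_a = np.histogram(x_a, bins=edges)[0] / x_a.size
    p_b = np.histogram(x_b, bins=edges)[0] / x_b.size
    return float(np.minimum(p_a, p_b).sum())

rng = np.random.default_rng(0)
same = histogram_overlap(rng.normal(0.15, 0.003, 5000), rng.normal(0.15, 0.003, 5000))
shifted = histogram_overlap(rng.normal(0.15, 0.003, 5000), rng.normal(0.17, 0.003, 5000))
print(f"same distribution: {same:.2f}, shifted distribution: {shifted:.2f}")
```

An overlap close to 1 for every bond length and angle would suggest that these degrees of freedom are sampled in the same way in both runs.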

I also compared the histograms of the two collective variables, namely the dihedral 1-2-3-4 and the alchemical variable.
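
A quick numerical complement to the dihedral histogram is to count how much time each run spends in each torsional basin and how often it crosses between them. The sketch below assumes that `theta` is in radians in $[-\pi, \pi]$ and uses $|\theta| > \pi/2$ as the boundary for state A (dihedral around 180 degrees). Both the boundary and the helper name `torsion_summary` are assumptions made for illustration.

```python
import numpy as np

def torsion_summary(theta, boundary=np.pi / 2):
    """Return (fraction in state A, fraction in state B, number of A<->B crossings)."""
    theta = np.asarray(theta, dtype=float)
    in_a = np.abs(theta) > boundary                     # state A: dihedral near +/- pi
    frac_a = float(in_a.mean())
    n_cross = int(np.count_nonzero(in_a[1:] != in_a[:-1]))
    return frac_a, 1.0 - frac_a, n_cross

# Toy dihedral series (radians): two frames in A, two in B, then back to A
frac_a, frac_b, n_cross = torsion_summary([3.0, -3.0, 0.1, 0.2, 3.1])
print(f"state A: {frac_a:.2f}, state B: {frac_b:.2f}, crossings: {n_cross}")
```

Applied to the `theta` column of each run, a small number of crossings would indicate that the torsional barrier is still rarely crossed in the 2D simulations.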