# Problem 2: Simulation parameters for the CB8-G3 binding complex and reweighting for the wall potential

To examine the efficacy of alchemical metadynamics, I ran a 2D alchemical metadynamics simulation of the CB8-G3 host-guest binding complex, where the configurational CV was the number of water molecules in the binding cavity. After many attempts, I adopted the PLUMED input file below, which gave roughly flat sampling in both the alchemical and the configurational space.

%%bash
cat plumed.dat

center: CENTER ATOMS=1-144            # geometric center of the host molecule
water_group: GROUP ATOMS=207-6656:3   # oxygen atom of the water molecules
n: COORDINATION GROUPA=center GROUPB=water_group R_0=0.35  # cavity radius: 0.35 nm
lambda: EXTRACV NAME=lambda

METAD ...
ARG=lambda,n
SIGMA=0.01,0.05    # a small SIGMA ensures that the Gaussian approximates a delta function
HEIGHT=12.394781044629076
PACE=10
GRID_MIN=0,0     # index of alchemical states starts from 0
GRID_MAX=39,20     # we have 40 states in total
GRID_BIN=39,100
TEMP=298
BIASFACTOR=150
LABEL=metad
FILE=HILLS_2D
... METAD

PRINT STRIDE=10 ARG=n,lambda,metad.bias FILE=COLVAR


I am aware that several of these parameter values are unusual. My reasoning for each of them is as follows:
- **Large height**: I expected the free energy barrier for removing the water molecules from the binding cavity to be large, so I set the initial Gaussian height to 5 kT (around 12.394 kJ/mol).
- **Large bias factor**: Before deciding on the bias factor, I ran some preliminary tests, namely metadynamics simulations with different bias factors that biased only the number of water molecules. Because of the high free energy barrier, only a bias factor between 60 and 150 led to a reasonably flat distribution in the CV space. With a bias factor of 60, the sampling in the alchemical space was slow, so I decided to use 150, which did lead to a relatively flat distribution in both the alchemical space and the configurational CV space (the number of water molecules). In addition, from the 1D metadynamics simulations, I estimated that the largest free energy barrier in the configurational CV space was about 170 kT, so I thought a bias factor around 150 would probably be reasonable. (In the paper [Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.100.020603), it was found empirically that the error tends to be lowest when the bias factor is set such that $k(T+\Delta T)$ is of the same order of magnitude as the barrier height.)
- **Small pace**: With a large bias factor, the Gaussian height decreases slowly during the simulation. To accelerate the sampling, I used PACE=10 instead of a large pace (e.g. 500 simulation steps).
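The arithmetic behind these choices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard well-tempered rule that the height of a newly deposited Gaussian is scaled by $\exp(-V/(k\Delta T))$ with $\Delta T = (\gamma - 1)T$, and the function names `kT` and `well_tempered_height` are hypothetical helpers, not part of PLUMED.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def kT(temperature):
    """Thermal energy in kJ/mol."""
    return K_B * temperature

def well_tempered_height(h0, bias, temperature, biasfactor):
    """Height of the next Gaussian at a point where the accumulated bias is `bias` (kJ/mol)."""
    delta_T = (biasfactor - 1.0) * temperature  # Delta T = (gamma - 1) * T
    return h0 * math.exp(-bias / (K_B * delta_T))

T = 298.0           # TEMP in the PLUMED input
h0 = 5 * kT(T)      # initial Gaussian height of 5 kT
print(f"kT = {kT(T):.4f} kJ/mol, 5 kT = {h0:.3f} kJ/mol")

# How much has the height decayed once 170 kT of bias has been accumulated?
for gamma in (10, 60, 150):
    h = well_tempered_height(h0, 170 * kT(T), T, gamma)
    print(f"biasfactor={gamma:>3}: height = {h / h0:.3g} * h0")
```

With a bias factor of 150, the height is still roughly a third of its initial value after 170 kT of bias has been deposited, which illustrates why the hills shrink slowly. Note also that the `HEIGHT` value in the input corresponds to 5 kT at 298.15 K rather than at `TEMP=298`; the difference is negligible.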

As a result, by the end of the simulation, the system had sampled most of the alchemical and configurational space roughly evenly.

<img src=CV_hist.png width=800>

Up to this point, I have my **first question**: Is this strategy for deciding the metadynamics parameters appropriate? I thought it would be reasonable since it reaches the goal of getting a relatively flat distribution, but I still wanted to make sure the parameters make sense.

The distribution of the number of water molecules shown above, however, still shows a potential problem in the sampling in the configurational space. Specifically, given the small radius of the binding cavity (0.35 nm), having over 12 water molecules in the cavity is likely unphysical. In the figure, it can be seen that the system spent a fair amount of time sampling these unphysical states, which could potentially slow down the convergence of the free energy calculations. Additionally, the simulation crashed due to the following PLUMED error after about 90 ns:
```
PLUMED:
PLUMED:
PLUMED: ################################################################################
PLUMED:
PLUMED:
PLUMED: +++ PLUMED error
PLUMED: +++ at Grid.cpp:170, function PLMD::GridBase::index_t PLMD::GridBase::getIndex(const std::vector<unsigned int>&) const
PLUMED: +++ message follows +++
PLUMED: ERROR: the system is looking for a value outside the grid along the 1 (n) index!
PLUMED:
PLUMED: ################################################################################
PLUMED
```

The upper bound of the grid for the configurational CV was set to 20, and it seems that the system tried to sample a configuration with more than 20 water molecules inside the binding cavity. I've also noticed that the Gaussian height is still high in the unphysical region (as shown in the figures below). Although this is natural since the region is less sampled, I'm wondering whether I can regard the bias at the end of the simulation as quasi-stationary, since the Gaussian height in the physical region is no longer changing much by the end. My understanding is that we could state that the bias is stationary there: the large Gaussian height in the unphysical region would just cause large errors in that region, which is fine since we are not interested in unphysical configurations anyway. (Note: The `COLVAR` and `HILLS` files can be downloaded via [this link](https://drive.google.com/drive/folders/19mCLDtWa1L9jtyh13_DHYnLnhJqpXaN8?usp=sharing).)

<img src=time_series.png width=1000>
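One way to make the "quasi-stationary in the physical region" statement quantitative is to track the mean Gaussian height over time using only the hills deposited at $n \le 12$. The sketch below is an assumption about how this could be checked, not the analysis behind the figure above: the helpers `load_plumed_table` and `mean_height_by_block` are hypothetical, the column names are read from the `#! FIELDS` header, and the inline data is a synthetic stand-in for `HILLS_2D`.

```python
import io
import numpy as np

def load_plumed_table(handle):
    """Read a PLUMED text file, taking column names from the '#! FIELDS' header."""
    names, rows = None, []
    for line in handle:
        if line.startswith("#! FIELDS"):
            names = line.split()[2:]
        elif line.strip() and not line.startswith("#"):
            rows.append([float(x) for x in line.split()])
    data = np.array(rows)
    return {name: data[:, i] for i, name in enumerate(names)}

def mean_height_by_block(hills, n_max, n_blocks):
    """Mean Gaussian height per time block, using only hills centred at n <= n_max."""
    blocks = np.array_split(np.arange(len(hills["time"])), n_blocks)
    means = []
    for idx in blocks:
        physical = idx[hills["n"][idx] <= n_max]
        means.append(hills["height"][physical].mean() if len(physical) else np.nan)
    return np.array(means)

# Synthetic stand-in for HILLS_2D (made-up values, for illustration only).
demo = io.StringIO(
    "#! FIELDS time lambda n sigma_lambda sigma_n height biasf\n"
    "0.02 0 3.0 0.01 0.05 12.0 150\n"
    "0.04 1 5.0 0.01 0.05 10.0 150\n"
    "0.06 1 14.0 0.01 0.05 12.0 150\n"
    "0.08 2 4.0 0.01 0.05 6.0 150\n"
    "0.10 2 6.0 0.01 0.05 5.0 150\n"
    "0.12 3 15.0 0.01 0.05 11.0 150\n"
    "0.14 3 5.0 0.01 0.05 4.5 150\n"
    "0.16 4 4.0 0.01 0.05 4.4 150\n"
)
hills = load_plumed_table(demo)
means = mean_height_by_block(hills, n_max=12, n_blocks=2)
print("mean height per block (physical region only):", means)
```

On real data, one would replace `demo` with `open("HILLS_2D")` and use more blocks. If the block means in the physical region level off towards the end of the trajectory, that would support treating the bias there as quasi-stationary.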

To prevent the system from sampling unphysical configurations or having this kind of error, I'm considering modifying the PLUMED input file as below to add a wall potential.

```
center: CENTER ATOMS=1-144            # geometric center of the host molecule
water_group: GROUP ATOMS=207-6656:3   # oxygen atom of the water molecules
n: COORDINATION GROUPA=center GROUPB=water_group R_0=0.35  # cavity radius: 0.35 nm
lambda: EXTRACV NAME=lambda

METAD ...
ARG=lambda,n
SIGMA=0.01,0.05    # a small SIGMA ensures that the Gaussian approximates a delta function
HEIGHT=12.394781044629076
PACE=10
GRID_MIN=0,0     # index of alchemical states starts from 0
GRID_MAX=39,20     # we have 40 states in total
GRID_BIN=39,100
TEMP=298
BIASFACTOR=150
LABEL=metad
FILE=HILLS_2D
CALC_RCT
... METAD

UPPER_WALLS ...
 ARG=n
 AT=12
 KAPPA=200
 EXP=2
 EPS=10
 OFFSET=0
 LABEL=uwall
... UPPER_WALLS

PRINT STRIDE=10 ARG=* FILE=COLVAR
```

As a preliminary test, I had previously run a 20 ns standard MD simulation of the system with the ligand absent. The maximum number of water molecules in that simulation was around 10, so I set `AT=12` in the `UPPER_WALLS` section. Originally, I set `KAPPA` to 2000, but the simulation crashed with a LINCS error after around 5 ns. I therefore changed `KAPPA` to 200 and set `EPS` to 10. With these values, if 15 water molecules are forced into the binding cavity, the energy penalty would be
$$U_{\text{wall}} = 200 \left( \frac{15-12}{10}\right)^2=18 \; \text{kJ}/\text{mol}$$

However, since I've just started the simulation, I'm not entirely sure if this energy penalty would be too large or too small such that another error would be caused. I'm wondering if you have general suggestions about deciding the parameters for a wall potential. 
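To get a feeling for the size of the penalty across the whole grid, the wall energy can be tabulated directly. The sketch below assumes the functional form used in the equation above, $\kappa\,((x - a + o)/\epsilon)^{e}$ for $x - a + o > 0$ and zero otherwise; `upper_wall_energy` is a hypothetical helper written for this note, not a PLUMED function.

```python
K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def upper_wall_energy(x, at, kappa, eps=1.0, exp=2.0, offset=0.0):
    """Upper-wall energy in kJ/mol: kappa * ((x - at + offset) / eps) ** exp above the wall."""
    d = x - at + offset
    return kappa * (d / eps) ** exp if d > 0 else 0.0

kT = K_B * 298.0
print(" n   U_wall (kJ/mol)   U_wall (kT)")
for n in range(10, 21, 2):
    u = upper_wall_energy(n, at=12, kappa=200, eps=10)
    print(f"{n:2d}   {u:15.1f}   {u / kT:11.1f}")

# The value quoted in the text: 15 water molecules with KAPPA=200, EPS=10.
print(upper_wall_energy(15, at=12, kappa=200, eps=10))
```

Comparing these numbers with the magnitude of the accumulated metadynamics bias (the 1D runs suggested barriers of about 170 kT) is one possible way to judge whether the wall is stiff enough to keep $n$ inside the grid.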

Since a wall potential is applied on top of alchemical metadynamics, I was also wondering about the best practices for reweighting the data. In [this tutorial](https://www.plumed.org/doc-v2.6/user-doc/html/ves-lugano2017-metad.html), actions and keywords such as `HISTOGRAM` and `LOGWEIGHTS` were used. However, this is quite different from the protocol I adopted in the notebook `Problem_1.ipynb` in the same repo. Specifically, what I did previously was to use `RESTART=YES` to sum up the metadynamics bias, but I'm not sure whether the same protocol can be applied when there is an additional fixed potential. I'm wondering if you could offer some guidance on how I should modify the protocol I previously used so that the wall potential is taken into account when calculating free energy differences and their uncertainties.
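For concreteness, here is a minimal sketch of what a reweighting that includes the wall could look like. It is an assumption on my side rather than a confirmed protocol: each frame is given a weight proportional to $\exp[\beta(V_{\text{metad}} - c(t) + V_{\text{wall}})]$, where the first two terms would come from the `metad.rbias` column (written when `CALC_RCT` is set) and the last from the `uwall.bias` column of `COLVAR`. The helper names and the toy data are hypothetical.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def reweighting_weights(rbias, wall_bias, temperature):
    """Normalized weights exp(beta * (rbias + wall_bias)), computed stably in log space."""
    beta = 1.0 / (K_B * temperature)
    log_w = beta * (np.asarray(rbias, dtype=float) + np.asarray(wall_bias, dtype=float))
    log_w -= log_w.max()  # avoid overflow before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def free_energy_difference(weights, in_a, in_b, temperature):
    """F_b - F_a in kJ/mol from the reweighted populations of two sets of frames."""
    p_a, p_b = weights[in_a].sum(), weights[in_b].sum()
    return -K_B * temperature * np.log(p_b / p_a)

# Toy data: two frames in state 0 and two in state 39 (made-up values).
T = 298.0
lam = np.array([0, 0, 39, 39])
rbias = np.array([0.0, 0.0, 1.0, 1.0]) * K_B * T * np.log(2)  # stand-in for metad.rbias
wall = np.zeros(4)                                            # stand-in for uwall.bias

w = reweighting_weights(rbias, wall, T)
dF = free_energy_difference(w, lam == 0, lam == 39, T)
print(f"F(39) - F(0) = {dF:.4f} kJ/mol")
```

In this toy case the frames at state 39 carry twice the weight of those at state 0, so the estimate is $-kT\ln 2$. Uncertainties could then be estimated by applying the same functions to time blocks or bootstrap resamples of the frames, but whether this matches the recommended practice is exactly what I would like to confirm.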

I have another question about restricting the sampling region further to accelerate convergence. Setting a wall potential as above would prevent the system from exploring any region with `n` (the number of water molecules) above 12, regardless of the value of $\lambda$. Since having more than 12 water molecules in the binding cavity is unphysical at any $\lambda$ value, prohibiting the exploration of such regions with a wall potential meets my needs. However, I'm wondering if there is a way to prohibit the exploration of regions such as $\lambda=39$ (the uncoupled state) combined with $n=0$, which is also likely to be unphysical. A simple wall potential does not seem able to deal with this, since a wall at $n=0$ would prohibit some physical configurations as well (such as $\lambda=0$ and $n=0$).

To summarize, my main questions proposed in this notebook are as follows:
- Are the strategies I used for deciding metadynamics parameters appropriate?
- Can the bias be regarded as quasi-stationary if the Gaussian height in the unphysical region is still very high? 
- What are the general suggestions you would give about deciding the parameters for a wall potential? How large should the energy penalty typically be?
- What should I modify in my protocol for free energy calculations to consider the reweighting of the wall potential?
- Is there a way to prohibit the exploration of regions such as $\lambda=39$ and $n=0$?

I'm sorry to have this many questions, but thank you so much for reading this far! Your input has been really helpful in this project.