Geometry optimizations in Python
================================

The *tblite* Python package lets you run extended tight binding (xTB) calculations directly in Python.
This tutorial demonstrates how to set up and run a geometry optimization using GFN2-xTB.

Installing the package
----------------------

To start, create a new Python environment using the mamba package manager.
We specify the packages to install in an environment file:

````yaml
name: xtb
channels:
- conda-forge
dependencies:
- ipykernel
- mdanalysis
- nglview
- pyberny
- tblite-python
````

Save the file as *environment.yml* and create the environment by running

````shell
mamba env create -n xtb -f environment.yml
mamba activate xtb
````

This will create a new environment called *xtb* and install all the necessary packages.
Make sure that *tblite* is available in your Python environment.
You can check this by opening a Python interpreter and importing the package:

In [None]:
import tblite.interface as tb

tb.library.get_version()

First steps
-----------

A geometry optimization needs two ingredients: a way to compute the potential energy and its derivatives (the forces), and a procedure that moves along the potential energy surface to find a minimum. The xTB calculator provides the energy and derivatives and can be combined with different geometry optimization procedures.
One example is the [*pyberny*](https://github.com/jhrmnn/pyberny) package, a general geometry optimizer, which we will use in this tutorial.
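
To make the two ingredients concrete, here is a minimal sketch of an optimization loop on a one-dimensional toy problem. It uses plain steepest descent on a Morse potential with made-up parameters (`depth`, `width`, `r0` are illustrative, not fitted to any molecule). This is not the algorithm used by *pyberny*; it only shows how an energy and gradient evaluation combines with a step rule.

```python
import math


def morse(r, depth=0.17, width=1.0, r0=1.4):
    """Toy Morse potential; returns (energy, dE/dr). Parameters are illustrative."""
    x = math.exp(-width * (r - r0))
    energy = depth * (1.0 - x) ** 2
    gradient = 2.0 * depth * width * x * (1.0 - x)
    return energy, gradient


def steepest_descent(r, step=1.0, gtol=1e-8, maxsteps=10_000):
    """Follow the negative gradient until its magnitude drops below gtol."""
    for n in range(maxsteps):
        energy, gradient = morse(r)
        if abs(gradient) < gtol:
            break
        r -= step * gradient  # move downhill on the potential energy curve
    return r, energy, n


r_min, e_min, n_steps = steepest_descent(2.0)
print(f"minimum near r = {r_min:.6f} after {n_steps} steps")
```

In the rest of this tutorial the toy potential is replaced by xTB and the step rule by the *pyberny* optimizer.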

In this tutorial we use the caffeine molecule as a representative input.

In [None]:
%%writefile caffeine.xyz
24

C            1.07317        0.04885       -0.07573
N            2.51365        0.01256       -0.07580
C            3.35199        1.09592       -0.07533
N            4.61898        0.73028       -0.07549
C            4.57907       -0.63144       -0.07531
C            3.30131       -1.10256       -0.07524
C            2.98068       -2.48687       -0.07377
O            1.82530       -2.90038       -0.07577
N            4.11440       -3.30433       -0.06936
C            5.45174       -2.85618       -0.07235
O            6.38934       -3.65965       -0.07232
N            5.66240       -1.47682       -0.07487
C            7.00947       -0.93648       -0.07524
C            3.92063       -4.74093       -0.06158
H            0.73398        1.08786       -0.07503
H            0.71239       -0.45698        0.82335
H            0.71240       -0.45580       -0.97549
H            2.99301        2.11762       -0.07478
H            7.76531       -1.72634       -0.07591
H            7.14864       -0.32182        0.81969
H            7.14802       -0.32076       -0.96953
H            2.86501       -5.02316       -0.05833
H            4.40233       -5.15920        0.82837
H            4.40017       -5.16929       -0.94780

In the full version of our optimization loop, the optimizer proposes steps and xTB computes the energy and forces for each one. The results of each step are saved for later visualization and analysis.

Here, we start by looking at the optimizer and geometry setup. *pyberny* can read the xyz file, and its *Berny* optimizer acts as an iterator that provides new geometry steps. However, to use the geometry in the xTB calculator we need to convert it from the *pyberny* format to the *tblite* format. For xTB we need to separate the element symbols from the Cartesian coordinates.
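
The iterator-with-feedback pattern can be unfamiliar, so the following sketch mimics the calling convention with a plain Python generator. `toy_optimizer` and `energy_and_gradient` are hypothetical stand-ins written for this illustration; they do not reproduce how *Berny* works internally, only the way a geometry is handed out by iteration and the energy and gradient are handed back with `send`.

```python
def toy_optimizer(x0, step=0.25, gtol=1e-8, maxsteps=200):
    """Hypothetical iterator-style optimizer (not pyberny) for one coordinate."""
    x = x0
    for _ in range(maxsteps):
        # Iteration hands out the current geometry; send() delivers the results.
        energy, gradient = yield x
        converged = abs(gradient) < gtol
        if not converged:
            x = x - step * gradient
        yield  # return control to the caller of send()
        if converged:
            return  # ends the surrounding for loop


def energy_and_gradient(x):
    """Toy harmonic potential with its minimum at x = 1."""
    return (x - 1.0) ** 2, 2.0 * (x - 1.0)


optimizer = toy_optimizer(3.0)
history = []
for x in optimizer:
    energy, gradient = energy_and_gradient(x)
    history.append((energy, x))
    optimizer.send((energy, gradient))

print(f"{len(history)} steps, final x = {history[-1][1]:.8f}")
```

The loop body has the same shape as the xTB loop used later in this tutorial: take a geometry, evaluate it, send the result back.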

In [None]:
import numpy as np
from berny import Berny, geomlib, angstrom

optimizer = Berny(geomlib.readfile("caffeine.xyz"))
geom = next(optimizer)
elements = [symbol for symbol, _ in geom]
initial_coordinates = np.asarray([coordinate for _, coordinate in geom])

:::{note} 
Keep in mind that *tblite* and the optimizer may use different units: here *pyberny* uses Angstrom and *tblite* uses Bohr.
The conversion factor `angstrom` imported from *pyberny* ensures that the coordinates are passed in the right unit.
The energy unit, Hartree, is the same for both, but the gradient also carries a length unit (Hartree/Angstrom), so we convert the gradient accordingly.
:::
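
As a small illustration of the coordinate conversion, the sketch below converts the first two atoms of the caffeine input from Angstrom to Bohr using an explicit constant. The value of the Bohr radius is written out here as an assumption; the `angstrom` factor from *pyberny* is assumed to play the same role as `BOHR_PER_ANGSTROM`. Gradients are left out on purpose.

```python
import math

# Assumption: Bohr radius in Angstrom, written out explicitly for this sketch.
BOHR_RADIUS_IN_ANGSTROM = 0.52917721092
BOHR_PER_ANGSTROM = 1.0 / BOHR_RADIUS_IN_ANGSTROM


def angstrom_to_bohr(coordinates):
    """Scale every Cartesian component from Angstrom to Bohr."""
    return [[value * BOHR_PER_ANGSTROM for value in row] for row in coordinates]


def bohr_to_angstrom(coordinates):
    """Scale every Cartesian component from Bohr back to Angstrom."""
    return [[value / BOHR_PER_ANGSTROM for value in row] for row in coordinates]


# First two atoms (C and N) of the caffeine geometry, in Angstrom
coords_angstrom = [
    [1.07317, 0.04885, -0.07573],
    [2.51365, 0.01256, -0.07580],
]
coords_bohr = angstrom_to_bohr(coords_angstrom)

distance_angstrom = math.dist(*coords_angstrom)
distance_bohr = math.dist(*coords_bohr)
print(f"C-N distance: {distance_angstrom:.4f} Angstrom = {distance_bohr:.4f} Bohr")
```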

In [None]:
xtb = tb.Calculator("GFN2-xTB", tb.symbols_to_numbers(elements), initial_coordinates * angstrom)
results = xtb.singlepoint()

initial_energy = results["energy"]
initial_gradient = results["gradient"]

optimizer.send((initial_energy, initial_gradient / angstrom))

The steps so far can be run in a loop to retrieve updated coordinates.
In each iteration we update the xTB calculator with the new positions, evaluate the energy and gradient, and pass them back to the optimizer.

In [None]:
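xtb.set("verbosity", 0)  # silence the per-step output of the calculator
trajectory = [(initial_energy, initial_coordinates)]
for geom in optimizer:
    # New Cartesian coordinates (Angstrom) proposed by the optimizer
    coordinates = np.asarray([coordinate for _, coordinate in geom])
    # Update the calculator geometry (Bohr) and reuse the previous results as a starting point
    xtb.update(positions=coordinates * angstrom)
    results = xtb.singlepoint(results)

    energy = results["energy"]
    gradient = results["gradient"]
    optimizer.send((energy, gradient / angstrom))

    # Record each step for later visualization
    trajectory.append((energy, coordinates))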

The optimizer ends this loop once the geometry is converged and a local minimum is reached.

In this process we can record the energy, gradient, and coordinates of each step. Here we store the energy and coordinates so that MDAnalysis can visualize the optimization progress and the change in geometry.
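
Before visualizing, a recorded trajectory can also be inspected numerically. The sketch below assumes the same `(energy, coordinates)` layout as the `trajectory` list above, but runs on made-up two-atom placeholder data so that it is self-contained. `summarize` is a hypothetical helper introduced for this illustration.

```python
import math


def summarize(trajectory):
    """Energies relative to the last step and the largest atomic displacement."""
    energies = [energy for energy, _ in trajectory]
    relative = [energy - energies[-1] for energy in energies]
    first, last = trajectory[0][1], trajectory[-1][1]
    max_shift = max(math.dist(a, b) for a, b in zip(first, last))
    return relative, max_shift


# Made-up placeholder data, not results of an xTB calculation
toy_trajectory = [
    (-1.00, [[0.0, 0.0, 0.0], [0.0, 0.0, 1.00]]),
    (-1.05, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.80]]),
    (-1.06, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.75]]),
]

relative, max_shift = summarize(toy_trajectory)
for step, delta in enumerate(relative):
    print(f"step {step}: {delta:.4f} above the final energy")
print(f"largest atomic displacement: {max_shift:.4f}")
```

The relative energies are in the unit of the stored energies, and the displacement is in the unit of the stored coordinates.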

To visualize the geometries we use MDAnalysis and NGLView.
To load the geometries into MDAnalysis, we create a new empty universe.
The atom types are added as the *name* topology attribute, and the coordinates are loaded from the arrays we collected from the optimizer.

In [None]:
import MDAnalysis as mda
import nglview as nv

uni = mda.Universe.empty(len(elements), trajectory=True)
uni.add_TopologyAttr("name", elements)
uni.load_new([coordinates for _, coordinates in trajectory])

nv.show_mdanalysis(uni, gui=True)

This stores the trajectory in the universe and lets us visualize it.
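
To keep the trajectory on disk as well, one option is a multi-frame xyz file, which many viewers can read. The sketch below is a plain Python writer under the assumption that the trajectory has the `(energy, coordinates)` layout used above. `write_xyz_trajectory` is a hypothetical helper, and the comment line format is an arbitrary choice; the example data is made up.

```python
import os
import tempfile


def write_xyz_trajectory(path, elements, trajectory):
    """Write one xyz frame per (energy, coordinates) entry of the trajectory."""
    natoms = len(elements)
    with open(path, "w") as fd:
        for step, (energy, coordinates) in enumerate(trajectory):
            fd.write(f"{natoms}\n")
            fd.write(f"step {step} energy {energy:.10f}\n")  # free-form comment line
            for symbol, (x, y, z) in zip(elements, coordinates):
                fd.write(f"{symbol:<2s} {x:15.8f} {y:15.8f} {z:15.8f}\n")


# Made-up placeholder data, not results of an xTB calculation
toy_elements = ["H", "H"]
toy_trajectory = [
    (-1.00, [[0.0, 0.0, 0.0], [0.0, 0.0, 1.00]]),
    (-1.05, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.80]]),
]

path = os.path.join(tempfile.mkdtemp(), "toy_trajectory.xyz")
write_xyz_trajectory(path, toy_elements, toy_trajectory)

with open(path) as fd:
    lines = fd.read().splitlines()
print(f"wrote {len(lines)} lines to {path}")
```

Coordinates are written in whatever unit the trajectory stores them in, which is Angstrom for the loop in this tutorial.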