# Modern Molecular NNs

We have seen two chapters about equivariances, {doc}`data` and {doc}`Equivariant`, and one chapter on treating molecules as graphs with permutation equivariance, {doc}`gnn`. Here we combine these ideas to build neural networks that can treat arbitrary molecules as point clouds while preserving permutation equivariance. We already saw that SchNet can do this by working with an invariant point cloud representation (interatomic distances), but modern networks mix ideas from {doc}`Equivariant` into graph neural networks (GNNs). This is a highly active research area, especially for predicting energies, forces, and relaxed structures of molecules.

```{admonition} Audience & Objectives
This chapter assumes you have read {doc}`data`, {doc}`Equivariant`, and {doc}`gnn`. You should be able to

  * Categorize a task (features/labels) by equivariance  
  * Understand body-ordered expansions
  * Differentiate models based on their message passing, message type, and body-ordering  
```

```{warning}
This chapter is in progress
```

```python
# This cell is for making plots, not part of examples
import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw
from myst_nb import glue
import networkx as nx
import dmol

# I hate to do this manually, but I cannot get the
# damn molecular fonts to be big enough
import skunk
import matplotlib.pyplot as plt


def _mol2svg(m, size):
    # Render molecule m to an SVG string with the given (width, height)
    d = rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG(*size)
    d.DrawMolecule(m)
    d.FinishDrawing()
    return d.GetDrawingText()


m1 = rdkit.Chem.MolFromSmiles("C1CCC2CCCCC2C1")
m2 = rdkit.Chem.MolFromSmiles("C1CCC(C1)C2CCCC2")
s1 = _mol2svg(m1, (200, 200))
s2 = _mol2svg(m2, (200, 200))
_, axs = plt.subplots(1, 2, squeeze=True)
axs[0].set_title("decalin")
axs[1].set_title("bicyclopentyl")
axs[0].axis("off")
axs[1].axis("off")
# Mark the axes as placeholders that skunk replaces with the SVGs
skunk.connect(axs[0], "m1")
skunk.connect(axs[1], "m2")
svg = skunk.insert({"m1": s1, "m2": s2})
# Saved for the figure below
with open("lwtest.svg", "w") as f:
    f.write(svg)
```

## Expressiveness

The SO(3)-equivariant layers from {doc}`Equivariant` do not work on variable-sized molecules because they are not permutation equivariant. We also know that graph neural networks (GNNs) are permutation equivariant and, with the correct choice of edge features, rotation and translation invariant. So why go beyond GNNs?

One reason is that standard GNNs cannot distinguish certain graphs that are relevant for chemistry. For example, they cannot distinguish decalin and bicyclopentyl, which indeed have different properties. Look at {numref}`decaline-bicylopentyl` below and think about the degree and neighbors of the atoms where the rings join: if you try message passing, the two molecules look identical. Standard message-passing GNNs are at most as powerful as the Weisfeiler-Lehman graph isomorphism test {cite}`weisfeiler1968reduction`, which also fails to distinguish these two graphs.


```{figure} lwtest.svg
---
alt: "decalin and bicyclopentyl structures drawn side-by-side, which are visually different."
name: "decaline-bicylopentyl"
---
Comparison of decalin and bicyclopentyl, which give identical outputs in most GNNs despite being different molecules.
```
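To see this concretely, here is a minimal sketch of 1-WL color refinement (the procedure behind the Weisfeiler-Lehman test) in plain Python. The bond lists are written by hand from the SMILES used for the figure (hydrogens omitted, every atom a carbon), cyclodecane is added as a control, and `wl_histograms` is a hypothetical helper written for this example, not a library function. Decalin and bicyclopentyl end up with identical color histograms, which is why a message-passing GNN that sees only the graph gives them identical outputs.

```python
from collections import Counter

# Heavy-atom bond lists (hydrogens omitted); atom order follows the SMILES
decalin = (10, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7),
                (7, 8), (8, 3), (8, 9), (9, 0)])                # C1CCC2CCCCC2C1
bicyclopentyl = (10, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (3, 5),
                      (5, 6), (6, 7), (7, 8), (8, 9), (9, 5)])  # C1CCC(C1)C2CCCC2
cyclodecane = (10, [(i, (i + 1) % 10) for i in range(10)])      # C1CCCCCCCCC1


def wl_histograms(graphs, iterations=3):
    """1-WL color refinement, run jointly so color ids are comparable across graphs."""
    adjs = []
    for n, edges in graphs:
        adj = [[] for _ in range(n)]
        for a, b in edges:
            adj[a].append(b)
            adj[b].append(a)
        adjs.append(adj)
    # every atom is carbon, so all atoms start with the same color
    colors = [[0] * len(adj) for adj in adjs]
    for _ in range(iterations):
        palette = {}  # (own color, sorted neighbor colors) -> new color id
        new_colors = []
        for adj, col in zip(adjs, colors):
            new_colors.append([
                palette.setdefault((col[i], tuple(sorted(col[j] for j in adj[i]))),
                                   len(palette))
                for i in range(len(adj))
            ])
        colors = new_colors
    # a graph's "fingerprint" is the multiset of final colors
    return [Counter(c) for c in colors]


h_dec, h_bcp, h_cyc = wl_histograms([decalin, bicyclopentyl, cyclodecane])
print(h_dec == h_bcp)  # True: 1-WL cannot tell them apart
print(h_dec == h_cyc)  # False: different degree pattern
```

Running more iterations does not help here: the colorings of decalin and bicyclopentyl are already stable and identical.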

These two molecules can be distinguished if we also have (and use) their Cartesian coordinates. GNNs also cannot distinguish enantiomers, except perhaps with pre-computed node attributes. Even those start to break down for helical chirality, which is not centered on any one atom.
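A small numerical check makes the chirality point concrete. In this sketch (toy idealized coordinates, not a real molecule), a tetrahedral center with four labeled substituents and its mirror image have exactly the same interatomic distance matrix, so a model whose inputs are only distances sees identical inputs. A signed quantity computed from the Cartesian coordinates, here the triple product (signed volume) of three substituent vectors, flips sign under reflection and so separates the two.

```python
import numpy as np

# Toy stereocenter at the origin with four *different* substituents
# (labeled a, b, c, d) at idealized tetrahedral positions.
center = np.zeros(3)
subs = np.array([[1.0, 1.0, 1.0],     # a
                 [1.0, -1.0, -1.0],   # b
                 [-1.0, 1.0, -1.0],   # c
                 [-1.0, -1.0, 1.0]])  # d
pos = np.vstack([center, subs])

# Mirror image: reflect through the yz-plane (x -> -x), keeping atom labels
mirror = pos * np.array([-1.0, 1.0, 1.0])


def distance_matrix(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def signed_volume(x):
    # triple product of substituents a, b, c relative to the center;
    # a pseudoscalar, so it changes sign under reflection
    return np.linalg.det(x[1:4] - x[0])


print(np.allclose(distance_matrix(pos), distance_matrix(mirror)))  # True
print(signed_volume(pos), signed_volume(mirror))  # approximately 4.0 and -4.0
```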

These are arguments for using Cartesian coordinates in addition to a GNN, but why use equivariant neural networks? Most molnet (molecular neural network) research is on **neural potentials**: neural networks that predict energy and forces given atom positions and elements. The force on each atom is given by

\begin{equation}
F\left(\vec{r}\right) = -\nabla U\left(\vec{r}\right)
\end{equation}

where $U\left(\vec{r}\right)$ is the potential energy as a function of all atom positions $\vec{r}$, which is invariant to rotation. Because $U$ is invariant, the forces rotate along with the molecule, i.e., they are rotation equivariant. So if we're predicting a translation, rotation, and permutation invariant potential, why use equivariance? Performance. Models like SchNet or ANI are invariant and are not as accurate as models like NequIP or TorchMD-NET, which have equivariant internal layers.
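The relationship between an invariant potential and its forces can be checked numerically. The sketch below uses a toy Lennard-Jones energy (a stand-in for a learned potential, chosen only because its gradient is easy to write by hand) and verifies that the energy is unchanged by a rotation plus translation, that the forces $-\nabla U$ rotate with the molecule, and that translation invariance makes the net force vanish. `lj_energy`, `lj_forces`, and `random_rotation` are hypothetical helpers written for this example.

```python
import numpy as np


def lj_energy(pos, eps=1.0, sigma=1.0):
    """Toy Lennard-Jones energy U(r); depends only on pairwise distances."""
    i, j = np.triu_indices(len(pos), k=1)
    r = np.linalg.norm(pos[i] - pos[j], axis=-1)
    sr6 = (sigma / r) ** 6
    return np.sum(4 * eps * (sr6**2 - sr6))


def lj_forces(pos, eps=1.0, sigma=1.0):
    """Analytic forces F_i = -dU/dr_i for the toy potential above."""
    i, j = np.triu_indices(len(pos), k=1)
    rij = pos[i] - pos[j]                          # vector from atom j to atom i
    r = np.linalg.norm(rij, axis=-1)
    sr6 = (sigma / r) ** 6
    dudr = 4 * eps * (-12 * sr6**2 + 6 * sr6) / r  # dU/dr for each pair
    f_on_i = -(dudr / r)[:, None] * rij            # -dU/dr_i, since dr/dr_i = rij / r
    forces = np.zeros_like(pos)
    np.add.at(forces, i, f_on_i)                   # accumulate pair forces on i
    np.add.at(forces, j, -f_on_i)                  # equal and opposite force on j
    return forces


def random_rotation(rng):
    """Random proper rotation matrix (det = +1) from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q


rng = np.random.default_rng(0)
pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.0], [0.3, 0.4, 1.0]])
R = random_rotation(rng)
moved = pos @ R.T + np.array([0.5, -2.0, 3.0])  # rotate, then translate

# Energy is invariant; forces rotate with the molecule (equivariant)
print(np.isclose(lj_energy(pos), lj_energy(moved)))         # True
print(np.allclose(lj_forces(pos) @ R.T, lj_forces(moved)))  # True
# Translation invariance of U implies the net force is zero
print(np.allclose(lj_forces(pos).sum(axis=0), 0.0))         # True
```

In a neural potential, the same structure holds: the network predicts the invariant $U$, and the forces come from its gradient (usually via automatic differentiation rather than a hand-written derivative).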

## The Elements of Modern Molecular NNs

There has been a flurry of ideas about molnets in the last few years, especially with advances in equivariant neural network layers. Batatia et al. {cite}`batatia2022design` have proposed a categorization of the main elements of molnets (which they call E(3)-equivariant NNs) that I will adopt here. They divide the design decisions into three parts of the architecture: the atomic cluster expansion (ACE), the body order of the messages, and the architecture of the message passing neural network (MPNN). Within GNN theory, this categorization can also be viewed as node features (ACE), message creation and aggregation (body order), and node update (MPNN details). See {doc}`gnn` for more details on MPNNs.
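To map these parts onto code, here is a minimal sketch of one *invariant* message-passing layer in NumPy. The function name `mp_layer`, the Gaussian radial basis, the cosine cutoff, and the weight shapes are illustrative assumptions, not the design of any specific published model. Comments mark the node features, message creation and aggregation, and node update. Within this single layer each term in the neighbor sum involves only the pair $(i, j)$, so the messages are two-body.

```python
import numpy as np


def mp_layer(pos, h, w_radial, w_update, cutoff=3.0):
    """One invariant message-passing step (illustrative sketch).

    pos: (n, 3) positions; h: (n, k) scalar node features;
    w_radial: (n_basis, k) radial filter weights; w_update: (2k, k) update weights.
    """
    n_basis = w_radial.shape[0]
    # pairwise distances with a smooth cosine cutoff; no self-interaction
    r = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    mask = (r < cutoff) & ~np.eye(len(pos), dtype=bool)
    fcut = np.where(mask, 0.5 * (np.cos(np.pi * r / cutoff) + 1.0), 0.0)
    # message creation: Gaussian radial basis -> per-channel filter
    centers = np.linspace(0.0, cutoff, n_basis)
    rbf = np.exp(-((r[..., None] - centers) ** 2))  # (n, n, n_basis)
    filt = (rbf @ w_radial) * fcut[..., None]       # (n, n, k)
    # aggregation: permutation-invariant sum over neighbors j
    m = np.einsum("ijk,jk->ik", filt, h)            # (n, k)
    # node update: combine old features with aggregated messages
    return np.tanh(np.concatenate([h, m], axis=-1) @ w_update)


rng = np.random.default_rng(0)
n, k, n_basis = 5, 4, 8
pos = rng.normal(size=(n, 3))
h = rng.normal(size=(n, k))  # node features, e.g. element embeddings
w_radial = rng.normal(size=(n_basis, k))
w_update = rng.normal(size=(2 * k, k))
out = mp_layer(pos, h, w_radial, w_update)

perm = rng.permutation(n)
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])  # rotation about x
print(np.allclose(mp_layer(pos[perm], h[perm], w_radial, w_update), out[perm]))  # True
print(np.allclose(mp_layer(pos @ R.T, h, w_radial, w_update), out))  # True
```

Because only distances enter, this layer is rotation invariant and permutation equivariant, but it has no way to carry directional (vector-like) information between layers.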

This is a relatively new categorization, and it is certainly not necessary to use. Most papers do not use it, and it takes some effort to place models into it. The benefit of thinking about models with this abstraction is that it helps us differentiate among the very large number of models now being pursued in the literature. There is also a bit of chaos in teasing out what *differentiates* the best models from others. For example, NequIP

### Atom Features

Let's start with the general terminology for an atom. At the input to these networks, an atom is just a Cartesian coordinate $\vec{r}_i$ and an element $z_i$. Within the message passing framework, though, atoms are nodes, and their feature vectors need to be organized a bit differently than in typical GNNs. Namely, some features of an atom must be treated specially to maintain equivariance, while others behave like scalars and we can ignore equivariance for them. One way to organize these is by spherical harmonic degree $l$ and order $m$, written $h_{i,lm}$: $l = 0$ features are scalars that are unchanged by rotation, $l = 1$ features rotate like vectors, and in general a degree-$l$ feature has $2l + 1$ components indexed by $m = -l, \ldots, l$.
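As a minimal sketch, assume atom features are grouped by spherical harmonic degree $l$ into arrays of shape (channels, $2l + 1$), keeping only $l = 0$ and $l = 1$. For simplicity the $l = 1$ components are stored in Cartesian $(x, y, z)$ form rather than in the real spherical harmonic basis (the two differ by a fixed change of basis). The dictionary layout and `rotate_features` are hypothetical choices for illustration; libraries such as e3nn use their own containers. Under a rotation $\mathbf{R}$, the scalar channels are untouched while each $l = 1$ channel is multiplied by $\mathbf{R}$.

```python
import numpy as np

# Hypothetical per-atom feature layout: one array per degree l,
# shaped (channels, 2l + 1). Here 4 scalar channels and 2 vector channels.
rng = np.random.default_rng(0)
features = {
    0: rng.normal(size=(4, 1)),  # l = 0: invariant scalars
    1: rng.normal(size=(2, 3)),  # l = 1: vectors, stored as Cartesian (x, y, z)
}


def rotate_features(feats, R):
    """Apply rotation matrix R to features organized by degree (l <= 1 only)."""
    return {
        0: feats[0].copy(),  # scalars do not change
        1: feats[1] @ R.T,   # each vector channel v -> R v
    }


# Rotation by 90 degrees about the z-axis
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
rotated = rotate_features(features, R)

# Scalars are identical, and vector norms (new invariants) are preserved
print(np.allclose(rotated[0], features[0]))  # True
print(np.allclose(np.linalg.norm(rotated[1], axis=1),
                  np.linalg.norm(features[1], axis=1)))  # True
```

Taking the norm of an $l = 1$ channel is one simple way to turn equivariant features back into invariant scalars, for example before predicting an energy.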
 

### Atomic Cluster Expansions

An ACE is a per-atom tensor. The main idea of ACE is to encode the local environment of an atom into a feature tensor that describes its neighborhood of nearby atoms. This is like distinguishing between an oxygen in an alcohol group vs an oxygen in an ether. Both are oxygens, but we expect them to behave differently. ACE is the same idea, but for nearby atoms in space instead of just on the molecular graph.

The general equation for ACE (assuming O(3) equivariance) is {cite}`batatia2022design`:

\begin{equation}
A^{(t)}_{i, kl_3m_3} = \sum_{l_1m_1,l_2m_2}C_{l_1m_1,l_2m_2}^{l_3m_3}\sum_{j \in \mathcal{N}(i)} R^{(t)}_{kl_1l_2l_3}\left(r_{ji}\right)Y_{l_1}^{m_1}\left(\hat{\mathbf{r}}_{ji}\right)\mathcal{W}^{(t)}_{kl_2}h_{j,l_2m_2}^{(t)}
\end{equation}

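Wow! What an expression. Let's go through it carefully, starting with the output. $A^{(t)}_{i, kl_3m_3}$ are the feature tensor values for atom $i$ at layer $t$, with channels indexed by $k$ and spherical harmonic indices $l_3m_3$. The right-hand side is nearly identical to the G-equivariant neural network layer equation from {doc}`Equivariant`. We have the input features $h^{(t)}_{j,l_2m_2}$ of each neighbor $j \in \mathcal{N}(i)$, weighted into channel $k$ by the learnable weights $\mathcal{W}^{(t)}_{kl_2}$. Each neighbor's contribution is further weighted by a learnable radial function $R^{(t)}_{kl_1l_2l_3}$ of the distance $r_{ji}$ and by the spherical harmonics $Y_{l_1}^{m_1}$ of the unit vector $\hat{\mathbf{r}}_{ji}$ between the two atoms. Finally, the Clebsch-Gordan coefficients $C_{l_1m_1,l_2m_2}^{l_3m_3}$ combine the degree-$l_1$ geometric term with the degree-$l_2$ neighbor features into an output of degree $l_3$; they are nonzero only when $|l_1 - l_2| \le l_3 \le l_1 + l_2$.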
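To make the ACE equation less abstract, here is a heavily simplified NumPy sketch. It assumes each atom carries a single scalar input feature ($l_2 = 0$, $m_2 = 0$) and keeps only $l_1 \in \{0, 1\}$. With $l_2 = 0$, the Clebsch-Gordan coefficient $C_{l_1m_1,00}^{l_3m_3}$ equals 1 when $l_3 = l_1$ and $m_3 = m_1$ and is zero otherwise, so the coupling sum disappears. The $l_1 = 0$ harmonic is a constant (absorbed into the radial weights), and the $l_1 = 1$ harmonics are represented by the unit vector $\hat{\mathbf{r}}_{ji}$ itself, which matches them up to a fixed change of basis and normalization. We also assume $\vec{r}_{ji} = \vec{r}_j - \vec{r}_i$. The function name `ace_features`, the Gaussian radial basis, the hard cutoff, and the weight shapes are assumptions for illustration, not the implementation of any particular model.

```python
import numpy as np


def ace_features(pos, h, w_radial, w_chan, cutoff=3.0):
    """Simplified ACE A-features for scalar (l2 = 0) inputs, l1 = l3 in {0, 1}.

    pos: (n, 3); h: (n,) one scalar feature per atom;
    w_radial: (2, n_basis, k) radial weights for l1 = 0 and l1 = 1;
    w_chan: (k,) weights W_k that lift h_j into k channels.
    Returns A0 with shape (n, k) and A1 with shape (n, k, 3).
    """
    n_basis = w_radial.shape[1]
    rji = pos[None, :, :] - pos[:, None, :]             # rji[i, j] = r_j - r_i
    r = np.linalg.norm(rji, axis=-1)
    nbr = (r < cutoff) & ~np.eye(len(pos), dtype=bool)  # j in N(i)
    rhat = np.where(nbr[..., None], rji / np.where(nbr, r, 1.0)[..., None], 0.0)
    centers = np.linspace(0.0, cutoff, n_basis)
    rbf = np.exp(-((r[..., None] - centers) ** 2)) * nbr[..., None]
    R0 = rbf @ w_radial[0]                      # R_k(r_ji) for l1 = 0, (n, n, k)
    R1 = rbf @ w_radial[1]                      # R_k(r_ji) for l1 = 1, (n, n, k)
    hw = h[:, None] * w_chan[None, :]           # W_k h_j, (n, k)
    A0 = np.einsum("ijk,jk->ik", R0, hw)        # Y_0 is a constant
    A1 = np.einsum("ijk,ijm,jk->ikm", R1, rhat, hw)  # Y_1 ~ rhat_ji
    return A0, A1


rng = np.random.default_rng(1)
n, k, n_basis = 6, 3, 8
pos = rng.normal(size=(n, 3))
h = rng.normal(size=n)
w_radial = rng.normal(size=(2, n_basis, k))
w_chan = rng.normal(size=k)
A0, A1 = ace_features(pos, h, w_radial, w_chan)

# Rotate the whole molecule about the z-axis
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
B0, B1 = ace_features(pos @ R.T, h, w_radial, w_chan)
print(np.allclose(B0, A0))        # True: l3 = 0 part is invariant
print(np.allclose(B1, A1 @ R.T))  # True: l3 = 1 part rotates like a vector
```

Allowing $l_2 > 0$ inputs is what requires the full Clebsch-Gordan coupling; this sketch only shows the geometric part of the expansion.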

How is this different from an MPNN?

## Normalization

* Physics-based energy/force normalization
* Pooling
* Layers

## Running This Notebook


Click the &nbsp;<i aria-label="Launch interactive content" class="fas fa-rocket"></i>&nbsp; above to launch this page as an interactive Google Colab. See details below on installing packages.

````{tip} Installing packages
:class: dropdown
To install packages, execute this code in a new cell. 

```
!pip install dmol-book
```

If you run into installation problems, you can find the latest working versions of packages used in [this book here](https://github.com/whitead/dmol-book/blob/master/package/setup.py)

````

## Cited References

```{bibliography}
:style: unsrtalpha
:filter: docname in docnames
```