# mlcvs: ML Collective Variables

Build data-driven collective variables for enhanced sampling simulations using the Python package `mlcvs`.
- [Documentation](https://mlcvs.readthedocs.io/en/latest/index.html)
- [GitHub](https://github.com/luigibonati/mlcvs)

The collective variables (CVs) [are constructed](https://mlcvs.readthedocs.io/en/latest/api.html#) by *combining* a **model** and an **estimator**.
- The **model** can be either **a linear combination** of descriptors or **a nonlinear transformation** performed by a neural network.
- The **estimators** implemented are Fisher's linear discriminant analysis (LDA) and time-lagged independent component analysis (TICA). The former devises CVs as the variables that best discriminate between a given set of states; the latter extracts CVs as the slowest decorrelating modes of the sampled dynamics.
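To make the TICA estimator concrete, here is a minimal NumPy sketch of *linear* TICA (not the `mlcvs` API). It estimates the instantaneous covariance $C(0)$ and a symmetrized time-lagged covariance $C(\tau)$ of the descriptors, then solves the generalized eigenvalue problem $C(\tau)\,\mathbf{v} = \lambda\, C(0)\,\mathbf{v}$; the eigenvectors with the largest eigenvalues (autocorrelations at lag $\tau$) are the slowest decorrelating modes. The function name `tica`, the toy processes and the lag are illustrative assumptions.

```python
import numpy as np

def tica(X, lag):
    """Linear TICA on a (n_frames, n_descriptors) time series sampled at uniform stride."""
    X = X - X.mean(axis=0)                   # center the descriptors
    X0, Xt = X[:-lag], X[lag:]               # pairs (x(t), x(t + lag))
    n = len(X0)
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)   # instantaneous covariance C(0)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)   # symmetrized time-lagged covariance C(lag)
    # Solve C(lag) v = lambda C(0) v by whitening with the Cholesky factor of C(0)
    L_inv = np.linalg.inv(np.linalg.cholesky(C0))
    evals, U = np.linalg.eigh(L_inv @ Ct @ L_inv.T)
    order = np.argsort(evals)[::-1]          # slowest modes (largest autocorrelation) first
    return evals[order], L_inv.T @ U[:, order]

# Toy data: a slow and a fast AR(1) process, observed only through mixed descriptors
rng = np.random.default_rng(0)
n = 20000
slow, fast = np.zeros(n), np.zeros(n)
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
    fast[t] = 0.5 * fast[t - 1] + rng.normal()
X = np.column_stack([slow + fast, slow - fast])

evals, V = tica(X, lag=10)
cv = X @ V[:, 0]  # leading TICA CV, expected to track the slow process
```

Here the leading eigenvalue approximates the lag-10 autocorrelation of the slow process ($0.99^{10} \approx 0.90$), while the fast mode has decorrelated almost completely. Symmetrizing $C(\tau)$ and averaging $C(0)$ over both halves of the time-lagged pairs keeps the eigenproblem symmetric, so the eigenvalues are real.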

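These combinations give rise to the different CVs that have been proposed in the literature:
- (H)LDA: linear model + (H)LDA estimator
- TICA: linear model + TICA estimator
- DeepLDA: neural-network model + LDA estimator
- DeepTICA: neural-network model + TICA estimator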

## LDA vs DeepLDA

### Harmonic linear discriminant analysis [(HLDA)](https://pubs.acs.org/doi/10.1021/acs.jpclett.8b00733)

__Computation:__
- Define a small **set of descriptors** capable of discriminating between the states, computed from short unbiased runs in each basin.
- Use a variant of the **classification method** known as linear discriminant analysis (LDA) to compute CVs that are linear combinations of the input descriptors. In HLDA, the within-class scatter matrix is built as the harmonic average of the covariance matrices of the states.
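A minimal NumPy sketch of the two-state HLDA direction, assuming the descriptors of each state are stored as arrays with one row per frame. The function name and the toy data are hypothetical and not part of `mlcvs`. The harmonic average of the two covariance matrices replaces the usual within-class scatter, so the CV is $s(\mathbf{x}) = \mathbf{w}^\top\mathbf{x}$ with $\mathbf{w} \propto (\Sigma_A^{-1} + \Sigma_B^{-1})(\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)$.

```python
import numpy as np

def hlda_two_states(XA, XB):
    """Unit HLDA direction for two states; rows of XA, XB are frames, columns are descriptors."""
    muA, muB = XA.mean(axis=0), XB.mean(axis=0)
    SA = np.cov(XA, rowvar=False)  # covariance of state A
    SB = np.cov(XB, rowvar=False)  # covariance of state B
    # Harmonic average of the covariances (up to a constant factor): S_w^{-1} = S_A^{-1} + S_B^{-1}
    Sw_inv = np.linalg.inv(SA) + np.linalg.inv(SB)
    w = Sw_inv @ (muA - muB)
    return w / np.linalg.norm(w)   # the CV is s(x) = w . x

# Toy data: only the first of three descriptors distinguishes the two states
rng = np.random.default_rng(1)
XA = rng.normal([0.0, 0.0, 0.0], [0.2, 1.0, 1.0], size=(5000, 3))
XB = rng.normal([1.0, 0.0, 0.0], [0.2, 1.0, 1.0], size=(5000, 3))
w = hlda_two_states(XA, XB)
sA, sB = XA @ w, XB @ w
```

Since only the first descriptor differs between the toy states, the weight vector points almost entirely along it; descriptors that fluctuate strongly inside the states are down-weighted by the inverse covariances.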
  
```{note}
LDA is a classification method that uses a linear combination of the _input features_ to separate the data into given classes. It is also commonly known as Fisher's linear discriminant.
```
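In formulas (normalization conventions vary between references), LDA looks for the direction $\mathbf{w}$ that maximizes the ratio of between-class to within-class scatter, where $\boldsymbol{\mu}_i$ and $\Sigma_i$ are the mean and covariance of the descriptors in state $i$ and $\boldsymbol{\mu}$ is the mean of the class means:

$$
J(\mathbf{w}) = \frac{\mathbf{w}^\top S_b\,\mathbf{w}}{\mathbf{w}^\top S_w\,\mathbf{w}},
\qquad
S_b = \sum_i (\boldsymbol{\mu}_i - \boldsymbol{\mu})(\boldsymbol{\mu}_i - \boldsymbol{\mu})^\top,
\qquad
S_w = \sum_i \Sigma_i
$$

Maximizing $J$ leads to the generalized eigenvalue problem $S_b\,\mathbf{w} = \lambda\, S_w\,\mathbf{w}$. For two states $S_b$ has rank one, so $\mathbf{w} \propto S_w^{-1}(\boldsymbol{\mu}_A - \boldsymbol{\mu}_B)$, and HLDA replaces $S_w$ with the harmonic average of the covariances:

$$
S_w^{-1} = \Sigma_A^{-1} + \Sigma_B^{-1}
\quad\Longrightarrow\quad
\mathbf{w}_{\mathrm{HLDA}} \propto \left(\Sigma_A^{-1} + \Sigma_B^{-1}\right)(\boldsymbol{\mu}_A - \boldsymbol{\mu}_B),
\qquad
s(\mathbf{x}) = \mathbf{w}_{\mathrm{HLDA}}^\top \mathbf{x}
$$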

__HLDA limitations:__
- The states must be linearly separable in the descriptor space --> this requires knowledge of the system and physical intuition.
- This intuition may reflect our prejudice more than the actual behavior of the system, possibly preventing the exploration of some relevant transition pathways. --> [DeepLDA](https://pubs.acs.org/doi/full/10.1021/acs.jpclett.0c00535) was proposed to lift these limitations.
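The first limitation shows up on a toy example (hypothetical data, plain NumPy). Two states lying on concentric rings in a two-dimensional descriptor space have the same mean, so the mean difference along which (H)LDA projects vanishes and no linear combination of the two descriptors separates them. A nonlinear function such as the radius separates them perfectly.

```python
import numpy as np

# Two hypothetical states lying on concentric rings in a 2D descriptor space
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
XA, XB = 1.0 * ring, 2.0 * ring  # state A at radius 1, state B at radius 2

# Both states have the same mean, so the mean difference that (H)LDA
# projects along is numerically zero
mean_gap = np.linalg.norm(XA.mean(axis=0) - XB.mean(axis=0))

# Projections of the two states overlap (shown here for the x direction;
# for any unit direction, A spans [-1, 1] and B spans [-2, 2])
w = np.array([1.0, 0.0])
overlap = (XA @ w).max() > (XB @ w).min()

# A nonlinear function of the descriptors (the radius) separates them perfectly
rA, rB = np.linalg.norm(XA, axis=1), np.linalg.norm(XB, axis=1)
separated = rA.max() < rB.min()
print(mean_gap, overlap, separated)
```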

### DeepLDA

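DeepLDA employs a neural network (NN) to perform a nonlinear transformation of the descriptors before applying the LDA method:
- feed a number of descriptors to the NN --> its hidden layers reduce the dimensionality of the data
- perform LDA on the output of the last hidden layer --> the CV is the projection of these features along the leading LDA eigenvector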
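A minimal NumPy sketch of this structure, assuming a small fully connected network with tanh activations and random, untrained weights. The function names, layer sizes and toy data are hypothetical, and this is not the `mlcvs` implementation. The descriptors are mapped to last-layer features, LDA is solved on those features, and the negative leading eigenvalue plays the role of the loss. Actually training the network requires differentiating through the eigenvalue problem, which is why an automatic-differentiation framework such as PyTorch is needed in practice.

```python
import numpy as np

def lda_eigenvalues(H, labels, shrink=1e-6):
    """Eigenvalues of S_b v = lambda S_w v for features H (rows = frames), largest first."""
    classes = np.unique(labels)
    means = np.array([H[labels == c].mean(axis=0) for c in classes])
    mu = means.mean(axis=0)                                          # mean of the class means
    d = H.shape[1]
    Sb = sum(np.outer(m - mu, m - mu) for m in means)                # between-class scatter
    Sw = sum(np.cov(H[labels == c], rowvar=False) for c in classes)  # within-class scatter
    Sw = Sw + shrink * np.eye(d)                                     # small shrinkage for stability
    L_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    return np.linalg.eigvalsh(L_inv @ Sb @ L_inv.T)[::-1]

def mlp_features(X, layers):
    """Forward pass of a hypothetical fully connected NN with tanh activations."""
    H = X
    for W, b in layers:
        H = np.tanh(H @ W + b)
    return H

rng = np.random.default_rng(0)
sizes = [10, 16, 4]  # 10 descriptors -> 16 hidden units -> 4 last-layer features
layers = [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, k)), np.zeros(k))
          for m, k in zip(sizes[:-1], sizes[1:])]

# Hypothetical descriptors sampled in two metastable states (labels 0 and 1)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 10)),
               rng.normal(0.5, 1.0, size=(500, 10))])
labels = np.repeat([0, 1], 500)

H = mlp_features(X, layers)
evals = lda_eigenvalues(H, labels)
loss = -evals[0]  # training would minimize this, i.e. maximize the leading LDA eigenvalue
```

With two states $S_b$ has rank one, so only the first eigenvalue is non-zero and a single CV comes out of the last layer.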

Or, from the usage perspective:
- Perform short unbiased MD simulations in the metastable states and compute the descriptors.
- Construct a CV by training the NN with the LDA criterion as the objective (loss) function.
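Before training, the descriptors from the unbiased runs are typically assembled into one labeled dataset. The sketch below assumes each run has already been loaded as a NumPy array (one row per frame, one column per descriptor); the helper `build_dataset`, the validation fraction and the standardization step are illustrative choices, not `mlcvs` functions.

```python
import numpy as np

def build_dataset(runs, valid_frac=0.2, seed=0):
    """Stack per-state descriptor arrays into a labeled, standardized dataset.

    runs: list of (n_frames_i, n_descriptors) arrays, one per metastable state;
    the label of each frame is the index of the run it comes from.
    """
    X = np.vstack(runs)
    y = np.concatenate([np.full(len(r), i) for i, r in enumerate(runs)])
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))              # shuffle frames before splitting
    n_valid = int(valid_frac * len(X))
    valid, train = idx[:n_valid], idx[n_valid:]
    mean, std = X[train].mean(axis=0), X[train].std(axis=0)
    std[std == 0] = 1.0                        # guard against constant descriptors
    Xs = (X - mean) / std                      # standardize with training statistics only
    return (Xs[train], y[train]), (Xs[valid], y[valid]), (mean, std)

rng = np.random.default_rng(42)
run_A = rng.normal(0.0, 1.0, size=(800, 5))    # stand-in for descriptors from state A
run_B = rng.normal(2.0, 1.0, size=(1200, 5))   # stand-in for descriptors from state B
(train_X, train_y), (valid_X, valid_y), (mean, std) = build_dataset([run_A, run_B])
```

The returned `mean` and `std` would have to be applied in the same way whenever the trained CV is evaluated on new configurations.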
  
<img src="https://pubs.acs.org/cms/10.1021/acs.jpclett.0c00535/asset/images/medium/jz0c00535_0001.gif" width=500 />


## Install packages

```{note}
At the time of writing, PyTorch on Windows supported only Python 3.7-3.9, which is why the environment below uses Python 3.9.
```

```bash
conda create -n py39mlcvs python=3.9
conda activate py39mlcvs

conda install -c conda-forge -y jupyter numpy matplotlib pandas thatool

## pytorch (CPU-only build; uncomment the first line instead for the CUDA 11.7 build)
# conda install -y -c pytorch -c nvidia pytorch pytorch-cuda=11.7
conda install -y -c pytorch pytorch cpuonly
## mlcvs
pip install git+https://github.com/luigibonati/mlcvs@main
```
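Once the environment is set up, a quick way to confirm that the main packages can be imported is to run a check from a notebook started inside `py39mlcvs`. This is a small stdlib-only sketch; the helper name is hypothetical and the package list is a subset of the packages installed above.

```python
import importlib.util
import sys

def check_environment(packages=("numpy", "matplotlib", "pandas", "torch", "mlcvs")):
    """Return the Python version and whether each package can be found in the active env."""
    status = {name: importlib.util.find_spec(name) is not None for name in packages}
    return sys.version_info[:2], status

version, status = check_environment()
print("Python %d.%d" % version)
for name, ok in status.items():
    print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

`importlib.util.find_spec` only locates the packages without importing them, so the check is fast even for large libraries such as PyTorch.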