Pre-trained model #19
Comments
on some molecules of MD17 for instance
…On Mon, May 17, 2021 at 5:25 PM Raimondas Galvelis ***@***.***> wrote:
We are writing a paper about NNP/MM in ACEMD. So far, we have used ANI-2x
for protein-ligand simulations, but to demonstrate a general utility, it
would be good to include one more NNP.
Would it be possible to have a pre-trained TorchMD-NET model?
|
So you need a checkpoint file for a graph network trained e.g. on aspirin from the MD17 dataset? |
yes
…On Tue, May 18, 2021 at 2:36 PM Philipp Thölke ***@***.***> wrote:
So you need a checkpoint file for a graph network trained e.g. on aspirin
from the MD17 dataset?
Would it work for you if the model is trained with the next version of the
code, e.g. when #20 is merged? This change adds some features that improve
the performance on MD17.
|
I have added a graph network checkpoint pretrained on aspirin from the MD17 dataset. It used 950 samples for training, 50 for validation and the remaining samples for testing, which is the standard benchmark split for this dataset. I trained on both energies and forces; the exact hyperparameters can be found in the hparams.yaml file. You can find the model checkpoint, hyperparameters and splits here. I also included metrics.csv, which contains the losses and learning rate for each epoch during training. The model checkpoint comes from epoch 1269 and reached an MAE of 0.224 for the energy and 0.630 for the forces on the test set. The model was trained on version 0.1.0. |
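For readers who want to reproduce a split of this shape, here is a minimal sketch of a 950/50/rest split. It is not the code used to create the published splits: the function name `split_indices`, the seed and the total of 10,000 samples are illustrative assumptions.

```python
import random


def split_indices(n_samples, n_train=950, n_val=50, seed=0):
    """Shuffle sample indices and cut them into train/val/test lists."""
    if n_train + n_val > n_samples:
        raise ValueError("train + val sizes exceed the dataset size")
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # everything that is left over
    return train, val, test


# Hypothetical dataset size, chosen only for illustration.
train_idx, val_idx, test_idx = split_indices(n_samples=10_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three lists are disjoint by construction, so no test sample can leak into training.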
The pre-trained model cannot be loaded:
import torch
from torchmdnet.models import load_model
model = load_model('examples/pretrained/md17-aspirin-graph-network/epoch=1269-val_loss=0.8859-test_loss=0.5893.ckpt')
|
It should be possible to load it using version 0.1.x, under which it was trained. I haven't updated the model since then, but I can do that now. I'll have to retrain it using the current version, which will take roughly half a day. |
Thanks! It would be most useful if I could test the simulations with the latest version. |
Never mind, I still had a recent model checkpoint from the most recent version. I just pushed it. |
Thanks! Now it works. |
I tried to run MD simulations with the NNP model:
In both cases, the simulations "explode" within 1 ps. I tried to reduce the timestep to 0.5 fs, but it doesn't help. The same simulations with ANI-2x run without problems. |
@PhilippThoelke would it be possible to run the system with TorchMD to verify the problem? |
Yes that is possible. You can use |
@stefdoerr could you help to set up the simulations? |
Don't you already have the input files since you ran them? |
@stefdoerr I do have input files (PDB and PRMTOP), but I haven't used TorchMD. |
https://github.com/torchmd/torchmd/blob/master/examples/tutorial.ipynb |
Where do I need to add |
You have to enter that as the external:
module: torchmdnet.calculators.External
embeddings: [ 1, 1, 6, 6, ...]
file: path/to/checkpoint |
I tried a simulation of aspirin with TorchMD:
coordinates: aspirin.pdb
cutoff: null
device: cuda
extended_system: null
external:
  embeddings:
  - 8
  - 8
  - 8
  - 8
  - 6
  - 6
  - 6
  - 6
  - 6
  - 6
  - 6
  - 6
  - 6
  - 1
  - 1
  - 1
  - 1
  - 1
  - 1
  - 1
  - 1
  file: ../torchmd-net.git/examples/pretrained/md17-aspirin-graph-network/epoch=1359-val_loss=0.5227-test_loss=0.4333.ckpt
  module: torchmdnet.calculators
forcefield: aspirin.prmtop
forceterms: null
langevin_gamma: 0.1
langevin_temperature: 300
log_dir: ./
minimize: 100
output: output
output_period: 1
precision: single
replicas: 1
rfa: false
save_period: 1
seed: 1
steps: 100
structure: null
switch_dist: null
temperature: 300
timestep: 1
topology: aspirin.prmtop
All the input files: aspirin_torchmd.zip
The simulation is unstable: the temperature rises uncontrollably from the first steps.
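The embeddings list in the configuration above looks like one atomic number per atom (8 for O, 6 for C, 1 for H). Assuming that reading is right and that the list follows the atom order of the structure file, it can be generated from element symbols rather than typed by hand. The helper name and the small lookup table below are illustrative, not part of TorchMD or TorchMD-Net.

```python
# Atomic numbers for the elements needed here; extend as required.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def embeddings_from_elements(elements):
    """Map element symbols to atomic numbers, preserving atom order."""
    return [ATOMIC_NUMBERS[symbol] for symbol in elements]


# Element order as it appears in the config above: 4 O, 9 C, 8 H.
# In practice this order would come from the structure file (assumption).
aspirin_elements = ["O"] * 4 + ["C"] * 9 + ["H"] * 8
embeddings = embeddings_from_elements(aspirin_elements)
print(embeddings)
```

An unknown element symbol raises a KeyError, which is preferable to silently writing a wrong embedding.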
|
I observe the same with ACEMD. So, probably the problem is the pre-trained model or some bug in TorchMD-Net. |
I visualised the trajectory: the molecule literally explodes. |
@PhilippThoelke would it be possible to train a less "explosive" model? |
I tried some small simulations myself and, while visualizing them, I found that the simulation step before the "explosion" usually has two hydrogens that are very close to each other. In one of the runs I checked the distance, which turned out to be 0.11 Å. I then compared this to the minimum distance between any two atoms in the MD17 aspirin dataset, which is 0.89 Å. The dataset I trained the model on might just not be very good for simulation. I can start training a model on the ANI dataset, which probably makes it easier to compare to the ANI model as well. This will, however, take some time, as the ANI dataset is much larger. |
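The check described above can be sketched in a few lines: find the closest pair of atoms in a frame and compare it with the smallest distance seen in the training data. This is a minimal illustration in plain Python, not the code used in the thread. The three-atom frame is made up, and only the 0.89 Å threshold comes from the comment.

```python
import itertools
import math


def min_pair_distance(coords):
    """Return (distance, i, j) for the closest pair of atoms in one frame."""
    best = (math.inf, -1, -1)
    for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2):
        d = math.dist(a, b)  # Euclidean distance, Python 3.8+
        if d < best[0]:
            best = (d, i, j)
    return best


# Minimum interatomic distance in the MD17 aspirin data, as quoted above (Å).
DATASET_MIN = 0.89

# Made-up frame in Å: atoms 0 and 1 are unphysically close.
frame = [(0.0, 0.0, 0.0), (0.11, 0.0, 0.0), (1.5, 0.0, 0.0)]
d, i, j = min_pair_distance(frame)
if d < DATASET_MIN:
    print(f"atoms {i} and {j} are {d:.2f} Å apart, below the training minimum")
```

Running such a check on every saved frame gives an early warning that the model is being evaluated outside the range of geometries it was trained on.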
@PhilippThoelke yes, I think that training with the ANI data is the easiest solution. Anyway, I don't need anything very accurate, just good enough that simulation stays stable and looks physical. |
I just merged a couple of changes into main, including two new model checkpoints from the ANI1 dataset. One is from a Transformer model and the other one is an equivariant Transformer checkpoint. The equivariant model currently only works with TorchScript on the PyTorch Geometric main branch as they had a bug that was only recently fixed, however, it has a lower loss than the Transformer checkpoint. So for testing I recommend using the Transformer checkpoint instead of the equivariant one so you don't have to install PyTorch Geometric from GitHub. I tested simulating with both models using TorchMD and both are capable of simulating aspirin without "explosions". Since the ANI1 dataset only includes energies and not forces, the model checkpoint has set the |
I have tried to run aspirin with
|
I just pushed the most recent checkpoints from training on ANI1, which at least have better loss than the ones you tested. The models are also still training and haven't converged yet. It might also make sense to try the equivariant Transformer as it has better loss. There hasn't been a new torch-geometric release yet so the TorchScript fix is still only on their main branch. Do you have any ideas why it might explode? What is the difference between the ACEMD MD simulation and ACEMD NNP/MD simulation? What do you mean by OpenMM-Torch/PyTorch-Geometric incompatibility, how did you write the interface? |
Thanks @PhilippThoelke, I'll try with the new model.
MD is just a molecule in vacuum. NNP/MM adds solvent at MM level.
Current PyTorch Geometric packages are not compatible with |
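I have managed to run the latest checkpoint of ANI1-transformer with ACEMD on GPU.
- The simulation of aspirin is stable after ~0.1 ns and keeps running
- Speed ~10 ns/day on GTX 1080 Ti
Meanwhile ANI1-equivariant_transformer fails with the following error:
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The size of tensor a (128) must match the size of tensor b (384) at non-singleton dimension 2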
|
For some reason, that issue does not manifest outside of ACEMD. |
It seems that you are loading a different model: the weights and the model
are different.
…On Tue, Sep 28, 2021 at 5:19 PM Raimondas Galvelis ***@***.***> wrote:
I have managed to run the latest checkpoint of ANI1-transformer with
ACEMD on GPU.
- The simulations of aspirin is stable after ~0.1 ns and keeps running
- Speed ~10 ns/day on GTX 1080 Ti
Meanwhile ANI1-equivariant_transformer fails with the following error:
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The size of tensor a (128) must match the size of tensor b (384) at non-singleton dimension 2
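One way to test the "weights and model are different" hypothesis is to compare parameter shapes between the instantiated model and the checkpoint. The sketch below works on plain `{name: shape}` dictionaries. The parameter names and shapes are hypothetical, and the thread does not establish that this is the actual cause of the error.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Compare two {parameter name: shape} mappings and list the differences."""
    problems = []
    for name in sorted(set(model_shapes) | set(checkpoint_shapes)):
        expected = model_shapes.get(name)    # None if missing from the model
        found = checkpoint_shapes.get(name)  # None if missing from the checkpoint
        if expected != found:
            problems.append((name, expected, found))
    return problems


# Hypothetical shapes for illustration only. With PyTorch, such a mapping
# can be built as {k: tuple(v.shape) for k, v in module.state_dict().items()}.
model_shapes = {"layer.weight": (128, 128), "layer.bias": (128,)}
checkpoint_shapes = {"layer.weight": (384, 128), "layer.bias": (128,)}

mismatches = find_shape_mismatches(model_shapes, checkpoint_shapes)
for name, expected, found in mismatches:
    print(f"{name}: model expects {expected}, checkpoint has {found}")
```

An empty result would suggest that the shapes agree and that the error arises elsewhere, for example during inference rather than loading.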
|
At what point does the error occur? During loading or when running inference? Could you maybe share the code snippet where the error occurred? |