Inconsistency between gradients and forces #116

Closed
raimis opened this issue Aug 25, 2022 · 10 comments · Fixed by #121
Labels: bug (Something isn't working)

Comments

raimis (Collaborator) commented Aug 25, 2022

When training with forces, the model computes the negative gradient of the energy with respect to the positions (a.k.a. forces):

    if self.derivative:
        grad_outputs: List[Optional[torch.Tensor]] = [torch.ones_like(out)]
        dy = grad(
            [out],
            [pos],
            grad_outputs=grad_outputs,
            create_graph=True,
            retain_graph=True,
        )[0]
        if dy is None:
            raise RuntimeError("Autograd returned None for the force prediction.")
        return out, -dy
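
For reference, the same convention in a self-contained toy example (illustrative only, not the model code): the scalar output is treated as the energy, and the force is the negative of its gradient with respect to the positions.

    import torch
    from torch.autograd import grad

    # Toy "energy": sum of squared pairwise distances (purely illustrative)
    pos = torch.randn(5, 3, requires_grad=True)
    energy = (torch.cdist(pos, pos) ** 2).sum()

    # dE/dpos via autograd; the force is the negative of this gradient,
    # which is what `return out, -dy` above produces
    dy = grad(energy, pos, create_graph=True, retain_graph=True)[0]
    forces = -dy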

Some of the loaders load forces:

    all_dy = pt.tensor(
        mol["wb97x_dz.forces"][:] * self.HARTREE_TO_EV, dtype=pt.float32
    )

    all_dy = pt.tensor(
        mol["forces"][:] * self.HARTREE_TO_EV, dtype=pt.float32
    )

While other loaders load gradients (the opposite sign):

    dy = (
        pt.tensor(mol["gradient_vector"][conf], dtype=pt.float32)
        * self.HARTREE_TO_EV
        / self.BORH_TO_ANGSTROM
    )

    all_dy = (
        pt.tensor(mol["dft_total_gradient"], dtype=pt.float32)
        * self.HARTREE_TO_EV
        / self.BORH_TO_ANGSTROM
    )

We need to agree on what dy represents.
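
For example, if we decide that dy always holds forces, a gradient-storing dataset would only need a sign flip when loading. A hypothetical sketch with stand-in data and assumed conversion constants (not the actual loader code):

    import numpy as np
    import torch as pt

    HARTREE_TO_EV = 27.211386246       # assumed value, for illustration
    BOHR_TO_ANGSTROM = 0.529177210903  # assumed value, for illustration

    # stand-in for mol["dft_total_gradient"]: dE/dpos in Hartree/Bohr
    stored_gradient = np.random.randn(12, 3)

    # negate the gradient so that dy holds forces in eV/Angstrom
    all_dy = -pt.tensor(stored_gradient, dtype=pt.float32) * HARTREE_TO_EV / BOHR_TO_ANGSTROM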

raimis added the bug label on Aug 25, 2022
stefdoerr (Collaborator) commented:
IMO dy should be the derivative of y, i.e. of the energy, meaning the negative of the force. We should not store forces in dy.

raimis (Collaborator, Author) commented Aug 25, 2022

Ping: @PhilippThoelke @giadefa

raimis (Collaborator, Author) commented Aug 26, 2022

Also, checking the loss function code and its comments, it seems dy is interpreted as gradients (not forces):

    def step(self, batch, loss_fn, stage):
        with torch.set_grad_enabled(stage == "train" or self.hparams.derivative):
            # TODO: the model doesn't necessarily need to return a derivative once
            # Union typing works under TorchScript (https://github.com/pytorch/pytorch/pull/53180)
            pred, deriv = self(
                batch.z,
                batch.pos,
                batch=batch.batch,
                q=batch.q if self.hparams.charge else None,
                s=batch.s if self.hparams.spin else None,
            )

        loss_y, loss_dy = 0, 0
        if self.hparams.derivative:
            if "y" not in batch:
                # "use" both outputs of the model's forward function but discard the first
                # to only use the derivative and avoid 'Expected to have finished reduction
                # in the prior iteration before starting a new one.', which otherwise get's
                # thrown because of setting 'find_unused_parameters=False' in the DDPPlugin
                deriv = deriv + pred.sum() * 0

            # force/derivative loss
            loss_dy = loss_fn(deriv, batch.dy)

            if stage in ["train", "val"] and self.hparams.ema_alpha_dy < 1:
                if self.ema[stage + "_dy"] is None:
                    self.ema[stage + "_dy"] = loss_dy.detach()
                # apply exponential smoothing over batches to dy
                loss_dy = (
                    self.hparams.ema_alpha_dy * loss_dy
                    + (1 - self.hparams.ema_alpha_dy) * self.ema[stage + "_dy"]
                )
                self.ema[stage + "_dy"] = loss_dy.detach()

            if self.hparams.force_weight > 0:
                self.losses[stage + "_dy"].append(loss_dy.detach())

        if "y" in batch:
            if batch.y.ndim == 1:
                batch.y = batch.y.unsqueeze(1)

            # energy/prediction loss
            loss_y = loss_fn(pred, batch.y)

            if stage in ["train", "val"] and self.hparams.ema_alpha_y < 1:
                if self.ema[stage + "_y"] is None:
                    self.ema[stage + "_y"] = loss_y.detach()
                # apply exponential smoothing over batches to y
                loss_y = (
                    self.hparams.ema_alpha_y * loss_y
                    + (1 - self.hparams.ema_alpha_y) * self.ema[stage + "_y"]
                )
                self.ema[stage + "_y"] = loss_y.detach()

            if self.hparams.energy_weight > 0:
                self.losses[stage + "_y"].append(loss_y.detach())

        # total loss
        loss = loss_y * self.hparams.energy_weight + loss_dy * self.hparams.force_weight

        self.losses[stage].append(loss.detach())

        return loss
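
Whatever name we settle on, the practical issue is that loss_fn(deriv, batch.dy) above compares the model output against batch.dy directly, so a dataset that stores the gradient while the model returns the force (or vice versa) trains against a target with the wrong sign. A toy illustration (not project code):

    import torch
    import torch.nn.functional as F

    true_forces = torch.tensor([[1.0, -2.0, 0.5]])  # what a perfect model would return
    stored_gradient = -true_forces                   # a loader that keeps +dE/dpos in dy

    print(F.mse_loss(true_forces, true_forces))      # tensor(0.) -- consistent convention
    print(F.mse_loss(true_forces, stored_gradient))  # tensor(7.) -- sign mismatch penalizes a correct model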

PhilippThoelke (Collaborator) commented:
The model outputs force predictions.
https://github.com/torchmd/torchmd-net/blob/main/torchmdnet/models/model.py#L207

I agree that the naming is inconsistent, but as far as I remember, when I implemented QM9, MD17 and ANI-1, they all loaded forces instead of the derivative. The model was consistent with that.

I'm not sure about the more recently added dataset loaders.

peastman (Collaborator) commented:
The HDF5 loader returns forces, because that's what the model expects. I think it took me a while to figure that out, and I was certainly surprised when I realized dy actually meant the negative gradient! Renaming y and dy to energy and forces would make it a lot clearer.
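
For illustration, the loss terms with explicit names could read roughly like this (a hypothetical sketch of the renaming, not a patch to the repository):

    import torch.nn.functional as F

    def energy_force_loss(pred_energy, pred_forces, ref_energy, ref_forces,
                          energy_weight=1.0, force_weight=1.0):
        # hypothetical helper: same arithmetic as step() above, with explicit names
        loss_energy = F.mse_loss(pred_energy, ref_energy)
        loss_forces = F.mse_loss(pred_forces, ref_forces)
        return energy_weight * loss_energy + force_weight * loss_forces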

giadefa (Contributor) commented Aug 26, 2022

@peastman is SPICE returning forces or gradients?

        all_dy = (
            pt.tensor(mol["dft_total_gradient"], dtype=pt.float32)
            * self.HARTREE_TO_EV
            / self.BORH_TO_ANGSTROM
        )

peastman (Collaborator) commented:
I'm not sure. Raimondas wrote that class.

giadefa (Contributor) commented Aug 26, 2022 via email

peastman (Collaborator) commented:
HDF5.

giadefa (Contributor) commented Oct 11, 2022 via email
