Inconsistency between gradients and forces #116

Closed
raimis opened this issue Aug 25, 2022 · 10 comments · Fixed by #121
Labels: bug (Something isn't working)

Comments

raimis (Collaborator) commented Aug 25, 2022

When training with forces, the model computes the negative gradient of the energy with respect to the positions (a.k.a. forces):

    if self.derivative:
        grad_outputs: List[Optional[torch.Tensor]] = [torch.ones_like(out)]
        dy = grad(
            [out],
            [pos],
            grad_outputs=grad_outputs,
            create_graph=True,
            retain_graph=True,
        )[0]
        if dy is None:
            raise RuntimeError("Autograd returned None for the force prediction.")
        return out, -dy
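
For reference, the same convention in a self-contained toy example (illustrative only, not the model code): the scalar output is treated as the energy, and the force is the negative of its gradient with respect to the positions.

    import torch
    from torch.autograd import grad

    # Toy "energy": sum of squared pairwise distances (purely illustrative)
    pos = torch.randn(5, 3, requires_grad=True)
    energy = (torch.cdist(pos, pos) ** 2).sum()

    # dE/dpos via autograd; the force is the negative of this gradient,
    # which is what `return out, -dy` above produces
    dy = grad(energy, pos, create_graph=True, retain_graph=True)[0]
    forces = -dy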

Some of the loaders load forces:

    all_dy = pt.tensor(
        mol["wb97x_dz.forces"][:] * self.HARTREE_TO_EV, dtype=pt.float32
    )

    all_dy = pt.tensor(
        mol["forces"][:] * self.HARTREE_TO_EV, dtype=pt.float32
    )

While other loaders load gradients (the opposite sign):

    dy = (
        pt.tensor(mol["gradient_vector"][conf], dtype=pt.float32)
        * self.HARTREE_TO_EV
        / self.BORH_TO_ANGSTROM
    )

    all_dy = (
        pt.tensor(mol["dft_total_gradient"], dtype=pt.float32)
        * self.HARTREE_TO_EV
        / self.BORH_TO_ANGSTROM
    )

We need to agree on what dy represents.
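
For example, if we decide that dy always holds forces, a gradient-storing dataset would only need a sign flip when loading. A hypothetical sketch with stand-in data and assumed conversion constants (not the actual loader code):

    import numpy as np
    import torch as pt

    HARTREE_TO_EV = 27.211386246       # assumed value, for illustration
    BOHR_TO_ANGSTROM = 0.529177210903  # assumed value, for illustration

    # stand-in for mol["dft_total_gradient"]: dE/dpos in Hartree/Bohr
    stored_gradient = np.random.randn(12, 3)

    # negate the gradient so that dy holds forces in eV/Angstrom
    all_dy = -pt.tensor(stored_gradient, dtype=pt.float32) * HARTREE_TO_EV / BOHR_TO_ANGSTROM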

raimis added the bug label on Aug 25, 2022
stefdoerr (Collaborator) commented:
IMO dy should be the derivative of y, i.e. of the energy, meaning the negative of the force. We should not store forces in dy.

raimis (Collaborator, Author) commented Aug 25, 2022

Ping: @PhilippThoelke @giadefa

raimis (Collaborator, Author) commented Aug 26, 2022

Also, checking the loss function code and its comments, it seems dy is interpreted as gradients (not forces):

    def step(self, batch, loss_fn, stage):
        with torch.set_grad_enabled(stage == "train" or self.hparams.derivative):
            # TODO: the model doesn't necessarily need to return a derivative once
            # Union typing works under TorchScript (https://github.com/pytorch/pytorch/pull/53180)
            pred, deriv = self(
                batch.z,
                batch.pos,
                batch=batch.batch,
                q=batch.q if self.hparams.charge else None,
                s=batch.s if self.hparams.spin else None,
            )

        loss_y, loss_dy = 0, 0
        if self.hparams.derivative:
            if "y" not in batch:
                # "use" both outputs of the model's forward function but discard the first
                # to only use the derivative and avoid 'Expected to have finished reduction
                # in the prior iteration before starting a new one.', which otherwise get's
                # thrown because of setting 'find_unused_parameters=False' in the DDPPlugin
                deriv = deriv + pred.sum() * 0

            # force/derivative loss
            loss_dy = loss_fn(deriv, batch.dy)

            if stage in ["train", "val"] and self.hparams.ema_alpha_dy < 1:
                if self.ema[stage + "_dy"] is None:
                    self.ema[stage + "_dy"] = loss_dy.detach()
                # apply exponential smoothing over batches to dy
                loss_dy = (
                    self.hparams.ema_alpha_dy * loss_dy
                    + (1 - self.hparams.ema_alpha_dy) * self.ema[stage + "_dy"]
                )
                self.ema[stage + "_dy"] = loss_dy.detach()

            if self.hparams.force_weight > 0:
                self.losses[stage + "_dy"].append(loss_dy.detach())

        if "y" in batch:
            if batch.y.ndim == 1:
                batch.y = batch.y.unsqueeze(1)

            # energy/prediction loss
            loss_y = loss_fn(pred, batch.y)

            if stage in ["train", "val"] and self.hparams.ema_alpha_y < 1:
                if self.ema[stage + "_y"] is None:
                    self.ema[stage + "_y"] = loss_y.detach()
                # apply exponential smoothing over batches to y
                loss_y = (
                    self.hparams.ema_alpha_y * loss_y
                    + (1 - self.hparams.ema_alpha_y) * self.ema[stage + "_y"]
                )
                self.ema[stage + "_y"] = loss_y.detach()

            if self.hparams.energy_weight > 0:
                self.losses[stage + "_y"].append(loss_y.detach())

        # total loss
        loss = loss_y * self.hparams.energy_weight + loss_dy * self.hparams.force_weight

        self.losses[stage].append(loss.detach())

        return loss
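
Whatever name we settle on, the practical issue is that loss_fn(deriv, batch.dy) above compares the model output against batch.dy directly, so a dataset that stores the gradient while the model returns the force (or vice versa) trains against a target with the wrong sign. A toy illustration (not project code):

    import torch
    import torch.nn.functional as F

    true_forces = torch.tensor([[1.0, -2.0, 0.5]])  # what a perfect model would return
    stored_gradient = -true_forces                   # a loader that keeps +dE/dpos in dy

    print(F.mse_loss(true_forces, true_forces))      # tensor(0.) -- consistent convention
    print(F.mse_loss(true_forces, stored_gradient))  # tensor(7.) -- sign mismatch penalizes a correct model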

PhilippThoelke (Collaborator) commented:
The model outputs force predictions.
https://github.com/torchmd/torchmd-net/blob/main/torchmdnet/models/model.py#L207

I agree that the naming is inconsistent, but as far as I remember, when I implemented QM9, MD17 and ANI-1, they all loaded forces instead of the derivative. The model was consistent with that.

I'm not sure about the more recently added dataset loaders.

peastman (Collaborator) commented:
The HDF5 loader returns forces, because that's what the model expects. I think it took me a while to figure that out, and I was certainly surprised when I realized dy actually meant the negative gradient! Renaming y and dy to energy and forces would make it a lot clearer.
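
For illustration, the loss terms with explicit names could read roughly like this (a hypothetical sketch of the renaming, not a patch to the repository):

    import torch.nn.functional as F

    def energy_force_loss(pred_energy, pred_forces, ref_energy, ref_forces,
                          energy_weight=1.0, force_weight=1.0):
        # hypothetical helper: same arithmetic as step() above, with explicit names
        loss_energy = F.mse_loss(pred_energy, ref_energy)
        loss_forces = F.mse_loss(pred_forces, ref_forces)
        return energy_weight * loss_energy + force_weight * loss_forces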

giadefa (Contributor) commented Aug 26, 2022

@peastman is SPICE returning forces or gradients?

        all_dy = (
            pt.tensor(mol["dft_total_gradient"], dtype=pt.float32)
            * self.HARTREE_TO_EV
            / self.BORH_TO_ANGSTROM
        )

peastman (Collaborator) commented:
I'm not sure. Raimondas wrote that class.

giadefa (Contributor) commented Aug 26, 2022 via email

peastman (Collaborator) commented:
HDF5.

giadefa (Contributor) commented Oct 11, 2022 via email
