Train only an output model, freezing the representation model. #317

Draft · wants to merge 9 commits into main

Conversation

@RaulPPelaez (Collaborator) commented Apr 17, 2024

Adds the --freeze-representation, --reset-output-model and --overwrite-representation options to train.py.

  • Freeze representation: the representation model weights are frozen, i.e. they are not updated during training (see the sketch right after this list).
  • Reset output model: reset_parameters() is called on the output model after loading it for training. Ignored if --load-model is not used.
  • Overwrite representation: takes a path to a checkpoint; if present, the representation model weights are initialized from it.
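For illustration, freezing the representation boils down to something like the following minimal sketch (the helper name is hypothetical and this is not the exact code added to train.py):

    from torch import nn

    def freeze_representation(model: nn.Module) -> None:
        # Hypothetical helper: stop gradient updates for the representation
        # weights so that only the output model (and priors) keep training.
        for p in model.representation_model.parameters():
            p.requires_grad_(False)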

This makes it possible to train many output modules while keeping a single, shared representation model. The intended workflow is:

 $ torchmd-train --conf my_model1.yaml --log-dir model1  # Initial training of the representation model
 # Train the second model, but load the representation weights from the first one.
 # Note that there are no limitations on the output model here with respect to model1.
 $ torchmd-train --conf my_model2.yaml --log-dir model2 --freeze-representation --overwrite-representation model1/best.ckpt
 # Now you have two models that share the same representation model.
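Conceptually, --overwrite-representation can be thought of as filtering the checkpoint's state dict down to the representation submodule. A minimal sketch, assuming the representation weights are stored under a "representation_model." prefix (the helper name is hypothetical and the actual key prefix depends on how the checkpoint was written):

    import torch

    def overwrite_representation(model, ckpt_path: str, prefix: str = "representation_model."):
        # Hypothetical helper: initialize only the representation weights from an
        # existing checkpoint, leaving the output model and priors untouched.
        ckpt = torch.load(ckpt_path, map_location="cpu")
        state_dict = ckpt.get("state_dict", ckpt)  # Lightning checkpoints nest weights under "state_dict"
        rep_state = {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}
        # strict=False tolerates key-naming differences between checkpoint and module.
        return model.representation_model.load_state_dict(rep_state, strict=False)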

For inference we can take advantage of the shared representation model and have torch call it only once.
For this we can create a class similar to Ensemble. For prototyping it can simply mimic TorchMD_Net's forward:

    def forward(
        self,
        z: Tensor,
        pos: Tensor,
        batch: Optional[Tensor] = None,
        box: Optional[Tensor] = None,
        q: Optional[Tensor] = None,
        s: Optional[Tensor] = None,
        extra_args: Optional[Dict[str, Tensor]] = None,
    ) -> Tuple[Tensor, ...]:
        assert z.dim() == 1 and z.dtype == torch.long
        batch = torch.zeros_like(z) if batch is None else batch

        if self.derivative:
            pos.requires_grad_(True)
        # Run the shared representation model only once, using the first model's copy.
        x, v, z, pos, batch = self.models[0].representation_model(
            z, pos, batch, box=box, q=q, s=s
        )
        y = []
        neg_dy = []
        # Feed the shared features to every output head.
        for m in self.models:
            o = m.output_model
            x_o = o.pre_reduce(x, v, z, pos, batch)
            if self.prior_model is not None:
                for prior in self.prior_model:
                    x_o = prior.pre_reduce(x_o, z, pos, batch, extra_args)
            x_o = o.reduce(x_o, batch)
            y_o = o.post_reduce(x_o)
            if self.prior_model is not None:
                for prior in self.prior_model:
                    y_o = prior.post_reduce(y_o, z, pos, batch, box, extra_args)
            y.append(y_o)
            if self.derivative:
                # Forces are the negative gradient of each head's output w.r.t. positions.
                grad_outputs: List[Optional[torch.Tensor]] = [torch.ones_like(y_o)]
                dy_o = grad(
                    [y_o],
                    [pos],
                    grad_outputs=grad_outputs,
                    create_graph=self.training,
                    # Keep the shared graph alive so later heads can also differentiate through it.
                    retain_graph=self.training or len(self.models) > 1,
                )[0]
                assert dy_o is not None, "Autograd returned None for the force prediction."
                neg_dy.append(-dy_o)
        y = torch.stack(y)
        neg_dy = torch.stack(neg_dy) if self.derivative else torch.empty(0)
        # Aggregate the per-head predictions into a mean and standard deviation.
        y_mean = torch.mean(y, dim=0)
        neg_dy_mean = torch.mean(neg_dy, dim=0) if self.derivative else torch.empty(0)
        y_std = torch.std(y, dim=0)
        neg_dy_std = torch.std(neg_dy, dim=0) if self.derivative else torch.empty(0)

        if self.return_std:
            return y_mean, neg_dy_mean, y_std, neg_dy_std
        else:
            return y_mean, neg_dy_mean
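As a usage sketch: SharedRepresentationEnsemble below is a hypothetical name for a wrapper that stores the loaded models in self.models and uses the forward above, and load_model is torchmd-net's regular checkpoint loader. Inference over both heads could then look roughly like this:

    import torch
    from torchmdnet.models.model import load_model

    # Load both trained models; thanks to --freeze-representation /
    # --overwrite-representation they hold identical representation weights.
    model1 = load_model("model1/best.ckpt")
    model2 = load_model("model2/best.ckpt")

    # Placeholder wrapper built around the forward() shown above.
    ensemble = SharedRepresentationEnsemble([model1, model2])

    z = torch.tensor([8, 1, 1], dtype=torch.long)  # atomic numbers (e.g. a water molecule)
    pos = torch.rand(3, 3)                         # toy positions, just for illustration
    y_mean, neg_dy_mean = ensemble(z, pos)         # per-head predictions averaged over the two output models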

@RaulPPelaez (Collaborator, Author) commented:
cc @stefdoerr
