
Negative KL divergence #44

Closed · yarinbar opened this issue Aug 13, 2023 · 3 comments
Labels: good first issue (Good for newcomers), question (Further information is requested)

Comments

yarinbar commented Aug 13, 2023

Hi!

I am using your package and get a negative KL divergence when training, and I am not sure why. I saw #29, but I suspect that solution is not applicable in my case.

Here is how the model is made:

import torch
import normflows as nf

# Binary mask for the affine coupling layer (alternating 1s and 0s)
b = torch.Tensor([1 if i % 2 == 0 else 0 for i in range(latent_dim)])

s = nf.nets.MLP([latent_dim, 2 * latent_dim, latent_dim], init_zeros=True)
t = nf.nets.MLP([latent_dim, 2 * latent_dim, latent_dim], init_zeros=True)
flows = [nf.flows.MaskedAffineFlow(b, t, s)]
flows += [nf.flows.ActNorm(latent_dim)]

base = nf.distributions.base.DiagGaussian(latent_dim)

# Construct flow model
self.nfm = nf.NormalizingFlow(base, flows)

And here is the training loop:

optimizer = torch.optim.Adam(self.nfm.parameters(), lr=lr, weight_decay=weight_decay)
loss_list = []

for epoch in range(n_epochs):
    print(f"Start epoch number {epoch + 1}")

    batch_cum_loss = 0
    n_batches = len(nf_train_loader)

    for batch_idx, (inputs, labels) in enumerate(nf_train_loader):
        batch_size = inputs.shape[0]

        inputs_cls = inputs.to(self.device)
        labels_cls = labels.to(self.device)
        optimizer.zero_grad()

        # The latent features come from a frozen network, so no gradients are needed here
        with torch.no_grad():
            outputs, _, latent = self.net(inputs_cls)

        # Compute loss
        loss = self.nfm.forward_kld(latent[-1])

        # Standard remainder of the loop (implied by the bookkeeping variables above)
        loss.backward()
        optimizer.step()
        batch_cum_loss += loss.item()

    loss_list.append(batch_cum_loss / n_batches)

Where latent[-1] is an intermediate output of a given network (before the classifier).

The loss that comes out is negative, whereas if I use sklearn's mutual_info_score I get a positive number:

from sklearn.metrics import mutual_info_score

q = torch.normal(mean=0, std=1, size=(batch_size, latent_dim))
# Each of the following was tried separately as a sanity check
res = mutual_info_score(latent[-1].view(-1,), q.view(-1,))
res = kl_div(latent[-1], q)
res = kl_loss(latent[-1], q)

As the loss graph shows, the loss values are also not stable while being negative, although if I ignore the sign, the curve does look like a normal training curve.

[Figure: training loss curve]

I would appreciate any help!

VincentStimper self-assigned this and added the good first issue and question labels on Sep 13, 2023.
VincentStimper (Owner) commented

Hi @yarinbar,

the forward KL divergence is given by $\text{KL}(p||q)=\mathbf{E}_p[\log\frac{p(x)}{q(x)}]=\mathbf{E}_p[\log p(x)]-\mathbf{E}_p[\log q(x)]$, where $p$ is the target and $q$ is the model. Since the target distribution is often unknown, as seems to be the case for your problem, the expectations are estimated with samples from the target, i.e. data. $\mathbf{E}_p[\log p(x)]$ still cannot be estimated in this case, but since it does not contain any model parameters, it is just a constant and is left out when computing the forward KL divergence. Hence, your loss is not literally the forward KL divergence, but the forward KL divergence minus an unknown constant shift, and, therefore, can become negative.

Best regards,
Vincent
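
To make this concrete, here is a minimal sketch (not from the original thread; it uses plain torch.distributions rather than the normflows API, and assumes forward_kld effectively returns the negative mean log-density of the data under the model): whenever the model's density exceeds 1 on the data, the estimate of $-\mathbf{E}_p[\log q(x)]$ is negative, even when the true KL divergence is zero.

import torch
from torch.distributions import Normal

# Model q: a narrow 1D Gaussian, so q(x) > 1 near its mode and log q(x) > 0 there.
q = Normal(loc=0.0, scale=0.05)

# Pretend the target p equals q and treat samples from it as "data".
x = q.sample((10_000,))

# Monte Carlo estimate of -E_p[log q(x)]: the forward KL divergence
# up to the unknown constant E_p[log p(x)].
loss = -q.log_prob(x).mean()
print(loss.item())  # roughly -1.58, even though KL(p||q) = 0 here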

ArtemKar123 commented

Hello,

Do I understand correctly that minimising such loss (KL divergence minus an unknown constant shift) will still be correct, despite it being negative?

VincentStimper (Owner) commented Jan 4, 2024

Hi @ArtemKar123,

Yes. Since the constant does not depend on the model's parameters, it disappears anyway when computing the gradient with respect to the parameters for the optimizer.
Moreover, in this case you are essentially minimizing $-\mathbf{E}_p[\log q(x)]$, so minimizing the forward KL divergence corresponds to maximizing the model's likelihood of the samples from the target, which is itself a common way to train machine learning models.

Best regards,
Vincent
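
To illustrate the first point, here is a small sketch (hypothetical, not part of the thread) showing that adding any constant to a loss leaves the gradients, and hence every optimizer update, unchanged:

import torch

theta = torch.tensor(2.0, requires_grad=True)

# Stand-in for -E_p[log q_theta(x)]; the exact form does not matter.
loss = (theta - 1.0) ** 2
# The same loss shifted by an arbitrary constant (playing the role of E_p[log p(x)]).
shifted = loss + 123.4

g1, = torch.autograd.grad(loss, theta, retain_graph=True)
g2, = torch.autograd.grad(shifted, theta)
assert torch.equal(g1, g2)  # identical gradients -> identical training dynamics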
