
The choice of the action decoder #11

Closed
SiyuanHuang95 opened this issue May 29, 2023 · 3 comments

Comments

@SiyuanHuang95

Hi, I noticed that you used torch.distributions.Distribution after the MLP to get the final output. Could you share some insights about this choice? What's the advantage compared with directly using an MLP and a softmax?

Also, for the training procedure, should we ignore that head and directly apply the NLL loss to the output of the MLP, or should we apply the NLL to the probability of that distribution? If so, could you give some simple code snippets to demonstrate the training usage?

BTW, congrats on the ICML acceptance, well done!

Best,

@yunfanjiang
Member

Hi there,

Thank you for your congratulatory words. To answer your questions

I noticed that you used torch.distributions.Distribution after the MLP to get the final output. Could you share some insights about this choice? What's the advantage compared with directly using an MLP and a softmax?

Theoretically there is no difference between using a categorical distribution and MLP + softmax. Personally, I found using torch distributions convenient since they implement a uniform interface that works with different strategies for modeling action heads.
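For illustration, here is a minimal sketch (not from the repo; the shapes and head outputs are made up) of why that uniform interface is convenient: a discrete head and a continuous head expose the same sample/log_prob API, so downstream training and rollout code doesn't need to care which family is used.

```python
import torch
from torch.distributions import Categorical, Normal

# Hypothetical MLP outputs for two different action heads.
logits = torch.randn(4, 8)           # (batch, num_bins) for a discrete head
mean = torch.randn(4, 3)             # (batch, action_dim) for a continuous head
std = torch.full((4, 3), 0.1)

disc_dist = Categorical(logits=logits)
cont_dist = Normal(mean, std)

# Both heads expose the same interface regardless of the underlying family.
for dist in (disc_dist, cont_dist):
    action = dist.sample()
    log_prob = dist.log_prob(action)
```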

Also, for the training procedure, should we ignore that head and directly apply the NLL loss to the output of the MLP, or should we apply the NLL to the probability of that distribution? If so, could you give some simple code snippets to demonstrate the training usage?

Sure. In the discrete case, let's say dist is a torch.distributions.Categorical instance predicted by the model and label is the discretized action; the loss is calculated with torch.nn.functional.cross_entropy. Since it takes unnormalized logits as inputs, we can just pass dist.logits (with a proper reshape if necessary) into the loss function. For the continuous case with a unimodal Gaussian or a GMM, I'd recommend checking out these snippets: here and here.
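A minimal sketch of the discrete case, with hypothetical batch size and number of action bins; it just shows dist.logits being fed into cross_entropy as described above.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical

batch_size, num_bins = 32, 256
logits = torch.randn(batch_size, num_bins, requires_grad=True)  # MLP output
dist = Categorical(logits=logits)

# label: discretized ground-truth action indices, shape (batch,)
label = torch.randint(0, num_bins, (batch_size,))

# F.cross_entropy applies log_softmax internally, so passing dist.logits
# directly yields the NLL of the labeled action bin.
loss = F.cross_entropy(dist.logits, label)
loss.backward()
```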

@SiyuanHuang95
Copy link
Author

SiyuanHuang95 commented Jun 5, 2023

Many thanks, @yunfanjiang, for the reply and the informative hints!

  1. MLP + softmax case: Okay, I got it. BTW, I noticed that many works use an MSE loss to train the policy network, turning the training into a regression problem. Have you conducted any experiments comparing them?

  2. Okay, thanks. But I noticed that in your work you chose to use discretized actions. What would be the main difference between them?

@yunfanjiang
Member

yunfanjiang commented Jun 9, 2023

Many thanks, @yunfanjiang, for the reply and the informative hints!

  1. MLP + softmax case: Okay, I got it. BTW, I noticed that many works use an MSE loss to train the policy network, turning the training into a regression problem. Have you conducted any experiments comparing them?
  2. Okay, thanks. But I noticed that in your work you chose to use discretized actions. What would be the main difference between them?

Thanks for the followup. To answer them

  1. I assume you were referring to works with continuous actions. In those cases we could certainly opt for a regression loss. However, since a GMM is more expressive and can better handle distributional multimodality (which is the case for our benchmark, where multiple solutions exist for a single task), we only experimented with a GMM for the continuous-action case (see the sketch after this list).
  2. In our case we didn't observe a significant difference empirically, so we opted for the simpler choice.
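In case it helps, here is a rough GMM action-head sketch built from torch.distributions (a MixtureSameFamily over diagonal Gaussians). All shapes and head outputs are illustrative, not taken from the codebase; the NLL is simply the negative log_prob of the ground-truth continuous action.

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

batch_size, num_modes, action_dim = 32, 5, 7

# Hypothetical MLP outputs parameterizing the mixture (names are illustrative).
mixture_logits = torch.randn(batch_size, num_modes)
means = torch.randn(batch_size, num_modes, action_dim)
log_stds = torch.randn(batch_size, num_modes, action_dim)

mixture = Categorical(logits=mixture_logits)
components = Independent(Normal(means, log_stds.exp()), 1)  # diagonal Gaussians
gmm = MixtureSameFamily(mixture, components)

# NLL against ground-truth continuous actions, shape (batch, action_dim).
actions = torch.randn(batch_size, action_dim)
loss = -gmm.log_prob(actions).mean()
```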

Hope these are helpful.
