I am having issues using the action-conditional PredRNNV2 for inference.
The way it seems to work (action_injection=concat): load the actions, tile them over the spatial grid, and concatenate the video frames and the resulting action tensor channel-wise. Then apply reshape_patch() and pass the result to the model, which gives an input tensor of shape [batch, seq_length, height // patch_size, width // patch_size, (img_ch + action_ch) * patch_size ** 2].
In the action-conditional PredRNNV2 model, however, the parameter num_action_ch is used directly as the number of input channels for the conv layers, instead of num_action_ch * patch_size ** 2. For me, this leads to runtime shape mismatches in forward(). Is this a bug, or did I misunderstand something?
(1) num_action_ch is equal to the dimension of the actions. We expand the action to the size (height // patch_size, width // patch_size).
(2) reshape_patch_back is only applied to the frames. See lines 135-137 in ./core/models/action_cond_predrnn_v2.py
I see where I went wrong: in the action-conditional case, the expanded action is concatenated to the frames after reshape_patch() and stripped from the result before reshape_patch_back().
I had looked at the shapes returned, e.g. in core/data_provider/bair.py, and assumed that the actions were included in the input to reshape_patch().
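For anyone who hits the same confusion, here is a minimal NumPy sketch of the shape flow as I understand it from the answer above. It is not the repository's code: patchify(), unpatchify() and tile_actions() are hypothetical stand-ins written for illustration, and the sizes (64x64 frames, 3 image channels, 4 action dimensions, patch_size 4) are made-up example values.

```python
import numpy as np


def patchify(x, p):
    """Hypothetical stand-in for patching: fold p x p pixel blocks into channels.

    [b, t, h, w, c] -> [b, t, h // p, w // p, p * p * c]
    """
    b, t, h, w, c = x.shape
    x = x.reshape(b, t, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, h // p, w // p, p * p * c)


def unpatchify(x, p):
    """Inverse of patchify: [b, t, h // p, w // p, p * p * c] -> [b, t, h, w, c]."""
    b, t, hp, wp, cp = x.shape
    c = cp // (p * p)
    x = x.reshape(b, t, hp, wp, p, p, c)
    x = x.transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, hp * p, wp * p, c)


def tile_actions(actions, hp, wp):
    """Repeat each action vector over the patched grid: [b, t, a] -> [b, t, hp, wp, a]."""
    b, t, a = actions.shape
    return np.broadcast_to(actions[:, :, None, None, :], (b, t, hp, wp, a)).copy()


# Example sizes (assumptions, not taken from any config in the repository).
batch, seq_length, height, width = 2, 5, 64, 64
img_ch, num_action_ch, patch_size = 3, 4, 4

frames = np.random.rand(batch, seq_length, height, width, img_ch)
actions = np.random.rand(batch, seq_length, num_action_ch)

# 1) Only the frames are patched, so only img_ch is multiplied by patch_size ** 2.
frame_patches = patchify(frames, patch_size)

# 2) The actions are expanded to the patched resolution and keep num_action_ch channels.
action_grid = tile_actions(actions, height // patch_size, width // patch_size)

# 3) Channel-wise concat: img_ch * patch_size ** 2 + num_action_ch channels in total.
model_input = np.concatenate([frame_patches, action_grid], axis=-1)

# 4) On the way back, drop the action channels before undoing the patching.
frames_only = model_input[..., : img_ch * patch_size ** 2]
restored = unpatchify(frames_only, patch_size)
```

With these example values the frame part has 3 * 4 ** 2 = 48 channels and the action part has 4, so the concatenated input has 52 channels rather than (3 + 4) * 4 ** 2 = 112. That is consistent with the conv layers using num_action_ch directly.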