Hi, tgc! I tried using torchvision's pre-trained fasterrcnn_resnet50_fpn model to extract the region_features of a video, but the output I extracted has shape [823, 4], which is far from the [26, 36, 2048] and [26, 36, 5] shapes in the dataset you provided. What does the extra dimension mean, and what does each of the three dimensions represent?
Is it feasible to extract the features with torchvision's fasterrcnn_resnet50_fpn model instead of Caffe's Fast R-CNN model? The features I get from the torchvision model fall well short of the required size. How can I extract more features, with dimensions that match the dataset?
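To make the question concrete, here is a minimal sketch of the layout I am guessing at. It assumes (this is not confirmed by the dataset) that [26, 36, 5] means 26 sampled frames, 36 regions per frame, and 5 spatial values per region: the normalized box corners plus the normalized box area. The function name `pack_regions` and the zero-padding rule are hypothetical, made up for illustration. The sketch only covers the box part; the 2048-dimensional appearance features would still have to come from the detector's pooled region features.

```python
from typing import List, Sequence, Tuple

# A detection is ((x1, y1, x2, y2) in pixels, confidence score).
Detection = Tuple[Tuple[float, float, float, float], float]


def pack_regions(
    frames: Sequence[Sequence[Detection]],
    width: float,
    height: float,
    num_regions: int = 36,
) -> List[List[List[float]]]:
    """Pack variable-length per-frame detections into [frames, num_regions, 5].

    ASSUMPTION: the 5 values are (x1, y1, x2, y2, area), all normalized by
    the frame size. The real dataset may define them differently.
    """
    packed = []
    for detections in frames:
        # Keep only the highest-scoring boxes so every frame has the same count.
        top = sorted(detections, key=lambda d: d[1], reverse=True)[:num_regions]
        rows = []
        for (x1, y1, x2, y2), _score in top:
            nx1, ny1 = x1 / width, y1 / height
            nx2, ny2 = x2 / width, y2 / height
            rows.append([nx1, ny1, nx2, ny2, (nx2 - nx1) * (ny2 - ny1)])
        # Pad frames with too few detections using all-zero rows (hypothetical rule).
        rows += [[0.0] * 5 for _ in range(num_regions - len(rows))]
        packed.append(rows)
    return packed
```

With 26 frames and the default `num_regions=36`, the result has the same outer shape as the [26, 36, 5] array, whereas the flat [823, 4] output looks like all detected boxes concatenated without a per-frame or per-region limit.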