I was able to write a script for data generation for MSVD.
Could you please comment on how many epochs are needed to reproduce the scores reported in the paper [Yao et al. 2015 Describing Videos by Exploiting Temporal Structure]?
I see that the code mentions 900 epochs.
Thanks.
The number 900 is not right because I forgot to change the default number of epochs in test().
Normally, the temporal-attention model takes only about 40~80 epochs to overfit the training data. You can evaluate on the training data to see whether the model has overfit.
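As a rough sketch of that check, assuming you log one score per epoch on both the training set and a held-out validation set (higher is better, e.g. METEOR): the function name `find_overfit_epoch`, the `patience` parameter, and the numbers below are made up for illustration and are not part of this repository.

```python
def find_overfit_epoch(train_scores, val_scores, patience=5):
    """Return (best_epoch, overfit_epoch), both 1-based.

    best_epoch: epoch with the highest validation score seen so far.
    overfit_epoch: first epoch at which the validation score has not
        improved for `patience` consecutive epochs while the training
        score is still higher than it was at best_epoch; None if that
        never happens.
    """
    if len(train_scores) != len(val_scores):
        raise ValueError("need one training and one validation score per epoch")

    best_val = float("-inf")
    best_epoch = None
    stale = 0  # epochs since the validation score last improved

    for epoch, (train, val) in enumerate(zip(train_scores, val_scores), start=1):
        if val > best_val:
            best_val, best_epoch, stale = val, epoch, 0
        else:
            stale += 1
        # Training keeps improving but validation has stalled: overfitting.
        if stale >= patience and train > train_scores[best_epoch - 1]:
            return best_epoch, epoch

    return best_epoch, None


# Hypothetical per-epoch scores, for illustration only.
train = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
val = [0.10, 0.15, 0.18, 0.17, 0.16, 0.15]
print(find_overfit_epoch(train, val, patience=2))
```

If the second value is not `None`, the checkpoint saved at `best_epoch` is probably the one to evaluate, rather than the last one.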
I just noticed that I have not run the code on either MSVD or DVS. Instead, I trained and evaluated on the M-VAD dataset [1]. The METEOR score of model 40 is 5.4%, which is close to the 4.3% reported in [2] (however, they use GoogLeNet instead of VGG).
Therefore, I have to admit that there is no guarantee the model will reproduce the scores of the original paper.
[1] Torabi et al., Using Descriptive Video Services To Create a Large Data Source For Video Annotation Research, GCPR 2015.
[2] Venugopalan et al., Sequence to Sequence – Video to Text, ICCV 2015.