Handling longer videos while preparing data files #4
Hi. For videos longer than n_step_lstm, I simply discard the rest and take clips of the maximum length as input.
Sorry for accidentally sending the message.
Thanks for the quick response and the clarifications. I read through the Att.py script and am writing a script to generate the h5 files for the MSVD dataset.
Maybe I misled you. For longer videos, I trim them to fit the length n_step_lstm. Another approach, as you said, is to sample frames so that n_step_lstm frames cover the entire video. Either way, the total number of videos does not change.
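In code, the two options look roughly like this (a minimal sketch, not the repo's exact interface; `frames` is assumed to be an array of per-frame features of shape (n_frames, hidden_dim), and zero-padding for short videos is one common convention, not necessarily what this repo does):

```python
import numpy as np

def fit_to_n_steps(frames, n_step_lstm, uniform_sample=False):
    """Make a video's frame features fit exactly n_step_lstm steps.

    frames: np.ndarray of shape (n_frames, hidden_dim).
    """
    n_frames = frames.shape[0]
    if n_frames >= n_step_lstm:
        if uniform_sample:
            # Pick n_step_lstm indices spread evenly over the whole video.
            idx = np.linspace(0, n_frames - 1, n_step_lstm).astype(int)
            return frames[idx]
        # Trim: keep only the first n_step_lstm frames.
        return frames[:n_step_lstm]
    # Shorter videos: pad with zeros at the end.
    pad = np.zeros((n_step_lstm - n_frames, frames.shape[1]), dtype=frames.dtype)
    return np.concatenate([frames, pad], axis=0)
```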
Thanks for clearing that up.
Yes, you're right.
Thanks. I get the following results when testing a model trained for 200 epochs (the eval log begins with "init COCO-EVAL scorer"). The METEOR score seems too low. During training, the validation score was METEOR: 0.256, and the predicted sentences (the "PD:" lines) were observed to be incomplete, mostly with the last word of the sentence missing. The word vocab size from the function preProBuildWordVocab is 1500. I am trying to replicate the results of https://github.com/yaoli/arctic-capgen-vid as closely as possible. Any thoughts on what I may be missing?
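For reference, my understanding of a count-threshold vocabulary builder in the style of preProBuildWordVocab is sketched below (not the repo's exact code; the threshold value and the special-token conventions are assumptions on my part):

```python
def build_word_vocab(sentences, word_count_threshold=5):
    """Build word<->index maps, keeping words seen at least threshold times."""
    counts = {}
    for sent in sentences:
        for w in sent.lower().split():
            counts[w] = counts.get(w, 0) + 1
    vocab = [w for w, c in counts.items() if c >= word_count_threshold]
    # Index 0 is conventionally reserved for a special start/period token.
    ixtoword = {0: '.'}
    wordtoix = {'#START#': 0}
    for i, w in enumerate(vocab, start=1):
        wordtoix[w] = i
        ixtoword[i] = w
    return wordtoix, ixtoword
```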
I think that the incomplete sentences cause the METEOR score to be this low. Are all of the sentences in the val set incomplete?
Thanks. Here is a sample of some GT and predicted sentences from validation. Maybe there is some alignment issue with the data files I prepared. I look forward to running the model with the data you generated, once you post the h5 generation files.
Hi, |
Thanks for preparing such wonderful code!
While preparing the data h5 files, I noted the description: "batch['data'] stores the visual features; shape (n_step_lstm, batch_size, hidden_dim)".
How should videos that are longer than n_step_lstm be handled?
If a video is broken into parts and stored as separate input samples, would the model figure out and learn from the parts of the same video via the batch['label'] parameter?
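For context, here is roughly what my h5-writing code does (a minimal sketch; the sizes, filename, and the contents of 'label' are placeholders I made up, and the layout of 'label' is part of what I am asking about):

```python
import h5py
import numpy as np

# Example sizes only; the real values come from the repo's config.
n_step_lstm, batch_size, hidden_dim = 16, 32, 1024

# batch['data']: visual features, shape (n_step_lstm, batch_size, hidden_dim).
data = np.zeros((n_step_lstm, batch_size, hidden_dim), dtype=np.float32)

with h5py.File('batch_000.h5', 'w') as f:  # placeholder filename
    f.create_dataset('data', data=data)
    # 'label' written with one entry per sample; I am unsure of its exact
    # contents (e.g. video ids vs. caption indices), hence this question.
    f.create_dataset('label', data=np.zeros((batch_size,), dtype=np.int64))
```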
Any help on preparing the data h5 files would be appreciated.
Thanks.