New dataset #12
Comments
Hi, in THUMOS-14 I adopt a sliding-window fashion to prepare the data; you can leave your email here and I can send you the corresponding code. No, in _get_base_data the feature of each video is loaded separately, not the full data.
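For example, per-video loading might look like this minimal sketch (the file layout, csv format, and function name here are assumptions for illustration, not the repo's actual code):

```python
import pandas as pd

def load_video_feature(feature_dir, video_name):
    # Hypothetical layout: one csv of snippet-level features per video,
    # e.g. <feature_dir>/<video_name>.csv with one row per snippet.
    # Loading a single file at a time keeps memory bounded instead of
    # reading the full ~11G feature set up front.
    path = "{}/{}.csv".format(feature_dir, video_name)
    video_df = pd.read_csv(path)
    return video_df.values  # shape: (num_snippets, feature_dim)
```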
Apologies for the delay, but I don't quite understand everything. It appears that:
I am fine with those. For feature_frame, though, I don't get it. Here's an example:
In that one, why is feature_frame 624? Overall, I just don't quite get how feature_frame works, but it's clearly important for the computation. Lastly, I just want to verify: do I need to change anything in order to run the code AND the trained models on a dataset that has an arbitrary number of frames for each video?
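My working guess (not confirmed anywhere in the repo) is that feature_frame is the number of frames actually covered by the extracted snippet features, i.e. duration_frame rounded down to a multiple of the snippet window, which would explain a value like 624:

```python
snippet_len = 16      # assumed frames per snippet window
duration_frame = 629  # hypothetical raw frame count for this video
# If feature_frame is duration_frame rounded down to a multiple of the
# snippet window, then:
feature_frame = (duration_frame // snippet_len) * snippet_len  # -> 624
# and the duration actually covered by the features would be
# corrected_second = feature_frame / duration_frame * duration_second
```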
@wzmsltw, friendly bump in case this got lost in the shuffle.
@cinjon |
Another question - is it right that this codebase is set up to work only with feature vectors that cover the entire video? As far as I can tell, dataset.py needs to be adjusted in the scenario where the model does not get the entire interpolated video at once, as is done with the 100 vectors for the ActivityNet videos in the paper. For example, it appears that the gt_bbox computation in _get_train_label should be changed so that the model is predicting only over the time duration given (say 120 seconds) rather than assuming that the time spans the full video_second. Is that right, or am I misunderstanding something? (OK, in that case I am going to ignore feature_frame and just treat it as the same as duration_frame.)
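To make that concrete, here is a rough sketch of the kind of change I mean (everything other than gt_bbox is my own naming; window_start and window_second are hypothetical parameters for a clip that does not span the whole video):

```python
def make_gt_bbox(annotations, window_start, window_second):
    # Normalize each ground-truth segment against the duration of the
    # observed window rather than the full video_second, so the model
    # only predicts over the clip it actually sees.
    gt_bbox = []
    for ann in annotations:
        seg_start, seg_end = ann["segment"]
        start = min(max((seg_start - window_start) / window_second, 0.0), 1.0)
        end = min(max((seg_end - window_start) / window_second, 0.0), 1.0)
        if end > start:  # keep only segments that overlap the window
            gt_bbox.append([start, end])
    return gt_bbox
```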
@cinjon how is your progress trying this code on THUMOS? If I understand this correctly, I have to extract the snippet-level features using TSN (https://github.com/yjxiong/anet2016-cuhk). But anet2016-cuhk is pretrained on ActivityNet, so you first have to finetune the network on THUMOS, then extract the snippet-level features from THUMOS, and finally do the TEM, PGM and PEM training? Is this correct?
Hi there, thanks for releasing your code. I've gone through it with the intention of adding a new dataset and, as far as I can tell, the main thing that needs to be done is to generate the video_anno file, which is a large json consisting of annotations, duration_second, duration_frame, and feature_frame for each video.
I understand that the annotations field is meant to be a list of {'label': , 'segment': [start, end]} entries, but can you verify what the other three are meant to be? It's not clear if duration_second is according to a normalized FPS or if it's just the duration in real video time. It's also unclear what the difference is between duration_frame and feature_frame.
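For concreteness, here is the shape I am assuming one entry of the video_anno json takes (the video id, numbers, and label are made up, and the "(?)" comments mark exactly the fields I'm unsure about):

```python
video_anno_entry = {
    "v_example_0001": {
        "duration_second": 120.5,   # length of the raw video in seconds (?)
        "duration_frame": 3012,     # number of frames in the raw video
        "feature_frame": 3008,      # frames covered by extracted features (?)
        "annotations": [
            {"label": "LongJump", "segment": [12.3, 45.6]},  # seconds (?)
        ],
    }
}
```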
In what units are the start and end of segment given, i.e. are they relative to the actual time in the video or to a normalized time?
Additionally, I will not be modifying the videos to be 100 frames each. It seems like you did that for ActivityNet, but the paper doesn't mention anything similar for THUMOS. What was your strategy for THUMOS?
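For reference, the kind of fixed-length rescaling I mean for ActivityNet could be something like this linear-interpolation sketch (the repo may do it differently; the function name and shapes are my assumptions):

```python
import numpy as np

def rescale_feature(feature, target_len=100):
    # feature: (num_snippets, feature_dim) array of snippet-level features.
    # Linearly interpolate along the temporal axis so every video ends up
    # with exactly target_len feature vectors, regardless of its length.
    num_snippets, feature_dim = feature.shape
    src_x = np.linspace(0.0, 1.0, num_snippets)
    dst_x = np.linspace(0.0, 1.0, target_len)
    rescaled = np.stack(
        [np.interp(dst_x, src_x, feature[:, d]) for d in range(feature_dim)],
        axis=1,
    )
    return rescaled  # shape: (target_len, feature_dim)
```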
Finally, what's the story with video_df in _get_base_data? It seems like it loads in the full data every time. That's 11G uncompressed. Is this right?