
I3D Convolutions Script + Input Data #7

Closed

amanchadha opened this issue May 31, 2020 · 6 comments

Comments

@amanchadha

Hi Vladimir,

Noticed in the MDVC codebase that you load the I3D CONV features from "./data/sub_activitynet_v1-3.i3d_25fps_stack24step24_2stream.hdf5"
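
For reference, a minimal sketch of how I would inspect that file with h5py (the internal layout, one entry per video id, is my guess; printing the keys confirms it):

# Sketch: peek inside the MDVC feature file with h5py.
# The assumption of one top-level entry per video id is a guess.
import h5py

path = './data/sub_activitynet_v1-3.i3d_25fps_stack24step24_2stream.hdf5'
with h5py.File(path, 'r') as f:
    video_ids = list(f.keys())
    print(len(video_ids), 'videos')
    node = f[video_ids[0]]
    if isinstance(node, h5py.Group):   # e.g. separate rgb/flow datasets
        for name, dset in node.items():
            print(name, dset.shape, dset.dtype)
    else:                              # a single feature matrix per video
        print(node.shape, node.dtype)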

Some questions:
(i) Do you have a script that generates these features from raw data?
(ii) What input data did you run the I3D model over? I ask because your I3D features filename suggests the features were extracted at 25 FPS, which implies that you manually sampled the videos in the ActivityNet Captions dataset at 25 FPS; unfortunately, the official ActivityNet website only offers frames sampled at 5 FPS (http://activity-net.org/challenges/2020/tasks/anet_captioning.html).
(iii) Do you have a link for the sampled frames?

Thanks!
Aman

@v-iashin
Owner

v-iashin commented Jun 1, 2020

Hi,

Sorry for the late reply. I decided to write a little library dedicated to feature extraction from videos. It is mainly based on the script I wrote for MDVC but is more transparent and easier to use 🙂. So, it took a couple of days to wrap it up. Check it out: https://github.com/v-iashin/i3d_features.

Here are the answers to your questions:
(i) Yes. Check out v-iashin/video_features@4fa02bd5c. Please see the notes below.

(ii) Yep. Exactly! We downloaded the available videos using the official script activitynet/ActivityNet@7185a39 and ran the feature extraction script over the raw videos.

(iii) I still have the videos. I can think of a way to share them in case you would REALLY like to have them 🙂.

The notes on (i):
I was using an implementation of PWC-Net from sniklaus/pytorch-pwc@f613890 with a couple of tweaks. Yesterday, I checked and noticed that the model weights have been changed (hashes: 91006e6cd54dc052b00660239f5b1814 -> 08330ee36a9aa0d16f198f8927352502). I am not sure what caused this; I haven't contacted the author. I tried both, and there is a small difference between the resulting values. So, I provide there both the model I used for MDVC (network-default.pytorch) and the weights of the latest model from sniklaus/pytorch-pwc (pwc_net.pt). Make sure to use the correct one:

python main.py --feature_type i3d --device_ids 0 --extraction_fps 25 --stack_size 24 --step_size 24 --pwc_path ./models/i3d/checkpoints/network-default.pytorch --video_paths ./sample/v_ZNVhz7ctTq0.mp4
# this outputs the exact values as in "sub_activitynet_v1-3.i3d_25fps_stack24step24_2stream.hdf5" for this video
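
In case it helps to check which checkpoint a given file actually is, a small sketch that compares its MD5 against the two hashes above (standard-library hashlib; the path is the one from the command):

# Sketch: identify a PWC-Net checkpoint by its MD5 hash.
# The two reference hashes are the ones quoted above.
import hashlib

KNOWN = {
    '91006e6cd54dc052b00660239f5b1814': 'original weights (used for MDVC)',
    '08330ee36a9aa0d16f198f8927352502': 'current sniklaus/pytorch-pwc weights',
}

def md5(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()

digest = md5('./models/i3d/checkpoints/network-default.pytorch')
print(digest, '->', KNOWN.get(digest, 'unknown checkpoint'))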

Another note concerns how the I3D features were extracted for ActivityNet. Specifically, please see the i3d.utils.utils.form_iter_list() function. It has a phase argument, which was specified according to the dataset split (train or val_1/val_2) and controls how the last frames of a video are used to form its final feature stack. Please make sure to tweak the feature extraction code a bit; it should be pretty straightforward. I just wanted to keep the library dataset-independent and decided to work around it this way.
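
To illustrate what that function is responsible for, here is a rough sketch of the stacking logic (not the actual form_iter_list() implementation; the tail-padding rule is an assumption, purely for illustration):

# Illustrative sketch, not the actual i3d.utils.utils.form_iter_list():
# split frame indices into stacks of `stack_size`, one every `step_size`.
# How the incomplete tail is handled is what the real `phase` argument
# controls; repeating the last frame below is an assumed rule.
def form_stacks(num_frames, stack_size=24, step_size=24, pad_tail=True):
    stacks, start = [], 0
    while start + stack_size <= num_frames:
        stacks.append(list(range(start, start + stack_size)))
        start += step_size
    if pad_tail and start < num_frames:
        tail = list(range(start, num_frames))
        tail += [num_frames - 1] * (stack_size - len(tail))
        stacks.append(tail)
    return stacks

print(len(form_stacks(100)))  # 100 frames -> 4 full stacks + 1 padded tail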

@amanchadha
Author

Hi Vladimir,

(i) Thanks for putting together the I3D repository. It clarified the process you followed to produce the I3D features. Indeed, very helpful.

(ii) I see. When you ran the feature extraction script on the videos, did you store the sampled frames (using --keep_frames)? Sadly, even the official 5 FPS link for the videos (http://activity-net.org/challenges/2020/tasks/anet_captioning.html) isn't accessible. This is currently blocking any progress in my work, so gaining access is essential. If you have the sampled frames and can upload them, I would really appreciate it. If you need a server for the upload, I can arrange one for you.

Thanks again,
Aman

@v-iashin
Owner

v-iashin commented Jun 2, 2020

(ii) I am afraid I cannot provide you with the frames as we didn't store them at all. The videos themselves are 200+ GB, but the frames were 1+ TB, and we didn't have a fast disk (SSD or NVMe) large enough to read them from on the fly. So, we decided to calculate the features and remove the frames right away, just as the repo does now. I can upload the videos, and you can extract the features along with the frames yourself. Let me know if you need them. Extraction takes around a week on this dataset with stack size 24 and step size 24 on three 2080Ti GPUs.

We have the resources; I will organize the download link. Don't worry.

@amanchadha
Author

OK, it would be much appreciated if you could share the videos. Thanks!

@v-iashin
Owner

v-iashin commented Jun 2, 2020

Please contact us via e-mail.

v-iashin closed this as completed Jun 2, 2020
@amanchadha
Author

Thank you!
