Performance check #7
Yes, we uniformly sample 5 frames for each video using extract_video-frame. I re-tested and got 25.9/49.8 from ckpt_violet_pretrain.pt and 34.3/62.9 from ckpt_violet_msrvtt-retrieval.pt. I am using PyTorch 1.7.0 and transformers 4.18.0 with CUDA 11.0.
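For anyone comparing preprocessing: a minimal sketch of what "uniformly sample 5 frames" could look like, assuming a fixed interval of `num_frames // 5` and taking the middle frame of each interval. The function name is illustrative, not from the repo, so check it against your own extraction script before trusting the numbers.

```python
def sample_frame_indices(num_frames: int, num_samples: int = 5) -> list:
    """Pick `num_samples` equally spaced frame indices from a video.

    Uses an interval of num_frames // num_samples and takes the middle
    frame of each interval, so the samples avoid the clip boundaries.
    """
    interval = num_frames // num_samples
    return [interval * i + interval // 2 for i in range(num_samples)]

# For a 100-frame video: interval = 20, indices = [10, 30, 50, 70, 90]
print(sample_frame_indices(100))
```

If your pipeline instead samples at indices `0, interval, 2 * interval, ...`, the selected frames shift toward the start of the clip, which is one plausible source of small metric differences between implementations.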
Thank you for the re-testing. Could you provide the txt_msrvtt.json file that contains the 1k test videos? There are only 50 videos in https://github.com/tsujuifu/pytorch_violet/blob/main/_data/txt_msrvtt-retrieval.json
I just tested with my txt file and got 'r@1': 0.233, 'r@5': 0.533 with ckpt_violet_pretrain.pt. This is my generated txt file.
I just tested your file with ckpt_violet_pretrain.pt using your repo, but still got 'r@1': 0.233, 'r@5': 0.533 😂. I have no idea what's wrong.
Hi, just wondering how you process the YouCook2 dataset for evaluation, since one video contains multiple clip-text pairs. I extracted 3,400 clip-text pairs for evaluation and got a very disappointing performance.
I get the same result as you, e.g., 'r@1': 0.233, 'r@5': 0.533.
Hi, thank you for sharing the code and models.
I used ckpt_violet_pretrain.pt and ckpt_violet_msrvtt-retrieval.pt with our own data processing (5 frames sampled at an interval of num_frames // 5) for MSRVTT text-to-video retrieval evaluation.
I got r@1 of 22.6/32.9, which is lower than the numbers (25.9/34.7) reported in the paper. I also tested the CLIP model and got a similar result. Do the released models achieve the reported results?
If yes, could you provide the processing pipeline or describe how to get the reported performance?
Thank you!
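For reference when comparing r@1/r@5 numbers like the ones in this thread: a minimal sketch of text-to-video recall@k computed from a similarity matrix, assuming the ground-truth video for text query i sits at index i (the usual 1k-test-set convention). This is not the repo's evaluation code, just the standard metric definition.

```python
import numpy as np

def recall_at_k(sim: np.ndarray, k: int) -> float:
    """Fraction of text queries whose ground-truth video ranks in the top k.

    sim[i, j] is the similarity between text query i and video j;
    the ground-truth match is assumed to lie on the diagonal.
    """
    ranks = (-sim).argsort(axis=1)          # videos sorted by descending similarity
    gt = np.arange(sim.shape[0])            # ground-truth index per query
    hits = (ranks[:, :k] == gt[:, None]).any(axis=1)
    return float(hits.mean())

# Toy 3x3 example: only query 0 ranks its ground truth first.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.2, 0.8],
                [0.1, 0.7, 0.4]])
print(recall_at_k(sim, 1))  # 1/3 of queries hit at rank 1
```

A mismatch in how the 1k test split is built (which captions pair with which videos, or how duplicates are handled) changes this matrix's diagonal assumption and can easily account for gaps like 22.6 vs 25.9.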