Performance check #7

Closed
Flowerfan opened this issue May 5, 2022 · 7 comments

Comments

@Flowerfan

Hi, thank you for sharing the code and models.

I used ckpt_violet_pretrain.pt and ckpt_violet_msrvtt-retrieval with our own data processing (5 frames per video, sampled at an interval of num_frames // 5) for MSR-VTT text-to-video (t2v) retrieval evaluation.
I got R@1 of 22.6/32.9, which is lower than the numbers in the paper (25.9/34.7). I also tested the CLIP model and got a similar result. Do the released models achieve the reported results?
If so, could you share the processing pipeline or describe how to reproduce the reported performance?
Thank you!

@tsujuifu
Owner

tsujuifu commented May 5, 2022

Yes, we equally sample 5 frames for each video using extract_video-frame.

I have re-tested and got 25.9/49.8 from ckpt_violet_pretrain.pt and 34.3/62.9 from ckpt_violet_msrvtt-retrieval.pt.

I am using PyTorch 1.7.0 and transformers 4.18.0 with CUDA 11.0.
Also, do not forget to add model.eval() during the evaluation.
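
One place the two setups could differ is how the 5 frame indices are chosen. The sketch below compares two plausible readings of "5 equally spaced frames": a fixed stride of num_frames // 5 starting at frame 0, and the centre of each of 5 equal segments. Both function names are hypothetical, and neither is confirmed to match what extract_video-frame actually does; the code only shows that the two schemes select different frames.

```python
from typing import List


def stride_indices(num_frames: int, num_samples: int = 5) -> List[int]:
    """Fixed-stride sampling: frames 0, step, 2*step, ... with step = num_frames // num_samples."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    step = max(num_frames // num_samples, 1)  # avoid a zero stride on very short videos
    return [min(i * step, num_frames - 1) for i in range(num_samples)]


def midpoint_indices(num_frames: int, num_samples: int = 5) -> List[int]:
    """Segment-centre sampling: the middle frame of each of num_samples equal segments."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return [
        min(int((i + 0.5) * num_frames / num_samples), num_frames - 1)
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    print(stride_indices(100))    # [0, 20, 40, 60, 80]
    print(midpoint_indices(100))  # [10, 30, 50, 70, 90]
```

If the extracted frames differ between two pipelines, comparing the selected indices for a few videos is a cheap first check before looking at the model itself.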

@Flowerfan
Author

Thank you for re-testing. Could you provide the txt_msrvtt.json file that contains the 1k test videos? There are only 50 videos in https://github.com/tsujuifu/pytorch_violet/blob/main/_data/txt_msrvtt-retrieval.json

@Flowerfan
Author

I just tested with my own txt file and got 'r@1': 0.233, 'r@5': 0.533 with ckpt_violet_pretrain.pt. This is my generated txt file.
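
For reference when comparing numbers, here is a minimal sketch of how r@1 and r@5 are commonly computed for text-to-video retrieval. It assumes a text-by-video similarity matrix and exactly one ground-truth video per text query; the function name recall_at_k and the toy scores are illustrative, and the repo's own evaluation code may differ (for example in how ties are handled).

```python
from typing import Dict, Sequence


def recall_at_k(
    sim: Sequence[Sequence[float]],
    gt: Sequence[int],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[str, float]:
    """Fraction of text queries whose ground-truth video is ranked within the top k.

    sim[i][j] is the similarity between text i and video j.
    gt[i] is the index of the correct video for text i.
    """
    ranks = []
    for scores, target in zip(sim, gt):
        # 0-based rank: how many videos score strictly higher than the correct one.
        # Ties are resolved in favour of the correct video (an optimistic choice).
        ranks.append(sum(1 for s in scores if s > scores[target]))
    return {f"r@{k}": sum(r < k for r in ranks) / len(ranks) for k in ks}


if __name__ == "__main__":
    # Toy example: 3 text queries, 3 videos, text i matches video i.
    sim = [
        [0.9, 0.1, 0.2],  # correct video ranked first
        [0.3, 0.2, 0.8],  # correct video ranked last
        [0.1, 0.4, 0.6],  # correct video ranked first
    ]
    print(recall_at_k(sim, gt=[0, 1, 2], ks=(1, 2, 3)))
```

Under this definition, an r@1 of 0.233 on the 1k test split means 233 of the 1,000 text queries retrieved their ground-truth video at rank 1.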

@tsujuifu
Owner

tsujuifu commented May 6, 2022

The files in this repo are partial examples that show how to format the input data.

Here is my txt_msrvtt-retrieval.json
I have checked it, and it seems to be the same 😊.

@Flowerfan
Author

I just tested your file with ckpt_violet_pretrain.pt using your repo, but still got 'r@1': 0.233, 'r@5': 0.533 😂. I have no idea what's wrong.

@Flowerfan
Author

Hi, I am wondering how you process the YouCook2 dataset for evaluation, since one video contains multiple clip-text pairs. I extracted the clip-text pairs (3400) for evaluation and got very disappointing performance.

@siyangssy

> I just tested your file with ckpt_violet_pretrain.pt using your repo, but still got 'r@1': 0.233, 'r@5': 0.533 😂. I have no idea what's wrong.

I get the same result as you: 'r@1': 0.233, 'r@5': 0.533.
Have you solved the problem?
