
joint training hyper parameters #7

Closed
lxa9867 opened this issue Apr 4, 2022 · 9 comments

lxa9867 commented Apr 4, 2022

Hi,

Thank you for sharing your work. I am writing to inquire about the hyperparameters used for joint training. The arXiv paper mentions that
[screenshot of the paper's training details]
which says that the joint training uses 32 V100 GPUs and 2 video clips for each GPU.

I assume this means 32 GB V100 GPUs, but I don't think it is possible to fit 2 video clips within 32 GB of memory. I cannot reproduce the results using 8 V100 32 GB GPUs with 1 clip per GPU. Could you give me some advice? Thank you!

lxa9867 (Author) commented Apr 5, 2022

BTW, could you share your script for pretraining? Thanks!

wjn922 (Owner) commented Apr 5, 2022

Hi,

We use 32 V100 GPUs with 32 GB memory. And sorry for the mistake: it is actually 1 clip per GPU for joint training and 2 for pretraining.
The joint training and pretraining stages each take around 1~2 days, depending on the backbone.

We do not support multi-node training in the repo for now. For single-node training, the pretraining script is:

python3 -m torch.distributed.launch --nproc_per_node=8  --use_env \
main_pretrain.py  --dataset_file all --binary --with_box_refine \
--batch_size 2 --num_frames 1 \
--epochs 12 --lr_drop 8 10 \
[backbone]
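
For reference, the effective batch size implied above is 32 GPUs × 1 clip = 32 clips per iteration for joint training, versus 8 GPUs × 1 clip = 8 in the setup described in this issue. Below is a hypothetical sketch (generic PyTorch, not this repo's code) of approximating a larger effective batch on fewer GPUs with gradient accumulation; the model, data, and learning rate are placeholders.

# Hypothetical sketch, not from this repo: approximate the paper's joint-training
# effective batch (32 GPUs x 1 clip = 32 clips/iter) on 8 GPUs x 1 clip by
# accumulating gradients over 4 micro-batches per GPU before each update.
import torch

paper_gpus, paper_clips_per_gpu = 32, 1   # setting reported in the paper
local_gpus, local_clips_per_gpu = 8, 1    # setup described in this issue
accum_steps = (paper_gpus * paper_clips_per_gpu) // (local_gpus * local_clips_per_gpu)  # = 4

model = torch.nn.Linear(16, 1)            # placeholder for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(8):                     # placeholder data/training loop
    x = torch.randn(local_clips_per_gpu, 16)
    y = torch.randn(local_clips_per_gpu, 1)
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps  # rescale for accumulation
    loss.backward()
    if (step + 1) % accum_steps == 0:     # one optimizer update per 4 micro-batches
        optimizer.step()
        optimizer.zero_grad()

Note that accumulation only equalizes the effective batch size; it does not guarantee identical optimization dynamics to the multi-GPU run (e.g. batch-norm statistics still see the smaller per-step batch).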

lxa9867 (Author) commented Apr 5, 2022

Thank you for your information!

lxa9867 closed this as completed Apr 5, 2022

lxa9867 (Author) commented Apr 9, 2022

Did you freeze the text encoder during pretraining?

lxa9867 reopened this Apr 9, 2022

wjn922 (Owner) commented Apr 9, 2022

The text encoder is not frozen for pretraining, while for joint training it is frozen.
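
For clarity, here is a minimal sketch of what freezing the text encoder typically looks like, assuming a Hugging Face RoBERTa text encoder as described in the paper; the attribute names and the head module are illustrative placeholders, not this repo's actual code.

# Illustrative sketch only (not the repo's code): freeze a RoBERTa text encoder
# so that training updates everything except the language model.
import torch
from transformers import RobertaModel

text_encoder = RobertaModel.from_pretrained("roberta-base")
for p in text_encoder.parameters():
    p.requires_grad_(False)
text_encoder.eval()  # keep dropout disabled for the frozen encoder

# Placeholder container for the rest of the model; only trainable parameters
# are handed to the optimizer.
model = torch.nn.ModuleDict({
    "text_encoder": text_encoder,
    "head": torch.nn.Linear(768, 256),
})
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)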

lxa9867 (Author) commented Apr 9, 2022

I think I found the reason I cannot reproduce the results... The paper is somewhat ambiguous about this part. Thank you.

lxa9867 (Author) commented Apr 18, 2022

Hi,

May I ask why you freeze the text encoder during joint training? Since joint training doesn't require a pretraining stage, wouldn't that mean the text encoder is never trained? Thank you!

lxa9867 reopened this Apr 18, 2022

wjn922 (Owner) commented Apr 19, 2022

We experimented with whether to freeze the text encoder using the R50 backbone, and we found that freezing the text encoder gives a gain of 1+ points.

lxa9867 (Author) commented Apr 19, 2022

Thank you very much!
