
When will the adversarial pre-training code for the in-domain datasets be released? #1

Closed
youngfly11 opened this issue Dec 16, 2020 · 3 comments

Comments

@youngfly11

Hi, Zhe,

Thanks for your excellent work. I recently wanted to reproduce some results from VILLA and run pre-training on the in-domain datasets. Is it possible to simply adapt the adversarial training code in train_vqa_adv.py to the pre-training stage? Are there any specific configurations for adversarial training during pre-training?

@zhegan27
Owner

Sorry for the late response due to the holiday season. Yes, you can basically follow the adversarial training code in train_vqa_adv.py to get the adversarial pre-training code ready. We also plan to release the pre-training code ourselves. Thanks for the reminder, and please stay tuned. We will get this done as soon as possible.

Meanwhile, you can also try it yourself. There is nothing specific you need to worry about: follow the pre-training configuration file from the UNITER code base, and then add the adversarial-training-related hyper-parameters. Hope this helps!
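For illustration, here is a minimal sketch of a FreeLB-style inner loop in the spirit of train_vqa_adv.py, applied to a generic pre-training loss. `model.embed` and `model.forward_from_embeds` are hypothetical placeholders rather than the actual VILLA API (the real code perturbs the text and image embeddings separately):

```python
import torch

def adv_pretrain_step(model, batch, optimizer,
                      adv_steps=3, adv_lr=1e-3, adv_max_norm=1.0):
    # Probe the embedding shape without building a graph.
    with torch.no_grad():
        probe = model.embed(batch)          # hypothetical helper
    delta = torch.zeros_like(probe, requires_grad=True)

    for _ in range(adv_steps):
        embeds = model.embed(batch)         # fresh graph for each inner step
        # Pre-training loss (e.g. MLM/ITM) on the perturbed embeddings;
        # dividing by adv_steps accumulates "free" gradients in the model.
        loss = model.forward_from_embeds(embeds + delta, batch)
        (loss / adv_steps).backward()

        # Gradient *ascent* on the perturbation, with an L2-normalized step.
        grad = delta.grad.detach()
        denom = grad.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-8)
        delta = (delta + adv_lr * grad / denom).detach()
        if adv_max_norm > 0:                # optional projection onto an L2 ball
            delta = delta.renorm(p=2, dim=0, maxnorm=adv_max_norm)
        delta.requires_grad_()

    optimizer.step()
    optimizer.zero_grad()
```

The knobs in this sketch (number of inner steps, adversarial learning rate, norm bound) are the kind of adversarial-training hyper-parameters that would be added on top of the UNITER pre-training config.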

Best,
Zhe

@youngfly11
Author

Hi, Zhe,
Thanks for your response. I have a follow-up question. When I run the pre-training code in this VILLA repo, training is very slow with the default setting (4 data-loading workers), and GPU utilization is very low. When I set the worker count to 8 or higher, it raises the error shown in the screenshot below.
Did you see the same behavior during training? How fast was your pre-training?

[screenshot: error raised with a higher worker count]
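As an aside on the worker setting: in plain PyTorch, raising the worker count increases RAM and shared-memory pressure, and crashes at higher counts are often caused by a too-small /dev/shm when running inside Docker (mitigated with a larger --shm-size). A generic sketch of the relevant DataLoader knobs, not the exact VILLA launcher flags:

```python
from torch.utils.data import DataLoader

loader = DataLoader(
    pretrain_dataset,        # placeholder for the actual pre-training dataset
    batch_size=32,
    num_workers=8,           # each extra worker costs RAM and shared memory
    pin_memory=True,         # faster host-to-GPU copies
    persistent_workers=True, # keep workers alive across epochs (PyTorch >= 1.7)
)
```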

@zhegan27
Owner

Thanks for trying our code. Empirically, we did not encounter the problem you mentioned. How low is your GPU utilization?

We ran the pre-training code on our internal Microsoft GPU clusters and did not observe low utilization there. It may be caused by your RAM size, disk speed, or other constraints. When you tried the fine-tuning code, did you also see the same low-utilization problem? Thanks.
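A quick way to confirm whether the input pipeline is the bottleneck is to time the loader alone and compare it with the full step time; a rough sketch, making no assumptions about the specific VILLA loaders:

```python
import time

def time_data_loading(loader, n_batches=50):
    """Average seconds per batch for data loading alone (no GPU work)."""
    it = iter(loader)
    start = time.time()
    for _ in range(n_batches):
        next(it)
    return (time.time() - start) / n_batches

# If this is close to the full step time (loading + forward/backward),
# the GPU is starved by disk/RAM I/O rather than limited by compute.
```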

Best,
Zhe
