
Can't replicate results of BBTv2 paper #16

Closed
jordane95 opened this issue Sep 13, 2022 · 2 comments

Comments

@jordane95

Hi, I tried your BBTv2 code but could not get results comparable to those reported in your paper.

In my case, using the command

python deepbbt.py   --model_name "roberta-large"  --task_name "snli"   --n_prompt_tokens 50   --intrinsic_dim 500   --k_shot 16   --device "cuda:0"   --seed 42   --loss_type "ce"   --cat_or_add "add"   --random_proj "normal"   --sigma1 1   --sigma2 0.2   --popsize 20   --bound 0   --budget 8000   --print_every 50   --eval_every 100

gives the following results

Done. Elapsed time: 39.49383888641993 (mins)
Evaluate on test data...
Evaluate data in 75.54 seconds!                                                                                                                                                                                     
[tester] 
SNLIMetric: acc=0.5509975570032574, hinge=2.8394456026220167, ce=11.656479801339513
Test acc: 0.551

which is higher than the other gradient-free baselines but much lower than the number reported in your paper (60.62).

I'm wondering why. Do I need to tune the random seed?
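For anyone else reading along, the `--n_prompt_tokens 50` / `--intrinsic_dim 500` / `--random_proj "normal"` flags correspond to BBT's core trick: the optimizer searches a low-dimensional vector, and a frozen random matrix (initialized from a normal distribution, scaled by the sigma flags) projects it up into the soft-prompt embedding space. A minimal NumPy sketch of that projection step, with illustrative names and shapes rather than the repo's actual code:

```python
import numpy as np

intrinsic_dim = 500    # dimension the gradient-free optimizer actually searches
n_prompt_tokens = 50   # soft-prompt length
embed_dim = 1024       # roberta-large hidden size

rng = np.random.default_rng(42)

# Frozen random projection, drawn once from a normal distribution
# (the --random_proj "normal" setting); never updated during optimization.
A = rng.normal(loc=0.0, scale=1.0,
               size=(n_prompt_tokens * embed_dim, intrinsic_dim))

z = rng.normal(size=intrinsic_dim)                    # candidate from the optimizer
prompt = (A @ z).reshape(n_prompt_tokens, embed_dim)  # soft prompt fed to the LM
print(prompt.shape)  # (50, 1024)
```

Because only the 500-dimensional `z` is optimized (by CMA-ES, with `--popsize 20` candidates per iteration and an overall `--budget` of forward passes), the search stays tractable even though the prompt itself lives in a 50 × 1024 space.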

@txsun1997
Owner

Hi @jordane95

We've updated our code and results recently. Please check the latest implementation (mainly in bbt.py and deepbbt.py). The new results can be found in Google Sheets, where we list BBTv2's performance for each random seed so that you can exactly reproduce our reported numbers. See also the latest paper for technical details.

@jordane95
Author

Hi @txsun1997

Thanks for the update! With the new code and hyperparameters, I can successfully replicate the results.

********* Evaluated on dev set *********
Dev loss: 1.0993. Dev perf: 0.4583. Best dev perf: 0.625
********* Done *********
[# API Calls 7650] loss: 0.5777. Current perf: 0.75. Best perf so far: 0.9375
Done. Elapsed time: 65.36737917661667 (mins)
Evaluate on test data...
Evaluate data in 99.46 seconds!                                                                  
[tester] 
SNLIMetric: acc=0.5991449511400652, hinge=2.5979726629070816, ce=8.909537101024913
Test acc: 0.5991
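Since the reported numbers are per-seed, exact replication also depends on every randomness source being pinned to the same seed (which the repo's `--seed` flag presumably handles). A generic sketch of seed pinning for the stdlib and NumPy; the commented lines show the framework-specific calls a PyTorch project like this one would add:

```python
import os
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Pin stdlib and NumPy randomness to one seed for reproducibility."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # In a PyTorch project, also:
    # torch.manual_seed(seed)
    # torch.cuda.manual_seed_all(seed)


set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
print(bool(np.allclose(a, b)))  # True: same seed, same draws
```

Note that even with pinned seeds, results can still drift across library versions or GPU/CUDA nondeterminism, so matching the repo's environment matters too.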
