Hi, I tried your BBTv2 code but failed to get results comparable to those reported in your paper.
In my case, using the command
gives the following results
Done. Elapsed time: 39.49383888641993 (mins)
Evaluate on test data...
Evaluate data in 75.54 seconds!
[tester]
SNLIMetric: acc=0.5509975570032574, hinge=2.8394456026220167, ce=11.656479801339513
Test acc: 0.551
which is higher than other gradient-free baselines but much smaller than the number reported in your paper (60.62).
I'm wondering why. Do I need to tune the random seed?
We've updated our code and results recently. Please check the latest implementation (mainly in bbt.py and deepbbt.py). The new results can be found in the Google Sheet, where we list BBTv2's performance under each random seed so that you can exactly reproduce our reported numbers. See also the latest version of the paper for technical details.
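Since the reported numbers are tied to specific random seeds, reproducing them requires pinning every RNG before a run. The sketch below shows the idea with a hypothetical `set_seed` helper using only Python's stdlib RNG; the actual bbt.py/deepbbt.py would additionally seed NumPy and PyTorch, and may structure this differently.

```python
import random


def set_seed(seed: int) -> None:
    """Hypothetical helper: pin every RNG that influences a run.

    The real BBT code would also need, e.g.:
        np.random.seed(seed)
        torch.manual_seed(seed)
    (omitted here to keep the sketch dependency-free).
    """
    random.seed(seed)


# Two runs with the same seed draw identical random numbers,
# so the whole optimization trajectory is repeatable.
set_seed(8)
first = [random.random() for _ in range(3)]
set_seed(8)
second = [random.random() for _ in range(3)]
assert first == second
```

With such a helper, passing the same seed value the authors list should make a run deterministic up to library-version and hardware nondeterminism.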
Thanks for the update! With the new code and hyperparameters, I can successfully replicate the results.
********* Evaluated on dev set *********
Dev loss: 1.0993. Dev perf: 0.4583. Best dev perf: 0.625
********* Done *********
[# API Calls 7650] loss: 0.5777. Current perf: 0.75. Best perf so far: 0.9375
Done. Elapsed time: 65.36737917661667 (mins)
Evaluate on test data...
Evaluate data in 99.46 seconds!
[tester]
SNLIMetric: acc=0.5991449511400652, hinge=2.5979726629070816, ce=8.909537101024913
Test acc: 0.5991
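For context on the `[tester]` output, the acc / hinge / ce figures are standard classification metrics over the model's logits. The function below is an illustrative stand-in, not the repo's actual `SNLIMetric` (whose exact reductions and scaling may differ).

```python
import math


def snli_metrics(logits, labels):
    """Toy accuracy / multi-class hinge / cross-entropy over raw logits.

    This mirrors the kind of numbers printed in the [tester] lines,
    but the definitions here are assumptions, not the repo's code.
    """
    n = len(labels)
    # Accuracy: fraction of examples whose argmax logit is the gold label.
    acc = sum(max(range(len(l)), key=l.__getitem__) == y
              for l, y in zip(logits, labels)) / n
    # Hinge: penalize when the gold logit fails to beat the best
    # wrong logit by a margin of 1.
    hinge = sum(max(0.0, 1.0 - (l[y] - max(v for i, v in enumerate(l) if i != y)))
                for l, y in zip(logits, labels)) / n
    # Cross-entropy: -log softmax probability of the gold label,
    # computed as log-sum-exp(logits) - gold logit.
    ce = sum(math.log(sum(math.exp(v) for v in l)) - l[y]
             for l, y in zip(logits, labels)) / n
    return acc, hinge, ce


acc, hinge, ce = snli_metrics([[2.0, 0.5, -1.0], [0.1, 0.3, 0.0]], [0, 2])
```

On this toy input, the first example is classified correctly and the second is not, so `acc` comes out to 0.5; the hinge and cross-entropy terms are averaged per example in the same way.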