Hi, I tried your BBTv2 code but failed to get results comparable to those reported in your paper.
In my case, using the command
gives the following results
Done. Elapsed time: 39.49383888641993 (mins)
Evaluate on test data...
Evaluate data in 75.54 seconds!
[tester]
SNLIMetric: acc=0.5509975570032574, hinge=2.8394456026220167, ce=11.656479801339513
Test acc: 0.551
which is higher than other gradient-free baselines but much smaller than the number reported in your paper (60.62).
I'm wondering why. Do I need to tune the random seed?
We've updated our code and results recently. Please check the latest implementation (mainly in bbt.py and deepbbt.py). The new results can be found in the Google Sheet, where we list BBTv2's performance under each random seed so that you can exactly reproduce our reported numbers. See also the latest version of the paper for technical details.
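Since the reported numbers are tied to specific random seeds, reproducing them requires pinning every RNG before a run. The sketch below shows the idea with a hypothetical `set_seed` helper using only Python's stdlib RNG; the actual bbt.py/deepbbt.py would additionally seed NumPy and PyTorch, and may structure this differently.

```python
import random


def set_seed(seed: int) -> None:
    """Hypothetical helper: pin every RNG that influences a run.

    The real BBT code would also need, e.g.:
        np.random.seed(seed)
        torch.manual_seed(seed)
    (omitted here to keep the sketch dependency-free).
    """
    random.seed(seed)


# Two runs with the same seed draw identical random numbers,
# so the whole optimization trajectory is repeatable.
set_seed(8)
first = [random.random() for _ in range(3)]
set_seed(8)
second = [random.random() for _ in range(3)]
assert first == second
```

With such a helper, passing the same seed value the authors list should make a run deterministic up to library-version and hardware nondeterminism.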
Thanks for the update! With the new code and hyperparameters, I can successfully replicate the results.
********* Evaluated on dev set *********
Dev loss: 1.0993. Dev perf: 0.4583. Best dev perf: 0.625
********* Done *********
[# API Calls 7650] loss: 0.5777. Current perf: 0.75. Best perf so far: 0.9375
Done. Elapsed time: 65.36737917661667 (mins)
Evaluate on test data...
Evaluate data in 99.46 seconds!
[tester]
SNLIMetric: acc=0.5991449511400652, hinge=2.5979726629070816, ce=8.909537101024913
Test acc: 0.5991
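For context on the `[tester]` output, the acc / hinge / ce figures are standard classification metrics over the model's logits. The function below is an illustrative stand-in, not the repo's actual `SNLIMetric` (whose exact reductions and scaling may differ).

```python
import math


def snli_metrics(logits, labels):
    """Toy accuracy / multi-class hinge / cross-entropy over raw logits.

    This mirrors the kind of numbers printed in the [tester] lines,
    but the definitions here are assumptions, not the repo's code.
    """
    n = len(labels)
    # Accuracy: fraction of examples whose argmax logit is the gold label.
    acc = sum(max(range(len(l)), key=l.__getitem__) == y
              for l, y in zip(logits, labels)) / n
    # Hinge: penalize when the gold logit fails to beat the best
    # wrong logit by a margin of 1.
    hinge = sum(max(0.0, 1.0 - (l[y] - max(v for i, v in enumerate(l) if i != y)))
                for l, y in zip(logits, labels)) / n
    # Cross-entropy: -log softmax probability of the gold label,
    # computed as log-sum-exp(logits) - gold logit.
    ce = sum(math.log(sum(math.exp(v) for v in l)) - l[y]
             for l, y in zip(logits, labels)) / n
    return acc, hinge, ce


acc, hinge, ce = snli_metrics([[2.0, 0.5, -1.0], [0.1, 0.3, 0.0]], [0, 2])
```

On this toy input, the first example is classified correctly and the second is not, so `acc` comes out to 0.5; the hinge and cross-entropy terms are averaged per example in the same way.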