
The Best Results. #42

Closed
liuxingpeng520521 opened this issue Jan 24, 2024 · 4 comments

@liuxingpeng520521
Could you please share the log file of your best results on GPT-4? Also, have you run any tests on the 13 test files presented in Proverbot9001, and if so, what were the results?

@wayhoww

wayhoww commented Jul 23, 2024

Hi @liuxingpeng520521 and the author, I am also trying to reproduce the results presented in the paper.

I am using the following command to run the experiment:

python src/main/eval_benchmark.py prompt_settings=lean_dfs env_settings=bm25_retrieval eval_settings=n_60_dfs_gpt4_always_retrieve_no_ex benchmark=simple_benchmark_lean

However, none of the lemmas in simple_benchmark_lean were successfully proved. Could you please help me by providing the correct command?

@amit9oct
Collaborator

Can you share your log files from the .log folder, especially the eval.log file?

@wayhoww

wayhoww commented Jul 24, 2024

Sure. Please find the logs in all-logs.zip.

I think there might be an issue with the prompt template.

@amit9oct
Collaborator

I used the following settings, and they work smoothly:

defaults:
  - env_settings: bm25_retrieval
  - benchmark: simple_benchmark_lean
  - eval_settings: n_60_dfs_gpt4_128k_always_retrieve_no_ex
  - prompt_settings: lean_dfs
  - override hydra/job_logging: 'disabled'

eval_settings:
  timeout_in_secs: 200
  proof_retries: 1
  temperature: 0

You can replace the contents of src/main/experiments.yaml with the above and then run python src/main/eval_benchmark.py, which should work. Adjust the timeout and retry count based on your requirements.
I'm also attaching the generated eval.log, which contains the proofs and the individual prompts for all the queries made to GPT-4.
eval_simple_benchmark_lean.log
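For anyone unsure how the eval_settings block interacts with the defaults list: Hydra composes the named defaults first, then top-level keys in experiments.yaml override the composed values. A minimal sketch of that merge behavior in plain Python (this does not use Hydra itself, and the baseline values shown are hypothetical placeholders, not the repo's actual defaults):

```python
# Sketch of Hydra-style config composition: top-level overrides in
# experiments.yaml win over values composed from the defaults list.
# Plain-Python illustration only; Hydra/OmegaConf do this internally.

def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override values win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical values composed from the defaults list
# (eval_settings: n_60_dfs_gpt4_128k_always_retrieve_no_ex).
composed_defaults = {
    "eval_settings": {
        "name": "n_60_dfs_gpt4_128k_always_retrieve_no_ex",
        "timeout_in_secs": 60,   # placeholder baseline, not the real default
        "proof_retries": 3,      # placeholder baseline
        "temperature": 0.7,      # placeholder baseline
    }
}

# The overrides from the experiments.yaml snippet above.
overrides = {
    "eval_settings": {
        "timeout_in_secs": 200,
        "proof_retries": 1,
        "temperature": 0,
    }
}

config = merge(composed_defaults, overrides)
print(config["eval_settings"]["timeout_in_secs"])  # -> 200
```

Note that keys not mentioned in the override block (here, name) keep their composed values, which is why the snippet in the comment above only lists the three settings being changed.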
