
The Best Results. #42

Closed
liuxingpeng520521 opened this issue Jan 24, 2024 · 4 comments

@liuxingpeng520521
Could you please share the log file of your best results on GPT-4? Also, have you run any tests on the 13 test files presented in Proverbot9001, and if so, what were the results?

@wayhoww

wayhoww commented Jul 23, 2024

Hi @liuxingpeng520521 and the author, I am also trying to reproduce the results presented in the paper.

I am using the following command to run the experiment:

python src/main/eval_benchmark.py prompt_settings=lean_dfs env_settings=bm25_retrieval eval_settings=n_60_dfs_gpt4_always_retrieve_no_ex benchmark=simple_benchmark_lean

However, none of the lemmas in simple_benchmark_lean were successfully proved. Could you please help me by providing the correct command?

@amit9oct
Collaborator

Can you share your log files from the .log folder, especially the eval.log file?

@wayhoww

wayhoww commented Jul 24, 2024

Sure. Please find the logs in all-logs.zip.

I think there might be an issue with the prompt template.

@amit9oct
Collaborator

I used the following settings, and they work smoothly:

defaults:
  - env_settings: bm25_retrieval
  - benchmark: simple_benchmark_lean
  - eval_settings: n_60_dfs_gpt4_128k_always_retrieve_no_ex
  - prompt_settings: lean_dfs
  - override hydra/job_logging: 'disabled'

eval_settings:
  timeout_in_secs: 200
  proof_retries: 1
  temperature: 0

You can replace the contents of src/main/experiments.yaml with the above and then run python src/main/eval_benchmark.py, which should work. Adjust the timeout and retry count based on your requirements.
I'm also attaching the generated eval.log, which contains the proofs and the individual prompts for all the queries made to GPT-4.
eval_simple_benchmark_lean.log
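For anyone unsure how the eval_settings block interacts with the defaults list: Hydra composes the named defaults first, then top-level keys in experiments.yaml override the composed values. A minimal sketch of that merge behavior in plain Python (this does not use Hydra itself, and the baseline values shown are hypothetical placeholders, not the repo's actual defaults):

```python
# Sketch of Hydra-style config composition: top-level overrides in
# experiments.yaml win over values composed from the defaults list.
# Plain-Python illustration only; Hydra/OmegaConf do this internally.

def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override values win."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical values composed from the defaults list
# (eval_settings: n_60_dfs_gpt4_128k_always_retrieve_no_ex).
composed_defaults = {
    "eval_settings": {
        "name": "n_60_dfs_gpt4_128k_always_retrieve_no_ex",
        "timeout_in_secs": 60,   # placeholder baseline, not the real default
        "proof_retries": 3,      # placeholder baseline
        "temperature": 0.7,      # placeholder baseline
    }
}

# The overrides from the experiments.yaml snippet above.
overrides = {
    "eval_settings": {
        "timeout_in_secs": 200,
        "proof_retries": 1,
        "temperature": 0,
    }
}

config = merge(composed_defaults, overrides)
print(config["eval_settings"]["timeout_in_secs"])  # -> 200
```

Note that keys not mentioned in the override block (here, name) keep their composed values, which is why the snippet in the comment above only lists the three settings being changed.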
