[Reproducing Results] on Alfworld #28

ai-nikolai · 2024-03-08T14:09:32Z

Dear Authors,

Thank you for the great work on introducing ReAct.

Since the original model you used, text-davinci-002, has been deprecated by OpenAI, the two closest alternatives are gpt-3.5-turbo and davinci-002. The best success rate we get on, e.g., the first 10 environments is 0.3, whereas the reported result on the first 10 ALFWorld environments is 0.7.

Could you share the traces, or advise on what your latest scores on this environment are and how to reproduce your score of 0.7? @ysymyth @john-b-yang @descrip

Thanks.


ai-nikolai · 2024-03-20T15:23:38Z

@ysymyth - any update on the above? Thank you.

ysymyth · 2024-04-04T23:37:56Z

Have you tried to run on more than 10 cases? 10 seems too noisy to tell anything.
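To make the "10 is too noisy" point concrete, here is a minimal sketch (not from the ReAct repo) that computes an approximate 95% Wilson score interval for a success rate, treating each environment as an independent pass/fail trial. That independence is an assumption, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial success rate.

    Assumes each environment is an independent pass/fail trial.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


# Compare 3/10 vs 7/10 on the first 10 environments.
print(wilson_interval(3, 10))
print(wilson_interval(7, 10))
```

With only 10 environments, the interval for 3/10 reaches above 0.6 and the interval for 7/10 dips below 0.4, so the two overlap. Over the full test set the interval is much narrower.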

ai-nikolai · 2024-04-09T14:01:40Z

@ysymyth Thanks a lot for coming back to me on this one.

Yes, we have run the evaluation several times on the entire test set (i.e. 135 environments), using both your code and our own implementation.

Did you make any modifications beyond those present in your code in this repo, for example:

- Correcting common syntax mistakes made by the model (e.g. "put object x in place y" -> "put object x in/on place y")?
- Resampling outputs with increasing temperature?
- Anything else?
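For concreteness, here is a minimal sketch of the two post-processing tricks asked about above. It is not code from the ReAct repo. It assumes the environment expects the literal ALFWorld-style form "put <obj> in/on <recep>". `generate` is a hypothetical stand-in for an LLM call taking `(prompt, temperature)`, and `is_valid` is a hypothetical check, e.g. membership in the admissible commands.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical pattern: assumes the environment wants "put <obj> in/on <recep>".
_PUT_PATTERN = re.compile(r"^put (?P<obj>.+?) (?:in|on|into|onto) (?P<recep>.+)$")


def normalise_action(action: str) -> str:
    """Rewrite 'put X in Y' / 'put X on Y' to 'put X in/on Y'; other actions pass through."""
    action = action.strip()
    if "in/on" in action:
        return action  # already in the expected form
    match = _PUT_PATTERN.match(action)
    if match:
        return f"put {match.group('obj')} in/on {match.group('recep')}"
    return action


def sample_with_rising_temperature(
    generate: Callable[[str, float], str],  # hypothetical LLM call: (prompt, temperature) -> text
    prompt: str,
    is_valid: Callable[[str], bool],
    temperatures: Sequence[float] = (0.0, 0.3, 0.6, 1.0),
) -> Optional[str]:
    """Resample at increasing temperatures until the normalised action is valid."""
    for temperature in temperatures:
        candidate = normalise_action(generate(prompt, temperature))
        if is_valid(candidate):
            return candidate
    return None  # no valid action found at any temperature
```

Both tricks can silently raise the success rate, so any such post-processing should be reported alongside the numbers being compared.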

okhat · 2024-06-18T23:42:34Z

Yeah, I can confirm that the newer GPT-3.5 models (both the latest chat model and instruct) perform really poorly, around 30%, when running a close clone of the ALFWorld notebook.

I would have just said that GPT-3.5 got much worse on this task or needs new prompts, but I see a recent paper that uses gpt-3.5-instruct with decent ReAct results (54%) here: https://arxiv.org/pdf/2405.17402

So @ai-nikolai and I might be missing something.

ysymyth · 2024-08-07T03:28:29Z

I think reproducibility will only get harder and harder... Closing for now, but feel free to reopen.

ysymyth closed this as completed Aug 7, 2024
