[Reproducing Results] on Alfworld #28

Open
ai-nikolai opened this issue Mar 8, 2024 · 3 comments

Comments

@ai-nikolai

Dear Authors,

Thank you for the great work on introducing ReAct.

Since the original model you used, text-davinci-002, has been deprecated by OpenAI, the two closest alternatives are gpt-3.5-turbo and davinci-002. With these, the best score we get on, e.g., the first 10 Alfworld environments is 0.3, whereas the reported score on those same 10 environments is 0.7.

Could you share the traces, or advise on what your latest scores on this environment are, or how to reproduce your score of 0.7? @ysymyth @john-b-yang @descrip

Thanks.

@ai-nikolai
Author

@ysymyth - any update on the above? Thank you.

@ysymyth
Owner

ysymyth commented Apr 4, 2024

Have you tried running on more than 10 cases? 10 cases seem too noisy to tell anything.
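To make the noise argument concrete, one common way (among several) to quantify uncertainty in a success rate measured over n environments is a Wilson score interval. The sketch below is illustrative only; neither participant in the thread says which interval, if any, they used, and `success_rate_ci` is a hypothetical helper name.

```python
import math


def success_rate_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (point estimate, lower, upper) of a Wilson score interval.

    z = 1.96 gives an approximate 95% interval. Requires n > 0.
    """
    p = successes / n
    denom = 1 + z**2 / n
    # Wilson interval: center is shrunk toward 0.5, half-width accounts for small n.
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return p, center - half, center + half


if __name__ == "__main__":
    # Compare 3/10 vs 7/10 on 10 environments: the intervals overlap.
    for k in (3, 7):
        p, lo, hi = success_rate_ci(k, 10)
        print(f"{k}/10: p={p:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

With only 10 environments, the approximate 95% intervals for 3/10 and 7/10 overlap, so that gap alone is weak evidence. Over all 135 environments the interval width shrinks to roughly a third of its n = 10 size, which is why full-test-set numbers are more informative.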

@ai-nikolai
Author

@ysymyth Thanks a lot for coming back to me on this one.

Yes, we have run the evaluation several times on the entire test set (i.e., 135 environments). We used both your code and our own implementation.

Did you make any modifications beyond those present in the code in this repo?

  • Correcting the model's common syntax mistakes (e.g., put object x in place y -> put object x in/on place y)?
  • Sampling outputs again with increasing temperature?
  • Anything else?
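For reference, here is a hypothetical sketch of what the first two modifications could look like. The thread does not confirm that the authors did either. It assumes ALFWorld's "put X in/on Y" command form. `generate` (a model call taking a prompt and temperature) and `is_valid` (e.g., a membership check against the environment's admissible commands) are placeholder callables, not part of the ReAct repo.

```python
import re
from typing import Callable, Optional

# Hypothetical normalizer: map "put X in Y", "put X on Y", "put X into Y", etc.
# to the "put X in/on Y" form. Longer alternatives are listed first.
_PUT_PATTERN = re.compile(r"^put (?P<obj>.+?) (?:in/on|into|onto|in|on) (?P<recep>.+)$")


def normalize_action(action: str) -> str:
    """Rewrite put-commands into 'put X in/on Y'; leave other actions unchanged."""
    action = action.strip()
    m = _PUT_PATTERN.match(action)
    if m:
        return f"put {m.group('obj')} in/on {m.group('recep')}"
    return action


def sample_with_escalation(
    generate: Callable[[str, float], str],
    prompt: str,
    is_valid: Callable[[str], bool],
    temperatures: tuple[float, ...] = (0.0, 0.3, 0.6, 0.9),
) -> Optional[str]:
    """Resample with increasing temperature until a normalized action is valid.

    Returns None if no temperature in the schedule yields a valid action.
    """
    for t in temperatures:
        action = normalize_action(generate(prompt, t))
        if is_valid(action):
            return action
    return None
```

If modifications like these were applied in the original runs but not in the released code, they could plausibly account for part of a score gap, which is why the question matters for reproducibility.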
