
[BUG] #15

Closed
sherlock1987 opened this issue Jun 12, 2020 · 8 comments
Labels: bug (Something isn't working)

sherlock1987 commented Jun 12, 2020

Describe the bug
This platform could not train on top of the MLE model. When I load the MLE model for GDPL, PPO, or PG, training runs with no problem, but it never reaches the optimal score (I run evaluate.py to check the models). In fact, the score goes down after a few epochs. Here is a graph I made for GDPL, and PPO looks very similar to it.

To Reproduce
Steps to reproduce the behavior:
Simply run train.py in PG/GDPL/PPO and you will hit this issue. I wrote a script that evaluates all of the models in one directory (roughly like the sketch below), and here is the graph I made for GDPL.
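Roughly, the script does something like this (a minimal sketch, not the exact script; the checkpoint naming under the save directory and the "All <n> <rate>" line parsed from evaluate.py's output are assumptions based on this thread, not guaranteed by ConvLab-2):

```python
# Sketch: evaluate every saved checkpoint in one directory by calling
# ConvLab-2's policy/evaluate.py on each of them.
# Assumptions: checkpoints are named like "<epoch>_ppo.*" under save_dir,
# evaluate.py accepts --model_name / --load_path as used in this thread,
# and it prints a line such as "All 100 0.73" with the success rate last.
import glob
import os
import re
import subprocess

save_dir = "ppo/save"   # directory with the saved checkpoints
model_name = "PPO"      # or "GDPL" / "PG"

# Build the unique load-path prefixes, e.g. "ppo/save/9_ppo".
prefixes = set()
for path in glob.glob(os.path.join(save_dir, "*_ppo*")):
    epoch = os.path.basename(path).split("_ppo")[0]
    prefixes.add(os.path.join(save_dir, epoch + "_ppo"))

def epoch_of(prefix):
    # Sort checkpoints by their numeric epoch where possible.
    m = re.search(r"(\d+)_ppo$", prefix)
    return int(m.group(1)) if m else -1

for prefix in sorted(prefixes, key=epoch_of):
    out = subprocess.run(
        ["python", "evaluate.py", "--model_name", model_name, "--load_path", prefix],
        capture_output=True, text=True,
    ).stdout
    # Pull the success rate out of the "All <n> <rate>" line, if present.
    m = re.search(r"^All\s+\d+\s+([\d.]+)", out, re.MULTILINE)
    print(prefix, m.group(1) if m else "no score found")
```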

Expected behavior
The evaluation score should go higher after loading the MLE model.
Score for PPO:
[0.52, 0.53, 0.54, 0.49, 0.44, 0.49, 0.46, 0.47, 0.44, 0.43, 0.42, 0.44, 0.47, 0.48, 0.49, 0.46, 0.45, 0.46, 0.45, 0.48, 0.46, 0.48, 0.49, 0.49, 0.49, 0.48, 0.45, 0.47, 0.43, 0.43, 0.43, 0.42, 0.42, 0.41, 0.42, 0.43, 0.44, 0.47, 0.45, 0.43]
So the maximum for PPO only reaches 0.53, not around 0.74.

sherlock1987 added the bug label on Jun 12, 2020
zqwerty (Member) commented Jun 12, 2020

Thanks! We will check it in a few days.

zqwerty (Member) commented Jun 12, 2020

Please check whether you re-installed ConvLab-2, since the code has changed (pip install -e . in ConvLab-2).

sherlock1987 (Author) commented
Yeah, I have already updated to the new code; I'm sure about that.

zqwerty (Member) commented Jun 12, 2020

I have tried these:
in mle/multiwoz/: python train.py
in ppo/: python train.py --load_path ../mle/multiwoz/save/best_mle
in policy/: python evaluate.py --model_name PPO --load_path ppo/save/9_ppo (I chose 9_ppo randomly)
and got:

All 100 0.73
reward: 11.143058441558441

sherlock1987 (Author) commented
Oh, that is cool. For me PPO also works fine, but for GDPL there are still some problems.

LinZichuan commented Jul 15, 2020

> I have tried these:
> in mle/multiwoz/: python train.py
> in ppo/: python train.py --load_path ../mle/multiwoz/save/best_mle
> in policy/: python evaluate.py --model_name PPO --load_path ppo/save/9_ppo (I chose 9_ppo randomly)
> and got:
>
> All 100 0.73
> reward: 11.143058441558441

Hi @zqwerty, thanks for the tips. This works for commit 2422980!
But when I run these commands on the latest commit (c6372b1), the problem is still unsolved: the performance of PPO still rises to about 65% at the beginning and then starts dropping at the later training stage (to around 35%).
Could you help look into it? Thanks!

zqwerty (Member) commented Jul 16, 2020

move to #54

ShuoZhangXJTU commented

> I have tried these:
> in mle/multiwoz/: python train.py
> in ppo/: python train.py --load_path ../mle/multiwoz/save/best_mle
> in policy/: python evaluate.py --model_name PPO --load_path ppo/save/9_ppo (I chose 9_ppo randomly)
> and got:
>
> All 100 0.73
> reward: 11.143058441558441

Hey guys, I tested PPO on the latest version of ConvLab-2 today and got a success rate of 84%, which is much higher than the reported 73%. I wonder if there are any mistakes? If not, I think the performance record should be updated.
