
[BUG] #15

Closed
sherlock1987 opened this issue Jun 12, 2020 · 8 comments
Labels: bug (Something isn't working)

sherlock1987 commented Jun 12, 2020

Describe the bug
This platform could not train on top of the MLE model. When I load the MLE model for GDPL, PPO, or PG, training runs with no problem, but it never reaches the optimal score (I run evaluate.py to check the models). In fact, the score goes down after a few epochs. Here is a graph I made for GDPL, and PPO looks very similar to it.

To Reproduce
Steps to reproduce the behavior:
Simply run train.py in PG/GDPL/PPO and you will hit this issue. I wrote a script that evaluates all of the models in one directory (roughly like the sketch below), and here is the graph I made for GDPL.
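Roughly, the script does something like this (a minimal sketch, not the exact script; the checkpoint naming under the save directory and the "All <n> <rate>" line parsed from evaluate.py's output are assumptions based on this thread, not guaranteed by ConvLab-2):

```python
# Sketch: evaluate every saved checkpoint in one directory by calling
# ConvLab-2's policy/evaluate.py on each of them.
# Assumptions: checkpoints are named like "<epoch>_ppo.*" under save_dir,
# evaluate.py accepts --model_name / --load_path as used in this thread,
# and it prints a line such as "All 100 0.73" with the success rate last.
import glob
import os
import re
import subprocess

save_dir = "ppo/save"   # directory with the saved checkpoints
model_name = "PPO"      # or "GDPL" / "PG"

# Build the unique load-path prefixes, e.g. "ppo/save/9_ppo".
prefixes = set()
for path in glob.glob(os.path.join(save_dir, "*_ppo*")):
    epoch = os.path.basename(path).split("_ppo")[0]
    prefixes.add(os.path.join(save_dir, epoch + "_ppo"))

def epoch_of(prefix):
    # Sort checkpoints by their numeric epoch where possible.
    m = re.search(r"(\d+)_ppo$", prefix)
    return int(m.group(1)) if m else -1

for prefix in sorted(prefixes, key=epoch_of):
    out = subprocess.run(
        ["python", "evaluate.py", "--model_name", model_name, "--load_path", prefix],
        capture_output=True, text=True,
    ).stdout
    # Pull the success rate out of the "All <n> <rate>" line, if present.
    m = re.search(r"^All\s+\d+\s+([\d.]+)", out, re.MULTILINE)
    print(prefix, m.group(1) if m else "no score found")
```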

Expected behavior
The evaluation score should go higher after loading the MLE model.
Score for PPO:
[0.52, 0.53, 0.54, 0.49, 0.44, 0.49, 0.46, 0.47, 0.44, 0.43, 0.42, 0.44, 0.47, 0.48, 0.49, 0.46, 0.45, 0.46, 0.45, 0.48, 0.46, 0.48, 0.49, 0.49, 0.49, 0.48, 0.45, 0.47, 0.43, 0.43, 0.43, 0.42, 0.42, 0.41, 0.42, 0.43, 0.44, 0.47, 0.45, 0.43]
So the maximum for PPO only reaches 0.53, not around 0.74.

sherlock1987 added the bug label on Jun 12, 2020
zqwerty (Member) commented Jun 12, 2020

Thanks! We will check it in a few days.

zqwerty (Member) commented Jun 12, 2020

Please check whether you re-installed ConvLab-2, since the code has changed (pip install -e . in ConvLab-2).

sherlock1987 (Author) commented
Yeah, I have already updated to the new code; I'm sure about that.

zqwerty (Member) commented Jun 12, 2020

I have tried these:
in mle/multiwoz/: python train.py
in ppo/: python train.py --load_path ../mle/multiwoz/save/best_mle
in policy/: python evaluate.py --model_name PPO --load_path ppo/save/9_ppo (I chose 9_ppo randomly)
and got:

All 100 0.73
reward: 11.143058441558441

sherlock1987 (Author) commented
Oh, that is cool. For me PPO also works fine, but for GDPL there are still some problems.

LinZichuan commented Jul 15, 2020

> I have tried these:
> in mle/multiwoz/: python train.py
> in ppo/: python train.py --load_path ../mle/multiwoz/save/best_mle
> in policy/: python evaluate.py --model_name PPO --load_path ppo/save/9_ppo (I chose 9_ppo randomly)
> and got:
>
> All 100 0.73
> reward: 11.143058441558441

Hi @zqwerty, thanks for the tips. This works for commit 2422980!
But when I run these commands on the latest commit (c6372b1), the problem is still unsolved: the performance of PPO still rises to about 65% at the beginning and then starts dropping at the later training stage (to around 35%).
Could you help look into it? Thanks!

zqwerty (Member) commented Jul 16, 2020

move to #54

ShuoZhangXJTU commented

> I have tried these:
> in mle/multiwoz/: python train.py
> in ppo/: python train.py --load_path ../mle/multiwoz/save/best_mle
> in policy/: python evaluate.py --model_name PPO --load_path ppo/save/9_ppo (I chose 9_ppo randomly)
> and got:
>
> All 100 0.73
> reward: 11.143058441558441

Hey guys, I tested PPO on the latest version of ConvLab-2 today and got a success rate of 84%, which is much higher than the reported 73%. I wonder if there are any mistakes? If not, I think the performance record should be updated.
