Performance check against SB3 #1

Closed
zhihanyang2022 opened this issue Jun 6, 2021 · 3 comments

@zhihanyang2022
Owner

zhihanyang2022 commented Jun 6, 2021

Following the style here, we will run

  • DDPG
  • TD3
  • SAC

on 4 PyBullet environments

  • HalfCheetahBulletEnv-v0
  • AntBulletEnv-v0
  • HopperBulletEnv-v0
  • Walker2DBulletEnv-v0

for 3 seeds, and report the mean and std of final performance.

In total, this would be 3 * 4 * 3 = 36 runs.
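
As a quick check of the run count, the grid can be enumerated directly. This is only a sketch: the lowercase algorithm names and the seed labels 1, 2, 3 are illustrative choices, not something this comment specifies.

```python
from itertools import product

# Algorithms and environments listed in this comment.
ALGOS = ["ddpg", "td3", "sac"]
ENVS = [
    "HalfCheetahBulletEnv-v0",
    "AntBulletEnv-v0",
    "HopperBulletEnv-v0",
    "Walker2DBulletEnv-v0",
]
# Seed labels are an assumption made for illustration.
SEEDS = [1, 2, 3]

# One run per (algorithm, environment, seed) combination.
runs = list(product(ALGOS, ENVS, SEEDS))
print(len(runs))  # 36
```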

@zhihanyang2022
Owner Author

zhihanyang2022 commented Jun 8, 2021

Commit hash: 35dc700

For DDPG:

python launch.py --env HalfCheetahBulletEnv-v0 --algo ddpg --config configs/reproduce_sb3/ddpg_halfcheetah.gin --run_id 1 2 3
python launch.py --env AntBulletEnv-v0 --algo ddpg --config configs/reproduce_sb3/ddpg_ant.gin --run_id 1 2 3
python launch.py --env HopperBulletEnv-v0 --algo ddpg --config configs/reproduce_sb3/ddpg_hopper.gin --run_id 1 2 3
python launch.py --env Walker2DBulletEnv-v0 --algo ddpg --config configs/reproduce_sb3/ddpg_walker2d.gin --run_id 1 2 3

For TD3:

python launch.py --env HalfCheetahBulletEnv-v0 --algo td3 --config configs/reproduce_sb3/td3_all.gin --run_id 1 2 3
python launch.py --env AntBulletEnv-v0 --algo td3 --config configs/reproduce_sb3/td3_all.gin --run_id 1 2 3
python launch.py --env HopperBulletEnv-v0 --algo td3 --config configs/reproduce_sb3/td3_all.gin --run_id 1 2 3
python launch.py --env Walker2DBulletEnv-v0 --algo td3 --config configs/reproduce_sb3/td3_all.gin --run_id 1 2 3

For SAC:

python launch.py --env HalfCheetahBulletEnv-v0 --algo sac --config configs/reproduce_sb3/sac_halfcheetah_ant.gin --run_id 1 2 3
python launch.py --env AntBulletEnv-v0 --algo sac --config configs/reproduce_sb3/sac_halfcheetah_ant.gin --run_id 1 2 3
python launch.py --env HopperBulletEnv-v0 --algo sac --config configs/reproduce_sb3/sac_hopper_walker2d.gin --run_id 1 2 3
python launch.py --env Walker2DBulletEnv-v0 --algo sac --config configs/reproduce_sb3/sac_hopper_walker2d.gin --run_id 1 2 3
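
The twelve commands above follow a regular pattern, so they could also be generated rather than typed by hand. The sketch below only builds the command strings and does not run anything. The helper names `config_for` and `build_commands` are hypothetical and not part of the repository, while the flags, environment ids, and config file names are copied from the commands above.

```python
CONFIG_DIR = "configs/reproduce_sb3"

# Short key -> environment id, as used in the commands above.
ENV_IDS = {
    "halfcheetah": "HalfCheetahBulletEnv-v0",
    "ant": "AntBulletEnv-v0",
    "hopper": "HopperBulletEnv-v0",
    "walker2d": "Walker2DBulletEnv-v0",
}


def config_for(algo, env_key):
    """Return the config file name used above for this algorithm and environment."""
    if algo == "ddpg":
        # DDPG has one config per environment.
        return f"ddpg_{env_key}.gin"
    if algo == "td3":
        # TD3 shares a single config across all environments.
        return "td3_all.gin"
    if algo == "sac":
        # SAC has one config for HalfCheetah/Ant and one for Hopper/Walker2D.
        if env_key in ("halfcheetah", "ant"):
            return "sac_halfcheetah_ant.gin"
        return "sac_hopper_walker2d.gin"
    raise ValueError(f"unknown algorithm: {algo}")


def build_commands(run_ids=(1, 2, 3)):
    """Build the launch commands as strings, in the same order as listed above."""
    ids = " ".join(str(r) for r in run_ids)
    return [
        f"python launch.py --env {env_id} --algo {algo} "
        f"--config {CONFIG_DIR}/{config_for(algo, key)} --run_id {ids}"
        for algo in ("ddpg", "td3", "sac")
        for key, env_id in ENV_IDS.items()
    ]


for command in build_commands():
    print(command)
```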

@zhihanyang2022
Owner Author

zhihanyang2022 commented Jun 9, 2021

As mentioned earlier, we used the same hyper-parameters as SB3 (available in rl-baselines3-zoo).

For the training curves, both SB3's visualization and ours report the un-smoothed mean and standard error.

Potential causes for minor differences (e.g., our error bars seem to be wider in some cases):

  • Stochasticity (e.g., our TD3 on Walker2D seems to be a little weak, but we believe this can be corrected with more runs).
  • SB3 used numpy and matplotlib for plotting, while we relied on Weights & Biases.
  • For each trial, SB3 used 10 evaluation episodes per 10000 steps, while we used 5 evaluation episodes per 10000 steps.
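
For reference, here is a minimal sketch of how an un-smoothed mean and standard error across seeds can be computed at each evaluation point. It assumes the sample standard deviation (n - 1 in the denominator) divided by the square root of the number of seeds; whether SB3 or Weights & Biases use exactly this convention is not confirmed here. The function name and the numbers are made up for illustration.

```python
import math
import statistics


def mean_and_stderr(curves):
    """curves: one list of evaluation returns per seed, all of equal length.

    Returns the per-evaluation-point mean and standard error across seeds.
    """
    n_seeds = len(curves)
    means, stderrs = [], []
    # zip(*curves) groups the values from all seeds at the same evaluation point.
    for values in zip(*curves):
        means.append(statistics.mean(values))
        # Sample standard deviation (ddof=1) divided by sqrt(number of seeds).
        stderrs.append(statistics.stdev(values) / math.sqrt(n_seeds))
    return means, stderrs


# Made-up returns for 3 seeds at 3 evaluation points.
curves = [
    [100.0, 200.0, 300.0],
    [110.0, 220.0, 330.0],
    [90.0, 180.0, 270.0],
]
means, stderrs = mean_and_stderr(curves)
print(means)
print(stderrs)
```

With only 3 seeds, the standard error itself is a noisy estimate, which is consistent with error bars differing in width between two otherwise similar setups.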

DDPG

[Figure: training curves, DDPG (SB3; 6 seeds) vs. DDPG (ours; 3 seeds), one row per environment: HalfCheetah, Ant, Hopper, Walker2D]

TD3

[Figure: training curves, TD3 (SB3; 3 seeds) vs. TD3 (ours; 3 seeds), one row per environment: HalfCheetah, Ant, Hopper, Walker2D]

SAC

[Figure: training curves, SAC (SB3; 3 seeds) vs. SAC (ours; 3 seeds), one row per environment: HalfCheetah, Ant, Hopper, Walker2D]

@zhihanyang2022
Owner Author

zhihanyang2022 commented Oct 11, 2021

Comparison of final performance for SAC:

(3 seeds; stderr reported)   SAC (SB3)       SAC (ours)
HalfCheetahBulletEnv         2725 +/- 129    2757 +/- 53
AntBulletEnv                 3493 +/- 23     3146 +/- 35
HopperBulletEnv              2546 +/- 196    2422 +/- 168
Walker2DBulletEnv            2367 +/- 83     2184 +/- 54

where SB3 stats are obtained from DLR-RM/stable-baselines3#48.
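
One rough way to read the table is to compare each gap in means against the two standard errors combined in quadrature. This is only a heuristic sketch: it assumes the two sets of runs are independent, and with 3 seeds per side it should not be treated as a formal significance test. The numbers are copied from the table above, and the function name `compare` is hypothetical.

```python
import math

# (mean, stderr) for SB3 and for ours, copied from the table above.
RESULTS = {
    "HalfCheetahBulletEnv": ((2725, 129), (2757, 53)),
    "AntBulletEnv": ((3493, 23), (3146, 35)),
    "HopperBulletEnv": ((2546, 196), (2422, 168)),
    "Walker2DBulletEnv": ((2367, 83), (2184, 54)),
}


def compare(sb3, ours):
    """Return (ours - SB3 mean difference, combined standard error)."""
    diff = ours[0] - sb3[0]
    # Standard errors of independent estimates add in quadrature.
    combined = math.hypot(sb3[1], ours[1])
    return diff, combined


for env, (sb3, ours) in RESULTS.items():
    diff, combined = compare(sb3, ours)
    ratio = abs(diff) / combined
    print(f"{env}: diff = {diff:+d}, combined stderr = {combined:.1f}, ratio = {ratio:.2f}")
```

By this measure the gap on AntBulletEnv is the largest relative to its combined standard error, while the gaps on HalfCheetahBulletEnv and HopperBulletEnv are smaller than their combined standard errors.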
