From 5c9afe72f3b7e03d6cc2e7bb706320eb39095439 Mon Sep 17 00:00:00 2001
From: ChenDRAG <40993476+ChenDRAG@users.noreply.github.com>
Date: Sun, 24 Apr 2022 01:11:33 +0800
Subject: [PATCH] Update Mujoco Bemchmark's webpage (#606)

---
 docs/spelling_wordlist.txt   |  4 ++
 docs/tutorials/benchmark.rst | 76 +++++++++++++++++++++++++++++++++++-
 examples/mujoco/README.md    |  2 +-
 3 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index b334e1db9..cf78b00b8 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -150,3 +150,7 @@ ppo
 Jupyter
 Colab
 Colaboratory
+IPendulum
+Reacher
+Runtime
+Nvidia
diff --git a/docs/tutorials/benchmark.rst b/docs/tutorials/benchmark.rst
index c3cb0676a..12b325b03 100644
--- a/docs/tutorials/benchmark.rst
+++ b/docs/tutorials/benchmark.rst
@@ -5,9 +5,9 @@ Benchmark
 Mujoco Benchmark
 ----------------
 
-Tianshou's Mujoco benchmark contains state-of-the-art results (even better than `SpinningUp `_!).
+Tianshou's Mujoco benchmark contains state-of-the-art results.
 
-Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
+Every experiment is conducted under 10 random seeds for 1-10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.
 
 .. raw:: html
 
@@ -18,6 +18,78 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
+The table below compares the performance of Tianshou against published results on OpenAI Gym MuJoCo benchmarks. We use max average return in 1M timesteps as the reward metric. ~ means the result is approximated from the plots because quantitative results are not provided. - means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `TD3 paper `_, `SAC paper `_, `PPO paper `_, `ACKTR paper `_, `OpenAI Baselines `_ and `Spinning Up `_.
+
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
++=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
+|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+
+Runtime averaged on 8 MuJoCo benchmark tasks is listed below. All results are obtained using a single Nvidia TITAN X GPU and
+up to 48 CPU cores (at most one CPU core for each thread).
+
+========= ========= ============ ============== ============ ============== ==========
+Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
+========= ========= ============ ============== ============ ============== ==========
+DDPG      1         2.9h         12.0           80.2         2.4            5.4
+TD3       1         3.3h         11.4           81.7         1.7            5.2
+SAC       1         5.2h         10.9           83.8         1.8            3.5
+REINFORCE 64        4min         84.9           1.8          12.5           0.8
+A2C       16        7min         62.5           28.0         6.6            2.9
+PPO       64        24min        11.4           85.3         3.2            0.2
+NPG       16        7min         65.1           24.9         9.5            0.6
+TRPO      16        7min         62.9           26.5         10.1           0.6
+========= ========= ============ ============== ============ ============== ==========
+
 Atari Benchmark
 ---------------
 
diff --git a/examples/mujoco/README.md b/examples/mujoco/README.md
index ff37db4b2..8890466f8 100644
--- a/examples/mujoco/README.md
+++ b/examples/mujoco/README.md
@@ -247,7 +247,7 @@ For pretrained agents, detailed graphs (single agent, single game) and log detai
 
 ### TRPO
 
-| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
+| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (Tensorflow)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
 | :--------------------: | :---------------: | :-------------------------------------------------: | :-----------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
 | Ant | **2866.7±707.9** | ~0 | N | N | ~150 |
 | HalfCheetah | **4471.2±804.9** | ~400 | ~0 | ~1350 | ~850 |