thu-ml · Trinkle23897 · Apr 23, 2022
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
@@ -150,3 +150,7 @@ ppo
 Jupyter
 Colab
 Colaboratory
+IPendulum
+Reacher
+Runtime
+Nvidia
diff --git a/docs/tutorials/benchmark.rst b/docs/tutorials/benchmark.rst
@@ -5,9 +5,9 @@ Benchmark
 Mujoco Benchmark
 ----------------
 
-Tianshou's Mujoco benchmark contains state-of-the-art results (even better than `SpinningUp <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_!).
+Tianshou's Mujoco benchmark contains state-of-the-art results.
 
-Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
+Every experiment is run with 10 random seeds for 1M to 10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.
 
 .. raw:: html
 
@@ -18,6 +18,78 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
         <br>
     </center>
 
+The table below compares the performance of Tianshou against published results on OpenAI Gym MuJoCo benchmarks. We use max average return in 1M timesteps as the reward metric. ~ means the result is approximated from the plots because quantitative results are not provided. / means results are not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.
+
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
++=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
+|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |ACKTR Paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |TD3 Paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
++         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
++---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
+
+Runtime averaged on 8 MuJoCo benchmark tasks is listed below. All results are obtained using a single Nvidia TITAN X GPU and
+up to 48 CPU cores (at most one CPU core for each thread).
+
+========= ========= ============ ============== ============ ============== ==========
+Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
+========= ========= ============ ============== ============ ============== ==========
+DDPG      1         2.9h         12.0           80.2         2.4            5.4
+TD3       1         3.3h         11.4           81.7         1.7            5.2
+SAC       1         5.2h         10.9           83.8         1.8            3.5
+REINFORCE 64        4min         84.9           1.8          12.5           0.8
+A2C       16        7min         62.5           28.0         6.6            2.9
+PPO       64        24min        11.4           85.3         3.2            0.2
+NPG       16        7min         65.1           24.9         9.5            0.6
+TRPO      16        7min         62.9           26.5         10.1           0.6
+========= ========= ============ ============== ============ ============== ==========
+
 
 Atari Benchmark
 ---------------

diff --git a/examples/mujoco/README.md b/examples/mujoco/README.md
@@ -247,7 +247,7 @@ For pretrained agents, detailed graphs (single agent, single game) and log detai
 
 ### TRPO
 
-|      Environment       |   Tianshou (1M)   | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
+|      Environment       |   Tianshou (1M)   | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (TensorFlow)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
 | :--------------------: | :---------------: | :-------------------------------------------------: | :-----------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
 |          Ant           | **2866.7±707.9**  |                         ~0                          |                         N                         |                              N                               |                             ~150                             |
 |      HalfCheetah       | **4471.2±804.9**  |                        ~400                         |                        ~0                         |                            ~1350                             |                             ~850                             |