Update Mujoco Benchmark's webpage (#606)
ChenDRAG committed Apr 23, 2022
1 parent e01385e commit 5c9afe7
Showing 3 changed files with 79 additions and 3 deletions.
4 changes: 4 additions & 0 deletions docs/spelling_wordlist.txt
@@ -150,3 +150,7 @@ ppo
Jupyter
Colab
Colaboratory
IPendulum
Reacher
Runtime
Nvidia
76 changes: 74 additions & 2 deletions docs/tutorials/benchmark.rst
@@ -5,9 +5,9 @@ Benchmark
Mujoco Benchmark
----------------

Tianshou's Mujoco benchmark contains state-of-the-art results (even better than `SpinningUp <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_!).
Tianshou's Mujoco benchmark contains state-of-the-art results.

Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
Every experiment is run with 10 random seeds for 1-10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.

.. raw:: html

@@ -18,6 +18,78 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
<br>
</center>

The table below compares Tianshou's performance against published results on the OpenAI Gym MuJoCo benchmarks. We use the max average return within 1M timesteps as the reward metric. ~ means the result is approximated from plots because exact numbers are not provided. / means the result is not provided. The best-performing baseline on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.

+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
+=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper (Our) |888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |ACKTR paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
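
As a rough illustration of the "max average return in 1M timesteps" metric used in the table, the sketch below averages evaluation returns across seeds at each evaluation point and then takes the maximum over the points that fall within the step budget. This is one plausible reading of the metric, not necessarily the exact procedure behind the reported numbers. The function name ``max_average_return``, the array layout, and the toy data are assumptions made for this example.

```python
import numpy as np


def max_average_return(returns, timesteps, budget=1_000_000):
    """Hypothetical helper: max over time of the seed-averaged return.

    returns:   array-like of shape (n_seeds, n_eval_points)
    timesteps: array-like of shape (n_eval_points,), env steps at each evaluation
    budget:    only evaluation points with timesteps <= budget are considered
    """
    returns = np.asarray(returns, dtype=float)
    timesteps = np.asarray(timesteps)
    within_budget = timesteps <= budget
    # Average over seeds first, then take the best point on the mean curve.
    mean_curve = returns[:, within_budget].mean(axis=0)
    return float(mean_curve.max())


# Toy data (made up): 2 seeds, 4 evaluation points, the last one beyond 1M steps.
timesteps = [250_000, 500_000, 750_000, 1_250_000]
returns = [
    [100.0, 300.0, 250.0, 900.0],
    [200.0, 500.0, 350.0, 700.0],
]
score = max_average_return(returns, timesteps)
print(score)  # 400.0: the mean curve within budget is [150, 400, 300]
```

Averaging before taking the maximum rewards runs that are consistently good across seeds, whereas taking each seed's maximum first would favor lucky spikes.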

The runtime, averaged over 8 MuJoCo benchmark tasks, is listed below. All results were obtained using a single Nvidia TITAN X GPU and
up to 48 CPU cores (at most one CPU core per thread).

========= ========= ============ ============== ============ ============== ==========
Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
========= ========= ============ ============== ============ ============== ==========
DDPG      1         2.9h         12.0           80.2         2.4            5.4
TD3       1         3.3h         11.4           81.7         1.7            5.2
SAC       1         5.2h         10.9           83.8         1.8            3.5
REINFORCE 64        4min         84.9           1.8          12.5           0.8
A2C       16        7min         62.5           28.0         6.6            2.9
PPO       64        24min        11.4           85.3         3.2            0.2
NPG       16        7min         65.1           24.9         9.5            0.6
TRPO      16        7min         62.9           26.5         10.1           0.6
========= ========= ============ ============== ============ ============== ==========
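
To read the percentage columns as absolute time, each share can be multiplied by the total runtime. The sketch below does this for the DDPG row. The helper name ``phase_hours`` and the dictionary keys are made up for this example; only the numbers come from the table.

```python
def phase_hours(total_hours, shares_percent):
    """Hypothetical helper: split a total runtime into per-phase hours."""
    return {
        phase: total_hours * pct / 100.0
        for phase, pct in shares_percent.items()
    }


# DDPG row from the runtime table: 2.9h for 1M timesteps.
ddpg = phase_hours(
    2.9,
    {"collecting": 12.0, "updating": 80.2, "evaluating": 2.4, "others": 5.4},
)
print(round(ddpg["updating"], 2))  # 2.33 hours spent on network updates
```

For the off-policy algorithms (DDPG, TD3, SAC) most of the wall-clock time goes to updating, so by this breakdown faster data collection alone would shorten those runs only slightly.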


Atari Benchmark
---------------
2 changes: 1 addition & 1 deletion examples/mujoco/README.md
@@ -247,7 +247,7 @@ For pretrained agents, detailed graphs (single agent, single game) and log detai

### TRPO

| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (Tensorflow)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
| :--------------------: | :---------------: | :-------------------------------------------------: | :-----------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| Ant | **2866.7±707.9** | ~0 | N | N | ~150 |
| HalfCheetah | **4471.2±804.9** | ~400 | ~0 | ~1350 | ~850 |