Update Mujoco Benchmark's webpage #606

Merged: 5 commits merged on Apr 23, 2022
4 changes: 4 additions & 0 deletions docs/spelling_wordlist.txt
@@ -150,3 +150,7 @@ ppo
Jupyter
Colab
Colaboratory
IPendulum
Reacher
Runtime
Nvidia
76 changes: 74 additions & 2 deletions docs/tutorials/benchmark.rst
@@ -5,9 +5,9 @@ Benchmark
Mujoco Benchmark
----------------

-Tianshou's Mujoco benchmark contains state-of-the-art results (even better than `SpinningUp <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_!).
+Tianshou's Mujoco benchmark contains state-of-the-art results.

-Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
+Every experiment is conducted under 10 random seeds for 1-10M steps. Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco for source code and detailed results.

.. raw:: html

@@ -18,6 +18,78 @@ Please refer to https://github.com/thu-ml/tianshou/tree/master/examples/mujoco
<br>
</center>

The table below compares the performance of Tianshou against published results on the OpenAI Gym MuJoCo benchmarks (IPendulum and IDPendulum denote InvertedPendulum and InvertedDoublePendulum). The reward metric is the max average return within 1M timesteps. ``~`` means the result is approximated from the plots because quantitative results are not provided, and ``/`` means the result is not provided. For each algorithm, the best-performing result on each task is highlighted in boldface. Referenced baselines include `TD3 paper <https://arxiv.org/pdf/1802.09477.pdf>`_, `SAC paper <https://arxiv.org/pdf/1812.05905.pdf>`_, `PPO paper <https://arxiv.org/pdf/1707.06347.pdf>`_, `ACKTR paper <https://arxiv.org/abs/1708.05144>`_, `OpenAI Baselines <https://github.com/openai/baselines>`_ and `Spinning Up <https://spinningup.openai.com/en/latest/spinningup/bench.html>`_.
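
A minimal sketch of how a "max average return within 1M timesteps" metric can be computed from per-seed evaluation curves. It is not taken from Tianshou's scripts and rests on stated assumptions: all seeds are evaluated at the same checkpoints, test returns are averaged across seeds at each checkpoint, and the spread reported alongside is the population standard deviation across seeds at the best checkpoint. The function name ``max_average_return`` is hypothetical.

```python
from statistics import mean, pstdev


def max_average_return(steps, returns_per_seed, budget=1_000_000):
    """Return (max average return, std across seeds at that checkpoint).

    steps            -- env steps at each evaluation checkpoint (shared by all seeds)
    returns_per_seed -- one list of test returns per seed, aligned with `steps`
    budget           -- only checkpoints with step <= budget are considered
    """
    best = None
    for i, step in enumerate(steps):
        if step > budget:
            continue  # ignore checkpoints past the step budget (1M by default)
        at_checkpoint = [curve[i] for curve in returns_per_seed]
        avg = mean(at_checkpoint)  # average across seeds at this checkpoint
        if best is None or avg > best[0]:
            # Assumption: population std across seeds is reported alongside.
            best = (avg, pstdev(at_checkpoint))
    if best is None:
        raise ValueError("no evaluation checkpoint within the step budget")
    return best


# Toy numbers for illustration only (two seeds, not real benchmark data).
avg, std = max_average_return(
    [250_000, 500_000, 750_000, 1_000_000, 1_250_000],
    [[100, 300, 500, 450, 900], [120, 280, 520, 470, 950]],
)
```

Taking the maximum of the seed-averaged curve (rather than averaging each seed's own maximum) keeps one checkpoint per task, so a single lucky seed cannot inflate the reported number on its own.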

+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|Task                      |Ant       |HalfCheetah|Hopper    |Walker2d  |Swimmer  |Humanoid  |Reacher |IPendulum |IDPendulum|
+=========+================+==========+===========+==========+==========+=========+==========+========+==========+==========+
|DDPG     |Tianshou        |990.4     |**11718.7**|**2197.0**|1400.6    |**144.1**|**177.3** |**-3.3**|**1000.0**|8364.3    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |**1005.3**|3305.6     |**2020.5**|1843.6    |/        |/         |-6.5    |**1000.0**|**9355.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper (Ours)|888.8     |8577.3     |1860.0    |**3098.1**|/        |/         |-4.0    |**1000.0**|8370.0    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~840      |~11000     |~1800     |~1950     |~137     |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TD3      |Tianshou        |**5116.4**|**10201.2**|3472.2    |3982.4    |**104.2**|**5189.5**|**-2.7**|**1000.0**|**9349.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |4372.4    |9637.0     |**3564.1**|**4682.8**|/        |/         |-3.6    |**1000.0**|9337.5    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3800     |~9750      |~2860     |~4000     |~78      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|SAC      |Tianshou        |**5850.2**|**12138.8**|**3542.2**|**5007.0**|**44.4** |**5488.5**|**-2.6**|**1000.0**|**9359.5**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |SAC Paper       |~3720     |~10400     |~3370     |~3740     |/        |~5200     |/       |/         |/         |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |655.4     |2347.2     |2996.7    |1283.7    |/        |/         |-4.4    |**1000.0**|8487.2    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~3980     |~11520     |~3150     |~4250     |~41.7    |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|A2C      |Tianshou        |**3485.4**|**1829.9** |**1253.2**|**1091.6**|**36.6** |**1726.0**|**-6.7**|**1000.0**|**9257.7**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1000      |~900      |~850      |~31      |/         |~-24    |**~1000** |~7100     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper (TR)  |/         |~930       |~1220     |~700      |**~36**  |/         |~-27    |**~1000** |~8100     |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|PPO      |Tianshou        |**3258.4**|**5783.9** |**2609.3**|3588.5    |66.7     |**787.1** |**-4.1**|**1000.0**|**9231.3**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~1800      |~2330     |~3460     |~108     |/         |~-7     |**~1000** |~8000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |1083.2    |1795.4     |2164.7    |3317.7    |/        |/         |-6.2    |**1000.0**|8977.9    |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1700      |~2400     |~3510     |~111     |/         |~-6     |~940      |~7350     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up     |~650      |~1670      |~1850     |~1230     |**~120** |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|TRPO     |Tianshou        |**2866.7**|**4471.2** |2046.0    |**3826.7**|40.9     |**810.1** |**-5.1**|**1000.0**|**8435.2**|
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |ACKTR Paper     |~0        |~400       |~1400     |~550      |~40      |/         |-8      |**~1000** |~800      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |PPO Paper       |/         |~0         |~2100     |~1100     |**~121** |/         |~-115   |**~1000** |~200      |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |TD3 Paper       |-75.9     |-15.6      |**2471.3**|2321.5    |/        |/         |-111.4  |985.4     |205.9     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |OpenAI Baselines|/         |~1350      |**~2200** |~2350     |~95      |/         |**~-5** |~910      |~7000     |
+         +----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
|         |Spinning Up (TF)|~150      |~850       |~1200     |~600      |~85      |/         |/       |/         |/         |
+---------+----------------+----------+-----------+----------+----------+---------+----------+--------+----------+----------+
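
The cell notation used in this table (``~`` for values read off plots, ``/`` for missing results, ``**...**`` for the best result) is regular enough to parse mechanically, for example to re-rank the sources per task. The helpers below are hypothetical and not part of Tianshou; they assume a data row is split on ``|`` exactly as in the grid table source and that no cell contains a literal ``|``.

```python
def parse_row(line):
    """Split one data row of the grid table source into stripped cell strings.

    Assumes no cell content contains '|' (true for this table).
    """
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_cell(cell):
    """Decode a result cell into (value, approximate, best).

    value       -- float, or None when the cell is "/" (result not provided)
    approximate -- True when the number is prefixed with "~" (read off a plot)
    best        -- True when the cell is wrapped in ** ** (boldface)
    """
    text = cell.strip()
    best = len(text) > 4 and text.startswith("**") and text.endswith("**")
    if best:
        text = text[2:-2].strip()
    if text == "/":
        return None, False, best
    approximate = text.startswith("~")
    if approximate:
        text = text[1:]
    return float(text), approximate, best


# Usage on a shortened row (first three task columns only).
row = "|         |PPO Paper       |/         |~1800      |**~1000** |~8000     |"
cells = parse_row(row)
print(cells[1], [parse_cell(c) for c in cells[2:]])
```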

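The runtime averaged over 8 MuJoCo benchmark tasks is listed below. The ``1M timesteps`` column is the wall-clock time needed for 1M environment steps, and the percentage columns break that time down into collecting, updating, evaluating and other work. All results are obtained using a single Nvidia TITAN X GPU and up to 48 CPU cores (at most one CPU core for each thread).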

========= ========= ============ ============== ============ ============== ==========
Algorithm # of Envs 1M timesteps Collecting (%) Updating (%) Evaluating (%) Others (%)
========= ========= ============ ============== ============ ============== ==========
DDPG      1         2.9h         12.0           80.2         2.4            5.4
TD3       1         3.3h         11.4           81.7         1.7            5.2
SAC       1         5.2h         10.9           83.8         1.8            3.5
REINFORCE 64        4min         84.9           1.8          12.5           0.8
A2C       16        7min         62.5           28.0         6.6            2.9
PPO       64        24min        11.4           85.3         3.2            0.2
NPG       16        7min         65.1           24.9         9.5            0.6
TRPO      16        7min         62.9           26.5         10.1           0.6
========= ========= ============ ============== ============ ============== ==========
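
To turn the percentage breakdown into absolute time, multiply each percentage by the total time for 1M steps. The sketch below hard-codes the rows of the runtime table, converting hours to minutes; ``phase_minutes`` is a hypothetical helper, and since the published percentages are rounded, the phases of some rows sum to slightly more than the total.

```python
# Rows copied from the runtime table. Totals are converted to minutes
# (e.g. 2.9h -> 2.9 * 60); percentages are each phase's share of that total.
RUNTIME = {
    # algorithm: (minutes for 1M steps, collecting %, updating %, evaluating %, others %)
    "DDPG": (2.9 * 60, 12.0, 80.2, 2.4, 5.4),
    "TD3": (3.3 * 60, 11.4, 81.7, 1.7, 5.2),
    "SAC": (5.2 * 60, 10.9, 83.8, 1.8, 3.5),
    "REINFORCE": (4, 84.9, 1.8, 12.5, 0.8),
    "A2C": (7, 62.5, 28.0, 6.6, 2.9),
    "PPO": (24, 11.4, 85.3, 3.2, 0.2),
    "NPG": (7, 65.1, 24.9, 9.5, 0.6),
    "TRPO": (7, 62.9, 26.5, 10.1, 0.6),
}

PHASES = ("collecting", "updating", "evaluating", "others")


def phase_minutes(algorithm):
    """Approximate minutes spent in each phase for one algorithm's row."""
    total, *percents = RUNTIME[algorithm]
    return {phase: total * pct / 100 for phase, pct in zip(PHASES, percents)}


if __name__ == "__main__":
    for name in RUNTIME:
        parts = phase_minutes(name)
        print(f"{name:<10}" + "  ".join(f"{k}={v:.1f}min" for k, v in parts.items()))
```

For example, DDPG spends roughly 140 of its 174 minutes updating networks, while REINFORCE with 64 environments spends about 3.4 of its 4 minutes collecting data.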


Atari Benchmark
---------------
2 changes: 1 addition & 1 deletion examples/mujoco/README.md
@@ -247,7 +247,7 @@ For pretrained agents, detailed graphs (single agent, single game) and log detai

### TRPO

-| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
+| Environment | Tianshou (1M) | [ACKTR paper](https://arxiv.org/pdf/1708.05144.pdf) | [PPO paper](https://arxiv.org/pdf/1707.06347.pdf) | [OpenAI Baselines](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm) | [Spinning Up (TensorFlow)](https://spinningup.openai.com/en/latest/spinningup/bench.html) |
| :--------------------: | :---------------: | :-------------------------------------------------: | :-----------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| Ant | **2866.7±707.9** | ~0 | N | N | ~150 |
| HalfCheetah | **4471.2±804.9** | ~400 | ~0 | ~1350 | ~850 |