<a href="https://colab.research.google.com/github/AI4Finance-Foundation/ElegantRL/blob/master/tutorial_LunarLanderContinuous_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **LunarLanderContinuous-v2 Example in ElegantRL**






# **Task Description**

[LunarLanderContinuous-v2](https://gym.openai.com/envs/LunarLanderContinuous-v2) is a robotic control task. The goal is to bring the lander to rest on the landing pad. If the lander moves away from the landing pad, it gives back the reward it earned by approaching it. An episode ends when the lander crashes or comes to rest, which adds -100 or +100 points, respectively.
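The "gives back reward" behaviour follows from how, to our understanding, Gym shapes this reward: each step's reward is the change in a potential that increases as the lander approaches the pad. The sketch below is a simplified, hypothetical illustration that keeps only a distance term (Gym's version also scores velocity, tilt, and leg contact, and charges for engine use); `distance_potential` and `shaped_rewards` are not Gym functions.

```python
import math

def distance_potential(x: float, y: float) -> float:
    """Simplified potential: 0 at the pad (0, 0), more negative farther away."""
    return -100.0 * math.hypot(x, y)

def shaped_rewards(positions):
    """Reward at each step = potential(now) - potential(before)."""
    rewards = []
    prev = distance_potential(*positions[0])
    for x, y in positions[1:]:
        cur = distance_potential(x, y)
        rewards.append(cur - prev)
        prev = cur
    return rewards

# Move halfway toward the pad, then drift back to the starting height.
path = [(0.0, 1.0), (0.0, 0.5), (0.0, 1.0)]
rewards = shaped_rewards(path)
print(rewards)       # [50.0, -50.0]
print(sum(rewards))  # 0.0
```

Because the per-step rewards telescope, their sum over a path depends only on its start and end points, so drifting away from the pad cancels earlier progress.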

# **Part 1: Install ElegantRL**

In [1]:
# install elegantrl library
!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git

Collecting git+https://github.com/AI4Finance-LLC/ElegantRL.git
  Cloning https://github.com/AI4Finance-LLC/ElegantRL.git to /tmp/pip-req-build-q0f_9pry
  Running command git clone -q https://github.com/AI4Finance-LLC/ElegantRL.git /tmp/pip-req-build-q0f_9pry
Collecting pybullet
  Downloading pybullet-3.2.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (90.8 MB)
     |████████████████████████████████| 90.8 MB 291 bytes/s 
Collecting box2d-py
  Downloading box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448 kB)
     |████████████████████████████████| 448 kB 70.5 MB/s 
Building wheels for collected packages: elegantrl
  Building wheel for elegantrl (setup.py) ... done
  Created wheel for elegantrl: filename=elegantrl-0.3.3-py3-none-any.whl size=183567 sha256=a2b2116b1f175b6cad721c1dfeba30620a45100aa1a7e65fe9c19df809fe68d3
  Stored in directory: /tmp/pip-ephem-wheel-cache-ltz2mxds/wheels/52/9a/b3/08c8a0b5be22a65da0132538c05e7e961b1253c90d6845e0c6
Successfully bui

# **Part 2: Import Packages**


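*   **ElegantRL**: a deep reinforcement learning (DRL) library that provides the agent and the training loop used below.
*   **OpenAI Gym**: a toolkit for developing and comparing reinforcement learning algorithms.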



In [2]:
import gym
from elegantrl.agents import AgentModSAC
from elegantrl.train.config import get_gym_env_args, Arguments
from elegantrl.train.run import train_and_evaluate

gym.logger.set_level(40)  # show only ERROR-level (40) messages, hiding Gym warnings

# **Part 3: Get environment information**

In [3]:
get_gym_env_args(gym.make("LunarLanderContinuous-v2"), if_print=False)

{'action_dim': 2,
 'env_name': 'LunarLanderContinuous-v2',
 'env_num': 1,
 'if_discrete': False,
 'max_step': 1000,
 'state_dim': 8,
 'target_return': 200}
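Each field in this dictionary can be read off the environment's observation space, action space, and registration metadata. The sketch below shows one way such a helper could work, assuming Gym registers this task with a 1000-step episode limit and a reward threshold of 200 (consistent with the output above). `SpaceStub` and `infer_env_args` are hypothetical stand-ins, not the source of `get_gym_env_args`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class SpaceStub:
    """Hypothetical stand-in for a Gym space: Box-like spaces have a shape, discrete ones a size n."""
    shape: Tuple[int, ...] = ()
    n: Optional[int] = None

def infer_env_args(env_name, obs_space, act_space, max_episode_steps, reward_threshold):
    """Build an env_args-style dict from space and registry information."""
    if_discrete = act_space.n is not None
    return {
        "env_num": 1,  # a single, non-vectorized environment
        "env_name": env_name,
        "max_step": max_episode_steps,
        "state_dim": obs_space.shape[0],
        "action_dim": act_space.n if if_discrete else act_space.shape[0],
        "if_discrete": if_discrete,
        "target_return": reward_threshold,
    }

args_dict = infer_env_args(
    "LunarLanderContinuous-v2",
    obs_space=SpaceStub(shape=(8,)),   # 8-dimensional observation vector
    act_space=SpaceStub(shape=(2,)),   # 2-dimensional continuous action
    max_episode_steps=1000,
    reward_threshold=200,
)
print(args_dict)
```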

# **Part 4: Specify Agent and Environment**

*   **agent**: the agent (DRL algorithm), chosen from the agents in the [directory](https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/elegantrl/agents). Here we use `AgentModSAC`.
*   **env_func**: the function that creates the environment; in this case, `gym.make` creates LunarLanderContinuous-v2.
*   **env_args**: the environment information from Part 3, plus the Gym environment `id`.


In [4]:
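env_func = gym.make  # factory function that builds the environment
env_args = {
    "env_num": 1,                          # number of environments
    "env_name": "LunarLanderContinuous-v2",
    "max_step": 1000,                      # maximum steps per episode
    "state_dim": 8,                        # size of the observation vector
    "action_dim": 2,                       # size of the continuous action vector
    "if_discrete": False,                  # the action space is continuous
    "target_return": 200,                  # episode return regarded as solving the task
    "id": "LunarLanderContinuous-v2",      # Gym environment ID for gym.make
}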
args = Arguments(AgentModSAC, env_func=env_func, env_args=env_args)
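`env_args` carries an `"id"` key that `get_gym_env_args` did not print. One plausible reading, which we assume here, is that the framework calls `env_func` with only those `env_args` entries that match its named parameters, so `gym.make` receives `id="LunarLanderContinuous-v2"` while the remaining keys describe the environment to the agent. The sketch shows that filtering idea with `inspect.signature`; `filter_kwargs` and `fake_make` are hypothetical, not ElegantRL or Gym code.

```python
import inspect

def filter_kwargs(func, kwargs: dict) -> dict:
    """Return only the entries of `kwargs` whose keys name a parameter of `func`."""
    params = inspect.signature(func).parameters
    named = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in named}

def fake_make(id, **kwargs):
    """Hypothetical stand-in shaped like gym.make(id, **kwargs)."""
    return f"<env {id}>"

env_args = {
    "env_num": 1,
    "env_name": "LunarLanderContinuous-v2",
    "max_step": 1000,
    "state_dim": 8,
    "action_dim": 2,
    "if_discrete": False,
    "target_return": 200,
    "id": "LunarLanderContinuous-v2",
}

call_kwargs = filter_kwargs(fake_make, env_args)
print(call_kwargs)               # {'id': 'LunarLanderContinuous-v2'}
print(fake_make(**call_kwargs))  # <env LunarLanderContinuous-v2>
```

Only `id` survives because the stand-in's remaining parameter is a catch-all `**kwargs`, which the filter deliberately ignores.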

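# **Part 5: Specify hyper-parameters**
The full list of hyper-parameters is documented [here](https://elegantrl.readthedocs.io/en/latest/api/config.html).

In [7]:
args.target_step = args.max_step  # environment steps collected per exploration round before updating the networks
args.gamma = 0.99                 # discount factor for future rewards
args.eval_times = 2**5            # number of evaluation episodes (32) per evaluation
args.random_seed = 2022           # random seed for reproducibility

# **Part 6: Train and Evaluate the Agent**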






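The training run below uses the discount factor `args.gamma = 0.99` set in the hyper-parameter cell. As a reminder of what it controls, the sketch computes the discounted return `G = r_0 + gamma * r_1 + gamma**2 * r_2 + ...`; `discounted_return` is a hypothetical helper, not part of ElegantRL.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t, accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
# Rough effective horizon 1 / (1 - gamma): about how far ahead rewards still matter.
print(round(1 / (1 - 0.99)))  # 100
```

With `gamma = 0.99` the effective horizon is roughly 100 steps, far shorter than the 1000-step episode limit, so rewards many hundreds of steps ahead count for little in the agent's value estimates.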
In [8]:
train_and_evaluate(args)

| Arguments Remove cwd: ./LunarLanderContinuous-v2_ModSAC_0
################################################################################
ID     Step    maxR |    avgR   stdR   avgS  stdS |    expR   objC   etc.
0  1.02e+03 -124.42 |
0  1.02e+03 -124.42 | -124.42   42.8     70    13 |   -1.82   0.89   0.13   0.15
0  5.96e+04   71.75 |
0  5.96e+04   71.75 |   71.75  126.5    553   217 |   -0.10   1.62   3.90   0.18
0  8.61e+04   71.75 |  -12.89  112.4    838   178 |    0.03   1.50  17.20   0.21
0  9.90e+04   71.75 |  -17.51  121.6    647   368 |   -0.25   1.46  20.98   0.24
0  1.11e+05   71.75 |  -46.97  103.4    589   381 |    0.10   1.25  25.45   0.27
0  1.24e+05   71.75 |   18.73  107.8    468   396 |    0.08   1.34  29.77   0.31
0  1.41e+05  126.34 |
0  1.41e+05  126.34 |  126.34   93.8    731   189 |    0.09   1.30  30.89   0.38
0  1.53e+05  126.34 |  105.38  115.7    760   220 |    0.14   1.39  32.72   0.45
0  1.64e+05  126.34 |  120.10   97.1    781   211 |    0.00   1.53  34.

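Understanding the above results:
*   **ID**: the ID of the agent being trained.
*   **Step**: the total number of training steps so far.
*   **maxR**: the highest average evaluation return (avgR) recorded so far.
*   **avgR**: the average episode return over the evaluation episodes.
*   **stdR**: the standard deviation of the episode returns.
*   **avgS**: the average number of steps per evaluation episode.
*   **stdS**: the standard deviation of the episode lengths.
*   **expR**: the average reward per step during exploration.
*   **objC**: the objective function value (loss) of the Critic Network (Value Network).
*   **etc.**: further agent-specific values, such as **objA**, the objective function value of the Actor Network (Policy Network).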
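The evaluation columns can be reproduced from one round of evaluation episodes. In this sketch, `summarize_eval` is hypothetical, the episode returns and lengths are toy inputs rather than values from the log, and we assume population standard deviations and that `maxR` is the running maximum of `avgR`. That last assumption is consistent with the log above, where `maxR` stays at 71.75 while `avgR` drops.

```python
import statistics

def summarize_eval(episode_returns, episode_steps, prev_max_r):
    """Aggregate one evaluation round into avgR/stdR/avgS/stdS/maxR values."""
    avg_r = statistics.fmean(episode_returns)
    return {
        "avgR": avg_r,
        "stdR": statistics.pstdev(episode_returns),  # population std (numpy.std's default)
        "avgS": statistics.fmean(episode_steps),
        "stdS": statistics.pstdev(episode_steps),
        "maxR": max(prev_max_r, avg_r),              # best avgR seen so far
    }

# Toy inputs: two evaluation episodes (with args.eval_times = 2**5 there would be 32).
row = summarize_eval([10.0, 30.0], [500, 700], prev_max_r=71.75)
print(row)  # {'avgR': 20.0, 'stdR': 10.0, 'avgS': 600.0, 'stdS': 100.0, 'maxR': 71.75}
```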