DQN for LunarLander v2

Implementation of reinforcement learning algorithms for the OpenAI Gym environment LunarLander-v2

Demo clips

Dependencies

gym==0.21.0
imageio==2.13.5
matplotlib==3.5.1
numpy==1.22.0
Pillow==9.0.1
torch==1.10.1+cu102
tqdm==4.62.3

How to use

Training instructions are included in the Jupyter notebook.

Testing Results

(a) Loss curve

(b) Tune Parameters

Target score

I implemented an early-stopping function that ends training and plots the result once the average of the last 100 scores reaches a target value. I find that a target score of 250 usually produces more consistent landings and higher total rewards.
Setting the target score to a lower value, such as 200, results in more misses in the final demo. Setting the target score too high, however, sometimes means the average score never reaches the target, which lengthens training without necessarily producing a better result.
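The early-stopping rule described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the class name `EarlyStopper`, its parameters, and the synthetic scores in the demo are all assumptions made for the example.

```python
from collections import deque


class EarlyStopper:
    """Hypothetical helper: signals a stop once the mean of the
    last `window` episode scores reaches `target`."""

    def __init__(self, target=250.0, window=100):
        self.target = target
        # A bounded deque keeps only the most recent `window` scores.
        self.scores = deque(maxlen=window)

    def update(self, score):
        """Record one episode score; return True when training should stop."""
        self.scores.append(score)
        # Wait until the window is full so early lucky episodes do not trigger a stop.
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and sum(self.scores) / len(self.scores) >= self.target


if __name__ == "__main__":
    # Synthetic scores that improve linearly, standing in for real episode returns.
    stopper = EarlyStopper(target=250.0, window=100)
    for episode in range(5000):
        score = min(300.0, episode * 1.0)  # placeholder for a real episode score
        if stopper.update(score):
            print(f"Stopping at episode {episode}")
            break
```

In a real training loop, the placeholder score would be replaced by the total reward of each episode, and the plotting step would run after the loop exits.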

Target=200, gamma=0.99	Target=250, gamma=0.99

Discount factor

The discount factor $\gamma$ determines the importance of future rewards, and its value should satisfy $0 \le \gamma < 1$.
Setting it too low makes the agent "short-sighted": it considers only the rewards nearest to its current state. Setting it higher makes the agent weigh long-term rewards more heavily.
If the discount factor is set equal to or greater than 1, $V^\pi$ and $Q^\pi$ may diverge.
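To make the effect of the discount factor concrete, here is a small sketch that computes the discounted return $\sum_k \gamma^k r_k$ for a constant reward of 1 per step. The function name `discounted_return` and the reward sequence are assumptions chosen for illustration; they are not taken from the notebook.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Constant reward of 1 per step over increasingly long horizons.
    for gamma in (0.9, 0.99, 1.3):
        returns = [discounted_return([1.0] * n, gamma) for n in (10, 100, 1000)]
        print(gamma, returns)
```

For $\gamma < 1$ the return is bounded by $1/(1-\gamma)$ regardless of the horizon (10 for $\gamma = 0.9$, 100 for $\gamma = 0.99$), whereas for $\gamma = 1.3$ it grows without bound as the horizon lengthens. This is the kind of divergence the value estimates can inherit.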

The experiments are shown below:

	Target=230, gamma=0.9	Target=230, gamma=1.3
Training Time	2:02:58, 5000 episodes	12:09, 5000 episodes
Training curve
Result

We can see that a suboptimal $\gamma$ (0.9 in this experiment) results in slow training and no convergence, and causes the ship to keep hovering instead of landing.
Setting $\gamma$ greater than 1 also prevents convergence; in that run the ship fired only one side rocket and flew out of control.
