Tennis-playing multi-player agent trained by deep reinforcement learning. This is a project of the Udacity Deep Reinforcement Learning Nanodegree.

waterwheel31/DRL_tennis

Environment Overview

This project uses Unity's Tennis environment.

[Animation: trained agents playing tennis]

Conditions

In this environment, two agents control rackets to bounce a ball over a net. The goal of each agent is to keep the ball in play for as many time steps as possible.

Reward

  • +0.1 if an agent hits the ball over the net
  • -0.01 if an agent lets the ball hit the ground
  • -0.01 if an agent hits the ball out of bounds
  • Each player receives its own reward

State Space

  • Continuous, 8 dimensions
  • They correspond to:
    • positions (2)
    • velocities (2)
    • the above for both the racket and the ball (x2)
  • Each player observes independently (local observations)

Action Space

  • Continuous, 2 dimensions
    • movement of the racket toward (or away from) the net, and jumping
    • Each player takes its actions independently
    • Each action value is in the range [-1, 1]

Goal of the agent

  • Achieve an average episode score of more than +0.5
  • The look-back period for the average is 100 consecutive episodes
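A minimal sketch of this stopping criterion (assuming episode_score holds the score of the episode just finished; in this benchmark the per-episode score is usually taken as the maximum of the two players' scores):

```python
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)   # scores of the last 100 episodes

# inside the training loop, after each episode:
#   scores_window.append(episode_score)
if len(scores_window) == scores_window.maxlen and np.mean(scores_window) >= 0.5:
    print("Environment solved: 100-episode average >= +0.5")
```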

Used Algorithm

  • The agents are trained with the DDPG (Deep Deterministic Policy Gradient) method
  • A simple fully connected network is used for each of the actor and critic networks, as follows (a minimal sketch is given after this list):
    • actor network
      • 3 fully connected layers with ReLU and tanh activation functions
      • the first two layers have size 128
      • the last layer's output size = action size
    • critic network
      • 3 fully connected layers with ReLU and tanh activation functions
      • the first layer size = 128
      • the second layer's input size = 128 + action size (the action is concatenated at this layer)
      • the last layer's output size = 1
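A minimal PyTorch sketch of networks with the shapes described above. This is an illustration only: the actual classes in this repository may differ in details, and the critic's output is left linear here, which is the common DDPG choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Maps a state to a deterministic action in [-1, 1]."""
    def __init__(self, state_size, action_size, hidden=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))   # squash to the [-1, 1] action range

class Critic(nn.Module):
    """Maps a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_size, action_size, hidden=128):
        super().__init__()
        self.fcs1 = nn.Linear(state_size, hidden)
        # the action is concatenated after the first layer, hence hidden + action_size
        self.fc2 = nn.Linear(hidden + action_size, hidden)
        self.fc3 = nn.Linear(hidden, 1)

    def forward(self, state, action):
        xs = F.relu(self.fcs1(state))
        x = F.relu(self.fc2(torch.cat((xs, action), dim=1)))
        return self.fc3(x)               # linear output: unbounded Q-value
```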

Performance

The agents learned to play tennis and reached an average score of +0.5 after roughly 900 episodes.

[Plot: scores per episode during training]

Ideas for Future Work

  • Architecture: currently only a simple fully connected network is used. A more complex architecture may improve the score.
  • Hyperparameters: the hyperparameters have not been fully optimized, which is another opportunity for improvement (e.g. decay of the exploration noise could be introduced).
  • Replay Buffer: since the playing field is symmetric, training efficiency may improve if mirrored (flipped) transitions are added to the training samples (a sketch of this idea follows this list).
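A hedged sketch of the mirroring idea. The index sets below are hypothetical: which observation and action components actually change sign under a left-right flip depends on the exact layout of the Tennis observation vector and would need to be verified before use.

```python
import numpy as np

# Hypothetical index sets (assumptions, not verified against the environment):
OBS_FLIP_IDX = [0, 2, 4, 6]   # assumed horizontal position/velocity components
ACT_FLIP_IDX = [0]            # assumed "move toward/away from the net" component

def mirrored(state, action, reward, next_state, done):
    """Return a left-right mirrored copy of a transition (sketch)."""
    s = np.array(state, dtype=np.float32, copy=True)
    ns = np.array(next_state, dtype=np.float32, copy=True)
    a = np.array(action, dtype=np.float32, copy=True)
    s[OBS_FLIP_IDX] *= -1
    ns[OBS_FLIP_IDX] *= -1
    a[ACT_FLIP_IDX] *= -1
    return s, a, reward, ns, done

# When storing a transition, its mirrored twin could be stored as well:
#   buffer.add(state, action, reward, next_state, done)
#   buffer.add(*mirrored(state, action, reward, next_state, done))
```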

Trained Model

  • The trained model is checkpoint_actor.pth. You can use it with the Unity environment described below.
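A minimal sketch of loading the checkpoint, assuming an Actor class like the one sketched above and the sizes described in this README (8-dimensional state, 2-dimensional action per player):

```python
import torch

# Rebuild an actor with the same layer sizes used during training,
# then load the saved weights (sizes are taken from this README).
actor = Actor(state_size=8, action_size=2)
actor.load_state_dict(torch.load('checkpoint_actor.pth', map_location='cpu'))
actor.eval()   # switch to evaluation mode for inference
```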

Dependencies Setting

  1. Download the Unity Tennis environment from one of the links below.

  2. Place the file in this repository and unzip it.
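Once unzipped, the environment can typically be opened from Python with the unityagents package used in the Udacity course (a minimal sketch; the file name depends on your operating system and on where you unzipped the download):

```python
from unityagents import UnityEnvironment

# e.g. "Tennis.app" (macOS), "Tennis_Linux/Tennis.x86_64" (Linux),
# or "Tennis_Windows_x86_64/Tennis.exe" (Windows)
env = UnityEnvironment(file_name="Tennis.app")

brain_name = env.brain_names[0]
env_info = env.reset(train_mode=True)[brain_name]
print("Number of agents:", len(env_info.agents))
print("Observation shape:", env_info.vector_observations.shape)
env.close()
```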
