Skip to content

The aim of the project was to propose a robot control task based on reinforcement learning algorithm. The architecture is called Twin-Delayed DDPG (TD3). For animation I used OpenAI Gym.

tomaszsmaruj25/Twin-Delayed-DDPG-Implementation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Twin Delayed DDPG Implementation

Main Idea

The aim of the project was to propose a control based on reinforcement learning algorithm. I decided to implement an architecture that is an extension of the DDPG (Deep Deterministic Gradient Descent) algorithm, i.e. an architecture with an actor and a critic. The new architecture is called Twin-Delayed DDPG (TD3). Link to the article: https://arxiv.org/pdf/1802.09477.pdf.

Ant - Average Reward over the Evaluation Step: 2414.343556

Sources and environment

A project made on the basis of a course on the Udemy platform: https://www.udemy.com/course/deep-reinforcement-learning/. The entire project was made in Google Collab. The visualization of the results was created in the OpenAI Gym libraries.

Architecture

It differs from the standard DDPG architecture in that it has an additional critic in its structure, i.e. its twin. It has been added to make the Q-value prediction more stable and better. Adding a second critic is prevented by an overly optimistic estimate of stock values, which was a problem in the original DDPG method. This solution provides better stability during the testing phase. An additional characteristic is the delay that results from updating weight parameters during backpropagation of our neural networks, for target actor, critic targets, and critic models, respectively.

Simulation results

HalfCheetah - Average Reward over the Evaluation Step: 2862.100705

Hooper - Average Reward over the Evaluation Step: 1755.878847

Walker - Average Reward over the Evaluation Step: 1276.085720

Bonus:

Unsuccessful simulation related to too short Neural Network learning

About

The aim of the project was to propose a robot control task based on reinforcement learning algorithm. The architecture is called Twin-Delayed DDPG (TD3). For animation I used OpenAI Gym.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published