This is the repository for my bachelor's thesis, in which I built a differentiable model of the Pendulum benchmark and evaluated it with REINFORCE and PPO.

The following algorithms and components are implemented:
- Vanilla REINFORCE on the original OpenAI Gym Pendulum
- REINFORCE with return whitening on the original OpenAI Gym Pendulum
- REINFORCE with a state-value function baseline on the original OpenAI Gym Pendulum
- Vanilla PPO on the original OpenAI Gym Pendulum
- A differentiable Pendulum model implemented in PyTorch
- REINFORCE with return whitening on the differentiable Pendulum model
- PPO with return whitening on the differentiable Pendulum model
- Post-processing of results with Seaborn
- Generalization study of trained models under modified model parameters
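The differentiable model listed above can be sketched as follows. This is a minimal illustration (not the thesis code), assuming the dynamics, constants, and reward of Gym's `Pendulum-v1`; because every operation is a PyTorch op, gradients can flow from the reward back to the action and earlier states.

```python
import torch

# Physical constants as used in OpenAI Gym's Pendulum (assumed; check the Gym source)
G, M, L, DT = 10.0, 1.0, 1.0, 0.05
MAX_TORQUE, MAX_SPEED = 2.0, 8.0

def angle_normalize(th):
    # Wrap the angle to [-pi, pi]
    return ((th + torch.pi) % (2 * torch.pi)) - torch.pi

def pendulum_step(th, thdot, u):
    """One differentiable Euler step of the pendulum dynamics.

    Returns the next state and the (negative-cost) reward; all ops are
    torch ops, so d(reward)/d(u) is available via autograd.
    """
    u = torch.clamp(u, -MAX_TORQUE, MAX_TORQUE)
    cost = angle_normalize(th) ** 2 + 0.1 * thdot ** 2 + 0.001 * u ** 2
    new_thdot = thdot + (3 * G / (2 * L) * torch.sin(th)
                         + 3.0 / (M * L ** 2) * u) * DT
    new_thdot = torch.clamp(new_thdot, -MAX_SPEED, MAX_SPEED)
    new_th = th + new_thdot * DT
    return new_th, new_thdot, -cost
```

Calling `pendulum_step` with an action tensor that has `requires_grad=True` and backpropagating from the reward yields a gradient on the action directly, which an environment behind the standard Gym `step()` API cannot provide.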
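The whitening technique mentioned in the list is a standard variance-reduction heuristic for policy gradients; a minimal sketch (illustrative names, not the thesis code):

```python
import torch

def whiten(returns: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Shift and scale a batch of returns to zero mean and unit variance.

    A common heuristic to reduce the variance of the REINFORCE
    gradient estimate; eps guards against division by zero.
    """
    return (returns - returns.mean()) / (returns.std() + eps)
```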
Results:
- The differentiable model provides additional gradient information through the model dynamics
- The differentiable model can accelerate and stabilize REINFORCE training
- Combining the differentiable model with PPO is limited by in-place operations during backpropagation in PyTorch
- Training can run with GPU acceleration (not covered in the thesis and not yet verified experimentally)
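The acceleration and stabilization result rests on backpropagating through entire rollouts of the differentiable model. A self-contained sketch of that idea (simplified dynamics without clipping or angle wrapping, and an assumed linear policy `w`, purely for illustration):

```python
import torch

# Simplified pendulum dynamics (no torque/speed clipping, for illustration)
g, m, l, dt = 10.0, 1.0, 1.0, 0.05

def step(th, thdot, u):
    thdot = thdot + (3 * g / (2 * l) * torch.sin(th) + 3.0 / (m * l**2) * u) * dt
    return th + thdot * dt, thdot

w = torch.zeros(3, requires_grad=True)   # weights of an assumed linear policy
th, thdot = torch.tensor(3.0), torch.tensor(0.0)
ret = torch.tensor(0.0)
for _ in range(50):
    obs = torch.stack([torch.cos(th), torch.sin(th), thdot])
    u = obs @ w                          # deterministic action
    th, thdot = step(th, thdot, u)
    ret = ret - (th**2 + 0.1 * thdot**2 + 0.001 * u**2)
ret.backward()                           # exact gradient of the return w.r.t. w
```

The resulting `w.grad` is an exact derivative of the (deterministic) return rather than a sampled score-function estimate, which is the extra gradient information referred to above.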
Further ideas:
- Evaluate the approach from an optimization perspective
- Improve generalization