This is the repository for my bachelor's thesis, in which I built a differentiable model of the Pendulum benchmark and evaluated it with REINFORCE and PPO.

The following algorithms and components are implemented:
- Vanilla REINFORCE on the original OpenAI Gym Pendulum
- REINFORCE with return whitening on the original OpenAI Gym Pendulum
- REINFORCE with a state-value function baseline on the original OpenAI Gym Pendulum
- Vanilla PPO on the original OpenAI Gym Pendulum
- A differentiable Pendulum model implemented in PyTorch
- REINFORCE with return whitening on the differentiable Pendulum model
- PPO with return whitening on the differentiable Pendulum model
- Post-processing of results with Seaborn
- Generalization study of trained models under modified model parameters
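The differentiable model listed above can be sketched as follows. This is a minimal illustration (not the thesis code), assuming the dynamics, constants, and reward of Gym's `Pendulum-v1`; because every operation is a PyTorch op, gradients can flow from the reward back to the action and earlier states.

```python
import torch

# Physical constants as used in OpenAI Gym's Pendulum (assumed; check the Gym source)
G, M, L, DT = 10.0, 1.0, 1.0, 0.05
MAX_TORQUE, MAX_SPEED = 2.0, 8.0

def angle_normalize(th):
    # Wrap the angle to [-pi, pi]
    return ((th + torch.pi) % (2 * torch.pi)) - torch.pi

def pendulum_step(th, thdot, u):
    """One differentiable Euler step of the pendulum dynamics.

    Returns the next state and the (negative-cost) reward; all ops are
    torch ops, so d(reward)/d(u) is available via autograd.
    """
    u = torch.clamp(u, -MAX_TORQUE, MAX_TORQUE)
    cost = angle_normalize(th) ** 2 + 0.1 * thdot ** 2 + 0.001 * u ** 2
    new_thdot = thdot + (3 * G / (2 * L) * torch.sin(th)
                         + 3.0 / (M * L ** 2) * u) * DT
    new_thdot = torch.clamp(new_thdot, -MAX_SPEED, MAX_SPEED)
    new_th = th + new_thdot * DT
    return new_th, new_thdot, -cost
```

Calling `pendulum_step` with an action tensor that has `requires_grad=True` and backpropagating from the reward yields a gradient on the action directly, which an environment behind the standard Gym `step()` API cannot provide.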
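The whitening technique mentioned in the list is a standard variance-reduction heuristic for policy gradients; a minimal sketch (illustrative names, not the thesis code):

```python
import torch

def whiten(returns: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Shift and scale a batch of returns to zero mean and unit variance.

    A common heuristic to reduce the variance of the REINFORCE
    gradient estimate; eps guards against division by zero.
    """
    return (returns - returns.mean()) / (returns.std() + eps)
```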
Results:
- The differentiable model provides additional gradient information through the model dynamics
- The differentiable model can accelerate and stabilize REINFORCE training
- Combining the differentiable model with PPO is limited by in-place operations during backpropagation in PyTorch
- Training can run with GPU acceleration (not covered in the thesis and not yet verified experimentally)
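The acceleration and stabilization result rests on backpropagating through entire rollouts of the differentiable model. A self-contained sketch of that idea (simplified dynamics without clipping or angle wrapping, and an assumed linear policy `w`, purely for illustration):

```python
import torch

# Simplified pendulum dynamics (no torque/speed clipping, for illustration)
g, m, l, dt = 10.0, 1.0, 1.0, 0.05

def step(th, thdot, u):
    thdot = thdot + (3 * g / (2 * l) * torch.sin(th) + 3.0 / (m * l**2) * u) * dt
    return th + thdot * dt, thdot

w = torch.zeros(3, requires_grad=True)   # weights of an assumed linear policy
th, thdot = torch.tensor(3.0), torch.tensor(0.0)
ret = torch.tensor(0.0)
for _ in range(50):
    obs = torch.stack([torch.cos(th), torch.sin(th), thdot])
    u = obs @ w                          # deterministic action
    th, thdot = step(th, thdot, u)
    ret = ret - (th**2 + 0.1 * thdot**2 + 0.001 * u**2)
ret.backward()                           # exact gradient of the return w.r.t. w
```

The resulting `w.grad` is an exact derivative of the (deterministic) return rather than a sampled score-function estimate, which is the extra gradient information referred to above.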
Further ideas:
- Evaluate the approach from an optimization perspective
- Improve generalization