DRL_udacity

Welcome to the world of Deep Reinforcement Learning!
This is an explained and modified version of the Udacity DRL homework.

DQN: modified from the Udacity repo, tested on the Breakout-v0 env.
PPO: written by myself, tested on the Pendulum-v0 and BipedalWalker-v2 envs.
Policy gradient: REINFORCE with baseline and entropy loss, tested on CartPole-v0.
Monte Carlo: modified version, tested on the Blackjack env.
Temporal Difference: modified version, tested on CliffWalking-v0.
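
To give a feel for what "REINFORCE with baseline and entropy loss" means in the list above, here is a minimal sketch of the loss for one episode in plain Python. It is not the code from this repo: the function names, the `gamma` and `beta` defaults, and the list-based inputs are all assumptions made for illustration. A real implementation would typically compute the same quantities on tensors in a deep learning framework.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def entropy(probs):
    """Shannon entropy (in nats) of one action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def reinforce_loss(log_probs, rewards, baselines, action_probs,
                   gamma=0.99, beta=0.01):
    """Hypothetical helper: REINFORCE loss with baseline and entropy bonus.

    log_probs    -- log pi(a_t | s_t) of the actions actually taken
    rewards      -- reward received at each step
    baselines    -- baseline value b(s_t) per step (e.g. a value estimate)
    action_probs -- full action distribution pi(. | s_t) per step
    beta         -- weight of the entropy bonus (assumed value)
    """
    returns = discounted_returns(rewards, gamma)
    # Subtracting the baseline reduces variance without changing the
    # expected gradient.
    advantages = [g - b for g, b in zip(returns, baselines)]
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages))
    # Higher entropy lowers the loss, which encourages exploration.
    entropy_bonus = sum(entropy(p) for p in action_probs)
    return policy_loss - beta * entropy_bonus


if __name__ == "__main__":
    # Toy 3-step episode with a uniform two-action policy.
    uniform = [0.5, 0.5]
    loss = reinforce_loss(
        log_probs=[math.log(0.5)] * 3,
        rewards=[1.0, 1.0, 1.0],
        baselines=[0.0, 0.0, 0.0],
        action_probs=[uniform] * 3,
        gamma=0.5,
    )
    print(loss)
```

Minimizing this loss raises the probability of actions whose return beat the baseline, while the entropy term keeps the policy from collapsing to a single action too early.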
