DRL_udacity

Welcome to the world of Deep Reinforcement Learning!
This is a modified version of the Udacity DRL homework, with explanations added~

DQN: modified from the Udacity repo, tested on the Breakout-v0 env.
PPO: written by myself, tested on the Pendulum-v0 and BipedalWalker-v2 envs.
policy gradient: REINFORCE with a baseline and an entropy loss term, tested on CartPole-v0.
monte-carlo: modified from the Udacity version, tested on the Blackjack env.
Temporal Difference: modified from the Udacity version, tested on CliffWalking-v0.
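DQN trains the online network toward a bootstrapped target built from a separate target network. The sketch below shows that target and a soft target-network update in plain Python. It is not the code in the DQN folder: Q-values are plain lists, and the soft update with rate tau is assumed here because it is a common choice in DQN code.

```python
# Minimal sketch of the DQN learning target and a soft target-network update.
# Assumptions (not taken from this repo): Q-values and parameters are plain
# Python lists, and the target network is blended in with rate tau.

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def soft_update(local_params, target_params, tau=1e-3):
    """theta_target <- tau * theta_local + (1 - tau) * theta_target."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]


print(dqn_targets([1.0, 0.0], [[0.5, 2.0], [3.0, 1.0]], [False, True], gamma=0.5))
# [2.0, 0.0]
```

The network is then regressed onto these targets, for example with an MSE loss on Q(s, a).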
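PPO limits each policy update with a clipped probability ratio. The sketch below computes the standard clipped surrogate objective from per-sample log-probabilities and advantages. It is a plain-Python illustration, not the PPO implementation in this repo, and the clip range eps=0.2 is only an assumed default.

```python
import math

# Minimal sketch of PPO's clipped surrogate objective (maximized; the loss is its negative).
# Assumption: eps=0.2 is an illustrative default, not a value taken from this repo.

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, eps=0.2):
    """Mean over samples of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)                 # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # keep ratio in [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)          # pessimistic (lower) bound
    return total / len(advantages)
```

With a positive advantage the gain is capped at (1 + eps) * A; with a negative advantage the unclipped, more pessimistic term is kept.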
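REINFORCE with a baseline scales each log-probability by the return minus a baseline, and the entropy term rewards keeping the policy stochastic. The sketch below computes that loss for one episode in plain Python. The scalar baseline (for example the mean return) and the coefficient name `beta` are assumptions for illustration; the repo may use a learned value baseline instead.

```python
import math

# Minimal sketch of the REINFORCE-with-baseline loss plus an entropy bonus.
# Assumptions: a scalar baseline and an entropy weight called beta (hypothetical names).

def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def reinforce_loss(log_probs, rewards, baseline, action_probs, gamma=0.99, beta=0.01):
    """-(sum_t log pi(a_t|s_t) * (G_t - b)) - beta * sum_t H(pi(.|s_t))."""
    returns = discounted_returns(rewards, gamma)
    pg_term = sum(lp * (g - baseline) for lp, g in zip(log_probs, returns))
    entropy = sum(-sum(p * math.log(p) for p in probs if p > 0) for probs in action_probs)
    return -pg_term - beta * entropy


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
# [1.75, 1.5, 1.0]
```

Subtracting the baseline leaves the gradient unbiased but lowers its variance; the entropy bonus discourages the policy from collapsing too early.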
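Monte Carlo control updates Q(s, a) toward the full return observed after each visit. The sketch below applies an every-visit, constant-alpha update to one Blackjack episode stored as (state, action, reward) tuples. This update rule is an assumption about what the monte-carlo folder does. The state layout (player sum, dealer card, usable ace) and the actions (0 = stick, 1 = hit) follow Gym's Blackjack convention.

```python
from collections import defaultdict

# Minimal sketch of every-visit, constant-alpha Monte Carlo control updates.
# Assumption: one episode is a list of (state, action, reward) tuples.

def mc_update(Q, episode, alpha=0.02, gamma=1.0):
    """Q(s,a) += alpha * (G_t - Q(s,a)) for every step, walking the episode backwards."""
    g = 0.0
    for state, action, reward in reversed(episode):
        g = reward + gamma * g                      # return from this step onward
        Q[state][action] += alpha * (g - Q[state][action])
    return Q


Q = defaultdict(lambda: [0.0, 0.0])                # two actions: stick, hit
episode = [((13, 10, False), 1, 0.0), ((19, 10, False), 0, 1.0)]
mc_update(Q, episode, alpha=0.5)
print(Q[(13, 10, False)], Q[(19, 10, False)])
# [0.0, 0.5] [0.5, 0.0]
```

A constant alpha weights recent episodes more heavily than a running average would, which helps while the policy is still changing.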
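Temporal-difference methods update after every step using a bootstrapped target. The README does not say which TD variant it uses, so the sketch below shows Sarsa and Q-learning (Sarsamax) side by side on a tabular Q. It assumes Gym's CliffWalking-v0 layout: 48 states (row * 12 + col, start at 36) and 4 actions (0 = up, 1 = right, 2 = down, 3 = left).

```python
# Minimal sketch of one-step Sarsa and Q-learning updates on a tabular Q.
# Assumption: Q is a list of 48 rows with 4 action values each (CliffWalking layout).

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0, done=False):
    """On-policy target: r + gamma * Q(s', a') with a' the action actually taken next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=1.0, done=False):
    """Off-policy target: r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


Q = [[0.0] * 4 for _ in range(48)]
Q[24] = [0.0, -1.0, -2.0, 0.5]
q_learning_update(Q, 36, 0, -1.0, 24, alpha=0.5)   # from the start, move up into state 24
print(Q[36][0])
# -0.25
```

On CliffWalking this difference is visible: Q-learning tends to learn the shortest path along the cliff edge, while Sarsa under an epsilon-greedy policy tends to learn a safer path further from the cliff.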

Repository contents:
.idea
DQN
PPO
Temporal_Difference
actor_critic
monte-carlo
policy_gradient
README.md
env_test.py
