Implementation of a deep reinforcement learning model trained to solve pathfinder puzzles.
See the results commit from 4-22-17. Observation: convergence and convergence rate are relatively unfazed by the maximum number of actions, except when that number is insufficient.
5/28 todo
- implement schedulers:
  - learning rate scheduler (1.5. MNA scheduler)
  - state presentation scheduler
- implement dueling architectures (easy modification)
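
For the dueling item, a minimal sketch of the usual aggregation step, which combines a state value V(s) with per-action advantages A(s, a) after subtracting the mean advantage. The function name `dueling_q_values` and the plain-list inputs are hypothetical and chosen only for illustration; they are not taken from this repository, and the real model would presumably compute both streams with network layers.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using mean-subtracted advantages.

    Hypothetical helper for illustration; not part of this repository.
    """
    if not advantages:
        raise ValueError("advantages must contain one entry per action")
    # Subtracting the mean advantage keeps V and A identifiable:
    # the mean of the resulting Q-values equals V(s).
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


if __name__ == "__main__":
    # Four actions, e.g. up / down / left / right in a grid puzzle (assumed).
    print(dueling_q_values(2.0, [1.0, 0.0, -1.0, 0.0]))
```

Only the output head changes under this scheme; the rest of the network and the training loop can stay as they are, which is likely why the modification is considered easy.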