Build Python27 Tensorflow16

README FILE
Author: Jianyuan (Jet) Yu
Affiliation: Wireless, ECE, Virginia Tech
Email : jianyuan@vt.edu
Date : April, 2018

Related Files:

  1. The result demo of DQN-DCA in Google Slides. Its figures come from the VT Google folder qdnFig, where Python autosaves them as .png and .pdf files. Note that we keep the figures SEPARATE from GitHub to avoid updating the git folder too frequently.
  2. The technical report and its LaTeX folder.
  3. A backup of the MDP solver code by Chris (Dr. Headley).

Reference:

[1] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
[2] Wang, Shangxing, et al. "Deep Reinforcement Learning for Dynamic Multichannel Access in Wireless Networks." IEEE Transactions on Cognitive Communications and Networking (2018).
[3] Yu, Yiding, Taotao Wang, and Soung Chang Liew. "Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks." arXiv preprint arXiv:1712.00162 (2017).
[4] https://github.com/sawcordwell/pymdptoolbox
[5] https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/tree/master/contents/5_Deep_Q_Network
[7] https://drive.google.com/open?id=1X-I2D4Dk_Z1IXAt19XUnlWHUvkxn42EB

Tutorial of Deep Reinforcement Learning

[1]. http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html by Dr. Silver; the lecture videos are also openly available on YouTube.
[2]. https://icml.cc/2016/tutorials/deep_rl_tutorial.pdf a brief tutorial by Dr. Silver

Ongoing Work - POMDP

We are attempting to implement four methods as solvers:

  • stack-DQN
  • Value Iteration
  • POMCP
  • Deep Recurrent Q network (DRQN)
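
As a rough illustration of the Value Iteration entry above, here is a minimal sketch for a small, fully observable tabular MDP. It is not the POMDP solver of this project: the function name `value_iteration`, the table layout (`P[a][s][s2]`, `R[a][s]`), and the two-state toy problem are all assumptions made up for the example.

```python
def q_values(P, R, V, gamma):
    """One Bellman backup: Q[s][a] = R[a][s] + gamma * sum_s2 P[a][s][s2] * V[s2]."""
    n_actions, n_states = len(P), len(P[0])
    return [[R[a][s] + gamma * sum(P[a][s][s2] * V[s2] for s2 in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10000):
    """Return (V, policy) for a tabular MDP (hypothetical layout, see lead-in).

    P[a][s][s2] is the probability of moving from s to s2 under action a,
    R[a][s] is the expected immediate reward for taking action a in state s.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        Q = q_values(P, R, V, gamma)
        V_new = [max(row) for row in Q]
        # Stop once the largest change in any state value is below tol.
        delta = max(abs(v1 - v0) for v1, v0 in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimate.
    Q = q_values(P, R, V, gamma)
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy


# Toy problem (made up): action 0 always leads to state 0 with reward 0,
# action 1 always leads to state 1 with reward 1.
P = [[[1.0, 0.0], [1.0, 0.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.0, 0.0],
     [1.0, 1.0]]
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the greedy policy picks action 1 in both states, and each state value approaches 1 / (1 - 0.9) = 10. For a POMDP, the same backup would have to run over belief states rather than the raw states used here.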

[1] summary of current POMDP solvers
[2] the vi+pomcp solver source code.
[3] POMCP - Silver, David, and Joel Veness. "Monte-Carlo planning in large POMDPs." Advances in neural information processing systems. 2010.
[4] UCT, the kernel of POMCP - L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In 15th European Conference on Machine Learning, pages 282–293, 2006.
[5] slides of CMU
[6] slides of techfak
[7] pomdp alg website
[8] DRQN blog
[9] DRQN paper - Hausknecht, Matthew, and Peter Stone. "Deep recurrent q-learning for partially observable mdps." CoRR, abs/1507.06527 (2015).