

This repository provides the Python code for the softmax DQN algorithms in the following paper.

Zhao Song, Ronald E. Parr, and Lawrence Carin, "Revisiting the Softmax Bellman Operator: New Benefits and New Perspective", Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, CA, June, 2019
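For orientation, the sketch below illustrates the kind of backup the paper's title refers to: the max over next-state action values used in a standard DQN target is replaced by a Boltzmann-weighted average. It is a minimal pure-Python illustration and is not taken from this repository. The function names `softmax_value` and `softmax_dqn_target`, the inverse temperature parameter `tau`, and the default values are assumptions made for this example only.

```python
import math


def softmax_value(q_values, tau):
    """Boltzmann-weighted average of q_values (hypothetical helper).

    tau is an inverse temperature: tau = 0 gives the plain mean,
    and a large tau approaches max(q_values).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp(tau * (q - q_max)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def softmax_dqn_target(reward, next_q_values, gamma=0.99, tau=1.0, terminal=False):
    """One-step target that uses the softmax backup instead of a max (illustrative)."""
    if terminal:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * softmax_value(next_q_values, tau)
```

With `tau = 0` the backup reduces to the mean of the next-state action values, and as `tau` grows it approaches the usual max, so `tau` interpolates between the two.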


This implementation is built on the publicly available code by Nathan Sprague and other contributors. Before you start, please read LICENSE.txt from Nathan Sprague (where the copyright notice is located) under the ./softmax_dqn folder, and accept the corresponding terms.

Distribution and use of this code is subject to the following agreement:

This Program is provided by the authors as a service to the research community. It is provided without cost or restrictions, except for the User's acknowledgement that the Program is provided on an "As Is" basis and User understands that the authors make no express or implied warranty of any kind. The authors specifically disclaim any implied warranty or merchantability or fitness for a particular purpose, and make no representations or warranties that the Program will not infringe the intellectual property rights of others. The User agrees to indemnify and hold harmless the authors from and against any and all liability arising out of User's use of the Program.

Quick Start

To start, install all required libraries in ./softmax_dqn/, and then run the following bash script.



If you find this code useful, please cite the work with the following BibTeX entry:

    title={Revisiting the Softmax Bellman Operator: New Benefits and New Perspective},
    author={Song, Zhao and Parr, Ronald E. and Carin, Lawrence},
    booktitle={Proceedings of the 36th International Conference on Machine Learning},