Implementation of Scheduled Policy Optimization for task-oriented language grounding

Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents

Models and Algorithms

See files under walk_the_blocks/BlockWorldRoboticAgent/srcs/

  • Run this file for training. The schedule mechanism can be changed in the function ppo_update(); the options are:

    • do imitation every 50
    • do imitation based on rules
    • alternate one epoch of imitation with one epoch of RL

    example: python -lr 0.0001 -max_epochs 2 -entropy_coef 0.05

  • the network architecture and loss functions:

    • PPO Loss
    • Supervised Loss
    • Advantage Actor-Critic Loss
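
The three schedule options listed above can be sketched as a small decision function. This is only an illustration, not the code inside ppo_update(): the names `use_imitation` and `reward_below`, the mode strings, and the example rule are all hypothetical. It also assumes that "every 50" counts update steps, which the list does not state.

```python
def reward_below(threshold):
    """Hypothetical rule: imitate when the recent mean reward is below a threshold."""
    def rule(recent_rewards):
        if not recent_rewards:
            return False
        return sum(recent_rewards) / len(recent_rewards) < threshold
    return rule


def use_imitation(mode, step=0, epoch=0, period=50, rule=None, recent_rewards=()):
    """Return True if the next update should be an imitation (supervised) update.

    mode == "periodic":  imitate on every `period`-th step (assumed unit: updates).
    mode == "rule":      imitate whenever the supplied rule fires.
    mode == "alternate": imitate on even epochs, do RL on odd epochs.
    """
    if mode == "periodic":
        return step % period == 0
    if mode == "rule":
        if rule is None:
            raise ValueError("mode 'rule' needs a rule callable")
        return bool(rule(list(recent_rewards)))
    if mode == "alternate":
        return epoch % 2 == 0
    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    # Print which of the first few epochs would be imitation epochs.
    for epoch in range(4):
        kind = "imitation" if use_imitation("alternate", epoch=epoch) else "RL"
        print(epoch, kind)
```

Keeping the decision in one function means the training loop only has to branch on a single boolean, so switching schedules does not touch the update code.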

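For reference, the three losses named above can be written per sample in their textbook forms. This is a minimal sketch in plain Python, not the repository's network code: the function names, `clip_eps=0.2` and `value_coef=0.5` are assumptions, and `entropy_coef=0.05` is simply borrowed from the example command.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate for one sample, negated so lower is better."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return -min(ratio * advantage, clipped * advantage)


def supervised_loss(probs, expert_action):
    """Cross-entropy between the policy and the expert (demonstrated) action."""
    return -math.log(probs[expert_action])


def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def a2c_loss(probs, action, ret, value, value_coef=0.5, entropy_coef=0.05):
    """Advantage actor-critic loss: policy term + value term - entropy bonus.

    In an autodiff framework the advantage in the policy term would be
    treated as a constant; with plain floats that distinction disappears.
    """
    advantage = ret - value
    policy_term = -math.log(probs[action]) * advantage
    value_term = value_coef * advantage ** 2
    return policy_term + value_term - entropy_coef * entropy(probs)


if __name__ == "__main__":
    probs = [0.5, 0.5]
    print(ppo_clip_loss(math.log(2.0), 0.0, 1.0))
    print(supervised_loss(probs, 0))
    print(a2c_loss(probs, 0, ret=1.0, value=0.0))
```

The clipping in `ppo_clip_loss` caps how much a single sample can reward moving the policy away from the one that collected the data, while `supervised_loss` is the term an imitation update would minimise.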

For the usage of the Block-world environment, please refer to

Train the RL agents

  • S-REIN *

If you use our code in your own research, please cite the following paper:

  title={Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents},
  author={Xiong, Wenhan and Guo, Xiaoxiao and Yu, Mo and Chang, Shiyu and Zhou, Bowen and Wang, William Yang},
  journal={arXiv preprint arXiv:1806.06187},