
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

Project Website

Paper

(Video: dreamsmooth-small.mp4)

Overview

Reward Prediction is Important in MBRL

Reward models, which predict the rewards an agent would have obtained along an imagined trajectory, play a vital role in state-of-the-art MBRL algorithms such as DreamerV3 and TD-MPC, because the policy learns from predicted rewards.
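To make this dependence concrete, here is a minimal sketch, assuming generic callables for the learned policy, latent dynamics, and reward model (this is illustrative only, not the actual DreamerV3 or TD-MPC code), of how imagined returns are computed purely from predicted rewards:

    # Minimal sketch: policy learning in imagination never queries the
    # environment, so the reward model is the only source of reward signal.
    def imagined_return(policy, dynamics, reward_model, start_state,
                        horizon=15, gamma=0.99):
        """Roll the world model forward and accumulate *predicted* rewards."""
        state, ret, discount = start_state, 0.0, 1.0
        for _ in range(horizon):
            action = policy(state)                 # act in imagination
            state = dynamics(state, action)        # latent transition model
            ret += discount * reward_model(state)  # predicted, not true, reward
            discount *= gamma
        return ret

If the reward model misses a sparse reward, the imagined return, and therefore the policy's learning signal, never reflects it.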

Reward Prediction is Challenging

Reward prediction in sparse-reward environments, especially those with partial observability or stochastic rewards, is surprisingly challenging.

The following plots show predicted and ground-truth rewards over a single episode in several environments (including Robodesk, ShadowHand, and Crafter), with mispredicted sparse rewards highlighted in yellow.

(Figure: Reward prediction is challenging.)

Our Solution: Temporally Smoothed Rewards

We propose DreamSmooth, which performs temporal smoothing of the rewards obtained in each rollout before adding them to the replay buffer. Our method makes learning a reward model easier, especially when rewards are ambiguous or sparse.

With our method, the reward model no longer omits sparse rewards from its output and instead predicts them accurately.

(Figure: DreamSmooth improves reward prediction.)

Moreover, the improved reward predictions of DreamSmooth translate to better performance. We studied several smoothing techniques (Gaussian, uniform, and exponential moving average) on many sparse-reward environments and found that our method outperforms the base DreamerV3 model.

(Figure: DreamSmooth improves performance.)
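For illustration, the sketch below shows one way a finished rollout's rewards could be smoothed along the time axis before being added to the replay buffer. The kernel shapes, truncation radii, and normalization are assumptions made for this sketch, not the repository's exact implementation:

    import numpy as np

    def smooth_rewards(rewards, method="gaussian", param=3.0):
        """Return a temporally smoothed copy of a rollout's 1-D reward sequence.

        Kernel definitions below are illustrative assumptions; see the paper
        for the exact smoothing functions.
        """
        rewards = np.asarray(rewards, dtype=np.float64)
        if method == "no":
            return rewards
        if method == "gaussian":
            # Gaussian kernel with std `param` (sigma), truncated at 4 sigma.
            radius = int(np.ceil(4 * param))
            offsets = np.arange(-radius, radius + 1)
            kernel = np.exp(-0.5 * (offsets / param) ** 2)
        elif method == "uniform":
            # Uniform window of half-width `param` (delta).
            radius = int(param)
            kernel = np.ones(2 * radius + 1)
        elif method == "exp":
            # EMA-style kernel with decay `param` in (0, 1); as written it
            # spreads each reward toward earlier steps (direction is an
            # assumption of this sketch).
            radius = 50
            decay = param ** np.arange(radius, -1, -1)   # param^radius .. param^0
            kernel = np.concatenate([decay, np.zeros(radius)])
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        kernel = kernel / kernel.sum()  # redistribute reward mass, don't amplify it
        return np.convolve(rewards, kernel, mode="same")

Training is otherwise unchanged: the reward model is simply fitted to the smoothed rewards instead of the raw ones.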

Quickstart

This code is built on top of the official DreamerV3 implementation.

Prerequisites

  • Ubuntu 22.04
  • Python 3.9+

Installation

Environments

Important directories and files

Run experiments

Replace [EXP_NAME] with the name of the experiment, [GPU] with the GPU number you wish to use, [SEED] with the random seed, and [WANDB_ENTITY] and [WANDB_PROJECT] with the W&B entity and project you want to log to. [SMOOTHING_METHOD] should be gaussian, uniform, exp, or no (for no smoothing), and [SMOOTHING_PARAMETER] is the parameter of the chosen method (e.g., sigma for Gaussian or delta for uniform smoothing, as in the examples below). For Atari and DeepMind Control, [TASK] is the task name.

  • Running experiments on Robodesk

    source scripts/d3_robodesk_train.sh [EXP_NAME] [GPU] [SEED] [SMOOTHING_METHOD] [SMOOTHING_PARAMETER] [WANDB_ENTITY] [WANDB_PROJECT]
    
  • Running experiments on Hand

    source scripts/d3_hand_train.sh [EXP_NAME] [GPU] [SEED] [SMOOTHING_METHOD] [SMOOTHING_PARAMETER] [WANDB_ENTITY] [WANDB_PROJECT]
    
  • Running experiments on Crafter

    source scripts/d3_crafter_train.sh [EXP_NAME] [GPU] [SEED] [SMOOTHING_METHOD] [SMOOTHING_PARAMETER] [WANDB_ENTITY] [WANDB_PROJECT]
    
  • Running experiments on Atari

    source scripts/d3_atari_train.sh [EXP_NAME] [TASK] [GPU] [SEED] [SMOOTHING_METHOD] [SMOOTHING_PARAMETER] [WANDB_ENTITY] [WANDB_PROJECT]
    
  • Running experiments on DeepMind Control

    source scripts/d3_dmc_train.sh [EXP_NAME] [TASK] [GPU] [SEED] [SMOOTHING_METHOD] [SMOOTHING_PARAMETER] [WANDB_ENTITY] [WANDB_PROJECT]
    

Examples

  • Gaussian Smoothing with sigma = 3 on Robodesk

    source scripts/d3_robodesk_train.sh example_01 [GPU] 1 gaussian 3 [WANDB_ENTITY] [WANDB_PROJECT]
    
  • Uniform Smoothing with delta = 5 on Hand

    source scripts/d3_hand_train.sh example_03 [GPU] 1 uniform 5 [WANDB_ENTITY] [WANDB_PROJECT]
    

Citation

@inproceedings{lee2024dreamsmooth,
  author    = {Vint Lee and Pieter Abbeel and Youngwoon Lee},
  title     = {DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing},
  booktitle = {The Twelfth International Conference on Learning Representations},
  year      = {2024},
  url       = {https://openreview.net/forum?id=GruDNzQ4ux}
}
