By Zohar Rimon, Aviv Tamar and Gilad Adler
Official implementation of the paper Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach.
@inproceedings{rimon2022mbrl2,
title={Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach},
author={Rimon, Zohar and Tamar, Aviv and Adler, Gilad},
booktitle={Neural Information Processing Systems (NeurIPS)},
year={2022}}
The requirements are listed in requirements.txt. A suitable conda environment can be created with:
conda create -n mbrl2 python=3.7
conda activate mbrl2
pip install -r requirements.txt
In addition to the config options introduced in the VariBad repo, the following options are available:
- env_num_train_goals, env_num_eval_goals - number of training and evaluation environments
- num_dream_envs - number of dream environments processes
- use_kde - use KDE to sample new latents; if false, the learned prior is used
- use_mixup - use the mixup technique to sample new latents instead of regular KDE
- delay_dream - number of iterations to delay the initialization of the dream environments by
- update_kde_interval - interval, in iterations, between KDE updates
- kde_from_train - create KDE using an oracle policy
- kde_from_running_latents - use a pool of latents gathered during training for the dream environments estimation
- freeze_vae - don't train the VAE (only the policy)
- delayed_freeze - stop the VAE training after a given number of iterations
- train_vae_on_dream - train the VAE to reconstruct reward over the dream environments
- clone_dream_vae - use a different VAE for the dream environments
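To make the difference between the use_kde and use_mixup options more concrete, here is a minimal sketch of the two sampling ideas in plain Python. It is not the code used in this repo: the function names, the Gaussian kernel, the bandwidth value and the Beta(alpha, alpha) mixing coefficient are all assumptions chosen for illustration.

```python
import random


def sample_kde(latents, n_samples, bandwidth, rng):
    """Sample from a Gaussian KDE over `latents` (assumed kernel choice).

    Sampling from a Gaussian KDE is equivalent to picking a stored latent
    uniformly at random and adding isotropic Gaussian noise to it.
    """
    samples = []
    for _ in range(n_samples):
        center = rng.choice(latents)
        samples.append([x + rng.gauss(0.0, bandwidth) for x in center])
    return samples


def sample_mixup(latents, n_samples, alpha, rng):
    """Sample new latents as convex combinations of two stored latents."""
    samples = []
    for _ in range(n_samples):
        a, b = rng.choice(latents), rng.choice(latents)
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        samples.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
    return samples


rng = random.Random(0)
# Hypothetical pool of 20 five-dimensional latents, one per training task.
train_latents = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]

kde_latents = sample_kde(train_latents, n_samples=4, bandwidth=0.1, rng=rng)
mixup_latents = sample_mixup(train_latents, n_samples=4, alpha=1.0, rng=rng)
```

In this sketch the KDE samples can fall slightly outside the range of the stored latents, while the mixup samples always stay between the two latents they interpolate.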
In order to reproduce the experiments shown in the paper:
- For the 20 real training environments and 4 dream environments experiment:
python main.py --exp_name 20_train_4_kde_dream --env_type pointrobot_varibad --env_num_train_goals 20 --num_dream_envs 4
- For the 30 real training environments and 6 dream environments experiment:
python main.py --exp_name 30_train_6_kde_dream --env_type pointrobot_varibad --env_num_train_goals 30 --num_dream_envs 6
In order to use Mixup dream environments instead of the KDE, add the --use_mixup flag.
To reproduce the exact figures from the paper, run the relevant experiment with every seed specified in utils/plot_helpers.py, and then run utils/plot_helpers.py.
For example, to reproduce the 30 real training environments experiment (VariBad vs VariBad dream) run seeds:
seeds = [3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103, 200, 201, 202, 203]
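Launching one run per seed can be scripted. The sketch below only builds and prints the commands for the 30 training environments experiment. It assumes that main.py accepts a --seed argument, as in the VariBad repo; that flag is not documented here, so check the config before relying on it. The build_commands helper is a hypothetical name.

```python
seeds = [3, 13, 23, 33, 43, 53, 63, 73, 83, 93, 103, 200, 201, 202, 203]


def build_commands(seeds, exp_name="30_train_6_kde_dream"):
    """Build one command (as an argument list) per seed."""
    commands = []
    for seed in seeds:
        commands.append([
            "python", "main.py",
            "--exp_name", exp_name,
            "--env_type", "pointrobot_varibad",
            "--env_num_train_goals", "30",
            "--num_dream_envs", "6",
            "--seed", str(seed),  # assumed flag, inherited from VariBad
        ])
    return commands


commands = build_commands(seeds)
for cmd in commands:
    # Print the commands; each list can also be passed to subprocess.run.
    print(" ".join(cmd))
```

Printing the commands first makes it easy to inspect them or paste them into a job scheduler before starting any training run.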
Zohar Rimon - zohar.rimon@gmail.com