Official Implementation of our paper Safe Resetless Reinforcement Learning: Enhancing Training Autonomy with Risk-Averse Agents.
Training Reinforcement Learning agents directly in any real-world environment remains difficult, as such scenarios entail the risk of damaging the training setup or violating other safety constraints. The training process itself further requires extensive human supervision and intervention to reset the environment after each episode. Thus, we propose an innovative Safe Reinforcement Learning framework that combines Safe and Resetless RL to autonomously reset environments, while also reducing the number of safety constraint violations. In this context, we develop a novel risk-averse RL agent suitable for stringent safety constraints by combining Safe RL, Distributional RL, and Randomized Ensembled Double Q-Learning. Experiments conducted in a novel mobile robotics scenario indicate that our Safe Resetless RL framework reduces the number of human interactions required during training compared to state-of-the-art methods, filling a gap in current problem formulations and enhancing the autonomy of RL training processes in real-world settings.
We evaluated our approach on a modified version of CarEnv, which we adapted for the mobile robotics use case by replacing the car with a mobile robotics platform and adding a lidar as the primary sensor.
cd <Path>/SRRL
sudo apt install libcairo2-dev pkg-config python3-dev
conda env create -f environment.yaml
conda activate srrl
The parameters of the simulated lidar can be configured by changing the values for sensors/lidar_points in envs/CarEnv/CarEnv/Configs.py.
- DREDQ: Select the default values for all algorithm parameters in cfg/dredq.yaml and execute python3 train_jax_dredq.py
- DSAC: Set update_to_data_ratio=1 and ensemble_size=2 in cfg/dredq.yaml and execute python3 train_jax_dredq.py
- SAC: Select the default values for all algorithm parameters in cfg/sac.yaml and execute python3 train_jax_sac.py
- REDQ: Set update_to_data_ratio=10 and ensemble_size=10 in cfg/sac.yaml and execute python3 train_jax_sac.py
- WCSAC: Select the default values for all algorithm parameters in cfg/wcsac.yaml and execute python3 train_jax_wcsac.py
- Example-based SAC: Select the default values for all algorithm parameters in cfg/example_based_sac.yaml and execute python3 train_jax_example_based_sac.py
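For quick reference, the baseline settings listed above can be collected in one lookup table. The following is only a sketch: the names BASELINES and describe are hypothetical helpers that are not part of this repository, and the code does not edit any config file. It only prints which file to adjust and which script to run, using the values given in the list.

```python
# Hypothetical summary of the baselines listed above (not part of the repository).
# Each entry: (config file, parameters to change from the defaults, training script).
BASELINES = {
    "DREDQ": ("cfg/dredq.yaml", {}, "train_jax_dredq.py"),
    "DSAC": ("cfg/dredq.yaml",
             {"update_to_data_ratio": 1, "ensemble_size": 2},
             "train_jax_dredq.py"),
    "SAC": ("cfg/sac.yaml", {}, "train_jax_sac.py"),
    "REDQ": ("cfg/sac.yaml",
             {"update_to_data_ratio": 10, "ensemble_size": 10},
             "train_jax_sac.py"),
    "WCSAC": ("cfg/wcsac.yaml", {}, "train_jax_wcsac.py"),
    "Example-based SAC": ("cfg/example_based_sac.yaml", {},
                          "train_jax_example_based_sac.py"),
}


def describe(name):
    """Return a one-line instruction for the given baseline."""
    config, overrides, script = BASELINES[name]
    if overrides:
        settings = " and ".join(f"{key}={value}" for key, value in overrides.items())
        edit = f"set {settings} in {config}"
    else:
        edit = f"keep the defaults in {config}"
    return f"{name}: {edit}, then run: python3 {script}"


if __name__ == "__main__":
    for name in BASELINES:
        print(describe(name))
```

Note that DSAC and REDQ reuse the DREDQ and SAC training scripts respectively; only update_to_data_ratio and ensemble_size differ from the defaults.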
Select a forward and a reset agent and their respective parameters in cfg/srrl.yaml and execute
python3 train_jax_srrl.py
Please note that we have used W&B for logging, so you may require an account to log output without modifications to the code.
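If you do not have a W&B account, one possible workaround is to run W&B in offline mode, which it supports through the WANDB_MODE environment variable. The sketch below assumes that the training scripts call wandb.init without passing an explicit mode that would override the environment variable; we have not verified this for every script. The helper names offline_env and run_training are hypothetical and not part of this repository.

```python
import os
import subprocess
import sys


def offline_env(base=None):
    """Return a copy of the environment with W&B switched to offline mode."""
    env = dict(os.environ if base is None else base)
    # "offline" stores runs locally without syncing; "disabled" turns logging off.
    env["WANDB_MODE"] = "offline"
    return env


def run_training(script="train_jax_srrl.py"):
    """Launch a training script with offline W&B logging (assumed to be honored)."""
    return subprocess.run([sys.executable, script], env=offline_env(), check=True)
```

Calling run_training() from the repository root starts train_jax_srrl.py with the modified environment. The same effect can be achieved by exporting WANDB_MODE=offline in the shell before running any of the training commands.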
Please cite this work as:
@inproceedings { GotSch2024,
author = {Tristan Gottwald and Maximilian Schier and Bodo Rosenhahn},
title = {Safe Resetless Reinforcement Learning: Enhancing Training Autonomy with Risk-Averse Agents},
booktitle = {European Conference on Computer Vision Workshops (ECCVW)},
year = {2024},
month = oct
}