A repository for performing optimization and reinforcement learning for Cost-Aware Markov Decision Processes (CAMDPs). This is the supplementary repo for the article, "Reinforcement Learning for Cost-Aware Markov Decision Processes", to be published in the Proceedings of the 38th International Conference on Machine Learning in 2021.
Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence.
This repository implements new ratio-maximization RL algorithms as described in the paper. It also implements an easy-to-use suite for setting up experiments on CAMDPs.
- Cost-aware relative value iteration Q-learning (CARVI Q-learning), an adaptation of traditional RVI Q-learning for CAMDPs. This repository implements CARVI Q-learning for agents with tabular and neural-network architectures.
- Cost-aware actor-critic (CAAC), an adaptation of traditional actor-critic learning for CAMDPs. This repository also implements CAAC for agents with linear approximators under a variety of feature architectures.
- "Batteries included" agents fully implemented for CARVI Q-learning and CAAC, as well as a suite of pre-defined policy, value, and Q-function architectures.
- Built-in CAMDP environments that adhere to the `gym.Env` API (see the sketch after this list).
- Cost-aware variations of classic control problems, such as `MountainCar`, `Acrobot`, `Pendulum`, and `CartPole`.
- Configurable suite for performing cost-aware RL experiments, with customizable configuration files and simple plotting.
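As a rough illustration of the intended workflow, the sketch below rolls out a random policy in a cost-aware environment through the standard `gym.Env` interface and tracks the running reward/cost ratio. The commented-out import path, the class name, and the convention that the per-step cost is reported under `info["cost"]` are assumptions made for illustration only; consult the `costaware` package for the actual class names and return signature.

```python
import gym

# Hypothetical import; the real environment classes live in the costaware
# package and may be named differently.
# from costaware.envs import CostAwareMountainCarEnv

def random_rollout(env: gym.Env, n_steps: int = 200) -> float:
    """Roll out a random policy and return the accumulated reward/cost ratio.

    Assumes (illustration only) that each transition's cost is reported in
    the `info` dict under the key "cost".
    """
    obs = env.reset()
    total_reward, total_cost = 0.0, 0.0
    for _ in range(n_steps):
        action = env.action_space.sample()          # random policy
        obs, reward, done, info = env.step(action)  # classic gym.Env API
        total_reward += reward
        total_cost += info.get("cost", 0.0)         # assumed convention
        if done:
            obs = env.reset()
    return total_reward / max(total_cost, 1e-8)
```

The long-run ratio of accumulated reward to accumulated cost is exactly the kind of quantity the cost-aware agents in this repository are designed to optimize.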
This repository uses Python 3. Package details can be found in the `requirements.txt` file. Major third-party libraries used in this repository include `gym`, `ray`, and `matplotlib`; see `requirements.txt` for the complete list.
Set up your preferred virtual environment and activate it, e.g. run

```
$ python -m venv .venv
$ source .venv/bin/activate
```
After cloning, simply `cd` into the repo and run `pip install -e .`.
Using `docker` or `podman`, first build the image with

```
$ docker build -t=costaware -f Dockerfile .
```

then run it with your local `costaware` repo mounted in the container using

```
$ docker run -it --rm --privileged -v COSTAWARE_PATH:/home/costaware costaware
```

Any changes made to `/home/costaware` in the container will be reflected on the host machine in `COSTAWARE_PATH`, and vice versa.
The main directory contains a script called `demo.py` that can be used to run the same experiments described in the experimental portion of the paper. Based on the argument passed to it, `demo.py` uses the scripts and configuration files in the `examples` directory to launch an experiment. For example, if

```
$ python demo.py synthetic
```

is called, the experiment script `synthetic_runner.py`, which trains tabular cost-aware RVI Q-learning (CARVI Q-learning) and cost-aware actor-critic (CAAC) with linear function approximation on a synthetic CAMDP environment, will be run using the experiment specified by the file `synthetic_config.yml`. Similarly, if

```
$ python demo.py deep_Q
```

is called, `deep_Q_runner.py`, which trains a neural-network version of CARVI Q-learning on a cost-aware version of a classic Gym environment, will be run using the experiment specified in `deep_Q_config.yml`. When the experiment has finished running, `demo.py` will also generate a plot of the experiment's performance.
The configuration files `synthetic_config.yml` and `deep_Q_config.yml` contain all the experiment parameters and hyperparameters needed to set up and run various types of experiments. In a nutshell, a configuration file specifies an experiment, which is composed of one or more trials; a hedged sketch of this structure follows the two lists below.
A trial consists of
- the environment to train on,
- the type of agent to be trained,
- the algorithm hyperparameters, and
- the directory to save output to.
An experiment consists of
- one or more trials to be run,
- the number of replications of each trial to run,
- a list of the available resources (CPUs and GPUs), and
- the resources to be allocated to each trial replication.
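To make the trial/experiment split concrete, here is a sketch of what such a configuration might look like and how it could be read in Python (assuming PyYAML is installed). The key names are illustrative assumptions, not the repository's actual schema; the authoritative structure is whatever `synthetic_config.yml` and `deep_Q_config.yml` contain.

```python
import yaml

# Illustrative configuration mirroring the trial/experiment structure
# described above. All key names are assumptions for the sake of example.
EXAMPLE_CONFIG = """
experiment:
  replications: 10            # how many times each trial is repeated
  resources:
    cpus: 8
    gpus: 0
  cpus_per_replication: 1     # resources allocated to each replication
trials:
  - env: synthetic_camdp      # environment to train on
    agent: tabular_carvi_q    # type of agent to train
    hyperparameters:
      step_size: 0.01
      exploration: 0.1
    output_dir: results/synthetic
"""

config = yaml.safe_load(EXAMPLE_CONFIG)
print(config["experiment"]["replications"])  # -> 10
```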
A typical experiment launched by `demo.py` will consist of several replications of the same trial. The output of these replications can then be used to generate plots showing average performance across replications. After verifying that your machine has the computational resources the configuration file asks for, `demo.py` creates the trials that make up the experiment and runs them in parallel using Ray. This makes it easy to run large numbers of trials at once if you have sufficient resources.
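The snippet below is a minimal sketch of the kind of Ray pattern this implies: each replication becomes a remote task with a fixed CPU budget, and all replications run concurrently. It is illustrative only and is not the repository's actual runner code; the trial dictionary and the work done inside the task are placeholders.

```python
import ray

ray.init(num_cpus=8)  # should match the resources declared in the config

@ray.remote(num_cpus=1)  # per-replication resource allocation
def run_replication(trial_config: dict, seed: int) -> float:
    # Placeholder for one training run; the real work is done by the
    # runner scripts in the examples directory.
    return float(seed)  # e.g. the final learned ratio

# Launch all replications of a trial in parallel and collect the results.
futures = [run_replication.remote({"agent": "tabular_carvi_q"}, seed)
           for seed in range(10)]
results = ray.get(futures)
ray.shutdown()
```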
After you've run `demo.py` a few times and seen how things work, feel free to start playing with the configuration files to make your own experiments or to recreate the experiments from the paper.
At the end of training, `demo.py` will produce a summary plot of the training results. This consists of two time series corresponding to the learned optimal ratio for CARVI Q-learning and for CAAC, plotted together with confidence intervals. The data used to produce the plots are also written to a summary CSV file.
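For reference, the sketch below shows one standard way such a plot can be produced from per-replication time series: average across replications and shade a normal-approximation confidence band. It only illustrates the idea; the file layout and plotting details used by `demo.py` may differ, and the random data here stands in for the real summary CSV.

```python
import numpy as np
import matplotlib.pyplot as plt

# ratios: one row per replication, one column per training step
# (illustrative random data in place of the real training output).
rng = np.random.default_rng(0)
ratios = np.cumsum(rng.normal(0.01, 0.05, size=(10, 500)), axis=1)

mean = ratios.mean(axis=0)
sem = ratios.std(axis=0, ddof=1) / np.sqrt(ratios.shape[0])
z = 1.96  # roughly a 95% confidence level

steps = np.arange(ratios.shape[1])
plt.plot(steps, mean, label="CARVI Q-learning (illustrative)")
plt.fill_between(steps, mean - z * sem, mean + z * sem, alpha=0.3)
plt.xlabel("training step")
plt.ylabel("learned ratio")
plt.legend()
plt.show()
```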
The script `demo.py` includes a number of ways to optionally customize the plotted output. These include:
- a `--confidence` flag to specify the confidence level of the corresponding intervals. Note: this value is given as an integer; e.g., `95` yields the 95% confidence interval.
- `--title`, `--xlabel`, and `--ylabel` flags that specify the labels on the plotted output.
- `--filename` and `--ext` flags, which specify the name of the plot file. The resulting data directory will contain files of the form `filename.csv` and `filename.ext`, corresponding to the summary spreadsheet and the plot file, respectively. Note: extensions are passed to `matplotlib.pyplot.savefig` and are limited by the file formats available in that library; see the `savefig` documentation, or the snippet below for listing the formats your installation supports.
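If you are unsure which values of `--ext` your matplotlib installation accepts, you can list the file formats `savefig` supports:

```python
import matplotlib.pyplot as plt

# Maps each supported extension (e.g. "png", "pdf", "svg") to a description.
print(plt.gcf().canvas.get_supported_filetypes())
```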
A complete description of the available options for the script `demo.py` can be obtained by running

```
$ python demo.py --help
```