
PLUNDER: Probabilistic Program Synthesis for Learning from Unlabeled and Noisy Demonstrations

Our goal is to synthesize a programmatic state machine policy from time-series data while simultaneously inferring a set of high-level action labels.


Related Resources

PLUNDER (codebase): https://github.com/ut-amrl/plunder

AMRL Google Drive (presentations and videos): https://drive.google.com/drive/folders/1QaKtIvmKhZjxIwY9ANSPpjYl0teoNW5S?usp=share_link

Publication: https://arxiv.org/abs/2303.01440

Project Website: https://amrl.cs.utexas.edu/plunder


Description

Our system is a discrete-time Markov process defined by:

  • an action space $A$ = a set of discrete action labels $a \in A$
    • Ex: $a \in$ {ACC, DEC, CON}
  • a low-level observation space $Z$ = a continuous domain of low-level observations $z \in Z$: joystick commands, motor inputs, etc.
    • Ex: $z = acc \in \mathbb{R}$, where $acc$ is the acceleration
  • a state space $S$ = a continuous domain of constants or variables $c, y \in S$.
    • Ex: $c = accMax \in \mathbb{R}, y = pos \in \mathbb{R}$
  • an action-selection policy (ASP) $\pi: A \times S \rightarrow A$ that maps the current action label and the current state to the next action label
  • an observation model $O: A \rightarrow distr(Z)$ that maps each discrete action label to a distribution over low-level observations (e.g., the noisy output of a per-action motor controller)
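
To make this concrete, here is a minimal C++ sketch of how these pieces fit together for the 1D example above. The names (`Action`, `State`, `asp`, `observe`), the transition guard, and the noise scale are all illustrative, not PLUNDER's actual interface:

```cpp
#include <cstdio>
#include <random>

// Action space A: the discrete high-level labels.
enum class Action { ACC, DEC, CON };

// State space S: constants (accMax) and variables (pos).
struct State {
    double accMax;  // constant c: maximum acceleration
    double pos;     // variable y: current position
};

// ASP  pi : A x S -> A : maps the current label and state to the
// next label. The guard below is invented for illustration.
Action asp(Action prev, const State& s) {
    if (prev == Action::ACC && s.pos >= 0.0) return Action::DEC;
    return prev;
}

// Observation model O : A -> distr(Z) : each label induces a
// distribution over the low-level observation z = acc, here a
// Gaussian around the label's nominal motor command.
double observe(Action a, const State& s, std::mt19937& rng) {
    double mean = (a == Action::ACC) ?  s.accMax
                : (a == Action::DEC) ? -s.accMax
                :                       0.0;
    return std::normal_distribution<double>(mean, 0.1)(rng);
}

int main() {
    std::mt19937 rng(0);
    State s{1.0, -2.0};
    Action a = asp(Action::ACC, s);          // stays ACC: pos < 0
    std::printf("z = %f\n", observe(a, s, rng));
}
```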

Overall problem formulation:

Inputs

We know the problem domain $A, Z, S$, as well as the observation model $O$. We are given a set of demonstrations, defined simply as trajectories with the action labels missing, i.e. the state and observation sequences $s_{1:t}$ and $z_{1:t}$, without $a_{1:t}$.

Outputs

We would like to:

  1. Infer the values of the action labels in the demonstrations ($a_{1:t}$)
  2. Synthesize an ASP that is maximally consistent with the demonstrations ($\pi^*$)
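
In other words (our paraphrase of the setup above, assuming observations are conditionally independent given their labels), output 2 is a maximum-likelihood problem with the labels as latent variables:

$$\pi^* = \arg\max_{\pi} \; p(z_{1:t} \mid s_{1:t}, \pi) = \arg\max_{\pi} \sum_{a_{1:t} \in A^t} \Big( \prod_{i=1}^{t} p(z_i \mid a_i) \Big) \, p(a_{1:t} \mid s_{1:t}, \pi)$$

where $p(z_i \mid a_i)$ comes from the observation model $O$, and output 1 is the corresponding posterior over $a_{1:t}$ under $\pi^*$. This structure is what motivates the EM loop described under Project Organization below.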

Dependencies & Setup

See pips/. In addition, this project requires SciPy: https://scipy.org/install/. If you wish to run the highway environment yourself, you will also need highway-env and its dependencies: https://highway-env.readthedocs.io/en/latest/installation.html. If you wish to run the robotic arm environment yourself, you will also need panda-gym and its dependencies: https://panda-gym.readthedocs.io/en/latest/index.html.


How to run examples

We have provided five example tasks: 1D-target, 2D-highway-env, 2D-merge, panda-pick-place, and panda-stack.

To run these tasks:

  1. Go into the Makefile and set the variable target_dir to the desired folder (default set to 1D-target).
  2. Run make to build the project.
  3. Run make em (for 1D-target) or make emng (for the other tasks) to run PLUNDER.

Please see each of these folders for an extended usage guide.


Further configuration

To set up a custom environment, you will need to do the following:

  • Create a new folder to house your problem domain.
  • In that directory, create the files domain.h, robot.h, settings.h, and emdips_operations.json.
  • In domain.h, define your action space, observation space, and state space.
  • In robot.h, define your observation model.
  • In settings.h, tune the desired parameters and I/O paths.
  • In emdips_operations.json, define your desired operations (plus, minus, times, etc.). See pips/ for general tips and guidelines on defining operations, and for a list of existing operations.
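
For intuition, the synthesizer searches over guarded transition programs composed from these operations. Below is a purely hypothetical example of the shape such a synthesized ASP might take; the state variables `vel` and `target` are invented for illustration, and real synthesized programs come from pips/:

```cpp
#include <cstdio>

enum class Action { ACC, DEC, CON };
struct State { double pos, vel, target, accMax; };  // vel, target invented

// Hypothetical synthesized transition, built from operations like
// minus and times (illustrative only).
Action nextAction(Action prev, const State& s) {
    // Begin braking once the stopping distance reaches the target.
    if (prev == Action::ACC &&
        s.vel * s.vel >= 2.0 * s.accMax * (s.target - s.pos))
        return Action::DEC;
    // Hold steady once stopped.
    if (prev == Action::DEC && s.vel <= 0.0)
        return Action::CON;
    return prev;  // otherwise keep the current action
}

int main() {
    State s{0.0, 3.0, 4.0, 1.0};  // pos, vel, target, accMax
    // Stopping distance 4.5 exceeds the remaining 4.0, so ACC -> DEC:
    std::printf("%d\n", nextAction(Action::ACC, s) == Action::DEC);
}
```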

If you need to simulate your own demonstrations, you can also use our interface to:

  • Define the ground-truth ASP and the physical simulation model in robot.h.
  • Set the desired initial states for the demonstrations in a separate file, robotSets.h.
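
For a 1D domain, the physical simulation model could be as simple as Euler integration of the commanded acceleration. The sketch below is a hypothetical stand-in for whatever dynamics your robot.h actually encodes:

```cpp
#include <cstdio>

// Hypothetical 1D dynamics for simulated demonstrations:
// Euler-integrate the (possibly noisy) commanded acceleration.
struct SimState { double pos = 0.0, vel = 0.0; };

void stepPhysics(SimState& s, double accCmd, double dt) {
    s.vel += accCmd * dt;
    s.pos += s.vel * dt;
}

int main() {
    SimState s;                        // one initial state (cf. robotSets.h)
    for (int i = 0; i < 100; ++i)
        stepPhysics(s, 1.0, 0.01);     // constant ACC-style command
    std::printf("pos after 1s: %f\n", s.pos);  // ~0.5 = a*t^2/2
}
```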

An example setup is defined in 1D-target; it may be easiest to copy that folder and work from there.

Then, you can use make commands to run the project:

  • make to build the project (first set the variable target_dir in the Makefile to the desired folder, as above)
  • make em to run the full EM Synthesis algorithm, including simulating demonstrations
  • make emng to run the EM Synthesis algorithm, without simulating demonstrations
  • make plt to plot the algorithm outputs and store them in png format
  • make clean to delete all build files, make clear_data to clear all data/plots/trajectories, and make purge to do both
  • make snapshot to archive current settings and output files to a given folder

Other make commands, not commonly used on their own:

  • make gen to run only the simulation
  • make pf to run only the particle filter (E-step)
  • make settings to compile settings

Project Organization

This project is roughly split into the following components:

  • simulation/ - for simulating demonstrations given a ground-truth ASP
  • particleFilter/ (expectation step) - runs a particle filter to get a set of most likely action labels
  • pips/ (maximization step) - runs a program synthesizer to generate the program that is maximally consistent with the given action labels
  • synthesis/ - runs the EM loop, alternating between the expectation and maximization steps (a toy sketch of this loop follows this list)
  • system.h - fully defines the discrete-time Markov process given domain.h and robot.h
  • utils.h - useful functions for general use
  • includes.h - all include statements for tidiness
  • translateSettings.cpp - converts settings.h into a text file (settings.txt) for easy Python interpretation
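
To show how the E- and M-steps fit together, here is a toy, self-contained hard-EM loop in the same spirit: the E-step labels each timestep, and the M-step refits a one-parameter policy by grid search. This is a didactic sketch only, not the PLUNDER implementation; the real E-step is the particle filter in particleFilter/ and the real M-step is the pips/ synthesizer.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

enum class Action { ACC, DEC };

// Candidate ASP family: switch from ACC to DEC once pos >= theta.
Action asp(double pos, double theta) {
    return pos < theta ? Action::ACC : Action::DEC;
}

int main() {
    // Toy demonstration: positions plus noisy accelerations generated
    // by a ground-truth policy that switches at pos = 5.
    std::vector<double> pos = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    std::vector<double> acc = {0.9, 1.1, 1.0, 0.8, 1.2,
                               -0.9, -1.1, -1.0, -0.8, -1.2};
    int n = (int)pos.size();

    double theta = 0.0;  // initial guess for the policy parameter
    for (int iter = 0; iter < 5; ++iter) {
        // E-step: pick each step's most likely label, trading off the
        // observation model (O(ACC) ~ N(+1, 1), O(DEC) ~ N(-1, 1))
        // against agreement with the current policy's prediction.
        std::vector<Action> labels(n);
        for (int i = 0; i < n; ++i) {
            Action pred = asp(pos[i], theta);
            double sAcc = -0.5 * std::pow(acc[i] - 1, 2) + (pred == Action::ACC ? 0.1 : 0.0);
            double sDec = -0.5 * std::pow(acc[i] + 1, 2) + (pred == Action::DEC ? 0.1 : 0.0);
            labels[i] = (sAcc > sDec) ? Action::ACC : Action::DEC;
        }
        // M-step: choose the theta most consistent with those labels
        // (a grid search standing in for program synthesis).
        int bestScore = -1;
        for (double t = 0.0; t <= 10.0; t += 0.5) {
            int score = 0;
            for (int i = 0; i < n; ++i)
                if (asp(pos[i], t) == labels[i]) ++score;
            if (score > bestScore) { bestScore = score; theta = t; }
        }
    }
    std::printf("recovered switch: pos >= %.1f\n", theta);
}
```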
