FiPOMDP is a tool for solving Consumption POMDPs (CoPOMDPs) with a combination of qualitative and quantitative objectives. The key functionality provided:
- Synthesizing a shield (device disabling unsafe actions) against resource-safety and almost-sure reachability specifications.
- Performing shielded planning w.r.t. minimum-cost reachability performance criterion. That is, the planning algorithm minimizes the expected cost of reaching a goal while formally ensuring that the goal is eventually reached with probability one and that the resource is never exhausted. This is done by applying the shield to the POMCP planning algorithm.
This project extends the FiMDP tool for qualitative synthesis in Consumption MDPs. Necessary libraries of the tool are included in this repository.
The shielding itself is done by reducing the qualitative analysis of a Consumption POMDPs to a problem of solving Token Consumption MDPs with qualitative objectives (more details in the submission for AAAI 2023). In particular, we use the FiMDP libraries for computing the threshold resource levels. FiPOMDP itself implements the Token CoMDP construction, extraction of a shield from the threshold levels, and shielded POMCP planning (with a custom POMCP implementation).
See the Experiment reproducibility guide.
The directory fipomdp/demo includes a demo notebook showing how to create a simple CoPOMDP and create a shield via qualitative analysis. For more information about FiMDP, creating perfectly observable Consumption MDPs, look at FiMDP README.
The experiments we performed use environments from the FiMDPEnv library modified to include an aspect of partial observability. A small tutorial how to launch them and collect results from them is included. The experiments from the AAAI'23 submission can be reproduced with the help of the Experiment reproducibility guide.