The Security Policy Evaluation Framework is a testing and benchmarking system designed to evaluate the robustness and correctness of various authorization policy engines. It provides a consistent, automated environment for executing policy test cases across multiple languages. This framework is primarily intended for researchers, security engineers, and policy developers who want to benchmark how different policy engines behave under predefined test conditions.
Currently, the framework supports the following policy engines:

- Cedar
- OpenFGA
- Rego
- Teleport ACD
The goal is to provide a common interface to evaluate each language's response to a series of security-related scenarios.
To set up the framework:

```bash
git clone https://github.com/doyensec/policy-languages-framework.git
cd policy-languages-framework
pip install -r requirements.txt
```
Alternatively, you might want to use a Python virtual environment (the steps are consolidated in the sketch below):

- Create a virtual environment using `python3 -m venv path/to/venv`. Make sure you have `python3-full` installed.
- Then, use `path/to/venv/bin/pip` to install all dependencies.
- Finally, run the software using `path/to/venv/bin/python`.
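Taken together, and assuming the repository root as the working directory, those steps look roughly like this:

```bash
# Create a dedicated virtual environment ("path/to/venv" is a placeholder;
# the python3-full package is needed on Debian/Ubuntu-style systems).
python3 -m venv path/to/venv

# Install the dependencies into the virtual environment.
path/to/venv/bin/pip install -r requirements.txt

# Run the framework with the virtual environment's interpreter.
path/to/venv/bin/python main.py
```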
Please note that this tool has been tested on macOS 15.4.1 (arm64).
Note: Docker must be running on your system. Docker is required to execute policy evaluations within isolated containerized environments.
To start running the framework:

```bash
python main.py
```
Note: This assumes that Docker has been installed with the post-installation steps to allow non-privileged users to run Docker commands.
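A quick, non-authoritative way to confirm that the current user can reach the Docker daemon without `sudo`:

```bash
# Should print client and daemon details; a "permission denied" error usually
# means the non-root post-installation steps are missing.
docker info
```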
The following command-line options are available:

- `--start`: (Optional) Integer ID of the first test case to execute. Defaults to the first available.
- `--max`: (Optional) Integer ID of the last test case to execute.
- `--only`: (Optional) Comma-separated list of specific test case IDs to run (e.g., `--only 01,03,07`).
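For example, assuming the options behave as described above, a run can be restricted like this:

```bash
# Run test cases 3 through 10 (IDs are illustrative).
python main.py --start 3 --max 10

# Run only a specific subset of test cases.
python main.py --only 01,03,07
```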
Note: This framework spawns containers to evaluate test cases in isolated environments. For proper execution, the following ports must be free and available on the host system, as they are assigned to the respective policy engine containers:
- `8911` → Cedar
- `8910`, `8912` → OpenFGA
- `8913` → Rego
- `8914` → Teleport ACD

If needed, these ports can be customized in `main.py`.
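One way to verify that none of these ports are already taken (this check is not part of the framework itself):

```bash
# List any process already listening on the container ports; no output means
# the ports are free.
lsof -nP -iTCP:8910-8914 -sTCP:LISTEN
```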
The framework produces a final HTML report summarizing test case results. Each test case is evaluated independently per policy engine, and results are recorded in a matrix table. Results are saved within the `policy-languages-framework/results` folder and can be easily displayed with any browser.
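For instance, on macOS the most recent report could be opened like this (the `*.html` file name pattern is an assumption; adjust it to whatever the framework actually generates):

```bash
# Open the newest HTML report from the results folder in the default browser
# (`open` is macOS-specific; use xdg-open or a browser directly on Linux).
open "$(ls -t results/*.html | head -n 1)"
```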
Possible test outcomes include:
- PASS: The policy engine produced the expected output under correct conditions.
- FAIL: The policy engine produced an output that contradicts the expected result.
- NOT APPLICABLE: The test case is not relevant or cannot be executed for the given engine.
- ERROR: An internal error occurred during test execution, such as malformed input or unsupported constructs.
Each result is presented in a tabular HTML file automatically generated at the end of execution.
The following table presents the current results of all implemented test cases evaluated across the supported policy engines. Each row represents a specific test case, while each column corresponds to a policy engine. The results indicate whether the engine passed, failed, errored, or was not applicable for the given scenario.
- NOT APPLICABLE: The test case is not relevant or implementable for this specific policy engine, either due to architectural limitations or incompatibility with the engine's capabilities. No logical test case could be meaningfully defined.
- PASS (Predefined Result) / FAIL (Predefined Result): The policy engine was not capable of executing a meaningful logic-based test for this case. A static result was assigned based on known behavior, documentation, or limitations, instead of a runtime-evaluated policy scenario.
To add a new test case:

- Create a new folder under `testcases/` named using the pattern `testcase-XX`, where `XX` is the next available numerical index.
- Inside this folder, create a `manifest.yaml` file with the following structure:
```yaml
id: testcase-XX
scenario: <short scenario name>
description: <detailed description>
rego:
  - query: <query_file>
    policy: <policy_file>
    expected_result:
      - status: success|error
        condition: <condition to check>
cedar:
  - entities: <entities_file>
    query: <query_file>
    expected_result:
      - status: success|error
        condition: <optional condition>
openfga:
  - authorization_model: <model_file>
    tuples: <tuples_file>
    query: <query_file>
    expected_result:
      - status: success|error
        condition: <optional condition>
teleportacd:
  - type: <evaluation_type>
    config: <config_file>
    expected_result:
      - status: success|error
```
Each field (e.g., `query`, `policy`, `entities`) should reference a file located within a subdirectory named after the policy engine (e.g., `rego/`, `cedar/`, etc.).
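A hypothetical directory layout for a new test case might look like this (the test case number and file placement are placeholders; only the engines you implement need to exist):

```bash
# Hypothetical layout:
#
#   testcases/testcase-07/
#   ├── manifest.yaml
#   ├── rego/         <- files referenced by the rego entry
#   ├── cedar/        <- files referenced by the cedar entry
#   ├── openfga/      <- files referenced by the openfga entry
#   └── teleportacd/  <- files referenced by the teleportacd entry
#
mkdir -p testcases/testcase-07/{rego,cedar,openfga,teleportacd}
```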
Each engine entry must define an `expected_result` with:

- `status`: Indicates whether the evaluation is a success (evaluation completed and produced output) or an error (evaluation failed).
- `condition` (optional): A logical condition to assert on the evaluation result JSON.
Example condition:

```yaml
condition: decision["result"] == "allow"
```

Supports functions like `contains()`, `isdigit()`, `endswith()`, `doesnotcontain()`, etc.
If the test case is not applicable to an engine, omit the engine entry from the manifest.
| Test Case ID | Description |
|---|---|
| testcase-01 | Policy Engine Must Enforce Deny Rules Even When Runtime Errors Occur |
| testcase-02 | Arithmetic Overflow/Underflow and Missing Entities Cause Validation Errors |
| testcase-03 | Handling Undefined Values in Deny/Allow Rules Without Impacting Policy Decisions |
| testcase-04 | Negations on Undefined Values Do Not Cause Expected Denials |
| testcase-05 | Policy Must Produce Explicit Forbid/Allow |
| testcase-06 | Built-in Functions Do Not Introduce Side-Effects or Non-Deterministic Behavior |

This is only a preview. For a full list, see `testcases.md`.
See the `LICENSE` file for details.
This framework builds upon concepts and threat modeling research by Trail of Bits.
This project was a collaboration between Teleport and Doyensec. The framework was created by the Doyensec team with inspiration and funding from Teleport.