A secure, isolated Python REPL (Read-Eval-Print Loop) environment for executing untrusted code in LLM-based workflows.
- Docker
- gVisor (runsc)
- Install gVisor:
    sudo apt-get update && \
    sudo apt-get install -y \
        apt-transport-https \
        ca-certificates \
        curl \
        gnupg
    # Install runsc
    curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list > /dev/null
    sudo apt-get update && sudo apt-get install -y runsc
- Configure Docker to use gVisor:
    sudo runsc install
    sudo systemctl restart docker
- Clone the repository:
    git clone https://github.com/username/gvisor-based-python-repl.git
    cd gvisor-based-python-repl
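Before starting the server, it can be useful to confirm that Docker actually sees the runsc runtime registered in the configuration step. The following is a minimal sketch, not part of this repository: the helper names `parse_runtimes` and `runsc_registered` are hypothetical, and it assumes `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name, which may vary across Docker versions.

```python
import json
import subprocess


def parse_runtimes(docker_info_json: str) -> set:
    """Return the runtime names from a JSON object keyed by runtime name."""
    return set(json.loads(docker_info_json))


def runsc_registered() -> bool:
    """Ask the Docker daemon which runtimes it knows and look for runsc.

    Assumes the docker CLI is on PATH and the daemon is running.
    """
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return "runsc" in parse_runtimes(result.stdout)
```

If `runsc_registered()` returns `False`, rerunning `sudo runsc install` and restarting Docker is the first thing to check.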
Execute the run.sh script to start the server:
./run.sh
This will start a Docker container with the Python server running on port 8000.
You can test the server by running the provided test_concurrent_clients.py script:
python test_concurrent_clients.py
The script connects to the server, sends Python code to execute, and demonstrates session persistence. See TEST_OUTPUT.md for sample output.
The server uses a simple protocol for communication:
- Message Format: Each message (request or response) is prefixed with a 4-byte length field (big-endian), followed by the actual message content encoded as UTF-8 JSON.
- Request Format:
    { "code": "Python code to execute", "session_id": "optional-session-id" }
- Response Format:
    { "status": "ok|error", "output": "execution output (if status is ok)", "error": "error message (if status is error)", "session_id": "session-id" }
- Session Management:
  - If no session_id is provided in the request, a new session is created with a unique ID.
  - If a session_id is provided, the code is executed in the context of that session.
  - If the provided session_id doesn't exist, an error is returned.
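The framing described above can be sketched from the client side as follows. This is an illustration under stated assumptions rather than the project's own client: the function names are hypothetical, the host defaults to `localhost`, and the sketch opens a new connection for each request because the protocol description does not say whether a connection can be reused.

```python
import json
import socket
import struct


def encode_message(payload: dict) -> bytes:
    """Serialize payload as UTF-8 JSON and prepend a 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_message(sock: socket.socket) -> dict:
    """Read one length-prefixed JSON message from the socket."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def execute(code, session_id=None, host="localhost", port=8000) -> dict:
    """Send one request and return the decoded response (assumed one per connection)."""
    request = {"code": code}
    if session_id is not None:
        request["session_id"] = session_id
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode_message(request))
        return read_message(sock)
```

With the server running, calling `execute("x = 41")` and then passing the returned `session_id` to `execute("print(x + 1)", session_id=...)` would exercise the session behavior listed above.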
This project provides a secure execution environment for running Python code in the context of Large Language Model (LLM) applications. It leverages gVisor, a container sandbox technology, to create an isolated execution environment that protects the host system from potentially malicious or unintended code execution.
The primary goal is to enable safe execution of user-provided or LLM-generated code while maintaining strong security boundaries. This is particularly important in AI applications where models might generate or execute code that could potentially harm the underlying system.
The system consists of several key components:
- TCP Server: A Python TCP server that accepts code execution requests and maintains stateful sessions.
- Docker Container: Provides containerization for the Python environment.
- gVisor Runtime: Adds an additional layer of isolation by intercepting and filtering system calls.
The architecture follows a defense-in-depth approach, with multiple layers of isolation to prevent security breaches.
- server.py: The main Python file that implements a TCP server which executes Python code sent via TCP connections. It maintains stateful sessions with unique IDs, allowing variables and functions defined in one execution to be available in subsequent executions within the same session.
- run.sh: A shell script that runs the Python server inside a Docker container using gVisor's runsc runtime for isolation. It mounts the server.py file into the container and exposes port 8000.
- test.sh: A shell script that runs the test_tcp.py script to test the server.
- test_tcp.py: A Python script that tests the TCP server by connecting to it, sending Python code to execute, and demonstrating session persistence.
- .gitignore: A configuration file that specifies files to be ignored by version control.
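One way the stateful sessions described for server.py could work is to keep a separate namespace dictionary per session ID and run each request with `exec` against it. The sketch below is an assumption about the approach, not the contents of server.py; the `SessionStore` class and its `handle` method are hypothetical names, and the error messages are illustrative.

```python
import contextlib
import io
import traceback
import uuid


class SessionStore:
    """Hypothetical per-session namespace store; not the actual server.py code."""

    def __init__(self):
        self._sessions = {}  # session_id -> dict used as the exec() globals

    def handle(self, request: dict) -> dict:
        session_id = request.get("session_id")
        if session_id is None:
            # No ID supplied: create a fresh session with a unique ID.
            session_id = str(uuid.uuid4())
            self._sessions[session_id] = {}
        elif session_id not in self._sessions:
            return {"status": "error",
                    "error": f"unknown session: {session_id}",
                    "session_id": session_id}

        namespace = self._sessions[session_id]
        stdout = io.StringIO()
        try:
            # Names defined by the code persist in the namespace dict.
            # Note: redirect_stdout swaps sys.stdout globally, so a
            # multi-threaded server would need a different capture strategy.
            with contextlib.redirect_stdout(stdout):
                exec(request["code"], namespace)
        except Exception:
            return {"status": "error",
                    "error": traceback.format_exc(),
                    "session_id": session_id}
        return {"status": "ok",
                "output": stdout.getvalue(),
                "session_id": session_id}
```

Note that `exec` by itself provides no isolation; in this design the security boundary comes from the Docker container and the gVisor runtime, not from the Python code.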
This project addresses several key challenges in LLM-based workflows:
- Code Execution Safety: Provides a secure environment for executing potentially untrusted code generated by LLMs.
- Persistent State: Maintains state between executions through session management, allowing for multi-step code generation and execution workflows.
- Isolation: Ensures that code execution cannot affect the host system, even if the code is malicious or contains vulnerabilities.
- Agentic Workflows: Enables longer-running agentic workflows where LLMs can generate, execute, and iterate on code based on results.
- Reduced Context Window Usage: By maintaining state between executions, there's no need to include the entire execution history in the LLM's context window.
This project implements several layers of security:
- Container Isolation: Docker provides basic isolation from the host system.
- gVisor Sandbox: Adds an additional layer of security by intercepting and filtering system calls.
- TCP Interface: Limits interaction to a simple TCP API, reducing attack surface.
While this system provides strong isolation, it is not perfect:
- Side-channel attacks might still be possible
- Resource exhaustion could affect container performance
- New vulnerabilities in gVisor or Docker could compromise security
Regular updates and security audits are recommended.
This project is licensed under the MIT License - see the LICENSE file for details.