This repository provides a proof-of-concept (PoC) implementation of Secure Partitioned Decoding (SPD) and Prompt Obfuscation (PO) for confidential prompting.
The PoC demonstrates the fundamental mechanisms of confidential prompting using Llama models (Llama 2, 3, and 3.2). Although the scheme is designed for confidential virtual machines (CVMs) with GPUs, this implementation allows testing in a non-CVM environment by spawning prompt vaults as processes.
-
Python: Version 3.10 or later
-
Dependencies: Install the necessary packages:
pip install -r requirements.txt
Create a file (e.g., input.txt) with prompts, using the <confidential> tag to mark sections stored in CVMs and <redacted> to mark sensitive words to be obfuscated. Here’s an example using Llama 3 instruction model format:
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
you are a helpful assistant
<|eot_id|>
<confidential>
<|start_header_id|>user<|end_header_id|>
Where is the capital city of <redacted>Uganda</redacted>?
Also, what is the population of <redacted>Korea</redacted>?
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
</confidential>
Run the obfuscation command with relevant parameters:
gamma: Number of virtual prompts to sampleepsilon: Privacy budgettemperature: Sampling temperatureprob_dist: Probability difference metric (absfor absolute,relfor relative)states_dir: Directory to save intermediate states
python po.py \
--prompt input.txt \
--gamma 5 \
--epsilon 0.1 \
--temperature 1.0 \
--prob_dist "abs" \
--model "meta-llama/Llama-3.2-3B-Instruct" \
--device "cuda:0" \
--states_dir "./states" \
--verbosePublic and private attention states will be saved in states_dir.
To run the LLM serving server, use the following command. The server loads the model and intermediate states from states_dir.
MASTER_ADDRandMASTER_PORT: Define the server's IP and portnum_users: Number of expected userstimeout_sec: Server response timeoutmax_num_tokens: Maximum tokens to generate per requestprint_idx: Specifies which prompt version to display; set to0for the authentic prompt or to a higher value to show an obfuscated version. (note: In a real implementation, the index of the authentic prompt should be randomized and hidden from the server, but we use 0 for demonstration purposes.)
MASTER_ADDR=localhost MASTER_PORT=29501 python smd.py \
--model "meta-llama/Llama-3.2-3B-Instruct" \
--device "cuda:0" \
--states_dir "./states" \
--num_users 1 \
--timeout_sec 15 \
--max_num_tokens 2048 \
--standalone_master \
--print_idx 0For each user (i.e. CVM), run the secure prompt vault to connect to the LLM serving server and decode tokens securely. The MASTER_ADDR and MASTER_PORT must match the server, and each user_id should be unique.
MASTER_ADDR=localhost MASTER_PORT=29501 python smd.py \
--model "meta-llama/Llama-3.2-3B-Instruct" \
--device "cuda:0" \
--states_dir "./states" \
--timeout_sec 15 \
--standalone_worker \
--user_id 0To disable memory efficient mode in CVM, set --disable_multiplexing.
To simulate multiple users in a non-CVM environment, you can remove the --standalone_master flag and run the following command. This will start the LLM server and spawn secure prompt vaults on the same host:
MASTER_ADDR=localhost MASTER_PORT=29501 python smd.py \
--model "meta-llama/Llama-3.2-3B-Instruct" \
--device "cuda:0" \
--states_dir "./states" \
--num_users 2 \
--timeout_sec 15 \
--max_num_tokens 2048 \
--print_idx 0MIT License