
SeCodePLT: A Code Evaluation Benchmark for Security and Capability

Dataset · Docker · Paper

Installation

1. Clone the repository

git clone https://github.com/ucsb-mlsec/SeCodePLT.git
cd SeCodePLT

2. Initialize Python environment

# use venv, conda, or other environment managers
# for example, with conda:
conda create -n secode python=3.11
conda activate secode
pip install -r requirements.txt
pip install -e .

3. Copy configuration template

cp -r virtue_code_eval/config_templates virtue_code_eval/config

4. Initialize environment variables

cp .env.example .env

Fill in the following environment variables:

  • OPENAI_API_KEY: for OpenAI API
  • SERVER_PORT: the port for the execution server (default: 8666)

Optional:

  • TOGETHER_API_KEY: for Together.AI hosted models
  • VT_API_KEY: for VirusTotal API (used for virus_total metric)
  • SAFIM_EXECEVAL_PORT: the port of the ExecEval server for safim_unittest metric
  • DS1000_PYTHON_EXECUTABLE: the Python executable for the DS1000 executor at virtue_code_eval/data/capability/ds1000/executor.py
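
For reference, a filled-in .env might look like the following sketch; every value here is a placeholder, not a real key or path:

# .env — placeholder values, substitute your own
OPENAI_API_KEY=sk-...
SERVER_PORT=8666
# optional
TOGETHER_API_KEY=...
VT_API_KEY=...
SAFIM_EXECEVAL_PORT=5000
DS1000_PYTHON_EXECUTABLE=/path/to/envs/ds1000-3.10/bin/python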

Usage

1. Start the execution server

# start the exec server, which is used for executing code
python -m executor_docker.server --port 8666

8666 is the default port; if you change it, also update SERVER_PORT in .env to match:

SERVER_PORT=your_port
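
Before launching an evaluation, you can sanity-check that something is listening on the configured port (a generic shell check, independent of the server's API):

# verify the exec server port is open
nc -z localhost 8666 && echo "exec server reachable" || echo "exec server not reachable"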

2. Run the evaluation script

# evaluate 20 samples in the juliet_autocomplete task
python -m virtue_code_eval.evaluate out_dir=out/example --config-name evaluate_example

# test run, writing to `out/test_python`; by default, `out_dir` is derived from the current time
python -m virtue_code_eval.evaluate out_dir=out/test_python

# enable debug logging for selected modules
python -m virtue_code_eval.evaluate hydra.verbose='[__main__,virtue_code_eval]' out_dir=out/full_test_debug
# specify arguments on the command line
python evaluate.py out_dir=./out/tmp tasks="[secodeplt_juliet_autocomplete]" models="[gpt4o]" batch_size=20 -cn evaluate_empty
# for generating tables
python -m virtue_code_eval.generate_table out_dir=out/full_test
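
Putting the steps together, one plausible end-to-end run (reusing the task, model, config, and batch-size values from the examples above) is:

# evaluate, then summarize results from the same output directory
python -m virtue_code_eval.evaluate out_dir=out/my_run tasks="[secodeplt_juliet_autocomplete]" models="[gpt4o]" batch_size=20 -cn evaluate_empty
python -m virtue_code_eval.generate_table out_dir=out/my_run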

Instructions for Other Tasks

SAFIM_EXECEVAL Server

The unittest/safim_unittest metric requires ntunlp/ExecEval to execute code. Start the server with Docker and set its port via SAFIM_EXECEVAL_PORT in .env:

git clone https://github.com/ntunlp/ExecEval
cd ExecEval

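# widen the JDK version glob in the Dockerfile so the build matches the current Oracle JDK 21 package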
sed -i -e 's/jdk-21-oracle/jdk-21*-oracle/g' Dockerfile

docker build . -t exec-eval:1.0
container_id=$(docker run -d -p <YOUR_PORT>:5000 -e NUM_WORKERS=5 exec-eval:1.0)

# check the logs
docker logs -f "$container_id"
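
Once the container is running, point the benchmark at it. For example, if you mapped port 9999 (a placeholder; match the -p flag above):

echo "SAFIM_EXECEVAL_PORT=9999" >> .env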

Executor for DS1000

  1. Create a separate environment for ds1000 from virtue_code_eval/data/capability/ds1000/environment.yaml, for example, with conda:

conda env create --file virtue_code_eval/data/capability/ds1000/environment.yaml

  2. Find the Python executable of that environment and set it as DS1000_PYTHON_EXECUTABLE in .env; it is typically printed by:

echo $(conda env list | grep ds1000-3.10 | tr -s ' ' | cut -d ' ' -f 2)/bin/python
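
Equivalently, a small sketch that resolves the interpreter and records it in .env in one step (assuming the environment is named ds1000-3.10, as in environment.yaml):

# resolve the ds1000 interpreter and append it to .env
DS1000_PY=$(conda env list | grep ds1000-3.10 | tr -s ' ' | cut -d ' ' -f 2)/bin/python
echo "DS1000_PYTHON_EXECUTABLE=${DS1000_PY}" >> .env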

Generate Dataset for CyberSecEval

1. Canary Exploit

cd virtue_code_eval/data/safety/cyber_sec_eval/canary_exploit
python run.py

2. Autonomous Uplift

cd virtue_code_eval/data/safety/cyber_sec_eval/autonomous_uplift
ssh-keygen -t rsa -b 2048 -m PEM -f ./ssh_key.pem
python test_case_generator.py \
  --ssh-key-file="./ssh_key.pem" \
  --cyber-range-file="./in/cyber_range_pairs_sample.json" \
  --out-file="./out/autonomous_prompts.json"
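
After generation, a quick sanity check that the prompts file was written and is valid JSON (the path matches the --out-file argument above):

ls -lh ./out/autonomous_prompts.json
python -c "import json; json.load(open('./out/autonomous_prompts.json')); print('valid JSON')"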

Dataset Information

The dataset is hosted on Hugging Face, and the accompanying Docker image on Docker Hub; see the Dataset and Docker links at the top of this README.

Citation

@article{nie2024secodeplt,
    title = {{SeCodePLT}: A unified platform for evaluating the security of code {GenAI}},
    author = {Nie, Yuzhou and Wang, Zhun and Yang, Yu and Jiang, Ruizhe and Tang, Yuheng and Guo, Wenbo and Li, Bo and Song, Dawn},
    journal = {arXiv preprint arXiv:2410.11096},
    year = {2024}
}
