LLMUncertainty

Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study

Requirements

Python 3.7+
PyTorch 1.13.0
Libraries and dependencies:

pip install -r requirements.txt

Quickstart

Step 0: Cloning this repository

git clone https://github.com/yul091/LLMUncertainty.git
cd LLMUncertainty

Step 1: Downloading the preprocessed Java-small dataset (~60K examples, 84 MB compressed) and the Python150k dataset for OOD detection (~150K examples, 526 MB compressed)

wget https://s3.amazonaws.com/code2seq/datasets/java-small.tar.gz
tar -xvzf java-small.tar.gz
wget http://files.srl.inf.ethz.ch/data/py150.tar.gz
tar -xzvf py150.tar.gz

Step 2: Training a model

Training a model from scratch

To train a model from scratch:

Edit scripts/train_cs.sh and scripts/train_cc.sh to point to the right preprocessed data and a specific model architecture.
Before training, you can edit the configuration hyper-parameters in these two files.
Run the two shell scripts:

bash scripts/train_cs.sh # code summary
bash scripts/train_cc.sh # code completion

Step 3: Running the five probabilistic methods for calibration and uncertainty estimation (UE)

Edit scripts/get_uncertainty_cs.sh (or scripts/get_uncertainty_cc.sh for code completion) to point to the right preprocessed data, a specific task and a specific model.
Run the script for the task you want:

bash scripts/get_uncertainty_cs.sh # code summary
bash scripts/get_uncertainty_cc.sh # code completion
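
As an illustration of what a calibration measurement can look like, here is a minimal sketch of expected calibration error (ECE) in plain Python. It is not taken from this repository: the function name, the equal-width binning and the input format (one confidence and one correctness flag per prediction) are all assumptions made for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins.

    Hypothetical helper for illustration only; not part of this repository.
    confidences: floats in [0, 1], the model's confidence in each prediction.
    correct:     booleans, whether each prediction was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one prediction")

    # Assign each prediction to a bin [k / n_bins, (k + 1) / n_bins).
    # A confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Overconfident toy model: 95% confident, but only 3 of 4 predictions correct.
    print(expected_calibration_error([0.95] * 4, [True, True, True, False]))
```

A value near 0 means confidence tracks accuracy closely; larger values indicate over- or under-confidence.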

Step 4: Evaluating the UE quality in misclassification detection

Edit scripts/misclassification_prediction_cs.sh (or scripts/misclassification_prediction_cc.sh for code completion) to point to the target evaluation choice (misclassification detection or OOD detection).
Run the script for the task you want:

bash scripts/misclassification_prediction_cs.sh # code summary
bash scripts/misclassification_prediction_cc.sh # code completion
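
Misclassification detection is commonly scored by how well an uncertainty value separates wrong predictions from correct ones, for example with AUROC. The sketch below is a hedged illustration in plain Python, not the metric code used by the scripts; the function name and inputs are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random positive scores higher than a random negative.

    Hypothetical helper for illustration only; not part of this repository.
    scores: uncertainty value per example (higher = more uncertain).
    labels: truthy if the example is a positive (here: a misclassification).
    Ties count as half a win. Runs in O(P * N), fine for small samples.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    uncertainty = [0.1, 0.8, 0.3, 0.9]
    is_error = [False, True, False, True]
    # Every error has higher uncertainty than every correct prediction.
    print(auroc(uncertainty, is_error))  # prints 1.0
```

An AUROC of 0.5 means the uncertainty score is no better than chance at flagging errors.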

Step 5: Evaluating the UE quality in selective prediction via abstention

Edit scripts/abstention_cs.sh (or scripts/abstention_cc.sh for code completion) to point to the right preprocessed data, a specific task and a specific model.
Run the script for the task you want:

bash scripts/abstention_cs.sh # code summary
bash scripts/abstention_cc.sh # code completion
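
The idea behind selective prediction can be sketched as follows: the model abstains on its most uncertain inputs and accuracy is measured only on the rest. This is an illustrative example under assumed inputs, not the repository's implementation; the function name and the `coverage` parameter are hypothetical.

```python
def selective_accuracy(uncertainty, correct, coverage):
    """Accuracy on the `coverage` fraction of examples with lowest uncertainty.

    Hypothetical helper for illustration only; not part of this repository.
    uncertainty: uncertainty value per example (higher = more uncertain).
    correct:     booleans, whether each prediction was right.
    coverage:    fraction of examples to answer, in (0, 1].
    """
    if len(uncertainty) != len(correct):
        raise ValueError("uncertainty and correct must have the same length")
    if not uncertainty:
        raise ValueError("need at least one example")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")

    # Most confident examples first; the tail is where the model abstains.
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    keep = max(1, int(coverage * len(order)))
    answered = order[:keep]
    return sum(1 for i in answered if correct[i]) / keep


if __name__ == "__main__":
    uncertainty = [0.1, 0.9, 0.2, 0.8]
    correct = [True, False, True, False]
    print(selective_accuracy(uncertainty, correct, 1.0))  # prints 0.5
    print(selective_accuracy(uncertainty, correct, 0.5))  # prints 1.0
```

With a useful uncertainty score, accuracy rises as coverage drops, because the abstained examples are disproportionately the wrong ones.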

Step 6: Evaluating the UE quality in OOD detection

Edit scripts/ood_detection_cs.sh (or scripts/ood_detection_cc.sh for code completion) to point to the right preprocessed data, a specific task and a specific model.
Run the script for the task you want:

bash scripts/ood_detection_cs.sh # code summary
bash scripts/ood_detection_cc.sh # code completion
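
One simple way to picture OOD detection with an uncertainty score is to pick a threshold from in-distribution data and count how many out-of-distribution examples exceed it. The sketch below assumes that setup; the function name, the `max_fpr` parameter and the thresholding rule are hypothetical choices for the example and may differ from what the scripts report.

```python
def ood_detection_rate(id_scores, ood_scores, max_fpr=0.05):
    """Fraction of OOD examples flagged at a bounded false-positive rate.

    Hypothetical helper for illustration only; not part of this repository.
    id_scores:  uncertainty values on in-distribution examples.
    ood_scores: uncertainty values on out-of-distribution examples.
    max_fpr:    largest fraction of in-distribution examples that may be
                flagged as OOD, in [0, 1).
    Returns (detection_rate, threshold); an example is flagged when its
    score is strictly greater than the threshold.
    """
    if not id_scores or not ood_scores:
        raise ValueError("need both in-distribution and OOD scores")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")

    ranked = sorted(id_scores, reverse=True)
    # Number of in-distribution examples we tolerate above the threshold.
    allowed = int(max_fpr * len(ranked))
    threshold = ranked[allowed]
    flagged = sum(1 for s in ood_scores if s > threshold)
    return flagged / len(ood_scores), threshold


if __name__ == "__main__":
    id_scores = [0.1, 0.2, 0.3, 0.4]
    ood_scores = [0.5, 0.35, 0.9]
    rate, threshold = ood_detection_rate(id_scores, ood_scores, max_fpr=0.0)
    print(rate, threshold)  # two of three OOD examples exceed 0.4
```

Threshold-free metrics such as AUROC avoid having to choose `max_fpr`, at the cost of being less directly tied to a deployment decision.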

