# DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

**TL;DR:** We propose a novel decoding method that contrasts layerwise knowledge to improve the factuality of large language models.
<p align="center"><img src="https://raw.githubusercontent.com/voidism/DoLa/main/figure.png" width="500"></p>
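The core idea can be illustrated in a few lines. The snippet below is a framework-free sketch, not the implementation in the customized `transformers` package. It assumes next-token logits are already available for several layers (for example, by applying the LM head to each layer's hidden state). It then follows the recipe described in the paper: pick the premature layer whose distribution has the largest Jensen-Shannon divergence from the final (mature) layer, and score each token by the log-probability ratio between the two layers. Only tokens that pass an adaptive plausibility threshold are kept. The helper names, the toy logits, and `alpha = 0.1` are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) in nats; terms with p == 0 contribute nothing."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two distributions."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def dola_scores(layer_logits, mature_layer, candidate_layers, alpha=0.1):
    """Hypothetical helper: contrast the mature layer with a dynamically chosen premature layer.

    layer_logits maps a layer index to next-token logits obtained by applying
    the LM head to that layer's hidden state (assumed to be precomputed).
    """
    q_mature = softmax(layer_logits[mature_layer])
    # Pick the candidate layer whose distribution differs most (by JSD) from the mature layer.
    premature = max(candidate_layers,
                    key=lambda j: jsd(q_mature, softmax(layer_logits[j])))
    q_premature = softmax(layer_logits[premature])
    # Adaptive plausibility constraint: only tokens with q_mature >= alpha * max(q_mature) survive.
    cutoff = alpha * max(q_mature)
    scores = [math.log(pn) - math.log(pm) if pn >= cutoff else float("-inf")
              for pn, pm in zip(q_mature, q_premature)]
    return premature, scores

# Toy 4-token vocabulary with made-up logits for two candidate layers and the final layer 32.
layer_logits = {
    16: [2.0, 1.0, 0.5, 0.1],
    24: [2.5, 1.5, 0.2, 0.1],
    32: [3.0, 2.8, 0.1, 0.0],
}
premature, scores = dola_scores(layer_logits, mature_layer=32, candidate_layers=[16, 24])
next_token = max(range(len(scores)), key=lambda i: scores[i])
print(premature, next_token)  # 16 1
```

In this toy case, the final layer alone would pick token 0. The contrast instead favors token 1, whose probability grows most between layer 16 and layer 32.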

- arXiv link: https://arxiv.org/abs/2309.03883
- Code link: https://github.com/voidism/DoLa
- Twitter discussion: https://twitter.com/YungSungChuang/status/1701623359153316255


> **Warning:** Colab Pro is required to run this code because LLaMA inference has high RAM demands. Choose a **V100 GPU** and turn on the **High-RAM Shape option** before running the code!

> **Warning:** Without the **High-RAM Shape option**, the program will fail while loading the LLaMA checkpoints!


## Setup

1. Clone our repo.
2. Install the customized transformers package, which supports our new decoding method.
3. Install the other requirements with pip.

In [None]:
!git clone https://github.com/voidism/DoLa.git
!cd DoLa/transformers-4.28.1 && pip install -e .
!cd DoLa && pip install -r requirements.txt

## Run TruthfulQA-MC

### Baseline

In [None]:
!cd DoLa && python tfqa_mc_eval.py --model-name huggyllama/llama-7b --data-path ./tmp/ --output-path output-path-tfqamc-baseline.json --num-gpus 1
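TruthfulQA-MC scores each answer option by its log-probability under the model. As far as we know, the reported numbers follow the standard TruthfulQA multiple-choice metrics (MC1, MC2, MC3). Below is a minimal sketch of those definitions. The function name is hypothetical, and `scores_true[0]` is assumed to be the best correct answer.

```python
import math

def mc_metrics(scores_true, scores_false):
    """Standard TruthfulQA MC metrics from per-option log-probabilities.

    scores_true: scores of the correct options, best answer first.
    scores_false: scores of the incorrect options.
    """
    max_false = max(scores_false)
    # MC1: the best correct answer beats every incorrect option.
    mc1 = float(scores_true[0] > max_false)
    # MC2: share of normalized probability mass on the correct options.
    p_true = [math.exp(s) for s in scores_true]
    p_false = [math.exp(s) for s in scores_false]
    mc2 = sum(p_true) / (sum(p_true) + sum(p_false))
    # MC3: fraction of correct options that beat every incorrect option.
    mc3 = sum(s > max_false for s in scores_true) / len(scores_true)
    return mc1, mc2, mc3

print(mc_metrics([-1.0, -3.0], [-2.0, -4.0]))
```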

### DoLa

In [None]:
!cd DoLa && python tfqa_mc_eval.py --model-name huggyllama/llama-7b --early-exit-layers 16,18,20,22,24,26,28,30,32 --data-path ./tmp/ --output-path output-path-tfqamc-dola.json --num-gpus 1
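The `--early-exit-layers` value is the only difference between the baseline and DoLa commands. The sketch below assumes our reading of this value: the last entry (32, the final layer of LLaMA-7B) is the mature layer, and the preceding entries are the candidate premature layers the method chooses from. TruthfulQA uses the higher bucket (16 to 30), while the other tasks in this notebook use the lower bucket (0 to 14). `parse_early_exit_layers` is a hypothetical helper, not a function from the repo.

```python
def parse_early_exit_layers(value):
    """Hypothetical helper: split an --early-exit-layers value into its parts.

    Assumption: the last entry is the mature (final) layer and all earlier
    entries are candidate premature layers.
    """
    layers = [int(x) for x in value.split(",")]
    if len(layers) < 2:
        raise ValueError("expected at least one premature layer plus the mature layer")
    return layers[-1], layers[:-1]

# The two buckets used by the commands in this notebook.
tfqa_mature, tfqa_candidates = parse_early_exit_layers("16,18,20,22,24,26,28,30,32")
strqa_mature, strqa_candidates = parse_early_exit_layers("0,2,4,6,8,10,12,14,32")
print(tfqa_mature, tfqa_candidates)    # 32 [16, 18, 20, 22, 24, 26, 28, 30]
print(strqa_mature, strqa_candidates)  # 32 [0, 2, 4, 6, 8, 10, 12, 14]
```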

## Run StrategyQA

> **Warning:** Long running time (~2 hours).

### Baseline

In [None]:
!cd DoLa && python strqa_eval.py --model-name huggyllama/llama-7b --data-path ./tmp/ --output-path output-path-strqa-baseline.json --num-gpus 1

### DoLa

In [None]:
!cd DoLa && python strqa_eval.py --model-name huggyllama/llama-7b --early-exit-layers 0,2,4,6,8,10,12,14,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path output-path-strqa-dola.json --num-gpus 1
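The StrategyQA and GSM8K DoLa commands also pass `--repetition_penalty 1.2`. This setting discourages the model from repeating tokens it has already generated during chain-of-thought generation. Below is a minimal sketch of the CTRL-style repetition penalty. It assumes the flag is forwarded to the same rule that Hugging Face's `RepetitionPenaltyLogitsProcessor` implements, and the helper name is hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated tokens penalized.

    Positive logits are divided by `penalty` and negative logits multiplied by it,
    so with penalty > 1 the logit of a repeated token never increases.
    """
    penalized = list(logits)
    for tok in set(generated_ids):  # each seen token is penalized once
        value = penalized[tok]
        penalized[tok] = value / penalty if value > 0 else value * penalty
    return penalized

print(apply_repetition_penalty([2.4, -1.0, 0.5], generated_ids=[0, 1, 0]))  # [2.0, -1.2, 0.5]
```

The rule handles negative logits separately: dividing a negative logit by a penalty above 1 would raise it, making the repeated token more likely rather than less.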

## Run GSM8K

> **Warning:** Long running time (~3 hours).

### Baseline

In [None]:
!cd DoLa && python gsm8k_eval.py --model-name huggyllama/llama-7b --data-path ./tmp/ --output-path output-path-gsm8k-baseline.json --num-gpus 1

### DoLa

In [None]:
!cd DoLa && python gsm8k_eval.py --model-name huggyllama/llama-7b --early-exit-layers 0,2,4,6,8,10,12,14,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path output-path-gsm8k-dola.json --num-gpus 1
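GSM8K is typically scored by comparing the final number in each generated chain of thought with the reference answer. Below is a minimal sketch of that extraction. It assumes the few-shot prompts end each answer with `The answer is ...`. The trigger phrase and both helper names are assumptions, not taken from `gsm8k_eval.py`.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def extract_answer(completion, trigger="The answer is"):
    """Return the numeric answer in a completion, or None if there is none.

    Uses the first number after the last `trigger` if present, otherwise the
    last number anywhere in the text. Thousands separators are stripped first.
    """
    text = completion.replace(",", "")
    if trigger in text:
        found = NUMBER.findall(text.split(trigger)[-1])
        return float(found[0]) if found else None
    found = NUMBER.findall(text)
    return float(found[-1]) if found else None

def accuracy(completions, gold_answers):
    """Fraction of completions whose extracted answer equals the gold answer."""
    hits = sum(extract_answer(c) == float(g) for c, g in zip(completions, gold_answers))
    return hits / len(gold_answers)

print(extract_answer("3 boxes of 400 is 3 * 400 = 1,200. The answer is 1,200."))  # 1200.0
print(accuracy(["... The answer is 7.", "... The answer is 8."], [7, 9]))  # 0.5
```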

## Other Datasets

The three tasks above can be run without additional requirements. The other three datasets require the following extra steps:

- For FACTOR, please download the data file `wiki_factor.csv` from https://github.com/AI21Labs/factor
- For TruthfulQA (open-ended generation setting), you need to fine-tune two GPT-3 curie models through the OpenAI API (one judging truthfulness, one judging informativeness) and use them to evaluate the model outputs.
- For Vicuna QA (GPT-4 evaluation), you need an OpenAI API key with access to GPT-4 for the pairwise evaluation.

See https://github.com/voidism/DoLa/blob/main/README.md for more details.

## FACTOR
Please download the data file `wiki_factor.csv` from https://github.com/AI21Labs/factor

### Baseline

In [None]:
!cd DoLa && python factor_eval.py --model-name huggyllama/llama-7b --data-path /path/to/wiki_factor.csv --output-path output-path-factor-wiki-baseline.json --num-gpus 1

### DoLa

In [None]:
!cd DoLa && python factor_eval.py --model-name huggyllama/llama-7b --early-exit-layers 0,2,4,6,8,10,12,14,32 --data-path /path/to/wiki_factor.csv --output-path output-path-factor-wiki-dola.json --num-gpus 1
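FACTOR pairs each prefix with one factual completion and three non-factual ones. A model counts as correct when it assigns the factual completion the highest likelihood. Below is a minimal sketch of that accuracy, assuming you already have one log-likelihood score per completion. Whether `factor_eval.py` length-normalizes these scores is not covered here, and the helper name is hypothetical.

```python
def factor_accuracy(examples):
    """examples: list of (factual_score, [non_factual_scores]) log-likelihood pairs."""
    # Ties count as incorrect: the factual completion must strictly win.
    correct = sum(factual > max(others) for factual, others in examples)
    return correct / len(examples)

examples = [
    (-12.3, [-14.0, -13.1, -15.2]),  # factual completion scores highest: correct
    (-20.5, [-19.8, -22.0, -21.4]),  # one non-factual completion wins: incorrect
]
print(factor_accuracy(examples))  # 0.5
```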

## TruthfulQA

The config file `gpt3.config.json` is required. See more details in https://github.com/voidism/DoLa/blob/main/README.md

### Baseline

In [None]:
!cd DoLa && python tfqa_eval.py --model-name huggyllama/llama-7b --data-path ./tmp/ --output-path output-path-tfqa-baseline.json --num-gpus 1 --do-rating --gpt3-config /path/to/gpt3.config.json

### DoLa

In [None]:
!cd DoLa && python tfqa_eval.py --model-name huggyllama/llama-7b --early-exit-layers 16,18,20,22,24,26,28,30,32 --data-path ./tmp/ --output-path output-path-tfqa-dola.json --num-gpus 1 --do-rating --gpt3-config /path/to/gpt3.config.json
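With `--do-rating`, the generated answers are rated by the two fine-tuned GPT-3 judges, one for truthfulness and one for informativeness. Below is a minimal sketch of how such ratings are commonly aggregated into %Truth, %Info, and %Truth*Info (the share of answers judged both truthful and informative). It assumes each judge yields the probability of answering "yes" and that an answer counts as positive at 0.5 or above. The threshold and the helper name are assumptions, not details confirmed for `tfqa_eval.py`.

```python
def truthfulqa_gen_metrics(truth_probs, info_probs, threshold=0.5):
    """Aggregate per-answer judge probabilities into percentages."""
    truthful = [p >= threshold for p in truth_probs]
    informative = [p >= threshold for p in info_probs]
    n = len(truthful)
    pct_truth = 100 * sum(truthful) / n
    pct_info = 100 * sum(informative) / n
    # An answer counts toward %Truth*Info only if it is judged both truthful and informative.
    pct_truth_info = 100 * sum(t and i for t, i in zip(truthful, informative)) / n
    return pct_truth, pct_info, pct_truth_info

print(truthfulqa_gen_metrics([0.9, 0.2, 0.6, 0.7], [0.8, 0.9, 0.1, 0.6]))  # (75.0, 75.0, 50.0)
```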

## Vicuna QA (GPT-4 evaluation)

For GPT-4 evaluation, we need the question file from [FastChat](https://github.com/lm-sys/FastChat). The following commands assume that `$fastchat` is the path to your FastChat repo.
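In a Colab or Jupyter cell, IPython replaces `$fastchat` inside a `!` command with the value of the Python variable `fastchat`. Define that variable before running the cells below. Because every command first runs `cd DoLa`, the path should be absolute. Below is a minimal sketch; `/content/FastChat` is only a hypothetical clone location.

```python
import os

# Hypothetical clone location of FastChat; replace with your own checkout.
fastchat = os.path.abspath("/content/FastChat")

# Files the commands below read from the FastChat repo.
question_file = os.path.join(fastchat, "eval", "table", "question.jsonl")
prompt_file = os.path.join(fastchat, "eval", "table", "prompt.jsonl")
reviewer_file = os.path.join(fastchat, "eval", "table", "reviewer.jsonl")

for path in (question_file, prompt_file, reviewer_file):
    print(path, "found" if os.path.exists(path) else "missing")
```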

### Baseline

In [None]:
!cd DoLa && python gpt4_judge_eval.py --model-name huggyllama/llama-7b --model-id llama-7b-baseline --question-file $fastchat/eval/table/question.jsonl --answer-file output-answer-baseline.jsonl --num-gpus 1

### DoLa

In [None]:
!cd DoLa && python gpt4_judge_eval.py --model-name huggyllama/llama-7b --early-exit-layers 0,2,4,6,8,10,12,14,32 --model-id llama-7b-dola --question-file $fastchat/eval/table/question.jsonl --answer-file output-answer-dola.jsonl --num-gpus 1

### Run GPT-4 

Replace the `openai_api_key` placeholder in the command below with your OpenAI API key, which must have GPT-4 access.

In [None]:
!cd DoLa && python $fastchat/eval/eval_gpt_review.py -q $fastchat/eval/table/question.jsonl -a output-answer-baseline.jsonl output-answer-dola.jsonl -p $fastchat/eval/table/prompt.jsonl -r $fastchat/eval/table/reviewer.jsonl -o output-review-path.jsonl -k openai_api_key