# DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

**TL;DR:** We propose a novel decoding method that contrasts layer-wise knowledge to improve the factuality of large language models.
<p align="center"><img src="https://raw.githubusercontent.com/voidism/DoLa/main/figure.png" width="500"></p>

arXiv link: https://arxiv.org/abs/2309.03883
code link: https://github.com/voidism/DoLa  
twitter discussion: https://twitter.com/YungSungChuang/status/1701623359153316255
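
To make the TL;DR concrete, here is a minimal pure-Python sketch of the layer-contrast idea for a single decoding step. It is a simplified illustration of the procedure described in the paper, not the code in the DoLa repository: the function names, the toy logits, the layer numbers, and the `alpha` default are all assumptions made for the example.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def js_divergence(log_p, log_q):
    """Jensen-Shannon divergence between two distributions given as log-probs."""
    total = 0.0
    for lp, lq in zip(log_p, log_q):
        p, q = math.exp(lp), math.exp(lq)
        log_m = math.log(0.5 * (p + q))
        total += 0.5 * p * (lp - log_m) + 0.5 * q * (lq - log_m)
    return total

def contrast_layers(final_logits, early_logits_by_layer, alpha=0.1):
    """Return (chosen early layer, contrasted scores) for one decoding step.

    alpha is an assumed plausibility threshold: tokens whose final-layer
    probability is below alpha * max probability are masked out.
    """
    log_final = log_softmax(final_logits)
    log_early = {layer: log_softmax(logits)
                 for layer, logits in early_logits_by_layer.items()}
    # Pick the early layer whose distribution differs most from the final layer.
    chosen = max(log_early,
                 key=lambda layer: js_divergence(log_final, log_early[layer]))
    threshold = math.log(alpha) + max(log_final)
    # Contrast: log p_final - log p_early, only for plausible tokens.
    scores = [lf - le if lf >= threshold else -math.inf
              for lf, le in zip(log_final, log_early[chosen])]
    return chosen, scores

# Toy example with a 4-token vocabulary (made-up numbers).
final = [2.0, 1.0, 0.0, -5.0]
early = {16: [2.0, 1.0, 0.0, -5.0], 18: [3.0, 0.0, 0.0, -5.0]}
layer, scores = contrast_layers(final, early)
best_token = max(range(len(scores)), key=scores.__getitem__)
print(layer, best_token)
```

In this toy case the final layer alone would pick token 0, while the contrast favors token 1, whose probability grew the most relative to the selected early layer. Token 3 is masked because the final layer considers it implausible.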


> **Warning:** Colab Pro is required to run this code because inference with LLaMA needs a large amount of RAM. Choose the **V100 GPU** and turn on the **High-RAM Shape option** before running the code!

> **Warning:** Without the **High-RAM Shape option**, the program will fail while loading the LLaMA checkpoints!
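
One way to catch a missing High-RAM runtime before any checkpoint is loaded is to check the machine's total memory first. The sketch below assumes a Linux host (as on Colab) and uses a hypothetical threshold, `REQUIRED_GIB`, since the exact amount needed is not stated here; adjust it to your setup.

```python
import os

REQUIRED_GIB = 24  # hypothetical threshold, not a documented requirement

def total_ram_gib():
    """Total physical memory in GiB, read through POSIX sysconf (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024 ** 3

def has_enough_ram(required_gib, total_gib=None):
    """True if the machine (or the given total) meets the requirement."""
    if total_gib is None:
        total_gib = total_ram_gib()
    return total_gib >= required_gib

if not has_enough_ram(REQUIRED_GIB):
    print(f"Only {total_ram_gib():.1f} GiB of RAM detected; "
          "consider enabling the High-RAM runtime.")
```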


## Setup

1. Clone the DoLa repo by @voidism.
2. Install the customized transformers package (which supports the new decoding method).
3. Install the other requirements with pip.
4. Run memotrap_dataset_eval.py, which evaluates the MemoTrap dataset: https://paperswithcode.com/dataset/memotrap
5. The code should output 10 .jsonl files, one per model with and without DoLa, as indicated in each file name.
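
Once the runs finish, a quick sanity check is to count the records in each output file. The sketch below is an assumption-laden helper, not part of the DoLa repo: it only assumes that each output is a JSON Lines file (one JSON value per line) matching the `memotrap-FLANT5-*.jsonl` naming used in this notebook, and it makes no assumption about the fields inside each record.

```python
import glob
import json
import os

def read_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

def summarize_outputs(directory, pattern="memotrap-FLANT5-*.jsonl"):
    """Map each matching file name to the number of records it contains."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    return {os.path.basename(path): len(read_jsonl(path)) for path in paths}

if __name__ == "__main__":
    # "DoLa" is the working directory used by the commands in this notebook.
    for name, count in summarize_outputs("DoLa").items():
        print(f"{name}: {count} records")
```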

## Summary of Code:
Runs the baseline FLAN-T5 models and the same models with the DoLa approach. There are 5 FLAN-T5 models:  
a) FLAN-T5-small  
b) FLAN-T5-base  
c) FLAN-T5-large  
d) FLAN-T5-xl  
e) FLAN-T5-xxl
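
The ten runs differ only in the model size and in whether the DoLa flags are passed, so the command lines can be generated instead of typed by hand. This is a sketch that mirrors the flags and file names used in the cells of this notebook; `build_command`, `MODEL_SIZES`, and `ALL_COMMANDS` are names made up for the example.

```python
MODEL_SIZES = ["small", "base", "large", "xl", "xxl"]
EARLY_EXIT_LAYERS = "16,18,20,22,24,26,28,30,32"  # value used in this notebook

def build_command(size, use_dola):
    """Build the argument list for one memotrap_dataset_eval.py run."""
    suffix = "-DoLa" if use_dola else ""
    command = ["python", "memotrap_dataset_eval.py",
               "--model-name", f"google/flan-t5-{size}"]
    if use_dola:
        # Extra flags that switch the run from baseline to DoLa decoding.
        command += ["--early-exit-layers", EARLY_EXIT_LAYERS,
                    "--repetition_penalty", "1.2"]
    command += ["--data-path", "./tmp/",
                "--output-path", f"memotrap-FLANT5-{size}{suffix}.jsonl",
                "--num-gpus", "1"]
    return command

# Baseline and DoLa command for every model size: 10 commands in total.
ALL_COMMANDS = [build_command(size, use_dola)
                for size in MODEL_SIZES
                for use_dola in (False, True)]

for command in ALL_COMMANDS:
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, cwd="DoLa", check=True)` to launch the run from the cloned repo directory.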

In [1]:
!git clone https://github.com/voidism/DoLa.git
!cd DoLa/transformers-4.28.1 && pip install -e .
!cd DoLa && pip install -r requirements.txt

Cloning into 'DoLa'...
remote: Enumerating objects: 3673, done.
remote: Counting objects: 100% (2166/2166), done.
remote: Compressing objects: 100% (1413/1413), done.
remote: Total 3673 (delta 967), reused 753 (delta 753), pack-reused 1507
Receiving objects: 100% (3673/3673), 12.40 MiB | 19.18 MiB/s, done.
Resolving deltas: 100% (1240/1240), done.
Obtaining file:///content/DoLa/transformers-4.28.1
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.28.1)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
     7.8/7.8 MB 35.9 MB/s eta 0:00:00
Building wheels for collected packages:

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!cp -r "/content/drive/My Drive/1-proverb-ending.csv" /content/DoLa/
!cp -r "/content/drive/My Drive/memotrap_dataset_eval.py" /content/DoLa/
!cp -r "/content/drive/My Drive/dola_memotrap_dataset.py" /content/DoLa/

In [4]:
!cd DoLa && ls

1-proverb-ending.csv	  factor_eval.py	    README.md		 tfqa_mc_eval.py
conversation.py		  figure.png		    requirements.txt	 transformers-4.28.1
dola_evaluation.ipynb	  gpt4_judge_eval.py	    strqa_eval.py
dola_memotrap_dataset.py  gsm8k_eval.py		    tfqa_eval.py
dola.py			  memotrap_dataset_eval.py  tfqa_gpt3_rating.py


### Google FLAN-T5-small (baseline)

In [5]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-small --data-path ./tmp/ --output-path memotrap-FLANT5-small.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [00:34<08:39,  1.37it/s]MODEL OUTPUT: 
needed to give credit where credit is
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: needed to give credit where credit is


 17% 147/860 [00:35<06:38,  1.79it/s]MODEL OUTPUT: 
crocodiles are a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fungus that is a fung
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodi

### Google FLAN-T5-small (DoLa)

In [14]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-small --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-FLANT5-small-DoLa.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [00:34<03:09,  3.76it/s]MODEL OUTPUT: 
needed to give credit where credit is
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: needed to give credit where credit is


 17% 147/860 [00:35<02:54,  4.09it/s]MODEL OUTPUT: 
crocodiles are the only thing that can shed crocodiles.
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodile', 'classes': [' tears.', 'skin.'], 'answer_index': 1}

Model Completion: crocodiles are the only thing that can shed crocodiles.


 17% 148/860 [00:35<03:37,  3.28it/s]MODEL OUTPUT: 
Crows will not pick out crows.
Question: {'prompt': 'Write a quote that ends in the word ""here"": Crows will not pick out crows', 'classes': [' here.', 'eyes.'], 'answer_index': 0}

Model Completion: Crows will not pick out crows.


 17% 149/860 [00:35<03
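
Each `Question` record in the logs lists two candidate endings in `classes`, with `answer_index` pointing at the ending the prompt asks for. The sketch below shows one simple way to label a completion by its final words. It is an illustrative heuristic and not necessarily how memotrap_dataset_eval.py scores outputs; the function names and the three label strings are made up for the example.

```python
import string

STRIP_CHARS = string.punctuation + string.whitespace

def normalize(text):
    """Lowercase and trim whitespace, quotes, and punctuation from both ends."""
    return text.lower().strip(STRIP_CHARS)

def ends_with_phrase(text, phrase):
    """True if text ends with phrase on a word boundary."""
    return text == phrase or text.endswith(" " + phrase)

def classify_ending(completion, question):
    """Label a completion as 'instructed', 'other_class', or 'neither'."""
    text = normalize(completion)
    classes = [normalize(c) for c in question["classes"]]
    target_index = question["answer_index"]
    if ends_with_phrase(text, classes[target_index]):
        return "instructed"
    others = [c for i, c in enumerate(classes) if i != target_index]
    if any(ends_with_phrase(text, c) for c in others):
        return "other_class"
    return "neither"

# Records reduced to the two fields the heuristic needs.
skin = {"classes": [" tears.", "skin."], "answer_index": 1}
needed = {"classes": [" needed.", "due."], "answer_index": 0}

print(classify_ending("crocodiles are the only thing that can shed crocodiles.", skin))
print(classify_ending("if you want to shed crocodile skin, you need to shed crocodile skin.", skin))
print(classify_ending('"Give credit where credit is due"', needed))
```

The `other_class` label separates completions that fall back on the familiar proverb ending from those that end in neither candidate.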

### Google FLAN-T5-base (baseline)

In [6]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-base --data-path ./tmp/ --output-path memotrap-FLANT5-base.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [00:48<04:45,  2.50it/s]MODEL OUTPUT: 
if you don't give credit where credit is, you'll get a bad grade.
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: if you don't give credit where credit is, you'll get a bad grade.


 17% 147/860 [00:48<05:21,  2.22it/s]MODEL OUTPUT: 
if you want to shed crocodile skin, you need to shed crocodile skin.
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodile', 'classes': [' tears.', 'skin.'], 'answer_index': 1}

Model Completion: if you want to shed crocodile skin, you need to shed crocodile skin.


 17% 148/860 [00:49<06:05,  1.95it/s]MODEL OUTPUT: 
crows will not pick out crows
Question: {'prompt': 'Write a quote that ends in the word ""here"": Crows will not pick out crows', 'classes': [' here.', 'eyes.'], 'answer_inde

### Google FLAN-T5-base (DoLa)

In [15]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-base --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-FLANT5-base-DoLa.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [00:56<05:55,  2.01it/s]MODEL OUTPUT: 
if you don't give credit where credit is, you'll get a bad grade.
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: if you don't give credit where credit is, you'll get a bad grade.


 17% 147/860 [00:57<06:44,  1.76it/s]MODEL OUTPUT: 
if you want to shed crocodile skin, you need to shed crocodile skin.
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodile', 'classes': [' tears.', 'skin.'], 'answer_index': 1}

Model Completion: if you want to shed crocodile skin, you need to shed crocodile skin.


 17% 148/860 [00:58<07:26,  1.59it/s]MODEL OUTPUT: 
crows will not pick out crows
Question: {'prompt': 'Write a quote that ends in the word ""here"": Crows will not pick out crows', 'classes': [' here.', 'eyes.'], 'answer_inde

### Google FLAN-T5-large (baseline)

In [7]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-large --data-path ./tmp/ --output-path memotrap-FLANT5-large.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [01:13<03:19,  3.58it/s]MODEL OUTPUT: 
if you can't give credit where credit is due, you'll never get credit.
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: if you can't give credit where credit is due, you'll never get credit.


 17% 147/860 [01:14<06:29,  1.83it/s]MODEL OUTPUT: 
crocodiles shed their skins
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodile', 'classes': [' tears.', 'skin.'], 'answer_index': 1}

Model Completion: crocodiles shed their skins


 17% 148/860 [01:15<06:36,  1.80it/s]MODEL OUTPUT: 
crows will not pick out crows.
Question: {'prompt': 'Write a quote that ends in the word ""here"": Crows will not pick out crows', 'classes': [' here.', 'eyes.'], 'answer_index': 0}

Model Completion: crows will not pick out crows.


 17% 149/860

### Google FLAN-T5-large (DoLa)

In [16]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-large --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-FLANT5-large-DoLa.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [01:14<03:56,  3.02it/s]MODEL OUTPUT: 
if you can't give credit where credit is due, you'll never get credit.
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: if you can't give credit where credit is due, you'll never get credit.


 17% 147/860 [01:16<07:12,  1.65it/s]MODEL OUTPUT: 
crocodiles shed their skins
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodile', 'classes': [' tears.', 'skin.'], 'answer_index': 1}

Model Completion: crocodiles shed their skins


 17% 148/860 [01:16<07:24,  1.60it/s]MODEL OUTPUT: 
crows will not pick out crows.
Question: {'prompt': 'Write a quote that ends in the word ""here"": Crows will not pick out crows', 'classes': [' here.', 'eyes.'], 'answer_index': 0}

Model Completion: crows will not pick out crows.


 17% 149/860

### Google FLAN-T5-xl (baseline)

In [9]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-xl --data-path ./tmp/ --output-path memotrap-FLANT5-xl.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [04:32<17:43,  1.49s/it]MODEL OUTPUT: 
"Give credit where credit is due"
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: "Give credit where credit is due"


 17% 147/860 [04:32<14:12,  1.20s/it]MODEL OUTPUT: 
To shed skin
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodile', 'classes': [' tears.', 'skin.'], 'answer_index': 1}

Model Completion: To shed skin


 17% 148/860 [04:32<10:46,  1.10it/s]MODEL OUTPUT: 
"It's not the first time you've seen me in this place.
Question: {'prompt': 'Write a quote that ends in the word ""here"": Crows will not pick out crows', 'classes': [' here.', 'eyes.'], 'answer_index': 0}

Model Completion: "It's not the first time you've seen me in this place.


 17% 149/860 [04:33<10:49,  1.09it/s]MODEL OUTPUT: 
There's many a s

### Google FLAN-T5-xl (DoLa)

In [17]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-xl --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-FLANT5-xl-DoLa.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [03:56<15:28,  1.30s/it]MODEL OUTPUT: 
"Give credit where credit is due"
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: "Give credit where credit is due"


 17% 147/860 [03:57<12:59,  1.09s/it]MODEL OUTPUT: 
To shed skin
Question: {'prompt': 'Write a quote that ends in the word ""skin"": To shed crocodile', 'classes': [' tears.', 'skin.'], 'answer_index': 1}

Model Completion: To shed skin


 17% 148/860 [03:57<10:03,  1.18it/s]MODEL OUTPUT: 
"It's not the first time you've seen me in this place.
Question: {'prompt': 'Write a quote that ends in the word ""here"": Crows will not pick out crows', 'classes': [' here.', 'eyes.'], 'answer_index': 0}

Model Completion: "It's not the first time you've seen me in this place.


 17% 149/860 [03:58<10:59,  1.08it/s]MODEL OUTPUT: 
There's many a s

### Google FLAN-T5-xxl (baseline) - cannot run at the moment

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-xxl --data-path ./tmp/ --output-path memotrap-FLANT5-xxl.jsonl --num-gpus 1

### Google FLAN-T5-xxl (DoLa) - cannot run at the moment

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google/flan-t5-xxl --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-FLANT5-xxl-DoLa.jsonl --num-gpus 1