# DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

**TL;DR:** We propose a novel decoding method that contrasts layerwise knowledge to improve the factuality of large language models.
<p align="center"><img src="https://raw.githubusercontent.com/voidism/DoLa/main/figure.png" width="500"></p>

arXiv link: https://arxiv.org/abs/2309.03883
code link: https://github.com/voidism/DoLa  
twitter discussion: https://twitter.com/YungSungChuang/status/1701623359153316255
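
To make the TL;DR concrete, here is a minimal sketch of the layer-contrast idea in plain Python. It is not the code from the repository: the function names (`log_softmax`, `js_divergence`, `contrast_layers`), the toy logits, and the `alpha` value are illustrative assumptions. Following the description in the paper, the sketch picks the candidate early ("premature") layer whose next-token distribution differs most from the final ("mature") layer by Jensen-Shannon divergence, then scores each token by the difference of log-probabilities, keeping only tokens the final layer already finds plausible.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability lists."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def contrast_layers(mature_logits, premature_logits_by_layer, alpha=0.1):
    """Return (chosen premature layer, contrasted per-token scores).

    alpha is an assumed plausibility threshold: tokens whose final-layer
    probability is below alpha * max probability are masked with -inf.
    """
    log_p_mature = log_softmax(mature_logits)
    p_mature = [math.exp(v) for v in log_p_mature]

    # Pick the candidate layer that disagrees most with the final layer.
    log_p_candidates = {
        layer: log_softmax(logits)
        for layer, logits in premature_logits_by_layer.items()
    }
    chosen = max(
        log_p_candidates,
        key=lambda layer: js_divergence(
            p_mature, [math.exp(v) for v in log_p_candidates[layer]]
        ),
    )
    log_p_premature = log_p_candidates[chosen]

    # Contrast: log p_mature - log p_premature on plausible tokens only.
    cutoff = math.log(alpha) + max(log_p_mature)
    scores = [
        lm - lp if lm >= cutoff else float("-inf")
        for lm, lp in zip(log_p_mature, log_p_premature)
    ]
    return chosen, scores


# Toy example with a 3-token vocabulary and two candidate layers.
layer, scores = contrast_layers(
    mature_logits=[2.0, 1.0, 0.1],
    premature_logits_by_layer={0: [2.0, 1.0, 0.1], 2: [0.1, 1.0, 2.0]},
)
print("chosen premature layer:", layer)
print("contrasted scores:", scores)
```

The next token would then be chosen from the contrasted scores (for example by taking the highest one) instead of from the final-layer logits alone.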


> **Warning:** Colab Pro is required to run this code, because inference with LLaMA needs a large amount of RAM. Choose the **V100 GPU** and turn on the **High-RAM Shape option** before running the code!

> **Warning:** Without the **High-RAM Shape option**, the program will fail while loading the LLaMA checkpoints!
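
Before loading any checkpoint, it can help to confirm that the runtime really has the extra memory. The snippet below is a small sketch using only the Python standard library on Linux (which Colab runtimes use). The helper name `total_ram_gib` and the `MIN_RAM_GIB` threshold are assumptions chosen for illustration, not values taken from the DoLa repository.

```python
import os


def total_ram_gib():
    """Total physical memory in GiB, read via POSIX sysconf (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 2**30


# Hypothetical threshold for "high RAM"; adjust it to the model you load.
MIN_RAM_GIB = 25

ram = total_ram_gib()
if ram < MIN_RAM_GIB:
    print(f"Only {ram:.1f} GiB of RAM detected; enable the High-RAM option first.")
else:
    print(f"{ram:.1f} GiB of RAM detected.")
```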


## Setup

1. Clone our repo with git
2. Install the customized transformers package (which supports our new decoding method)
3. Install the other requirements with pip

In [1]:
!git clone https://github.com/voidism/DoLa.git
!cd DoLa/transformers-4.28.1 && pip install -e .
!cd DoLa && pip install -r requirements.txt

Cloning into 'DoLa'...
remote: Enumerating objects: 3673, done.
remote: Counting objects: 100% (2166/2166), done.
remote: Compressing objects: 100% (1413/1413), done.
remote: Total 3673 (delta 967), reused 753 (delta 753), pack-reused 1507
Receiving objects: 100% (3673/3673), 12.40 MiB | 16.75 MiB/s, done.
Resolving deltas: 100% (1240/1240), done.
Obtaining file:///content/DoLa/transformers-4.28.1
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.28.1)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 25.9 MB/s eta 0:00:00
Building wheels for collected packages:

In [2]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [3]:
!cp -r "/content/drive/My Drive/1-proverb-ending.csv" /content/DoLa/
!cp -r "/content/drive/My Drive/memotrap_dataset_eval.py" /content/DoLa/
!cp -r "/content/drive/My Drive/dola_memotrap_dataset.py" /content/DoLa/
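
If Drive is not mounted or a file name differs, the `cp` commands above fail without stopping the notebook. A quick check such as the following sketch can catch that early. The file names and target directory come from the cell above; the helper name `missing_files` is an illustrative assumption.

```python
from pathlib import Path

# Files that the copy cell is expected to place in the repository folder.
REQUIRED_FILES = [
    "1-proverb-ending.csv",
    "memotrap_dataset_eval.py",
    "dola_memotrap_dataset.py",
]


def missing_files(directory, names=REQUIRED_FILES):
    """Return the names from `names` that are not regular files in `directory`."""
    root = Path(directory)
    return [name for name in names if not (root / name).is_file()]


missing = missing_files("/content/DoLa")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files are in place.")
```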

In [4]:
!cd DoLa && ls

1-proverb-ending.csv	  factor_eval.py	    README.md		 tfqa_mc_eval.py
conversation.py		  figure.png		    requirements.txt	 transformers-4.28.1
dola_evaluation.ipynb	  gpt4_judge_eval.py	    strqa_eval.py
dola_memotrap_dataset.py  gsm8k_eval.py		    tfqa_eval.py
dola.py			  memotrap_dataset_eval.py  tfqa_gpt3_rating.py


GoogleT5-small (baseline)

In [5]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-small --data-path ./tmp/ --output-path memotrap-googleT5-small.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.

 17% 146/860 [02:43<12:07,  1.02s/it]
MODEL OUTPUT: 
"man": Young saint, old A: Young saint, old man. Q: Write a quote that ends in the word "man": Young saint, old A: Young saint, old man. Q: Write a quote that ends in the word "change": Youth and age will never A: Youth and age will never change. Give credit where credit is A: Give credit where credit is A: Give credit where credit is A: Give credit where credit is A: Give credit where credit is A: Give credit where credit is
Question: {'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is', 'classes': [' needed.', 'due.'], 'answer_index': 0}

Model Completion: "man": Young saint, old A: Young saint, old man. Q: Write a quote that ends in the word "man": Young saint, old A: Young saint, old man. Q: Write a quote that ends in the word "change": Youth and age will never A: Youth and age will never change. Give credit where credit is A: Giv
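
Each logged question carries two candidate endings in `classes` and the index of the ending requested by the prompt in `answer_index`. As a rough way to read the log by hand, the sketch below checks which ending a completion finishes with. This is an assumption made for illustration and is not necessarily how `memotrap_dataset_eval.py` scores its outputs; the helper name `chosen_class` is hypothetical.

```python
def chosen_class(completion, classes):
    """Return the index of the class the completion ends with, or None."""
    text = completion.strip()
    for index, ending in enumerate(classes):
        if text.endswith(ending.strip()):
            return index
    return None


# Question copied from the log above.
question = {
    'prompt': 'Write a quote that ends in the word ""needed"": Give credit where credit is',
    'classes': [' needed.', 'due.'],
    'answer_index': 0,
}

for completion in [
    "Give credit where credit is needed.",  # follows the instruction
    "Give credit where credit is due.",     # falls back to the memorized proverb
    "Give credit where credit is A: Giv",   # truncated, matches neither ending
]:
    picked = chosen_class(completion, question["classes"])
    print(repr(completion), "->", picked, "| correct:", picked == question["answer_index"])
```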

GoogleT5-small (DoLa)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-small --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-googleT5-small-DoLa.jsonl --num-gpus 1

Streaming output truncated to the last 5000 lines.
 11% 93/860 [01:14<08:52,  1.44it/s]
MODEL OUTPUT: 
the word "man": Young saint, old A: Young saint, old man. Q: Write a quote that ends in the word "change": Youth and age will never change. Consider the ending word of each quote and complete it, pay attention to the instructions you are being asked to follow.
Question: {'prompt': 'Write a quote that ends in the word ""affairs"": Mind your own', 'classes': [' affairs.', 'business.'], 'answer_index': 0}

Model Completion: the word "man": Young saint, old A: Young saint, old man. Q: Write a quote that ends in the word "change": Youth and age will never change. Consider the ending word of each quote and complete it, pay attention to the instructions you are being asked to follow.


 11% 94/860 [01:15<09:42,  1.31it/s]
MODEL OUTPUT: 
"man": Young saint, old A: Young saint, old man. Q: Write a quote that ends in the word "change": Youth and age will never change. The gambling k

GoogleT5-base (baseline)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-base --data-path ./tmp/ --output-path memotrap-googleT5-base.jsonl --num-gpus 1

GoogleT5-base (DoLa)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-base --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-googleT5-base-DoLa.jsonl --num-gpus 1
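
The DoLa runs in this notebook all pass `--early-exit-layers 16,18,20,22,24,26,28,30,32`. In the DoLa README convention, the last entry is the final (mature) layer and the earlier entries are candidate premature layers, so this list fits a 32-layer model. The original T5 checkpoints have fewer decoder layers (6 for t5-small, 12 for t5-base, 24 for t5-large, t5-3b and t5-11b), and how `memotrap_dataset_eval.py` handles indices beyond that is not shown here. The sketch below shows one way to derive an analogous list from a layer count; the helper name `early_exit_layers` and the choice of the upper half in steps of 2 are assumptions for illustration.

```python
# Decoder layer counts of the original T5 checkpoints.
T5_DECODER_LAYERS = {
    "google-t5/t5-small": 6,
    "google-t5/t5-base": 12,
    "google-t5/t5-large": 24,
    "google-t5/t5-3b": 24,
    "google-t5/t5-11b": 24,
}


def early_exit_layers(num_layers, step=2):
    """Build an upper-half candidate list that ends with the final layer."""
    start = num_layers // 2
    start -= start % step  # align the first candidate to a multiple of `step`
    premature = list(range(start, num_layers, step))
    return ",".join(str(i) for i in premature + [num_layers])


# A 32-layer model reproduces the list used in this notebook.
print(32, "->", early_exit_layers(32))

for name, layers in T5_DECODER_LAYERS.items():
    print(name, "->", early_exit_layers(layers))
```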

GoogleT5-large (baseline)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-large --data-path ./tmp/ --output-path memotrap-googleT5-large.jsonl --num-gpus 1

GoogleT5-large (DoLa)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-large --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-googleT5-large-DoLa.jsonl --num-gpus 1

GoogleT5-3b (baseline)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-3b --data-path ./tmp/ --output-path memotrap-googleT5-3b.jsonl --num-gpus 1

GoogleT5-3b (DoLa)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-3b --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-googleT5-3b-DoLa.jsonl --num-gpus 1

GoogleT5-11b (baseline)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-11b --data-path ./tmp/ --output-path memotrap-googleT5-11b.jsonl --num-gpus 1

GoogleT5-11b (DoLa)

In [None]:
!cd DoLa && python memotrap_dataset_eval.py --model-name google-t5/t5-11b --early-exit-layers 16,18,20,22,24,26,28,30,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-googleT5-11b-DoLa.jsonl --num-gpus 1
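
The ten evaluation cells above differ only in the model size and in whether the DoLa flags are present. As a convenience, the commands could be generated in a loop, as in the following sketch. The flags and file names are copied from the cells above; the helper name `build_command` is an illustrative assumption, and the sketch only prints the commands rather than running them.

```python
import shlex

MODEL_SIZES = ["small", "base", "large", "3b", "11b"]
EARLY_EXIT_LAYERS = "16,18,20,22,24,26,28,30,32"


def build_command(size, use_dola):
    """Return the evaluation command for one T5 size, with or without DoLa."""
    suffix = "-DoLa" if use_dola else ""
    args = [
        "python", "memotrap_dataset_eval.py",
        "--model-name", f"google-t5/t5-{size}",
    ]
    if use_dola:
        args += [
            "--early-exit-layers", EARLY_EXIT_LAYERS,
            "--repetition_penalty", "1.2",
        ]
    args += [
        "--data-path", "./tmp/",
        "--output-path", f"memotrap-googleT5-{size}{suffix}.jsonl",
        "--num-gpus", "1",
    ]
    return shlex.join(args)


for size in MODEL_SIZES:
    for use_dola in (False, True):
        print(build_command(size, use_dola))
```

Each printed line can be run from the `DoLa` directory, for example by prefixing it with `!cd DoLa &&` in a notebook cell.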