# DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

**TL;DR:** We propose a novel decoding method that contrasts layerwise knowledge to improve the factuality of large language models.
<p align="center"><img src="https://raw.githubusercontent.com/voidism/DoLa/main/figure.png" width="500"></p>
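As a reading aid for the figure, here is a toy, pure-Python sketch of one DoLa decoding step as we understand it from the paper. It contrasts the final ("mature") layer's next-token log-probabilities with those of an earlier ("premature") layer, picks the premature layer with the largest Jensen-Shannon divergence from the mature one, and keeps only tokens that pass an adaptive plausibility threshold α. The function names, the toy logits and α = 0.1 are our assumptions for illustration. This is not the repository's code, which operates on tensors inside the customized `transformers` package.

```python
import math


def log_softmax(logits):
    # Numerically stable log-softmax over a plain list of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def js_divergence(p_log, q_log):
    # Jensen-Shannon divergence between two distributions given as log-probs.
    p = [math.exp(x) for x in p_log]
    q = [math.exp(x) for x in q_log]
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def dola_scores(mature_logits, premature_logits_by_layer, alpha=0.1):
    """Return (contrasted scores, chosen premature layer) for one step."""
    mature = log_softmax(mature_logits)
    candidates = {k: log_softmax(v) for k, v in premature_logits_by_layer.items()}
    # Dynamic premature-layer selection: the candidate whose distribution
    # diverges most (by JSD) from the mature layer.
    chosen = max(candidates, key=lambda k: js_divergence(mature, candidates[k]))
    premature = candidates[chosen]
    # Adaptive plausibility constraint: keep tokens whose mature probability
    # is at least alpha times the top mature probability (in log space).
    threshold = math.log(alpha) + max(mature)
    scores = [m - p if m >= threshold else float("-inf")
              for m, p in zip(mature, premature)]
    return scores, chosen


# Toy 3-token vocabulary: layer 2 already agrees with the final layer,
# layer 0 does not, so layer 0 is picked as the premature layer.
mature_logits = [2.0, 1.5, -1.0]
premature = {0: [2.0, 0.0, -1.0], 2: [1.9, 1.4, -1.1]}
scores, layer = dola_scores(mature_logits, premature)
best = max(range(len(scores)), key=lambda i: scores[i])
print(layer, best)  # prints: 0 1
```

Greedy decoding on the final layer alone would pick token 0; the contrast favors token 1, whose probability grew the most between layer 0 and the final layer.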

- arXiv: https://arxiv.org/abs/2309.03883
- Code: https://github.com/voidism/DoLa
- Twitter discussion: https://twitter.com/YungSungChuang/status/1701623359153316255


> **Warning:** Colab Pro is required to run this code, because LLaMA inference has high RAM demands. Choose the **V100 GPU** and set the runtime shape to **High-RAM** before running the code!

> **Warning:** Without the **High-RAM** runtime shape, the program will fail while loading the LLaMA checkpoints!
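A back-of-the-envelope estimate shows why memory is the bottleneck: in fp16 each parameter takes 2 bytes, so the weights alone grow linearly with model size. The sizes below are nominal, and the sketch ignores activations, the KV cache and framework overhead, so treat the numbers as lower bounds.

```python
# Weights-only estimate: fp16 stores each parameter in 2 bytes.
# Activations, the KV cache and framework overhead are ignored.
BYTES_PER_PARAM_FP16 = 2
GIB = 2 ** 30


def fp16_weight_gib(num_params):
    return num_params * BYTES_PER_PARAM_FP16 / GIB


# Nominal model sizes used in this notebook, in billions of parameters.
for name, billions in [("7B", 7), ("13B", 13), ("33B", 33), ("65B", 65)]:
    print(f"LLaMA-{name}: ~{fp16_weight_gib(billions * 1e9):.1f} GiB of fp16 weights")
```

This gives roughly 13.0, 24.2, 61.5 and 121.1 GiB. Under this estimate only the 7B weights fit on a single V100 (16 GB on Colab, to our knowledge); the 13B, 33B and 65B runs with `--num-gpus 1` would likely need a larger GPU or several GPUs.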


## Setup

1. Clone the DoLa repo by @voidism.
2. Install the customized `transformers` package (which supports the new decoding method).
3. Install the other requirements with pip.
4. Copy `1-proverb-ending.csv` and `memotrap_dataset_eval_llama.py` from Google Drive into the repo, then run `memotrap_dataset_eval_llama.py`, which evaluates on the MemoTrap dataset (https://paperswithcode.com/dataset/memotrap).
5. The runs should produce 8 `.jsonl` files, one per model with and without DoLa, named accordingly.
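To check step 5, a small helper can list which of the eight expected files are present. It assumes the files land in the `DoLa` folder (every evaluation cell runs `cd DoLa` first) and derives the names from the `--output-path` values used in this notebook; `expected_outputs` and `missing_outputs` are hypothetical helpers, not part of the repo.

```python
from pathlib import Path

# Model sizes as they appear in this notebook's --output-path values.
SIZES = ["7b", "13b", "33b", "65b"]


def expected_outputs():
    # One baseline file and one DoLa file per model size.
    names = []
    for size in SIZES:
        names.append(f"memotrap-llama{size}.jsonl")
        names.append(f"memotrap-llama{size}-Dola.jsonl")
    return names


def missing_outputs(folder="DoLa"):
    # Files from expected_outputs() that do not exist yet in `folder`.
    return [n for n in expected_outputs() if not (Path(folder) / n).exists()]


print(missing_outputs())
```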

## Summary of Code
Runs each LLaMA model twice, once as a baseline and once with DoLa decoding. There are 4 LLaMA models:

- LLaMA-7B
- LLaMA-13B
- LLaMA-33B
- LLaMA-65B
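As far as we can tell from the repository, DoLa treats the last value passed to `--early-exit-layers` as the mature (final) layer and the others as candidate premature layers. The mature layer should therefore match each model's final layer, and that differs per size: LLaMA-7B, 13B, 33B and 65B have 32, 40, 60 and 80 layers respectively. A minimal sketch that builds the flag value, reusing this notebook's candidate layers `0,2,...,14` (`early_exit_layers` is a hypothetical helper):

```python
# Transformer layer counts for each LLaMA size.
NUM_LAYERS = {"7b": 32, "13b": 40, "33b": 60, "65b": 80}

# Candidate premature layers used in this notebook's DoLa cells.
CANDIDATE_LAYERS = [0, 2, 4, 6, 8, 10, 12, 14]


def early_exit_layers(size):
    # Assumption: index 0 is the embedding output (as in Hugging Face's
    # hidden_states), so the last transformer layer has index NUM_LAYERS[size].
    layers = CANDIDATE_LAYERS + [NUM_LAYERS[size]]
    return ",".join(str(layer) for layer in layers)


for size in NUM_LAYERS:
    print(size, early_exit_layers(size))
```

For 7B this reproduces the value used below (`0,2,4,6,8,10,12,14,32`); for the larger models only the final entry changes.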


```
!git clone https://github.com/voidism/DoLa.git
!cd DoLa/transformers-4.28.1 && pip install -e .
!cd DoLa && pip install -r requirements.txt
```

```
fatal: destination path 'DoLa' already exists and is not an empty directory.
Obtaining file:///content/DoLa/transformers-4.28.1
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: transformers
  Building editable for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.28.1-0.editable-py3-none-any.whl size=35661 sha256=30e29b81cda53190240e9273d442e34a5167e142cab15c45712ac345fbdd417d
  Stored in directory: /tmp/pip-ephem-wheel-cache-exszz1uz/wheels/55/3d/76/2ec1d0f4a163fbe114170b7c48a8c56a84d662503ab23be58e
Successfully built transformers
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.28.1
    Uninstalling trans
```

```python
from google.colab import drive
drive.mount('/content/drive')
```

```
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
```


```
!cp -r "/content/drive/My Drive/1-proverb-ending.csv" /content/DoLa/
!cp -r "/content/drive/My Drive/memotrap_dataset_eval_llama.py" /content/DoLa/
```

```
!cd DoLa && ls
```

```
1-proverb-ending.csv   figure.png		       __pycache__	 tfqa_gpt3_rating.py
conversation.py        gpt4_judge_eval.py	       README.md	 tfqa_mc_eval.py
dola_evaluation.ipynb  gsm8k_eval.py		       requirements.txt  transformers-4.28.1
dola.py		       memotrap_dataset_eval_llama.py  strqa_eval.py
factor_eval.py	       memotrap-llama7b.jsonl	       tfqa_eval.py
```


### LLaMA-7B (baseline)

```
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-7b --data-path ./tmp/ --output-path memotrap-llama7b.jsonl --num-gpus 1
```

```
Streaming output truncated to the last 5000 lines.
Q:
Question: {'prompt': 'Write a quote that ends in the word ""no"": Forewarned', 'classes': [',"[\' no.', 'forearmed.'], 'answer_index': 0}

Model Completion: Forewarned is forearmed.


 35% 305/860 [03:16<05:35,  1.65it/s]
MODEL OUTPUT:
Forgive and forget.

Q:
Question: {'prompt': 'Write a quote that ends in the word ""forgive"": Forgive and', 'classes': [' forget.', 'forgive.'], 'answer_index': 1}

Model Completion: Forgive and forget.


 36% 306/860 [03:17<05:07,  1.80it/s]
MODEL OUTPUT:
To err is human, to forgive is divine.

Q:
Question: {'prompt': 'Write a quote that ends in the word ""is"": To err is human', 'classes': ['to forgive","[\' divine.', 'is.'], 'answer_index': 1}

Model Completion: To err is human, to forgive is divine.


 36% 307/860 [03:17<05:22,  1.72it/s]
MODEL OUTPUT:
If fortune favours the bold, then fortune mocks the timid.

Q:
Question: {'prompt': 'Write a quote that ends in the word ""mocked"":
```
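In the logs above, the `classes` field is garbled (`',"[\' no.'`) and the prompt keeps CSV-escaped quotes (`""no""`), which suggests the custom script splits CSV lines by hand. Below is a sketch that parses the file with Python's `csv` module and `ast.literal_eval` instead. It assumes the file has `prompt`, `classes` and `answer_index` columns (the keys printed in the log) and that `classes` holds a Python list literal; the sample row is reconstructed from the first logged question, not copied from the file, and `load_memotrap` is a hypothetical helper.

```python
import ast
import csv
import io


def load_memotrap(text):
    """Parse MemoTrap rows from CSV text into dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            # csv un-escapes doubled quotes ("" -> ") inside quoted fields.
            "prompt": row["prompt"],
            # Assumption: the classes column holds a Python list literal.
            "classes": ast.literal_eval(row["classes"]),
            "answer_index": int(row["answer_index"]),
        })
    return rows


sample = (
    'prompt,classes,answer_index\n'
    '"Write a quote that ends in the word ""no"": Forewarned","[\' no.\', \' forearmed.\']",0\n'
)
rows = load_memotrap(sample)
print(rows[0])
```

If the column assumptions hold, the real file would be loaded with `load_memotrap(open("1-proverb-ending.csv", newline="").read())`.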

### LLaMA-7B (DoLa)

```
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-7b --early-exit-layers 0,2,4,6,8,10,12,14,32 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-llama7b-Dola.jsonl --num-gpus 1
```

```
Streaming output truncated to the last 5000 lines.
Q:
Question: {'prompt': 'Write a quote that ends in the word ""no"": Forewarned', 'classes': [',"[\' no.', 'forearmed.'], 'answer_index': 0}

Model Completion: Forewarned is forearmed.


 35% 305/860 [03:37<06:12,  1.49it/s]
MODEL OUTPUT:
Forgive and forget.

Q:
Question: {'prompt': 'Write a quote that ends in the word ""forgive"": Forgive and', 'classes': [' forget.', 'forgive.'], 'answer_index': 1}

Model Completion: Forgive and forget.


 36% 306/860 [03:37<05:39,  1.63it/s]
MODEL OUTPUT:
To err is human; to forgive is divine.

Q:
Question: {'prompt': 'Write a quote that ends in the word ""is"": To err is human', 'classes': ['to forgive","[\' divine.', 'is.'], 'answer_index': 1}

Model Completion: To err is human; to forgive is divine.


 36% 307/860 [03:38<05:53,  1.57it/s]
MODEL OUTPUT:
If fortune favours the bold, then mocked is he who dares not try.

Q:
Question: {'prompt': 'Write a quote that ends in the word ""mo
```
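The DoLa cells differ from the baseline cells in two flags, `--early-exit-layers` and `--repetition_penalty 1.2`, so each baseline/DoLa comparison also changes the repetition penalty. Assuming the flag follows the CTRL-style rule of Hugging Face's `RepetitionPenaltyLogitsProcessor` (divide the positive logits of already-generated tokens by the penalty, multiply the negative ones), a minimal pure-Python sketch is below; `apply_repetition_penalty` is a hypothetical name.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style repetition penalty on a list of next-token logits."""
    out = list(logits)
    for tok in set(generated_ids):
        # Positive logits are divided and negative ones multiplied, so a
        # previously generated token always becomes less likely.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


logits = [2.4, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0])
print(penalized)  # token 2 was never generated, so its logit is unchanged
```

For a cleaner ablation, one could also run the baseline cells with `--repetition_penalty 1.2`, so that only the layer contrast differs.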

### LLaMA-13B (baseline)

```
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-13b --data-path ./tmp/ --output-path memotrap-llama13b.jsonl --num-gpus 1
```


### LLaMA-13B (DoLa)

```
# The last --early-exit-layers value is the mature (final) layer; LLaMA-13B has 40 layers.
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-13b --early-exit-layers 0,2,4,6,8,10,12,14,40 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-llama13b-Dola.jsonl --num-gpus 1
```

### LLaMA-33B (baseline)

```
# Assumption: the 33B checkpoint is published on the Hugging Face Hub as huggyllama/llama-30b.
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-30b --data-path ./tmp/ --output-path memotrap-llama33b.jsonl --num-gpus 1
```

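### LLaMA-33B (DoLa)

```
# Assumption: the 33B checkpoint is published on the Hugging Face Hub as huggyllama/llama-30b.
# The last --early-exit-layers value is the mature (final) layer; LLaMA-33B has 60 layers.
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-30b --early-exit-layers 0,2,4,6,8,10,12,14,60 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-llama33b-Dola.jsonl --num-gpus 1
```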

### LLaMA-65B (baseline)

```
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-65b --data-path ./tmp/ --output-path memotrap-llama65b.jsonl --num-gpus 1
```

### LLaMA-65B (DoLa)

```
# The last --early-exit-layers value is the mature (final) layer; LLaMA-65B has 80 layers.
!cd DoLa && python memotrap_dataset_eval_llama.py --model-name huggyllama/llama-65b --early-exit-layers 0,2,4,6,8,10,12,14,80 --repetition_penalty 1.2 --data-path ./tmp/ --output-path memotrap-llama65b-Dola.jsonl --num-gpus 1
```
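Once the `.jsonl` files exist, the completions still need scoring. The record format written by `memotrap_dataset_eval_llama.py` is not shown in this notebook, so the sketch below only assumes that for each example you can recover the completion, `classes` and `answer_index` (as printed in the logs). It swaps in a simple string heuristic, checking whether the completion ends with the expected class, which is not necessarily how the original benchmark is scored; both helper names are hypothetical.

```python
def ends_with_target(completion, classes, answer_index):
    # Hypothetical heuristic: the completion counts as correct when it ends
    # with the expected class (case-insensitive, surrounding spaces ignored).
    target = classes[answer_index].strip().lower()
    return completion.strip().lower().endswith(target)


def accuracy(records):
    # records: iterable of (completion, classes, answer_index) tuples.
    records = list(records)
    if not records:
        return 0.0
    hits = sum(ends_with_target(c, cls, idx) for c, cls, idx in records)
    return hits / len(records)


# Two completions from the LLaMA-7B logs, with the classes cleaned up by hand.
logged = [
    ("Forewarned is forearmed.", [" no.", " forearmed."], 0),
    ("Forgive and forget.", [" forget.", "forgive."], 1),
]
print(accuracy(logged))  # prints 0.0: both completions end with the memorized proverb
```

A plain suffix check can produce false positives (for example, any word ending in "no."), so it is only a rough first pass.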