<a href="https://colab.research.google.com/github/twhool02/ptm-quantization/blob/main/Quantized_Model_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model Evaluation

This notebook runs evaluation benchmarks on quantized pre-trained models.

Evaluation of models is carried out using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from [EleutherAI](https://www.eleuther.ai/).

Models are evaluated on:

* MMLU (5-shot)
* HellaSwag (0-shot)
* BoolQ (0-shot)
* BBH (3-shot)

The number of shots used for MMLU, HellaSwag and BBH matches the settings used for comparing models in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).
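
The benchmark settings listed above can be kept in one place so that every run uses the same shot count. The following is a minimal sketch, not part of the original notebook: `FEWSHOT_BY_TASK` and `tasks_by_shots` are hypothetical names, and the task names are the ones passed to `--tasks` in the evaluation cells later in this notebook.

```python
# Few-shot settings used in this notebook, keyed by lm-evaluation-harness task name.
# FEWSHOT_BY_TASK and tasks_by_shots are illustrative names, not harness APIs.
FEWSHOT_BY_TASK = {
    "mmlu_stem": 5,
    "mmlu_social_sciences": 5,
    "mmlu_humanities": 5,
    "mmlu_other": 5,
    "hellaswag": 0,
    "boolq": 0,
    "bbh_fewshot": 3,
}


def tasks_by_shots(settings):
    """Group task names by shot count, since --num_fewshot applies to a whole run."""
    groups = {}
    for task, shots in settings.items():
        groups.setdefault(shots, []).append(task)
    return groups


for shots, tasks in sorted(tasks_by_shots(FEWSHOT_BY_TASK).items()):
    print(f"{shots}-shot: {','.join(tasks)}")
```

Grouping by shot count mirrors how the harness is invoked here: one `lm_eval` call per value of `--num_fewshot`.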




## Setup

### Map Google Drive

In [None]:
import shutil, os, subprocess

# mount google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Log into HuggingFace Hub

This code assumes that the user has a Hugging Face token set up as a notebook secret called `HF_TOKEN`.
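
Outside Colab the `google.colab` module is not available, so the secret lookup needs a fallback. The sketch below is one possible approach under that assumption: `get_secret` is a hypothetical helper, and reading the token from an environment variable of the same name is an assumption, not something the original notebook does.

```python
import os


def get_secret(name):
    """Return a secret from Colab's secret store, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Assumed fallback: an environment variable with the same name.
        return os.environ.get(name)
    return userdata.get(name)


hftoken = get_secret("HF_TOKEN")
print("token found" if hftoken else "token not set")
```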

In [None]:
# Required when interacting with HuggingFace Hub
!pip install -q --upgrade huggingface_hub

import huggingface_hub

print(f"Hugging Face Version is: {huggingface_hub.__version__}")

Hugging Face Version is: 0.22.2


In [None]:
from google.colab import userdata

# using the HF_TOKEN secret, this has write permissions to Hugging Face
hftoken = userdata.get('HF_TOKEN')

In [None]:
from huggingface_hub import login

# Log into Hugging Face using the HF_TOKEN secret
login(hftoken, add_to_git_credential=True)

Token is valid (permission: write).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful


### Install Required Libraries

In [None]:
# The Transformers library provides APIs and tools to download and train pretrained models.
!pip install -q -U transformers

# Accelerate enables the same PyTorch code to run across any distributed configuration
!pip install -q -U accelerate

# SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural text generation systems
!pip install -q sentencepiece

# bitsandbytes includes quantization primitives for 8-bit and 4-bit operations
!pip install -q bitsandbytes

# PEFT adapts large pretrained models without fine-tuning all of a model's parameters
!pip install -q peft

# TRL (Transformer Reinforcement Learning) provides trainers for fine-tuning transformer models, including SFT and PPO
!pip install -q trl

# Optimum is an extension of Transformers that provides performance optimization tools to train and run models
!pip install -q -U optimum

# Weights & Biases, used here for logging evaluation runs
!pip install -q -U wandb

In [None]:
#print the version of transformers
import transformers
print(f"version of transformers: {transformers.__version__}")

# print the version of the accelerate library
import accelerate
print(f"version of accelerate: {accelerate.__version__}")

version of transformers: 4.39.3
version of accelerate: 0.28.0
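
Several packages are installed and upgraded in this notebook, so it can help to report their versions in one step. The sketch below uses the standard library's `importlib.metadata`, which reads the installed version without importing the package; `installed_versions` is a hypothetical helper name and the package list is only an example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["transformers", "accelerate", "wandb"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this again after later install cells shows whether a dependency has been upgraded or downgraded in the meantime.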


### Log into Weights and Biases

In [None]:
import wandb

wandb_token = userdata.get('wandb_api')
wandb.login(key=wandb_token)

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

### Create folder for results

In [None]:
# Create a directory to store results
results_dir = "/content/drive/MyDrive/Evaluation"
os.makedirs(results_dir, exist_ok=True)
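
When several models are evaluated, each one can be given its own folder under this directory. The following is a minimal sketch of one way to do that; `make_results_dir` is a hypothetical helper, and replacing the `/` in a Hub id with `__` is a design choice made here for illustration (the notebook itself keeps the nested `org/model` layout).

```python
import os


def make_results_dir(model_id, base_dir):
    """Create and return a results folder for one model.

    The "/" in a Hub id such as "org/model" is replaced so that each model
    gets a single flat folder rather than a nested org/model path.
    """
    safe_name = model_id.replace("/", "__")
    path = os.path.join(base_dir, safe_name)
    os.makedirs(path, exist_ok=True)
    return path


# Example call (commented out so that nothing is created on import):
# make_results_dir("twhoool02/Llama2-7b-chat-HF-NF4", "/content/drive/MyDrive/Evaluation")
```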

In [None]:
# Install LM-Eval
!pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@big-refactor

Collecting git+https://github.com/EleutherAI/lm-evaluation-harness.git@big-refactor
  Cloning https://github.com/EleutherAI/lm-evaluation-harness.git (to revision big-refactor) to /tmp/pip-req-build-aeuycbz5
  Running command git clone --filter=blob:none --quiet https://github.com/EleutherAI/lm-evaluation-harness.git /tmp/pip-req-build-aeuycbz5
  Running command git checkout -b big-refactor --track origin/big-refactor
  Switched to a new branch 'big-refactor'
  Branch 'big-refactor' set up to track remote branch 'big-refactor' from 'origin'.
  Resolved https://github.com/EleutherAI/lm-evaluation-harness.git to commit 967eb4fa90b80ba4e8cc7a2fd171f44f0e384808
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting evaluate (from lm_eval==1.0.0)
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)

### Import libraries

In [None]:
!python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

2024-04-04 12:08:58.543062: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-04 12:08:58.543115: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-04 12:08:58.544933: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [None]:
!pip show tensorflow

Name: tensorflow
Version: 2.15.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, libclang, ml-dtypes, numpy, opt-einsum, packaging, protobuf, setuptools, six, tensorboard, tensorflow-estimator, tensorflow-io-gcs-filesystem, termcolor, typing-extensions, wrapt
Required-by: dopamine-rl, tf_keras


In [None]:
# for interacting with the operating system.
import os

# torch is the main package of PyTorch.
import torch

# Allows loading and preprocessing datasets from the Hugging Face Hub.
from datasets import load_dataset

from transformers import (
    AutoModelForCausalLM,  # Generic model class with a causal language modeling head.
    AutoTokenizer,  # Automatically selects the correct tokenizer for a model.
    BitsAndBytesConfig,  # Configures bitsandbytes quantization when loading a model.
    HfArgumentParser,  # Used for parsing command-line arguments.
    TrainingArguments,  # Defines the arguments used during training.
    pipeline,  # Creates a pipeline that applies a model to some input data.
    logging,  # Controls the verbosity of Transformers log messages.
    AutoModelForQuestionAnswering,  # Generic model class with a question-answering head.
)

# used for Parameter-Efficient Fine-Tuning
from peft import LoraConfig, PeftModel

# SFTTrainer is TRL's trainer for supervised fine-tuning
from trl import SFTTrainer

# allows addition of progress bars to loops and iterable objects
from tqdm import tqdm

### Install lm-eval

In [None]:
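!git clone https://github.com/EleutherAI/lm-evaluation-harness
# `!cd` runs in a temporary subshell, so it does not change the notebook's
# working directory; the next cell uses os.chdir for that instead.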

Cloning into 'lm-evaluation-harness'...
remote: Enumerating objects: 32695, done.
remote: Counting objects: 100% (75/75), done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 32695 (delta 31), reused 53 (delta 17), pack-reused 32620
Receiving objects: 100% (32695/32695), 22.81 MiB | 14.86 MiB/s, done.
Resolving deltas: 100% (22840/22840), done.


In [None]:
import os

# change directory
os.chdir("lm-evaluation-harness")

In [None]:
import os
import glob

# get current working directory and list files
print(f"current directory is: {os.getcwd()}\n")
# print(os.listdir('.'))

# Get a list of all files and directories in the current directory
files = glob.glob('./*')

# Create a list of tuples, each containing the name of the file/directory and its last modification time
files_with_times = [(file, os.path.getmtime(file)) for file in files]

# Sort the list by the modification time (the second element of each tuple)
files_with_times.sort(key=lambda x: x[1])

# Print the sorted list
print("Files in current directory:")
for file, mtime in files_with_times:
    print(f'{file}: {mtime}')

current directory is: /content/lm-evaluation-harness

Files in current directory:
./CITATION.bib: 1712232555.0523531
./CODEOWNERS: 1712232555.0523531
./LICENSE.md: 1712232555.0523531
./README.md: 1712232555.0533533
./docs: 1712232555.0553534
./examples: 1712232555.0563533
./ignore.txt: 1712232555.0563533
./pyproject.toml: 1712232555.168361
./pile_statistics.json: 1712232555.168361
./lm_eval: 1712232555.168361
./requirements.txt: 1712232555.168361
./mypy.ini: 1712232555.168361
./setup.py: 1712232555.170361
./scripts: 1712232555.170361
./templates: 1712232555.170361
./tests: 1712232555.199363


In [None]:
!pip install -r requirements.txt

Obtaining file:///content/lm-evaluation-harness (from -r requirements.txt (line 1))
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting word2number (from lm_eval==0.4.2->-r requirements.txt (line 1))
  Downloading word2number-1.1.zip (9.7 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: lm_eval, word2number
  Building editable for lm_eval (pyproject.toml) ... done
  Created wheel for lm_eval: filename=lm_eval-0.4.2-0.editable-py3-none-any.whl size=16122 sha256=d7992e6c42c95373e853f0b15763ea60c43adbb92d67297b3e2611d2c491950d
  Stored in directory: /tmp/pip-ephem-wheel-cache-2cr0vmdy/wheels/dc/8d/a0/ce1a137b6a29fcf5007da91566ee423695e01d20703991091d
  Building wheel for word2number (setup.py) ... done
  Cr

In [None]:
from lm_eval import api

Downloading builder script:   0%|          | 0.00/5.67k [00:00<?, ?B/s]

### Install auto-gptq

In [None]:
!pip install -U -q auto-gptq

### Install autoawq

In [None]:
!pip install autoawq

Collecting autoawq
  Downloading autoawq-0.2.4-cp310-cp310-manylinux2014_x86_64.whl (80 kB)
Collecting transformers<=4.38.2,>=4.35.0 (from autoawq)
  Downloading transformers-4.38.2-py3-none-any.whl (8.5 MB)
Collecting autoawq-kernels (from autoawq)
  Downloading autoawq_kernels-0.0.6-cp310-cp310-manylinux2014_x86_64.whl (33.4 MB)
Installing collected packages: transformers, autoawq-kernels, autoawq
  Attempting uninstall: transformers
    Found existing installation: transformers 4.39.3
    Uninstalling transformers-4.3

### lm_eval Help

In [None]:
!lm_eval --help

2024-04-04 12:09:59.369821: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-04 12:09:59.369881: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-04 12:09:59.371431: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
usage: lm_eval [-h] [--model MODEL] [--tasks task1,task2] [--model_args MODEL_ARGS]
               [--num_fewshot N] [--batch_size auto|auto:N|N] [--max_batch_size N]
               [--device DEVICE] [--output_path DIR|DIR/file.json] [--limit N|0<N<1]
               [--use_cache DIR] [--cache_requests {true,refresh,delete}] [--check_integrity]
               [--w

## NF4 Evaluation

### Evaluate Llama2-7b-chat-HF-NF4

#### MMLU

5-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
eval_model = "twhoool02/Llama2-7b-chat-HF-NF4"

# create directory to store results
results_dir = f"/content/drive/MyDrive/Evaluation/{eval_model}"
os.makedirs(results_dir, exist_ok=True)
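
In `!` shell lines, a bare word such as `results_dir` is passed as literal text; IPython substitutes a Python variable only when it is written as `{results_dir}` or `$results_dir`. An alternative is to assemble the command in Python, where variables are used directly. The sketch below is illustrative only: `build_lm_eval_command` is a hypothetical helper, it uses only flags that appear in this notebook's commands, and the defaults copy the values used in those cells.

```python
import shlex


def build_lm_eval_command(model_args, tasks, num_fewshot, output_path,
                          device="cuda:0", batch_size="auto:4"):
    """Return the lm_eval argument list for one evaluation run."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--device", device,
        "--batch_size", batch_size,
        "--output_path", output_path,
        "--log_samples",
    ]


cmd = build_lm_eval_command(
    model_args="pretrained=twhoool02/Llama2-7b-chat-HF-NF4,trust_remote_code=True",
    tasks=["hellaswag"],
    num_fewshot=0,
    output_path="/content/drive/MyDrive/Evaluation/twhoool02/Llama2-7b-chat-HF-NF4",
)
# Print the command for inspection; shlex.join quotes arguments where needed.
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues, and the output path is whatever the Python variable holds at call time.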

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama2-7b-chat-HF-NF4,trust_remote_code=True,do_sample=True \
    --tasks mmlu_stem,mmlu_social_sciences,mmlu_humanities,mmlu_other \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama2-7b-chat-HF-NF4-MMLU

2024-03-27 19:04:03.632021: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-27 19:04:03.632068: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-27 19:04:03.633267: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240327_190410-8xjx900a
wandb: Run `wandb

#### HellaSwag

0-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama2-7b-chat-HF-NF4,trust_remote_code=True \
    --tasks hellaswag \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama2-7b-chat-HF-NF4-Hellaswag

2024-03-27 19:37:53.191273: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-27 19:37:53.191331: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-27 19:37:53.192783: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240327_193800-w3hb23f6
wandb: Run `wandb

#### BoolQ

0-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama2-7b-chat-HF-NF4,trust_remote_code=True,do_sample=True \
    --tasks boolq \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama2-7b-chat-HF-NF4-boolq \
    --use_cache results_dir \
    --cache_requests true \
    --show_config

2024-03-28 10:39:30.323005: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-28 10:39:30.323054: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-28 10:39:30.324821: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240328_103937-lpwenptx
wandb: Run `wandb

#### BBH

3-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama2-7b-chat-HF-NF4,trust_remote_code=True,do_sample=True \
    --tasks bbh_fewshot \
    --num_fewshot 3 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama2-7b-chat-HF-NF4-bbh \
    --use_cache results_dir \
    --cache_requests true

2024-03-27 19:47:27.656472: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-27 19:47:27.656523: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-27 19:47:27.657943: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240327_194734-slf8nnqc
wandb: Run `wandb

### Evaluate Falcon 7B Instruct NF4

In [None]:
eval_model = "twhoool02/Falcon-7B-instruct-NF4"

# create directory to store results
results_dir = f"/content/drive/MyDrive/Evaluation/{eval_model}"
os.makedirs(results_dir, exist_ok=True)

#### MMLU

5-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

The revision hash is pinned in the command so that a newer version of the model is not downloaded automatically.
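
The `--model_args` value is a comma-separated list of `key=value` pairs, and the pinned revision is one of those pairs. As a sketch, the string can be assembled from keyword arguments so that the commit hash is written once; `format_model_args` is a hypothetical helper, not part of the harness.

```python
def format_model_args(**kwargs):
    """Join keyword arguments into the comma-separated key=value string used by --model_args."""
    return ",".join(f"{key}={value}" for key, value in kwargs.items())


model_args = format_model_args(
    pretrained="twhoool02/Falcon-7B-instruct-NF4",
    # Commit hash taken from the evaluation command in this section.
    revision="60aea9b60738dcc3b7cb04b76de4bff38706a397",
)
print(model_args)
```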

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Falcon-7B-instruct-NF4,revision="60aea9b60738dcc3b7cb04b76de4bff38706a397" \
    --tasks mmlu_stem,mmlu_social_sciences,mmlu_humanities,mmlu_other \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Falcon-7B-instruct-NF4-MMLU \
    --use_cache results_dir \
    --cache_requests true \
    --show_config

2024-03-29 19:29:49.999981: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 19:29:50.000041: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 19:29:50.001660: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240329_192957-6txf8er2
wandb: Run `wandb

#### HellaSwag

0-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Falcon-7B-instruct-NF4,revision="60aea9b60738dcc3b7cb04b76de4bff38706a397" \
    --tasks hellaswag \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_hellaswag \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Falcon-7B-instruct-NF4-Hellaswag \
    --use_cache results_dir_hellaswag \
    --cache_requests true \
    --show_config

2024-03-29 20:24:42.233366: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 20:24:42.233419: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 20:24:42.235078: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240329_202448-b9q0sfwh
wandb: Run `wandb

#### BoolQ

0-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Falcon-7B-instruct-NF4,revision="60aea9b60738dcc3b7cb04b76de4bff38706a397" \
    --tasks boolq \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_boolq \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Falcon-7B-instruct-NF4-BoolQ \
    --use_cache results_dir_boolq \
    --cache_requests true \
    --show_config

2024-03-29 20:31:20.465992: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 20:31:20.466045: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 20:31:20.467493: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240329_203127-odc7rrpp
wandb: Run `wandb

#### BBH - no score returned

3-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Falcon-7B-instruct-NF4,revision="60aea9b60738dcc3b7cb04b76de4bff38706a397",do_sample=True \
    --tasks bbh_fewshot \
    --num_fewshot 3 \
    --device cuda:0 \
    --batch_size 2 \
    --verbosity INFO \
    --output_path results_dir_falcon_bbh \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Falcon-7B-instruct-NF4-BBH \
    --use_cache results_dir_falcon_bbh \
    --cache_requests true \
    --write_out \
    --show_config

2024-04-04 12:10:11.198989: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-04 12:10:11.199047: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-04 12:10:11.200987: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.6
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240404_121017-ymp3km50
wandb: Run `wandb

### Evaluate Mistral-7B-Instruct-NF4

In [None]:
eval_model = "twhoool02/Mistral-7B-Instruct-NF4"

# create directory to store results
results_dir = f"/content/drive/MyDrive/Evaluation/{eval_model}"
os.makedirs(results_dir, exist_ok=True)

#### MMLU

5-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

The revision hash is pinned in the command so that a newer version of the Mistral model is not downloaded automatically.

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Mistral-7B-Instruct-NF4,trust_remote_code=True,revision="d09342dee0fb90a04bcad6cb0e7bb41dc86179e1" \
    --tasks mmlu_stem,mmlu_social_sciences,mmlu_humanities,mmlu_other \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_mistral_mmlu \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Mistral-7B-Instruct-NF4-MMLU \
    --use_cache results_dir_mistral_mmlu \
    --cache_requests true \
    --show_config

2024-03-28 17:05:22.783597: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-28 17:05:22.783653: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-28 17:05:22.784853: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240328_170529-t0283alp
wandb: Run `wandb

#### HellaSwag

0-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Mistral-7B-Instruct-NF4,trust_remote_code=True,revision="d09342dee0fb90a04bcad6cb0e7bb41dc86179e1" \
    --tasks hellaswag \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_mistral_hellaswag \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Mistral-7B-Instruct-NF4-hellaswag \
    --use_cache results_dir_mistral_hellaswag \
    --cache_requests true \
    --show_config

2024-03-28 17:38:30.386636: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-28 17:38:30.386687: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-28 17:38:30.388185: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240328_173837-f7n6rdh2
wandb: Run `wandb

#### BoolQ

0-Shot is used when running this evaluation to match the values used in the document [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Mistral-7B-Instruct-NF4,trust_remote_code=True,revision="d09342dee0fb90a04bcad6cb0e7bb41dc86179e1" \
    --tasks boolq \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_mistral_boolq \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Mistral-7B-Instruct-NF4-boolq \
    --use_cache results_dir_mistral_boolq \
    --cache_requests true \
    --show_config

2024-03-28 17:46:06.643397: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-28 17:46:06.643444: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-28 17:46:06.644889: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240328_174613-w386bjo0
wandb: Run `wandb 

#### BBH

This evaluation is run 3-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Mistral-7B-Instruct-NF4,trust_remote_code=True,revision="d09342dee0fb90a04bcad6cb0e7bb41dc86179e1" \
    --tasks bbh_fewshot \
    --num_fewshot 3 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_mistral_bbh \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Mistral-7B-Instruct-NF4-bbh \
    --use_cache results_dir_mistral_bbh \
    --cache_requests true \
    --show_config

2024-03-28 17:48:39.871466: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-28 17:48:39.871519: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-28 17:48:39.873003: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240328_174846-yiokzrro
wandb: Run `wandb 

## AWQ Evaluation

### Evaluate twhoool02/Llama-2-7b-chat-hf-AWQ

In [None]:
eval_model = "twhoool02/Llama-2-7b-chat-hf-AWQ"

# create directory to store results
results_dir = f"/content/drive/MyDrive/Evaluation/{eval_model}-awq"
os.makedirs(results_dir, exist_ok=True)
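
Because the model id contains a `/`, the `os.makedirs` call above creates a nested `twhoool02/...` folder under `Evaluation` on the mounted Drive. The `lm_eval` cells in this section write to relative paths such as `results_dir_mmlu_awq` instead. A minimal sketch of deriving one folder per benchmark under `results_dir`, so that outputs would persist on Drive; the helper name `task_output_dir` and the per-task subfolder layout are assumptions for illustration, not something the notebook does.

```python
import os

def task_output_dir(results_dir: str, task: str) -> str:
    """Return a per-task subfolder path under results_dir (hypothetical layout)."""
    return os.path.join(results_dir, task)

eval_model = "twhoool02/Llama-2-7b-chat-hf-AWQ"
results_dir = f"/content/drive/MyDrive/Evaluation/{eval_model}-awq"

# Print the folder each benchmark would use; nothing is created here.
for task in ["mmlu", "hellaswag", "boolq", "bbh"]:
    print(task_output_dir(results_dir, task))
```

Such a path could then be passed to `--output_path` in place of the relative name.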

#### MMLU

This evaluation is run 5-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AWQ,trust_remote_code=True,revision="af57a3ea9007748b0d9385ffca5a8164889b4901" \
    --tasks mmlu_stem,mmlu_social_sciences,mmlu_humanities,mmlu_other \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_mmlu_awq \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-chat-hf-AWQ-mmlu1 \
    --use_cache results_dir_mmlu_awq \
    --cache_requests true \
    --show_config

2024-03-29 10:55:19.846231: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 10:55:19.846276: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 10:55:19.847981: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240329_105526-suymbfwg
wandb: Run `wandb 

#### HellaSwag

This evaluation is run 0-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AWQ,trust_remote_code=True,revision="af57a3ea9007748b0d9385ffca5a8164889b4901" \
    --tasks hellaswag \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_awq_hellaswag \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-chat-hf-AWQ-hellaswag1 \
    --use_cache results_dir_awq_hellaswag \
    --cache_requests true \
    --show_config

2024-03-29 11:27:07.792106: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 11:27:07.792155: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 11:27:07.793707: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240329_112714-dowlx51s
wandb: Run `wandb 

#### BoolQ

This evaluation is run 0-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AWQ,trust_remote_code=True,revision="af57a3ea9007748b0d9385ffca5a8164889b4901" \
    --tasks boolq \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_awq_boolq \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-chat-hf-AWQ-boolq1 \
    --use_cache results_dir_awq_boolq \
    --cache_requests true \
    --show_config

2024-03-29 11:35:02.693144: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 11:35:02.693206: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 11:35:02.694635: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240329_113509-vgv9jx7v
wandb: Run `wandb 

#### BBH

This evaluation is run 3-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AWQ,trust_remote_code=True,revision="af57a3ea9007748b0d9385ffca5a8164889b4901" \
    --tasks bbh_fewshot \
    --num_fewshot 3 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_awq_bbh \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-chat-hf-AWQ-bbh1 \
    --use_cache results_dir_awq_bbh \
    --cache_requests true \
    --show_config

2024-03-29 11:37:16.237668: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-29 11:37:16.237723: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-29 11:37:16.239194: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240329_113723-1e9wpvxg
wandb: Run `wandb 

## GPTQ Evaluation

### Evaluate twhoool02/Llama-2-7b-chat-hf-AutoGPTQ

In [None]:
eval_model = "twhoool02/Llama-2-7b-chat-hf-AutoGPTQ"

# create directory to store results
results_dir = f"/content/drive/MyDrive/Evaluation/{eval_model}"
os.makedirs(results_dir, exist_ok=True)
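
The benchmark cells that follow repeat the same `lm_eval` flags and differ only in the task list, the few-shot count, the output/cache location and the W&B run name. A minimal sketch of generating the command from those values; the function `build_lm_eval_args` and its parameter names are hypothetical, while the flags and values are the ones used in this notebook's cells.

```python
import shlex

# Few-shot settings used in this notebook, keyed by the --tasks value.
NUM_FEWSHOT = {
    "mmlu_stem,mmlu_social_sciences,mmlu_humanities,mmlu_other": 5,
    "hellaswag": 0,
    "boolq": 0,
    "bbh_fewshot": 3,
}

def build_lm_eval_args(model, revision, tasks, label, run_name):
    """Build the lm_eval argument list for one benchmark.

    `label` is used for both --output_path and --use_cache, and
    `run_name` is the W&B run name (both names are hypothetical).
    """
    return [
        "lm_eval", "--model", "hf",
        "--model_args",
        f"pretrained={model},trust_remote_code=True,revision={revision}",
        "--tasks", tasks,
        "--num_fewshot", str(NUM_FEWSHOT[tasks]),
        "--device", "cuda:0",
        "--batch_size", "auto:4",
        "--verbosity", "INFO",
        "--output_path", label,
        "--log_samples",
        "--wandb_args", f"project=quantized_model_evaluation,name={run_name}",
        "--use_cache", label,
        "--cache_requests", "true",
        "--show_config",
    ]

args = build_lm_eval_args(
    model="twhoool02/Llama-2-7b-chat-hf-AutoGPTQ",
    revision="b63b90c0fc11dfe983fb80172de0a8b8ae89224e",
    tasks="bbh_fewshot",
    label="results_dir_gptq_bbh",
    run_name="Llama-2-7b-hf-AutoGPTQ-bbh",
)
# Print a shell-quoted command line that could be pasted after "!".
print(shlex.join(args))
```

Deriving `--output_path` and `--use_cache` from a single value keeps the two in step when a cell is copied for another benchmark.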

#### MMLU

This evaluation is run 5-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AutoGPTQ,trust_remote_code=True,revision="b63b90c0fc11dfe983fb80172de0a8b8ae89224e" \
    --tasks mmlu_stem,mmlu_social_sciences,mmlu_humanities,mmlu_other \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_gptq_mmlu \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-hf-AutoGPTQ-mmlu \
    --use_cache results_dir_gptq_mmlu \
    --cache_requests true \
    --show_config

2024-04-03 23:00:59.674547: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-03 23:00:59.674599: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-03 23:00:59.676427: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.6
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240403_230106-riufy7f1
wandb: Run `wandb 

#### HellaSwag

This evaluation is run 0-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AutoGPTQ,trust_remote_code=True,revision="b63b90c0fc11dfe983fb80172de0a8b8ae89224e" \
    --tasks hellaswag \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_gptq_hellaswag \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-hf-AutoGPTQ-hellaswag \
    --use_cache results_dir_gptq_hellaswag \
    --cache_requests true \
    --show_config

2024-04-03 19:23:26.463951: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-03 19:23:26.464002: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-03 19:23:26.465467: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240403_192333-fk49ju70
wandb: Run `wandb 

#### BoolQ

This evaluation is run 0-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AutoGPTQ,trust_remote_code=True,revision="b63b90c0fc11dfe983fb80172de0a8b8ae89224e" \
    --tasks boolq \
    --num_fewshot 0 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_gptq_boolq \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-hf-AutoGPTQ-boolq \
    --use_cache results_dir_gptq_boolq \
    --cache_requests true \
    --show_config

2024-04-03 19:31:26.400476: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-03 19:31:26.400536: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-03 19:31:26.402049: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240403_193133-syuthoup
wandb: Run `wandb 

#### BBH

This evaluation is run 3-shot to match the setting used in [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288).

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AutoGPTQ,trust_remote_code=True,revision="b63b90c0fc11dfe983fb80172de0a8b8ae89224e" \
    --tasks bbh_fewshot \
    --num_fewshot 3 \
    --device cuda:0 \
    --batch_size auto:4 \
    --verbosity INFO \
    --output_path results_dir_gptq_bbh \
    --log_samples \
    --wandb_args project=quantized_model_evaluation,name=Llama-2-7b-hf-AutoGPTQ-bbh \
    --use_cache results_dir_gptq_bbh \
    --cache_requests true \
    --show_config

2024-04-03 19:33:54.408547: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-03 19:33:54.408605: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-03 19:33:54.410113: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
wandb: Currently logged in as: ted-whooley (atu-twhool02). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in /content/lm-evaluation-harness/wandb/run-20240403_193401-ratm6i5x
wandb: Run `wandb 

## Evaluate Falcon-7B-finetuned-guanaco-NF4-QLORA - working

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Falcon-7B-finetuned-guanaco-NF4-QLORA \
    --tasks hellaswag,mmlu_stem \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4

2024-03-03 13:55:01.907433: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-03 13:55:01.907481: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-03 13:55:01.908785: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-03:13:55:06,338 INFO     [__main__.py:217] Verbosity set to INFO
2024-03-03:13:55:06,338 INFO     [__init__.py:369] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-03:13:55:12,054 INFO     [__main__.py:293] Selected Tasks: ['hellaswag', 'mmlu_stem']
202

## Evaluate llama-2-7b-finetuned-guanaco-NF4-QLORA - working

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/llama-2-7b-finetuned-guanaco-NF4-QLORA \
    --tasks hellaswag,mmlu_stem \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4

2024-03-03 14:38:05.888026: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-03 14:38:05.888076: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-03 14:38:05.889325: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-03:14:38:10,267 INFO     [__main__.py:217] Verbosity set to INFO
2024-03-03:14:38:10,267 INFO     [__init__.py:369] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-03:14:38:15,934 INFO     [__main__.py:293] Selected Tasks: ['hellaswag', 'mmlu_stem']
202

## Evaluate Mistral-7B-v0.1-finetuned-guanaco-NF4-QLORA - working

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Mistral-7B-v0.1-finetuned-guanaco-NF4-QLORA \
    --tasks hellaswag,mmlu_stem \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4

2024-03-03 15:24:29.737385: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-03 15:24:29.737437: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-03 15:24:29.738700: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-03:15:24:34,149 INFO     [__main__.py:217] Verbosity set to INFO
2024-03-03:15:24:34,150 INFO     [__init__.py:369] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-03:15:24:39,759 INFO     [__main__.py:293] Selected Tasks: ['hellaswag', 'mmlu_stem']
202

## Evaluate Falcon-7B-AutoGPTQ - not working

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Falcon-7B-AutoGPTQ \
    --tasks hellaswag,mmlu_stem \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size 32

2024-03-03 13:18:23.158786: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-03 13:18:23.158828: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-03 13:18:23.160135: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-03:13:18:28,128 INFO     [__main__.py:217] Verbosity set to INFO
2024-03-03:13:18:28,128 INFO     [__init__.py:369] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-03:13:18:33,907 INFO     [__main__.py:293] Selected Tasks: ['hellaswag', 'mmlu_stem']
202

## Evaluate Mistral-7B-AutoGPTQ - not working

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Mistral-7B-AutoGPTQ \
    --tasks hellaswag,mmlu_stem \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4

2024-03-03 12:05:41.794566: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-03 12:05:41.794617: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-03 12:05:41.795793: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-03:12:05:46,879 INFO     [__main__.py:217] Verbosity set to INFO
2024-03-03:12:05:46,879 INFO     [__init__.py:369] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-03:12:05:52,497 INFO     [__main__.py:293] Selected Tasks: ['hellaswag', 'mmlu_stem']
202

## Evaluate Llama-2-7b-chat-hf-AWQ - working

In [None]:
!lm_eval --model hf \
    --model_args pretrained=twhoool02/Llama-2-7b-chat-hf-AWQ \
    --tasks hellaswag,mmlu_stem \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4

2024-03-16 13:15:06.722580: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-16 13:15:06.722633: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-16 13:15:06.724408: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-03-16:13:15:11,543 INFO     [__main__.py:225] Verbosity set to INFO
2024-03-16:13:15:11,543 INFO     [__init__.py:373] lm_eval.tasks.initialize_tasks() is deprecated and no longer necessary. It will be removed in v0.4.2 release. TaskManager will instead be used.
2024-03-16:13:15:17,383 INFO     [__main__.py:311] Selected Tasks: ['hellaswag', 'mmlu_stem']
202
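
The per-benchmark cells earlier in this notebook are near-copies of one another, so a cell can easily keep an `--output_path` or `--use_cache` value from the benchmark it was copied from. A minimal sketch of a check for that, assuming the notebook's convention that both flags share one value and that the value contains the task family name (`mmlu`, `hellaswag`, `boolq`, `bbh`); the helpers `parse_flags` and `check_paths` are hypothetical, and the sample cell is abbreviated to the flags being checked.

```python
import shlex

def parse_flags(command: str) -> dict:
    """Parse an lm_eval cell into {flag: value}; valueless flags map to ''."""
    # Drop the notebook "!" prefix and join backslash-continued lines,
    # since shlex does not treat backslash-newline as a continuation.
    text = command.lstrip().lstrip("!").replace("\\\n", "")
    tokens = shlex.split(text)
    flags = {}
    for i, tok in enumerate(tokens):
        if tok.startswith("--"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            flags[tok] = "" if nxt.startswith("--") else nxt
    return flags

def check_paths(command: str) -> list:
    """Return a list of problems; an empty list means the cell looks consistent."""
    flags = parse_flags(command)
    problems = []
    if flags.get("--output_path") != flags.get("--use_cache"):
        problems.append("--output_path and --use_cache differ")
    # Task family is the first task up to its first underscore, e.g. "bbh".
    family = flags.get("--tasks", "").split(",")[0].split("_")[0]
    if family not in flags.get("--output_path", ""):
        problems.append(f"--output_path does not mention task family '{family}'")
    return problems

cell = """!lm_eval --model hf \\
    --tasks bbh_fewshot \\
    --num_fewshot 3 \\
    --output_path results_dir_gptq_bbh \\
    --use_cache results_dir_gptq_bbh"""
print(check_paths(cell))
```

Running the check over each cell's text before launching a long evaluation is cheap compared with re-running a benchmark whose results or cache landed in the wrong location.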