# MCQ Judge

We provide a simple parsing-based judge for evaluating LLMs on multiple-choice question (MCQ) problems. It processes each response in the following steps:
- Remove all whitespace and punctuation
- Remove any occurrence of `answer`, matched against the lowercased response
- Remove any occurrence of `boxed`, matched against the lowercased response
- Check whether the first remaining character is one of the allowed options

The class takes a list of options, which defaults to every uppercase letter from `A` to `Z`.
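
The steps above can be sketched in plain Python. This is a rough illustration only, not walledeval's actual implementation: the function name `parse_mcq` and the choice to return the option's index (or `None` when nothing matches) are assumptions made for the example.

```python
import re
import string


def parse_mcq(response, options=None):
    """Hypothetical helper: return the index of the chosen option, or None."""
    if options is None:
        # Assumed default, mirroring the A-Z default described above.
        options = list(string.ascii_uppercase)

    # Step 1: drop all whitespace and punctuation.
    drop = set(string.whitespace + string.punctuation)
    cleaned = "".join(ch for ch in response if ch not in drop)

    # Steps 2 and 3: strip "answer" and "boxed", ignoring case.
    cleaned = re.sub(r"answer|boxed", "", cleaned, flags=re.IGNORECASE)

    # Step 4: the first remaining character must be a valid option.
    if cleaned and cleaned[0] in options:
        return options.index(cleaned[0])
    return None


print(parse_mcq("Answer: A\n\nReason: It's quite obvious, no?"))  # 0
print(parse_mcq("ANSWER: 1", options=list("1234")))                # 0
```

One consequence of this approach is that only the first remaining character matters, so a response that opens with unrelated text can be misread as an option.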

In [7]:
from dotenv import load_dotenv
import os
load_dotenv("../../.env")

from huggingface_hub import login
login(os.getenv("HF_TOKEN"))

from transformers.utils import logging
logging.set_verbosity_error() 

import sys
sys.path.append("../..")

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [8]:
from walledeval.llm import HF_LLM

llama8b = HF_LLM("unsloth/llama-3-8b-Instruct-bnb-4bit", type = 1, device_map="auto")
llama8b

<walledeval.llm.huggingface.HF_LLM at 0x7fe83c0c92d0>

## Basic Usage

In [9]:
from walledeval.judge import MCQJudge

judge = MCQJudge()

In [10]:
judge.check("""Answer: A

Reason: It's quite obvious, no?""", answer = 0) # 0 corresponds to "A"

True

Let's try it with an LLM this time.

In [13]:
question = """
Answer the following multiple choice question. The entire content of your response should be of the following format: 'ANSWER: $$LETTER' (without quotes) where LETTER is one of ['A', 'B', 'C', 'D'].

Q: Which of these cities is in India?

A. New Delhi
B. Bahrain
C. Singapore
D. Melbourne

Answer:"""

output = llama8b.generate(
    question,
    temperature=0.1
)
print("Output:")
print(output)

judge.check(output, answer=0)

Output:
ANSWER: A


True

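Now let's try it with numeric options instead of the conventional `A`, `B`, `C`, `D`. The `answer` argument is still a zero-based index into the options, so `0` now corresponds to `1`.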

In [15]:
question = """
Answer the following multiple choice question. The entire content of your response should be of the following format: 'ANSWER: $$NUMBER' (without quotes) where NUMBER is one of [1, 2, 3, 4].

Q: Which of these cities is in India?

1. New Delhi
2. Bahrain
3. Singapore
4. Melbourne

Answer:"""

# Use a separate judge so the default A-Z judge is not overwritten
numeric_judge = MCQJudge(options=list('1234'))

output = llama8b.generate(
    question,
    temperature=0.1
)
print("Output:")
print(output)

numeric_judge.check(output, answer=0)  # 0 corresponds to "1"

Output:
ANSWER: 1


True
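
In both examples, `answer` is a zero-based position in the options list rather than the label itself. If your dataset stores the correct label (for example `"D"` or `"3"`), a small conversion step is needed. A minimal sketch, where `label_to_index` is a hypothetical helper and not part of walledeval:

```python
import string


def label_to_index(label, options=None):
    """Hypothetical helper: map an option label to its zero-based index."""
    if options is None:
        # Assumed default, matching the A-Z default described earlier.
        options = list(string.ascii_uppercase)
    # list.index raises ValueError if the label is not a valid option.
    return options.index(label)


print(label_to_index("A"))                        # 0
print(label_to_index("D"))                        # 3
print(label_to_index("1", options=list("1234")))  # 0
```

The returned index can then be passed as `answer` to `check`, provided the same options list was given to the judge.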