## Load Dataset from Hugging Face
- The original LLaVA repo asks us to download the MME data from the [GitHub repo](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation).
- We instead use [lmms-lab/MME](https://huggingface.co/datasets/lmms-lab/MME) from Hugging Face.
- This notebook explains the code behind the evaluation for the new script `mme.sh`.
- You can also run it to evaluate the Maya model on MME.

In [1]:
from datasets import load_from_disk, load_dataset

# Directory where the dataset is stored locally
save_dir = "../../../playground/data/eval/MME"

# First run: uncomment the two lines below to download the MME dataset from
# Hugging Face and save it to save_dir
# mme_dataset = load_dataset("lmms-lab/MME")
# mme_dataset.save_to_disk(save_dir)

# Later runs: load the saved copy from disk (comment this out on the first run)
mme_dataset = load_from_disk(save_dir)
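The comment/uncomment toggle above can be replaced by a small check on the save directory. The following is a minimal sketch, not part of the original notebook: `load_or_download` and its `download` and `load` parameters are hypothetical names, and it assumes a non-empty directory means the dataset was saved earlier.

```python
import os


def load_or_download(save_dir, download, load):
    """Return the dataset stored at save_dir, downloading and saving it first if absent.

    `download` and `load` are injected callables (hypothetical parameters), so the
    helper itself does not depend on any particular library.
    """
    if os.path.isdir(save_dir) and os.listdir(save_dir):
        # Assumption: a non-empty directory means the dataset was saved earlier.
        return load(save_dir)
    dataset = download()
    dataset.save_to_disk(save_dir)
    return dataset
```

With the imports from the cell above, it could be called as `load_or_download(save_dir, lambda: load_dataset("lmms-lab/MME"), load_from_disk)`.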

## Dataset Analysis
### Dataset Structure

In [None]:
print("Dataset Structure")
print(mme_dataset)
print()

print("Data Row")
print(mme_dataset['test'][0])

### Check which image extensions are present
- Note that we only have one split, 'test'

In [None]:
import pandas as pd

mme_df = pd.DataFrame(mme_dataset['test'])
# question_id has the form category/image.ext; split it into file name and extension
mme_df['image_name'] = mme_df['question_id'].apply(lambda x: x.split('/')[-1])
mme_df['extension'] = mme_df['image_name'].apply(lambda x: x.split('.')[-1])

# Get a list of unique image extensions
unique_extensions = mme_df['extension'].unique()

# Display the unique extensions
print(f"Unique extensions in test split: {unique_extensions}")
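The same check can be done without pandas when only the counts matter. This is a minimal sketch using the standard library; `count_extensions` is a hypothetical helper and the sample ids are illustrative, not read from the dataset.

```python
import os
from collections import Counter


def count_extensions(question_ids):
    """Count lowercase file extensions across ids shaped like 'category/image.png'."""
    return Counter(os.path.splitext(qid)[1].lower() for qid in question_ids)


# Illustrative ids only, not taken from the dataset.
sample_ids = ["code_reasoning/0012.png", "artwork/16006.jpg", "scene/example.JPG"]
print(count_extensions(sample_ids))
```

Lowercasing the extension keeps `.JPG` and `.jpg` in the same bucket, which matters later when the extension decides the save format.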


### Difference between LLaVA MME jsonl file and Hugging Face MME
- In the Hugging Face MME dataset, `question_id` has the form `category/image.png`.
- The LLaVA repo's `llava_mme.jsonl` instead expects images under `category/images/image.png` for the categories `artwork`, `celebrity`, `landmark`, `scene`, and `posters`.
- The image-saving code later in this notebook accounts for this.
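The path difference described above can be sketched as a small mapping function. `to_llava_image_path` is a hypothetical helper, and it uses the `images` directory name from the image-saving code further down in this notebook.

```python
import posixpath

# Categories that use an extra "images" directory in the LLaVA layout.
SPECIAL_CATEGORIES = {"artwork", "celebrity", "landmark", "scene", "posters"}


def to_llava_image_path(question_id):
    """Map 'category/image.ext' to the relative path used in the LLaVA layout."""
    category, file_name = question_id.split("/", 1)
    if category in SPECIAL_CATEGORIES:
        return posixpath.join(category, "images", file_name)
    return posixpath.join(category, file_name)


print(to_llava_image_path("artwork/16006.jpg"))
print(to_llava_image_path("code_reasoning/0012.png"))
```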

In [None]:
import pandas as pd
import json


with open("../../../playground/data/eval/MME/llava_mme.jsonl", 'r') as f:
    jsonl_data = [json.loads(line) for line in f] 

mme_jsonl_df = pd.DataFrame(jsonl_data)

# Filter rows where 'question_id' starts with 'scene'
mme_df[mme_df['question_id'].str.startswith('scene')].head(2)

In [None]:
# Same filter on the LLaVA jsonl data, for comparison with the cell above
mme_jsonl_df[mme_jsonl_df['question_id'].str.startswith('scene')].head(2)

## Save images to folder
- Instead of taking the MME images from GitHub, we use `lmms-lab/MME` on Hugging Face.
- We could evaluate the model directly from the HF dataset, but we want to stay as close as possible to the LLaVA structure and scripts.
- Therefore, we stick to the original files from [eval.zip](https://drive.google.com/file/d/1atZSBBrAX54yYpxtVVW33zFvcnaHeFPy/view?usp=sharing) and the original structure from the LLaVA repo:
    - `./playground/data/eval/MME/llava_mme.jsonl` as our questions file
    - `./playground/data/eval/MME/MME_Benchmark_release_version` as our images folder
    - `./playground/data/eval/MME/convert_answer_to_mme.py` as our answer-conversion script
- We extract the images to `./playground/data/eval/MME/MME_Benchmark_release_version`.
- Then we run the original `model_vqa_loader.py` and `calculation.py`.


In [4]:
# Uncomment and run to clear the saved images and answer files after an error
# ! rm -rf  ../../../playground/data/eval/MME/MME_Benchmark_release_version/

In [2]:
from PIL import Image
import os

image_save_base_path = "../../../playground/data/eval/MME/MME_Benchmark_release_version"

# ensure the base path exists
os.makedirs(image_save_base_path, exist_ok=True)

# categories that should have the extra subdirectory for images
special_categories = ["artwork", "celebrity", "landmark", "scene", "posters"]

# save an image to the specified directory
def save_images(example):
    image = example['image']
    image_subdir = example['question_id']  # category/image.png
    category = image_subdir.split('/')[0]  # extract the category (first part of question_id)
    file_name = os.path.basename(image_subdir)  # extract the file name (including extension)

    # get the image extension (e.g., ".png", ".jpg")
    _, extension = os.path.splitext(file_name)

    # check if the category is one of the special ones that needs an extra 'images' folder
    if category in special_categories:
        # save in the category/images/ structure (e.g., artwork/images/16006.jpg)
        full_save_dir = os.path.join(image_save_base_path, category, "images")
    else:
        # save in the usual category/image.png structure (e.g., code_reasoning/0012.png)
        full_save_dir = os.path.join(image_save_base_path, category)

    # create the subdirectory if it doesn't exist
    os.makedirs(full_save_dir, exist_ok=True)

    # create the full image save path
    image_save_path = os.path.join(full_save_dir, file_name)

    # save the image in the format implied by its extension, defaulting to PNG
    # for unexpected extensions (though this shouldn't happen!)
    if extension.lower() in (".jpg", ".jpeg"):
        # JPEG cannot store alpha or palette modes, so convert those to RGB first
        if image.mode not in ("RGB", "L", "CMYK"):
            image = image.convert("RGB")
        image.save(image_save_path, format="JPEG")
    else:
        image.save(image_save_path, format="PNG")

    return {'image_save_path': image_save_path}

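# apply this function to the 'test' split; map is used only for its side effect
# of writing files, so disable the cache to make sure it really runs on re-execution
_ = mme_dataset['test'].map(save_images, load_from_cache_file=False)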


Map:   0%|          | 0/2374 [00:00<?, ? examples/s]
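After the images are written, it is worth confirming that every image referenced by the questions file exists on disk. The following is a minimal sketch: `find_missing_images` is a hypothetical helper, and the `image` key is an assumption about the field in `llava_mme.jsonl` that holds the relative image path.

```python
import json
import os


def find_missing_images(questions_file, image_folder, image_key="image"):
    """Return image paths referenced in a JSONL questions file but absent on disk."""
    missing = []
    with open(questions_file, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Assumption: each record stores its relative image path under image_key.
            relative_path = json.loads(line)[image_key]
            if not os.path.isfile(os.path.join(image_folder, relative_path)):
                missing.append(relative_path)
    return missing
```

In this notebook it could be called as `find_missing_images("../../../playground/data/eval/MME/llava_mme.jsonl", image_save_base_path)`. An empty list means the folder layout matches what the questions file expects.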

## Save Y/N Answers to Folder
- This is needed by `convert_answer_to_mme.py`

In [3]:
# Uncomment to delete all txt files before re-running the cell below
# ! find ../../../playground/data/eval/MME/MME_Benchmark_release_version -type f -name "*.txt" -delete


In [None]:
import os

answer_save_base_path = "../../../playground/data/eval/MME/MME_Benchmark_release_version"

# categories that should have the extra subdirectory for Y/N answers
special_categories = ["artwork", "celebrity", "landmark", "scene", "posters"]

# save Y/N answers to the specified directory
def save_answer(example):
    question_id = example['question_id']  # category/image.png
    question = example['question']
    answer = example['answer']  
    
    category = question_id.split('/')[0]  # extract category (first part of question_id)
    image_name = os.path.basename(question_id)  # extract the file name (including extension)
    image_basename = os.path.splitext(image_name)[0]  # remove the extension for the text file

    # determine where to save the Y/N answer file
    if category in special_categories:
        # special categories will have a "questions_answers_YN" subdirectory
        full_save_dir = os.path.join(answer_save_base_path, category, "questions_answers_YN")
    else:
        # non-special categories will save directly in the category folder
        full_save_dir = os.path.join(answer_save_base_path, category)

    # create the subdirectory if it doesn't exist
    os.makedirs(full_save_dir, exist_ok=True)

    # define the full path for saving the answer file
    answer_save_path = os.path.join(full_save_dir, f"{image_basename}.txt")

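    # append the question and answer as one tab-separated line; append mode is
    # needed because several questions share one image, so delete any existing
    # .txt files before re-running this cell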
    with open(answer_save_path, 'a') as answer_file:
        answer_file.write(f"{question}\t{answer}\n")

    return {'answer_save_path': answer_save_path}


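# map is used only for its side effect of writing files, so disable the cache
# to make sure the function really runs on re-execution
_ = mme_dataset['test'].map(save_answer, load_from_cache_file=False)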

print("Y/N answers have been processed and saved successfully.")
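Because the answer files are opened in append mode, running the cell twice without deleting the old files duplicates every line. Below is a minimal sketch for reading a file back and detecting that. `read_yn_file` and `has_duplicates` are hypothetical helpers that assume the tab-separated `question<TAB>answer` layout written above.

```python
def read_yn_file(path):
    """Parse a Y/N answer file into a list of (question, answer) tuples."""
    pairs = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the last tab so a tab inside the question cannot break parsing.
            question, answer = line.rsplit("\t", 1)
            pairs.append((question, answer))
    return pairs


def has_duplicates(pairs):
    """Return True if any (question, answer) pair appears more than once."""
    return len(pairs) != len(set(pairs))
```

If `has_duplicates(read_yn_file(path))` is true for any file, delete the `.txt` files and run the cell once more.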


In [None]:

import subprocess
import os

project_root = "../../../"  # path to the root of the project

# The evaluation runs in three steps. Steps 1 and 2 are commented out;
# uncomment them to regenerate the answers before scoring.

# Step 1: generate the model answers (run from the project root)
# subprocess.run([
#     "python", "-m", "llava.eval.model_vqa_loader",
#     "--model-path", "nahidalam/maya_full_ft",
#     "--model-base", "CohereForAI/aya-23-8B",
#     "--question-file", "./playground/data/eval/MME/llava_mme.jsonl",
#     "--image-folder", "./playground/data/eval/MME/MME_Benchmark_release_version",
#     "--answers-file", "./playground/data/eval/MME/answers/llava-v1.5-13b.jsonl",
#     "--temperature", "0",
#     "--conv-mode", "aya",
# ], cwd=project_root)

# Step 2: convert the answers to the MME format (run from playground/data/eval/MME)
# subprocess.run([
#     "python", "convert_answer_to_mme.py", 
#     "--experiment", "llava-v1.5-13b"], 
#     cwd=os.path.join(project_root, "playground/data/eval/MME"))

# Step 3: compute the scores (run from playground/data/eval/MME/eval_tool)
subprocess.run([
    "python", "calculation.py", 
    "--results_dir", "answers/llava-v1.5-13b"], 
    cwd=os.path.join(project_root, "playground/data/eval/MME/eval_tool"))
