
Unexplained behavior of the Validation of a Classification model #13653

Open
Himmelsw4nderer opened this issue Jun 15, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Himmelsw4nderer

Himmelsw4nderer commented Jun 15, 2024

Search before asking

  • I have searched the YOLOv8 issues and found no similar bug report.

YOLOv8 Component

Val, Predict

Bug

For my studies, I tried implementing and training a classification model on the following dataset: https://www.kaggle.com/datasets/grassknoted/asl-alphabet

I used YOLOv8n-cls and trained for different numbers of epochs; the confusion matrix always looked something like this:
[image: normalized confusion matrix]

This obviously doesn't look good at all.
I watched the training loss and val/loss values, and at some point val/loss always settled around 2.4259:

epoch, train/loss, metrics/accuracy_top1, metrics/accuracy_top5, val/loss, lr/pg0, lr/pg1, lr/pg2
29, 0.09865, 0.99994, 1, 2.4259, 0.007228, 0.007228, 0.007228

Given the dataset, I already found that quite unusual. Since the model always tended to predict the background class, I removed that class and trained again.

[image: normalized confusion matrix]

This gave good classification results. At that point, I wanted to calculate more metrics to figure out how to improve the training. My custom script uses the normal predict method, and for the same model and dataset as in the first confusion matrix it produced the following confusion matrix:

Confusion Matrix:
[[600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0 599   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 600]]

The training code, the validation code, and the custom validation code are included below.

Environment

Ultralytics YOLOv8.2.32 🚀 Python-3.12.3 torch-2.3.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3070 Ti, 8084MiB)
Setup complete ✅ (24 CPUs, 31.2 GB RAM, 77.1/895.6 GB disk)

OS Linux-6.8.12-200.fsync.fc39.x86_64-x86_64-with-glibc2.38
Environment Linux
Python 3.12.3
Install pip
RAM 31.17 GB
CPU 13th Gen Intel Core(TM) i7-13700KF
CUDA 12.1

matplotlib ✅ 3.9.0>=3.3.0
opencv-python ✅ 4.10.0.82>=4.6.0
pillow ✅ 10.3.0>=7.1.2
pyyaml ✅ 6.0.1>=5.3.1
requests ✅ 2.32.3>=2.23.0
scipy ✅ 1.13.1>=1.4.1
torch ✅ 2.3.1>=1.8.0
torchvision ✅ 0.18.1>=0.9.0
tqdm ✅ 4.66.4>=4.64.0
psutil ✅ 5.9.8
py-cpuinfo ✅ 9.0.0
pandas ✅ 2.2.2>=1.1.4
seaborn ✅ 0.13.2>=0.11.0
ultralytics-thop ✅ 0.2.8>=0.2.5

Minimal Reproducible Example

train.py

from ultralytics.models.yolo.classify import ClassificationTrainer

# List of model configurations and data folds
model_sizes = ['yolov8n-cls.pt']

datas = ['fold_4']

# Training parameters
epochs = 100
patience = 5

# Loop through each model size and data fold
for model in model_sizes:
    for data in datas:
        args = {
            'model': model,
            'data': data,
            'epochs': epochs,
            'patience': patience,
        }

        # Initialize the trainer with the specified arguments
        trainer = ClassificationTrainer(overrides=args)

        # Train the model and run the final evaluation
        trainer.train()
        trainer.final_eval()

validation.py

from ultralytics.models.yolo.classify import ClassificationValidator

models = [
    "runs/classify/train33/weights/best.pt",
]
datasets = [
    "fold_4",
]

for model, data in zip(models, datasets):

    args = dict(model=model, data=data)
    validator = ClassificationValidator(args=args)
    metrics = validator()
    print(metrics)

custom_val.py

import os
import cv2
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score, confusion_matrix
from ultralytics import YOLO

# Load your trained YOLO classification model
model = YOLO('runs/classify/train33/weights/best.pt')

# Function to load images from a directory


def load_images_from_folder(folder):
    images = []
    labels = []
    class_names = os.listdir(folder)
    for class_name in class_names:
        class_folder = os.path.join(folder, class_name)
        if os.path.isdir(class_folder):
            for filename in os.listdir(class_folder):
                img_path = os.path.join(class_folder, filename)
                img = cv2.imread(img_path)
                if img is not None:
                    # Convert to grayscale if necessary
                    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                    images.append(img)
                    labels.append(class_name)
    return images, labels, class_names

# Function to perform validation


def validate_model(folder):
    images, true_labels, class_names = load_images_from_folder(folder)
    class_names = sorted(class_names)
    y_true = []
    y_pred = []
    confidences = []

    for img, true_label in zip(images, true_labels):
        results = model(img)
        for result in results:
            probs = result.probs.cpu().numpy()
            print(probs)
            class_idx = np.argmax(probs.data)
            predicted_label = class_names[class_idx]
            confidence = probs.data[class_idx]

            y_true.append(true_label)
            y_pred.append(predicted_label)
            confidences.append(confidence)

    # Calculate metrics
    y_true_idx = [class_names.index(label) for label in y_true]
    y_pred_idx = [class_names.index(label) for label in y_pred]

    avg_confidence = np.mean(confidences)
    precision = precision_score(y_true_idx, y_pred_idx, average='weighted')
    recall = recall_score(y_true_idx, y_pred_idx, average='weighted')
    f1 = f1_score(y_true_idx, y_pred_idx, average='weighted')
    accuracy = accuracy_score(y_true_idx, y_pred_idx)
    conf_matrix = confusion_matrix(y_true_idx, y_pred_idx)

    print(f'Average Confidence: {avg_confidence:.2f}')
    print(f'Precision: {precision:.2f}')
    print(f'Recall: {recall:.2f}')
    print(f'F1 Score: {f1:.2f}')
    print(f'Accuracy: {accuracy:.2f}')
    print('Confusion Matrix:')
    print(conf_matrix)


# Specify the folder containing class subfolders with images
data_folder = 'datasets/fold_4/val'
validate_model(data_folder)

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
Himmelsw4nderer added the bug (Something isn't working) label Jun 15, 2024

👋 Hello @Himmelsw4nderer, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

@glenn-jocher
Member

@Himmelsw4nderer hi there,

Thank you for providing a detailed description of the issue and the relevant code snippets. Let's work through this step by step to understand and resolve the problem you're encountering with the validation of your classification model.

Initial Checks

  1. Reproducibility: To ensure we can reproduce the issue, could you please confirm that the provided code snippets are complete and can be run as-is? If there are any additional dependencies or configurations required, please include those details.

  2. Package Versions: Ensure you are using the latest versions of torch and ultralytics. You can update them using:

    pip install --upgrade torch ultralytics

Analysis

From your description, it seems like there are inconsistencies between the training and validation results, particularly with the confusion matrix. Here are a few points to consider:

  1. Training and Validation Data: Ensure that the training and validation datasets are correctly split and that there is no data leakage between them. This can significantly affect the model's performance.

  2. Background Class: You mentioned removing the background class improved the results. This suggests that the background class might be causing confusion during training. Ensure that the background class is correctly labeled and that the model is appropriately configured to handle it.

  3. Custom Validation Script: Your custom validation script seems to be working correctly, but the discrepancy between the training and validation confusion matrices suggests there might be an issue with how the data is being processed or evaluated.

Suggested Steps

  1. Verify Data Splits: Double-check your data splits to ensure there is no overlap between training and validation datasets (a quick check is sketched after this list).

  2. Model Configuration: Ensure that the model configuration is consistent between training and validation. This includes image preprocessing steps, class labels, and any augmentation techniques used.

  3. Use Built-in Validation: To isolate the issue, try using the built-in validation method provided by Ultralytics without any custom scripts. This can help determine if the issue lies within the custom validation logic.
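
As a quick sanity check for step 1, here is a minimal sketch that flags files present in both splits. It assumes the datasets/fold_4/train and datasets/fold_4/val folder layout used elsewhere in this issue:

import os

# Sketch: count files that appear in both the train and val splits.
# Paths assume the datasets/fold_4/{train,val}/<class>/ layout from this issue.
def collect_files(split_dir):
    files = set()
    for class_name in os.listdir(split_dir):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):
            for filename in os.listdir(class_dir):
                files.add(f'{class_name}/{filename}')
    return files

train_files = collect_files('datasets/fold_4/train')
val_files = collect_files('datasets/fold_4/val')

overlap = train_files & val_files
print(f'{len(overlap)} files appear in both train and val')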

Example Code for Built-in Validation

Here’s an example of how to use the built-in validation method:

from ultralytics import YOLO

# Load your trained model
model = YOLO('runs/classify/train33/weights/best.pt')

# Validate the model on the validation dataset
results = model.val(data='datasets/fold_4/val')
print(results)
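
If useful, the returned metrics object can also be inspected directly; a brief sketch, assuming the usual top-1/top-5 attributes on the classification metrics object:

# Sketch: inspect the returned classification metrics
# (top1/top5 attribute names assumed; adjust if your version differs).
print(f'top-1 accuracy: {results.top1:.4f}')
print(f'top-5 accuracy: {results.top5:.4f}')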

Additional Resources

For more detailed information on validation, you can refer to the Ultralytics documentation on model validation.

If the issue persists after these checks, please provide any additional logs or error messages that might help in diagnosing the problem further.

Feel free to reach out with any more questions or updates on your progress. We're here to help! 😊

@Himmelsw4nderer
Author

Hi,
there were some issues with the Markdown formatting, so the code was incomplete at the top.

I was already on the newest versions of ultralytics and torch.

Using:

from ultralytics import YOLO

# Load your trained model
model = YOLO('runs/classify/train33/weights/best.pt')

# Validate the model on the validation dataset
results = model.val(data='datasets/fold_4')
print(results)

results in the same confusion matrix as my custom script:
[image: confusion matrix]

So only the ClassificationValidator class seems to produce the confusing confusion matrix.
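
For reference, here is a minimal sketch of how the two code paths could be compared side by side, printing the class-index-to-name mapping each one resolves (the validator call and the names attributes are assumptions based on the current Ultralytics API; the paths and data arguments are the ones used earlier in this thread):

from ultralytics import YOLO
from ultralytics.models.yolo.classify import ClassificationValidator

weights = 'runs/classify/train33/weights/best.pt'

# Path 1: built-in validation through the YOLO wrapper
model = YOLO(weights)
wrapper_metrics = model.val(data='datasets/fold_4')

# Path 2: standalone validator built from the same arguments
validator = ClassificationValidator(args=dict(model=weights, data='fold_4'))
standalone_metrics = validator()

# A mismatch between these two mappings would explain a scrambled matrix.
print(model.names)      # mapping used by predict / model.val()
print(validator.names)  # mapping resolved by the standalone validator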

@glenn-jocher
Member

Hi @Himmelsw4nderer,

Thank you for the update and for providing the additional details. It's great to hear that you're using the latest versions of ultralytics and torch.

Given that both the built-in validation method and your custom script yield the same confusion matrix, it seems the issue might be related to how the ClassificationValidator is processing the data.

Here are a few steps to help diagnose and potentially resolve the issue:

  1. Data Verification: Ensure that the dataset specified in datasets/fold_4 is correctly formatted and that the class labels are consistent across both training and validation datasets.

  2. Class Labels: Double-check that the class labels in your dataset are correctly mapped and that there are no discrepancies. Mislabeling can often lead to confusing results in the confusion matrix (a check for this ordering is sketched at the end of this comment).

  3. Normalization and Preprocessing: Verify that the images are being preprocessed consistently during both training and validation. Differences in preprocessing steps can lead to unexpected results.

  4. Debugging with a Subset: Try running the validation on a smaller subset of your dataset to see if the issue persists. This can help isolate any potential data-related issues.

  5. Detailed Logging: Add detailed logging to your validation script to inspect the predictions and ground truth labels. This can help identify any discrepancies early on.

Here’s an example of how you might add logging to your custom validation script:

import os
import cv2
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score, confusion_matrix
from ultralytics import YOLO

# Load your trained YOLO classification model
model = YOLO('runs/classify/train33/weights/best.pt')

def load_images_from_folder(folder):
    images = []
    labels = []
    class_names = os.listdir(folder)
    for class_name in class_names:
        class_folder = os.path.join(folder, class_name)
        if os.path.isdir(class_folder):
            for filename in os.listdir(class_folder):
                img_path = os.path.join(class_folder, filename)
                img = cv2.imread(img_path)
                if img is not None:
                    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                    images.append(img)
                    labels.append(class_name)
    return images, labels, class_names

def validate_model(folder):
    images, true_labels, class_names = load_images_from_folder(folder)
    class_names = sorted(class_names)
    y_true = []
    y_pred = []
    confidences = []

    for img, true_label in zip(images, true_labels):
        results = model(img)
        for result in results:
            probs = result.probs.cpu().numpy()
            class_idx = np.argmax(probs.data)
            predicted_label = class_names[class_idx]
            confidence = probs.data[class_idx]

            y_true.append(true_label)
            y_pred.append(predicted_label)
            confidences.append(confidence)

            # Logging
            print(f"True Label: {true_label}, Predicted Label: {predicted_label}, Confidence: {confidence}")

    y_true_idx = [class_names.index(label) for label in y_true]
    y_pred_idx = [class_names.index(label) for label in y_pred]

    avg_confidence = np.mean(confidences)
    precision = precision_score(y_true_idx, y_pred_idx, average='weighted')
    recall = recall_score(y_true_idx, y_pred_idx, average='weighted')
    f1 = f1_score(y_true_idx, y_pred_idx, average='weighted')
    accuracy = accuracy_score(y_true_idx, y_pred_idx)
    conf_matrix = confusion_matrix(y_true_idx, y_pred_idx)

    print(f'Average Confidence: {avg_confidence:.2f}')
    print(f'Precision: {precision:.2f}')
    print(f'Recall: {recall:.2f}')
    print(f'F1 Score: {f1:.2f}')
    print(f'Accuracy: {accuracy:.2f}')
    print('Confusion Matrix:')
    print(conf_matrix)

# Specify the folder containing class subfolders with images
data_folder = 'datasets/fold_4/val'
validate_model(data_folder)

This should help you identify any inconsistencies in the predictions and ground truth labels.
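
For point 2 in particular, one quick check is whether the alphabetically sorted folder names used in the custom script line up with the index-to-name mapping stored in the trained model. A minimal sketch (the folder path is taken from this thread; model.names is the standard mapping attribute):

import os

from ultralytics import YOLO

model = YOLO('runs/classify/train33/weights/best.pt')

val_folder = 'datasets/fold_4/val'
folder_names = sorted(
    d for d in os.listdir(val_folder) if os.path.isdir(os.path.join(val_folder, d))
)
model_names = [model.names[i] for i in sorted(model.names)]

# If the two columns differ, predicted class indices are being mapped to the
# wrong label strings in the custom script, which would scramble the matrix.
for i, (folder_name, model_name) in enumerate(zip(folder_names, model_names)):
    marker = '' if folder_name == model_name else '  <-- mismatch'
    print(f'{i}: folder={folder_name!r}  model={model_name!r}{marker}')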

If the issue persists, please let us know, and we can further investigate. Thank you for your patience and cooperation!
