
Train subclass in Coco data set #13798

Open
1 task done
dat-nguyenvn opened this issue Jun 19, 2024 · 4 comments
Labels
question Further information is requested

Comments

@dat-nguyenvn

Search before asking

Question

How can I re-train 3 of the COCO classes on my own data, starting from the pre-trained weights? (I don't need to add a new class.)

Additional

My dataset has 3 classes: zebra, elephant, giraffe. The COCO dataset also contains these 3 classes.
1. How can I re-train these 3 COCO classes on my data, starting from the pre-trained weights? (I don't need to add a new class.)
2. You support filtering the model's predictions with the argument classes=[20,22,23].

If I set the argument classes=[20,22,23] (corresponding to elephant, zebra, and giraffe) when training the model:

    results = model.train(data="coco8-seg.yaml", classes=[20,22,23], epochs=100, imgsz=640)

Is my YAML file correct?

    nc=3
    names:
      20: elephant
      22: zebra
      23: giraffe

3. Or should I download the entire COCO dataset, extract those 3 classes, mix them with my data, and then train from scratch?

Thank you!
Dat

@dat-nguyenvn dat-nguyenvn added the question Further information is requested label Jun 19, 2024

👋 Hello @dat-nguyenvn, thank you for your interest in Ultralytics YOLOv8 🚀! We recommend a visit to the Docs for new users where you can find many Python and CLI usage examples and where many of the most common questions may already be answered.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Join the vibrant Ultralytics Discord 🎧 community for real-time conversations and collaborations. This platform offers a perfect space to inquire, showcase your work, and connect with fellow Ultralytics users.

Install

Pip install the ultralytics package including all requirements in a Python>=3.8 environment with PyTorch>=1.8.

pip install ultralytics

Environments

YOLOv8 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

Ultralytics CI

If this badge is green, all Ultralytics CI tests are currently passing. CI tests verify correct operation of all YOLOv8 Modes and Tasks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@dat-nguyenvn hi Dat,

Thank you for your question! Let's address your queries step-by-step:

  1. Re-training 3 subclasses in COCO dataset using pre-trained weights on your data:

    To re-train a model on specific COCO classes using pre-trained weights, you need to restrict the training data to those classes. The classes argument is not directly applicable in the train method; instead, create a custom dataset configuration file that includes only the classes you are interested in. Here's how you can do it:

    1. Create a custom dataset YAML file:

      # custom_coco.yaml
      path: ../datasets/coco  # path to your dataset
      train: images/train2017  # train images (relative to 'path')
      val: images/val2017  # val images (relative to 'path')
      test:  # test images (optional)
      
      nc: 3  # number of classes
      names: ['zebra', 'elephant', 'giraffe']  # class names
    2. Filter your dataset:
      You will need to filter the COCO dataset to include only the images containing your classes of interest, and renumber their labels so that the class indices match the names list above (0: zebra, 1: elephant, 2: giraffe). This can be done with a script that processes the COCO annotations and creates a new dataset with only the specified classes.

    3. Train the model:

      from ultralytics import YOLO
      
      # Load a pre-trained model
      model = YOLO('yolov8n.pt')
      
      # Train the model on your custom dataset
      results = model.train(data='custom_coco.yaml', epochs=100, imgsz=640)
  2. Using the classes argument in training:
    The classes argument is used for filtering predictions during inference, not for training. For training, you should use a custom dataset configuration file as shown above.

  3. Downloading and splitting the COCO dataset:
    If you prefer, you can download the entire COCO dataset, extract the images that contain the classes of interest, and then mix them with your custom data. Adding these COCO images alongside your own can help keep the training data balanced and varied.

    Here's a brief outline of the steps:

    • Download the COCO dataset.
    • Filter the dataset to include only images with zebra, elephant, and giraffe.
    • Combine this filtered dataset with your custom data.
    • Create a custom dataset YAML file as shown above.
    • Train the model using the combined dataset.
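The "Filter your dataset" step can be sketched as follows. This is a minimal sketch, not an Ultralytics utility: it assumes the COCO labels are already YOLO text files (one .txt per image) using the 80-class indices behind classes=[20,22,23] in the question (20 = elephant, 22 = zebra, 23 = giraffe). The function filter_yolo_labels and the COCO_TO_NEW mapping are hypothetical names; the new indices follow the names order in custom_coco.yaml (0 = zebra, 1 = elephant, 2 = giraffe).

```python
from pathlib import Path

# Hypothetical mapping: COCO 80-class index -> new index, following the
# names order in custom_coco.yaml (0: zebra, 1: elephant, 2: giraffe).
COCO_TO_NEW = {22: 0, 20: 1, 23: 2}


def filter_yolo_labels(src_dir, dst_dir, mapping=COCO_TO_NEW):
    """Copy YOLO label files, keeping only the mapped classes and renumbering them.

    Label files that contain none of the mapped classes are not written, so the
    returned list holds the stems of the images worth keeping.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    kept = []
    for label_file in sorted(src.glob("*.txt")):
        new_lines = []
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if not parts:
                continue  # ignore blank lines
            cls = int(parts[0])
            if cls in mapping:
                # Replace only the class index; box or polygon coordinates stay unchanged
                new_lines.append(" ".join([str(mapping[cls])] + parts[1:]))
        if new_lines:
            (dst / label_file.name).write_text("\n".join(new_lines) + "\n")
            kept.append(label_file.stem)
    return kept
```

The returned stems tell you which images to copy into the matching images folder. Ultralytics finds each label file by replacing the images directory in the image path with labels, so keep the two folder trees parallel.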

I hope this helps! If you encounter any issues or need further assistance, feel free to ask. Happy training! 🚀

@dat-nguyenvn
Author

Thanks for your feedback!
Do you have any quick tips or code to download and filter the COCO dataset as you mentioned?
Dat

@glenn-jocher
Member

@dat-nguyenvn hi Dat,

Thank you for your follow-up! Here's a quick guide to help you download and filter the COCO dataset for the classes zebra, elephant, and giraffe.

Step-by-Step Guide

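  1. Download the COCO Dataset:
    You can use pycocotools (the COCO API) to find the images that contain your classes and download them one by one. The script expects the annotation file instances_train2017.json to already be downloaded and extracted into the annotations folder: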

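    from pycocotools.coco import COCO
    import requests
    import os
    
    # Define paths (the annotation file must already exist locally)
    data_dir = '../datasets/coco'
    img_dir = os.path.join(data_dir, 'images/train2017')
    ann_file = os.path.join(data_dir, 'annotations/instances_train2017.json')
    
    # Initialize COCO API
    coco = COCO(ann_file)
    
    # Define the classes of interest
    class_names = ['zebra', 'elephant', 'giraffe']
    class_ids = coco.getCatIds(catNms=class_names)
    
    # Get all image IDs that contain at least one of the classes.
    # getImgIds(catIds=[a, b, c]) only returns images containing ALL of them,
    # so query each class separately and take the union.
    img_ids = sorted({img_id for cat_id in class_ids for img_id in coco.getImgIds(catIds=[cat_id])})
    imgs = coco.loadImgs(img_ids)
    
    # Create directory for filtered images
    os.makedirs(img_dir, exist_ok=True)
    
    # Download images, skipping files that already exist
    for img in imgs:
        out_path = os.path.join(img_dir, img['file_name'])
        if os.path.exists(out_path):
            continue
        response = requests.get(img['coco_url'], timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors instead of saving an error page
        with open(out_path, 'wb') as handler:
            handler.write(response.content)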
  2. Filter Annotations:
    After downloading the images, you need to filter the annotations to include only the specified classes:

    import json
    import os
    
    # Continues from the script above: reuses data_dir, ann_file, img_ids and class_ids
    
    # Load annotations
    with open(ann_file, 'r') as f:
        annotations = json.load(f)
    
    # Filter annotations (sets keep the membership checks fast on ~118k images)
    img_id_set, class_id_set = set(img_ids), set(class_ids)
    filtered_annotations = {
        'images': [img for img in annotations['images'] if img['id'] in img_id_set],
        'annotations': [ann for ann in annotations['annotations'] if ann['category_id'] in class_id_set],
        'categories': [cat for cat in annotations['categories'] if cat['id'] in class_id_set]
    }
    
    # Save filtered annotations
    filtered_ann_file = os.path.join(data_dir, 'annotations/instances_train2017_filtered.json')
    with open(filtered_ann_file, 'w') as f:
        json.dump(filtered_annotations, f)
  3. Update Dataset YAML:
    Create a custom dataset YAML file to point to your filtered dataset:

    # custom_coco.yaml
    path: ../datasets/coco
    train: images/train2017
    val: images/val2017  # Update this similarly if you filter the validation set
    test:  # Optional
    
    nc: 3
    names: ['zebra', 'elephant', 'giraffe']
  4. Train the Model:
    Finally, train your model using the custom dataset configuration:

    from ultralytics import YOLO
    
    # Load a pre-trained model
    model = YOLO('yolov8n.pt')
    
    # Train the model on your custom dataset
    results = model.train(data='custom_coco.yaml', epochs=100, imgsz=640)
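Ultralytics trains from YOLO-format .txt labels (one line per object: class x_center y_center width height, normalized to 0-1) rather than from COCO JSON, so the filtered annotations from step 2 still need converting before training. Below is a minimal sketch for bounding boxes only, assuming the filtered_annotations dict produced in step 2. The helper coco_json_to_yolo is hypothetical, not a library function; it numbers classes in the order of the names list in custom_coco.yaml, and it ignores segmentation polygons and iscrowd annotations.

```python
from pathlib import Path


def coco_json_to_yolo(coco_dict, labels_dir, class_names):
    """Write one YOLO .txt label file per image from a COCO-style annotation dict.

    Class indices follow the order of class_names (e.g. the names list in
    custom_coco.yaml). Boxes are converted from COCO [x_min, y_min, w, h] in
    pixels to normalized [x_center, y_center, w, h]. Returns the box count.
    """
    labels_dir = Path(labels_dir)
    labels_dir.mkdir(parents=True, exist_ok=True)

    # COCO category id -> new contiguous class index, matched by category name
    cat_to_cls = {
        cat["id"]: class_names.index(cat["name"])
        for cat in coco_dict["categories"]
        if cat["name"] in class_names
    }
    images = {img["id"]: img for img in coco_dict["images"]}
    lines_per_image = {img_id: [] for img_id in images}

    for ann in coco_dict["annotations"]:
        if ann.get("iscrowd", 0) or ann["category_id"] not in cat_to_cls:
            continue  # skip crowd regions and unrelated classes
        img = images.get(ann["image_id"])
        if img is None:
            continue  # annotation for an image outside the filtered set
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        img_w, img_h = img["width"], img["height"]
        xc, yc = (x + w / 2) / img_w, (y + h / 2) / img_h
        lines_per_image[ann["image_id"]].append(
            f"{cat_to_cls[ann['category_id']]} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"
        )

    # Label files share the image file name, with a .txt extension
    for img_id, lines in lines_per_image.items():
        if lines:
            stem = Path(images[img_id]["file_name"]).stem
            (labels_dir / f"{stem}.txt").write_text("\n".join(lines) + "\n")
    return sum(len(lines) for lines in lines_per_image.values())
```

With the layout from step 3, labels_dir would be ../datasets/coco/labels/train2017, i.e. the images/train2017 path with images replaced by labels, which is where Ultralytics looks for each image's label file. The same steps would apply to instances_val2017.json for the validation split.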

This should help you get started with downloading and filtering the COCO dataset for your specific classes. If you have any further questions, feel free to ask! 😊
