3DGe-Aug: 3D Generative Model-Based Data Augmentation for Few-Shot Object Detection

Abstract: In few-shot object detection scenarios involving emerging, scarce, or specialized objects, the scarcity of diverse training samples severely constrains model performance. Recently, data augmentation methods leveraging 2D text-to-image generative models have emerged, but they exhibit limited generation capability for fine-grained categories and under-represented objects. To address this issue, this paper proposes 3DGe-Aug, a 3D generative model-based data augmentation framework for few-shot object detection. The framework constructs realistic 3D assets from limited 2D images using a 3D generative model, thereby addressing the rendering limitations of 2D generative models for fine-grained objects. Building upon these 3D assets, multi-perspective object images with rich viewpoints and spatial-geometric consistency are rendered through precise camera pose control. Furthermore, we design an instance pasting algorithm that incorporates multi-scale transformation and occlusion elimination mechanisms to synthesize training data. We conduct comprehensive experiments on two representative object detection scenarios: unmanned aerial vehicles (UAVs) and autonomous driving, validating the effectiveness of the proposed method on various object detection models such as RT-DETR and the YOLO series. To facilitate future research, we also contribute a few-shot object detection test dataset comprising 3 base classes and 6 novel classes, along with a corresponding 3D asset set. Our code and datasets are available at DataAug.

Fig.1 Overview of the proposed 3DGe-Aug framework for few-shot object detection data augmentation.

Fig.2 Multi-Perspective Image and Mask Generation.

1. Preparation

1.1 Code Preparation

Clone the project repository to your local machine:

git clone https://github.com/xiejichao/DataAug.git
cd DataAug

1.2 Environment Installation

This project uses Conda for environment management. Please follow these steps to install the dependencies:

Ensure Anaconda or Miniconda is installed.
Create the environment using the provided environment.yml file:

conda env create -f environment.yml

Activate the environment:

conda activate MMLab_ultralytics

Environment Details:

Environment Name: MMLab_ultralytics
Python Version: 3.10.13
Core Dependencies: PyTorch 2.0.1+cu117, Ultralytics 8.4.14, OpenCV, etc.
Note: Ensure your CUDA version is compatible with PyTorch (this environment uses cu117).

1.3 Dataset Preparation

1.3.1 Dataset Download

This project utilizes the following datasets. Please download and place them in the specified directories as instructed:

Data	Purpose	Download Link	Remarks
few-shot-object-aerial	Few-shot object detection test set for UAV scenarios	Baidu Netdisk Extraction Code: NDSC	Contains 3 base classes and 6 novel classes
few-shot-object-ground	Few-shot object detection test set for autonomous driving scenarios	Baidu Netdisk Extraction Code: NDSC	Contains 3 base classes and 6 novel classes
VisDrone	UAV perspective object detection dataset	Official Link	-
BDD100K	Autonomous driving scenario object detection dataset	Official Link	-
3D Model Files (Optional)	Pre-generated 3D assets	Baidu Netdisk Extraction Code: NDSC	Includes `3Dmodels_500k` and `3Dmodels_bymultiview`

The few-shot-object-aerial dataset can also be viewed and downloaded on the Ultralytics Platform.

The few-shot-object-ground dataset can also be viewed and downloaded on the Ultralytics Platform.

Suggested Directory Structure:

3DGenerate/
├── dataset/
│   ├── VisDrone_AdjLab/
│   ├── bdd100k_AdjLab/
│   ├── few-shot-object-aerial-v2/
│   └── few-shot-object-ground-v2/
├── 3Dmodels_500k/
└── 3Dmodels_bymultiview/

1.3.2 Dataset Label Adjustment

To unify the experimental setup, we adjust the labels of both the VisDrone and BDD100K datasets to 3 base classes (car, truck, bus).

VisDrone Dataset Label Adjustment

Label Mapping Relationship:

Original Classes (10 classes): pedestrian, person, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor
New Classes (3 classes): car, truck, bus
Mapping Rule: Both car and van are mapped to car.
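
The mapping rule above can be sketched in a few lines of Python. This is a minimal illustration and not the code of dataset_label_filter_visdrone.py: the function name remap_yolo_labels and the MERGE table are assumptions, and the input is assumed to be standard YOLO label lines (class_id cx cy w h).

```python
# Hypothetical sketch of the class remapping; not the repository's script.
ORIGINAL = ["pedestrian", "person", "bicycle", "car", "van", "truck",
            "tricycle", "awning-tricycle", "bus", "motor"]
NEW = ["car", "truck", "bus"]
MERGE = {"van": "car"}  # mapping rule: van is folded into car


def remap_yolo_labels(lines, original=ORIGINAL, new=NEW, merge=MERGE):
    """Keep only boxes whose class maps into `new`, rewriting the class id."""
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip empty or malformed lines
        name = original[int(parts[0])]
        name = merge.get(name, name)
        if name in new:
            kept.append(" ".join([str(new.index(name))] + parts[1:]))
    return kept


sample = ["3 0.5 0.5 0.1 0.1",   # car, kept as class 0
          "4 0.2 0.2 0.1 0.1",   # van, merged into class 0
          "0 0.7 0.7 0.05 0.1",  # pedestrian, dropped
          "8 0.4 0.6 0.2 0.1"]   # bus, kept as class 2
print(remap_yolo_labels(sample))
```

Boxes of classes outside the new list are dropped rather than relabeled, which is one plausible reading of "adjusting" the labels to 3 base classes.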

Execute Label Adjustment:

# Validation set label conversion
python dataset_label_filter_visdrone.py \
  --original_lable_dir dataset/VisDrone_AdjLab/original_labels_yolo/val \
  --output_lable_dir dataset/VisDrone_AdjLab/labels/val \
  --original_class_list pedestrian person bicycle car van truck tricycle awning-tricycle bus motor \
  --new_class_list car truck bus

# Training set label conversion
python dataset_label_filter_visdrone.py \
  --original_lable_dir dataset/VisDrone_AdjLab/original_labels_yolo/train \
  --output_lable_dir dataset/VisDrone_AdjLab/labels/train \
  --original_class_list pedestrian person bicycle car van truck tricycle awning-tricycle bus motor \
  --new_class_list car truck bus

Visualize the Adjusted Annotations:

python visualize_yoloimg_v2.py \
  --datasetdir dataset/VisDrone_AdjLab/images/val \
  --labeldir dataset/VisDrone_AdjLab/labels/val \
  --class_list car truck bus \
  --output_dir dataset/VisDrone_AdjLab/val_vis

BDD100K Dataset Label Adjustment

Label Mapping Relationship:

Original Classes (13 classes): person, rider, car, bus, truck, bike, motor, tl_green, tl_red, tl_yellow, tl_none, traffic sign, train
New Classes (3 classes): car, truck, bus

Execute Label Adjustment:

# Validation set label conversion
python dataset_label_filter.py \
  --original_lable_dir dataset/bdd100k_AdjLab/original_labels_yolo/val \
  --output_lable_dir dataset/bdd100k_AdjLab/labels/val \
  --original_class_list person rider car bus truck bike motor tl_green tl_red tl_yellow tl_none "traffic sign" train \
  --new_class_list car truck bus

# Training set label conversion
python dataset_label_filter.py \
  --original_lable_dir dataset/bdd100k_AdjLab/original_labels_yolo/train \
  --output_lable_dir dataset/bdd100k_AdjLab/labels/train \
  --original_class_list person rider car bus truck bike motor tl_green tl_red tl_yellow tl_none "traffic sign" train \
  --new_class_list car truck bus

Visualize the Adjusted Annotations:

python visualize_yoloimg_v2.py \
  --datasetdir dataset/bdd100k_AdjLab/images/val \
  --labeldir dataset/bdd100k_AdjLab/labels/val \
  --class_list car truck bus \
  --output_dir dataset/bdd100k_AdjLab/val_vis

2. Constructing the Training Set with 3DGenerate

This method leverages the Tencent Hunyuan 3D generative model to construct 3D models from single or multiple images, and then generates rich training data through multi-view rendering.

2.1 Tencent Hunyuan 3D Generative Model Environment Variable Setup

First, configure your Tencent Cloud API keys:

conda activate MMLab_ultralytics

# Set Tencent Cloud API keys (please replace with your actual keys)
export TENCENTCLOUD_SECRET_ID=AKIDTJ3oVG4vhKnJlX9xxx
export TENCENTCLOUD_SECRET_KEY=41paM1ZyGQ7rVzUpxsxxx
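
Before submitting a generation task, it can help to confirm that both variables are visible to Python. The helper below is a small sketch; the function name missing_credentials is hypothetical and is not part of the repository.

```python
import os

REQUIRED_VARS = ("TENCENTCLOUD_SECRET_ID", "TENCENTCLOUD_SECRET_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Tencent Cloud credentials found.")
```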

2.2 3D Model Generation and Download

2.2.1 3D Model Generation based on a Single Image

Suitable for scenarios where only a single target image is available:

python imgto3dmodelsubmit_v2.py \
  --image_path obj_imgs/Sprinter.png \
  --model_version 3.1 \
  --generate_type Normal \
  --result_format GLB \
  --face_count 500000

Parameter Description:

--image_path: Input image path
--model_version: Model version
--generate_type: Generation type
--result_format: Output format (e.g., GLB)
--face_count: 3D model face count

Submitting the task will return a JobId. Use this ID to query the task status and download the model.

2.2.2 3D Model Generation based on Multiple Target Images

Suitable for scenarios with multiple images from different perspectives (yields better results):

python multiimgsto3dmodelsubmit_v3.py \
  --image_path obj_multi_imgs/Humvee \
  --model_version 3.1 \
  --generate_type Normal \
  --result_format GLB \
  --face_count 500000 \
  --enable_pbr

Input Directory Structure Requirements:

obj_multi_imgs/Humvee/
├── front.jpg       # Front view
├── left.jpg        # Left view
├── right.jpg       # Right view
├── top.jpg         # Top view
├── left_front.jpg  # Left-front view
└── right_front.jpg # Right-front view
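
A quick pre-flight check of the input folder can catch a missing view before the task is submitted. The sketch below takes the six view names from the listing above; treating all six as required, and accepting .jpeg and .png in addition to .jpg, are assumptions of this example, and missing_views is a hypothetical helper.

```python
from pathlib import Path

# View names taken from the directory listing above; treating all six as
# required is an assumption of this sketch.
EXPECTED_VIEWS = ("front", "left", "right", "top", "left_front", "right_front")


def missing_views(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return the expected view names that have no image file in `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        return list(EXPECTED_VIEWS)
    present = {p.stem for p in folder.iterdir()
               if p.is_file() and p.suffix.lower() in extensions}
    return [view for view in EXPECTED_VIEWS if view not in present]


print(missing_views("obj_multi_imgs/Humvee"))
```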

2.2.3 Query Task Status and Download 3D Model

Use the JobId returned upon task submission to check the status and download:

python imgto3dmodelquery_v1.py \
  --result_format GLB \
  --JobId 1435964178778423296 \
  --model_savpath 3Dmodels/Sprinter.glb

The task status will change from RUN to DONE, after which the model file will be downloaded automatically.
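
The RUN to DONE transition amounts to a polling loop. The sketch below shows the general pattern only: query_status is a hypothetical callable standing in for whatever imgto3dmodelquery_v1.py does internally, and the interval and timeout values are illustrative.

```python
import time


def wait_until_done(query_status, job_id, interval_s=10.0, timeout_s=1800.0,
                    sleep=time.sleep):
    """Poll `query_status(job_id)` until it returns "DONE".

    `query_status` is a hypothetical callable; only the RUN and DONE status
    names come from the text. Any status other than DONE keeps the loop
    waiting until the timeout is reached.
    Returns the number of seconds spent waiting.
    """
    waited = 0.0
    while True:
        status = query_status(job_id)
        if status == "DONE":
            return waited
        if waited >= timeout_s:
            raise TimeoutError(f"job {job_id} not done after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays.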

2.2.4 Download Pre-generated 3D Models (Optional)

If you prefer not to generate them yourself, you can directly download the 3D models already constructed for this paper:

Download Link: Baidu Netdisk
Extraction Code: NDSC
Contents:
3Dmodels_500k: 3D models generated based on single views
3Dmodels_bymultiview: 3D models generated based on multi-views

2.3 Multi-Perspective Image Generation based on 3D Models

Enter the working directory

conda activate MMLab_ultralytics
cd 3DGenerate/

2.3.1 For VisDrone Dataset (Top View)

Generate top view images (suitable for UAV scenarios):

Generate multi-perspective images based on single-view 3D models (5°×5° sampling)

python 3dmodeltoimgs_v4.py \
  --input 3Dmodels_500k \
  --output 3D_to_imgs/3Dmodels_top_par_5x5_imgs \
  --phi_step 5 \
  --theta_step 5 \
  --theta_min 30 \
  --theta_max 90 \
  --dist_list 1 \
  --imgsz 640

Generate multi-perspective images based on multi-view 3D models (5°×5° sampling)

python 3dmodeltoimgs_v4.py \
  --input 3Dmodels_bymultiview \
  --output 3D_to_imgs/3Dmodels_bymulti_top_par_5x5_imgs \
  --phi_step 5 \
  --theta_step 5 \
  --theta_min 30 \
  --theta_max 90 \
  --dist_list 1 \
  --imgsz 640

2.3.2 For BDD100K Dataset (Side View)

Generate side view images (suitable for autonomous driving scenarios):

Generate multi-perspective images based on single-view 3D models (10°×10° sampling)

python 3dmodeltoimgs_v4.py \
  --input 3Dmodels_500k \
  --output 3D_to_imgs/3Dmodels_side_par_10x10_imgs \
  --phi_step 10 \
  --theta_step 10 \
  --theta_min 0 \
  --theta_max 30 \
  --dist_list 1 \
  --imgsz 640

Generate multi-perspective images based on multi-view 3D models (5°×5° sampling)

python 3dmodeltoimgs_v4.py \
  --input 3Dmodels_bymultiview \
  --output 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
  --phi_step 5 \
  --theta_step 5 \
  --theta_min 0 \
  --theta_max 30 \
  --dist_list 1 \
  --imgsz 640
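
To get a feel for how many views a given sampling produces, the angular grid can be enumerated directly. The sketch below is not taken from 3dmodeltoimgs_v4.py. It assumes phi is the azimuth in [0, 360), theta is the elevation above the ground plane (consistent with 30-90 being used for top views and 0-30 for side views), both in degrees, and that theta_max is included in the sweep.

```python
import math


def sample_camera_poses(phi_step, theta_step, theta_min, theta_max, dist=1.0):
    """Enumerate (phi, theta, x, y, z) camera poses on a sphere.

    Assumptions of this sketch: phi is the azimuth, theta the elevation,
    theta_max is inclusive, and the object sits at the origin with z up.
    The rendering script may define these differently.
    """
    poses = []
    for theta in range(theta_min, theta_max + 1, theta_step):
        for phi in range(0, 360, phi_step):
            t, p = math.radians(theta), math.radians(phi)
            x = dist * math.cos(t) * math.cos(p)
            y = dist * math.cos(t) * math.sin(p)
            z = dist * math.sin(t)
            poses.append((phi, theta, x, y, z))
    return poses


top = sample_camera_poses(5, 5, 30, 90)   # top-view settings (5 x 5 degrees)
side = sample_camera_poses(5, 5, 0, 30)   # side-view settings (5 x 5 degrees)
print(len(top), len(side))
```

Under these assumptions, halving the step size roughly quadruples the number of rendered images per model, which is worth keeping in mind for disk usage and segmentation time.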

2.4 Multi-Perspective Image Mask Generation

This step uses a segmentation model to remove the background of the multi-perspective images and generate corresponding masks.

Enter the working directory

conda activate MMLab_ultralytics
cd ../XPaste/segment_methods

2.4.1 Top View Image Background Removal

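Suitable for the VisDrone UAV scenario. Set --input_dir to a render directory produced in 2.3.1; the example uses a directory named 3Dmodels_top_par_15x15_imgs, whereas the commands in 2.3.1 write 3Dmodels_top_par_5x5_imgs and 3Dmodels_bymulti_top_par_5x5_imgs, so adjust the paths to match your render: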

CUDA_VISIBLE_DEVICES=0 python resegfor3dimg_v3.py \
  --input_dir ../../3DGenerate/3D_to_imgs/3Dmodels_top_par_15x15_imgs \
  --output_dir ../../3DGenerate/3D_to_imgs/3Dmodels_top_par_15x15_imgs_seg \
  --seg_method selfreformer

Parameter Description:

--input_dir: Input multi-perspective image directory
--output_dir: Output segmentation mask directory
--seg_method: Segmentation method, using selfreformer

Expected Output:

Automatically detects 6 categories: Humvee, ICV, Mengshi, PLZ-05A, Sprinter, ZTQ-15
Segmentation results are saved to the 3Dmodels_top_par_15x15_imgs_seg directory

2.4.2 Side View Image Background Removal

Suitable for the BDD100K autonomous driving scenario:

CUDA_VISIBLE_DEVICES=0 python resegfor3dimg_v3.py \
  --input_dir ../../3DGenerate/3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
  --output_dir ../../3DGenerate/3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs_seg \
  --seg_method selfreformer

Expected Output:

Automatically detects 6 categories: Humvee, ICV, Mengshi, PLZ-05A, Sprinter, ZTQ-15
Segmentation results are saved to the 3Dmodels_bymulti_side_par_5x5_imgs_seg directory
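
Once a foreground mask is available, a tight bounding box for the object follows from the mask extent. The sketch below illustrates that idea on a plain nested list; it is an assumption that the pipeline derives boxes this way, and mask_to_yolo_box is a hypothetical helper rather than a function from resegfor3dimg_v3.py.

```python
def mask_to_yolo_box(mask):
    """Return (cx, cy, w, h), normalized to [0, 1], of the foreground in `mask`.

    `mask` is a 2D list (rows of 0/1 values); returns None if it is empty
    or contains no foreground pixel.
    """
    if not mask or not mask[0]:
        return None
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    height, width = len(mask), len(mask[0])
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    x1, x2 = cols[0], cols[-1] + 1   # right and bottom edges are exclusive
    y1, y2 = rows[0], rows[-1] + 1
    return ((x1 + x2) / 2 / width, (y1 + y2) / 2 / height,
            (x2 - x1) / width, (y2 - y1) / height)


demo = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(mask_to_yolo_box(demo))
```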

2.5 VisDrone Dataset Synthesis

This step randomly pastes the generated multi-perspective target images into the VisDrone dataset to build the augmented training set.

Enter the working directory

conda activate MMLab_ultralytics
cd ../../3DGenerate/

2.5.1 Visualize Original Dataset Annotations (Optional)

python visualize_yoloimg_v2.py \
  --datasetdir dataset/VisDronetest/images \
  --labeldir dataset/VisDronetest/labels \
  --class_list car truck bus \
  --output_dir dataset/VisDronetest/vis

2.5.2 Dataset Synthesis based on Random Placement Pasting

Pasting bymulti_top_par_5x5_imgs (Multi-view 3D model generation, recommended)

python augment_dataset_random_v4.py \
  --datasetdir dataset/VisDrone_AdjLab/images/train \
  --labeldir dataset/VisDrone_AdjLab/labels/train \
  --odimgsdir 3D_to_imgs/3Dmodels_bymulti_top_par_5x5_imgs \
  --odimgssegdir 3D_to_imgs/3Dmodels_bymulti_top_par_5x5_imgs_seg \
  --class_list car truck bus \
  --cover_ratio 0.7 \
  --obj_numb_per_img 20 \
  --obj_area_scale 0.005 0.05 \
  --output_dir dataset/VisDrone_AdjLab_3DGe_aug_random \
  --output_dir_prefix train_bymulti_5x5_20obj

Parameter Description:

--datasetdir: Original dataset image directory
--labeldir: Original dataset label directory
--odimgsdir: Target multi-perspective image directory
--odimgssegdir: Target multi-perspective image mask directory
--class_list: Base class list
--cover_ratio: Maximum occlusion ratio; existing targets are removed if this ratio is exceeded
--obj_numb_per_img: Number of new targets to paste per image
--obj_area_scale: Scaling range of the new target area relative to the original image
--output_dir: Output directory for the augmented dataset
--output_dir_prefix: Output directory prefix
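
The interaction between --obj_area_scale and --cover_ratio can be illustrated with a little geometry. The following is a simplified sketch and not the logic of augment_dataset_random_v4.py: it works on bounding boxes rather than masks, and the function names are hypothetical.

```python
import random


def paste_size(img_w, img_h, obj_w, obj_h, area_scale=(0.005, 0.05), rng=random):
    """Pick a target size whose area is a random fraction of the image area,
    keeping the object's aspect ratio."""
    ratio = rng.uniform(*area_scale)
    scale = (ratio * img_w * img_h / (obj_w * obj_h)) ** 0.5
    return max(1, round(obj_w * scale)), max(1, round(obj_h * scale))


def covered_fraction(box, pasted):
    """Fraction of `box` hidden by `pasted`; both are (x1, y1, x2, y2) in pixels."""
    ix = max(0, min(box[2], pasted[2]) - max(box[0], pasted[0]))
    iy = max(0, min(box[3], pasted[3]) - max(box[1], pasted[1]))
    area = (box[2] - box[0]) * (box[3] - box[1])
    return ix * iy / area if area > 0 else 0.0


def drop_occluded(boxes, pasted, cover_ratio=0.7):
    """Keep existing boxes that `pasted` covers by at most `cover_ratio`."""
    return [b for b in boxes if covered_fraction(b, pasted) <= cover_ratio]


existing = [(0, 0, 10, 10), (100, 100, 110, 110)]
print(drop_occluded(existing, pasted=(1, 0, 20, 10)))
```

In this example the first box is 90% covered by the pasted object and is removed, while the second is untouched and kept.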

Expected Output:

New dataset generated at: dataset/VisDrone_AdjLab_3DGe_aug_random
Total of 6471 images processed
129420 new targets successfully pasted randomly
25319 original targets removed due to occlusion ratio exceeding 70%

Pasting top_par_15x15_imgs (Single-view 3D model generation)

python augment_dataset_random_v4.py \
  --datasetdir dataset/VisDrone_AdjLab/images/train \
  --labeldir dataset/VisDrone_AdjLab/labels/train \
  --odimgsdir 3D_to_imgs/3Dmodels_top_par_15x15_imgs \
  --odimgssegdir 3D_to_imgs/3Dmodels_top_par_15x15_imgs_seg \
  --class_list car truck bus \
  --cover_ratio 0.7 \
  --obj_numb_per_img 3 \
  --obj_area_scale 0.005 0.05 \
  --output_dir dataset/VisDrone_AdjLab_3DGe_aug_random \
  --output_dir_prefix train_15x15_3obj

Visualize the pasted target bounding boxes:

python visualize_yoloimg_v3.py \
  --datasetdir dataset/VisDrone_AdjLab_3DGe_aug_random/images/val_15x15_3obj \
  --labeldir dataset/VisDrone_AdjLab_3DGe_aug_random/labels/val_15x15_3obj \
  --class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
  --output_dir dataset/VisDrone_AdjLab_3DGe_aug_random/val_15x15_3obj_vis

2.6 BDD100K Dataset Synthesis

This step randomly pastes the generated multi-perspective target images into the BDD100K dataset to build the augmented training set.

2.6.1 Visualize Original Dataset Annotations (Optional)

python visualize_yoloimg_v3.py \
  --datasetdir dataset/bdd100ktest/images \
  --labeldir dataset/bdd100ktest/labels/ \
  --class_list car truck bus \
  --output_dir dataset/bdd100ktest/vis

2.6.2 Dataset Synthesis based on Random Placement Pasting

Pasting bymulti_side_par_5x5_imgs (Multi-view 3D model generation, recommended)

python augment_dataset_random_v4.py \
  --datasetdir dataset/bdd100k_AdjLab/images/train \
  --labeldir dataset/bdd100k_AdjLab/labels/train \
  --odimgsdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
  --odimgssegdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs_seg \
  --class_list car truck bus \
  --cover_ratio 0.7 \
  --obj_numb_per_img 5 \
  --obj_area_scale 0.005 0.20 \
  --output_dir dataset/bdd100k_AdjLab_3DGe_aug_random \
  --output_dir_prefix train_bymulti_5x5_5obj

Expected Output:

Total of 69863 images processed
349315 new targets successfully pasted randomly
161673 original targets removed due to occlusion ratio exceeding 70%

Pasting side_par_5x5_imgs (Single-view 3D model generation)

python augment_dataset_random_v4.py \
  --datasetdir dataset/bdd100k_AdjLab/images/val \
  --labeldir dataset/bdd100k_AdjLab/labels/val \
  --odimgsdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
  --odimgssegdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs_seg \
  --class_list car truck bus \
  --cover_ratio 0.7 \
  --obj_numb_per_img 10 \
  --obj_area_scale 0.005 0.05 \
  --output_dir dataset/bdd100k_AdjLab_3DGe_aug_random \
  --output_dir_prefix val

Visualize:

python visualize_yoloimg_v3.py \
  --datasetdir dataset/bdd100k_AdjLab_3DGe_aug_random/images/val \
  --labeldir dataset/bdd100k_AdjLab_3DGe_aug_random/labels/val \
  --class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
  --output_dir dataset/bdd100k_AdjLab_3DGe_aug_random/val_vis

3. Constructing the Training Set with XPaste

This section uses open-source multi-view images (rather than images generated by 3D models) for data augmentation, serving as a comparative method.

3.1 Open-Source Multi-Perspective Image Resizing

Enter the working directory and resize

conda activate MMLab_ultralytics
cd 3DGenerate/

python imgs_resize_v2.py \
  --input_dir open_source_imgs/original_imgs \
  --output_dir open_source_imgs/original_imgs640 \
  --size 640

Expected Output: Total number of images found, along with statistics on the number of images per target category:

Category Name	Count	Category Name	Count
ICV	43	PLZ-05A	55
Humvee	41	Mengshi	42
Sprinter	42	ZTQ-15	56

3.2 Target Multi-Perspective Image Background Removal

Enter the working directory and segment

conda activate MMLab_ultralytics
cd ../XPaste/segment_methods

CUDA_VISIBLE_DEVICES=0 python resegfor3dimg_v3.py \
  --input_dir ../../3DGenerate/open_source_imgs/original_imgs640 \
  --output_dir ../../3DGenerate/open_source_imgs/original_imgs640_seg \
  --seg_method selfreformer

3.3 VisDrone Dataset Synthesis

Enter the working directory

conda activate MMLab_ultralytics
cd ../../3DGenerate/

3.3.1 Dataset Synthesis based on Random Placement Pasting

python augment_dataset_random_v4.py \
  --datasetdir dataset/VisDrone_AdjLab/images/train \
  --labeldir dataset/VisDrone_AdjLab/labels/train \
  --odimgsdir open_source_imgs/original_imgs640 \
  --odimgssegdir open_source_imgs/original_imgs640_seg \
  --class_list car truck bus \
  --cover_ratio 0.7 \
  --obj_numb_per_img 20 \
  --obj_area_scale 0.005 0.05 \
  --output_dir dataset/VisDrone_AdjLab_XPa_aug_random \
  --output_dir_prefix train_20obj

New augmented dataset category and label information:

Category List: car, truck, bus, Humvee, ICV, Mengshi, PLZ-05A, Sprinter, ZTQ-15
Numerical Label List: 0, 1, 2, 3, 4, 5, 6, 7, 8
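
The correspondence between the two lists can be written down explicitly. The snippet below simply pairs the category names with their numerical labels in the order given above; the variable names are illustrative.

```python
BASE_CLASSES = ["car", "truck", "bus"]
NOVEL_CLASSES = ["Humvee", "ICV", "Mengshi", "PLZ-05A", "Sprinter", "ZTQ-15"]

# Base classes keep ids 0-2; novel classes follow in the listed order.
CLASS_TO_ID = {name: i for i, name in enumerate(BASE_CLASSES + NOVEL_CLASSES)}

for name, idx in CLASS_TO_ID.items():
    print(f"{idx}: {name}")
```

Any --class_list passed to the visualization scripts for an augmented dataset needs to follow this order, otherwise boxes are drawn with the wrong names.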

Visualize:

python visualize_yoloimg_v3.py \
  --datasetdir dataset/VisDrone_AdjLab_XPa_aug_random/images/val_20obj \
  --labeldir dataset/VisDrone_AdjLab_XPa_aug_random/labels/val_20obj \
  --class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
  --output_dir dataset/VisDrone_AdjLab_XPa_aug_random/val_20obj_vis

3.4 BDD100K Dataset Synthesis

3.4.1 Dataset Synthesis based on Random Placement Pasting

python augment_dataset_random_v4.py \
  --datasetdir dataset/bdd100k_AdjLab/images/train \
  --labeldir dataset/bdd100k_AdjLab/labels/train \
  --odimgsdir open_source_imgs/original_imgs640 \
  --odimgssegdir open_source_imgs/original_imgs640_seg \
  --class_list car truck bus \
  --cover_ratio 0.7 \
  --obj_numb_per_img 10 \
  --obj_area_scale 0.005 0.10 \
  --output_dir dataset/bdd100k_AdjLab_XPa_aug_random \
  --output_dir_prefix train

Visualize:

python visualize_yoloimg_v3.py \
  --datasetdir dataset/bdd100k_AdjLab_XPa_aug_random/images/train \
  --labeldir dataset/bdd100k_AdjLab_XPa_aug_random/labels/train \
  --class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
  --output_dir dataset/bdd100k_AdjLab_XPa_aug_random/train_vis

4. Object Detection Model Training and Evaluation

This section describes how to use the constructed augmented dataset to train and evaluate object detection models.

4.1 Environment Preparation

conda activate MMLab_ultralytics
cd ultralytics/
# Ultralytics Version: 8.4.14

4.2 Configuration File Description

Project configuration files are located in the config/ directory.

Dataset Configuration Files:

File Name	Description
VisDrone_AdjLab.yaml	VisDrone Base Dataset Configuration
VisDrone_AdjLab_3DGe_aug_random.yaml	VisDrone 3DGe-Aug Augmented Dataset Configuration
VisDrone_AdjLab_XPa_aug_random.yaml	VisDrone XPaste Augmented Dataset Configuration
VisDrone_AdjLab_3DGe_XPa_aug_random.yaml	VisDrone 3DGe-XPa Augmented Dataset Configuration
bdd100k_AdjLab.yaml	BDD100K Base Dataset Configuration
bdd100k_AdjLab_3DGe_aug_random.yaml	BDD100K 3DGe-Aug Augmented Dataset Configuration
bdd100k_AdjLab_XPa_aug_random.yaml	BDD100K XPaste Augmented Dataset Configuration
bdd100k_AdjLab_3DGe_XPa_aug_random.yaml	BDD100K 3DGe-XPa Augmented Dataset Configuration

Model Configuration Files:

File Name	Description
default_DataAug.yaml	Default Training Configuration

4.3 Pre-trained Model Download

You can download pre-trained weights from the official Ultralytics Assets link and place them in the pt/ directory.

Relevant test models include:

YOLOv5
YOLOv8
YOLO11
YOLO26 (yolo26n, yolo26m, yolo26l, yolo26x)
RT-DETR

For model naming conventions, please refer to the configuration files in the ultralytics/ultralytics/cfg/models/ directory.

4.4 Training Script Parameter Description

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_3obj \
  --device 0

Parameter Description:

--model: Model name
--model_pt: Pre-trained weights path
--cfg: Training configuration file
--data: Dataset configuration file
--fraction: Fraction of the dataset to use
--epochs: Number of training epochs
--batch: Batch size
--imgsz: Input image size
--split: Dataset split used for validation
--conf: Confidence threshold
--trainsplit: Dataset split used for training
--device: GPU device ID

4.5 VisDrone Dataset Training

4.5.1 Base Model (No data augmentation)

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train \
  --device 0

4.5.2 3DGe-Aug-s Method (Single-view 3D model)

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_5x5_5obj \
  --device 1

4.5.3 3DGe-Aug Method (Multi-view 3D model, recommended)

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_5obj \
  --device 2

4.5.4 XPaste Method

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_XPa_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_5obj \
  --device 2

4.5.5 Combined 3DGe + XPaste Method

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_XPa_aug_random.yaml \
  --fraction 1.0 \
  --epochs 25 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_5obj \
  --device 1

4.6 BDD100K Dataset Training

4.6.1 Base Model (No data augmentation)

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/bdd100k_AdjLab.yaml \
  --fraction 1.0 \
  --epochs 10 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train \
  --device 2

4.6.2 3DGe-Aug-s Method (Single-view 3D model)

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/bdd100k_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 10 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_5x5_5obj \
  --device 1

4.6.3 3DGe-Aug Method (Multi-view 3D model, recommended)

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/bdd100k_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 10 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_5obj \
  --device 1

4.6.4 XPaste Method

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/bdd100k_AdjLab_XPa_aug_random.yaml \
  --fraction 1.0 \
  --epochs 10 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_5obj \
  --device 1

4.6.5 Combined 3DGe + XPaste Method

python train_val_v3.py \
  --model yolo26l \
  --model_pt pt/yolo26l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/bdd100k_AdjLab_3DGe_XPa_aug_random.yaml \
  --fraction 1.0 \
  --epochs 5 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_5obj \
  --device 2

4.7 Generalization Experiments on Different Object Detection Models

This section verifies the generalization capability of the 3DGe-Aug method across various object detection models.

Experimental Setup:

Dataset: VisDrone
Data Augmentation Method: 3DGe-Aug
Training Set: train_bymulti_5x5_3obj
Validation Set: val

YOLOv5l

python train_val_v3.py \
  --model yolov5l \
  --model_pt pt/yolov5lu.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_3obj \
  --device 1

RT-DETR-l

python train_val_v3.py \
  --model rtdetr-l \
  --model_pt pt/rtdetr-l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_3obj \
  --device 2

YOLOv8l

python train_val_v3.py \
  --model yolov8l \
  --model_pt pt/yolov8l.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_3obj \
  --device 0

YOLO26x

python train_val_v3.py \
  --model yolo26x \
  --model_pt pt/yolo26x.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_3obj \
  --device 0

YOLO26m

python train_val_v3.py \
  --model yolo26m \
  --model_pt pt/yolo26m.pt \
  --cfg config/default_DataAug.yaml \
  --data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
  --fraction 1.0 \
  --epochs 50 \
  --batch 16 \
  --imgsz 640 \
  --split val \
  --conf 0.001 \
  --trainsplit train_bymulti_5x5_3obj \
  --device 1
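
The five runs above differ only in the model name, the weights file, and the GPU id, so the command lines can be generated from a small table. This is a convenience sketch; build_command and RUNS are hypothetical names, and the flag values are copied from the commands in this section.

```python
import shlex

# (model name, weights file, GPU id) as used in the five commands above.
RUNS = [
    ("yolov5l", "pt/yolov5lu.pt", 1),
    ("rtdetr-l", "pt/rtdetr-l.pt", 2),
    ("yolov8l", "pt/yolov8l.pt", 0),
    ("yolo26x", "pt/yolo26x.pt", 0),
    ("yolo26m", "pt/yolo26m.pt", 1),
]


def build_command(model, weights, device,
                  data="config/VisDrone_AdjLab_3DGe_aug_random.yaml",
                  trainsplit="train_bymulti_5x5_3obj", epochs=50):
    """Return the argument list for one train_val_v3.py run."""
    return [
        "python", "train_val_v3.py",
        "--model", model, "--model_pt", weights,
        "--cfg", "config/default_DataAug.yaml", "--data", data,
        "--fraction", "1.0", "--epochs", str(epochs), "--batch", "16",
        "--imgsz", "640", "--split", "val", "--conf", "0.001",
        "--trainsplit", trainsplit, "--device", str(device),
    ]


for run in RUNS:
    print(shlex.join(build_command(*run)))
```

Each argument list can be passed to subprocess.run from the ultralytics/ directory to launch the corresponding run.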

Citation

If you use this code or dataset in your research, please consider citing our paper.

Contact Us

If you have any questions or suggestions, please contact us via email.


