Abstract: In few-shot object detection scenarios involving emerging, scarce, or specialized objects, the scarcity of diverse training samples severely constrains model performance. Recently, data augmentation methods leveraging 2D text-to-image generative models have emerged, but they exhibit limited generation capability for fine-grained categories and under-represented objects. To address this issue, this paper proposes 3DGe-Aug, a 3D generative model-based data augmentation framework for few-shot object detection. The framework constructs realistic 3D assets from limited 2D images using a 3D generative model, thereby addressing the rendering limitations of 2D generative models for fine-grained objects. Building upon these 3D assets, multi-perspective object images with rich viewpoints and spatial-geometric consistency are rendered through precise camera pose control. Furthermore, we design an instance pasting algorithm that incorporates multi-scale transformation and occlusion elimination mechanisms to synthesize training data. We conduct comprehensive experiments on two representative object detection scenarios: unmanned aerial vehicles (UAVs) and autonomous driving, validating the effectiveness of the proposed method on various object detection models such as RT-DETR and the YOLO series. To facilitate future research, we also contribute a few-shot object detection test dataset comprising 3 base classes and 6 novel classes, along with a corresponding 3D asset set. Our code and datasets are available at DataAug.
Fig.1 Overview of the proposed 3DGe-Aug framework for few-shot object detection data augmentation.
Fig.2 Multi-Perspective Image and Mask Generation.
- Preparation
- Constructing the Training Set with 3DGenerate
- Constructing the Training Set with XPaste
- Object Detection Model Training and Evaluation
Clone the project repository to your local machine:
git clone https://github.com/xiejichao/DataAug.git
cd DataAug
This project uses Conda for environment management. Please follow these steps to install the dependencies:
- Ensure Anaconda or Miniconda is installed.
- Create the environment using the provided `environment.yml` file:
conda env create -f environment.yml
- Activate the environment:
conda activate MMLab_ultralytics
Environment Details:
- Environment Name: `MMLab_ultralytics`
- Python Version: 3.10.13
- Core Dependencies: PyTorch 2.0.1+cu117, Ultralytics 8.4.14, OpenCV, etc.
- Note: Ensure your CUDA version is compatible with PyTorch (this environment uses cu117).
This project utilizes the following datasets. Please download and place them in the specified directories as instructed:
| Data | Purpose | Download Link | Remarks |
|---|---|---|---|
| few-shot-object-aerial | Few-shot object detection test set for UAV scenarios | Baidu Netdisk Extraction Code: NDSC | Contains 3 base classes and 6 novel classes |
| few-shot-object-ground | Few-shot object detection test set for autonomous driving scenarios | Baidu Netdisk Extraction Code: NDSC | Contains 3 base classes and 6 novel classes |
| VisDrone | UAV perspective object detection dataset | Official Link | - |
| BDD100K | Autonomous driving scenario object detection dataset | Official Link | - |
| 3D Model Files (Optional) | Pre-generated 3D assets | Baidu Netdisk Extraction Code: NDSC | Includes 3Dmodels_500k and 3Dmodels_bymultiview |
The few-shot-object-aerial dataset can also be viewed and downloaded on the Ultralytics Platform.
The few-shot-object-ground dataset can also be viewed and downloaded on the Ultralytics Platform.
Suggested Directory Structure:
3DGenerate/
├── dataset/
│ ├── VisDrone_AdjLab/
│ ├── bdd100k_AdjLab/
│ ├── few-shot-object-aerial-v2/
│ └── few-shot-object-ground-v2/
├── 3Dmodels_500k/
└── 3Dmodels_bymultiview/
To unify the experimental setup, we adjust the labels of both the VisDrone and BDD100K datasets to 3 base classes (car, truck, bus).
Label Mapping Relationship:
- Original Classes (10 classes): `pedestrian`, `person`, `bicycle`, `car`, `van`, `truck`, `tricycle`, `awning-tricycle`, `bus`, `motor`
- New Classes (3 classes): `car`, `truck`, `bus`
- Mapping Rule: Both `car` and `van` are mapped to `car`.
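The mapping rule above can be sketched as a small remapping over YOLO-format label lines. This is a minimal illustration, not the code of `dataset_label_filter_visdrone.py`: the helper names `build_index_map` and `remap_label_lines` and the `MERGE` table are hypothetical, and the sketch assumes each label line starts with an integer class index followed by the box coordinates.

```python
ORIGINAL = ["pedestrian", "person", "bicycle", "car", "van", "truck",
            "tricycle", "awning-tricycle", "bus", "motor"]
NEW = ["car", "truck", "bus"]
MERGE = {"van": "car"}  # assumption: the only merged class, per the mapping rule


def build_index_map(original, new, merge):
    """Map each original class index to its new index; classes not kept are omitted."""
    index_map = {}
    for old_idx, name in enumerate(original):
        target = merge.get(name, name)
        if target in new:
            index_map[old_idx] = new.index(target)
    return index_map


def remap_label_lines(lines, index_map):
    """Rewrite the class index of each YOLO label line and drop unmapped classes."""
    kept = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        old_idx = int(parts[0])
        if old_idx in index_map:
            kept.append(" ".join([str(index_map[old_idx])] + parts[1:]))
    return kept


index_map = build_index_map(ORIGINAL, NEW, MERGE)
```

With the class lists above, `index_map` keeps only `car`, `van`, `truck`, and `bus`; every other class is dropped from the label files.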
Execute Label Adjustment:
# Validation set label conversion
python dataset_label_filter_visdrone.py \
--original_lable_dir dataset/VisDrone_AdjLab/original_labels_yolo/val \
--output_lable_dir dataset/VisDrone_AdjLab/labels/val \
--original_class_list pedestrian person bicycle car van truck tricycle awning-tricycle bus motor \
--new_class_list car truck bus
# Training set label conversion
python dataset_label_filter_visdrone.py \
--original_lable_dir dataset/VisDrone_AdjLab/original_labels_yolo/train \
--output_lable_dir dataset/VisDrone_AdjLab/labels/train \
--original_class_list pedestrian person bicycle car van truck tricycle awning-tricycle bus motor \
--new_class_list car truck bus
Visualize the Adjusted Annotations:
python visualize_yoloimg_v2.py \
--datasetdir dataset/VisDrone_AdjLab/images/val \
--labeldir dataset/VisDrone_AdjLab/labels/val \
--class_list car truck bus \
--output_dir dataset/VisDrone_AdjLab/val_vis
Label Mapping Relationship:
- Original Classes (13 classes): `person`, `rider`, `car`, `bus`, `truck`, `bike`, `motor`, `tl_green`, `tl_red`, `tl_yellow`, `tl_none`, `traffic sign`, `train`
- New Classes (3 classes): `car`, `truck`, `bus`
Execute Label Adjustment:
# Validation set label conversion
python dataset_label_filter.py \
--original_lable_dir dataset/bdd100k_AdjLab/original_labels_yolo/val \
--output_lable_dir dataset/bdd100k_AdjLab/labels/val \
--original_class_list person rider car bus truck bike motor tl_green tl_red tl_yellow tl_none "traffic sign" train \
--new_class_list car truck bus
# Training set label conversion
python dataset_label_filter.py \
--original_lable_dir dataset/bdd100k_AdjLab/original_labels_yolo/train \
--output_lable_dir dataset/bdd100k_AdjLab/labels/train \
--original_class_list person rider car bus truck bike motor tl_green tl_red tl_yellow tl_none "traffic sign" train \
--new_class_list car truck bus
Visualize the Adjusted Annotations:
python visualize_yoloimg_v2.py \
--datasetdir dataset/bdd100k_AdjLab/images/val \
--labeldir dataset/bdd100k_AdjLab/labels/val \
--class_list car truck bus \
--output_dir dataset/bdd100k_AdjLab/val_vis
This method leverages the Tencent Hunyuan 3D generative model to construct 3D models from single or multiple images, and then generates rich training data through multi-view rendering.
First, configure your Tencent Cloud API keys:
conda activate MMLab_ultralytics
# Set Tencent Cloud API keys (please replace with your actual keys)
export TENCENTCLOUD_SECRET_ID=AKIDTJ3oVG4vhKnJlX9xxx
export TENCENTCLOUD_SECRET_KEY=41paM1ZyGQ7rVzUpxsxxx
Suitable for scenarios where only a single target image is available:
python imgto3dmodelsubmit_v2.py \
--image_path obj_imgs/Sprinter.png \
--model_version 3.1 \
--generate_type Normal \
--result_format GLB \
--face_count 500000
Parameter Description:
- `--image_path`: Input image path
- `--model_version`: Model version
- `--generate_type`: Generation type
- `--result_format`: Output format (e.g., GLB)
- `--face_count`: 3D model face count
Submitting the task will return a JobId. Use this ID to query the task status and download the model.
Suitable for scenarios with multiple images from different perspectives (yields better results):
python multiimgsto3dmodelsubmit_v3.py \
--image_path obj_multi_imgs/Humvee \
--model_version 3.1 \
--generate_type Normal \
--result_format GLB \
--face_count 500000 \
--enable_pbr
Input Directory Structure Requirements:
obj_multi_imgs/Humvee/
├── front.jpg # Front view
├── left.jpg # Left view
├── right.jpg # Right view
├── top.jpg # Top view
├── left_front.jpg # Left-front view
└── right_front.jpg # Right-front view
Use the JobId returned upon task submission to check the status and download:
python imgto3dmodelquery_v1.py \
--result_format GLB \
--JobId 1435964178778423296 \
--model_savpath 3Dmodels/Sprinter.glb
The task status will change from RUN to DONE, after which the model file will be downloaded automatically.
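The wait between submission and download can be pictured as a simple polling loop. The sketch below is only an illustration of that pattern, not the logic of `imgto3dmodelquery_v1.py`: `wait_for_job` and the `query_status` callable are hypothetical, and the only status strings assumed are the `RUN` and `DONE` values mentioned above.

```python
import time


def wait_for_job(query_status, job_id, interval_s=5.0, max_polls=120, sleep=time.sleep):
    """Poll a status callable until it reports DONE or the poll budget runs out.

    query_status is a hypothetical callable that takes a JobId and returns
    a status string; any value other than "DONE" is treated as still pending.
    """
    for _ in range(max_polls):
        if query_status(job_id) == "DONE":
            return True
        sleep(interval_s)
    return False
```

A caller would download the model file only when the function returns `True`, and could report a timeout otherwise.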
If you prefer not to generate them yourself, you can directly download the 3D models already constructed for this paper:
- Download Link: Baidu Netdisk
- Extraction Code: NDSC
- Contents:
  - `3Dmodels_500k`: 3D models generated from single views
  - `3Dmodels_bymultiview`: 3D models generated from multiple views
Enter the working directory
conda activate MMLab_ultralytics
cd 3DGenerate/
Generate top view images (suitable for UAV scenarios):
Generate multi-perspective images based on single-view 3D models (5°×5° sampling)
python 3dmodeltoimgs_v4.py \
--input 3Dmodels_500k \
--output 3D_to_imgs/3Dmodels_top_par_5x5_imgs \
--phi_step 5 \
--theta_step 5 \
--theta_min 30 \
--theta_max 90 \
--dist_list 1 \
--imgsz 640
Generate multi-perspective images based on multi-view 3D models (5°×5° sampling)
python 3dmodeltoimgs_v4.py \
--input 3Dmodels_bymultiview \
--output 3D_to_imgs/3Dmodels_bymulti_top_par_5x5_imgs \
--phi_step 5 \
--theta_step 5 \
--theta_min 30 \
--theta_max 90 \
--dist_list 1 \
--imgsz 640
Generate side view images (suitable for autonomous driving scenarios):
Generate multi-perspective images based on single-view 3D models (10°×10° sampling)
python 3dmodeltoimgs_v4.py \
--input 3Dmodels_500k \
--output 3D_to_imgs/3Dmodels_side_par_10x10_imgs \
--phi_step 10 \
--theta_step 10 \
--theta_min 0 \
--theta_max 30 \
--dist_list 1 \
--imgsz 640
Generate multi-perspective images based on multi-view 3D models (5°×5° sampling)
python 3dmodeltoimgs_v4.py \
--input 3Dmodels_bymultiview \
--output 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
--phi_step 5 \
--theta_step 5 \
--theta_min 0 \
--theta_max 30 \
--dist_list 1 \
--imgsz 640
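To make the sampling flags concrete, the sketch below enumerates camera positions on a sphere around the object. It is not taken from `3dmodeltoimgs_v4.py`: the function name `sample_camera_positions` is hypothetical, and it assumes that `phi` is the azimuth swept over a full 360°, that `theta` is the elevation above the horizontal plane, and that both ends of the `theta` range are included.

```python
import math


def sample_camera_positions(phi_step, theta_step, theta_min, theta_max, dist=1.0):
    """Return (phi, theta, (x, y, z)) tuples for cameras looking at the origin.

    Assumptions: phi is azimuth in degrees over [0, 360), theta is elevation
    in degrees over [theta_min, theta_max] inclusive, dist is the radius.
    """
    positions = []
    for theta in range(theta_min, theta_max + 1, theta_step):
        for phi in range(0, 360, phi_step):
            t, p = math.radians(theta), math.radians(phi)
            x = dist * math.cos(t) * math.cos(p)
            y = dist * math.cos(t) * math.sin(p)
            z = dist * math.sin(t)
            positions.append((phi, theta, (x, y, z)))
    return positions


top_views = sample_camera_positions(5, 5, 30, 90)
side_views = sample_camera_positions(10, 10, 0, 30)
```

Under these assumptions, the 5°×5° top-view setting yields 72 azimuths × 13 elevations = 936 poses per model, and the 10°×10° side-view setting yields 36 × 4 = 144. The actual script may count differently, for example if it excludes an endpoint.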
This step uses a segmentation model to remove the background of the multi-perspective images and generate corresponding masks.
Enter the working directory
conda activate MMLab_ultralytics
cd ../XPaste/segment_methods
Suitable for the VisDrone UAV scenario:
CUDA_VISIBLE_DEVICES=0 python resegfor3dimg_v3.py \
--input_dir ../../3DGenerate/3D_to_imgs/3Dmodels_top_par_15x15_imgs \
--output_dir ../../3DGenerate/3D_to_imgs/3Dmodels_top_par_15x15_imgs_seg \
--seg_method selfreformer
Parameter Description:
- `--input_dir`: Input multi-perspective image directory
- `--output_dir`: Output segmentation mask directory
- `--seg_method`: Segmentation method, using `selfreformer`
Expected Output:
- Automatically detects 6 categories: `Humvee`, `ICV`, `Mengshi`, `PLZ-05A`, `Sprinter`, `ZTQ-15`
- Segmentation results are saved to the `3Dmodels_top_par_15x15_imgs_seg` directory
Suitable for the BDD100K autonomous driving scenario:
CUDA_VISIBLE_DEVICES=0 python resegfor3dimg_v3.py \
--input_dir ../../3DGenerate/3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
--output_dir ../../3DGenerate/3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs_seg \
--seg_method selfreformer
Expected Output:
- Automatically detects 6 categories: `Humvee`, `ICV`, `Mengshi`, `PLZ-05A`, `Sprinter`, `ZTQ-15`
- Segmentation results are saved to the `3Dmodels_bymulti_side_par_5x5_imgs_seg` directory
This step randomly pastes the generated multi-perspective target images into the VisDrone dataset to build the augmented training set.
Enter the working directory
conda activate MMLab_ultralytics
cd ../../3DGenerate/
python visualize_yoloimg_v2.py \
--datasetdir dataset/VisDronetest/images \
--labeldir dataset/VisDronetest/labels \
--class_list car truck bus \
--output_dir dataset/VisDronetest/vis
python augment_dataset_random_v4.py \
--datasetdir dataset/VisDrone_AdjLab/images/train \
--labeldir dataset/VisDrone_AdjLab/labels/train \
--odimgsdir 3D_to_imgs/3Dmodels_bymulti_top_par_5x5_imgs \
--odimgssegdir 3D_to_imgs/3Dmodels_bymulti_top_par_5x5_imgs_seg \
--class_list car truck bus \
--cover_ratio 0.7 \
--obj_numb_per_img 20 \
--obj_area_scale 0.005 0.05 \
--output_dir dataset/VisDrone_AdjLab_3DGe_aug_random \
--output_dir_prefix train_bymulti_5x5_20obj
Parameter Description:
- `--datasetdir`: Original dataset image directory
- `--labeldir`: Original dataset label directory
- `--odimgsdir`: Target multi-perspective image directory
- `--odimgssegdir`: Target multi-perspective image mask directory
- `--class_list`: Base class list
- `--cover_ratio`: Maximum occlusion ratio; existing targets are removed if this ratio is exceeded
- `--obj_numb_per_img`: Number of new targets to paste per image
- `--obj_area_scale`: Scaling range of the new target area relative to the original image
- `--output_dir`: Output directory for the augmented dataset
- `--output_dir_prefix`: Output directory prefix
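The roles of `--cover_ratio` and `--obj_area_scale` can be illustrated with a short sketch. It is not the implementation in `augment_dataset_random_v4.py`: the helper names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` pixel tuples, occlusion is approximated by bounding-box overlap rather than mask overlap, and the pasted object is assumed to keep its aspect ratio.

```python
import math


def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def covered_fraction(existing, pasted):
    """Fraction of the existing box that lies under the pasted box."""
    inter = (max(existing[0], pasted[0]), max(existing[1], pasted[1]),
             min(existing[2], pasted[2]), min(existing[3], pasted[3]))
    area = box_area(existing)
    return box_area(inter) / area if area > 0 else 0.0


def filter_occluded(existing_boxes, pasted_box, cover_ratio=0.7):
    """Keep only existing boxes whose covered fraction does not exceed cover_ratio."""
    return [b for b in existing_boxes
            if covered_fraction(b, pasted_box) <= cover_ratio]


def scaled_size(obj_w, obj_h, img_w, img_h, area_frac):
    """Resize an object so its box area equals area_frac of the image area."""
    scale = math.sqrt(area_frac * img_w * img_h / (obj_w * obj_h))
    return obj_w * scale, obj_h * scale
```

In this sketch, a pasted object would first draw `area_frac` from the `--obj_area_scale` range (for example between 0.005 and 0.05), be resized with `scaled_size`, and then trigger `filter_occluded` on the labels already present in the image.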
Expected Output:
- New dataset generated at: `dataset/VisDrone_AdjLab_3DGe_aug_random`
- Total of 6471 images processed
- 129420 new targets successfully pasted randomly
- 25319 original targets removed due to occlusion ratio exceeding 70%
python augment_dataset_random_v4.py \
--datasetdir dataset/VisDrone_AdjLab/images/train \
--labeldir dataset/VisDrone_AdjLab/labels/train \
--odimgsdir 3D_to_imgs/3Dmodels_top_par_15x15_imgs \
--odimgssegdir 3D_to_imgs/3Dmodels_top_par_15x15_imgs_seg \
--class_list car truck bus \
--cover_ratio 0.7 \
--obj_numb_per_img 3 \
--obj_area_scale 0.005 0.05 \
--output_dir dataset/VisDrone_AdjLab_3DGe_aug_random \
--output_dir_prefix train_15x15_3obj
Visualize the pasted target bounding boxes:
python visualize_yoloimg_v3.py \
--datasetdir dataset/VisDrone_AdjLab_3DGe_aug_random/images/val_15x15_3obj \
--labeldir dataset/VisDrone_AdjLab_3DGe_aug_random/labels/val_15x15_3obj \
--class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
--output_dir dataset/VisDrone_AdjLab_3DGe_aug_random/val_15x15_3obj_vis
This step randomly pastes the generated multi-perspective target images into the BDD100K dataset to build the augmented training set.
python visualize_yoloimg_v3.py \
--datasetdir dataset/bdd100ktest/images \
--labeldir dataset/bdd100ktest/labels/ \
--class_list car truck bus \
--output_dir dataset/bdd100ktest/vis
python augment_dataset_random_v4.py \
--datasetdir dataset/bdd100k_AdjLab/images/train \
--labeldir dataset/bdd100k_AdjLab/labels/train \
--odimgsdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
--odimgssegdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs_seg \
--class_list car truck bus \
--cover_ratio 0.7 \
--obj_numb_per_img 5 \
--obj_area_scale 0.005 0.20 \
--output_dir dataset/bdd100k_AdjLab_3DGe_aug_random \
--output_dir_prefix train_bymulti_5x5_5obj
Expected Output:
- Total of 69863 images processed
- 349315 new targets successfully pasted randomly
- 161673 original targets removed due to occlusion ratio exceeding 70%
python augment_dataset_random_v4.py \
--datasetdir dataset/bdd100k_AdjLab/images/val \
--labeldir dataset/bdd100k_AdjLab/labels/val \
--odimgsdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs \
--odimgssegdir 3D_to_imgs/3Dmodels_bymulti_side_par_5x5_imgs_seg \
--class_list car truck bus \
--cover_ratio 0.7 \
--obj_numb_per_img 10 \
--obj_area_scale 0.005 0.05 \
--output_dir dataset/bdd100k_AdjLab_3DGe_aug_random \
--output_dir_prefix val
Visualize:
python visualize_yoloimg_v3.py \
--datasetdir dataset/bdd100k_AdjLab_3DGe_aug_random/images/val \
--labeldir dataset/bdd100k_AdjLab_3DGe_aug_random/labels/val \
--class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
--output_dir dataset/bdd100k_AdjLab_3DGe_aug_random/val_vis
This section uses open-source multi-view images (rather than images generated by 3D models) for data augmentation, serving as a comparative method.
Enter the working directory and resize
conda activate MMLab_ultralytics
cd 3DGenerate/
python imgs_resize_v2.py \
--input_dir open_source_imgs/original_imgs \
--output_dir open_source_imgs/original_imgs640 \
--size 640
Expected Output: Total number of images found, along with statistics on the number of images per target category:
| Category Name | Count | Category Name | Count |
|---|---|---|---|
| ICV | 43 | PLZ-05A | 55 |
| Humvee | 41 | Mengshi | 42 |
| Sprinter | 42 | ZTQ-15 | 56 |
Enter the working directory and segment
conda activate MMLab_ultralytics
cd ../XPaste/segment_methods
CUDA_VISIBLE_DEVICES=0 python resegfor3dimg_v3.py \
--input_dir ../../3DGenerate/open_source_imgs/original_imgs640 \
--output_dir ../../3DGenerate/open_source_imgs/original_imgs640_seg \
--seg_method selfreformer
Enter the working directory
conda activate MMLab_ultralytics
cd ../../3DGenerate/
python augment_dataset_random_v4.py \
--datasetdir dataset/VisDrone_AdjLab/images/train \
--labeldir dataset/VisDrone_AdjLab/labels/train \
--odimgsdir open_source_imgs/original_imgs640 \
--odimgssegdir open_source_imgs/original_imgs640_seg \
--class_list car truck bus \
--cover_ratio 0.7 \
--obj_numb_per_img 20 \
--obj_area_scale 0.005 0.05 \
--output_dir dataset/VisDrone_AdjLab_XPa_aug_random \
--output_dir_prefix train_20obj
New augmented dataset category and label information:
- Category List: `car`, `truck`, `bus`, `Humvee`, `ICV`, `Mengshi`, `PLZ-05A`, `Sprinter`, `ZTQ-15`
- Numerical Label List: `0`, `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`
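The pairing between category names and numerical labels follows from listing the 3 base classes first and the 6 novel classes after them. A minimal sketch of that lookup is shown below; the variable names are illustrative and not taken from the repository.

```python
BASE_CLASSES = ["car", "truck", "bus"]
NOVEL_CLASSES = ["Humvee", "ICV", "Mengshi", "PLZ-05A", "Sprinter", "ZTQ-15"]

# Numerical label -> category name, in the order given by the category list.
ID_TO_NAME = dict(enumerate(BASE_CLASSES + NOVEL_CLASSES))
NAME_TO_ID = {name: idx for idx, name in ID_TO_NAME.items()}

# Labels of the pasted (novel) targets only.
NOVEL_IDS = [NAME_TO_ID[name] for name in NOVEL_CLASSES]
```

Any `--class_list` passed to the visualization scripts needs to follow this same order, since the label files store only the numerical index.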
Visualize:
python visualize_yoloimg_v3.py \
--datasetdir dataset/VisDrone_AdjLab_XPa_aug_random/images/val_20obj \
--labeldir dataset/VisDrone_AdjLab_XPa_aug_random/labels/val_20obj \
--class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
--output_dir dataset/VisDrone_AdjLab_XPa_aug_random/val_20obj_vis
python augment_dataset_random_v4.py \
--datasetdir dataset/bdd100k_AdjLab/images/train \
--labeldir dataset/bdd100k_AdjLab/labels/train \
--odimgsdir open_source_imgs/original_imgs640 \
--odimgssegdir open_source_imgs/original_imgs640_seg \
--class_list car truck bus \
--cover_ratio 0.7 \
--obj_numb_per_img 10 \
--obj_area_scale 0.005 0.10 \
--output_dir dataset/bdd100k_AdjLab_XPa_aug_random \
--output_dir_prefix train
Visualize:
python visualize_yoloimg_v3.py \
--datasetdir dataset/bdd100k_AdjLab_XPa_aug_random/images/train \
--labeldir dataset/bdd100k_AdjLab_XPa_aug_random/labels/train \
--class_list car truck bus Humvee ICV Mengshi PLZ-05A Sprinter ZTQ-15 \
--output_dir dataset/bdd100k_AdjLab_XPa_aug_random/train_vis
This section describes how to use the constructed augmented dataset to train and evaluate object detection models.
conda activate MMLab_ultralytics
cd ultralytics/
# Ultralytics Version: 8.4.14
Project configuration files are located in the config/ directory:
Dataset Configuration Files:
| File Name | Description |
|---|---|
| VisDrone_AdjLab.yaml | VisDrone Base Dataset Configuration |
| VisDrone_AdjLab_3DGe_aug_random.yaml | VisDrone 3DGe-Aug Augmented Dataset Configuration |
| VisDrone_AdjLab_XPa_aug_random.yaml | VisDrone XPaste Augmented Dataset Configuration |
| VisDrone_AdjLab_3DGe_XPa_aug_random.yaml | VisDrone 3DGe-XPa Augmented Dataset Configuration |
| bdd100k_AdjLab.yaml | BDD100K Base Dataset Configuration |
| bdd100k_AdjLab_3DGe_aug_random.yaml | BDD100K 3DGe-Aug Augmented Dataset Configuration |
| bdd100k_AdjLab_XPa_aug_random.yaml | BDD100K XPaste Augmented Dataset Configuration |
| bdd100k_AdjLab_3DGe_XPa_aug_random.yaml | BDD100K 3DGe-XPa Augmented Dataset Configuration |
Model Configuration Files:
| File Name | Description |
|---|---|
| default_DataAug.yaml | Default Training Configuration |
You can download pre-trained weights from the official Ultralytics Assets link and place them in the pt/ directory.
Relevant test models include:
- YOLOv5
- YOLOv8
- YOLO11
- YOLO26 (yolo26n, yolo26m, yolo26l, yolo26x)
- RT-DETR
For model naming conventions, please refer to the configuration files in the ultralytics/ultralytics/cfg/models/ directory.
# --model: model name; --model_pt: pre-trained weights path; --cfg: training configuration file
# --data: dataset configuration file; --fraction: dataset fraction to use; --epochs: number of training epochs
# --batch: batch size; --imgsz: input image size; --split: validation set; --conf: confidence threshold
# --trainsplit: training set; --device: GPU device ID
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_3obj \
--device 0
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train \
--device 0
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_5x5_5obj \
--device 1
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_5obj \
--device 2
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_XPa_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_5obj \
--device 2
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_XPa_aug_random.yaml \
--fraction 1.0 \
--epochs 25 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_5obj \
--device 1
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/bdd100k_AdjLab.yaml \
--fraction 1.0 \
--epochs 10 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train \
--device 2
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/bdd100k_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 10 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_5x5_5obj \
--device 1
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/bdd100k_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 10 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_5obj \
--device 1
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/bdd100k_AdjLab_XPa_aug_random.yaml \
--fraction 1.0 \
--epochs 10 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_5obj \
--device 1
python train_val_v3.py \
--model yolo26l \
--model_pt pt/yolo26l.pt \
--cfg config/default_DataAug.yaml \
--data config/bdd100k_AdjLab_3DGe_XPa_aug_random.yaml \
--fraction 1.0 \
--epochs 5 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_5obj \
--device 2
This section verifies the generalization capability of the 3DGe-Aug method across various object detection models.
Experimental Setup:
- Dataset: VisDrone
- Data Augmentation Method: 3DGe-Aug
- Training Set: train_bymulti_5x5_3obj
- Validation Set: val
python train_val_v3.py \
--model yolov5l \
--model_pt pt/yolov5lu.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_3obj \
--device 1
python train_val_v3.py \
--model rtdetr-l \
--model_pt pt/rtdetr-l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_3obj \
--device 2
python train_val_v3.py \
--model yolov8l \
--model_pt pt/yolov8l.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_3obj \
--device 0
python train_val_v3.py \
--model yolo26x \
--model_pt pt/yolo26x.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_3obj \
--device 0
python train_val_v3.py \
--model yolo26m \
--model_pt pt/yolo26m.pt \
--cfg config/default_DataAug.yaml \
--data config/VisDrone_AdjLab_3DGe_aug_random.yaml \
--fraction 1.0 \
--epochs 50 \
--batch 16 \
--imgsz 640 \
--split val \
--conf 0.001 \
--trainsplit train_bymulti_5x5_3obj \
--device 1
If you use this code or dataset in your research, please consider citing our paper.
If you have any questions or suggestions, please contact us via email.