AerialTect


Aerial Object Detection in Earth Vision: An accurate and reliable model for monitoring harbour ports and airbases using satellite imagery.

Explore the docs »

Architecture · Features · Local Setup

📌 Table of Contents

  • Overview
  • Dataset
  • Models Implemented
  • Algorithms Used
  • Model Architectures
  • Performance
  • Ensemble Method
  • Installation
  • Project Structure
  • Authors
  • License

📖 Overview

The objective of this project is to accurately detect and classify objects such as ships, harbours, planes, jets, and other surveillance-relevant entities in aerial and satellite imagery. Aerial object detection presents unique challenges, including small object sizes, high object density, diverse orientations, and complex backgrounds. To tackle these issues, the project explores and evaluates multiple object detection architectures and employs an ensemble strategy to enhance detection accuracy and robustness across varying scenarios.


🗂 Dataset

For this project, we utilized DOTA 1.5 (Dataset for Object Detection in Aerial Images), a large-scale and richly annotated dataset tailored for aerial object detection tasks. DOTA provides high-resolution satellite and aerial imagery that closely mirrors real-world surveillance scenarios.

Key Features:

  • Total Images: 1,418 training images and 412 testing images
  • Source Diversity: Images captured from a variety of sensors and platforms
  • Annotation Format: Oriented bounding boxes to accurately capture object orientation
  • Object Categories: 15 real-world classes including ships, harbours, airplanes, helicopters, and other surveillance-critical objects
  • Challenges Addressed: Small object sizes, dense object distributions, cluttered backgrounds, and wide-ranging orientations

📚 More about DOTA: https://captain-whu.github.io/DOTA/
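
Each DOTA label line stores an oriented bounding box as four corner points, followed by the class name and a difficulty flag. A minimal parser sketch (field names are ours; see the DOTA documentation for the authoritative format, including the optional metadata header lines):

```python
def parse_dota_line(line):
    """Parse one DOTA-style annotation line:
    x1 y1 x2 y2 x3 y3 x4 y4 category difficult"""
    parts = line.split()
    coords = [float(v) for v in parts[:8]]
    corners = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return {
        "corners": corners,          # 4 (x, y) vertices of the oriented box
        "category": parts[8],        # e.g. "ship", "harbor", "plane"
        "difficult": int(parts[9]),  # 1 = hard instance, 0 = normal
    }

def to_axis_aligned(corners):
    """Enclosing axis-aligned box (xmin, ymin, xmax, ymax) of an oriented box."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))
```

The axis-aligned conversion is handy because detectors like SSD and YOLO predict horizontal boxes, while DOTA's native annotations are oriented.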


🧠 Models Implemented

To effectively detect objects such as ships, harbours, aircraft, and other surveillance targets in aerial imagery, we implemented three deep learning-based object detection models (SSD, YOLO, and Faster R-CNN), each trained and evaluated independently on the DOTA 1.5 dataset.

  • SSD (Single Shot Multibox Detector): SSD offers a balance between speed and accuracy through a single-stage detection framework that predicts bounding boxes and classes simultaneously. However, a major drawback of SSD is its relatively poor performance in detecting small objects, which are common in aerial imagery. Its fixed-scale anchor boxes and reliance on lower-resolution feature maps often result in missed detections or inaccurate localization. Due to these limitations, other models were explored to improve detection of fine-grained and densely packed targets.
  • YOLO (You Only Look Once): YOLO is known for its exceptional speed and real-time processing capabilities. It treats detection as a regression problem and predicts bounding boxes and class probabilities directly from the entire image. However, YOLO also struggles with detecting small or overlapping objects, especially in cluttered scenes, due to its coarse grid-based prediction mechanism. Despite this, its fast inference time made it a valuable benchmark and a suitable candidate for scenarios requiring quick surveillance scans.
  • Faster R-CNN (FRCNN): Faster R-CNN uses a two-stage detection process: a Region Proposal Network (RPN) to generate candidate object regions, followed by classification and bounding box refinement. This architecture achieves high accuracy and better localization, especially for small, overlapping, and rotated objects, making it highly suitable for aerial surveillance. The main drawback, however, is its higher computational cost and slower inference speed, which may not be ideal for real-time applications.

Comparative Insight: While YOLO significantly outperformed the other models in terms of speed and efficiency, it lagged behind in detecting small and overlapping objects. Faster R-CNN, although slower, consistently delivered higher accuracy and better localization, especially in complex aerial scenes. This trade-off between speed and precision is a critical consideration when choosing a model for real-world aerial object detection.


🧮 Algorithms Used

🔹 SSD (Single Shot Multibox Detector)

SSD is a fast one-stage detector that predicts object classes and bounding boxes directly from multiple feature maps. It offers a good balance between speed and accuracy.
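
The "multiple feature maps" idea rests on per-layer default (anchor) boxes. A sketch of how SSD assigns a scale to each feature map and tiles default boxes over a grid, following the scale formula from the SSD paper; the grid size and aspect ratios below are illustrative, not this repo's exact configuration:

```python
import math

def ssd_scale(k, m, s_min=0.2, s_max=0.9):
    """Scale of the k-th of m feature maps (1-indexed):
    s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)"""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)

def default_boxes(grid, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) for one grid x grid feature map,
    in relative [0, 1] image coordinates."""
    boxes = []
    for i in range(grid):
        for j in range(grid):
            cx, cy = (j + 0.5) / grid, (i + 0.5) / grid  # cell center
            for ar in aspect_ratios:
                # wider ratios stretch width, shrink height (area ~ scale^2)
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes
```

Small, densely packed objects in aerial scenes mostly fall on the early, fine-grained maps (small k, low scale), which is exactly where SSD's low-resolution features are weakest.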

🔹 YOLOv2 (You Only Look Once)

YOLOv2 performs real-time object detection by predicting bounding boxes and classes in a single pass, making it highly efficient for fast inference.
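
Per grid cell, YOLOv2 predicts raw offsets that are decoded against anchor ("prior") dimensions, as described in the YOLO9000 paper: the center is squashed with a sigmoid so it stays inside its cell, and width/height scale the anchor exponentially. A sketch with illustrative anchor sizes:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_yolo_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, grid=13):
    """Decode raw YOLOv2 predictions for one grid cell into
    (cx, cy, w, h) in relative image coordinates."""
    bx = (cell_x + sigmoid(tx)) / grid   # center stays within its cell
    by = (cell_y + sigmoid(ty)) / grid
    bw = anchor_w * math.exp(tw) / grid  # anchor scaled exponentially
    bh = anchor_h * math.exp(th) / grid
    return bx, by, bw, bh
```

The coarse 13x13 grid is also the root of YOLO's weakness on the dense, tiny objects common in DOTA: each cell can only represent a handful of boxes.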

🔹 Faster R-CNN

Faster R-CNN is a two-stage detector that uses a Region Proposal Network (RPN) to generate object candidates and then classifies them. It's known for its high accuracy.
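
Regardless of architecture, all three detectors end with non-maximum suppression (NMS) to discard duplicate predictions of the same object. A minimal greedy sketch over axis-aligned (xmin, ymin, xmax, ymax) boxes; the notebooks presumably rely on their frameworks' built-in NMS:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap
    it above iou_thr, repeat. Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep
```

NMS discards the suppressed boxes entirely; the Weighted Boxes Fusion used in the ensemble section instead averages them, which is why WBF can recover information NMS throws away.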


🧱 Model Architectures

Visual representations of the object detection model architectures used in this project:

🔹 SSD (Single Shot Multibox Detector)

[SSD architecture diagram]

🔹 YOLO (You Only Look Once)

[YOLO architecture diagram]

🔹 Faster R-CNN

[Faster R-CNN architecture diagram]


📊 Performance

Model            mAP   Recall (%)
SSD              26    47.61
YOLO             34    31.51
Faster R-CNN     51    57.32
Ensemble (WBF)   57    63.41

Results

🔹 SSD (Single Shot Multibox Detector)

[SSD detection results]

🔹 YOLO (You Only Look Once)

[YOLO detection results]

🔹 Faster R-CNN

[Faster R-CNN detection results]


🔀 Ensemble Method

To improve detection performance, we combined the outputs of multiple object detection models using Weighted Boxes Fusion (WBF).

✅ Why Ensemble?

Each model has its strengths:

  • SSD balances speed and accuracy but may miss small objects.
  • YOLO is the fastest but struggles with small and overlapping objects.
  • Faster R-CNN is highly accurate but slower.

By merging predictions from all three, we maximize both precision and recall.

🔧 Technique Used: Weighted Boxes Fusion (WBF)

WBF takes overlapping predictions from different models and produces a more accurate bounding box based on confidence scores and spatial alignment.
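
As a sketch of the idea, here is a simplified single-class variant: overlapping boxes are clustered by IoU, and each cluster's coordinates are a confidence-weighted average of its members. This omits the per-model weights and score rescaling of the published algorithm, for which libraries such as ensemble-boxes provide full implementations:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def _fuse(members):
    """Confidence-weighted average of member boxes [(box, score), ...]."""
    total = sum(s for _, s in members)
    return tuple(sum(b[k] * s for b, s in members) / total for k in range(4))

def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
    """Cluster boxes (e.g. pooled from SSD, YOLO and FRCNN) by IoU,
    then fuse each cluster instead of discarding duplicates as NMS does."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters = []  # per cluster: list of (box, score) members
    fused = []     # current fused box per cluster
    for i in order:
        box, score = tuple(boxes[i]), scores[i]
        for c, members in enumerate(clusters):
            if iou(fused[c], box) > iou_thr:
                members.append((box, score))
                fused[c] = _fuse(members)  # refine fused box with new member
                break
        else:
            clusters.append([(box, score)])
            fused.append(box)
    avg_scores = [sum(s for _, s in m) / len(m) for m in clusters]
    return fused, avg_scores
```

Unlike NMS, no model's prediction is thrown away: a box that two detectors roughly agree on is pulled toward their consensus position.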

📈 Ensemble Workflow

The following diagram illustrates how we ensemble the predictions from SSD, YOLO, and FRCNN to generate the final result:

[Ensemble workflow diagram]


โš™๏ธ Installation

git clone https://github.com/Hamzawp/AerialTect.git
cd AerialTect
pip install -r requirements.txt

# Core pinned dependencies (see requirements.txt for the full list):
torch==2.5.1+cu124
torchvision==0.20.1+cu124
torchaudio==2.5.1+cu124

Project Structure

The project consists of the following files:

  • SSD_Implementation.ipynb: Implementation of the SSD model.
  • Yolo_Implementation.ipynb: Implementation of the YOLO model.
  • FRCNN_Implementation.ipynb: Implementation of the Faster R-CNN model.
  • Ensemble_Implementation.ipynb: Combining predictions from multiple models using ensemble methods.
  • Testing.ipynb: For testing and evaluating the models.
  • Playground.ipynb: A sandbox for experimenting with the trained models.

Authors


License 📜

GNU License
