# Efficient cross scale fusion network SCW-YOLO for object detection in remote sensing imagery

## ABSTRACT
Traditional object detection algorithms often struggle with small objects in remote sensing images because the targets are small and the backgrounds are complex. To address this, we propose SCW-YOLO, a high-performance remote sensing object detection model based on YOLOv8. First, the model incorporates an Efficient Cross Scale Feature Pyramid Network (ECFPN), which adds a new feature layer to the shallow network and outputs backbone features directly to the detection head. This enables richer feature fusion without the extra computational cost of continued downsampling. Second, a coordinate attention mechanism refines the backbone by locally enhancing features and reducing interference from redundant information. Finally, to improve bounding box loss fitting and accelerate network convergence, a dynamic non-monotonic Wise-IoU (WIoU) loss replaces the baseline's loss function. Experimental results indicate that SCW-YOLO outperforms most state-of-the-art (SOTA) models in parameter efficiency and small-object detection accuracy, confirming its robustness in detecting small targets in remote sensing images.

## Setup
Download the [YOLOv8](https://github.com/ultralytics/ultralytics/tree/v8.1.6) code (Ultralytics v8.1.6). Then install `ultralytics` and its dependencies with pip, and run the built-in checks to verify the software and hardware environment.

In [None]:
%pip install ultralytics
import ultralytics

ultralytics.checks()

## ECFPN
The neck structure is shown below. Layer indices continue from the backbone (layers 0 to 7), so the first layer listed here is layer 8.

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  
  - [-1, 3, C2f, [256]] 
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 2], 1, Concat, [1]]  
  - [-1, 3, C2f, [128]]  
  - [-1, 1, Conv, [128, 3, 2]]
  - [[-1, 10, 4], 1, Concat, [1]]  
  - [-1, 3, C2f, [256]]  
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 7], 1, Concat, [1]]  
  - [-1, 3, C2f, [512]]  
  - [[13, 16, 19], 1, Detect, [nc]]  # Detect(P2, P3, P4)
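
The index arithmetic in the configuration is easy to get wrong, so the following minimal sketch traces the output stride of every layer and checks that each `Concat` joins feature maps of the same resolution. It is an illustration only, not part of the SCW-YOLO code: `LAYERS` is a hand-copied version of the backbone and neck listed in this README, and `trace_strides` is a hypothetical helper that assumes every `Conv` carries its stride as the third argument and every `nn.Upsample` its scale factor as the second.

```python
# Layers copied by hand from the backbone and neck configs: (from, module, args).
LAYERS = [
    (-1, "Conv", [64, 3, 2]),                   # 0
    (-1, "Conv", [128, 3, 2]),                  # 1
    (-1, "C2f", [128, True]),                   # 2
    (-1, "Conv", [256, 3, 2]),                  # 3
    (-1, "C2f_CooreA", [256, True]),            # 4
    (-1, "Conv", [512, 3, 2]),                  # 5
    (-1, "C2f_CooreA", [512, True]),            # 6
    (-1, "SPPF", [512, 5]),                     # 7
    (-1, "nn.Upsample", [None, 2, "nearest"]),  # 8
    ([-1, 4], "Concat", [1]),                   # 9
    (-1, "C2f", [256]),                         # 10
    (-1, "nn.Upsample", [None, 2, "nearest"]),  # 11
    ([-1, 2], "Concat", [1]),                   # 12
    (-1, "C2f", [128]),                         # 13
    (-1, "Conv", [128, 3, 2]),                  # 14
    ([-1, 10, 4], "Concat", [1]),               # 15
    (-1, "C2f", [256]),                         # 16
    (-1, "Conv", [256, 3, 2]),                  # 17
    ([-1, 7], "Concat", [1]),                   # 18
    (-1, "C2f", [512]),                         # 19
    ([13, 16, 19], "Detect", ["nc"]),           # 20
]


def trace_strides(layers):
    """Return the output stride of every layer relative to the input image."""
    strides = []
    for idx, (src, module, args) in enumerate(layers):
        sources = src if isinstance(src, list) else [src]
        inputs = []
        for s in sources:
            pos = s if s >= 0 else idx + s  # negative indices are relative
            inputs.append(1 if pos < 0 else strides[pos])  # pos < 0: the image
        if module == "Conv":
            out = inputs[0] * args[2]   # args = [channels, kernel, stride]
        elif module == "nn.Upsample":
            out = inputs[0] // args[1]  # args = [size, scale_factor, mode]
        elif module == "Concat":
            assert len(set(inputs)) == 1, f"layer {idx}: mismatched strides {inputs}"
            out = inputs[0]
        elif module == "Detect":
            out = inputs                # one stride per detection scale
        else:
            out = inputs[0]             # C2f, C2f_CooreA, SPPF keep the resolution
        strides.append(out)
    return strides


print(trace_strides(LAYERS)[-1])  # [4, 8, 16]
```

The detection head therefore sees strides 4, 8, and 16 (P2, P3, P4), which for a 640×640 input correspond to 160×160, 80×80, and 40×40 grids. The stock YOLOv8 head detects at P3, P4, and P5 (strides 8, 16, 32), so this layout trades the coarsest scale for a finer one.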

## Backbone Enhance
The backbone, refined with coordinate attention through the `C2f_CooreA` blocks, is shown below.

backbone:
  - [-1, 1, Conv, [64, 3, 2]]  
  - [-1, 1, Conv, [128, 3, 2]]  
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  
  - [-1, 1, C2f_CooreA, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  
  - [-1, 1, C2f_CooreA, [512, True]]
  - [-1, 1, SPPF, [512, 5]]

Then add the contents of the block.py file in the modules folder to the corresponding block.py file of the YOLOv8 code, so that the `C2f_CooreA` module referenced in the backbone is available.
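
For reference, coordinate attention as it is commonly formulated pools the input along each spatial direction, encodes the two pooled maps jointly, and reweights the input with two direction-aware gates. The equations below summarize that standard formulation. Whether `C2f_CooreA` follows it exactly is an assumption, and the module code itself remains authoritative. Here $x_c$ is channel $c$ of an $H \times W$ feature map, $F_1$, $F_h$, and $F_w$ are $1 \times 1$ convolutions, $\delta$ is a nonlinear activation, $\sigma$ is the sigmoid, and $f^h$, $f^w$ are the two parts of $f$ after splitting it back along the spatial dimension.

$$
z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i), \qquad
z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)
$$

$$
f = \delta\left(F_1\left([z^h, z^w]\right)\right), \qquad
g^h = \sigma\left(F_h(f^h)\right), \qquad
g^w = \sigma\left(F_w(f^w)\right)
$$

$$
y_c(i, j) = x_c(i, j) \, g_c^h(i) \, g_c^w(j)
$$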

## WIOU
First, replace the original `bbox_iou` function (in `ultralytics/utils/metrics.py`) with the `bbox_iou` function provided by metrics.py in the module file.

In [None]:
# Second, modify the BboxLoss class in the loss.py file

# Original:
# iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True)
# loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum

# Replace with:
iou = bbox_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, WIoU=True, scale=True)
if isinstance(iou, tuple):
    if len(iou) == 2:
        # Two-element tuple: the second element weights (1 - IoU) and is detached from the graph
        loss_iou = ((1.0 - iou[0]) * iou[1].detach() * weight).sum() / target_scores_sum
    else:
        # Longer tuple: the first two elements are multiplied directly
        loss_iou = (iou[0] * iou[1] * weight).sum() / target_scores_sum
else:
    # Plain tensor: fall back to the standard 1 - IoU loss
    loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum
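
To make the role of the weighting factor more concrete, here is a minimal pure-Python sketch of the Wise-IoU v1 loss and the dynamic non-monotonic focusing coefficient for a single box pair, written from the formulas in the Wise-IoU paper. It is not the `bbox_iou` function shipped with this repository, which may differ in detail. The names `wiou_v1` and `focusing_coefficient` are hypothetical, and the defaults `alpha=1.9` and `delta=3.0` are assumed values for illustration.

```python
import math


def wiou_v1(pred, target, eps=1e-7):
    """Return (iou, wiou_v1_loss) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    iou = inter / union

    # Width and height of the smallest enclosing box
    # (treated as constants, i.e. detached, in a real training graph)
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offsets between the two box centers
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2

    # Distance attention: exp(center distance^2 / enclosing diagonal^2)
    r_wiou = math.exp((dx * dx + dy * dy) / (cw * cw + ch * ch + eps))
    return iou, r_wiou * (1.0 - iou)


def focusing_coefficient(loss_iou, mean_loss_iou, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)).

    beta is the outlier degree: this box's IoU loss divided by a running
    mean of the IoU loss. alpha and delta are assumed hyperparameters.
    """
    beta = loss_iou / mean_loss_iou
    return beta / (delta * alpha ** (beta - delta))


iou, loss = wiou_v1((0, 0, 10, 10), (20, 0, 30, 10))
print(iou, loss)
```

The gain is small for both very easy boxes (low outlier degree) and very poor ones (high outlier degree), which is what "non-monotonic" refers to: gradients concentrate on boxes of ordinary quality.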

In [None]:
# Finally, modify the bbox_iou call in the tal.py file

# Original:
# overlaps[mask_gt] = bbox_iou(gt_boxes, pd_boxes, xywh=False, CIoU=True).squeeze(-1).clamp_(0)

# Replace with:
overlaps[mask_gt] = bbox_iou(gt_boxes, pd_boxes, xywh=False).squeeze(-1).clamp_(0)


## Dataset

Download the [DOTA](https://captain-whu.github.io/DOTA/dataset.html) dataset and note where you store it. The dataset must then be preprocessed: keep only the five object classes studied here (small cars, large cars, airplanes, ships, and oil storage tanks), and keep only targets smaller than 0.5% of the entire image.

In [None]:
# Keep only the five object classes under study (class IDs 0-4)
import os

label_path = 'datapath/labelpath'
keep_classes = {'0', '1', '2', '3', '4'}
for file in os.listdir(label_path):
    # Join with os.path.join so the path is valid without a trailing slash
    file_path = os.path.join(label_path, file)
    with open(file_path, "r") as f:
        lines = f.readlines()
    # Rewrite the label file in place, dropping every other class
    with open(file_path, "w") as f_w:
        for line in lines:
            parts = line.split()
            if parts and parts[0] in keep_classes:
                f_w.write(line)

In [None]:
# Delete label files that contain a target larger than 0.5% of the entire image.
import os
from pathlib import Path

def getGtAreaAndRatio(label_dir):
    """Collect area, aspect ratio, and source file of every target, keyed by class ID."""
    data_dict = {}
    assert Path(label_dir).is_dir(), "label_dir does not exist"

    for txt in os.listdir(label_dir):
        with open(os.path.join(label_dir, txt), 'r') as f:
            lines = f.readlines()
        for line in lines:
            temp = line.split()
            if not temp:
                continue  # skip blank lines
            # Assumed label layout: class x_center y_center width height (normalized)
            width, height = float(temp[3]), float(temp[4])
            area = width * height
            ratio = round(width / height, 2)

            if temp[0] not in data_dict:
                data_dict[temp[0]] = {'area': [], 'ratio': [], 'name': []}

            data_dict[temp[0]]['area'].append(area)
            data_dict[temp[0]]['ratio'].append(ratio)
            data_dict[temp[0]]['name'].append(txt)

    return data_dict


def getSMLGtNumByClass(data_dict, class_num):
    """Count small/medium/large targets of one class and delete the label
    files that contain a medium or large target."""
    s, m, l = 0, 0, 0
    # Areas are normalized, so they compare directly with the 0.5% and 1%
    # thresholds; the image size cancels out.
    class_data = data_dict.get(str(class_num), {'area': [], 'name': []})
    for area, filename in zip(class_data['area'], class_data['name']):
        if area <= 0.005:
            s += 1
            continue
        if area <= 0.010:
            m += 1
        else:
            l += 1
        # labeldir is the module-level label directory set in __main__
        label_file = os.path.join(labeldir, filename)
        print(label_file, area)
        if os.path.exists(label_file):
            os.remove(label_file)

    return s, m, l


def getAllSMLGtNum(data_dict, isEachClass=False):
    """Return S/M/L target counts, summed over all classes or per class."""
    num_classes = 5  # the five selected classes, labeled 0-4

    if not isEachClass:
        S, M, L = 0, 0, 0
        for i in range(num_classes):
            s, m, l = getSMLGtNumByClass(data_dict, i)
            S += s
            M += m
            L += l
        return [S, M, L]

    classDict = {}
    for i in range(num_classes):
        s, m, l = getSMLGtNumByClass(data_dict, i)
        classDict[str(i)] = {'S': s, 'M': m, 'L': l}
        print("total: ", s + m + l)  # number of targets in class i
    return classDict


if __name__ == '__main__':
    labeldir = 'datapath'
    data_dict = getGtAreaAndRatio(labeldir)
    isEachClass = True
    SML = getAllSMLGtNum(data_dict, isEachClass)
    print(SML)
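
Because the script above deletes label files in place, it can be worth listing the affected files first. The following is a minimal dry-run sketch, not part of the original pipeline: `find_oversized_label_files` is a hypothetical helper that assumes YOLO-format labels (`class x_center y_center width height`, all normalized) and only reports files without removing anything.

```python
import os


def find_oversized_label_files(label_dir, max_area=0.005):
    """Return sorted names of label files containing a target above max_area.

    max_area is the normalized width * height threshold (0.005 = 0.5% of
    the image). No file is modified or deleted.
    """
    oversized = []
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name), "r") as f:
            for line in f:
                parts = line.split()
                if len(parts) < 5:
                    continue  # skip blank or malformed lines
                width, height = float(parts[3]), float(parts[4])
                if width * height > max_area:
                    oversized.append(name)
                    break  # one oversized target is enough to flag the file
    return oversized
```

Calling `find_oversized_label_files('datapath')` returns the files the deletion script would remove. Note that the corresponding image files are not touched by either script and have to be handled separately.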

## Training


In [None]:
from ultralytics import YOLO

if __name__ == '__main__':
    data = "data.yaml"

    model = YOLO('scw-yolo.yaml')
    model.train(data=data, epochs=200, imgsz=640, batch=8)
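
The training call expects a dataset description in `data.yaml`. As a rough sketch of what such a file contains in the Ultralytics format (`path`, `train`, `val`, and an index-to-name mapping under `names`), the snippet below builds one as text. Everything specific in it is an assumption: the directory layout is hypothetical, and the class order is a guess that must match the class IDs 0 to 4 used in your label files.

```python
# Hypothetical class order; it must match the IDs used in the label files.
CLASS_NAMES = ["small-vehicle", "large-vehicle", "plane", "ship", "storage-tank"]


def build_data_yaml(root, train="images/train", val="images/val", names=CLASS_NAMES):
    """Return the text of an Ultralytics-style dataset YAML file."""
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


# Placeholder dataset root; save the printed text as data.yaml
print(build_data_yaml("datapath"))
```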
