Small-object detection in remote sensing images and video
Master Thesis
Author
Kotrotsios, Orestis
Date
2024-05
Keywords
Remote sensing images; Object detection; Deep learning; Machine learning
Abstract
Object detection in remote sensing images has long been a challenging problem for the computer vision research community, because the objects in such images occupy very few pixels (typically 10-20 pixels). Various techniques have improved the mean Average Precision (mAP) of detection models, but these improvements come at a cost: the models keep growing larger, which becomes a problem when a detector must run on a satellite or an Unmanned Aerial Vehicle (UAV), whose computational capabilities are limited.
This thesis proposes a versatile network-level gradient path design that can be applied to both single-stage and multi-stage models with an architecture similar to “PaNet”. The method reduces the computational requirements of the model by passing only half of the backbone feature map through the neck stage, while the other half bypasses the neck entirely. In this way, a gradient path is created that connects the prediction heads directly to the backbone layers, minimizing the information loss caused by lengthy gradient paths.
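As an illustration of this idea only, the following PyTorch sketch splits a backbone feature map in half along the channel dimension, routes one half through a neck sub-module, and concatenates the untouched half back in just before the prediction head. The module name `HalfBypassNeckLevel`, the channel counts, and the toy neck block are hypothetical and not taken from the thesis implementation.

import torch
import torch.nn as nn

class HalfBypassNeckLevel(nn.Module):
    """Minimal sketch (not the thesis code): half of the backbone feature
    map is processed by the neck, the other half bypasses it and is
    re-joined before the head, giving the head a short gradient path
    straight back to the backbone."""

    def __init__(self, channels: int, neck_block: nn.Module):
        super().__init__()
        assert channels % 2 == 0, "channel count must be even to split in half"
        self.half = channels // 2
        self.neck_block = neck_block          # any neck sub-module (e.g. one PaNet-style stage)
        # 1x1 conv to fuse the processed half with the bypassed half
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, backbone_feat: torch.Tensor) -> torch.Tensor:
        # split the backbone feature map along the channel dimension
        through_neck, bypass = torch.split(backbone_feat, self.half, dim=1)
        # only one half carries the cost of the neck computation
        through_neck = self.neck_block(through_neck)
        # the bypassed half reaches the head unchanged, so gradients from the
        # head flow directly into the backbone through this branch
        return self.fuse(torch.cat([through_neck, bypass], dim=1))

# Toy usage: a trivial neck block operating on 128 of the 256 channels.
neck_block = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.SiLU())
level = HalfBypassNeckLevel(channels=256, neck_block=neck_block)
out = level(torch.randn(1, 256, 40, 40))
print(out.shape)  # torch.Size([1, 256, 40, 40])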
The proposed methodology was tested on the two-stage model “PaNet” and the single-stage model “TPH-YOLOv5”. The models were evaluated on the Microsoft Common Objects in Context (MS COCO), VisDrone, and Aerial Image Tiny Object Detection (AI-TOD) datasets.
The proposed method achieved a reduction in GFLOPs (Giga Floating Point Operations) of 9.51% on “PaNet”, while on “TPH-YOLOv5” the reduction was 32.67%.
At the same time, the mAP of “PaNet” with the proposed method was reduced by 5.7% at an Intersection over Union (IoU) threshold of 50%, and by 3.2% for the average mAP across IoU thresholds from 50% to 95%, on the MS COCO dataset. On the AI-TOD dataset, the mAP of “PaNet” with the proposed method was reduced by 7.8% and the average mAP by 3.6%.
In contrast, “TPH-YOLOv5” with the proposed method showed a reduction of only 1.6% in both mAP and average mAP on the VisDrone dataset. Moreover, on the AI-TOD dataset the proposed method outperformed the original by 6.3% in mAP and by 2.4% in average mAP.