A hardware accelerator to support deep learning processor units in real-time image processing
Cittadini, Edoardo; Marinoni, Mauro; Buttazzo, Giorgio
2025-01-01
Abstract
Deep neural networks are becoming crucial in many cyber–physical systems involving complex perceptual tasks. For embedded systems that require real-time interaction with dynamic environments, such as autonomous robots and drones, it is of paramount importance that such algorithms execute efficiently onboard, on properly designed hardware accelerators, to meet the required performance specifications. In particular, some neural network architectures for object detection and tracking, such as You Only Look Once (YOLO), include computationally heavy stages that must be executed before and after the model inference. These stages are typically not incorporated in traditional accelerators and are instead executed on general-purpose processors, introducing a bottleneck in the overall processing pipeline. To overcome this problem, this paper presents a general-purpose accelerator on a field-programmable gate array (FPGA) able to run the pre-processing and post-processing operations typically required by vision tasks. The proposed solution was tested in combination with a YOLO object detector accelerated on an Advanced Micro Devices (AMD) Xilinx Kria KR260 board mounting an UltraScale+ multiprocessor system-on-chip, achieving a significant improvement in both timing performance and power consumption and enabling onboard visual processing on drones. The proposed solution boosts the traditional object detection process by a factor of 4.4, allowing the full processing pipeline to run at 60 frames per second (fps), compared with the 13.6 fps achievable without the proposed accelerator. As a result, this work enables the use of high-speed cameras for developing more reactive systems that can respond to incoming events with lower latency.
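To make the offloaded stages concrete, below is a minimal Python/NumPy sketch of the kind of pre-processing (letterbox resize and pixel normalization) and post-processing (greedy non-maximum suppression over decoded boxes) a YOLO pipeline performs on a general-purpose processor. Function names, tensor shapes, and thresholds are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of the CPU-side YOLO pre/post-processing stages that
# the paper offloads to the FPGA fabric; all details here are assumptions.
import numpy as np

def preprocess(frame: np.ndarray, size: int = 416) -> np.ndarray:
    """Letterbox-resize an HxWx3 uint8 frame to size x size, scale to [0, 1]."""
    h, w, _ = frame.shape
    scale = size / max(h, w)
    nh, nw = int(h * scale), int(w * scale)
    # Nearest-neighbour resize kept for self-containment; real pipelines
    # typically use bilinear interpolation.
    ys = (np.arange(nh) / scale).astype(int)
    xs = (np.arange(nw) / scale).astype(int)
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)  # grey padding
    canvas[:nh, :nw] = frame[ys][:, xs]
    return canvas.astype(np.float32) / 255.0

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.45) -> list:
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    order = scores.argsort()[::-1]           # indices by descending confidence
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection-over-union of the best box against the remaining ones.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (area(boxes[i:i + 1]) + area(boxes[rest]) - inter + 1e-9)
        order = rest[iou < iou_thr]           # drop heavily overlapping boxes
    return keep

# Illustrative usage with a dummy 640x480 camera frame and random detections.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
tensor = preprocess(frame)                    # would be fed to the DPU
boxes = np.array([[10, 10, 100, 100], [12, 12, 98, 98], [200, 200, 260, 260]],
                 dtype=np.float32)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]
```

Running such stages on the embedded CPU is what limits the baseline pipeline to 13.6 fps; moving them into the FPGA fabric alongside the deep learning processor unit yields the reported 60 fps, i.e. a speedup of 60 / 13.6 ≈ 4.4.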
| File | Size | Format |
|---|---|---|
| 1-s2.0-S0952197625001599-main.pdf (open access; type: publisher's PDF; license: Creative Commons) | 3.78 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.