
Theocharis Theocharides
University of Cyprus, Cyprus
Dynamic Convolutional Neural Networks for Embedded Computer Vision
Abstract
While computer vision has seen unparalleled growth in recent years, mainly due to advancements in deep convolutional neural networks (CNNs), deploying such models on edge devices such as tiny autonomous unmanned aerial and terrestrial vehicles has been challenging, due to constraints on power and energy, computational resources and memory footprint, and application performance requirements. Applications in safety-critical missions such as search and rescue operations and emergency management elevate these challenges, as they impose additional performance constraints, as well as visual constraints on the inference caused by occlusions, changing environmental parameters, and dynamic operational and situational context. Current practice involves training the CNNs in a way that generalizes across all these constraints and optimizing the model to target an embedded processing device, custom accelerator, or embedded multi-processor platform. While methods such as pruning (structured and unstructured) and quantization have achieved varying degrees of success in compressing and optimizing deep CNNs for embedded devices, they remain subject to the computational boundaries imposed by the host platforms.
Recent research, however, proposes dynamic deep neural networks to enhance adaptability and computational efficiency. For example, deformable convolutional layers have been proposed to dynamically adjust receptive fields, optimizing detection of dense and small objects. Further, an adaptive sparse convolutional network enhanced with global context has been proposed, allowing dynamic adjustment of sparsity to reduce computation while retaining accuracy. Early-exit deep neural networks have also been introduced to reduce computational complexity, since the ability to terminate computation at an early stage yields potential gains in performance and savings in energy. Dynamic deep CNNs are trained so that they can recognize features from the feature maps produced by earlier layers in the computation, and their dynamic, early-exit inference improves both energy consumption and performance.
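As a rough illustration of the early-exit idea only (a minimal sketch; the EarlyExitCNN class, layer sizes, exit placement, and confidence threshold below are assumptions and not the models discussed in the talk), an early-exit CNN can be expressed as follows:

# Minimal early-exit CNN sketch (illustrative assumptions, not the talk's architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitCNN(nn.Module):
    def __init__(self, num_classes=10, exit_threshold=0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))
        self.exit1 = nn.Linear(32, num_classes)   # early classifier on stage-1 feature maps
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))
        self.exit2 = nn.Linear(64, num_classes)   # final classifier
        self.exit_threshold = exit_threshold

    def forward(self, x):
        f1 = self.stage1(x)
        logits1 = self.exit1(F.adaptive_avg_pool2d(f1, 1).flatten(1))
        conf1 = F.softmax(logits1, dim=1).max(dim=1).values
        # Terminate early when the first exit is confident enough,
        # skipping the remaining (more expensive) layers.
        if not self.training and bool((conf1 >= self.exit_threshold).all()):
            return logits1
        f2 = self.stage2(f1)
        return self.exit2(F.adaptive_avg_pool2d(f2, 1).flatten(1))

At deployment time, the confidence threshold trades accuracy against the fraction of inputs that exit early, and hence against latency and energy.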
Leveraging the benefits of adaptive inference in dynamic deep CNNs, in this talk I will present our recent work on dynamic deep convolutional neural networks targeting low-power UAVs and UTVs. In particular, we exploit the altitude (and the distance) of the camera sensor from the region of interest (i.e., the terrain) and formulate it as context. We then train a controller via reinforcement learning, which dynamically selects the optimal CNN parameters (such as channel width, input resolution, and kernel mask sizes and weights) for each inference. I will also present our findings on how frequently such parameters should be changed dynamically with respect to accuracy, energy, and performance. Lastly, I will provide our results from applying dynamic deep CNNs to UAV applications, particularly vehicle and pedestrian detection, and buoy identification for naval search and rescue operations, when deploying the dynamic CNN models on NVIDIA’s Jetson Orin board and AMD’s Kria KV260 FPGA board. Encouragingly, when comparing our approach against a unified, static CNN model, we in fact witness a modest accuracy improvement (0.7%-1.6%), while substantially reducing the number of MAC operations (by 18%-33%), memory usage (by 15.5%-30%), overall power consumption (by 26.5%-45%), and latency (by 17.5%-61%). These results highlight the capability of context-aware dynamic CNN models to better match computational demands to the difficulty of the task at hand when implemented on low-power, resource-constrained embedded devices.
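To make the controller idea concrete, the following is a hedged sketch only (the CONFIGS table, the ContextController policy network, the altitude normalization, and the slimmable-backbone width_multiplier interface are all assumptions for illustration, not the reinforcement-learning controller presented in the talk) of how a context-driven controller could pick a per-inference configuration from the sensed altitude:

# Illustrative context-driven controller sketch (assumed names and interfaces).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate (input_resolution, channel_width_multiplier) configurations (assumed values).
CONFIGS = [(224, 0.5), (320, 0.75), (448, 1.0)]

class ContextController(nn.Module):
    """Tiny policy mapping the sensed context (e.g. UAV altitude) to a backbone
    configuration; in practice such a policy would be trained with reinforcement
    learning to trade accuracy against MACs, memory, and energy."""
    def __init__(self, num_configs=len(CONFIGS)):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                                    nn.Linear(16, num_configs))

    def forward(self, altitude_m):
        ctx = torch.tensor([[altitude_m / 100.0]])  # crude normalization (assumed)
        return int(self.policy(ctx).argmax(dim=1))  # greedy action at inference time

def run_inference(frame, backbone, controller, altitude_m):
    """frame: CxHxW tensor; backbone: a slimmable detector accepting a
    width_multiplier argument (hypothetical interface)."""
    res, width = CONFIGS[controller(altitude_m)]
    # Resize the input to the selected resolution; the width multiplier would
    # analogously slim the backbone's channels per inference.
    x = F.interpolate(frame.unsqueeze(0), size=(res, res),
                      mode="bilinear", align_corners=False)
    return backbone(x, width_multiplier=width)

In this sketch the controller is queried once per frame; how often such parameters should actually be re-selected is one of the accuracy/energy/performance trade-offs discussed in the talk.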