Hybrid CNN-Transformer Architectures for Efficient Image Segmentation and Object Recognition
DOI: https://doi.org/10.64149/J.Carcinog.24.3s.475-483
Keywords: Hybrid CNN-Transformer, Image Segmentation, Object Detection, Deep Learning
Abstract
Deep learning has revolutionized image segmentation and object recognition by replacing hand-crafted feature extraction
with features learned automatically from data. Convolutional Neural Networks (CNNs) have become the dominant
architecture for extracting spatially localized features because they exploit locality (each filter sees only a small
neighborhood) and weight sharing (the same filter is applied at every position). Despite their success, CNNs often
struggle to capture long-range dependencies, because each convolution sees only a small neighborhood and global context
can only be accumulated by stacking many layers. Such dependencies are crucial for accurately segmenting complex
structures in high-dimensional data such as 3D medical images. This limitation motivates integrating complementary
architectures that can model global relationships effectively. In this work, we propose a hybrid CNN-Transformer
framework designed to improve both image segmentation and object recognition. The CNN component generates robust,
hierarchical feature representations, while the Transformer module uses self-attention to capture long-range
dependencies across the entire feature map. By combining local feature extraction with global context modeling, the
proposed approach achieves higher accuracy and efficiency than conventional CNN-based methods. Experimental results
show that the architecture delivers significant improvements in segmentation precision and object recognition
robustness, making it well suited to real-world medical imaging and computer vision applications.
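The local-then-global pattern described above can be illustrated with a minimal NumPy sketch. It is not the authors' architecture, since the abstract gives no layer counts or sizes. It assumes a single convolutional layer with ReLU, a single-head self-attention layer with a residual connection, and a 1x1 classifier producing per-pixel segmentation logits. It also omits the feed-forward sublayer, normalization, and positional encodings of a full Transformer block. All names (`conv2d`, `self_attention`, `hybrid_segment`, the `params` keys) and all shapes are hypothetical, and the weights are random and untrained.

```python
import numpy as np

# Hypothetical minimal hybrid: one conv layer (local) + one self-attention layer (global).
# Layer sizes, names and the 1x1 classifier head are illustrative assumptions,
# not the architecture described in this paper.

def conv2d(x, w, b):
    """Naive 'same'-padded 2D convolution.

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,) -> (C_out, H, W).
    The same kernel w is reused at every position (weight sharing), and each
    output pixel depends only on a k x k neighborhood of the input (locality).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) neighborhood
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention over N tokens of width d.

    Every token attends to every other token, so each output can depend on
    the whole feature map (global context). Returns (outputs, attention weights).
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N, N) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over keys
    return attn @ v, attn


def hybrid_segment(image, params):
    """CNN features -> tokens -> self-attention -> per-pixel class logits."""
    # 1) CNN stage: local features with ReLU.
    feat = np.maximum(conv2d(image, params["conv_w"], params["conv_b"]), 0.0)
    d, h, w = feat.shape
    # 2) Flatten the H x W grid into H*W tokens of width d.
    tokens = feat.reshape(d, h * w).T
    # 3) Transformer stage: global self-attention plus a residual connection.
    attended, attn = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    tokens = tokens + attended
    # 4) Restore the grid and apply a 1x1 classifier to get per-pixel logits.
    feat = tokens.T.reshape(d, h, w)
    logits = np.einsum("kd,dhw->khw", params["head_w"], feat)
    logits += params["head_b"][:, None, None]
    return logits, attn


# Usage with random, untrained weights (demonstrates shapes only).
rng = np.random.default_rng(0)
C_in, d, n_classes, H, W = 3, 8, 2, 6, 6
params = {
    "conv_w": rng.normal(scale=0.1, size=(d, C_in, 3, 3)),
    "conv_b": np.zeros(d),
    "wq": rng.normal(scale=0.1, size=(d, d)),
    "wk": rng.normal(scale=0.1, size=(d, d)),
    "wv": rng.normal(scale=0.1, size=(d, d)),
    "head_w": rng.normal(scale=0.1, size=(n_classes, d)),
    "head_b": np.zeros(n_classes),
}
image = rng.normal(size=(C_in, H, W))
logits, attn = hybrid_segment(image, params)
mask = logits.argmax(axis=0)  # predicted class index per pixel
print(logits.shape, attn.shape, mask.shape)  # (2, 6, 6) (36, 36) (6, 6)
```

In this sketch, the 3x3 convolution only looks at a small neighborhood. Perturbing the input at pixel (0, 0) therefore leaves the CNN features at pixel (5, 5) unchanged, while the logits at (5, 5) generally do change because attention mixes information across all 36 tokens. A practical model would stack several such stages and learn the weights by training.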




