Novel Hybrid Neural Network Architecture For Multi-modal Brain Tumor mpMRI Segmentation /
Muhammad Faizan
- 73p. Soft Copy, 30cm.
Medical image segmentation is a critical step in clinical decision-making, enabling precise localization of anatomical structures and lesions. While Convolutional Neural Networks (CNNs), particularly U-shaped architectures such as U-Net, have been popular in this domain, their limited receptive fields hinder the accurate delineation of anomalies with irregular shapes and sizes. Hybrid approaches that integrate convolutions with Vision Transformers (ViTs) have demonstrated improved performance because of their ability to capture long-range dependencies. However, ViTs are computationally expensive, particularly for volumetric image segmentation such as MRI, which makes them challenging to deploy on hardware with limited resources. To address these challenges, recent studies have revisited convolutional architectures, leveraging large kernel (LK) depth-wise convolutions to emulate the behavior of hierarchical transformers. Building on this direction, we propose 3D SegUX-Net, a novel U-shaped encoder-decoder architecture for volumetric biomedical image segmentation. Our model introduces the SegUX block, which combines large kernel depth-wise and point-wise convolutions to enlarge the receptive field while maintaining computational efficiency. The addition of a residual block further refines features, improving model robustness and generalization. Empirical results demonstrate that 3D SegUX-Net consistently outperforms state-of-the-art (SOTA) CNN and transformer methods on multiple benchmarks, including BraTS 2019, BraTS 2020, BraTS 2023, and organ segmentation on the BTCV dataset. The proposed architecture establishes new SOTA performance in volumetric medical semantic segmentation, combining simplicity, efficiency, and scalability.
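The efficiency argument for pairing a large kernel depth-wise convolution with a point-wise convolution can be illustrated with a simple parameter count. The sketch below is not taken from the thesis: the channel width (48), the kernel size (7 x 7 x 7), and the helper name `conv3d_params` are assumptions chosen only for illustration, and the actual SegUX block may use different settings.

```python
def conv3d_params(c_in, c_out, k, groups=1, bias=True):
    """Count learnable parameters of a 3D convolution with a cubic k x k x k kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    weights = c_out * (c_in // groups) * k ** 3
    return weights + (c_out if bias else 0)


# Assumed, illustrative values (not reported in the abstract).
channels, kernel = 48, 7

# Standard (dense) large kernel convolution.
dense = conv3d_params(channels, channels, kernel)

# Depth-wise convolution: one k x k x k filter per channel (groups == channels).
depthwise = conv3d_params(channels, channels, kernel, groups=channels)

# Point-wise convolution: 1 x 1 x 1 kernel that mixes channels.
pointwise = conv3d_params(channels, channels, 1)

separable = depthwise + pointwise

print(f"dense LK conv:          {dense} parameters")
print(f"depth-wise + point-wise: {separable} parameters")
```

Under these assumed settings the depth-wise and point-wise pair covers the same 7 x 7 x 7 spatial extent as the dense convolution with far fewer weights, which is the general reason such factorizations are considered suitable for volumetric inputs.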