Performance Evaluation of CNN, ViT, and Hybrid Models in CT Based Brain Stroke Classification
Date
2025-08-25
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Mekelle University
Abstract
Medical image classification uses AI tools to automatically detect disease and segment affected body regions, ultimately assisting medical experts in diagnosing disease more easily. So far, CNNs have played a major role in medical image processing. Recently, another deep learning approach, the transformer, has emerged and outperformed CNN models, especially when trained with sufficient data. These models were originally developed for natural language processing, but they have also shown promising results in computer vision. This research investigates the performance of standalone CNN and vision transformer models, hybridizes the two approaches to exploit the advantages of both, evaluates the performance of the hybrid models, and compares the standalone and hybrid models. The research was conducted by gathering CT scan images of stroke disease from an online repository, Kaggle. Eight models were trained and evaluated: two CNNs (ConvNext, EfficientNet), two transformers (Swin, ViT), and four hybrid models (ConvNext + Swin, ConvNext + ViT, EfficientNet + Swin, EfficientNet + ViT). Metrics such as accuracy, precision, recall, F1-score, and ROC-AUC were used to compare the models. In general, the hybrid models outperformed the standalone models; specifically, ConvNext + Swin outperformed all others with an accuracy of 95.42% and an AUC of 0.99. Overall, the findings show that hybrid models are preferable for higher classification accuracy and that Swin-based architectures are less prone to overfitting.
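The abstract does not specify how the CNN and transformer branches were combined. The sketch below illustrates one plausible hybridization scheme, concatenating pooled features from a ConvNext and a Swin backbone (via the timm library, an assumed dependency) before a shared classification head; the backbone names, fusion strategy, and framework are assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn
import timm  # assumed dependency; the thesis does not state the framework used


class HybridClassifier(nn.Module):
    """Feature-level fusion of a CNN backbone and a transformer backbone.

    A minimal sketch of one possible hybrid design: pooled features from
    both branches are concatenated and fed to a linear classification head.
    """

    def __init__(self, cnn_name="convnext_tiny",
                 vit_name="swin_tiny_patch4_window7_224", num_classes=2):
        super().__init__()
        # num_classes=0 makes timm return pooled feature vectors instead of logits
        self.cnn = timm.create_model(cnn_name, pretrained=True, num_classes=0)
        self.vit = timm.create_model(vit_name, pretrained=True, num_classes=0)
        fused_dim = self.cnn.num_features + self.vit.num_features
        self.head = nn.Linear(fused_dim, num_classes)

    def forward(self, x):
        # Both branches see the same CT slice; their features are concatenated
        feats = torch.cat([self.cnn(x), self.vit(x)], dim=1)
        return self.head(feats)


# Usage: a batch of 224x224 CT slices, stroke / no-stroke output
model = HybridClassifier()
logits = model(torch.randn(4, 3, 224, 224))
print(logits.shape)  # torch.Size([4, 2])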
Description
Keywords
CNN, ConvNext, EfficientNet, Hybrid Models, Image Classification, Swin Transformers, ViT.
