ROPGCViT: A Novel Explainable Vision Transformer for Retinopathy of Prematurity Diagnosis
Date
2025
Journal Title
Journal ISSN
Volume Title
Publisher
IEEE-Inst Electrical Electronics Engineers Inc
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Retinopathy of Prematurity (ROP) is a severe disease that occurs in premature babies due to abnormal development of retinal vessels and can lead to permanent vision loss. Fundus images are critical in the diagnosis of ROP; however, examining them is a subjective, time-consuming, and error-prone process that requires experience, which can lead to delayed diagnosis and inaccurate evaluations. The need for computer-aided diagnosis (CAD) systems is therefore growing. Deep learning (DL) methods have high potential for analyzing such complex images. In this study, a total of 50 DL models, comprising 25 Convolutional Neural Network (CNN) models and 25 Vision Transformer (ViT) models, were tested for diagnosing ROP from fundus images. Furthermore, the ROPGCViT model, based on the Global Context Vision Transformer (GCViT), was proposed. GCViT was enhanced with a Squeeze-and-Excitation (SE) block and Residual Multilayer Perceptron (RMLP) structures to learn local and global context information effectively. The model was evaluated on a dataset of 1099 fundus images in terms of accuracy, precision, recall, F1-score, and Cohen's kappa score. To enhance explainability, the Gradient-Weighted Class Activation Mapping (Grad-CAM) method was used to visualize the regions of the fundus images that the model focused on during classification, providing insight into its decision-making process. ROPGCViT outperformed both the 50 DL models and methods in the literature, with 94.69% accuracy, 94.84% precision, 94.69% recall, 94.60% F1-score, and a Cohen's kappa score of 93.10%. Additionally, the Grad-CAM visualizations demonstrated the ability of the model to focus on clinically relevant regions, enhancing trust and interpretability for experts. The proposed ROPGCViT model provides a robust solution for ROP diagnosis with high accuracy, flexibility, and generalization capacity.
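The five evaluation measures named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' evaluation code: the function name `classification_metrics`, the example labels, and the choice of support-weighted averaging are all assumptions made for illustration. Weighted recall is always equal to accuracy, which would be consistent with the identical 94.69% figures reported for both, but the abstract does not state which averaging scheme was used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy, support-weighted precision/recall/F1, and Cohen's kappa.

    Hypothetical helper for illustration; weighted averaging is an assumption.
    """
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # samples per actual class (support)
    pred_counts = Counter(y_pred)   # samples per predicted class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n

    precision = recall = f1 = 0.0
    for c in labels:
        p_c = correct[c] / pred_counts[c] if pred_counts[c] else 0.0
        r_c = correct[c] / true_counts[c] if true_counts[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = true_counts[c] / n  # weight each class by its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # where p_e comes from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical labels, not taken from the paper's dataset.
    y_true = ["normal", "normal", "rop", "rop"]
    y_pred = ["normal", "normal", "rop", "normal"]
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Cohen's kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance given the class frequencies, which matters when classes are imbalanced.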
Description
Keywords
Solid modeling, Accuracy, Diseases, Feature extraction, Sensitivity, Pediatrics, Visualization, Support vector machines, Image segmentation, Discrete wavelet transforms, Retinopathy of prematurity, vision transformer, convolutional neural network, deep learning, squeeze-and-excitation, Grad-CAM
Source
IEEE Access
WoS Q Value
Q2
Scopus Q Value
Q1
Volume
13












