ROPGCViT: A Novel Explainable Vision Transformer for Retinopathy of Prematurity Diagnosis

Retinopathy of Prematurity (ROP) is a severe disease of premature infants in which abnormal development of the retinal vessels can lead to permanent vision loss. Fundus images are critical to ROP diagnosis; however, examining them is subjective, time-consuming, and error-prone, and it requires experience. This can lead to delayed diagnosis and inaccurate evaluations, so the need for computer-aided diagnosis (CAD) systems is growing. Deep learning (DL) methods have high potential for analyzing such complex images. In this study, a total of 50 DL models, 25 Convolutional Neural Network (CNN) models and 25 Vision Transformer (ViT) models, were tested for diagnosing ROP from fundus images. Furthermore, the ROPGCViT model, based on the Global Context Vision Transformer (GCViT), is proposed. GCViT was enhanced with a Squeeze-and-Excitation (SE) block and Residual Multilayer Perceptron (RMLP) structures to learn local and global context information effectively. Using a dataset of 1099 fundus images, the model's performance was evaluated in terms of accuracy, precision, recall, F1-score, and Cohen's kappa score. To enhance explainability, Gradient-Weighted Class Activation Mapping (Grad-CAM) was used to visualize the regions of the fundus images on which the model focused during classification, providing insight into its decision-making process. ROPGCViT outperformed all 50 DL models as well as methods in the literature, with 94.69% accuracy, 94.84% precision, 94.69% recall, a 94.60% F1-score, and a Cohen's kappa score of 93.10%. Additionally, the Grad-CAM visualizations showed that the model focuses on clinically relevant regions, enhancing trust and interpretability for experts. The proposed ROPGCViT model provides a robust solution for ROP diagnosis with high accuracy, flexibility, and generalization capacity.
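
The five metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names (`accuracy`, `weighted_prf`, `cohen_kappa`), the toy labels, and the choice of support-weighted averaging are all assumptions made for illustration. The abstract does not state which averaging scheme was used, although identical accuracy and recall figures are what support-weighted recall would produce, since weighted recall always equals accuracy.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 (averaging scheme is an assumption)."""
    n = len(y_true)
    support = Counter(y_true)        # true samples per class
    predicted = Counter(y_pred)      # predicted samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        p_c = hits[cls] / predicted[cls] if predicted[cls] else 0.0
        r_c = hits[cls] / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n           # each class weighted by its share of samples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance agreement."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability both "raters" pick the same class independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: only one class appears in both lists
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for three classes; not data from the study.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

acc = accuracy(y_true, y_pred)
prec, rec, f1 = weighted_prf(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} "
      f"f1={f1:.4f} kappa={kappa:.4f}")
```

On the toy labels, five of six predictions are correct and the chance agreement is 1/3, so kappa comes out lower than accuracy. The reported results show the same ordering (93.10% kappa against 94.69% accuracy).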

Keywords

Solid modeling, Accuracy, Diseases, Feature extraction, Sensitivity, Pediatrics, Visualization, Support vector machines, Image segmentation, Discrete wavelet transforms, Retinopathy of prematurity, vision transformer, convolutional neural network, deep learning, squeeze-and-excitation, Grad-CAM

Source

IEEE Access

WoS Q Value

Q2

Scopus Q Value

Q1

Volume

13

Link

https://doi.org/10.1109/ACCESS.2025.3564213
https://hdl.handle.net/20.500.12868/5065

Collection

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
