ROPGCViT: A Novel Explainable Vision Transformer for Retinopathy of Prematurity Diagnosis

dc.authorid: 0000-0003-0562-4931
dc.contributor.author: Yurdakul, Mustafa
dc.contributor.author: Uyar, Kubra
dc.contributor.author: Tasdemir, Sakir
dc.contributor.author: Atabas, Irfan
dc.date.accessioned: 2026-01-24T12:29:00Z
dc.date.available: 2026-01-24T12:29:00Z
dc.date.issued: 2025
dc.department: Alanya Alaaddin Keykubat Üniversitesi
dc.description.abstract: Retinopathy of Prematurity (ROP) is a severe disease that occurs in premature babies due to abnormal development of retinal vessels and can lead to permanent vision loss. Fundus images are critical in the diagnosis of ROP; however, the examination of fundus images is a subjective, time-consuming, and error-prone process that requires experience. This can lead to delayed diagnosis and inaccurate evaluations. The need for computer-aided diagnosis (CAD) systems is therefore growing. Deep learning (DL) methods have high potential for analyzing such complex images. In this study, a total of 50 DL models, comprising 25 Convolutional Neural Network (CNN) and 25 Vision Transformer (ViT) models, were tested to diagnose ROP from fundus images. Furthermore, the ROPGCViT model, based on the Global Context Vision Transformer (GCViT), was proposed. GCViT was enhanced with a Squeeze-and-Excitation (SE) block and Residual Multilayer Perceptron (RMLP) structures to effectively learn local and global context information. With a dataset of 1099 fundus images, the performance of the model was evaluated in terms of accuracy, precision, recall, F1-score, and Cohen's kappa score. To enhance explainability, the Gradient-Weighted Class Activation Mapping (Grad-CAM) method was used to visualize the regions of the fundus images the model focused on during classification, providing insight into its decision-making process. ROPGCViT outperformed both the 50 DL models and methods in the literature, with 94.69% accuracy, 94.84% precision, 94.69% recall, 94.60% F1-score, and a Cohen's kappa score of 93.10%. Additionally, the Grad-CAM visualizations demonstrated the ability of the model to focus on clinically relevant regions, enhancing trust and interpretability for experts. The proposed ROPGCViT model provides a robust solution for ROP diagnosis with high accuracy, flexibility, and generalization capacity.
dc.identifier.doi: 10.1109/ACCESS.2025.3564213
dc.identifier.endpage: 77079
dc.identifier.issn: 2169-3536
dc.identifier.scopus: 2-s2.0-105003646737
dc.identifier.scopusquality: Q1
dc.identifier.startpage: 77064
dc.identifier.uri: https://doi.org/10.1109/ACCESS.2025.3564213
dc.identifier.uri: https://hdl.handle.net/20.500.12868/5065
dc.identifier.volume: 13
dc.identifier.wos: WOS:001483833000023
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.language.iso: en
dc.publisher: IEEE-Inst Electrical Electronics Engineers Inc
dc.relation.ispartof: IEEE Access
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_WoS_20260121
dc.subject: Solid modeling
dc.subject: Accuracy
dc.subject: Diseases
dc.subject: Feature extraction
dc.subject: Sensitivity
dc.subject: Pediatrics
dc.subject: Visualization
dc.subject: Support vector machines
dc.subject: Image segmentation
dc.subject: Discrete wavelet transforms
dc.subject: Retinopathy of prematurity
dc.subject: vision transformer
dc.subject: convolutional neural network
dc.subject: deep learning
dc.subject: squeeze-and-excitation
dc.subject: grad-CAM
dc.title: ROPGCViT: A Novel Explainable Vision Transformer for Retinopathy of Prematurity Diagnosis
dc.type: Article
