ChatGPT-3.5 and ChatGPT-4 Performance in Testicular Cancer: A Comparative Study
Date
2025
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Objective: To assess the reliability of Chat Generative Pre-trained Transformer (ChatGPT), compare the performance of ChatGPT-4 with that of ChatGPT-3.5, and explore its potential roles in healthcare decision-making. Materials and Methods: Thirty questions on testicular cancer were prepared based on the 2023 European Association of Urology guidelines and clinical experience. The questions were systematically posed to ChatGPT-3.5 and ChatGPT-4, and the responses were rated by three independent urologists on a six-point Likert scale. The median of the three specialists' scores was used as the final score. Results: Each ChatGPT version received a score of 1 for an incorrect answer to one question. For GPT-3.5 and GPT-4, the urologists considered 20% and 13.3% of responses incorrect, respectively, while correct responses (scores of 3 or higher) accounted for 80% and 86.7%. For general information and diagnosis questions, GPT-3.5 and GPT-4 had average scores of 4.29 and 4.80, with median values of 4.27 and 4.67, respectively. For treatment and follow-up questions, average scores were 3.60 and 4.16, with median values of 3.60 and 4.20. GPT-4 generally outperformed GPT-3.5, but the difference was not statistically significant (p>0.05). Conclusion: Our study shows that ChatGPT-4 is more reliable and accurate than ChatGPT-3.5 in answering testicular cancer-related queries. Continued development of its database and clinical capabilities could optimize ChatGPT’s utility in healthcare.
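The scoring procedure described in the abstract (median of three raters per response, with a score of 3 or higher counted as correct) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' analysis code: the function names and the sample ratings are hypothetical, and the abstract does not report the per-question ratings.

```python
from statistics import median

def final_score(ratings):
    """Median of the three urologists' Likert ratings (1-6) for one response."""
    return median(ratings)

def is_correct(score, threshold=3):
    """A response counts as correct when its final score is 3 or higher."""
    return score >= threshold

def percent_correct(all_ratings):
    """Share of responses (in percent) whose final score meets the threshold."""
    scores = [final_score(r) for r in all_ratings]
    n_correct = sum(is_correct(s) for s in scores)
    return 100 * n_correct / len(scores)

# Hypothetical ratings for four responses; NOT data from the study.
example_ratings = [(5, 4, 6), (2, 1, 3), (3, 4, 2), (6, 6, 5)]
print(percent_correct(example_ratings))
```

With 30 questions, the reported rates of 80% and 86.7% correct would correspond to 24 and 26 responses, respectively.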
Description
Keywords
Artificial intelligence, natural language processing, testicular cancer, ChatGPT
Source
Üroonkoloji Bülteni
WoS Q Value
Scopus Q Value
Volume
24
Issue
2












