ChatGPT vs. Google Gemini: Assessment of Performance Regarding the Accuracy and Repeatability of Responses to Questions in Implant-Supported Prostheses

dc.contributor.authorYilmaz, Deniz
dc.contributor.authorÇolpak, Emine Dilara
dc.date.accessioned2026-01-24T12:01:06Z
dc.date.available2026-01-24T12:01:06Z
dc.date.issued2025
dc.departmentAlanya Alaaddin Keykubat Üniversitesi
dc.description.abstractPurpose: This study aimed to assess the accuracy and repeatability of the responses of different large language models (LLMs) to questions regarding implant-supported prostheses, and to evaluate the impact of pre-prompting and the time of day. Materials and Methods: Twelve open-ended questions related to implant-supported prostheses were generated, and their content validity was verified by a specialist. The questions were then posed to two LLMs, ChatGPT-4.0 and Google Gemini (morning, afternoon, and evening; with and without a pre-prompt). The responses were evaluated by two expert prosthodontists using a holistic rubric. Concordance between the graders' scores and between repeated responses from ChatGPT-4.0 and Gemini was calculated using the Brennan-Prediger coefficient, Cohen's kappa, Fleiss's kappa, and Krippendorff's alpha. Kruskal-Wallis, Mann-Whitney U, independent t-tests, and ANOVA were used to compare responses across implementations. Results: The accuracies of ChatGPT and Google Gemini were 34.7% and 17.4%, respectively. Pre-prompting significantly increased accuracy in Gemini (p = 0.026). No significant difference was found by time of day (morning, afternoon, or evening) or between weekly implementations. Inter-rater reliability and repeatability were highly consistent. Conclusions: Pre-prompting positively affected accuracy and repeatability in both ChatGPT and Google Gemini. However, LLMs can still produce hallucinations; they may assist clinicians, but clinicians should be aware of these limitations.
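The abstract reports inter-rater agreement using the Brennan-Prediger coefficient and Cohen's kappa, among other statistics. As a minimal sketch of how these chance-corrected agreement measures are computed, the Python example below scores two raters on the same set of responses; the 0-2 rubric scale and the rating data are hypothetical assumptions for illustration, not the study's data.

```python
# Minimal sketch (not from the paper): chance-corrected agreement between
# two raters scoring the same LLM responses on a hypothetical 0-2 rubric.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement
    estimated from each rater's marginal score distribution."""
    n = len(rater_a)
    # Observed agreement: share of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement under independent rater marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def brennan_prediger(rater_a, rater_b, q):
    """Brennan-Prediger coefficient: like kappa, but chance agreement is
    fixed at 1/q for q rubric categories instead of using marginals."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    return (p_o - 1 / q) / (1 - 1 / q)

# Hypothetical scores: 0 = inaccurate, 1 = partially accurate, 2 = accurate.
rater_a = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0, 1, 2]
rater_b = [2, 1, 0, 2, 1, 1, 0, 1, 2, 0, 2, 2]
print(f"Cohen's kappa:    {cohens_kappa(rater_a, rater_b):.3f}")
print(f"Brennan-Prediger: {brennan_prediger(rater_a, rater_b, q=3):.3f}")
```

On this toy data the two coefficients differ because kappa estimates chance agreement from the raters' actual score frequencies, while Brennan-Prediger assumes all q categories are equally likely; the study applies both (alongside Fleiss's kappa and Krippendorff's alpha) to its own rating data.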
dc.identifier.doi10.52037/eads.2025.0011
dc.identifier.endpage78
dc.identifier.issn2757-6744
dc.identifier.issue2
dc.identifier.startpage71
dc.identifier.trdizinid1339523
dc.identifier.urihttps://search.trdizin.gov.tr/tr/yayin/detay/1339523
dc.identifier.urihttps://doi.org/10.52037/eads.2025.0011
dc.identifier.urihttps://hdl.handle.net/20.500.12868/3997
dc.identifier.volume52
dc.indekslendigikaynakTR-Dizin
dc.language.isoen
dc.relation.ispartofEuropean Annals of Dental Sciences (Online)
dc.relation.publicationcategoryArticle - National Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_TR-Dizin_20260121
dc.subjectChatbot
dc.subjectImplant
dc.subjectChatGPT
dc.subjectProstheses
dc.titleChatGPT vs. Google Gemini: Assessment of Performance Regarding the Accuracy and Repeatability of Responses to Questions in Implant-Supported Prostheses
dc.typeArticle
