ChatGPT vs. Google Gemini: Assessment of Performance Regarding the Accuracy and Repeatability of Responses to Questions in Implant-Supported Prostheses

dc.contributor.authorYilmaz, Deniz
dc.contributor.authorÇolpak, Emine Dilara
dc.date.accessioned2026-01-24T12:01:06Z
dc.date.available2026-01-24T12:01:06Z
dc.date.issued2025
dc.departmentAlanya Alaaddin Keykubat Üniversitesi
dc.description.abstractPurpose: This study aimed to assess the accuracy and repeatability of the responses of different large language models (LLMs) to questions regarding implant-supported prostheses, and to evaluate the impact of pre-prompting and the time of day. Materials and Methods: Twelve open-ended questions related to implant-supported prostheses were generated, and their content validity was verified by a specialist. The questions were then posed to two LLMs, ChatGPT-4.0 and Google Gemini (morning, afternoon, and evening; with and without a pre-prompt). The responses were evaluated by two expert prosthodontists using a holistic rubric. Concordance between the graders' scores and between repeated responses from ChatGPT-4.0 and Gemini was calculated using the Brennan-Prediger coefficient, Cohen's kappa, Fleiss's kappa, and Krippendorff's alpha. Kruskal-Wallis, Mann-Whitney U, independent t-tests, and ANOVA were used to compare responses across implementations. Results: The accuracies of ChatGPT and Google Gemini were 34.7% and 17.4%, respectively. Pre-prompting significantly increased accuracy in Gemini (p = 0.026). No significant difference was found by time of day (morning, afternoon, or evening) or between weekly implementations. Inter-rater reliability and repeatability were highly consistent. Conclusions: Pre-prompting positively affected accuracy and repeatability in both ChatGPT and Google Gemini. However, LLMs can still produce hallucinations; they may assist clinicians, but clinicians should be aware of these limitations.
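The abstract reports inter-rater agreement using the Brennan-Prediger coefficient and Cohen's kappa, among other statistics. As a minimal sketch of how these chance-corrected agreement measures are computed, the Python example below scores two raters on the same set of responses; the 0-2 rubric scale and the rating data are hypothetical assumptions for illustration, not the study's data.

```python
# Minimal sketch (not from the paper): chance-corrected agreement between
# two raters scoring the same LLM responses on a hypothetical 0-2 rubric.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement
    estimated from each rater's marginal score distribution."""
    n = len(rater_a)
    # Observed agreement: share of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement under independent rater marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def brennan_prediger(rater_a, rater_b, q):
    """Brennan-Prediger coefficient: like kappa, but chance agreement is
    fixed at 1/q for q rubric categories instead of using marginals."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    return (p_o - 1 / q) / (1 - 1 / q)

# Hypothetical scores: 0 = inaccurate, 1 = partially accurate, 2 = accurate.
rater_a = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0, 1, 2]
rater_b = [2, 1, 0, 2, 1, 1, 0, 1, 2, 0, 2, 2]
print(f"Cohen's kappa:    {cohens_kappa(rater_a, rater_b):.3f}")
print(f"Brennan-Prediger: {brennan_prediger(rater_a, rater_b, q=3):.3f}")
```

On this toy data the two coefficients differ because kappa estimates chance agreement from the raters' actual score frequencies, while Brennan-Prediger assumes all q categories are equally likely; the study applies both (alongside Fleiss's kappa and Krippendorff's alpha) to its own rating data.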
dc.identifier.doi10.52037/eads.2025.0011
dc.identifier.endpage78
dc.identifier.issn2757-6744
dc.identifier.issue2
dc.identifier.startpage71
dc.identifier.trdizinid1339523
dc.identifier.urihttps://search.trdizin.gov.tr/tr/yayin/detay/1339523
dc.identifier.urihttps://doi.org/10.52037/eads.2025.0011
dc.identifier.urihttps://hdl.handle.net/20.500.12868/3997
dc.identifier.volume52
dc.indekslendigikaynakTR-Dizin
dc.language.isoen
dc.relation.ispartofEuropean Annals of Dental Sciences (Online)
dc.relation.publicationcategoryArticle - National Peer-Reviewed Journal - Institutional Faculty Member
dc.rightsinfo:eu-repo/semantics/openAccess
dc.snmzKA_TR-Dizin_20260121
dc.subjectChatbot
dc.subjectImplant
dc.subjectChatGPT
dc.subjectProstheses
dc.titleChatGPT vs. Google Gemini: Assessment of Performance Regarding the Accuracy and Repeatability of Responses to Questions in Implant-Supported Prostheses
dc.typeArticle
