Benchmarking Different Natural Language Processing Models for Their Responses to Queries on Tooth-Supported Fixed Dental Prostheses in Terms of Accuracy and Consistency

Çolpak, Emine Dilara; Yilmaz, Deniz

dc.contributor.author	Çolpak, Emine Dilara
dc.contributor.author	Yilmaz, Deniz
dc.date.accessioned	2026-01-24T12:00:53Z
dc.date.available	2026-01-24T12:00:53Z
dc.date.issued	2025
dc.department	Alanya Alaaddin Keykubat Üniversitesi
dc.description.abstract	Aim: This study aimed to evaluate the accuracy and consistency of responses generated by four natural language processing (NLP) models to queries on tooth-supported fixed dental prostheses. Materials and Method: Twelve open-ended questions in Turkish were created and posed, with pre-prompts, to four NLP models in the morning, afternoon, and evening: OpenAI o3 (LRM-O), OpenAI GPT 4.5 (LLM-G), DeepSeek R1 (LRM-R), and DeepSeek V3 (LLM-V). The responses were evaluated with a holistic rubric. Accuracy was compared using the Kruskal–Wallis H test. Consistency between the graders’ responses was assessed using the Brennan and Prediger coefficient and the Cohen kappa coefficient. Consistency among LLMs was assessed using the Fleiss kappa and Krippendorff alpha coefficients (p < 0.05). Results: There was no statistically significant difference in accuracy between the LRM-O, LLM-G, LRM-R, and LLM-V groups (p = 0.30). The respective accuracies of LRM-O, LLM-G, LRM-R, and LLM-V were 77.7%, 50%, 66.6%, and 77.7%. In addition, the consistency among LLMs was almost perfect, whereas that among LRMs was substantial. Conclusion: Within the limitations of the study, LRMs and LLMs exhibited similar accuracy. However, the consistency among LLMs was higher than that among LRMs.
dc.identifier.doi	10.54617/adoklinikbilimler.1698260
dc.identifier.endpage	223
dc.identifier.issn	1307-3540
dc.identifier.issue	3
dc.identifier.startpage	215
dc.identifier.trdizinid	1348559
dc.identifier.uri	https://search.trdizin.gov.tr/tr/yayin/detay/1348559
dc.identifier.uri	https://doi.org/10.54617/adoklinikbilimler.1698260
dc.identifier.uri	https://hdl.handle.net/20.500.12868/3802
dc.identifier.volume	14
dc.indekslendigikaynak	TR-Dizin
dc.language.iso	en
dc.relation.ispartof	ADO Klinik Bilimler Dergisi (online)
dc.relation.publicationcategory	Article - National Peer-Reviewed Journal - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/openAccess
dc.snmz	KA_TR-Dizin_20260121
dc.subject	Artificial intelligence
dc.subject	Dental prostheses
dc.subject	Treatment protocols
dc.title	Benchmarking Different Natural Language Processing Models for Their Responses to Queries on Tooth-Supported Fixed Dental Prostheses in Terms of Accuracy and Consistency
dc.type	Article

Files

Original bundle

Name: 10.54617-adoklinikbilimler.1698260-4865318.pdf
Size: 306.94 KB
Format: Adobe Portable Document Format

Collection

TR-Dizin Indexed Publications Collection