Evaluation of artificial intelligence tools in diabetes treatment management: a comparison of ChatGPT 5.2 and Gemini Pro
AI tools in diabetes treatment management
Authors
Abstract
Aim Diabetes Mellitus is a chronic and wide-spectrum metabolic disorder. Clinical Decision Support Systems are increasingly being used by physicians and patients in the management of diabetes.
Methods Thirty questions were prepared regarding oral antidiabetic drugs and insulins used in diabetes treatment. The questions were grouped under four headings: basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions). The prepared questions were then directed to ChatGPT 5.2 and Gemini Pro. The Global Quality Score was used to evaluate the responses to the questions.
Results When examining the responses generated by ChatGPT 5.2, it was observed that 33.33% of the questions under the basic pharmacology heading had correct but insufficient content, while 66.66% had correct and comprehensive content; no responses containing incorrect content were found under this heading. When Gemini Pro responses were evaluated, it was determined that 16.66% of the questions under the basic pharmacology heading contained correct but insufficient content, while 83.33% contained correct and comprehensive content; under the side effects and safety profile heading, 50% of the responses contained correct but insufficient content, and 50% contained correct and comprehensive content.
Conclusion Both artificial intelligence tools have potential as clinical decision support tools in diabetes management and were found to be potentially useful as supplementary resources for healthcare professionals in clinical decision-making processes and patient education.
Keywords
Introduction
Diabetes Mellitus (DM) is a chronic, wide-spectrum metabolic disorder characterized by hyperglycemia, caused by relative or absolute insulin deficiency. According to data from the International Diabetes Federation (IDF), the number of adults with diabetes, which was 537 million in 2021, is projected to reach 783 million by 2045.1 DM poses a serious economic burden on healthcare systems due to its significant mortality and morbidity rates.2 Lifestyle modifications play a crucial role in the management of DM.3 Artificial intelligence (AI)–supported applications can be integrated into patient education and self-management strategies.4 Novel applications for DM treatment are being developed using machine learning approaches.5 Predictive clinical decision support systems (CDSS) for chronic diseases have been developed and successfully integrated with electronic health record systems.6 This situation necessitates the use of new technologies to improve the patient's quality of life in DM management.2
The use of AI tools for the prediction and prevention of diabetes and DM-related complications has been increasing steadily.7 Intelligent CDSS play a significant role in diagnosis and treatment management by aiming to improve patient safety and reduce treatment costs.8 A review including studies published between January 2018 and November 2023 reported that the use of AI in CDSS may optimize medical treatment by improving diagnostic accuracy.9 The performance of CDSS can be enhanced through the utilization of machine learning–based algorithms.10 In the management of type 1 DM, AI–supported CDSS can be used for insulin dose adjustment, as well as the prediction and prevention of hypoglycemia.11 In parallel with the expected increase in diabetes prevalence in the coming years, the use of AI–supported CDSS is also anticipated to increase.12 Furthermore, wearable and portable devices are expected to contribute to the development of AI–supported CDSS that provide personalized recommendations for DM management.13
A review including studies published between January 2020 and November 2024 reported that clinical reliability, human-centered design, and system transparency in AI–supported CDSS may help increase trust in these systems.14 Trust is considered one of the most important factors influencing health care professionals’ adoption and use of AI–supported CDSS.15 User acceptance is another critical determinant of the success of AI–supported CDSS.16 Another important factor is the explainability of the system.17 AI–supported applications are also being used to optimize medication alerts in hospital settings.18
Clinicians can also use their interpretation skills to benefit from CDSS during the treatment process.19 Therefore, there is a need for studies on the clinical management of DM using commonly used AI applications such as ChatGPT 5.2 and Gemini Pro. Although there are studies on the use of these two different AI applications in DM management, these studies have various limitations. The aim of this study is to analyze the quality and reliability of the responses provided by ChatGPT 5.2 and Gemini Pro to questions prepared about oral antidiabetic drugs and insulins.
Materials and Methods
Thirty questions were prepared regarding oral antidiabetic drugs and insulins used in diabetes treatment. The questions were grouped under four headings: basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions). The prepared questions were then directed to ChatGPT 5.2 and Gemini Pro. The Global Quality Score (GQS) was used to evaluate the responses to the questions. The evaluation was performed by two independent panelists (a pharmacologist and a clinical pharmacist), and a third panelist was included in the panel in case of disagreement between the panelists. The panelists used a 5-point Likert scale to evaluate the quality and reliability of the responses (1: Completely incorrect, 2: Largely incorrect and partially correct content, 3: Largely correct but partially incorrect content, 4: Correct but insufficient content, 5: Correct and comprehensive content).20 The DM and Complications Diagnosis, Treatment, and Monitoring Guide (2024), prepared by the Turkish Endocrinology and Metabolism Association, was used as the gold standard reference in evaluating the responses provided by AI tools to the prepared questions.21
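The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: the function name `final_score` is hypothetical, and the rule that the third panelist's rating settles a disagreement is an assumption, since the text states only that a third panelist joined the panel in that case.

```python
# Global Quality Score (GQS) anchors, copied from the 5-point scale in the text.
GQS_LABELS = {
    1: "Completely incorrect",
    2: "Largely incorrect and partially correct content",
    3: "Largely correct but partially incorrect content",
    4: "Correct but insufficient content",
    5: "Correct and comprehensive content",
}


def final_score(panelist_1, panelist_2, panelist_3=None):
    """Return the score recorded for one AI response.

    Assumption: when the two independent panelists disagree, the third
    panelist's rating is taken as final. The text does not specify the
    exact resolution rule.
    """
    for score in (panelist_1, panelist_2):
        if score not in GQS_LABELS:
            raise ValueError(f"GQS must be an integer from 1 to 5, got {score!r}")
    if panelist_1 == panelist_2:
        return panelist_1
    if panelist_3 not in GQS_LABELS:
        raise ValueError("a disagreement requires a valid third rating")
    return panelist_3


# Hypothetical ratings, for illustration only.
agreed = final_score(5, 5)
print(agreed, GQS_LABELS[agreed])
print(final_score(4, 5, panelist_3=4))
```

Keeping the scale in one dictionary means the same labels can be reused when tabulating results by category.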
Ethical Approval
This study was approved by the Ethics Committee of Selçuk University (Date: 17.02.2026, Decision No: 26/05).
Statistical Analysis
SPSS 22.0 (Statistical Package for the Social Sciences) software was used for statistical analysis. Continuous variables are expressed as mean ± standard deviation; ordinal and nominal data are expressed as counts (%). The nonparametric Wilcoxon Signed-Rank test was used to analyze the responses given by two different AI tools to the questions. Weighted Cohen's Kappa analysis was performed to evaluate the agreement between the two panelists. Kappa values were interpreted as follows: <0.20 poor agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 good agreement, and 0.81–1.00 very good agreement. Results were considered statistically significant at p < 0.05 with a 95% confidence interval.
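For readers without SPSS, the paired comparison can be approximated in plain Python. The sketch below implements the Wilcoxon signed-rank test using the large-sample normal approximation with a tie correction and no continuity correction; whether this matches the SPSS output exactly is an assumption. The function name and the example scores are hypothetical and are not the study data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, z, p). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0  # the paired scores are identical

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, z, p


# Hypothetical GQS ratings for eight questions, not the study data.
tool_a = [4, 5, 4, 5, 3, 5, 4, 4]
tool_b = [5, 5, 5, 5, 3, 4, 5, 5]
w_plus, z, p = wilcoxon_signed_rank(tool_a, tool_b)
print(f"W+ = {w_plus}, z = {z:.3f}, p = {p:.3f}")
```

With only five non-zero differences in this toy example, the normal approximation is rough; an exact test would be preferable for very small samples.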
Reporting Guidelines
This study was reported in accordance with the STROBE statement.
Results
A total of 30 questions were prepared under four headings: the basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions).
When examining the responses generated by ChatGPT 5.2, it was observed that 33.33% of the questions under the basic pharmacology heading had correct but insufficient content, while 66.66% had correct and comprehensive content; no responses containing incorrect content were found under this heading. In the side effects and safety profile area, 50% of the questions had correct but insufficient content and 50% had correct and comprehensive content; no responses containing incorrect content were found in this heading. Regarding questions on the clinical use and management of the drug, 8.33% of the answers were largely correct but partially incorrect, 25% were correct but insufficient, and 66.66% were correct and comprehensive. In the patient education section, 66.66% of the questions were found to have correct but insufficient content, while 33.33% had correct and comprehensive content. Details are given in Table 1.
When Gemini Pro responses were evaluated, it was determined that 16.66% of the questions under the basic pharmacology heading contained correct but insufficient content, while 83.33% contained correct and comprehensive content; under the side effects and safety profile heading, 50% of the responses contained correct but insufficient content, and 50% contained correct and comprehensive content. In the clinical use and management area, 8.33% of responses were largely correct but partially incorrect, 8.33% were correct but insufficient, and 83.33% were correct and comprehensive. In the patient education section, it was determined that all questions were answered correctly and comprehensively, with no incorrect responses found. When examining the responses provided by ChatGPT 5.2 and Gemini Pro to the questions, no statistically significant difference was found, and the two different AI tools produced responses of similar accuracy and quality (p > 0.05). Details are given in Table 1. The detailed data are provided in Supplementary Table 1.
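The percentages reported for each heading follow directly from the number of questions per heading. The sketch below back-calculates the score counts from those percentages (for example, 33.33% of 6 questions is 2 questions) and recomputes the distribution. The counts are therefore an inference from the text, not values taken from Table 1, and the names `SCORE_COUNTS` and `percentages` are hypothetical.

```python
# Score counts per heading, back-calculated from the reported percentages.
# Keys are GQS values (3, 4, 5); values are numbers of questions.
SCORE_COUNTS = {
    "ChatGPT 5.2": {
        "basic pharmacology": {4: 2, 5: 4},
        "side effects and safety profile": {4: 3, 5: 3},
        "clinical use and management": {3: 1, 4: 3, 5: 8},
        "patient education": {4: 4, 5: 2},
    },
    "Gemini Pro": {
        "basic pharmacology": {4: 1, 5: 5},
        "side effects and safety profile": {4: 3, 5: 3},
        "clinical use and management": {3: 1, 4: 1, 5: 10},
        "patient education": {5: 6},
    },
}


def percentages(counts):
    """Convert score counts for one heading into percentages (2 decimals)."""
    total = sum(counts.values())
    return {score: round(100 * n / total, 2) for score, n in counts.items()}


for tool, headings in SCORE_COUNTS.items():
    n_questions = sum(sum(c.values()) for c in headings.values())
    print(f"{tool} ({n_questions} questions)")
    for heading, counts in headings.items():
        print(f"  {heading}: {percentages(counts)}")
```

Rounding gives 66.67% and 16.67% where the text reports 66.66% and 16.66%, which suggests the published figures were truncated rather than rounded.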
A high level of agreement was found between the two panelists conducting the evaluation, and this agreement was statistically significant (weighted Cohen’s kappa = 0.848; p < 0.001). Details are given in Table 2.
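Inter-rater agreement of this kind can be computed without statistical software. The sketch below implements weighted Cohen's kappa for two raters on the 5-point scale and maps the result onto the interpretation bands given in the Statistical Analysis section. Linear weights are an assumption, because the text does not say whether linear or quadratic weights were used. The treatment of a value of exactly 0.20 as poor agreement is also an assumption, and the example ratings are hypothetical.

```python
def weighted_kappa(rater_1, rater_2, categories=(1, 2, 3, 4, 5), power=1):
    """Weighted Cohen's kappa; power=1 is linear, power=2 is quadratic."""
    n = len(rater_1)
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)

    # Observed contingency table of the two raters' scores.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_1, rater_2):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) ** power  # disagreement weight
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n**2
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - observed_disagreement / expected_disagreement


def interpret_kappa(kappa):
    """Map kappa onto the agreement bands listed in the methods."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical ratings from two panelists, not the study data.
panelist_1 = [5, 5, 4, 4, 3, 5, 4, 5]
panelist_2 = [5, 5, 4, 5, 3, 5, 4, 4]
kappa = weighted_kappa(panelist_1, panelist_2)
print(f"kappa = {kappa:.3f} ({interpret_kappa(kappa)} agreement)")
print(interpret_kappa(0.848))
```

Under these bands, the reported value of 0.848 falls in the very good agreement range.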
Discussion
In this study, the quality and reliability of responses provided by ChatGPT 5.2 and Gemini Pro to questions regarding the basic pharmacology, side effect profile, clinical management, and patient education of oral antidiabetic drugs and insulins used in diabetes treatment were compared. The results of the study revealed that there was no statistically significant difference between ChatGPT 5.2 and Gemini Pro in terms of the quality and reliability of their responses to the questions (p>0.05). The responses provided by the two different AI tools to questions about diabetes pharmacotherapy had high GQS. Gemini Pro provided accurate and comprehensive answers to all questions under the patient education heading, but was similar to ChatGPT 5.2 in terms of the overall average. Both AI tools have potential as clinical decision support tools in DM management.
The number of studies comparing the performance of different AI tools in the fields of pharmacology and clinical pharmacy is increasing. A study from Thailand reported that ChatGPT-4o showed the highest overall accuracy in drug information queries, but that Gemini and Microsoft Copilot performed better than ChatGPT on some pharmacology-specific questions. In our study, Gemini Pro yielded similar results to ChatGPT 5.2, supporting the use of different AI tools for accessing pharmacotherapeutic information.22 However, Ordak et al., in their study on clinical pharmacology cases, reported that ChatGPT-4o had a significantly higher accuracy rate than Gemini Advanced 2.0 and that Gemini's responses were less consistent. The absence of a significant difference in the quality and reliability of the responses in our study may be because both tools were tested in their improved versions, Gemini Pro and ChatGPT 5.2.23
Our study found that both AI tools scored highly on the topic of clinical use and management of oral antidiabetic drugs and insulin. This suggests that Gemini Pro and ChatGPT 5.2 can generate information consistent with diabetes treatment guidelines. Similarly, in a study examining different AI models in pharmacy education, it was noted that ChatGPT-4o achieved a high accuracy rate of 97.5% in its responses to questions in the field of treatment. However, the same study emphasized that the performance of AI tools declined on higher-level cognitive questions requiring calculation and analysis.24 In our study, although high GQS were obtained for knowledge-based questions such as basic pharmacology and side effect profiles, human oversight is still considered necessary when applying the information generated by AI tools in clinical practice.
In a study comparing patient education materials generated by ChatGPT and Google Gemini for GLP-1 receptor agonists, ChatGPT was found to produce longer and more detailed content, whereas Google Gemini generated materials with higher readability; however, no significant difference was observed between the two models in terms of reliability and overall quality.25 Although the responses generated in our study received high quality scores, the information provided by AI should still be verified against treatment guidelines.
In our study, Gemini Pro's high performance in the patient education category is particularly noteworthy. There is a need for comprehensive research that also evaluates the comprehensibility of the information produced by AI tools.
Limitations
The limitations of the study include the fact that the questions did not cover all complications of diabetes and rare drug side effects, and that the results obtained are valid only for a specific time period due to the continuous updating of AI models.
Conclusion
In conclusion, ChatGPT 5.2 and Gemini Pro generally provided high-quality and reliable answers to questions about oral antidiabetic drugs and insulins. Both AI tools were found to be potentially useful as supplementary resources for healthcare professionals in clinical decision-making processes and patient education. However, the information must always be verified against treatment guidelines.
Declarations
Ethics Declarations
The authors declare that all procedures performed in this study were conducted in accordance with institutional, national, and international ethical standards.
Animal and Human Rights Statement
This study did not involve human participants or animals. Therefore, no procedures requiring compliance with human or animal rights regulations were performed. The study was conducted in accordance with accepted ethical standards for research and with the principles of the Declaration of Helsinki.
Informed Consent
Informed consent was not required for this study because it did not involve human participants, patient data, or identifiable personal information. The study was based on the evaluation of responses generated by artificial intelligence systems using standardized questions.
Data Availability
The datasets used and/or analyzed during the current study are not publicly available due to patient privacy reasons but are available from the corresponding author on reasonable request.
Conflict of Interest
The authors declare that there is no conflict of interest.
Funding
None.
Author Contributions (CRediT Taxonomy)
Conceptualization: C.C.
Methodology: C.C.
Software: C.C.
Validation: C.C.
Formal Analysis: C.C.
Investigation: C.C.
Resources: C.C.
Data Curation: C.C.
Writing – Original Draft: C.C.
Writing – Review & Editing: C.C.
Visualization: C.C.
Supervision: C.C.
Project Administration: C.C.
Scientific Responsibility Statement
The authors declare that they are responsible for the article’s scientific content, including study design, data collection, analysis and interpretation, writing, preparation and scientific review of the contents, and approval of the final version of the article.
AI Usage Disclosure
ChatGPT (version 5.2) and Gemini Pro were used as study tools for generating responses to standardized research questions. The AI systems were not used for manuscript writing, data interpretation, or statistical analysis. All evaluations, analyses, and conclusions were performed by the authors.
Abbreviations
AI: Artificial intelligence
CDSS: Clinical decision support systems
DM: Diabetes mellitus
EHR: Electronic health record
GQS: Global Quality Score
IDF: International Diabetes Federation
SPSS: Statistical Package for the Social Sciences
STROBE: Strengthening the Reporting of Observational Studies in Epidemiology
References
1. Gieroba B, Kryska A, Sroka-Bartnicka A. Type 2 diabetes mellitus–conventional therapies and future perspectives in innovative treatment. Biochem Biophys Rep. 2025;42:102037. doi:10.1016/j.bbrep.2025.102037
2. Huang S, Liang Y, Li J, Li X. Applications of clinical decision support systems in diabetes care: scoping review. J Med Internet Res. 2023;25:e51024. doi:10.2196/51024
3. Tegegne BA, Adugna A, Yenet A, et al. A critical review on diabetes mellitus type 1 and type 2 management approaches: from lifestyle modification to current and novel targets and therapeutic agents. Front Endocrinol (Lausanne). 2024;15:1440456. doi:10.3389/fendo.2024.1440456
4. Mackenzie SC, Sainsbury CA, Wake DJ. Diabetes and artificial intelligence beyond the closed loop: a review of the landscape, promise and challenges. Diabetologia. 2024;67(2):223-235. doi:10.1007/s00125-023-06038-8
5. Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Adv Comput Intell. 2022;2(2):22. doi:10.1007/s43674-022-00034-y
6. Tarumi S, Takeuchi W, Chalkidis G, et al. Leveraging artificial intelligence to improve chronic disease care: methods and application to pharmacotherapy decision support for type 2 diabetes mellitus. Methods Inf Med. 2021;60(Suppl 1):e32-e43. doi:10.1055/s-0041-1728757
7. Contreras I, Vehi J. Artificial intelligence for diabetes management and decision support: literature review. J Med Internet Res. 2018;20(5):e10775. doi:10.2196/10775
8. Aljaaf AJ, Al-Jumeily D, Hussain AJ, et al. Toward an optimal use of artificial intelligence techniques within a clinical decision support system. In: Proceedings of the Science and Information Conference (SAI). IEEE; 2015:548-554. doi:10.1109/sai.2015.7237196
9. Ouanes K, Farhah N. Effectiveness of artificial intelligence in clinical decision support systems and care delivery. J Med Syst. 2024;48(1):74. doi:10.1007/s10916-024-02098-4
10. Ramgopal S, Sanchez-Pinto LN, Horvat CM, et al. Artificial intelligence-based clinical decision support in pediatrics. Pediatr Res. 2023;93(2):334-341. doi:10.1038/s41390-022-02226-1
11. Tyler NS, Jacobs PG. Artificial intelligence in decision support systems for type 1 diabetes. Sensors (Basel). 2020;20(11):3214. doi:10.3390/s20113214
12. Bajramagic M, Battelino T, Cos X, et al. Artificial intelligence-driven clinical decision support systems to assist healthcare professionals and people with diabetes in Europe at the point of care: a Delphi-based consensus roadmap. Diabetologia. 2026;69(25):2591-2731. doi:10.1007/s00125-025-06601-5
13. Vettoretti M, Cappon G, Facchinetti A, Sparacino G. Advanced diabetes management using artificial intelligence and continuous glucose monitoring sensors. Sensors (Basel). 2020;20(14):3870. doi:10.3390/s20143870
14. Tun HM, Rahman HA, Naing L, Malik OA. Trust in artificial intelligence–based clinical decision support systems among health care workers: systematic review. J Med Internet Res. 2025;27:e69678. doi:10.2196/69678
15. Knop M, Weber S, Mueller M, Niehaves B. Human factors and technological characteristics influencing the interaction of medical professionals with artificial intelligence–enabled clinical decision support systems: literature review. JMIR Hum Factors. 2022;9(1):e28639. doi:10.2196/28639
16. Ji M, Genchev GZ, Huang H, et al. Evaluation framework for successful artificial intelligence–enabled clinical decision support systems: mixed methods study. J Med Internet Res. 2021;23(6):e25929. doi:10.2196/25929
17. Amann J, Vetter D, Blomberg SN, et al. To explain or not to explain? Artificial intelligence explainability in clinical decision support systems. PLOS Digit Health. 2022;1(2):e0000016. doi:10.1371/journal.pdig.0000016
18. Graafsma J, Murphy RM, van de Garde EM, et al. The use of artificial intelligence to optimize medication alerts generated by clinical decision support systems: a scoping review. J Am Med Inform Assoc. 2024;31(6):1411-1422. doi:10.1093/jamia/ocae076
19. Van Baalen S, Boon M, Verhoef P. From clinical decision support to clinical reasoning support systems. J Eval Clin Pract. 2021;27(3):520-528. doi:10.1111/jep.13541
20. Sezen AI, Ozdemir MS, Ozdemir YE. Comparative evaluation of ChatGPT and Gemini in answering questions on vaccines and immunization. Genel Tip Derg. 2025;35(5):1011-1019. doi:10.54005/geneltip.1735723
21. Turkiye Endokrinoloji ve Metabolizma Dernegi. Diabetes mellitus ve komplikasyonlarinin tani, tedavi ve izlem kilavuzu. Published 2024. Accessed June 26, 2024.
22. Pornwattanakavee S, Leelakanok N, Todsarot T, et al. Effectiveness of ChatGPT, Google Gemini, and Microsoft Copilot in answering Thai drug information queries: cross-sectional study. JMIR AI. 2025;4:e79751. doi:10.2196/79751
23. Ordak M, Adamczyk J, Oskroba A, et al. Evaluation of the accuracy and reliability of responses generated by artificial intelligence related to clinical pharmacology. J Clin Med. 2025;14(21):7563. doi:10.3390/jcm14217563
24. Tran T, Le U, Phan V. Evaluating the accuracy and educational potential of generative AI models in pharmacy education: a comparative analysis of ChatGPT and Gemini across Bloom’s taxonomy. Pharmacy (Basel). 2025;14(1):1. doi:10.3390/pharmacy14010001
25. Karnan N, Nair S, Fidai FF, et al. Evaluating the efficacy of ChatGPT vs Google Gemini in generating patient education materials for GLP-1 receptor agonists (semaglutide, liraglutide, tirzepatide): a cross-sectional study. Cureus. 2025;17(4):e81993. doi:10.7759/cureus.81993
Additional Information
Publisher’s Note
Bayrakol MP remains neutral with regard to jurisdictional and institutional claims.
About This Article
How to Cite This Article
Cengizhan Ceylan. Evaluation of artificial intelligence tools in diabetes treatment management: a comparison of ChatGPT 5.2 and Gemini Pro. Ann Clin Anal Med 2026;17(7):00. doi:10.4328/ACAM.50111
- Received: March 3, 2026
- Accepted: April 17, 2026
- Published Online: May 4, 2026
- Printed: July 1, 2026
