Evaluation of artificial intelligence tools in diabetes treatment management: a comparison of ChatGPT 5.2 and Gemini Pro

AI tools in diabetes treatment management

Original Research doi:10.4328/ACAM.50111

Affiliations

1Department of Clinical Pharmacy, Selçuk University Faculty of Pharmacy, Konya, Türkiye.

Abstract

Aim: Diabetes Mellitus is a chronic and wide-spectrum metabolic disorder. Clinical Decision Support Systems are increasingly being used by physicians and patients in the management of diabetes. This study aimed to analyze the quality and reliability of the responses provided by ChatGPT 5.2 and Gemini Pro to questions about oral antidiabetic drugs and insulins.
Methods: Thirty questions were prepared regarding oral antidiabetic drugs and insulins used in diabetes treatment. The questions were grouped under four headings: basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions). The questions were then submitted to ChatGPT 5.2 and Gemini Pro. The Global Quality Score was used to evaluate the responses.
Results: Among the responses generated by ChatGPT 5.2, 6.67% of the questions under the basic pharmacology heading had correct but insufficient content, while 13.33% had correct and comprehensive content; no responses containing incorrect content were found under this heading. For Gemini Pro, 3.33% of the questions under the basic pharmacology heading had correct but insufficient content, while 16.67% had correct and comprehensive content; under the side effects and safety profile heading, 10% had correct but insufficient content and 10% had correct and comprehensive content. No statistically significant difference was found between the two tools (p > 0.05).
Conclusion: Both artificial intelligence tools have potential as clinical decision support tools in diabetes management and were found to be potentially useful as supplementary resources for healthcare professionals in clinical decision-making processes and patient education.

Keywords

artificial intelligence, decision support systems, diabetes mellitus, generative artificial intelligence

Introduction

Diabetes Mellitus (DM) is a chronic, wide-spectrum metabolic disorder characterized by hyperglycemia, caused by relative or absolute insulin deficiency. According to data from the International Diabetes Federation (IDF), the number of adults with diabetes, which was 537 million in 2021, is projected to reach 783 million by 2045.1 DM poses a serious economic burden on healthcare systems due to its significant mortality and morbidity rates.2 Lifestyle changes, oral antidiabetic drugs, and insulin form the basis of diabetes treatment.3,4 However, the complexity of treatment regimens and patients' adherence issues make achieving glycemic control difficult.5,6 This situation necessitates the use of new technologies to improve the patient's quality of life in DM management.2
Clinical Decision Support Systems (CDSS) are increasingly being used by physicians and patients in the management of DM.2 Significant progress has been made in the development of artificial intelligence (AI) supported tools for predicting and preventing diabetes-related complications.7,8 Such tools can reduce the rate of medical errors made by clinicians and increase treatment success.9,10 Research focusing on adjusting insulin treatments and preventing hypoglycemia in diabetic patients is at the forefront of this field.11,12,13
AI-based applications have been reported to optimize medication alerts, enabling the detection of more inappropriate prescriptions. In recent years, it has also been shown that AI-based applications can be used in conjunction with electronic health records (EHRs) for diabetes care. These systems can be trained using machine learning to acquire medical reasoning capabilities, but they have significant limitations in clinical practice. In particular, the accuracy of information and the risk of hallucinations must be carefully evaluated in terms of patient safety. As the information quality, transparency, explainability, and clinical reliability of these systems improve, their use is expected to increase.14,15,16,17,18
Clinicians can also use their interpretation skills to benefit from CDSS during the treatment process.19 Therefore, there is a need for studies on the clinical management of diabetes using commonly used AI applications such as ChatGPT 5.2 and Gemini Pro. Although there are studies on the use of these two different AI applications in diabetes management, these studies have various limitations. The aim of this study is to analyze the quality and reliability of the responses provided by ChatGPT 5.2 and Gemini Pro to questions prepared about oral antidiabetic drugs and insulins.

Materials and Methods

Thirty questions were prepared regarding oral antidiabetic drugs and insulins used in diabetes treatment. The questions were grouped under four headings: basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions). The questions were then submitted to ChatGPT 5.2 and Gemini Pro. The Global Quality Score (GQS) was used to evaluate the responses. The evaluation was performed by two independent panelists (a pharmacologist and a clinical pharmacist), and a third panelist was consulted in case of disagreement between them. The panelists used a 5-point Likert scale to evaluate the quality and reliability of the responses (1: Completely incorrect, 2: Largely incorrect and partially correct content, 3: Largely correct but partially incorrect content, 4: Correct but insufficient content, 5: Correct and comprehensive content).20 The DM and Complications Diagnosis, Treatment, and Monitoring Guide (2024), prepared by the Turkish Endocrinology and Metabolism Association, was used as the gold standard reference in evaluating the responses provided by the AI tools.21
Ethical Approval: This study was approved by the Ethics Committee of Selçuk University (Date: 2026-02-17, No: 26/05).
Statistical Analysis: SPSS 22.0 (Statistical Package for the Social Sciences) software was used for statistical analysis. Continuous variables are expressed as mean ± standard deviation; ordinal and nominal data are expressed as counts (%). The nonparametric Wilcoxon Signed-Rank test was used to compare the scores of the responses given by the two AI tools to the same questions. Weighted Cohen's Kappa analysis was performed to evaluate the agreement between the two panelists. Kappa values were interpreted as follows: ≤0.20 poor agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 good agreement, and 0.81–1.00 very good agreement. Results were considered statistically significant at p < 0.05 with a 95% confidence interval.
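The paired comparison described above can be illustrated with a minimal sketch. It assumes one GQS score per question for each tool and uses a normal approximation with a tie correction, which is rough for small samples. The function name `wilcoxon_signed_rank` and the example scores are hypothetical and are not the study data; SPSS may treat zero differences and small samples differently, so this is not expected to reproduce the study's output.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation, tie-corrected).

    Pairs with a zero difference are dropped, which is one common convention.
    Returns (W, p) where W is the smaller of the two signed-rank sums.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no non-zero differences, so nothing to test

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, p


# Hypothetical GQS scores for the same eight questions (not the study data).
chatgpt_scores = [5, 4, 5, 4, 5, 3, 4, 5]
gemini_scores = [5, 5, 5, 4, 4, 3, 5, 5]

w_stat, p_value = wilcoxon_signed_rank(chatgpt_scores, gemini_scores)
print(f"W = {w_stat}, p = {p_value:.3f}")
```

The test is paired because both tools answered the same questions, so each question contributes one score difference rather than two independent observations.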
Reporting Guidelines: This study was reported in accordance with the STROBE statement.

Results

A total of 30 questions were prepared under four headings: the basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions).
When examining the responses generated by ChatGPT 5.2 (percentages are given relative to all 30 questions), it was observed that 6.67% of the questions under the basic pharmacology heading had correct but insufficient content, while 13.33% had correct and comprehensive content; no responses containing incorrect content were found under this heading. In the side effects and safety profile area, 10% of the questions had correct but insufficient content and 10% had correct and comprehensive content; no responses containing incorrect content were found under this heading. Regarding questions on the clinical use and management of the drug, 3.33% of the answers were largely correct but partially incorrect, 10% were correct but insufficient, and 26.67% were correct and comprehensive. In the patient education section, 13.33% of the questions were found to have correct but insufficient content, while 6.67% had correct and comprehensive content. Details are given in Table 1.
When Gemini Pro responses were evaluated, it was determined that 3.33% of the questions under the basic pharmacology heading contained correct but insufficient content, while 16.67% contained correct and comprehensive content; under the side effects and safety profile heading, 10% of the responses contained correct but insufficient content, and 10% contained correct and comprehensive content. In the clinical use and management area, 3.33% of responses were largely correct but partially incorrect, 3.33% were correct but insufficient, and 33.33% were correct and comprehensive. In the patient education section, all questions were answered correctly and comprehensively, with no incorrect responses found. When the responses provided by ChatGPT 5.2 and Gemini Pro were compared, no statistically significant difference was found, and the two AI tools produced responses of similar accuracy and quality (p > 0.05). Details are given in Table 1, and the full data are provided in Supplementary Table 1.
A high level of agreement was found between the two panelists conducting the evaluation, and this agreement was statistically significant (Cohen’s kappa = 0.848; p < 0.001). Details are given in Table 2.
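The percentages above add up to 20%, 20%, 40%, and 20% per heading, which suggests they are shares of all 30 questions rather than of each heading. Under that assumption, the sketch below lists the response counts these percentages imply and recomputes the shares. The counts are inferred from the text, not copied from Table 1, and the names `IMPLIED_COUNTS`, `share_of_total`, and `heading_totals_match` are illustrative.

```python
TOTAL_QUESTIONS = 30

# Number of questions per heading, as stated in the Methods.
QUESTIONS_PER_HEADING = {
    "basic pharmacology": 6,
    "side effects and safety profile": 6,
    "clinical use and management": 12,
    "patient education": 6,
}

# Counts per GQS score (3, 4, or 5), inferred from the reported percentages
# under the assumption that each percentage is a share of all 30 questions.
IMPLIED_COUNTS = {
    "ChatGPT 5.2": {
        "basic pharmacology": {4: 2, 5: 4},
        "side effects and safety profile": {4: 3, 5: 3},
        "clinical use and management": {3: 1, 4: 3, 5: 8},
        "patient education": {4: 4, 5: 2},
    },
    "Gemini Pro": {
        "basic pharmacology": {4: 1, 5: 5},
        "side effects and safety profile": {4: 3, 5: 3},
        "clinical use and management": {3: 1, 4: 1, 5: 10},
        "patient education": {5: 6},
    },
}


def share_of_total(count, total=TOTAL_QUESTIONS):
    """Percentage of all questions, rounded to two decimals."""
    return round(100 * count / total, 2)


def heading_totals_match(counts_by_heading):
    """Check that the counts under each heading sum to that heading's question count."""
    return all(
        sum(by_score.values()) == QUESTIONS_PER_HEADING[heading]
        for heading, by_score in counts_by_heading.items()
    )


for tool, by_heading in IMPLIED_COUNTS.items():
    assert heading_totals_match(by_heading)
    for heading, by_score in by_heading.items():
        shares = {score: share_of_total(n) for score, n in by_score.items()}
        print(tool, "|", heading, "|", shares)
```

Checking that each heading's counts sum to its number of questions is a quick way to confirm that the reported percentages are internally consistent.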
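For readers unfamiliar with the agreement statistic, the following is a minimal sketch of a weighted Cohen's kappa on the 5-point scale. The source does not state which weighting scheme was used, so linear weights are assumed here (`power=1`), with quadratic weights available as `power=2`. The function names and the example ratings are hypothetical; the interpretation bands follow those given in the Statistical Analysis section.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5), power=1):
    """Weighted Cohen's kappa; power=1 gives linear, power=2 quadratic weights."""
    k = len(categories)
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions and each rater's marginal proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows with the distance between the two scores.
    def weight(i, j):
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    disagree_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    if disagree_exp == 0:
        return float("nan")  # undefined when both raters use a single category
    return 1 - disagree_obs / disagree_exp


def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands used in this study."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical ratings from two panelists for six responses (not the study data).
panelist_1 = [5, 4, 5, 3, 4, 5]
panelist_2 = [5, 4, 4, 3, 4, 5]

kappa = weighted_kappa(panelist_1, panelist_2)
print(f"weighted kappa = {kappa:.3f} ({interpret_kappa(kappa)} agreement)")
print(interpret_kappa(0.848))  # band for the value reported in this study
```

Weighting matters on an ordinal scale because a 4-versus-5 disagreement is penalized less than a 1-versus-5 disagreement, whereas unweighted kappa treats both as equally wrong.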

Discussion

In this study, the quality and reliability of responses provided by ChatGPT 5.2 and Gemini Pro to questions regarding the basic pharmacology, side effect profile, clinical management, and patient education of oral antidiabetic drugs and insulins used in diabetes treatment were compared. The results of the study revealed that there was no statistically significant difference between ChatGPT 5.2 and Gemini Pro in terms of the quality and reliability of their responses to the questions (p > 0.05). The responses provided by the two AI tools to questions about diabetes pharmacotherapy received high GQS ratings. Gemini Pro provided accurate and comprehensive answers to all questions under the patient education heading, but was similar to ChatGPT 5.2 in terms of the overall average. Both AI tools have potential as clinical decision support tools in DM management.
The number of studies comparing the performance of different AI tools in the fields of pharmacology and clinical pharmacy is increasing. A study from Thailand reported that ChatGPT-4o showed the highest overall accuracy in drug information queries, but that Gemini and Microsoft Copilot performed better than ChatGPT on some pharmacology-specific questions. In our study, Gemini Pro yielded similar results to ChatGPT 5.2, supporting the use of different AI tools for accessing pharmacotherapeutic information. However, Ordak et al., in their study on clinical pharmacology cases, reported that ChatGPT-4o had a significantly higher accuracy rate than Gemini Advanced 2.0 and that Gemini's responses were less consistent. The absence of a significant difference in our study may be because more recent versions of both tools, Gemini Pro and ChatGPT 5.2, were used.22,23
Our study found that both AI tools scored highly on the topic of clinical use and management of oral antidiabetic drugs and insulin. This suggests that Gemini Pro and ChatGPT 5.2 can generate information consistent with diabetes treatment guidelines. Similarly, in a study examining different AI models in pharmacy education, it was noted that ChatGPT-4o achieved a high accuracy rate of 97.5% in its responses to questions in the field of treatment. However, the same study emphasized that the performance of AI tools declined in higher-level cognitive questions requiring calculation and analysis.24 In our study, although high GQS ratings were obtained for knowledge-based questions such as basic pharmacology and side effect profiles, human oversight is still considered necessary regarding the applicability of the information generated by AI tools in clinical practice.
In a study comparing patient education materials generated by ChatGPT and Google Gemini for GLP-1 receptor agonists, ChatGPT was found to produce longer and more detailed content, whereas Google Gemini generated materials with higher readability; however, no significant difference was observed between the two models in terms of reliability and overall quality.25 Although the responses generated in our study received high-quality scores, it is believed that the information provided by AI should be verified against treatment guidelines.
In our study, Gemini Pro's high performance in the patient education category is particularly noteworthy. There is a need for comprehensive research that also evaluates the comprehensibility of the information produced by AI tools.

Limitations

The limitations of the study include the fact that the questions did not cover all complications of diabetes and rare drug side effects, and that the results obtained are valid only for a specific time period due to the continuous updating of AI models.

Conclusion

In conclusion, ChatGPT 5.2 and Gemini Pro generally provided high-quality and reliable answers to questions about oral antidiabetic drugs and insulins. Both AI tools were found to be potentially useful as supplementary resources for healthcare professionals in clinical decision-making processes and patient education. However, the information must always be verified against treatment guidelines.

Declarations

Ethics Declarations

The study did not involve human participants or animals and was based solely on the evaluation of responses generated by artificial intelligence systems. All procedures were conducted in accordance with accepted ethical standards and the principles of the Declaration of Helsinki.

Animal and Human Rights Statement

This study did not involve human participants or animals. Therefore, no procedures requiring compliance with human or animal rights regulations were performed. The study was conducted in accordance with accepted ethical standards for research and with the principles of the Declaration of Helsinki.

Informed Consent

Informed consent was not required for this study because it did not involve human participants, patient data, or identifiable personal information. The study was based on the evaluation of responses generated by artificial intelligence systems using standardized questions.

Data Availability

The datasets used and/or analyzed during the current study are not publicly available due to patient privacy reasons but are available from the corresponding author on reasonable request.

Conflict of Interest

The authors declare that there is no conflict of interest.

Funding

None.

Author Contributions (CRediT Taxonomy)

Conceptualization: C.C.
Methodology: C.C.
Software: C.C.
Validation: C.C.
Formal Analysis: C.C.
Investigation: C.C.
Resources: C.C.
Data Curation: C.C.
Writing – Original Draft: C.C.
Writing – Review & Editing: C.C.
Visualization: C.C.
Supervision: C.C.
Project Administration: C.C.

Scientific Responsibility Statement

The authors declare that they are responsible for the article’s scientific content, including study design, data collection, analysis and interpretation, writing, preparation and scientific review of the contents, and approval of the final version of the article.

AI Usage Disclosure

ChatGPT (version 5.2) and Gemini Pro were used as study tools for generating responses to standardized research questions. The AI systems were not used for manuscript writing, data interpretation, or statistical analysis. All evaluations, analyses, and conclusions were performed by the authors.

Abbreviations

AI: Artificial intelligence
CDSS: Clinical decision support systems
DM: Diabetes mellitus
EHR: Electronic health record
GQS: Global Quality Score
IDF: International Diabetes Federation
SPSS: Statistical Package for the Social Sciences
STROBE: Strengthening the Reporting of Observational Studies in Epidemiology

References

  1. Gieroba B, Kryska A, Sroka-Bartnicka A. Type 2 diabetes mellitus–conventional therapies and future perspectives in innovative treatment. Biochem Biophys Rep. 2025;42:102037. doi:10.1016/j.bbrep.2025.102037
  2. Huang S, Liang Y, Li J, Li X. Applications of clinical decision support systems in diabetes care: scoping review. J Med Internet Res. 2023;25:e51024. doi:10.2196/51024
  3. Tegegne BA, Adugna A, Yenet A, et al. A critical review on diabetes mellitus type 1 and type 2 management approaches: from lifestyle modification to current and novel targets and therapeutic agents. Front Endocrinol (Lausanne). 2024;15:1440456. doi:10.3389/fendo.2024.1440456
  4. Mackenzie SC, Sainsbury CA, Wake DJ. Diabetes and artificial intelligence beyond the closed loop: a review of the landscape, promise and challenges. Diabetologia. 2024;67(2):223-235. doi:10.1007/s00125-023-06038-8
  5. Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Adv Comput Intell. 2022;2(2):22. doi:10.1007/s43674-022-00034-y
  6. Tarumi S, Takeuchi W, Chalkidis G, et al. Leveraging artificial intelligence to improve chronic disease care: methods and application to pharmacotherapy decision support for type 2 diabetes mellitus. Methods Inf Med. 2021;60(Suppl 1):e32-e43. doi:10.1055/s-0041-1728757
  7. Contreras I, Vehi J. Artificial intelligence for diabetes management and decision support: literature review. J Med Internet Res. 2018;20(5):e10775. doi:10.2196/10775
  8. Aljaaf AJ, Al-Jumeily D, Hussain AJ, et al. Toward an optimal use of artificial intelligence techniques within a clinical decision support system. In: Proceedings of the Science and Information Conference (SAI). IEEE; 2015:548-554. doi:10.1109/sai.2015.7237196
  9. Ouanes K, Farhah N. Effectiveness of artificial intelligence in clinical decision support systems and care delivery. J Med Syst. 2024;48(1):74. doi:10.1007/s10916-024-02098-4
  10. Ramgopal S, Sanchez-Pinto LN, Horvat CM, et al. Artificial intelligence-based clinical decision support in pediatrics. Pediatr Res. 2023;93(2):334-341. doi:10.1038/s41390-022-02226-1
  11. Tyler NS, Jacobs PG. Artificial intelligence in decision support systems for type 1 diabetes. Sensors (Basel). 2020;20(11):3214. doi:10.3390/s20113214
  12. Bajramagic M, Battelino T, Cos X, et al. Artificial intelligence-driven clinical decision support systems to assist healthcare professionals and people with diabetes in Europe at the point of care: a Delphi-based consensus roadmap. Diabetologia. 2026;69(25):2591-2731. doi:10.1007/s00125-025-06601-5
  13. Vettoretti M, Cappon G, Facchinetti A, Sparacino G. Advanced diabetes management using artificial intelligence and continuous glucose monitoring sensors. Sensors (Basel). 2020;20(14):3870. doi:10.3390/s20143870
  14. Tun HM, Rahman HA, Naing L, Malik OA. Trust in artificial intelligence–based clinical decision support systems among health care workers: systematic review. J Med Internet Res. 2025;27:e69678. doi:10.2196/69678
  15. Knop M, Weber S, Mueller M, Niehaves B. Human factors and technological characteristics influencing the interaction of medical professionals with artificial intelligence–enabled clinical decision support systems: literature review. JMIR Hum Factors. 2022;9(1):e28639. doi:10.2196/28639
  16. Ji M, Genchev GZ, Huang H, et al. Evaluation framework for successful artificial intelligence–enabled clinical decision support systems: mixed methods study. J Med Internet Res. 2021;23(6):e25929. doi:10.2196/25929
  17. Amann J, Vetter D, Blomberg SN, et al. To explain or not to explain? Artificial intelligence explainability in clinical decision support systems. PLOS Digit Health. 2022;1(2):e0000016. doi:10.1371/journal.pdig.0000016
  18. Graafsma J, Murphy RM, van de Garde EM, et al. The use of artificial intelligence to optimize medication alerts generated by clinical decision support systems: a scoping review. J Am Med Inform Assoc. 2024;31(6):1411-1422. doi:10.1093/jamia/ocae076
  19. Van Baalen S, Boon M, Verhoef P. From clinical decision support to clinical reasoning support systems. J Eval Clin Pract. 2021;27(3):520-528. doi:10.1111/jep.13541
  20. Sezen AI, Ozdemir MS, Ozdemir YE. Comparative evaluation of ChatGPT and Gemini in answering questions on vaccines and immunization. Genel Tip Derg. 2025;35(5):1011-1019. doi:10.54005/geneltip.1735723
  21. Turkiye Endokrinoloji ve Metabolizma Dernegi. Diabetes mellitus ve komplikasyonlarinin tani, tedavi ve izlem kilavuzu. Published 2024. Accessed June 26, 2024.
  22. Pornwattanakavee S, Leelakanok N, Todsarot T, et al. Effectiveness of ChatGPT, Google Gemini, and Microsoft Copilot in answering Thai drug information queries: cross-sectional study. JMIR AI. 2025;4:e79751. doi:10.2196/79751
  23. Ordak M, Adamczyk J, Oskroba A, et al. Evaluation of the accuracy and reliability of responses generated by artificial intelligence related to clinical pharmacology. J Clin Med. 2025;14(21):7563. doi:10.3390/jcm14217563
  24. Tran T, Le U, Phan V. Evaluating the accuracy and educational potential of generative AI models in pharmacy education: a comparative analysis of ChatGPT and Gemini across Bloom’s taxonomy. Pharmacy (Basel). 2025;14(1):1. doi:10.3390/pharmacy14010001
  25. Karnan N, Nair S, Fidai FF, et al. Evaluating the efficacy of ChatGPT vs Google Gemini in generating patient education materials for GLP-1 receptor agonists (semaglutide, liraglutide, tirzepatide): a cross-sectional study. Cureus. 2025;17(4):e81993. doi:10.7759/cureus.81993

Additional Information

Publisher’s Note
Bayrakol MP remains neutral with regard to jurisdictional and institutional claims.

Rights and Permissions

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/

About This Article

Received: March 3, 2026
Accepted: April 17, 2026
Published Online: May 4, 2026