Evaluation of artificial intelligence tools in diabetes treatment management: a comparison of ChatGPT 5.2 and Gemini Pro

AI tools in diabetes treatment management

Original Research • doi:10.4328/ACAM.50111 • Published: July 1, 2026 • Ann Clin Anal Med 2026;17(7):00

Authors

Cengizhan Ceylan¹

Affiliations

¹Department of Clinical Pharmacy, Selçuk University Faculty of Pharmacy, Konya, Türkiye.

Corresponding Author

Cengizhan Ceylan

c.ceylan20@gmail.com

+90 (332) 241 00 41

Abstract

Aim: Diabetes mellitus is a chronic, wide-spectrum metabolic disorder. Clinical decision support systems are increasingly being used by physicians and patients in the management of diabetes. This study aimed to analyze the quality and reliability of responses from ChatGPT 5.2 and Gemini Pro to questions on oral antidiabetic drugs and insulins.
Methods: Thirty questions were prepared regarding oral antidiabetic drugs and insulins used in diabetes treatment. The questions were grouped under four headings: basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions). The prepared questions were then directed to ChatGPT 5.2 and Gemini Pro. The Global Quality Score was used to evaluate the responses.
Results: Among the responses generated by ChatGPT 5.2, 33.33% of the questions under the basic pharmacology heading had correct but insufficient content, while 66.66% had correct and comprehensive content; no responses containing incorrect content were found under this heading. Among the Gemini Pro responses, 16.66% of the questions under the basic pharmacology heading contained correct but insufficient content, while 83.33% contained correct and comprehensive content; under the side effects and safety profile heading, 50% contained correct but insufficient content and 50% contained correct and comprehensive content.
Conclusion: Both artificial intelligence tools have potential as clinical decision support tools in diabetes management and were found to be potentially useful as supplementary resources for healthcare professionals in clinical decision-making processes and patient education.

Keywords

artificial intelligence, decision support systems, diabetes mellitus, generative artificial intelligence

Introduction

Diabetes mellitus (DM) is a chronic, wide-spectrum metabolic disorder characterized by hyperglycemia, caused by relative or absolute insulin deficiency. According to data from the International Diabetes Federation (IDF), the number of adults with diabetes, which was 537 million in 2021, is projected to reach 783 million by 2045.¹ DM poses a serious economic burden on healthcare systems due to its significant mortality and morbidity rates.² Lifestyle modifications play a crucial role in the management of DM.³ Artificial intelligence (AI)–supported applications can be integrated into patient education and self-management strategies.⁴ Novel applications for DM treatment are being developed using machine learning approaches.⁵ Predictive clinical decision support systems (CDSS) for chronic diseases have been developed and successfully integrated with electronic health record systems.⁶ These developments necessitate the use of new technologies to improve patients' quality of life in DM management.²
The use of AI tools for predicting and preventing diabetes and DM-related complications has been steadily increasing.⁷ Intelligent CDSS play a significant role in diagnosis and treatment management by aiming to improve patient safety and reduce treatment costs.⁸ A review of studies published between January 2018 and November 2023 found that AI in CDSS may optimize medical treatment by improving diagnostic accuracy.⁹ The performance of CDSS can be enhanced through the utilization of machine learning–based algorithms.¹⁰ In the management of type 1 DM, AI–supported CDSS can be used for insulin dose adjustment and for predicting and preventing hypoglycemia.¹¹ In parallel with the expected increase in diabetes prevalence in the coming years, the use of AI–supported CDSS is also anticipated to increase.¹² Furthermore, wearable and portable devices are expected to contribute to the development of AI–supported CDSS that provide personalized recommendations for DM management.¹³
A review including studies published between January 2020 and November 2024 reported that clinical reliability, human-centered design, and system transparency in AI–supported CDSS may help increase trust in these systems.¹⁴ Trust is considered one of the most important factors influencing healthcare professionals’ adoption and use of AI–supported CDSS.¹⁵ User acceptance is another critical determinant of the success of AI–supported CDSS.¹⁶ Another important factor is the system's explainability.¹⁷ AI–supported applications are also being used to optimize medication alerts in hospital settings.¹⁸
Clinicians can also use their interpretive skills to benefit from CDSS during treatment.¹⁹ Therefore, there is a need for studies on the clinical management of DM using commonly used AI applications such as ChatGPT 5.2 and Gemini Pro. Although there are studies on the use of these two different AI applications in DM management, these studies have various limitations. The aim of this study is to analyze the quality and reliability of responses from ChatGPT 5.2 and Gemini Pro to questions prepared on oral antidiabetic drugs and insulins.

Materials and Methods

Thirty questions were prepared regarding oral antidiabetic drugs and insulins used in diabetes treatment. The questions were grouped under four headings: basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions). The prepared questions were then directed to ChatGPT 5.2 and Gemini Pro. The Global Quality Score (GQS) was used to evaluate the responses. The evaluation was performed by two independent panelists (a pharmacologist and a clinical pharmacist), and a third panelist was brought in when the two disagreed. The panelists used a 5-point Likert scale to evaluate the quality and reliability of the responses (1: Completely incorrect, 2: Largely incorrect and partially correct content, 3: Largely correct but partially incorrect content, 4: Correct but insufficient content, 5: Correct and comprehensive content).²⁰ The DM and Complications Diagnosis, Treatment, and Monitoring Guide (2024), prepared by the Turkish Endocrinology and Metabolism Association, was used as the gold standard reference in evaluating the responses provided by the AI tools.²¹
Ethical Approval: This study was approved by the Ethics Committee of Selçuk University (Date: 17.02.2026, Decision No: 26/05).
Statistical Analysis: SPSS 22.0 (Statistical Package for the Social Sciences) software was used for statistical analysis. Continuous variables are expressed as mean ± standard deviation; ordinal and nominal data are expressed as counts (%). The nonparametric Wilcoxon signed-rank test was used to compare the scores of the responses given by the two AI tools to the same questions. Weighted Cohen's kappa was calculated to evaluate the agreement between the two panelists. Kappa values were interpreted as follows: <0.20 poor agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 good agreement, and 0.81–1.00 very good agreement. Results were considered statistically significant at p<0.05 with a 95% confidence interval.
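The agreement analysis described above can be illustrated with a minimal sketch. It assumes linear disagreement weights and treats values up to and including 0.20 as "poor"; the text does not state which weighting scheme was used or how the band boundaries were handled, and the function names `weighted_kappa` and `interpret_kappa` are hypothetical. It is not the SPSS procedure used in the study.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5)):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale.

    Assumption: linear weights |i - j| / (k - 1); the study does not say
    whether linear or quadratic weights were applied.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint counts and each rater's marginal counts.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed += weight * joint[(i, j)] / n
            expected += weight * (marg_a[i] / n) * (marg_b[j] / n)

    if expected == 0:
        # Both raters used the same single category for every item.
        return 1.0
    return 1.0 - observed / expected


def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands quoted in the text."""
    if kappa <= 0.20:  # boundary handling is an assumption
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not study data.
    panelist_1 = [5, 5, 4, 4]
    panelist_2 = [5, 4, 4, 4]
    kappa = weighted_kappa(panelist_1, panelist_2)
    print(kappa, interpret_kappa(kappa))
```

Weighting matters on a 5-point scale because a 4 versus 5 disagreement is penalized less than a 1 versus 5 disagreement, which unweighted kappa would treat identically.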
Reporting Guidelines: The study was reported in accordance with STROBE guidelines.

Results

A total of 30 questions were prepared under four headings: the basic pharmacology of the drug (6 questions), side effects and safety profile (6 questions), clinical use and management of the drug (12 questions), and patient education (6 questions).
When examining the responses generated by ChatGPT 5.2, it was observed that 33.33% of the questions under the basic pharmacology heading had correct but insufficient content, while 66.66% had correct and comprehensive content; no responses containing incorrect content were found under this heading. In the side effects and safety profile area, 50% of the questions had correct but insufficient content, and 50% had correct and comprehensive content; no responses contained incorrect content in this heading. Regarding questions on the clinical use and management of the drug, 8.33% of the answers were largely correct but partially incorrect, 25% were correct but insufficient, and 66.66% were correct and comprehensive. In the patient education section, 66.66% of the questions had correct but insufficient content, while 33.33% had correct and comprehensive content. Details are given in Table 1.
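The percentages in this paragraph follow directly from the number of questions per heading. The sketch below shows the arithmetic, using score counts implied by the reported percentages (for example, 33.33% of 6 questions corresponds to 2 responses); the function name `score_distribution` and the list layout are assumptions made for illustration.

```python
from collections import Counter


def score_distribution(scores):
    """Return {score: percentage of responses} for a list of 1-5 GQS scores."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(scores)
    total = len(scores)
    return {s: round(100 * counts[s] / total, 2) for s in sorted(counts)}


if __name__ == "__main__":
    # Counts implied by the percentages reported for ChatGPT 5.2.
    basic_pharmacology = [4] * 2 + [5] * 4        # 6 questions
    clinical_use = [3] * 1 + [4] * 3 + [5] * 8    # 12 questions
    print(score_distribution(basic_pharmacology))
    print(score_distribution(clinical_use))
```

Rounding to two decimals gives 66.67 for 4 of 6 and 8 of 12 responses; the 66.66% quoted in the text is the same proportion truncated rather than rounded.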
When Gemini Pro responses were evaluated, 16.66% of the questions under the basic pharmacology heading contained correct but insufficient content, while 83.33% contained correct and comprehensive content; under the side effects and safety profile heading, 50% of the responses contained correct but insufficient content, and 50% contained correct and comprehensive content. In the clinical use and management area, 8.33% of responses were largely correct but partially incorrect, 8.33% were correct but insufficient, and 83.33% were correct and comprehensive. In the patient education section, all questions were answered correctly and comprehensively, with no incorrect responses found. No statistically significant difference was found between the responses of ChatGPT 5.2 and Gemini Pro, and the two AI tools produced responses of similar accuracy and quality (p > 0.05). Details are given in Table 1, and the full data are provided in Supplementary Table 1.
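The paired comparison reported here relies on the Wilcoxon signed-rank test. As a rough illustration of how that test treats paired ordinal scores, the sketch below implements the textbook large-sample (normal approximation) form with average ranks for ties and zero differences dropped. The function name `wilcoxon_signed_rank` and the example scores are hypothetical, and the output is not expected to reproduce the SPSS results of the study.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test via the normal approximation.

    Zero differences are discarded and tied absolute differences receive
    average ranks, with the usual tie correction to the variance.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        # Every pair is identical: no evidence of a difference.
        return {"n": 0, "w_plus": 0.0, "w_minus": 0.0, "z": 0.0, "p": 1.0}

    # Rank absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        group_size = end - start + 1
        tie_term += group_size ** 3 - group_size
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return {"n": n, "w_plus": w_plus, "w_minus": w_minus, "z": z, "p": p}


if __name__ == "__main__":
    # Hypothetical GQS scores for six questions, not study data.
    tool_a = [5, 5, 4, 5, 4, 5]
    tool_b = [4, 5, 5, 4, 4, 3]
    print(wilcoxon_signed_rank(tool_a, tool_b))
```

With only 30 paired scores and many ties, as is typical for 5-point Likert data, the normal approximation is coarse; an exact or permutation version of the test may be preferable in practice.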
A high level of agreement was found between the two panelists conducting the evaluation, and this agreement was statistically significant (weighted Cohen’s kappa = 0.848; p<0.001). Details are given in Table 2.

Discussion

In this study, the quality and reliability of responses provided by ChatGPT 5.2 and Gemini Pro to questions regarding the basic pharmacology, side-effect profile, clinical management, and patient education of oral antidiabetic drugs and insulins used in diabetes treatment were compared. The results of the study revealed that there was no statistically significant difference between ChatGPT 5.2 and Gemini Pro in terms of the quality and reliability of their responses to the questions (p>0.05). The responses provided by the two different AI tools to questions about diabetes pharmacotherapy had high GQS. Gemini Pro provided accurate and comprehensive answers to all questions under the patient education heading, but was similar to ChatGPT 5.2 in terms of the overall average. Both AI tools have potential as clinical decision support tools in DM management.
The number of studies comparing the performance of different AI tools in the fields of pharmacology and clinical pharmacy is increasing. A study from Thailand reported that ChatGPT-4o showed the highest overall accuracy in drug information queries, but that Gemini and Microsoft Copilot performed better than ChatGPT on some pharmacology-specific questions.²² In our study, Gemini Pro yielded results similar to ChatGPT 5.2, supporting the use of different AI tools to access pharmacotherapeutic information. However, Ordak et al., in their study on clinical pharmacology cases, reported that ChatGPT-4o had a significantly higher accuracy rate than Gemini Advanced 2.0 and that Gemini's responses were less consistent.²³ The absence of a significant difference in the quality and reliability of responses in our study may be explained by the use of the improved versions of both tools, Gemini Pro and ChatGPT 5.2.
Our study found that both AI tools scored highly on the topic of clinical use and management of oral antidiabetic drugs and insulin. This suggests that Gemini Pro and ChatGPT 5.2 can generate information consistent with diabetes treatment guidelines. Similarly, in a study examining different AI models in pharmacy education, it was noted that ChatGPT-4o achieved a high accuracy rate of 97.5% in its responses to questions in the field of treatment. However, the same study emphasized that AI tool performance declined on higher-level cognitive questions that require calculation and analysis.²⁴ In our study, although high GQS were obtained for knowledge-based questions such as basic pharmacology and side-effect profiles, human oversight is still considered necessary to assess the applicability of AI-generated information in clinical practice.
In a study comparing patient education materials generated by ChatGPT and Google Gemini for GLP-1 receptor agonists, ChatGPT produced longer, more detailed content, whereas Google Gemini generated materials with higher readability; however, no significant difference was observed between the two models in terms of reliability and overall quality.²⁵ Although the responses generated in our study received high-quality scores, it is believed that the information provided by AI should be verified against treatment guidelines.
In our study, Gemini Pro's high performance in the patient education category is particularly noteworthy. There is a need for comprehensive research that also evaluates the comprehensibility of the information produced by AI tools.

Limitations

The limitations of the study include the fact that the questions did not cover all complications of diabetes and rare drug side effects, and that the results obtained are valid only for a specific time period due to the continuous updating of AI models.

Conclusion

In conclusion, ChatGPT 5.2 and Gemini Pro generally provided high-quality and reliable answers to questions about oral antidiabetic drugs and insulins. Both AI tools were found to be potentially useful as supplementary resources for healthcare professionals in clinical decision-making processes and patient education. However, the information must always be verified against treatment guidelines.

Declarations

Ethics Declarations

The authors declare that all procedures performed in this study were conducted in accordance with institutional, national, and international ethical standards.

Animal and Human Rights Statement

This study did not involve human participants or animals.

Informed Consent

Not applicable.

Data Availability

The datasets used and/or analyzed during the current study are not publicly available due to patient privacy reasons, but are available from the corresponding author on reasonable request.

Conflict of Interest

The authors declare that there is no conflict of interest.

Funding

None.

Author Contributions (CRediT Taxonomy)

Conceptualization: C.C.
Methodology: C.C.
Software: C.C.
Validation: C.C.
Formal Analysis: C.C.
Investigation: C.C.
Resources: C.C.
Data Curation: C.C.
Writing – Original Draft: C.C.
Writing – Review & Editing: C.C.
Visualization: C.C.
Supervision: C.C.
Project Administration: C.C.

Scientific Responsibility Statement

The authors declare that they are responsible for the article’s scientific content, including study design, data collection, analysis and interpretation, writing, preparation and scientific review of the contents, and approval of the final version of the article.

AI Usage Disclosure

ChatGPT (version 5.2) and Gemini Pro were used as study tools for generating responses to standardized research questions. The AI systems were not used for manuscript writing, data interpretation, or statistical analysis. All evaluations, analyses, and conclusions were performed by the authors.

Abbreviations

AI: Artificial intelligence
CDSS: Clinical decision support systems
DM: Diabetes mellitus
EHR: Electronic health record
GQS: Global quality score
IDF: International diabetes federation
SPSS: Statistical package for the social sciences
STROBE: Strengthening the reporting of observational studies in epidemiology

References

1. Gieroba B, Kryska A, Sroka-Bartnicka A. Type 2 diabetes mellitus–conventional therapies and future perspectives in innovative treatment. Biochem Biophys Rep. 2025;42:102037. doi:10.1016/j.bbrep.2025.102037
2. Huang S, Liang Y, Li J, Li X. Applications of clinical decision support systems in diabetes care: scoping review. J Med Internet Res. 2023;25:e51024. doi:10.2196/51024
3. Tegegne BA, Adugna A, Yenet A, et al. A critical review on diabetes mellitus type 1 and type 2 management approaches: from lifestyle modification to current and novel targets and therapeutic agents. Front Endocrinol (Lausanne). 2024;15:1440456. doi:10.3389/fendo.2024.1440456
4. Mackenzie SC, Sainsbury CA, Wake DJ. Diabetes and artificial intelligence beyond the closed loop: a review of the landscape, promise and challenges. Diabetologia. 2024;67(2):223-235. doi:10.1007/s00125-023-06038-8
5. Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Adv Comput Intell. 2022;2(2):22. doi:10.1007/s43674-022-00034-y
6. Tarumi S, Takeuchi W, Chalkidis G, et al. Leveraging artificial intelligence to improve chronic disease care: methods and application to pharmacotherapy decision support for type 2 diabetes mellitus. Methods Inf Med. 2021;60(Suppl 1):e32-e43. doi:10.1055/s-0041-1728757
7. Contreras I, Vehi J. Artificial intelligence for diabetes management and decision support: literature review. J Med Internet Res. 2018;20(5):e10775. doi:10.2196/10775
8. Aljaaf AJ, Al-Jumeily D, Hussain AJ, et al. Toward an optimal use of artificial intelligence techniques within a clinical decision support system. In: Proceedings of the Science and Information Conference (SAI). IEEE; 2015:548-554. doi:10.1109/sai.2015.7237196
9. Ouanes K, Farhah N. Effectiveness of artificial intelligence in clinical decision support systems and care delivery. J Med Syst. 2024;48(1):74. doi:10.1007/s10916-024-02098-4
10. Ramgopal S, Sanchez-Pinto LN, Horvat CM, et al. Artificial intelligence-based clinical decision support in pediatrics. Pediatr Res. 2023;93(2):334-341. doi:10.1038/s41390-022-02226-1
11. Tyler NS, Jacobs PG. Artificial intelligence in decision support systems for type 1 diabetes. Sensors (Basel). 2020;20(11):3214. doi:10.3390/s20113214
12. Bajramagic M, Battelino T, Cos X, et al. Artificial intelligence-driven clinical decision support systems to assist healthcare professionals and people with diabetes in Europe at the point of care: a Delphi-based consensus roadmap. Diabetologia. 2026;69(25):2591-2731. doi:10.1007/s00125-025-06601-5
13. Vettoretti M, Cappon G, Facchinetti A, Sparacino G. Advanced diabetes management using artificial intelligence and continuous glucose monitoring sensors. Sensors (Basel). 2020;20(14):3870. doi:10.3390/s20143870
14. Tun HM, Rahman HA, Naing L, Malik OA. Trust in artificial intelligence–based clinical decision support systems among health care workers: systematic review. J Med Internet Res. 2025;27:e69678. doi:10.2196/69678
15. Knop M, Weber S, Mueller M, Niehaves B. Human factors and technological characteristics influencing the interaction of medical professionals with artificial intelligence–enabled clinical decision support systems: literature review. JMIR Hum Factors. 2022;9(1):e28639. doi:10.2196/28639
16. Ji M, Genchev GZ, Huang H, et al. Evaluation framework for successful artificial intelligence–enabled clinical decision support systems: mixed methods study. J Med Internet Res. 2021;23(6):e25929. doi:10.2196/25929
17. Amann J, Vetter D, Blomberg SN, et al. To explain or not to explain? Artificial intelligence explainability in clinical decision support systems. PLOS Digit Health. 2022;1(2):e0000016. doi:10.1371/journal.pdig.0000016
18. Graafsma J, Murphy RM, van de Garde EM, et al. The use of artificial intelligence to optimize medication alerts generated by clinical decision support systems: a scoping review. J Am Med Inform Assoc. 2024;31(6):1411-1422. doi:10.1093/jamia/ocae076
19. Van Baalen S, Boon M, Verhoef P. From clinical decision support to clinical reasoning support systems. J Eval Clin Pract. 2021;27(3):520-528. doi:10.1111/jep.13541
20. Sezen AI, Ozdemir MS, Ozdemir YE. Comparative evaluation of ChatGPT and Gemini in answering questions on vaccines and immunization. Genel Tip Derg. 2025;35(5):1011-1019. doi:10.54005/geneltip.1735723
21. Turkiye Endokrinoloji ve Metabolizma Dernegi. Diabetes mellitus ve komplikasyonlarinin tani, tedavi ve izlem kilavuzu. Published 2024. Accessed June 26, 2024.
22. Pornwattanakavee S, Leelakanok N, Todsarot T, et al. Effectiveness of ChatGPT, Google Gemini, and Microsoft Copilot in answering Thai drug information queries: cross-sectional study. JMIR AI. 2025;4:e79751. doi:10.2196/79751
23. Ordak M, Adamczyk J, Oskroba A, et al. Evaluation of the accuracy and reliability of responses generated by artificial intelligence related to clinical pharmacology. J Clin Med. 2025;14(21):7563. doi:10.3390/jcm14217563
24. Tran T, Le U, Phan V. Evaluating the accuracy and educational potential of generative AI models in pharmacy education: a comparative analysis of ChatGPT and Gemini across Bloom’s taxonomy. Pharmacy (Basel). 2025;14(1):1. doi:10.3390/pharmacy14010001
25. Karnan N, Nair S, Fidai FF, et al. Evaluating the efficacy of ChatGPT vs Google Gemini in generating patient education materials for GLP-1 receptor agonists (semaglutide, liraglutide, tirzepatide): a cross-sectional study. Cureus. 2025;17(4):e81993. doi:10.7759/cureus.81993

Additional Information

Publisher’s Note
Bayrakol MP remains neutral with regard to jurisdictional and institutional claims.

Rights and Permissions

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/

About This Article

How to Cite This Article

Cengizhan Ceylan. Evaluation of artificial intelligence tools in diabetes treatment management: a comparison of ChatGPT 5.2 and Gemini Pro. Ann Clin Anal Med 2026;17(7):00. doi:10.4328/ACAM.50111

Received: March 3, 2026
Accepted: April 17, 2026
Published Online: May 4, 2026
Printed: July 1, 2026