Background

F1000Research

2046-1402

F1000 Research Limited

London, UK

10.12688/f1000research.179913.2

Research Article

Articles

Web-based Machine Learning Model for Predicting Chronic Kidney Disease in Patients with Type 2 Diabetes Mellitus: A Multicenter Study

[version 2; peer review: 1 approved, 1 approved with reservations]

Kresnowati

Lily

Conceptualization Data Curation Formal Analysis Funding Acquisition Investigation Methodology Project Administration Resources Software Validation Visualization Writing – Original Draft Preparation Writing – Review & Editing a 1 Suhartono

Suhartono

Supervision Validation Writing – Original Draft Preparation Writing – Review & Editing 2 Shaluhiyah

Zahroh

Supervision Validation Writing – Original Draft Preparation Writing – Review & Editing https://orcid.org/0000-0003-2663-7918 3 Widjanarko

Bagoes

Supervision Validation Writing – Original Draft Preparation Writing – Review & Editing 3 Hasan

Faizul

Conceptualization Data Curation Formal Analysis Funding Acquisition Investigation Methodology Project Administration Resources Software Supervision Validation Visualization Writing – Original Draft Preparation Writing – Review & Editing https://orcid.org/0000-0001-7802-1328 b 4 1Doctoral Program of Public Health, Faculty of Public Health, Universitas Diponegoro, Semarang, Central Java, Indonesia 2Department of Environmental Health, Diponegoro University School of Public Health, Semarang, Central Java, Indonesia 3Department of Health Promotion and Behavioral Science, Faculty of Public Health, Universitas Diponegoro, Semarang, Central Java, Indonesia 4Faculty of Nursing, Chulalongkorn University, Bangkok, Bangkok, Thailand

a lilykresnowati@undip.ac.id b faizul.h@chula.ac.th

No competing interests were disclosed.

10 6 2026

2026

690

3 6 2026

2026

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Chronic kidney disease (CKD) is a serious complication of type 2 diabetes (T2DM), particularly in low- and middle-income countries with limited access to early diagnosis. Predicting CKD risk using routine clinical data could enable earlier nephroprotective care. This study developed and internally validated a machine learning-based web application to predict incident CKD among T2DM patients in Indonesia’s national health insurance program (Prolanis).

Methods

A machine learning prediction model was conducted using BPJS Prolanis data (2017–2023). Adults (≥18 years) with T2DM and no prior CKD were included. Six algorithms (Logistic Regression, Random Forest, Decision Tree, XGBoost, LightGBM, CatBoost) were trained on 80% of the data and internally validated on the remaining 20% to predict CKD. Performance was assessed via accuracy, precision, recall, F1 score, and AUC. SHAP was used for interpretability.

Results

Among 7,581 individuals, 864 (11.4%) developed CKD. CatBoost achieved the best performance (AUC = 0.847, accuracy = 0.797, precision = 0.643, recall = 0.525, F1 = 0.578). SHAP identified rapid-acting insulin analogues, amlodipine, furosemide, high blood urea nitrogen, and folic acid as key positive predictors. Advanced age and higher comorbidity burden increased risk, while chronic ischaemic heart disease and dental pulp diseases appeared protective—likely due to healthcare utilization bias. A web-based risk calculator was developed.

Conclusions

The CatBoost-based web app demonstrated strong discriminative ability for predicting incident CKD in T2DM patients using routine claims data. This tool may support risk stratification in primary care settings across Indonesia and similar low-resource environments.

chronic kidney disease type 2 diabetes mellitus machine learning prediction model web-based calculator.

The author(s) declared that no grants were involved in supporting this work.

Revised Amendments from Version 1

This updated version of the manuscript incorporates essential structural and reporting enhancements to fully comply with TRIPOD statement criteria and peer-review recommendations. Table 1 has been revised to stratify baseline clinical data alongside corresponding $p$-values, a new participant flow diagram has been incorporated as Figure 1, and all performance indicators have been standardised to three decimal places. The content has been thoroughly checked for formatting consistency, and the conclusion has been updated to explicitly present the web-based calculator as a preliminary screening tool poised for potential incorporation into Indonesia's national BPJS P-Care platform.

Introduction

Chronic kidney disease (CKD) constitutes a significant global public health challenge, with particularly concerning trends observed in low- and middle-income nations. ^{1,
2} Characterised by a gradual decline in renal function, CKD significantly elevates the risks of cardiovascular incidents, end-stage renal disease (ESRD), hospitalisation, and early mortality. ³ Approximately 10% of persons globally suffer from CKD, with diabetes mellitus and hypertension responsible for more than half of the cases. ⁴ The prevalence of CKD in Indonesia has consistently increased over the last decade, primarily due to the rising incidence of type 2 diabetes mellitus (T2DM), which currently impacts around 10.7 million adults nationwide. ^{5,
6} The economic and societal burdens are significant: ^{7,
8} chronic kidney disease necessitates continuous care, regular monitoring, and, in advanced stages, expensive renal replacement therapy.

Individuals with T2DM are at an elevated risk of developing CKD, referred to as diabetic kidney disease (DKD). Between 20% and 40% of patients with T2DM will develop DKD during their lifetime, establishing diabetes as the predominant cause of End-Stage Renal Disease in numerous countries. ^{9–
12} The etiology of DKD include intricate connections among hyperglycemia, haemodynamic changes, inflammation, and genetic predisposition. ^{13,
14} Hyperglycemia triggers glomerular hyperfiltration, oxidative stress, and the buildup of advanced glycation end-products, ultimately leading to glomerulosclerosis, tubulointerstitial fibrosis, and progressive nephron loss. ^{14,
15} Well-defined risk factors for DKD encompass advanced age, male gender, prolonged diabetes duration, higher haemoglobin A1c levels, hypertension, dyslipidaemia, obesity, and tobacco use. ^{9,
16} Timely diagnosis of T2DM patients at increased CKD risk is essential for the initiation of effective nephroprotective strategies: stringent glycaemic management, blood pressure control through RAAS inhibition, and lifestyle modifications.

Conventional DKD risk prediction techniques predominantly employ logistic regression or Cox proportional hazards, utilising a restricted set of predetermined factors. Despite achieving moderate performance (AUC values generally between 0.70 and 0.80), these models exhibit significant limitations: they presuppose linear relationships between predictors and outcomes, failing to adequately represent the intricate biology of DKD; they frequently necessitate comprehensive data on all predictors, which is often inaccessible in resource-limited environments; and many are developed from specific clinical trial populations or small, single-center cohorts, thereby raising concerns regarding their generalisability. ^{17,
18}

Machine learning (ML) provides a robust alternative for predicting clinical risks. Ensemble tree-based methodologies—specifically CatBoost, XGBoost, and LightGBM—exhibit considerable promise owing to their resilience against outliers, capacity to manage heterogeneous data types, and intrinsic mechanisms for addressing missing values, frequently surpassing logistic regression with AUCs ranging from 0.80 to 0.90. ^{19,
20} Nonetheless, significant deficiencies persist: the majority of ML research originates from high-income nations, with scant contributions from Southeast Asia, particularly Indonesia; numerous models depend on laboratory metrics that are not easily accessible in primary care settings; practical clinical application featuring user-friendly web interfaces is restricted; and model interpretability has only recently been tackled via SHAP analysis. ²¹

Indonesia’s national health insurance system (BPJS Kesehatan) encompasses roughly 80% of the populace and administers Prolanis, a chronic disease management initiative that methodically enrols T2DM patients and produces extensive real-world clinical data. Nonetheless, no machine learning-based instrument has been specially designed for the Indonesian T2DM population to forecast incident chronic kidney disease utilising frequently gathered Prolanis data. This study sought to create and internally validate a machine learning predictive model for incident chronic kidney disease in patients with T2DM enrolled in BPJS Indonesia’s national health insurance chronic disease management program (Prolanis), utilising routinely gathered demographic, clinical, and medication data. The study aimed to improve interpretability through SHAP analysis and to offer a practical web-based calculator similar to previously published AI risk assessment tools. ²²

Material and methods Study design and data source

We predictive ML study utilising a sample dataset from Indonesia’s Badan Penyelenggara Jaminan Sosial (BPJS) Chronic Disease Management Program, referred to as Prolanis. The collection included anonymised patient data gathered from January 1, 2017, to December 31, 2023. The BPJS Prolanis database comprises organised data on patient demographics, clinical diagnoses (classified according to the International Classification of Diseases, 10th Revision [ICD-10]), medication prescriptions (classified using the Anatomical Therapeutic Chemical [ATC] system), laboratory results, and outpatient visit records. The research adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) standards.

Study population

The trial cohort consisted of patients diagnosed with T2DM who engaged in the Prolanis program. Adults (≥18 years) were included if they had a minimum of two documented outpatient visits during the study period. The index date was established as the date of the initial T2DM diagnosis documented within the observation period.

Patients were excluded if they had a previous diagnosis of CKD at the start of their monitoring period. CKD was categorised with ICD-10 codes N18.1–N18.9 (chronic kidney disease stages 1–5, unspecified), alongside Z49 (dialysis care) and Z99.2 (dependency on renal dialysis). Furthermore, patients with absent outcome data or insufficient essential characteristic information were removed from the study.

Outcome definition

The primary outcome of this study was the onset of incident CKD after the diagnostic date of T2DM. CKD was established by a combination of diagnostic and laboratory criteria to guarantee thorough case identification. A patient was classified as having developed CKD if they met one of the following criteria: (1) a new ICD-10 diagnosis code for CKD (N18.1–N18.9, Z49 for dialysis care, or Z99.2 for renal dialysis dependence) recorded during the follow-up period, or (2) laboratory evidence of renal dysfunction noted in structured clinical documentation, defined as an estimated glomerular filtration rate (eGFR) below 60 mL/min/1.73m ² or the presence of albuminuria. Patients were monitored from the index date until the occurrence of the earliest event: diagnosis of CKD, death, conclusion of accessible data (December 31, 2023), or the last recorded clinical visit, whichever transpired first.

Data collection and input features

Data at the patient level were retrieved from the BPJS Prolanis database, including demographic details, clinical comorbidities, drug prescriptions, and available laboratory findings. Demographic characteristics encompassed age as a continuous variable (quantified in years), sex (male or female), and overweight or obesity status, defined as a body mass index of 25 kg/m ² or higher when such data were accessible. Comorbidities were detected with ICD-10 codes recorded on the index date or within the year prior; they encompassed hypertension, cardiovascular disease, heart failure, and the aggregate of unique diagnoses as an indicator of overall illness burden. Medication utilisation was assessed using Anatomical Therapeutic Chemical (ATC) codes, with a patient deemed exposed if they had a minimum of one prescription fill for a specific medication within the six months before to or subsequent to the index date. Medications of interest encompassed antidiabetic therapies (notably rapid-acting insulin analogues), aspirin, proton pump inhibitors (PPIs), non-steroidal anti-inflammatory drugs (NSAIDs), amlodipine, furosemide, folic acid, and other antihypertensive or antiplatelet agents. Laboratory data, particularly blood urea nitrogen (BUN) levels, were obtained where accessible. All features, with the exception of age, were binarized to indicate presence or absence, although age was maintained as a continuous variable. Only instances with complete data on essential features and the outcome were included in the final analysis.

Model development and internal validation

We developed and internally validated six ML algorithms aimed to predict incident CKD: Logistic Regression, Random Forest, Decision Tree, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and Categorical Boosting (CatBoost). All models were executed utilising the PyCaret open-source machine learning package in Python. The complete dataset was randomly divided into a training set consisting of 80% of the patients and an internal validation (test) set including the remaining 20%, with stratification by outcome status to maintain the proportion of CKD cases in both subsets. Hyperparameters for each algorithm were calibrated using 10-fold cross-validation on the training dataset to enhance model performance and mitigate overfitting. Subsequent to hyperparameter adjustment, each model was trained on the training set and subsequently evaluated on the hold-out test set to determine its predictive efficacy on novel data. The CatBoost model, exhibiting superior overall performance, was chosen as the final model for subsequent interpretation and feature importance analysis.

Performance metrics

The model’s performance was evaluated using a thorough array of conventional classification measures to facilitate comparison among the six algorithms. Accuracy was determined as the ratio of right predictions (true positives plus true negatives) to total predictions. Precision, defined as the ratio of true positives to the sum of true positives and false positives, was employed to reduce false-positive predictions. Recall, or sensitivity, was computed as the ratio of true positives to the total of true positives and false negatives, indicating the model’s proficiency in accurately identifying actual CKD cases. The F1 score, defined as the harmonic mean of precision and recall, is calculated as 2 times (precision × recall) divided by (precision + recall), offering a balanced assessment of model accuracy that considers both false positives and false negatives. The area under the receiver operating characteristic curve (ROC-AUC) was employed to evaluate the model’s capacity to differentiate between patients who developed CKD and those who did not across various classification thresholds; an AUC of 0.5 signifies random performance, whereas an AUC of 1.0 denotes perfect discrimination.

Feature importance and explainability

SHapley Additive exPlanations (SHAP) were employed to explain model predictions and assess the contribution of each feature to the ultimate model output. A summary plot was created to illustrate the ten most significant elements. In the SHAP summary graphic, red signifies elevated feature values (augmenting CKD probability), whereas blue denotes diminished feature values (reducing CKD probability). SHAP values were calculated for the CatBoost model, which exhibited the greatest ROC-AUC among all evaluated algorithms.

Statistical analysis

All statistical analyses were conducted utilising Python (version 3.9) alongside the PyCaret, scikit-learn, and SHAP libraries. Continuous data are expressed as mean ± standard deviation (SD), whereas categorical variables are represented as frequencies and percentages. Baseline parameters were compared between individuals who developed CKD and those who did not, utilising independent t-tests for continuous variables and chi-square testing for categorical data. A two-tailed p-value of less than 0.05 was deemed statistically significant. No correction for multiple comparisons was implemented owing to the exploratory character of the model construction.

Ethics approval and consent to participate

The study utilised de-identified secondary data obtained from the BPJS Kesehatan Prolanis database. Ethical approval was obtained from the Institutional Review Board of the Faculty of Public Health at Diponegoro University (approval number: 1.EA/KEPK-FKM/2026). The ethics committee waived the requirement for informed consent due to the study’s reliance on retrospective analysis of anonymised secondary data, which entailed no direct interaction with human participants and did not provide researchers with any identifying information at any point. Consequently, no written nor verbal informed permission was acquired, in accordance with the committee’s waiver.

Results Study population and baseline characteristics

A total of 7,581 patients were included in the final analysis ( Figure 1). The average age of the cohort was 54.2 years (SD ± 9.0), with a predominance of females (54.1%). The average number of hospital visits per patient was 90.8, and the average number of recorded diagnoses was 30.6. 27.2% of the population exhibited overweight or obesity. Hypertension was the predominant comorbidity at 9.7%, succeeded by cardiovascular disease at 5.1% and heart failure at 0.3%. Antidiabetics were prescribed to 85.6% of patients, aspirin to 51.2%, proton pump inhibitors to 17.4%, and NSAIDs to 1.6%. CKD was identified in 864 patients (11.4%) ( Table 1).

Figure 1. Participant flow chart.

N = number of patients; CKD = chronic kidney disease.

Table 1. Demographic and outcome characteristic.

Variables	n (%)
Total patients	7581 (100)
Age, years (mean ± SD)	54.2 ± 9.0
Gender
Male	3477 (45.9)
Female	4104 (54.1)
Visit numbers, mean	90.8
Overweight/Obesity	2061 (27.2)
Comorbidity
Hypertension	732 (9.7)
Cardiovascular disease	385 (5.1)
Heart failure	22 (0.3)
Diagnoses count, mean	30.6
Drugs
Antidiabetics	6493 (85.6)
Aspirin	3879 (51.2)
Proton Pump Inhibitors	1320 (17.4)
NSAIDs	125 (1.6)
Outcome of having CKD
Yes	864 (11.4)
No	6717 (88.6)

CKD = Chronic Kidney Disease; n = number; NSAIDs = Non Steroid Anti Inflammatory Drugs; SD = standard deviation.

Model development and internal validation

Six machine learning algorithms were developed and subjected to internal validation. The CatBoost classifier attained the greatest ROC-AUC of 0.847, succeeded by Random Forest at 0.840, LightGBM at 0.836, Logistic Regression at 0.826, XGBoost at 0.821, and Decision Tree at 0.697. Regarding accuracy, Random Forest exhibited the highest performance at 0.810, whilst both CatBoost and Logistic Regression attained an accuracy of 0.797. CatBoost exhibited a precision of 0.643, a recall of 0.525, and an F1 score of 0.578 ( Table 2). The receiver operating characteristic (ROC) curves for all six models indicated that CatBoost had the highest true-positive rate over the majority of false-positive rate thresholds ( Figure 2).

Table 2. Performance comparison of the six machine learning models.

Model	Accuracy	Precision	Recall	F1	ROC_AUC
Lr	0.796992	0.651376	0.503546	0.568000	0.826196
Rf	0.810150	0.663934	0.574468	0.615970	0.839900
Dt	0.757519	0.540541	0.567376	0.553633	0.696731
Xgboost	0.798872	0.639344	0.553191	0.593156	0.820736
Lgbm	0.791353	0.622951	0.539007	0.577947	0.835673
Catboost	0.796992	0.643478	0.524823	0.578125	0.847373

Catboost = Categorical Boosting; Dt = Decision Tree; F1 = F1-score; Lgbm = Light Gradient Boosting Machine; Lr = Logistic Regression; Rf = Random Forest; ROC_AUC = Area Under the Receiver Operating Characteristic Curve; Xgboost = Extreme Gradient Boosting.

Figure 2. Receiver operating characteristic curve of top 5 model.

AUC = area under the curve; ROC = receiver operating characteristic curve.

Feature importance and SHAP analysis

SHAP research determined the ten most significant features influencing model predictions. Elevated levels (red) of rapid-acting insulin analogue utilisation were significantly correlated with a heightened likelihood of CKD, whereas diminished levels (blue) were linked to a reduced risk. Likewise, elevated levels of amlodipine, furosemide, folic acid, and BUN augmented the probability of CKD prediction. Conversely, advanced age, chronic ischaemic heart disease, and conditions affecting the pulp and periapical tissues were linked to a diminished predicted chance of CKD ( Figure 3). The protective association of chronic ischaemic heart disease and dental pulp problems may indicate healthcare-seeking behaviours or unaccounted confounding factors in claims-based data.

Figure 3. Shapley additive explanations analysis summary plot for top 10 feature important.

The red color in upper right side indicating the feature that has more possibility on the development of cognitive impairment. The blue upper left side indicating the features that has less possibility on the development of cognitive impairment.

Web-based calculator

A user-friendly web-based risk calculator was built utilising the CatBoost model to enhance clinical application. The calculator enables clinicians to input patient demographics, comorbidities, medications, and BUN values to get a personalised CKD risk assessment. The interface presents the primary contributing elements for each prediction, hence improving model transparency and clinical utility ( Figure 4).

Figure 4. The web-based machine learning calculator system.

The data entry for “Medication” and “Diagnoses” can be more than one.

Discussion

To the best of our knowledge, this study is among the first to develop and validate a web-based machine learning tool for predicting incident CKD in patients with T2DM participating in Prolanis program in Indonesia. In this study of 7,581 patients, we established that the CatBoost classifier attained the superior discriminative performance, evidenced by a ROC-AUC of 0.847, an accuracy of 0.797, and an F1 score of 0.578. The findings indicate that machine learning models, especially gradient boosting algorithms, might function as efficient screening tools able to identify high-risk T2DM patients who could benefit from early nephroprotective therapies.

Our results align with prior research utilising machine learning to predict CKD in diabetic cohorts. A thorough evaluation indicated that machine learning models for predicting diabetic kidney disease generally attain AUC values between 0.80 and 0.90, with gradient boosting techniques frequently surpassing conventional logistic regression. ¹⁹ A study by Song et al. (2020) utilising the Korean National Health Insurance Service database revealed that XGBoost attained an AUC of 0.84 for predicting CKD progression in T2DM patients, closely corresponding to our CatBoost AUC of 0.847. ²⁰ The equivalent efficacy of CatBoost in our analysis underscores the increasing agreement that ensemble tree-based techniques are adept at managing the high-dimensional, heterogeneous clinical data characteristic of real-world electronic health records.

However, our model’s recall (0.525) and F1 score (0.578) were moderate, suggesting that although the model has reasonable overall accuracy, it exhibits limited sensitivity in identifying all real CKD cases. This phenomenon is prevalent in imbalanced datasets where the result (CKD) manifests in merely 11.4% of the population, as observed in our cohort. Similar issues have been documented in other studies; for example, research utilising a Japanese claims database indicated a recall of 0.58 for CKD prediction employing random forests. ²³ These findings emphasise the necessity for prudence in utilising such models as exclusive decision-making instruments and stress the significance of integrating machine learning predictions with clinical expertise.

Our SHAP analysis discovered numerous significant predictors of incident CKD, many of which are physiologically plausible and align with current clinical knowledge. The utilisation of rapid-acting insulin analogues was one of the most significant predictors of CKD progression, with elevated values (shown in red on the SHAP summary plot) correlating with a heightened chance of CKD occurrence. This finding presumably indicates confounding by indication rather than a direct nephrotoxic effect of insulin. Patients necessitating rapid-acting insulin generally exhibit prolonged diabetes duration, suboptimal glycaemic management, and heightened insulin resistance—each serving as separate risk factors for diabetic kidney damage. The United Kingdom Prospective Diabetes Study (UKPDS) established that intensive glucose-lowering therapy, encompassing insulin, mitigated the progression of microalbuminuria, indicating that insulin administration serves as an indicator of disease severity rather than a causative factor in CKD. ^{24,
25}

The finding that the usage of amlodipine and furosemide elevates the probability of CKD can be interpreted as a reflection of the underlying disease burden. Amlodipine, a calcium channel blocker, is frequently used for hypertension, which impacts roughly 9.7% of our cohort and serves as a significant risk factor for the advancement of CKD. Furosemide, a loop diuretic, is frequently used to patients experiencing fluid overload, which may indicate deteriorating renal function. Although certain experimental studies have suggested that calcium channel blockers may expose glomerular capillaries to elevated systemic pressures, findings from major clinical trials, including the African American Study of Kidney Disease and Hypertension (AASK), indicate that CCB-based therapy may provide less renal protection compared with renin–angiotensin system blockade, particularly in patients with hypertensive CKD. ²⁶ Consequently, while these agents remain appropriate when clinically indicated, their use—especially as monotherapy—should be accompanied by careful monitoring of renal function, given their comparatively limited renoprotective effects.

In contrast, our SHAP analysis indicated that chronic ischaemic heart disease and conditions affecting the pulp and periapical tissues seemed to confer protective benefits against CKD, a result that necessitates meticulous interpretation. The observed protective effect of chronic ischaemic heart disease may be attributed to healthcare utilisation bias. Patients with diagnosed cardiovascular disease generally experience more frequent clinician visits, enhanced medication adherence (including antihypertensive and antiplatelet medications), and more rigorous control of risk factors than patients without these diagnoses. Research indicates that a high adherence rate to antihypertensive drugs (≥80%) correlates with a 33% decrease in the risk of end-stage renal disease. ^{27–
29} Likewise, the protective influence of pulp and periapical tissue illnesses may indicate superior health-seeking behaviour, as individuals who obtain regular dental care are likely more proactive in managing diabetes. ³⁰ Conversely, non-surgical periodontal therapy has demonstrated the capacity to diminish systemic inflammation, perhaps decelerating the progression of CKD, although residual confounding remains a possibility.

The CatBoost model exhibited strong internal validation performance, with a ROC-AUC of 0.847, which is advantageous compared to previously published prediction methods for CKD. A systematic study assessed CKD prediction models and found that most attained AUC values ranging from 0.70 to 0.85, ¹⁷ positioning our model within the higher spectrum of available methods. Furthermore, our utilisation of standard administrative claims data—rather than specialised laboratory assessments or imaging—augments the model’s scalability and practical relevance in low- and middle-income contexts such as Indonesia, where access to advanced diagnostic testing may be constrained.

Nonetheless, particular aspects of model performance warrant examination. The CatBoost model’s precision (0.643) significantly exceeded its recall (0.525), signifying that while the model predicts CKD, it is accurate about 64% of the time, although it fails to identify nearly half of the actual CKD cases. The compromise between precision and recall is permissible in a screening environment, when the objective is to identify a group of high-risk patients for confirmatory tests rather than to establish definite diagnoses. ³¹ The web-based tool we created enables doctors to modify the classification threshold according to local resources and preferences; for example, a reduced threshold enhances recall (identifying more true cases) while compromising precision (resulting in more false positives necessitating follow-up testing).

This study possesses numerous significant strengths. The utilisation of an extensive, real-world dataset from Indonesia’s national health insurance program (BPJS) ensures significant validity for the Indonesian population and presents a model that can be incorporated into current digital health frameworks. The incorporation of several machine learning methods, accompanied by systematic hyperparameter optimisation and internal validation, adheres to best-practice guidelines for the building of predictive models. Third, employing SHAP analysis improves model interpretability, countering a prevalent critique of “black box” machine learning models in clinical medicine. ²¹ The development of an intuitive web-based calculator enables prospective integration into clinical practice.

However, some limitations must be recognised. The retrospective cohort design includes potential biases associated with secondary data analysis, such as indication bias (as noted with insulin and antihypertensive drugs) and detection bias (patients with more frequent visits are more likely to receive a diagnosis of CKD). Secondly, the dataset was constrained to variables typically gathered in claims data; we could not include significant clinical factors such as smoking status, alcohol intake, physical activity, dietary habits, family history of kidney disease, comprehensive blood pressure readings, haemoglobin A1c levels, or urine albumin-to-creatinine ratios. The lack of these recognised risk indicators may have constrained model performance, especially recall. Third, the utilisation of ICD-10 codes for outcome determination may have led to misclassification bias, given chronic kidney disease is recognised to be under-represented in administrative databases. However, our composite definition, which incorporates laboratory criteria, partially alleviates this problem. The dataset was derived from a singular health insurance program in Indonesia, perhaps constraining its generalisability to other populations with varying genetic backgrounds, healthcare systems, and practice patterns. External validation with separate datasets from other areas or nations is crucial prior to extensive implementation. Fifth, the complete-case analysis, which excluded patients with missing data, may have created selection bias if the missingness was not entirely random. Sixth, we did not conduct external validation utilising a temporally or geographically separate dataset, which is a crucial subsequent step to verify the model’s generalisability and resilience against overfitting. Finally, The moderate recall noted in this study indicates the intrinsically uneven character of the dataset, considering the 11.4% prevalence of CKD. To address this in future iterations, technical solutions such as employing sophisticated oversampling methods (e.g., Synthetic Minority Over-sampling Technique, SMOTE) or systematically modifying algorithmic class weights could be investigated to enhance model sensitivity.

Future study must emphasise the external validation of the CatBoost model with independent datasets from various healthcare systems in Southeast Asia and beyond. Prospective validation studies would evaluate the model’s real-world efficacy and clinical applicability, encompassing its influence on clinician decision-making and patient outcomes. Subsequent model enhancement could integrate supplementary predictors, including longitudinal changes in eGFR, albuminuria, and haemoglobin A1c, thereby augmenting predictive precision. Ultimately, interventional studies are required to ascertain if machine learning-guided risk classification results in earlier nephrology referrals, enhanced blood pressure and glucose management, and ultimately a decreased incidence of end-stage renal disease in high-risk T2DM patients.

Conclusions

This study effectively constructed and internally validated a CatBoost-based machine learning model for predicting incident CKD in patients with T2DM, utilising routinely obtained claims data from Indonesia’s BPJS Prolanis program. The model exhibited strong discriminative capability (AUC 0.847) and recognised clinically significant risk factors, such as rapid-acting insulin use, amlodipine, furosemide, and increased BUN levels. The model’s moderate recall indicates potential for enhancement, yet its high precision and interpretability highlight its utility as a preliminary screening tool to inform clinical suspicion and identify high-risk individuals for targeted preventive interventions, rather than serving as an independent diagnostic instrument. A web-based calculator was created to enhance clinical application. This web-based calculator is ideal for Indonesia’s primary care digital environment as a preliminary screening tool. The CatBoost algorithm embedded directly into the national BPJS P-Care system would allow general practitioners to perform real-time risk stratification during routine visits and provide timely nephroprotective interventions for high-risk type 2 diabetes patients in the Prolanis program. Future external validation and prospective implementation studies are necessary prior to extensive clinical deployment.

Data availability statement

All datasets supporting this article are accessible via the following link: https://doi.org/10.5281/zenodo.19634415. ³²

Data are available under the terms of the Creative Commons Attribution 4.0 International.

Extended data

Zenodo: Supplementary data: https://doi.org/10.5281/zenodo.19521819. ³³

This project contains the following extended data: •

Tripod Checklist.docx.

Data are available under the terms of the Creative Commons Attribution 4.0 International.

Acknowledgements

We thank BPJS Kesehatan for access to the Prolanis dataset and the Faculty of Public Health, Universitas Diponegoro, for administrative and technical support.

References 1

Francis

Harhay

Ong

ACM

: Chronic kidney disease and the global public health agenda: an international consensus. Nat. Rev. Nephrol. 2024;20(7):473–485. 38570631

10.1038/s41581-024-00820-6

Liang

Liu

: Burden of chronic kidney disease and its risk-attributable burden in 137 low-and middle-income countries, 1990–2019: results from the global burden of disease study 2019. BMC Nephrol. 2022;23(1):17. 34986789

10.1186/s12882-021-02597-3

PMC8727977

Bikbov

Purcell

Levey

: Global, regional, and national burden of chronic kidney disease, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet. 2020;395(10225):709–733. 32061315

10.1016/S0140-6736(20)30045-3

PMC7049905

Kovesdy

: Epidemiology of chronic kidney disease: an update 2022. Kidney Int. Suppl. 2022;12(1):7–11. 35529086

10.1016/j.kisu.2021.11.003

PMC9073222

Hustrini

Susalit

Harimurti

: Prevalence, incidence and risk factors of chronic kidney disease in people with diabetes and hypertension, and the prognosis and kidney function decline in Indonesia: a multicentre cross-sectional study in primary care centres. BMJ Open. 2025;15(10):e103779. 41125263

10.1136/bmjopen-2025-103779

PMC12548581

Duncan

Magliano

Boyko

: IDF diabetes atlas 11th edition 2025: global prevalence and projections for 2050. Oxford University Press;2025; vol.41:7–9. 10.1093/ndt/gfaf177

Johnston-Webber

Bencomo-Bermudez

Wharton

: A conceptual framework to assess the health, socioeconomic and environmental burden of chronic kidney disease. Health Policy. 2025;152:105244. 39827831

10.1016/j.healthpol.2024.105244

Jha

al-Ghamdi

SMG

: Global economic burden associated with chronic kidney disease: a pragmatic review of medical costs for the inside CKD research programme. Adv. Ther. 2023;40(10):4405–4420. 37493856

10.1007/s12325-023-02608-9

PMC10499937

Thomas

Brownlee

Susztak

: Diabetic kidney disease. Nat. Rev. Dis. Prim. 2015;1(1):15018. 27188921

10.1038/nrdp.2015.18

PMC7724636

Thomas

: The global burden of diabetic kidney disease: time trends and gender gaps. Curr. Diab. Rep. 2019;19(4):18. 30826889

10.1007/s11892-019-1133-6

Hoogeveen

: The epidemiology of diabetic kidney disease. Kidney and Dialysis. 2022;2(3):433–442. 10.3390/kidneydial2030038

Chen

Lee

: Diabetic kidney disease: challenges, advances, and opportunities. Kidney diseases. 2020;6(4):215–225. 32903946

10.1159/000506634

PMC7445658

Alicic

Rooney

Tuttle

: Diabetic kidney disease: challenges, progress, and possibilities. Clin. J. Am. Soc. Nephrol. 2017;12(12):2032–2045. 28522654

10.2215/CJN.11491116

PMC5718284

Hauwanga

Abdalhamed

Ezike

: The pathophysiology and vascular complications of diabetes in chronic kidney disease: A comprehensive review. Cureus. 2024;16(12). 10.7759/cureus.76498

Ding

Andoh

: The mechanism of hyperglycemia-induced renal cell injury in diabetic nephropathy disease: an update. Life. 2023;13(2):539. 36836895

10.3390/life13020539

PMC9967500

Zhang

: Association between the fatty liver index, metabolic dysfunction-associated steatotic liver disease, and the risk of kidney stones. Kidney Blood Press. Res. 2025;50(1):115–130. 39746337

10.1159/000543404

PMC11844708

Collins

Omar

Shanyinde

: A systematic review finds prediction models for chronic kidney disease were poorly reported and often developed using inappropriate methods. J. Clin. Epidemiol. 2013;66(3):268–277. 23116690

10.1016/j.jclinepi.2012.06.020

Feng

Wang

Jun

: Characterization of risk prediction models for acute kidney injury: a systematic review and meta-analysis. JAMA Netw. Open. 2023;6(5):e2313359. 37184837

10.1001/jamanetworkopen.2023.13359

PMC12011341

Ravizza

Huschto

Adamov

: Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data. Nat. Med. 2019;25(1):57–59. 30617317

10.1038/s41591-018-0239-8

Song

Waitman

ASL

: Longitudinal risk prediction of chronic kidney disease in diabetic patients using a temporal-enhanced gradient boosting machine: retrospective cohort study. JMIR Med. Inform. 2020;8(1):e15510. 32012067

10.2196/15510

PMC7055762

Lundberg

Lee

S-I

: A unified approach to interpreting model predictions. Adv. Neural Inf. Proces. Syst. 2017;30.

Hasan

Muhtar

: Web-based artificial intelligence to predict cognitive impairment following stroke: A multicenter study. J. Stroke Cerebrovasc. Dis. 2024;33(8):107826. 38908612

10.1016/j.jstrokecerebrovasdis.2024.107826

Makino

Yoshimoto

Ono

: Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning. Sci. Rep. 2019;9(1):11862. 31413285

10.1038/s41598-019-48263-5

PMC6694113

Group, U.P.D.S.: Intensive blood-glucose control with sulphonylureas or insulin compared with conventional treatment and risk of complications in patients with type 2 diabetes (UKPDS 33). Lancet. 1998;352(9131):837–853. 10.1016/S0140-6736(98)07019-6

Usman

: Multiple Risk Factor Control in Individuals with Type 2 Diabetes and Microalbuminuria. University of Leicester;2020.

Clemmer

Pruett

Hester

: Predicting chronic responses to calcium channel blockade with a virtual population of African Americans with hypertensive chronic kidney disease. Frontiers in Systems Biology. 2024;4:1327357. 39606582

10.3389/fsysb.2024.1327357

PMC11600446

Roy

White-Guay

Dorais

: Adherence to antihypertensive agents improves risk reduction of end-stage renal disease. Kidney Int. 2013;84(3):570–577. 23698228

10.1038/ki.2013.103

Jia

Wang

: Systematic Review and Network Meta-Analysis of the Comparative Effectiveness of Adherence Enhancement Strategies in Chronic Kidney Disease. J. Evid. Based Med. 2025;18(3):e70078. 40994060

10.1111/jebm.70078

Burnier

Egan

: Adherence in hypertension: a review of prevalence, risk factors, impact, and management. Circ. Res. 2019;124(7):1124–1140. 10.1161/CIRCRESAHA.118.313220

Ghanem

Nagy

: Oral health's role in diabetes risk: a cross-sectional study with sociodemographic and lifestyle insights. Front. Endocrinol. (Lausanne). 2024;15:1342783. 38516406

10.3389/fendo.2024.1342783

PMC10955347

Powers

: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061. 2020.

Hasan

: ML Dataset.[Data set]. Zenodo. 2026. 10.5281/zenodo.19634415

Hasan

: Tripod Checklist. Zenodo. 2026. 10.5281/zenodo.19521819

10.5256/f1000research.198473.r483727

Reviewer response for version 1

Santana

Ewaldo Eder Carvalho

1 Referee 1State University of Maranhão, São Luís, Brazil

Competing interests: No competing interests were disclosed.

4 6 2026

2026

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

recommendation

approve

This study developed and internally validated a machine learning model (CatBoost) to predict incident chronic kidney disease (CKD) among patients with type 2 diabetes mellitus (T2DM) enrolled in Indonesia’s national health insurance chronic disease management program (Prolanis).

Using a retrospective cohort of 7,581 patients, the authors compared six algorithms and found CatBoost achieved the best discriminative performance (AUC = 0.847). They used SHAP analysis to improve interpretability and built a web‑based risk calculator for clinical use. The manuscript is well structured, follows TRIPOD guidelines, and addresses an important public health gap in low‑ and middle‑income countries.

All my answers to the standard review questions were YES, meaning the study is scientifically sound, the methods are appropriate, the results are clearly presented, the discussion acknowledges limitations, and the conclusions are supported by the data.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

Signal Processing; Machine Learning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

hasan

faizul

faculty of nursing, Chulalongkorn University, Bangkok, Bangkok, Thailand

Competing interests: none

4 6 2026

We sincerely appreciate your time and insightful evaluation.

We sincerely appreciate your favorable evaluation and are heartened that you considered the study scientifically robust, the methodologies suitable, and the results well-founded. Your acknowledgement of this work's significance for low- and middle-income countries is particularly important to us.

10.5256/f1000research.198473.r483733

Reviewer response for version 1

Pramana

Cipta

1 Referee https://orcid.org/0000-0001-8991-0147 1UNNES - Universitas Negeri Semarang, Central Java, Indonesia

Competing interests: No competing interests were disclosed.

27 5 2026

2026

recommendation

approve-with-reservations

Comments for the Authors:

Introduction (Citation Formatting): There are several typographical errors regarding text citations in the Introduction section. For instance, in paragraph 2, "3.14 Hyperglycemia triggers..." and "14.15 Well-defined risk factors..." appear to be misformatted reference numbers. Similarly, in paragraph 4, "...from 0.80 to 0.90.19.20" fuses the text with reference numbers 19 and 20. Please thoroughly revise the manuscript to ensure all citations strictly adhere to the journal's formatting guidelines.

Abstract (Methodology): To improve clarity for a broader audience, please consider adding a brief statement regarding the data-splitting ratio (e.g., 80% training and 20% testing) in the Methods subsection of the Abstract, if the word count permits.

Discussion/Conclusion Alignment: The authors correctly acknowledged the moderate recall (0.525) due to the imbalanced nature of the dataset. I highly recommend ensuring that the text in the Conclusion explicitly reinforces that this web-based calculator is intended as a preliminary screening tool to guide clinical suspicion rather than a definitive diagnostic instrument. Consistent terminology should also be checked (e.g., matching "ischaemic" vs "ischemic").

Methodology and Results Comments:

Methodology (Participant Flowchart): While the inclusion and exclusion criteria are explicitly defined, the manuscript lacks a participant flow diagram. To adhere strictly to the TRIPOD statement referenced by the authors, please provide a flowchart illustrating how the initial cohort was filtered down to the final sample size of 7,581 patients (e.g., numbers of excluded patients due to prior CKD or missing data).

Table 2 Title Typo: The title of Table 2 reads "Six nearest neighbor algorithms used in machine learning." This is factually incorrect as the models evaluated (CatBoost, XGBoost, Random Forest, etc.) are ensemble tree-based methods and logistic regression, not K-Nearest Neighbor algorithms. Please revise the title to accurately reflect the contents (e.g., "Performance comparison of the six machine learning models").

Results (Table 1 Reconstruction): In the statistical analysis section, you mentioned comparing baseline parameters between individuals who developed CKD and those who did not using t-tests and chi-square tests. However, Table 1 only displays the aggregate characteristics of the total population. Please reconstruct Table 1 to separate the data into "CKD" and "Non-CKD" groups and include the corresponding p-values to show baseline statistical significance.

Inconsistent AUC Values: There is a minor inconsistency in how AUC values are reported. For example, CatBoost's AUC is reported as 0.847373 in Table 2, 0.847 in the main text, and 0.85 in Figure 1's legend. Please ensure consistency in decimal precision across the text, tables, and figures (3 decimal places is recommended, e.g., 0.847).

Discussion and Conclusion Comments for the Authors:

Discussion (Typographical and Formatting Errors): There are a few minor formatting issues in the Discussion section. In paragraph 2 (Page 7), an erratic quotation mark appears inside the sentence: "...conventional logistic regression." A study by...". Additionally, on Page 9, a broken reference format is visible: "27- Likewise, the protective influence...". Please meticulously proofread the section to fix these citation alignment errors.

Discussion (Addressing Imbalanced Data): The authors provided an excellent and honest justification regarding the moderate recall due to the imbalanced nature of the dataset (11.4% CKD prevalence). However, it would strengthen the paper if the authors briefly suggested future technical workarounds in the limitation paragraph—such as utilizing oversampling techniques (e.g., SMOTE) or adjusting algorithmic class weights to improve sensitivity in future iterations.

Conclusions & Practical Implications: The conclusion is concise and well-balanced. However, given that this study leverages the national BPJS Prolanis dataset, the practical application could be stated more clearly. I suggest the authors add 1–2 sentences addressing how this tool could be realistically integrated into Indonesia's primary healthcare digital ecosystem (e.g., integration into the BPJS P-Care system) to assist general practitioners in risk stratification.

Is the work clearly and accurately presented and does it cite the current literature?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are the conclusions drawn adequately supported by the results?

Partly

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

Reviewer Expertise:

"No competing interests were disclosed."

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

hasan

faizul

faculty of nursing, Chulalongkorn University, Bangkok, Bangkok, Thailand

Competing interests: none

16 6 2026

Response: Thank you for your comment. We have rechecked all references throughout the entire manuscript. As per F1000Research guidelines, references follow a free format but must be consistent across the manuscript. In our study, we have adopted the AMA reference style.

Response: Thank you for this constructive suggestion. We have revised the Methods subsection of the Abstract to include the data-splitting ratio, as word count permits. The updated sentence now reads:

"Six algorithms (Logistic Regression, Random Forest, Decision Tree, XGBoost, LightGBM, CatBoost) were trained on 80% of the data and internally validated on the remaining 20% to predict CKD."

Response: Thank you for this important recommendation. We have made the following revisions:

Conclusion section: We have explicitly revised the text to state that the web-based calculator is intended as a preliminary screening tool to guide clinical suspicion, not as a definitive diagnostic instrument. The revised conclusion now reads:

“The model's moderate recall indicates potential for enhancement, yet its high precision and interpretability highlight its utility as a preliminary screening tool to inform clinical suspicion and identify high-risk individuals for targeted preventive interventions, rather than serving as an independent diagnostic instrument”

Terminology consistency: We have reviewed the entire manuscript and standardized all spelling to "ischaemic" (British English) throughout, ensuring consistency across all sections, including the Discussion, and Conclusion.

Methodology and Results Comments:

Response: Thank you for your comment. We agree that a participant flow diagram is essential for TRIPOD compliance. As you noted, we have now provided this flowchart as Figure 1 in the revised manuscript, illustrating the stepwise filtering of the initial cohort down to the final sample of 7,581 patients, including numbers excluded due to prior CKD and missing data.

Response: Thank you for catching this error. We have revised the title of Table 2 accordingly. The corrected title now reads:

"Performance comparison of the six machine learning models"

Response: We have reconstructed Table 1 to present baseline characteristics separately for CKD (n=864) and non-CKD (n=6,717) groups, with corresponding p-values from t-tests and chi-square tests. The revised table is included in the manuscript.

Response: Thank you for noting this inconsistency. We have standardized all AUC values to 3 decimal places (e.g., 0.847) throughout the manuscript, including the main text, Table 2, and Figure 1 legend.

Discussion and Conclusion Comments for the Authors:

Response: Thank you for this note. We have corrected both the erratic quotation mark on Page 7 and the broken reference format on Page 9. The entire Discussion section has been carefully proofread and revised to ensure all citation formatting is accurate and aligned.

Response: We thank the reviewer for the positive feedback on our data justification. We agree that highlighting technical workarounds for future research enhances the manuscript. Accordingly, we have revised the limitations paragraph in the Discussion section to explicitly mention the potential use of oversampling techniques (such as SMOTE) and the adjustment of algorithmic class weights to optimize sensitivity/recall in future iterations.

"... The moderate recall noted in this study indicates the intrinsically uneven character of the dataset, considering the 11.4% prevalence of CKD. To address this in future iterations, technical solutions such as employing sophisticated oversampling methods (e.g., Synthetic Minority Over-sampling Technique, SMOTE) or systematically modifying algorithmic class weights could be investigated to enhance model sensitivity."

Response: We thank the reviewer for this excellent and practical recommendation. We agree that contextualizing the deployment pathway within Indonesia's existing digital health infrastructure significantly strengthens the clinical utility of the study. Accordingly, we have updated the final paragraph of the manuscript to explicitly propose the integration of our CatBoost-based predictive tool into the national BPJS P-Care platform to aid primary care physicians in automated risk stratification.

“This web-based calculator is ideal for Indonesia's primary care digital environment as a preliminary screening tool. The CatBoost algorithm embedded directly into the national BPJS P-Care system would allow general practitioners to perform real-time risk stratification during routine visits and provide timely nephroprotective interventions for high-risk type 2 diabetes patients in the Prolanis program.”

Is the work clearly and accurately presented and does it cite the current literature?

Yes

Is the study design appropriate and is the work technically sound?

Yes

Are sufficient details of methods and analysis provided to allow replication by others?

Yes

If applicable, is the statistical analysis and its interpretation appropriate?

Yes

Are all the source data underlying the results available to ensure full reproducibility?

Yes

Are the conclusions drawn adequately supported by the results?

Partly

Response: Thank you for your acknowledgement and consideration.