Research Article

Risk prediction model for 24-hour mortality in preterm infants using lactate and blood gas analysis: A machine learning approach and retrospective cohort study

[version 1; peer review: 2 not approved]
PUBLISHED 20 Apr 2022

This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Python collection.

Abstract

Background: This study aimed to evaluate the performance of machine learning algorithms using lactate and arterial blood gas parameters to predict the imminent risk of death in extremely low birth weight infants.
Methods: We carried out a retrospective cohort study of preterm infants with birth weight less than 1000 grams in a single-center tertiary neonatal intensive care unit in São Paulo, Brazil, between 2012 and 2017. We included all infants with at least one arterial blood gas analysis with paired serum lactate. To assess 24-hour mortality risk, we compared three machine learning algorithms (Logistic Regression, Extreme Gradient Boosting, and AutoML Tables).
Results: We analyzed 1932 blood gas samples with matched lactate measurements. The study population had a median gestational age of 27.1 (26–29.1) weeks and a median birth weight of 746 (600–880) grams. The Extreme Gradient Boosting model with lactate achieved the highest area under the receiver operating characteristic curve (AUROC), 0.898. Base excess, lactate, and pH were, in that order, the most important features associated with 24-hour mortality.
Conclusions: Incorporating lactate and blood gas measurements into real-time mortality prediction models may help identify preterm infants at higher risk of death.

Keywords

preterm, machine learning, artificial intelligence, prediction, mortality

Introduction

Although child mortality has decreased significantly in recent decades, neonatal mortality remains the major contributor to it,1 and prematurity is the largest direct cause of death in this population.2 More than half of neonatal deaths occur within the first three days of life.3 In this context, several predictive models use prenatal and immediate postnatal factors to predict death. However, in the neonatal intensive care unit (NICU), the patient's condition changes over time, and more than 30% of neonatal deaths occur after the third day of life.3 Real-time features, rather than non-modifiable baseline variables, may provide a more individualized evaluation of mortality risk.

Predictive models are becoming increasingly popular in clinical research. Machine learning techniques have been used to predict a range of outcomes, from mortality in intensive care units to morbidities such as acute kidney injury, septic shock, and heart failure.4 Currently available models for assessing neonatal mortality risk focus mostly on the time of NICU admission.5–9 Few models use objective criteria to evaluate mortality risk in real time.

To address this knowledge gap, we compared the effectiveness of machine learning algorithms for predicting NICU mortality using objective laboratory features. Specifically, we evaluated serum lactate and blood gas analysis as predictors of mortality in extremely low birth weight infants.

Methods

Ethics statement

The study protocol was approved by the institutional ethics committee – Comitê de Ética do Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo, approval number: CAAE 15762719.6.0000.0068. The requirement for informed consent was waived by the committee given the retrospective nature of the study.

Study population

We analyzed data from extremely low birth weight infants born at a single-center tertiary neonatal intensive care unit in São Paulo, Brazil, between 2012 and 2017. The data collection and methods have been described in our previous study.10 In December 2019, all data were obtained from each patient's electronic medical record by searching for specific keywords related to clinical and laboratory parameters, and were extracted to a CSV file.22 All neonates with birth weight below 1000 grams born between 2012 and 2017 who had at least one arterial blood gas analysis with a paired serum lactate level during their neonatal intensive care stay were included. Neonates with severe malformations or complex congenital heart disease, and those transferred to another unit before discharge, were excluded. Baseline characteristics included gestational age at birth (weeks), birth weight (grams), CRIB II score (Clinical Risk Index for Babies), small for gestational age (defined as birth weight below the 10th percentile of the Fenton growth chart), female gender, 5-minute APGAR score, vaginal birth, twin birth, antenatal corticoid, endotracheal intubation in the delivery room, epinephrine use in the delivery room, chorioamnionitis, and the lowest temperature in the first 12 hours of life. Serum lactate levels are presented in mmol/L.
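The inclusion and exclusion criteria above can be expressed as a single eligibility predicate. The sketch below is illustrative only: the `Neonate` record and its field names (`birth_weight_g`, `paired_abg_lactate_count`, and so on) are hypothetical and do not describe the authors' actual data layout.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Neonate:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    birth_year: int
    birth_weight_g: float
    paired_abg_lactate_count: int  # arterial blood gases with a paired serum lactate
    severe_malformation: bool
    complex_congenital_heart_disease: bool
    transferred_before_discharge: bool


def is_eligible(n: Neonate) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    included = (
        2012 <= n.birth_year <= 2017
        and n.birth_weight_g < 1000
        and n.paired_abg_lactate_count >= 1
    )
    excluded = (
        n.severe_malformation
        or n.complex_congenital_heart_disease
        or n.transferred_before_discharge
    )
    return included and not excluded


def select_cohort(records: Iterable[Neonate]) -> List[Neonate]:
    """Keep only the infants that satisfy every criterion."""
    return [r for r in records if is_eligible(r)]
```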

Predictive parameters

A total of seven readily available parameters were used as inputs to the machine learning algorithms: blood gas analysis features (pH, pCO2, HCO3, and base excess), serum lactate, and general characteristics (days of life and corrected gestational age at the time of the laboratory measurement).
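Each blood gas sample with a paired lactate can be turned into one training example: the seven parameters form the feature vector, and the label records whether death occurred within 24 hours of sampling, which is how the primary outcome is defined in the Methods. In this sketch the dictionary keys, the `build_example` helper, and the choice to count a death at exactly 24 hours as an event are assumptions; samples with any missing feature are dropped, mirroring the complete-case exclusion described in the study.

```python
from datetime import datetime, timedelta
from typing import List, Mapping, Optional, Tuple

# Hypothetical keys for the seven predictive parameters.
FEATURES = (
    "ph", "pco2", "hco3", "base_excess",   # blood gas analysis
    "lactate",                             # serum lactate (mmol/L)
    "days_of_life", "corrected_ga_weeks",  # general characteristics at sampling
)
OUTCOME_WINDOW = timedelta(hours=24)


def died_within_24h(sample_time: datetime, death_time: Optional[datetime]) -> int:
    """Label 1 if death occurred within 24 hours after the sample was collected."""
    if death_time is None:  # infant did not die during the stay
        return 0
    return int(sample_time <= death_time <= sample_time + OUTCOME_WINDOW)


def build_example(
    sample: Mapping[str, object], death_time: Optional[datetime]
) -> Optional[Tuple[List[float], int]]:
    """Return (feature vector, label), or None if any feature is missing."""
    values = [sample.get(name) for name in FEATURES]
    if any(v is None for v in values):
        return None  # complete-case analysis: drop incomplete samples
    return [float(v) for v in values], died_within_24h(sample["sample_time"], death_time)
```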

Machine learning model development

We compared the performance of three machine learning methods for assessing 24-hour mortality risk: Logistic Regression,11 Extreme Gradient Boosting,12 and AutoML Tables.13 Patient data were randomly divided into two subsets: a training subset (80%), used for hyperparameter tuning and model fitting, and a validation subset (20%), used to test model performance. All data with missing values were excluded from the analysis. To select the optimal model, we performed hyperparameter tuning for the Extreme Gradient Boosting model. Hyperparameters are settings, such as the learning rate or the maximum tree depth, that must be fixed before the learning process begins; tuning them can improve the algorithm's performance.
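One way to perform the 80/20 random division, sketched here under the assumption that the split is made at the patient level, is to shuffle patient identifiers and assign every sample from a given infant to the same subset, so that repeated blood gas samples from one infant never appear in both training and validation data. The function name `split_by_patient` and the fixed seed are illustrative choices, not details reported in the study.

```python
import random
from typing import List, Sequence, Tuple


def split_by_patient(
    patient_ids: Sequence[str], train_fraction: float = 0.8, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, validation_indices) with no patient in both subsets."""
    unique_ids = sorted(set(patient_ids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    train_patients = set(unique_ids[:n_train])
    train_idx = [i for i, pid in enumerate(patient_ids) if pid in train_patients]
    valid_idx = [i for i, pid in enumerate(patient_ids) if pid not in train_patients]
    return train_idx, valid_idx
```

Because infants contribute different numbers of samples, an 80/20 split of patients only approximates an 80/20 split of samples.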

Logistic regression is a classification technique that models the probability of a binary outcome as a logistic (inverse-logit) function of a linear combination of the predictor variables. It is frequently used as a baseline for binary classification because it is simple to construct and can still perform well.11 Extreme Gradient Boosting is an ensemble technique built from weak prediction models (here, decision trees). These weak models are added one at a time, each fitted to correct the prediction errors made by the preceding models.14 For Extreme Gradient Boosting, the following hyperparameters were used: LEARN_RATE = 0.1, BOOSTER_TYPE = 'GBTREE', MAX_TREE_DEPTH = 3, SUBSAMPLE = 0.85, and EARLY_STOP = TRUE. AutoML Tables is a Google Cloud Platform service that automatically trains multiple model architectures (linear, feedforward deep neural network, gradient-boosted decision tree, AdaNet, and ensembles) and selects the best-performing model. The primary outcome was death within 24 hours after collection of the arterial blood gas and lactate sample.
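The boosting procedure described above can be illustrated with a minimal, pure-Python sketch of second-order gradient boosting for binary log-loss, in the style of XGBoost: each round computes the gradient and Hessian of the loss for every sample, fits a new tree to them on a random 85% of the rows, and adds the tree's output, scaled by the learning rate, to the running log-odds. This is a simplification, not the authors' implementation: trees are depth-1 stumps rather than depth 3, there is no early stopping, and the L2 penalty `reg_lambda = 1.0` as well as all class and function names are assumptions. Only the learning rate (0.1) and subsampling fraction (0.85) mirror the reported hyperparameters.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Stump:
    """Depth-1 regression tree whose leaves hold log-odds increments."""
    feature: int
    threshold: float
    left_weight: float   # used when x[feature] <= threshold
    right_weight: float  # used when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left_weight if x[self.feature] <= self.threshold else self.right_weight


def fit_stump(X, g, h, rows, reg_lambda: float) -> Stump:
    """Choose the split that maximises the second-order score G^2 / (H + lambda)."""
    G = sum(g[i] for i in rows)
    H = sum(h[i] for i in rows)
    # Start from "no split": one leaf with Newton weight -G / (H + lambda).
    w = -G / (H + reg_lambda)
    best, best_score = Stump(0, math.inf, w, w), G * G / (H + reg_lambda)
    for j in range(len(X[0])):
        # Skip the largest value: splitting there would leave the right leaf empty.
        for t in sorted({X[i][j] for i in rows})[:-1]:
            GL = sum(g[i] for i in rows if X[i][j] <= t)
            HL = sum(h[i] for i in rows if X[i][j] <= t)
            GR, HR = G - GL, H - HL
            score = GL * GL / (HL + reg_lambda) + GR * GR / (HR + reg_lambda)
            if score > best_score:
                best_score = score
                best = Stump(j, t, -GL / (HL + reg_lambda), -GR / (HR + reg_lambda))
    return best


class BoostedStumps:
    """Hypothetical minimal booster; not a drop-in replacement for a real library."""

    def __init__(self, n_rounds=100, learn_rate=0.1, subsample=0.85, reg_lambda=1.0, seed=0):
        self.n_rounds = n_rounds
        self.learn_rate = learn_rate
        self.subsample = subsample
        self.reg_lambda = reg_lambda
        self.seed = seed
        self.stumps: List[Stump] = []

    def fit(self, X: List[List[float]], y: List[int]) -> "BoostedStumps":
        rng = random.Random(self.seed)
        n = len(y)
        k = max(1, round(self.subsample * n))
        F = [0.0] * n  # current log-odds; 0 means probability 0.5
        for _ in range(self.n_rounds):
            p = [sigmoid(f) for f in F]
            g = [p[i] - y[i] for i in range(n)]          # gradient of log-loss
            h = [p[i] * (1.0 - p[i]) for i in range(n)]  # Hessian of log-loss
            rows = rng.sample(range(n), k)               # row subsampling for this round
            stump = fit_stump(X, g, h, rows, self.reg_lambda)
            self.stumps.append(stump)
            # The new tree corrects the errors of the ensemble built so far.
            for i in range(n):
                F[i] += self.learn_rate * stump.predict(X[i])
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(sum(self.learn_rate * s.predict(x) for s in self.stumps))
```

In practice one would use a maintained library such as the `xgboost` Python package, whose `learning_rate`, `max_depth`, and `subsample` parameters appear to correspond to the hyperparameters listed above; the sketch only exposes the mechanics.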

Statistical analysis

Continuous variables were tested for normality using the Kolmogorov-Smirnov test. To compare demographic characteristics, we used the chi-square test for categorical variables and the Mann-Whitney U test for continuous variables. After the optimal hyperparameters were determined for each machine learning algorithm, we calculated the area under the receiver operating characteristic curve (AUROC), accuracy, precision, and recall. All analyses were conducted using Python (version 3.6.9)15 and Google Cloud Platform. All patients with missing data were excluded from the study.
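The four reported metrics can be computed from binary labels and predicted risks as sketched below. The `auroc` function uses the Mann-Whitney formulation: AUROC equals the probability that a randomly chosen sample followed by death within 24 hours receives a higher risk score than a randomly chosen sample followed by survival, with ties counted as one half. The 0.5 decision threshold used to turn risks into predictions is an assumption; the threshold used in the study is not reported.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 = death within 24 hours."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0  # convention: 0 when nothing is flagged


def recall(y_true, y_pred) -> float:
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney estimate: P(score of a positive > score of a negative), ties = 1/2.

    O(n_pos * n_neg): fine for a sketch, too slow for very large datasets.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(r >= 0.5) for r in risk]  # assumed 0.5 decision threshold
print(accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))  # 0.75 1.0 0.5
print(auroc(y_true, risk))  # 0.75
```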

Results

We identified a total of 257 neonates who had at least one blood gas analysis and a paired serum lactate during neonatal intensive care unit stay from 2012 to 2017. Eleven patients were excluded because of missing data. Baseline characteristics are presented in Table 1. The median gestational age was 27.1 (26 – 29.1) weeks and the median birth weight was 746 (600 – 880) grams. We found 1932 blood gas samples with corresponding serum lactate levels.

Table 1. Baseline characteristics of the study sample.

| Variable | Total (n = 257) | Survivors (n = 148) | Non-survivors (n = 109) | P-value |
|---|---|---|---|---|
| Gestational age (wk), median (IQR) | 27.1 (26–29.1) | 28 (26.5–30.1) | 26.3 (25–27.4) | <0.001 |
| Birth weight (g), median (IQR) | 746 (600–880) | 822 (700–940) | 610 (530–760) | <0.001 |
| CRIB II score, median (IQR) | 12 (10–14) | 11 (9–12) | 14 (12–15) | <0.001 |
| Small for gestational age, n (%) | 125 (48.6) | 73 (49.3) | 52 (47.7) | 0.798 |
| Female gender, n (%) | 123 (47.9) | 74 (50) | 49 (45) | 0.424 |
| 5-minute APGAR score, median (IQR) | 8 (6–9) | 8 (7–9) | 8 (5–8) | <0.001 |
| Vaginal birth, n (%) | 41 (16) | 13 (8.8) | 28 (25.7) | <0.001 |
| Twin birth, n (%) | 77 (30) | 33 (22.3) | 44 (40.4) | 0.002 |
| Antenatal corticoid, n (%) | 129 (50.2) | 79 (53.4) | 51 (46.8) | 0.234 |
| Endotracheal intubation in the delivery room, n (%) | 167 (65) | 81 (54.7) | 86 (78.9) | <0.001 |
| Epinephrine in the delivery room, n (%) | 28 (10.9) | 11 (7.4) | 17 (15.6) | 0.038 |
| Chorioamnionitis, n (%) | 29 (11.3) | 12 (8.1) | 17 (15.6) | 0.061 |
| Lowest temperature in the first 12 hours of life (°C), median (IQR) | 35 (34.1–35.8) | 35 (34.5–35.8) | 34.5 (33.8–35.2) | <0.001 |

All models achieved high accuracy (91–94%) and AUROC values between 0.807 and 0.898 when blood gas measurements and lactate were used as features. However, their recall was low (9–29%). The Extreme Gradient Boosting algorithm obtained the highest AUROC (0.898), accuracy (94.1%), and precision (87.5%) (Table 2). We then used AutoML Tables to determine the importance of each feature for predicting 24-hour mortality; the top three features, in order, were base excess, lactate, and pH (Figure 1).

Table 2. 24-hour mortality prediction performance of machine learning models using blood gas features with lactate.

| Model | Accuracy | Precision | Recall | AUROC |
|---|---|---|---|---|
| Extreme Gradient Boosting | 0.941 | 0.875 | 0.25 | 0.898 |
| AutoML Tables | 0.933 | 0.833 | 0.29 | 0.826 |
| Logistic Regression | 0.913 | 0.375 | 0.09 | 0.807 |

AUROC, area under the receiver operating characteristic curve.

Figure 1. Feature importance.

BE: base excess; cGA: corrected gestational age; BIC: bicarbonate (HCO3); DOL: days of life.

When lactate was removed as a feature, the AUROC of the Extreme Gradient Boosting model dropped considerably, from 0.898 to 0.807 (Table 3).

Table 3. 24-hour mortality prediction performance of machine learning models using blood gas features without lactate.

| Model | Accuracy | Precision | Recall | AUROC |
|---|---|---|---|---|
| AutoML Tables | 0.938 | 1.000 | 0.294 | 0.857 |
| Logistic Regression | 0.940 | 0.600 | 0.130 | 0.848 |
| Extreme Gradient Boosting | 0.942 | 0.666 | 0.173 | 0.807 |

AUROC, area under the receiver operating characteristic curve.

Discussion

Our study found that, using blood gas and lactate measurements, the Extreme Gradient Boosting algorithm may predict 24-hour mortality in extremely low birth weight infants. Including lactate as a feature can improve predictive models.

In the NICU, an accurate mortality estimate is a valuable tool for healthcare providers. However, most mortality prediction models for preterm infants are limited to the time of NICU admission. For assessing mortality risk at NICU admission, the Score for Neonatal Acute Physiology Perinatal Extension-II (SNAPPE-II) includes vital signs, laboratory values, and baseline characteristics.5 The Clinical Risk Index for Babies (CRIB-II) score considers sex, birth weight, gestational age, temperature, and base excess to assess mortality risk upon NICU admission. TRIPS-II,7 NMR-2000,8 and PISA9 are other recent models that predict mortality after admission. However, dynamic events in the critical care unit may change this initial risk. Combining baseline characteristics with real-time features provides a more individualized assessment of mortality risk that evolves as the patient's health changes. To move beyond admission-only estimates, Jaskari J et al.,16 Lee J et al.,17 and Feng J et al.18 created mortality prediction models based on vital sign data collected during the NICU stay. However, vital signs such as respiratory rate, heart rate, and blood pressure may vary among neonates, and there is no consensus on what constitutes a normal reference range.19,20 Furthermore, in the study by Lee J et al., baseline variables (birth weight and gestational age at birth) were more important than vital sign readings.

Lactate and blood gas values can be obtained quickly, and their measurement is standard clinical practice in intensive care units. Because these measurements are not subject to examiner bias, they are more objective than clinical examination.21 Our research therefore combines the predictive capability of machine learning models with the objectivity of blood gas analysis, a simple, readily available, and widely used test. Our findings suggest that machine learning models based on lactate and blood gas indicators may outperform logistic regression classifiers in predicting 24-hour mortality in extremely low birth weight infants. To our knowledge, this is the first study to use real-time objective laboratory features to predict death in preterm infants.

Our research has some limitations. First, this is a single-center retrospective study, so our findings may not generalize to other settings. Second, machine learning models generally benefit from larger datasets, so a larger study is required. Third, predicting mortality from blood gas samples is a highly imbalanced problem: there are far more blood gas samples than events (deaths). With imbalanced data, a model can achieve high accuracy simply by predicting that every observation belongs to the majority class, and AUROC can also give an overly optimistic impression of performance on the minority class. Lastly, our best-performing models had high precision but low recall, so they should not be used for screening purposes.
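The effect of class imbalance on accuracy can be made concrete with illustrative numbers (not study data): if 5 of 100 samples are followed by death within 24 hours, a trivial model that predicts survival for every sample is 95% accurate yet has zero recall. Because it gives every sample the same score, it cannot rank cases at all, so its AUROC is 0.5; this is why recall and precision for the minority class are reported alongside accuracy.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of AUROC; tied scores count as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels: 95 survivals (0) and 5 deaths within 24 hours (1).
y_true = [0] * 95 + [1] * 5
# Majority-class baseline: always predict survival with the same constant risk score.
y_pred = [0] * len(y_true)
scores = [0.0] * len(y_true)

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
print(auroc(y_true, scores))     # 0.5
```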

Conclusions

Incorporating lactate and blood gas measurements into mortality prediction models may improve real-time risk stratification in preterm infants. Traditional logistic regression models appear to be outperformed by more robust machine learning algorithms. Extreme Gradient Boosting models could be used as a support tool for clinical risk stratification of extremely low birth weight infants in the neonatal intensive care unit.

Data availability

Underlying data

As this study involved extracting data from patient records, these records/patient files are considered the raw, source data. The raw data (patient files) are not available to readers and reviewers for data protection.

Harvard Dataverse: Blood Gas – Preterm.22 https://doi.org/10.7910/DVN/LFRNJE

This project contains the following underlying data: Lactate_Mortality – Sheet1.csv (contains 1932 serum lactate and blood gas analyses along with days of life and corrected gestational age at the time of measurement).

Extended data

Harvard Dataverse: Blood Gas – Preterm.22 https://doi.org/10.7910/DVN/LFRNJE

This project contains the following extended data: Data key.docx (contains key to make data more accessible)

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

How to cite this article
Matsushita FY, Krebs VLJ and de Carvalho WB. Risk prediction model for 24-hour mortality in preterm infants using lactate and blood gas analysis: A machine learning approach and retrospective cohort study [version 1; peer review: 2 not approved]. F1000Research 2022, 11:444 (https://doi.org/10.12688/f1000research.110711.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Key to reviewer statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 1
Reviewer Report 12 Dec 2023
Fu-Sheng Chou, Loma Linda University, CA, USA 
Not Approved
In their research, Matsushita et al. created several machine learning models aiming to predict 24-hour mortality in extremely low birth weight infants, utilizing arterial blood gas and lactate data from 2012 to 2017.
While the study addresses a crucial ...
HOW TO CITE THIS REPORT
Chou FS. Reviewer Report For: Risk prediction model for 24-hour mortality in preterm infants using lactate and blood gas analysis: A machine learning approach and retrospective cohort study [version 1; peer review: 2 not approved]. F1000Research 2022, 11:444 (https://doi.org/10.5256/f1000research.122344.r211184)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 29 Nov 2023
Thomas Wood, University of Washington, Seattle, Washington, USA 
Not Approved
The authors address an important problem, but I have several concerns regarding how the methods were performed/reported:
  • The idea of “real-time” mortality risk is mentioned throughout the manuscript, but it is not clear if this analysis
...
HOW TO CITE THIS REPORT
Wood T. Reviewer Report For: Risk prediction model for 24-hour mortality in preterm infants using lactate and blood gas analysis: A machine learning approach and retrospective cohort study [version 1; peer review: 2 not approved]. F1000Research 2022, 11:444 (https://doi.org/10.5256/f1000research.122344.r201605)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
